Compare commits

...

2548 Commits

Author SHA1 Message Date
Conrad Ludgate
304ed02f74 Proxy release 2025-07-03 10:07 UTC 2025-07-03 11:07:21 +01:00
Conrad Ludgate
1a7c2450f5 fix redis credentials check (#12455)
## Problem

`keep_connection` does not exit, so it was never setting
`credentials_refreshed`.

## Summary of changes

Set `credentials_refreshed` to true when we first establish a
connection, and after we re-authenticate the connection.
2025-07-03 11:07:21 +01:00
Conrad Ludgate
d08748d4e4 Proxy release 2025-07-02 12:38 UTC 2025-07-02 13:38:03 +01:00
Ivan Efremov
fcf8127900 [proxy]: Fix redis IRSA expiration failure errors (#12430)
Relates to the
[#30688](https://github.com/neondatabase/cloud/issues/30688)
2025-07-02 13:38:03 +01:00
github-actions[bot]
c82d646598 Proxy release 2025-07-01 06:01 UTC 2025-07-01 06:01:46 +00:00
Suhas Thalanki
5f3532970e [compute] fix: background worker that collects installed extension metrics now updates collection interval (#12277)
## Problem

Previously, the background worker that collects the list of installed
extensions across DBs had a timeout set to 1 hour. This cause a problem
with computes that had a `suspend_timeout` > 1 hour as this collection
was treated as activity, preventing compute shutdown.

Issue: https://github.com/neondatabase/cloud/issues/30147

## Summary of changes

Passing the `suspend_timeout` as part of the `ComputeSpec` so that any
updates to this are taken into account by the background worker and
updates its collection interval.
2025-06-30 22:12:37 +00:00
Arpad Müller
2e681e0ef8 detach_ancestor: delete the right layer when hardlink fails (#12397)
If a hardlink operation inside `detach_ancestor` fails due to the layer
already existing, we delete the layer to make sure the source is one we
know about, and then retry.

But we deleted the wrong file, namely, the one we wanted to use as the
source of the hardlink. As a result, the follow up hard link operation
failed. Our PR corrects this mistake.
2025-06-30 21:36:15 +00:00
Dmitrii Kovalkov
8e216a3a59 storcon: notify cplane on safekeeper membership change (#12390)
## Problem
We don't notify cplane about safekeeper membership change yet. Without
the notification the compute needs to know all the safekeepers on the
cluster to be able to speak to them. Change notifications will allow to
avoid it.

- Closes: https://github.com/neondatabase/neon/issues/12188

## Summary of changes
- Implement `notify_safekeepers` method in `ComputeHook`
- Notify cplane about safekeepers in `safekeeper_migrate` handler.
- Update the test to make sure notifications work.

## Out of scope
- There is `cplane_notified_generation` field in `timelines` table in
strocon's database. It's not needed now, so it's not updated in the PR.
Probably we can remove it.
- e2e tests to make sure it works with a production cplane
2025-06-30 14:09:50 +00:00
Erik Grinaker
d0a4ae3e8f pageserver: add gRPC LSN lease support (#12384)
## Problem

The gRPC API does not provide LSN leases.

## Summary of changes

* Add LSN lease support to the gRPC API.
* Use gRPC LSN leases for static computes with `grpc://` connstrings.
* Move `PageserverProtocol` into the `compute_api::spec` module and
reuse it.
2025-06-30 12:44:17 +00:00
Erik Grinaker
a384d7d501 pageserver: assert no changes to shard identity (#12379)
## Problem

Location config changes can currently result in changes to the shard
identity. Such changes will cause data corruption, as seen with #12217.

Resolves #12227.
Requires #12377.

## Summary of changes

Assert that the shard identity does not change on location config
updates and on (re)attach.

This is currently asserted with `critical!`, in case it misfires in
production. Later, we should reject such requests with an error and turn
this into a proper assertion.
2025-06-30 12:36:45 +00:00
Christian Schwarz
66f53d9d34 refactor(pageserver): force explicit mapping to CreateImageLayersError::Other (#12382)
Implicit mapping to an `anyhow::Error` when we do `?` is discouraged
because tooling to find those places isn't great.

As a drive-by, also make SplitImageLayerWriter::new infallible and sync.
I think we should also make ImageLayerWriter::new completely lazy,
then `BatchLayerWriter:new` infallible and async.
2025-06-30 11:03:48 +00:00
Busra Kugler
2af9380962 Revert "Replace step-security maintained actions" (#12386)
Reverts neondatabase/neon#11663 and
https://github.com/neondatabase/neon/pull/11265/

Step Security is not yet approved by Databricks team, in order to
prevent issues during Github org migration, I'll revert this PR to use
the previous action instead of Step Security maintained action.
2025-06-30 10:15:10 +00:00
Ivan Efremov
620d50432c Fix path issue in the proxy-bernch CI workflow (#12388) 2025-06-30 09:33:57 +00:00
Erik Grinaker
1d43f3bee8 pageserver: fix stripe size persistence in legacy HTTP handlers (#12377)
## Problem

Similarly to #12217, the following endpoints may result in a stripe size
mismatch between the storage controller and Pageserver if an unsharded
tenant has a different stripe size set than the default. This can lead
to data corruption if the tenant is later manually split without
specifying an explicit stripe size, since the storage controller and
Pageserver will apply different defaults. This commonly happens with
tenants that were created before the default stripe size was changed
from 32k to 2k.

* `PUT /v1/tenant/config`
* `PATCH /v1/tenant/config`

These endpoints are no longer in regular production use (they were used
when cplane still managed Pageserver directly), but can still be called
manually or by tests.

## Summary of changes

Retain the current shard parameters when updating the location config in
`PUT | PATCH /v1/tenant/config`.

Also opportunistically derive `Copy` for `ShardParameters`.
2025-06-30 09:08:44 +00:00
Dmitrii Kovalkov
c746678bbc storcon: implement safekeeper_migrate handler (#11849)
This PR implements a safekeeper migration algorithm from RFC-035


https://github.com/neondatabase/neon/blob/main/docs/rfcs/035-safekeeper-dynamic-membership-change.md#change-algorithm

- Closes: https://github.com/neondatabase/neon/issues/11823

It is not production-ready yet, but I think it's good enough to commit
and start testing.

There are some known issues which will be addressed in later PRs:
- https://github.com/neondatabase/neon/issues/12186
- https://github.com/neondatabase/neon/issues/12187
- https://github.com/neondatabase/neon/issues/12188
- https://github.com/neondatabase/neon/issues/12189
- https://github.com/neondatabase/neon/issues/12190
- https://github.com/neondatabase/neon/issues/12191
- https://github.com/neondatabase/neon/issues/12192

## Summary of changes
- Implement `tenant_timeline_safekeeper_migrate` handler to drive the
migration
- Add possibility to specify number of safekeepers per timeline in tests
(`timeline_safekeeper_count`)
- Add `term` and `flush_lsn` to `TimelineMembershipSwitchResponse`
- Implement compare-and-swap (CAS) operation over timeline in DB for
updating membership configuration safely.
- Write simple test to verify that migration code works
2025-06-30 08:30:05 +00:00
Aleksandr Sarantsev
9bb4688c54 storcon: Remove testing feature from kick_secondary_downloads (#12383)
## Problem

Some of the design decisions in PR #12256 were influenced by the
requirements of consistency tests. These decisions introduced
intermediate logic that is no longer needed and should be cleaned up.

## Summary of Changes
- Remove the `feature("testing")` flag related to
`kick_secondary_download`.
- Set the default value of `kick_secondary_download` back to false,
reflecting the intended production behavior.

Co-authored-by: Aleksandr Sarantsev <aleksandr.sarantsev@databricks.com>
2025-06-30 05:41:05 +00:00
Dmitrii Kovalkov
47553dbaf9 neon_local: set timeline_safekeeper_count if we have less than 3 safekeepers (#12378)
## Problem
- Closes: https://github.com/neondatabase/neon/issues/12298

## Summary of changes
- Set `timeline_safekeeper_count` in `neon_local` if we have less than 3
safekeepers
- Remove `cfg!(feature = "testing")` code from
`safekeepers_for_new_timeline`
- Change `timeline_safekeeper_count` type to `usize`
2025-06-28 12:59:29 +00:00
Erik Grinaker
e50b914a8e compute_tools: support gRPC base backups in compute_ctl (#12244)
## Problem

`compute_ctl` should support gRPC base backups.

Requires #12111.
Requires #12243.
Touches #11926.

## Summary of changes

Support `grpc://` connstrings for `compute_ctl` base backups.
2025-06-27 16:39:00 +00:00
Christian Schwarz
e33e109403 fix(pageserver): buffered writer cancellation error handling (#12376)
## Problem

The problem has been well described in already-commited PR #11853.
tl;dr: BufferedWriter is sensitive to cancellation, which the previous
approach was not.

The write path was most affected (ingest & compaction), which was mostly
fixed in #11853:
it introduced `PutError` and mapped instances of `PutError` that were
due to cancellation of underlying buffered writer into
`CreateImageLayersError::Cancelled`.

However, there is a long tail of remaining errors that weren't caught by
#11853 that result in `CompactionError::Other`s, which we log with great
noise.

## Solution

The stack trace logging for CompactionError::Other added in #11853
allows us to chop away at that long tail using the following pattern:
- look at the stack trace
- from leaf up, identify the place where we incorrectly map from the
distinguished variant X indicating cancellation to an `anyhow::Error`
- follow that anyhow further up, ensuring it stays the same anyhow all
the way up in the `CompactionError::Other`
- since it stayed one anyhow chain all the way up, root_cause() will
yield us X
- so, in `log_compaction_error`, add an additional `downcast_ref` check
for X

This PR specifically adds checks for
- the flush task cancelling (FlushTaskError, BlobWriterError)
- opening of the layer writer (GateError)

That should cover all the reports in issues 
- https://github.com/neondatabase/cloud/issues/29434
- https://github.com/neondatabase/neon/issues/12162

## Refs
- follow-up to #11853
- fixup of / fixes https://github.com/neondatabase/neon/issues/11762
- fixes https://github.com/neondatabase/neon/issues/12162
- refs https://github.com/neondatabase/cloud/issues/29434
2025-06-27 15:26:00 +00:00
Folke Behrens
0ee15002fc proxy: Move client connection accept and handshake to pglb (#12380)
* This must be a no-op.
* Move proxy::task_main to pglb::task_main.
* Move client accept, TLS and handshake to pglb.
* Keep auth and wake in proxy.
2025-06-27 15:20:23 +00:00
Arpad Müller
4c7956fa56 Fix hang deleting offloaded timelines (#12366)
We don't have cancellation support for timeline deletions. In other
words, timeline deletion might still go on in an older generation while
we are attaching it in a newer generation already, because the
cancellation simply hasn't reached the deletion code.

This has caused us to hit a situation with offloaded timelines in which
the timeline was in an unrecoverable state: always returning an accepted
response, but never a 404 like it should be.

The detailed description can be found in
[here](https://github.com/neondatabase/cloud/issues/30406#issuecomment-3008667859)
(private repo link).

TLDR:

1. we ask to delete timeline on old pageserver/generation, starts
process in background
2. the storcon migrates the tenant to a different pageserver.
- during attach, the pageserver still finds an index part, so it adds it
to `offloaded_timelines`
4. the timeline deletion finishes, removing the index part in S3
5. there is a retry of the timeline deletion endpoint, sent to the new
pageserver location. it is bound to fail however:
- as the index part is gone, we print `Timeline already deleted in
remote storage`.
- the problem is that we then return an accepted response code, and not
a 404.
- this confuses the code calling us. it thinks the timeline is not
deleted, so keeps retrying.
- this state never gets recovered from until a reset/detach, because of
the `offloaded_timelines` entry staying there.

This is where this PR fixes things: if no index part can be found, we
can safely assume that the timeline is gone in S3 (it's the last thing
to be deleted), so we can remove it from `offloaded_timelines` and
trigger a reupload of the manifest. Subsequent retries will pick that
up.

Why not improve the cancellation support? It is a more disruptive code
change, that might have its own risks. So we don't do it for now.

Fixes https://github.com/neondatabase/cloud/issues/30406
2025-06-27 15:14:55 +00:00
Heikki Linnakangas
5a82182c48 impr(ci): Refactor postgres Makefile targets to a separate makefile (#12363)
Mainly for general readability. Some notable changes:

- Postgres can be built without the rest of the repository, and in
particular without any of the Rust bits. Some CI scripts took advantage
of that, so let's make that more explicit by separating those parts.
Also add an explicit comment about that in the new postgres.mk file.

- Add a new PG_INSTALL_CACHED variable. If it's set, `make all` and
other top-Makefile targets skip checking if Postgres is up-to-date. This
is also to be used in CI scripts that build and cache Postgres as
separate steps. (It is currently only used in the macos walproposer-lib
rule, but stay tuned for more.)

- Introduce a POSTGRES_VERSIONS variable that lists all supported
PostgreSQL versions. Refactor a few Makefile rules to use that.
2025-06-27 14:49:52 +00:00
Arpad Müller
37e181af8a Update rust to 1.88.0 (#12364)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

[Announcement blog
post](https://blog.rust-lang.org/2025/06/26/Rust-1.88.0/)

Prior update was in https://github.com/neondatabase/neon/pull/11938
2025-06-27 13:51:59 +00:00
Peter Bendel
6f4198c78a treat strategy flag test_maintenance as boolean data type (#12373)
## Problem

In large oltp test run
https://github.com/neondatabase/neon/actions/runs/15905488707/job/44859116742
we see that the `Benchmark database maintenance` step is skipped in all
3 strategy variants, however it should be executed in two.

This is due to treating the `test_maintenance` boolean type in the
strategy in the condition of the `Benchmark database maintenance` step

## Summary of changes
Use a boolean condition instead of a string comparison

## Test run from this pull request branch

https://github.com/neondatabase/neon/actions/runs/15923605412
2025-06-27 13:49:26 +00:00
Vlad Lazar
cc1664ef93 pageserver: allow flush task cancelled error in sharding autosplit test (#12374)
## Problem

Test is failing due to compaction shutdown noise (see
https://github.com/neondatabase/neon/issues/12162).

## Summary of changes

Allow list the noise.
2025-06-27 13:13:11 +00:00
Vlad Lazar
ebb6e26a64 pageserver: handle multiple attached children in shard resolution (#12336)
## Problem

When resolving a shard during a split we might have multiple attached
shards with the old shard count (i.e. not all of them are marked in
progress and ignored). Hence, we can compute the desired shard number
based on the old shard count and misroute the request.

## Summary of Changes

Recompute the desired shard every time the shard count changes during
the iteration
2025-06-27 12:46:18 +00:00
Mikhail
ebc12a388c fix: endpoint_storage_addr as String (#12359)
It's not a SocketAddr as we use k8s DNS
https://github.com/neondatabase/cloud/issues/19011
2025-06-27 11:06:27 +00:00
Conrad Ludgate
abc1efd5a6 [proxy] fix connect_to_compute retry handling (#12351)
# Problem

In #12335 I moved the `authenticate` method outside of the
`connect_to_compute` loop. This triggered [e2e tests to become
flaky](https://github.com/neondatabase/cloud/pull/30533). This
highlighted an edge case we forgot to consider with that change.

When we connect to compute, the compute IP might be cached. This cache
hit might however be stale. Because we can't validate the IP is
associated with a specific compute-id☨, we will succeed the
connect_to_compute operation and fail when it comes to password
authentication☨☨. Before the change, we were invalidating the cache and
triggering wake_compute if the authentication failed.

Additionally, I noticed some faulty logic I introduced 1 year ago
https://github.com/neondatabase/neon/pull/8141/files#diff-5491e3afe62d8c5c77178149c665603b29d88d3ec2e47fc1b3bb119a0a970afaL145-R147

☨ We can when we roll out TLS, as the certificate common name includes
the compute-id.

☨☨ Technically password authentication could pass for the wrong compute,
but I think this would only happen in the very very rare event that the
IP got reused **and** the compute's endpoint happened to be a
branch/replica.

# Solution

1. Fix the broken logic
2. Simplify cache invalidation (I don't know why it was so convoluted)
3. Add a loop around connect_to_compute + authenticate to re-introduce
the wake_compute invalidation we accidentally removed.

I went with this approach to try and avoid interfering with
https://github.com/neondatabase/neon/compare/main...cloneable/proxy-pglb-connect-compute-split.
The changes made in commit 3 will move into `handle_client_request` I
suspect,
2025-06-27 10:36:27 +00:00
Dmitrii Kovalkov
6fa1562b57 pageserver: increase default max_size_entries limit for basebackup cache (#12343)
## Problem
Some pageservers hit `max_size_entries` limit in staging with only ~25
MiB storage used by basebackup cache. The limit is too strict. It should
be safe to relax it.

- Part of https://github.com/neondatabase/cloud/issues/29353

## Summary of changes
- Increase the default `max_size_entries` from 1000 to 10000
2025-06-27 09:18:18 +00:00
Heikki Linnakangas
10afac87e7 impr(ci): Remove unnecessary 'make postgres-headers' build step (#12354)
The 'make postgres' step includes installation of the headers, no need
to do that separately.
2025-06-26 16:45:34 +00:00
Vlad Lazar
72b3c9cd11 pageserver: fix wal receiver hang on remote client shutdown (#12348)
## Problem

Druing shard splits we shut down the remote client early and allow the
parent shard to keep ingesting data. While ingesting data, the wal
receiver task may wait for the current flush to complete in order to
apply backpressure. Notifications are delivered via
`Timeline::layer_flush_done_tx`.

When the remote client was being shut down the flush loop exited
whithout delivering a notification. This left
`Timeline::wait_flush_completion` hanging indefinitely which blocked the
shutdown of the wal receiver task, and, hence, the shard split.

## Summary of Changes

Deliver a final notification when the flush loop is shutting down
without the timeline cancel cancellation token having fired. I tried
writing a test for this, but got stuck in failpoint hell and decided
it's not worth it.

`test_sharding_autosplit`, which reproduces this reliably in CI, passed
with the proposed fix in
https://github.com/neondatabase/neon/pull/12304.

Closes https://github.com/neondatabase/neon/issues/12060
2025-06-26 16:35:34 +00:00
Arpad Müller
232f2447d4 Support pull_timeline of timelines without writes (#12028)
Make the safekeeper `pull_timeline` endpoint support timelines that
haven't had any writes yet. In the storcon managed sk timelines world,
if a safekeeper goes down temporarily, the storcon will schedule a
`pull_timeline` call. There is no guarantee however that by when the
safekeeper is online again, there have been writes to the timeline yet.

The `snapshot` endpoint gives an error if the timeline hasn't had
writes, so we avoid calling it if `timeline_start_lsn` indicates a
freshly created timeline.

Fixes #11422
Part of #11670
2025-06-26 16:29:03 +00:00
Erik Grinaker
a2d2108e6a pageserver: use base backup cache with gRPC (#12352)
## Problem

gRPC base backups do not use the base backup cache.

Touches https://github.com/neondatabase/neon/issues/11728.

## Summary of changes

Integrate gRPC base backups with the base backup cache.

Also fixes a bug where the base backup cache did not differentiate
between primary/replica base backups (at least I think that's a bug?).
2025-06-26 15:52:15 +00:00
Alex Chi Z.
33c0d5e2f4 fix(pageserver): make posthog config parsing more robust (#12356)
## Problem

In our infra config, we have to split server_api_key and other fields in
two files: the former one in the sops file, and the latter one in the
normal config. It creates the situation that we might misconfigure some
regions that it only has part of the fields available, causing
storcon/pageserver refuse to start.

## Summary of changes

Allow PostHog config to have part of the fields available. Parse it
later.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-26 15:49:08 +00:00
Dmitrii Kovalkov
605fb04f89 pageserver: use bounded sender for basebackup cache (#12342)
## Problem
Basebackup cache now uses unbounded channel for prepare requests. In
theory it can grow large if the cache is hung and does not process the
requests.

- Part of https://github.com/neondatabase/cloud/issues/29353

## Summary of changes
- Replace an unbounded channel with a bounded one, the size is
configurable.
- Add `pageserver_basebackup_cache_prepare_queue_size` to observe the
size of the queue.
- Refactor a bit to move all metrics logic to `basebackup_cache.rs`
2025-06-26 13:26:24 +00:00
Conrad Ludgate
fd1e8ec257 [proxy] review and cleanup CLI args (#12167)
I was looking at how we could expose our proxy config as toml again, and
as I was writing out the schema format, I noticed some cruft in our CLI
args that no longer seem to be in use.

The redis change is the most complex, but I am pretty sure it's sound.
Since https://github.com/neondatabase/cloud/pull/15613 cplane longer
publishes to the global redis instance.
2025-06-26 11:25:41 +00:00
Konstantin Knizhnik
be23eae3b6 Mark pages as avaiable in LFC only after generation check (#12350)
## Problem

If LFC generation is changed then `lfc_readv_select` will return -1 but
pages are still marked as available in bitmap.

## Summary of changes

Update bitmap after generation check.

Co-authored-by: Kosntantin Knizhnik <konstantin.knizhnik@databricks.com>
2025-06-26 07:06:27 +00:00
Alex Chi Z.
6f70885e11 fix(pageserver): allow refresh_interval to be empty (#12349)
## Problem

Fix for https://github.com/neondatabase/neon/pull/12324

## Summary of changes

Need `serde(default)` to allow this field not present in the config,
otherwise there will be a config deserialization error.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-25 22:15:03 +00:00
Erik Grinaker
f755979102 pageserver: payload compression for gRPC base backups (#12346)
## Problem

gRPC base backups use gRPC compression. However, this has two problems:

* Base backup caching will cache compressed base backups (making gRPC
compression pointless).
* Tonic does not support varying the compression level, and zstd default
level is 10% slower than gzip fastest level.

Touches https://github.com/neondatabase/neon/issues/11728.
Touches https://github.com/neondatabase/cloud/issues/29353.

## Summary of changes

This patch adds a gRPC parameter `BaseBackupRequest::compression`
specifying the compression algorithm. It also moves compression into
`send_basebackup_tarball` to reduce code duplication.

A follow-up PR will integrate the base backup cache with gRPC.
2025-06-25 18:16:23 +00:00
Matthias van de Meent
1d49eefbbb RFC: Endpoint Persistent Unlogged Files Storage (#9661)
## Summary
A design for a storage system that allows storage of files required to
make
Neon's Endpoints have a better experience at or after a reboot.

## Motivation
Several systems inside PostgreSQL (and Neon) need some persistent
storage for
optimal workings across reboots and restarts, but still work without.
Examples are the cumulative statistics file in `pg_stat/global.stat`,
`pg_stat_statements`' `pg_stat/pg_stat_statements.stat`, and
`pg_prewarm`'s
`autoprewarm.blocks`. We need a storage system that can store and manage
these files for each Endpoint.

[GH rendered
file](https://github.com/neondatabase/neon/blob/MMeent/rfc-unlogged-file/docs/rfcs/040-Endpoint-Persistent-Unlogged-Files-Storage.md)

Part of https://github.com/neondatabase/cloud/issues/24225
2025-06-25 16:25:57 +00:00
Alex Chi Z.
6c77638ea1 feat(storcon): retrieve feature flag and pass to pageservers (#12324)
## Problem

part of https://github.com/neondatabase/neon/issues/11813

## Summary of changes

It costs $$$ to directly retrieve the feature flags from the pageserver.
Therefore, this patch adds new APIs to retrieve the spec from the
storcon and updates it via pageserver.

* Storcon retrieves the feature flag and send it to the pageservers.
* If the feature flag gets updated outside of the normal refresh loop of
the pageserver, pageserver won't fetch the flags on its own as long as
the last updated time <= refresh_period.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-25 14:58:18 +00:00
Conrad Ludgate
9b26b7bde9 Proxy release 2025-06-25 14:35 UTC 2025-06-25 15:35:55 +01:00
Conrad Ludgate
93350f7018 [proxy]: BatchQueue::call is not cancel safe - make it directly cancellation aware (#12345)
## Problem

https://github.com/neondatabase/cloud/issues/30539

If the current leader cancels the `call` function, then it has removed
the jobs from the queue, but will never finish sending the responses.
Because of this, it is not cancellation safe.

## Summary of changes

Document these functions as not cancellation safe. Move cancellation of
the queued jobs into the queue itself.

## Alternatives considered

1. We could spawn the task that runs the batch, since that won't get
cancelled.
* This requires `fn call(self: Arc<Self>)` or `fn call(&'static self)`.
2. We could add another scopeguard and return the requests back to the
queue.
* This requires that requests are always retry safe, and also requires
requests to be `Clone`.
2025-06-25 15:35:55 +01:00
Conrad Ludgate
517a3d0d86 [proxy]: BatchQueue::call is not cancel safe - make it directly cancellation aware (#12345)
## Problem

https://github.com/neondatabase/cloud/issues/30539

If the current leader cancels the `call` function, then it has removed
the jobs from the queue, but will never finish sending the responses.
Because of this, it is not cancellation safe.

## Summary of changes

Document these functions as not cancellation safe. Move cancellation of
the queued jobs into the queue itself.

## Alternatives considered

1. We could spawn the task that runs the batch, since that won't get
cancelled.
* This requires `fn call(self: Arc<Self>)` or `fn call(&'static self)`.
2. We could add another scopeguard and return the requests back to the
queue.
* This requires that requests are always retry safe, and also requires
requests to be `Clone`.
2025-06-25 14:19:20 +00:00
Conrad Ludgate
27ca1e21be [console_redirect_proxy]: fix channel binding (#12238)
## Problem

While working more on TLS to compute, I realised that Console Redirect
-> pg-sni-router -> compute would break if channel binding was set to
prefer. This is because the channel binding data would differ between
Console Redirect -> pg-sni-router vs pg-sni-router -> compute.

I also noticed that I actually disabled channel binding in #12145, since
`connect_raw` would think that the connection didn't support TLS.

## Summary of changes

Make sure we specify the channel binding.
Make sure that `connect_raw` can see if we have TLS support.
2025-06-25 13:41:30 +00:00
Arpad Müller
1dc01c9bed Support cancellations of timelines with hanging ondemand downloads (#12330)
In `test_layer_download_cancelled_by_config_location`, we simulate hung
downloads via the `before-downloading-layer-stream-pausable` failpoint.
Then, we cancel a timeline via the `location_config` endpoint.

With the new default as of
https://github.com/neondatabase/neon/pull/11712, we would be creating
the timeline on safekeepers regardless if there have been writes or not,
and it turns out the test relied on the timeline not existing on
safekeepers, due to a cancellation bug:

* as established before, the test makes the read path hang
* the timeline cancellation function first cancels the walreceiver, and
only then cancels the timeline's token
* `WalIngest::new` is requesting a checkpoint, which hits the read path
* at cancellation time, we'd be hanging inside the read, not seeing the
cancellation of the walreceiver
* the test would time out due to the hang

This is probably also reproducible in the wild when there is S3
unavailabilies or bottlenecks. So we thought that it's worthwhile to fix
the hang issue. The approach chosen in the end involves the
`tokio::select` macro.

In PR 11712, we originally punted on the test due to the hang and opted
it out from the new default, but now we can use the new default.

Part of https://github.com/neondatabase/neon/issues/12299
2025-06-25 13:40:38 +00:00
Heikki Linnakangas
7c4c36f5ac Remove unnecessary separate installation of libpq (#12287)
`make install` compiles and installs libpq. Remove redundant separate
step to compile and install it.
2025-06-25 10:47:56 +00:00
Tristan Partin
a2d623696c Update pgaudit to latest versions (#12328)
These updates contain some bug fixes and are completely backwards
compatible with what we currently support in Neon.

Link: https://github.com/pgaudit/pgaudit/compare/1.6.2...1.6.3
Link: https://github.com/pgaudit/pgaudit/compare/1.7.0...1.7.1
Link: https://github.com/pgaudit/pgaudit/compare/16.0...16.1
Link: https://github.com/pgaudit/pgaudit/compare/17.0...17.1
Signed-off-by: Tristan Partin <tristan.partin@databricks.com>

Signed-off-by: Tristan Partin <tristan.partin@databricks.com>
2025-06-25 09:03:02 +00:00
Tristan Partin
aa75722010 Set pgaudit.log=none for monitoring connections (#12137)
pgaudit can spam logs due to all the monitoring that we do. Logs from
these connections are not necessary for HIPPA compliance, so we can stop
logging from those connections.

Part-of: https://github.com/neondatabase/cloud/issues/29574

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-06-24 17:42:23 +00:00
Matthias van de Meent
6c6de6382a Use enum-typed PG versions (#12317)
This makes it possible for the compiler to validate that a match block
matched all PostgreSQL versions we support.

## Problem
We did not have a complete picture about which places we had to test
against PG versions, and what format these versions were: The full PG
version ID format (Major/minor/bugfix `MMmmbb`) as transfered in
protocol messages, or only the Major release version (`MM`). This meant
type confusion was rampant.

With this change, it becomes easier to develop new version-dependent
features, by making type and niche confusion impossible.

## Summary of changes
Every use of `pg_version` is now typed as either `PgVersionId` (u32,
valued in decimal `MMmmbb`) or PgMajorVersion (an enum, with a value for
every major version we support, serialized and stored like a u32 with
the value of that major version)

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2025-06-24 17:25:31 +00:00
Dmitry Savelev
158d84ea30 Switch the billing metrics storage format to ndjson. (#12338)
## Problem

The billing team wants to change the billing events pipeline and use a
common events format in S3 buckets across different event producers.

## Summary of changes

Change the events storage format for billing events from JSON to NDJSON.

Resolves: https://github.com/neondatabase/cloud/issues/29994
2025-06-24 15:36:36 +00:00
Conrad Ludgate
4dd9ca7b04 [proxy]: authenticate to compute after connect_to_compute (#12335)
## Problem

PGLB will do the connect_to_compute logic, neonkeeper will do the
session establishment logic. We should split it.

## Summary of changes

Moves postgres authentication to compute to a separate routine that
happens after connect_to_compute.
2025-06-24 14:15:36 +00:00
Arpad Müller
552249607d apply clippy fixes for 1.88.0 beta (#12331)
The 1.88.0 stable release is near (this Thursday). We'd like to fix most
warnings beforehand so that the compiler upgrade doesn't require
approval from too many teams.

This is therefore a preparation PR (like similar PRs before it).

There is a lot of changes for this release, mostly because the
`uninlined_format_args` lint has been added to the `style` lint group.
One can read more about the lint
[here](https://rust-lang.github.io/rust-clippy/master/#/uninlined_format_args).

The PR is the result of `cargo +beta clippy --fix` and `cargo fmt`. One
remaining warning is left for the proxy team.

---------

Co-authored-by: Conrad Ludgate <conrad@neon.tech>
2025-06-24 10:12:42 +00:00
Ivan Efremov
a29772bf6e Create proxy-bench periodic run in CI (#12242)
Currently run for test only via pushing to the test-proxy-bench branch.

Relates to the #22681
2025-06-24 09:54:43 +00:00
github-actions[bot]
7f4f1785a8 Proxy release 2025-06-24 06:01 UTC 2025-06-24 06:01:22 +00:00
Arpad Müller
0efff1db26 Allow cancellation errors in tests that allow timeline deletion errors (#12315)
After merging of PR https://github.com/neondatabase/neon/pull/11712 we
saw some tests be flaky, with errors showing up about the timeline
having been cancelled instead of having been deleted. This is an outcome
that is inherently racy with the "has been deleted" error.

In some instances, https://github.com/neondatabase/neon/pull/11712 has
already added the error about the timeline having been cancelled. This
PR adds them to the remaining instances of
https://github.com/neondatabase/neon/pull/11712, fixing the flakiness.
2025-06-23 22:26:38 +00:00
Aleksandr Sarantsev
5eecde461d storcon: Fix migration for Attached(0) tenants (#12256)
## Problem

`Attached(0)` tenant migrations can get stuck if the heatmap file has
not been uploaded.

## Summary of Changes

- Added a test to reproduce the issue.
- Introduced a `kick_secondary_downloads` config flag:
  - Enabled in testing environments.
  - Disabled in production (and in the new test).
- Updated `Attached(0)` locations to consider the number of secondaries
in their intent when deciding whether to download the heatmap.
2025-06-23 18:55:26 +00:00
Alex Chi Z.
85164422d0 feat(pageserver): support force overriding feature flags (#12233)
## Problem

Part of #11813 

## Summary of changes

Add a test API to make it easier to manipulate the feature flags within
tests.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-23 17:31:53 +00:00
John Spray
6c3aba7c44 storcon: adjust AZ selection for heterogenous AZs (#12296)
## Problem

The scheduler uses total shards per AZ to select the AZ for newly
created or attached tenants.

This makes bad decisions when we have different node counts per AZ -- we
might have 2 very busy pageservers in one AZ, and 4 more lightly loaded
pageservers in other AZs, and the scheduler picks the busy pageservers
because the total shard count in their AZ is lower.

## Summary of changes

- Divide the shard count by the number of nodes in the AZ when scoring
in `get_az_for_new_tenant`

---------

Co-authored-by: John Spray <john.spray@databricks.com>
2025-06-23 15:50:31 +00:00
Erik Grinaker
68a175d545 test_runner: fix test_basebackup_with_high_slru_count gzip param (#12319)
The `--gzip-probability` parameter was removed in #12250. However,
`test_basebackup_with_high_slru_count` still uses it, and keeps failing.

This patch removes the use of the parameter (gzip is enabled by
default).
2025-06-23 15:33:45 +00:00
Alex Chi Z.
5e2c444525 fix(pageserver): reduce default feature flag refresh interval (#12246)
## Problem

Part of #11813 

## Summary of changes

The current interval is 30s and it costs a lot of $$$. This patch
reduced it to 600s refresh interval (which means that it takes 10min for
feature flags to propagate from UI to the pageserver). In the future we
can let storcon retrieve the feature flags and push it to pageservers.
We can consider creating a new release or we can postpone this to the
week after the next week.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-23 13:51:21 +00:00
Heikki Linnakangas
8d711229c1 ci: Fix bogus skipping of 'make all' step in CI (#12318)
The 'make all' step must run always. PR #12311 accidentally left the
condition in there to skip it if there were no changes in postgres v14
sources. That condition belonged to a whole different step that was
removed altogether in PR#12311, and the condition should've been removed
too.

Per CI failure:
https://github.com/neondatabase/neon/actions/runs/15820148967/job/44587394469
2025-06-23 13:23:33 +00:00
Vlad Lazar
0e490f3be7 pageserver: allow concurrent rw IO on in-mem layer (#12151)
## Problem

Previously, we couldn't read from an in-memory layer while a batch was
being written to it. Vice-versa, we couldn't write to it while there
was an on-going read.

## Summary of Changes

The goal of this change is to improve concurrency. Writes happened
through a &mut self method so the enforcement was at the type system
level.

We attempt to improve by:
1. Adding interior mutability to EphemeralLayer. This involves wrapping
   the buffered writer in a read-write lock.
2. Minimise the time that the read lock is held for. Only hold the read
   lock while reading from the buffers (recently flushed or pending
   flush). If we need to read from the file, drop the lock and allow IO
   to be concurrent.
   
The new benchmark variants with concurrent reads improve between 70 to
200 percent (against main).
Benchmark results are in this
[commit](891f094ce6).

## Future Changes

We can push the interior mutability into the buffered writer. The
mutable tail goes under a read lock, the flushed part goes into an
ArcSwap and then we can read from anything that is flushed _without_ any
locking.
2025-06-23 13:17:30 +00:00
Erik Grinaker
7e41ef1bec pageserver: set gRPC basebackup chunk size to 256 KB (#12314)
gRPC base backups send a stream of fixed-size 64KB chunks.

pagebench basebackup with compression enabled shows this to reduce
throughput:

* 64 KB: 55 RPS
* 128 KB: 69 RPS
* 256 KB: 73 RPS
* 1024 KB: 73 RPS

This patch sets the base backup chunk size to 256 KB.
2025-06-23 12:41:11 +00:00
Heikki Linnakangas
7916aa26e0 Stop using build-tools image in compute image build (#12306)
The build-tools image contains various build tools and dependencies,
mostly Rust-related. The compute image build used it to build
compute_ctl and a few other little rust binaries that are included in
the compute image. However, for extensions built in Rust (pgrx), the
build used a different layer which installed the rust toolchain using
rustup.

Switch to using the same rust toolchain for both pgrx-based extensions
and compute_ctl et al. Since we don't need anything else from the
build-tools image, I switched to using the toolchain installed with
rustup, and eliminated the dependency to build-tools altogether. The
compute image build no longer depends on build-tools.

Note: We no longer use 'mold' for linking compute_ctl et al, since mold
is not included in the build-deps-with-cargo layer. We could add it
there, but it doesn't seem worth it. I proposed stopping using mold
altogether in https://github.com/neondatabase/neon/pull/10735, but that
was rejected because 'mold' is faster for incremental builds. That
doesn't matter much for docker builds however, since they're not
incremental, and the compute binaries are not as large as the storage
server binaries anyway.
2025-06-23 09:11:05 +00:00
Heikki Linnakangas
52ab8f3e65 Use make all in the "Build and Test locally" CI workflow (#12311)
To avoid duplicating the build logic. `make all` covers the separate
`postgres-*` and `neon-pg-ext` steps, and also does `cargo build`.
That's how you would typically do a full local build anyway.
2025-06-23 09:10:32 +00:00
Heikki Linnakangas
3d822dbbde Refactor Makefile rules for building the extensions under pgxn/ (#12305) 2025-06-22 19:43:14 +00:00
Heikki Linnakangas
af46b5286f Avoid recompiling postgres_ffi when there has been no changes (#12292)
Every time you run `make`, it runs `make install` on all the PostgreSQL
sources, which copies the header files. That in turn triggers a rebuild
of the `postgres_ffi` crate, and everything that depends on it. We had
worked around this earlier (see #2458), by passing a custom INSTALL
script to the Postgres makefiles, which refrains from updating the
modification timestamp on headers when they have not been changed, but
the v14 makefile didn't obey INSTALL for the header files. Backporting
c0a1d7621b to v14 fixes that.

This backports upstream PostgreSQL commit c0a1d7621b to v14.

Corresponding PR in the 'postgres' repo:
https://github.com/neondatabase/postgres/pull/660
2025-06-21 21:07:38 +00:00
Erik Grinaker
47f7efee06 pageserver: require stripe size (#12257)
## Problem

In #12217, we began passing the stripe size in reattach responses, and
persisting it in the on-disk state. This is necessary to ensure the
storage controller and Pageserver have a consistent view of the intended
stripe size of unsharded tenants, which will be used for splits that do
not specify a stripe size. However, for backwards compatibility, these
stripe sizes were optional.

## Summary of changes

Make the stripe sizes required for reattach responses and on-disk
location configs. These will always be provided by the previous
(current) release.
2025-06-21 15:01:29 +00:00
Tristan Partin
868c38f522 Rename the compute_ctl admin scope to compute_ctl:admin (#12263)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-06-20 22:49:05 +00:00
Tristan Partin
c8b2ac93cf Allow the control plane to override any Postgres connection options (#12262)
The previous behavior was for the compute to override control plane
options if there was a conflict. We want to change the behavior so that
the control plane has the absolute power on what is right. In the event
that we need a new option passed to the compute as soon as possible, we
can initially roll it out in the control plane, and then migrate the
option to EXTRA_OPTIONS within the compute later, for instance.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-06-20 18:46:30 +00:00
Dmitrii Kovalkov
b2954d16ff storcon, neon_local: add timeline_safekeeper_count (#12303)
## Problem
We need to specify the number of safekeepers for neon_local without
`testing` feature.
Also we need this option for testing different configurations of
safekeeper migration code.

We cannot set it in `neon_fixtures.py` and in the default config of
`neon_local` yet, because it will fail compatibility tests. I'll make a
separate PR with removing `cfg!("testing")` completely and specifying
this option in the config when this option reaches the release branch.

- Part of https://github.com/neondatabase/neon/issues/12298

## Summary of changes
- Add `timeline_safekeeper_count` config option to storcon and
neon_local
2025-06-20 16:03:17 +00:00
Alex Chi Z.
79485e7c3a feat(pageserver): enable gc-compaction by default everywhere (#12105)
Enable it across tests and set it as default. Marks the first milestone
of https://github.com/neondatabase/neon/issues/9114. We already enabled
it in all AWS regions and planning to enable it in all Azure regions
next week.

will merge after we roll out in all regions.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-20 15:35:11 +00:00
Heikki Linnakangas
eaf1ab21c4 Store intermediate build files in build/ rather than pg_install/build/ (#12295)
This way, `pg_install` contains only the final build artifacts, not
intermediate files like *.o files. Seems cleaner.
2025-06-20 14:50:03 +00:00
Vlad Lazar
6508f4e5c1 pageserver: revise gc layer map lock handling (#12290)
## Problem

Timeline GC is very aggressive with regards to layer map locking.
We've seen timelines with loads of layers in production that hold the
write lock for the layer map for 30 minutes at a time.
This blocks reads and the write path to some extent.

## Summary of changes

Determining the set of layers to GC is done under the read lock.
Applying the updates is done under the write lock.
Previously, everything was done under write lock.
2025-06-20 11:57:30 +00:00
Conrad Ludgate
a298d2c29b [proxy] replace the batch cancellation queue, shorten the TTL for cancel keys (#11943)
See #11942 

Idea: 
* if connections are short lived, they can get enqueued and then also
remove themselves later if they never made it to redis. This reduces the
load on the queue.
* short lived connections (<10m, most?) will only issue 1 command, we
remove the delete command and rely on ttl.
* we can enqueue as many commands as we want, as we can always cancel
the enqueue, thanks to the ~~intrusive linked lists~~ `BTreeMap`.
2025-06-20 11:48:01 +00:00
Arpad Müller
8b197de7ff Increase upload timeout for test_tenant_s3_restore (#12297)
Increase the upload timeout of the test to avoid hitting timeouts (which
we sometimes do).
 
Fixes https://github.com/neondatabase/neon/issues/12212
2025-06-20 10:33:11 +00:00
Erik Grinaker
15d079cd41 pagebench: improve getpage-latest-lsn gRPC support (#12293)
This improves `pagebench getpage-latest-lsn` gRPC support by:

* Using `page_api::Client`.
* Removing `--protocol`, and using the `page-server-connstring` scheme
instead.
* Adding `--compression` to enable zstd compression.
2025-06-20 08:31:40 +00:00
Erik Grinaker
dc1625cd8e pagebench: add basebackup gRPC support (#12250)
## Problem

Pagebench does not support gRPC for `basebackup` benchmarks.

Requires #12243.
Touches #11728.

## Summary of changes

Add gRPC support via gRPC connstrings, e.g. `pagebench basebackup
--page-service-connstring grpc://localhost:51051`.

Also change `--gzip-probability` to `--no-compression`, since this must
be specified per-client for gRPC.
2025-06-19 15:40:57 +00:00
dependabot[bot]
a6d4de25cd build(deps): bump urllib3 from 1.26.19 to 2.5.0 in the pip group across 1 directory (#12289)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-19 14:20:02 +00:00
Arpad Müller
ec1452a559 Switch on --timelines-onto-safekeepers in integration tests (#11712)
Switch on the `--timelines-onto-safekeepers` param in integration tests.
Some changes that were needed to enable this but which I put into other
PRs to not clutter up this one:

* #11786
* #11854
* #12129
* #12138

Further fixes that were needed for this:

* https://github.com/neondatabase/neon/pull/11801
* https://github.com/neondatabase/neon/pull/12143
* https://github.com/neondatabase/neon/pull/12204

Not strictly needed, but helpful:

* https://github.com/neondatabase/neon/pull/12155

Part of #11670
Closes #11424
2025-06-19 11:17:01 +00:00
Heikki Linnakangas
1950ccfe33 Eliminate dependency from pageserver_api to postgres_ffi (#12273)
Introduce a separate `postgres_ffi_types` crate which contains a few
types and functions that were used in the API. `postgres_ffi_types` is a
much small crate than `postgres_ffi`, and it doesn't depend on bindgen
or the Postgres C headers.

Move NeonWalRecord and Value types to wal_decoder crate. They are only
used in the pageserver-safekeeper "ingest" API. The rest of the ingest
API types are defined in wal_decoder, so move these there as well.
2025-06-19 10:31:27 +00:00
Heikki Linnakangas
2ca6665f4a Remove outdated 'clean' Makefile targest (#12288)
We have been bad at keeping them up-to-date, several contrib modules and
neon extensions were missing from the clean rules. Give up trying, and
remove the targets altogether. In practice, it's straightforward to just
do `rm -rf pg_install/build`, so the clean-targets are hardly worth the
maintenance effort.

I kept `make distclean` though. The rule for that is simple enough.
2025-06-19 10:24:09 +00:00
Heikki Linnakangas
fa954671b2 Remove unnecessary Postgres libs from the storage docker image (#12286)
Since commit 87ad50c925, storage_controller has used diesel_async, which
in turn uses tokio-postgres as the Postgres client, which doesn't
require libpq. Thus we no longer need libpq in the storage image.
2025-06-19 10:00:01 +00:00
Erik Grinaker
6f4ffdb48b pageserver: add gRPC compression (#12280)
## Problem

The gRPC page service should support compression.

Requires #12111.
Touches #11728.
Touches https://github.com/neondatabase/cloud/issues/25679.

## Summary of changes

Add support for gzip and zstd compression in the server, and a client
parameter to enable compression.

This will need further benchmarking under realistic network conditions.
2025-06-19 09:54:34 +00:00
Vlad Lazar
3f676df3d5 pageserver: fix initial layer visibility calculation (#12206)
## Problem

GC info is an input to updating layer visibility.
Currently, gc info is updated on timeline activation and visibility is
computed on tenant attach, so we ignore branch points and compute
visibility by taking all layers into account.

Side note: gc info is also updated when timelines are created and
dropped. That doesn't help because we create the timelines in
topological order from the root. Hence the root timeline goes first,
without context of where the branch points are.

The impact of this in prod is that shards need to rehydrate layers after
live migration since the non-visible ones were excluded from the
heatmap.

## Summary of Changes

Move the visibility calculation into tenant attachment instead of
activation.
2025-06-19 09:53:18 +00:00
Suhas Thalanki
20f4febce1 fix: additional changes to terminate pgbouncer on compute suspend (#12153) (#12284)
Addressed [retrospective comments
](https://github.com/neondatabase/neon/pull/12153#discussion_r2154197503)
to https://github.com/neondatabase/neon/pull/12153
2025-06-18 19:31:22 +00:00
Mikhail
762905cf8d endpoint storage: parse config with type:LocalFs|AwsS3|AzureContainer (#12282)
https://github.com/neondatabase/cloud/issues/27195
2025-06-18 17:45:20 +00:00
Elizabeth Murray
830ef35ed3 Domain client for Pageserver GRPC. (#12111)
Add domain client for new communicator GRPC types.
2025-06-18 15:51:49 +00:00
Erik Grinaker
d8d62fb7cb test_runner: add gRPC support (#12279)
## Problem

`test_runner` integration tests should support gRPC.

Touches #11926.

## Summary of changes

* Enable gRPC for Pageservers, with dynamic port allocations.
* Add a `grpc` parameter for endpoint creation, plumbed through to
`neon_local endpoint create`.

No tests actually use gRPC yet, since computes don't support it yet.
2025-06-18 14:05:13 +00:00
Aleksandr Sarantsev
e6a404c66d Fix flaky test_sharding_split_failures (#12199)
## Problem

`test_sharding_failures` is flaky due to interference from the
`background_reconcile` process.

The details are in the issue
https://github.com/neondatabase/neon/issues/12029.

## Summary of changes

- Use `reconcile_until_idle` to ensure a stable state before running
test assertions
- Added error tolerance in `reconcile_until_idle` test function (Failure
cases: 1, 3, 19, 20)
- Ignore the `Keeping extra secondaries` warning message since it i
retryable (Failure case: 2)
- Deduplicated code in `assert_rolled_back` and `assert_split_done`
- Added a log message before printing plenty of Node `X` seen on
pageserver `Y`
2025-06-18 13:27:41 +00:00
Peter Bendel
7e711ede44 Increase tenant size for large tenant oltp workload (#12260)
## Problem

- We run the large tenant oltp workload with a fixed size (larger than
existing customers' workloads).
Our customer's workloads are continuously growing and our testing should
stay ahead of the customers' production workloads.
- we want to touch all tables in the tenant's database (updates) so that
we simulate a continuous change in layer files like in a real production
workload
- our current oltp benchmark uses a mixture of read and write
transactions, however we also want a separate test run with read-only
transactions only

## Summary of changes
- modify the existing workload to have a separate run with pgbench
custom scripts that are read-only
- create a new workload that 
- grows all large tables in each run (for the reuse branch in the large
oltp tenant's project)
- updates a percentage of rows in all large tables in each run (to
enforce table bloat and auto-vacuum runs and layer rebuild in
pageservers

Each run of the new workflow increases the logical database size about
16 GB.
We start with 6 runs per day which will give us about 96-100 GB growth
per day.

---------

Co-authored-by: Alexander Lakhin <alexander.lakhin@neon.tech>
2025-06-18 12:40:25 +00:00
Mikhail
e95f2f9a67 compute_ctl: return LSN in /terminate (#12240)
- Add optional `?mode=fast|immediate` to `/terminate`, `fast` is
default. Immediate avoids waiting 30
  seconds before returning from `terminate`.
- Add `TerminateMode` to `ComputeStatus::TerminationPending`
- Use `/terminate?mode=immediate` in `neon_local` instead of `pg_ctl
stop` for `test_replica_promotes`.
- Change `test_replica_promotes` to check returned LSN
- Annotate `finish_sync_safekeepers` as `noreturn`.

https://github.com/neondatabase/cloud/issues/29807
2025-06-18 12:25:19 +00:00
Heikki Linnakangas
5a045e7d52 Move pagestream_api to separate module (#12272)
For general readability.
2025-06-18 12:03:14 +00:00
Dimitri Fontaine
67fbc0582e Validate safekeeper_connstrings when parsing compute specs. (#11906)
This check API only cheks the safekeeper_connstrings at the moment, and
the validation is limited to checking we have at least one entry in
there, and no duplicates.

## Problem

If the compute_ctl service is started with an empty list of safekeepers,
then hard-to-debug errors may happen at runtime, where it would be much
easier to catch them early.

## Summary of changes

Add an entry point in the compute_ctl API to validate the configuration
for safekeeper_connstrings.

---------

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2025-06-18 10:01:05 +00:00
Heikki Linnakangas
3af6b3a2bf Avoid redownloading rust toolchain on Postgres changes (#12265)
Create a separate stage for downloading the Rust toolchain for pgrx, so
that it can be cached independently of the pg-build layer. Before this,
the 'pg-build-nonroot=with-cargo' layer was unnecessarily rebuilt every
time there was a change in PostgreSQL sources. Furthermore, this allows
using the same cached layer for building the compute images of all
Postgres versions.
2025-06-18 09:49:42 +00:00
Erik Grinaker
04013929cb pageserver: support full gRPC basebackups (#12269)
## Problem

Full basebackups are used in tests, and may be useful for debugging as
well, so we should support them in the gRPC API.

Touches #11728.

## Summary of changes

Add `GetBaseBackupRequest::full` to generate full base backups.

The libpq implementation also allows specifying `prev_lsn` for full
backups, i.e. the end LSN of the previous WAL record. This is omitted in
the gRPC API, since it's not used by any tests, and presumably of
limited value since it's autodetected. We can add it later if we find
that we need it.
2025-06-18 06:48:39 +00:00
Suhas Thalanki
83069f6ca1 fix: terminate pgbouncer on compute suspend (#12153)
## Problem

PgBouncer does not terminate connections on a suspend:
https://github.com/neondatabase/cloud/issues/16282

## Summary of changes

1. Adds a pid file to store the pid of PgBouncer
2. Terminates connections on a compute suspend

---------

Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>
2025-06-17 22:56:05 +00:00
Mikhail
7d4f662fbf upgrade default neon version to 1.6 (#12185)
Changes for 1.6 were merged and deployed two months ago
https://github.com/neondatabase/neon/blob/main/pgxn/neon/neon--1.6--1.5.sql.
In order to deploy https://github.com/neondatabase/neon/pull/12183, we
need 1.6 to be default, otherwise we can't use prewarm API on read-only
replica (`ALTER EXTENSION` won't work) and we need it for promotion
2025-06-17 17:46:35 +00:00
Alexander Bayandin
a5cac52e26 compute-image: add a patch for onnxruntime (#12274)
## Problem

The checksum for eigen (a dependency for onnxruntime) has changed which
breaks compute image build.

## Summary of changes
- Add a patch for onnxruntime which backports changes from
f57db79743
(we keep the current version)

Ref https://github.com/microsoft/onnxruntime/issues/24861
2025-06-17 16:35:20 +00:00
Konstantin Knizhnik
dfa055f4be Support event trigger for Neon users (#10624)
## Problem

https://github.com/neondatabase/neon/issues/7570

Even triggers are supported only for superusers.

## Summary of changes

Temporary switch to superuser when even trigger is created and disable
execution of user's even triggers under superuser.

---------

Co-authored-by: Dimitri Fontaine <dim@tapoueh.org>
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-06-17 15:44:50 +00:00
Erik Grinaker
a4c76740c0 pageserver: emit gRPC GetPage errors as responses (#12255)
## Problem

When converting `proto::GetPageRequest` into `page_api::GetPageRequest`
and validating the request, errors are returned as `tonic::Status`. This
will tear down the GetPage stream, which is disruptive and unnecessary.

## Summary of changes

Emit invalid request errors as `GetPageResponse` with an appropriate
`status_code` instead.

Also move the conversion from `tonic::Status` to `GetPageResponse` out
into the stream handler.
2025-06-17 15:41:17 +00:00
Dmitrii Kovalkov
f2e96b2323 tests: prepare test_compatibility.py for --timelines-onto-safekeepers (#12204)
## Problem
Compatibility tests may be run against a compatibility snapshot
generated with --timelines-onto-safekeepers=false. We need to start the
compute without a generation (or with 0 generation) if the timeline is
not storcon-managed, otherwise the compute will hang.

- Follow up on https://github.com/neondatabase/neon/pull/12203
- Relates to https://github.com/neondatabase/neon/pull/11712

## Summary of changes
- Handle compatibility snapshot generated with no
`--timelines-onot-safekeepers` properly
2025-06-17 15:16:07 +00:00
Dmitrii Kovalkov
dee73f0cb4 pageserver: implement max_total_size_bytes limit for basebackup cache (#12230)
## Problem
The cache was introduced as a hackathon project and the only supported
limit was the number of entries.
The basebackup entry size may vary. We need to have more control over
disk space usage to ship it to production.

- Part of https://github.com/neondatabase/cloud/issues/29353

## Summary of changes
- Store the size of entries in the cache and use it to limit
`max_total_size_bytes`
- Add the size of the cache in bytes to metrics.
2025-06-17 15:08:59 +00:00
Erik Grinaker
edf51688bc neon_local: support gRPC connstrings for endpoints (#12271)
## Problem

`neon_local` should support endpoints using gRPC, by providing `grpc://`
connstrings with the Pageservers' gRPC ports.

Requires #12268.
Touches #11926.

## Summary of changes

* Add `--grpc` switch for `neon_local endpoint create`.
* Generate `grpc://` connstrings for endpoints when enabled.

Computes don't actually support `grpc://` connstrings yet, but will
soon.

gRPC is configured when the endpoint is created, not when it's started,
such that it continues to use gRPC across restarts and reconfigurations.
In particular, this is necessary for the storage controller's local
notify hook, which can't easily plumb through gRPC configuration from
the start/reconfigure commands but has access to the endpoint's
configuration.
2025-06-17 14:39:42 +00:00
Aleksandr Sarantsev
4a8f3508f9 storcon: Add safekeeper request label group (#12239)
## Problem

The metrics `storage_controller_safekeeper_request_error` and
`storage_controller_safekeeper_request_latency` currently use
`pageserver_id` as a label.
This can be misleading, as the metrics are about safekeeper requests.  
We want to replace this with a more accurate label — either
`safekeeper_id` or `node_id`.

## Summary of changes

- Introduced `SafekeeperRequestLabelGroup` with `safekeeper_id`.
- Updated the affected metrics to use the new label group.
- Fixed incorrect metric usage in safekeeper_client.rs

## Follow-up

- Review usage of these metrics in alerting rules and existing Grafana
dashboards to ensure this change does not break something.
2025-06-17 13:33:01 +00:00
Erik Grinaker
48052477b4 storcon: register Pageserver gRPC address (#12268)
## Problem

Pageservers now expose a gRPC API on a separate address and port. This
must be registered with the storage controller such that it can be
plumbed through to the compute via cplane.

Touches #11926.

## Summary of changes

This patch registers the gRPC address and port with the storage
controller:

* Add gRPC address to `nodes` database table and `NodePersistence`, with
a Diesel migration.
* Add gRPC address in `NodeMetadata`, `NodeRegisterRequest`,
`NodeDescribeResponse`, and `TenantLocateResponseShard`.
* Add gRPC address flags to `storcon_cli node-register`.

These changes are backwards-compatible, since all structs will ignore
unknown fields during deserialization.
2025-06-17 13:27:10 +00:00
Erik Grinaker
d81353b2d1 pageserver: gRPC base backup fixes (#12243)
## Problem

The gRPC base backup implementation has a few issues: chunks are not
properly bounded, and it's not possible to omit the LSN.

Touches #11728.

## Summary of changes

* Properly bound chunks by using a limited writer.
* Use an `Option<Lsn>` rather than a `ReadLsn` (the latter requires an
LSN).
2025-06-17 12:37:43 +00:00
Aleksandr Sarantsev
143500dc4f storcon: Improve stably_attached readability (#12249)
## Problem

The `stably_attached` function is hard to read due to deeply nested
conditionals

## Summary of Changes

- Refactored `stably_attached` to use early returns and the `?` operator
for improved readability
2025-06-17 10:10:10 +00:00
Aleksandr Sarantsev
1a5f7ce6ad storcon: Exclude another secondaries while optimizing secondary (#12251)
## Problem

If the node intent includes more than one secondary, we can generate a
replace optimization using a candidate node that is already a secondary
location.

## Summary of changes

- Exclude all other secondary nodes from the scoring process to ensure
optimal candidate selection.
2025-06-17 10:09:55 +00:00
Alexander Lakhin
01ccb34118 Don't rerun failed tests in 'Build and Test with Sanitizers' workflow (#12259)
## Problem

We could easily miss a sanitizer-detected defect, if it occurred due to
some race condition, as we just rerun the test and if it succeeds, the
overall test run is considered successful. It was more reasonable
before, when we had much more unstable tests in main, but now we can
track all test failures.

## Summary of changes
Don't rerun failed tests.
2025-06-17 08:08:43 +00:00
github-actions[bot]
bf2a21567d Proxy release 2025-06-17 06:01 UTC 2025-06-17 06:01:42 +00:00
Tristan Partin
f669e18477 Remove TODO comment related to default_transaction_read_only (#12261)
This code has been deployed for a while, so let's remove the TODO, and
remove the option passed from the control plane.

Link: https://github.com/neondatabase/cloud/pull/30274

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-06-16 19:38:26 +00:00
Suhas Thalanki
632cde7f13 schema and github workflow for validation of compute manifest (#12069)
Adds a schema to validate the manifest.yaml described in [this
RFC](https://github.com/neondatabase/neon/blob/main/docs/rfcs/038-independent-compute-release.md)
and a github workflow to test this.
2025-06-16 19:30:41 +00:00
Alexander Lakhin
118e13438d Add "Build and Test Fully" workflow (#11931)
## Problem

We don't test debug builds for v14..v16 in the regular "Build and Test"
runs to perform the testing faster, but it means we can't detect
assertion failures in those versions.
(See https://github.com/neondatabase/neon/issues/11891,
https://github.com/neondatabase/neon/issues/11997)

## Summary of changes
Add a new workflow to test all build types and all versions on all
architectures.
2025-06-16 13:29:39 +00:00
Trung Dinh
fc136eec8f pagectl: add dump layer local (#12245)
## Problem
In our environment, we don't always have access to the pagectl tool on
the pageserver. We have to download the page files to local env to
introspect them. Hence, it'll be useful to be able to parse the local
files using `pagectl`.

## Summary of changes
* Add `dump-layer-local` to `pagectl` that takes a local path as
argument and returns the layer content:
```
cargo  run -p pagectl layer dump-layer-local ~/Desktop/000000067F000040490002800000FFFFFFFF-030000000000000000000000000000000002__00003E7A53EDE611-00003E7AF27BFD19-v1-00000001
```

* Bonus: Fix a bug in `pageserver/ctl/src/draw_timeline_dir.rs` in which
we don't filter out temporary files.
2025-06-16 10:29:42 +00:00
Erik Grinaker
818e5130f1 page_api: add a few derives (#12253)
## Problem

The `page_api` domain types are missing a few derives.

## Summary of changes

Add `Clone`, `Copy`, and `Debug` derives for all types where
appropriate.
2025-06-16 09:45:50 +00:00
Alexander Sarantcev
c243521ae5 Fix reconcile_long_running metric comment (#12234)
## Problem

Comment for `storage_controller_reconcile_long_running` metric was
copy-pasted and not updated in #9207

## Summary of changes

- Fixed comment
2025-06-16 05:51:57 +00:00
Alexander Sarantcev
5303c71589 Move comment above metrics handler (#12236)
## Problem

Comment is in incorrect place: `/metrics` code is above its description
comment.

## Summary of changes

- `/metrics` code is now below the comment
2025-06-13 18:18:51 +00:00
Alexander Sarantcev
d146897415 Fix reconciles metrics typo (#12235)
## Problem

Need to fix naming `safkeeper` -> `safekeeper`

## Summary of changes

- `storage_controller_safkeeper_reconciles_*` renamed to
`storage_controller_safekeeper_reconciles_*`
2025-06-13 17:47:09 +00:00
Alexander Sarantcev
d63815fa40 Fix ChaosInjector shard eligibility bug (#12231)
## Problem

ChaosInjector is intended to skip non-active scheduling policies, but
the current logic skips active shards instead.

## Summary of changes

- Fixed shard eligibility condition to correctly allow chaos injection
for shards with an Active scheduling policy.
2025-06-13 13:34:29 +00:00
Dmitrii Kovalkov
385324ee8a pageserver: fix post-merge PR comments on basebackup cache (#12216)
## Problem
This PR addresses all but the direct IO post-merge comments on
basebackup cache implementation.

- Follow up on
https://github.com/neondatabase/neon/pull/11989#pullrequestreview-2867966119
- Part of https://github.com/neondatabase/cloud/issues/29353

## Summary of changes
- Clean up the tmp directory by recreating it.
- Recreate the tmp directory on startup.
- Add comments why it's safe to not fsync the inode after renaming.
2025-06-13 08:49:31 +00:00
Alex Chi Z.
8a68d463f6 feat(pagectl): no max key limit if time travel recover locally (#12222)
## Problem

We would easily hit this limit for a tenant running for enough long
time.

## Summary of changes

Remove the max key limit for time-travel recovery if the command is
running locally.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-13 08:41:10 +00:00
Alex Chi Z.
3046c307da feat(posthog_client): support feature flag secure API (#12201)
## Problem

Part of #11813 

PostHog has two endpoints to retrieve feature flags: the old project ID
one that uses personal API token, and the new one using a special
feature flag secure token that can only retrieve feature flag. The new
API I added in this patch is not documented in the PostHog API doc but
it's used in their Python SDK.

## Summary of changes

Add support for "feature flag secure token API". The API has no way of
providing a project ID so we verify if the retrieved spec is consistent
with the project ID specified by comparing the `team_id` field.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-13 07:22:02 +00:00
Dmitrii Kovalkov
e83f1d8ba5 tests: prepare test_historic_storage_formats for --timelines-onto-safekeepers (#12214)
## Problem
`test_historic_storage_formats` uses `/tenant_import` to import historic
data. Tenant import does not create timelines onto safekeepers, because
they might already exist on some safekeeper set. If it does, then we may
end up with two different quorums accepting WAL for the same timeline.

If the tenant import is used in a real deployment, the administrator is
responsible for looking for the proper safekeeper set and migrate
timelines into storcon-managed timelines.

- Relates to https://github.com/neondatabase/neon/pull/11712

## Summary of changes
- Create timelines onto safekeepers manually after tenant import in
`test_historic_storage_formats`
- Add a note to tenant import that timelines will be not storcon-managed
after the import.
2025-06-13 06:28:18 +00:00
Trung Dinh
8917676e86 Improve logging for gc-compaction (#12219)
## Problem
* Inside `compact_with_gc_inner`, there is a similar log line:

db24ba95d1/pageserver/src/tenant/timeline/compaction.rs (L3181-L3187)

* Also, I think it would be useful when debugging to have the ability to
select a particular sub-compaction job (e.g., `1/100`) to see all the
logs for that job.

## Summary of changes
* Attach a span to the `compact_with_gc_inner`. 

CC: @skyzh
2025-06-13 06:07:18 +00:00
Ivan Efremov
43acabd4c2 [proxy]: Improve backoff strategy for redis reconnection (#12218)
Sometimes during a failed redis connection attempt at the init stage
proxy pod can continuously restart. This, in turn, can aggravate the
problem if redis is overloaded.

Solves the #11114
2025-06-12 19:46:02 +00:00
Vlad Lazar
db24ba95d1 pagserver: always persist shard identity (#12217)
## Problem

The location config (which includes the stripe size) is stored on
pageserver disk.
For unsharded tenants we [do not include the shard identity in the
serialized
description](ad88ec9257/pageserver/src/tenant/config.rs (L64-L66)).
When the pageserver restarts, it reads that configuration and will use
the stripe size from there
and rely on storcon input from reattach for generation and mode.

The default deserialization is ShardIdentity::unsharded. This has the
new default stripe size of 2048.
Hence, for unsharded tenants we can be running with a stripe size
different from that the one in the
storcon observed state. This is not a problem until we shard split
without specifying a stripe size (i.e. manual splits via the UI or
storcon_cli). When that happens the new shards will use the 2048 stripe
size until storcon realises and switches them back. At that point it's
too late, since we've ingested data with the wrong stripe sizes.

## Summary of changes

Ideally, we would always have the full shard identity on disk. To
achieve this over two releases we do:
1. Always persist the shard identity in the location config on the PS.
2. Storage controller includes the stripe size to use in the re attach
response.

After the first release, we will start persisting correct stripe sizes
for any tenant shard that the storage controller
explicitly sends a location_conf. After the second release, the
re-attach change kicks in and we'll persist the
shard identity for all shards.
2025-06-12 17:15:02 +00:00
Folke Behrens
1dce65308d Update base64 to 0.22 (#12215)
## Problem

Base64 0.13 is outdated.

## Summary of changes

Update base64 to 0.22. Affects mostly proxy and proxy libs. Also upgrade
serde_with to remove another dep on base64 0.13 from dep tree.
2025-06-12 16:12:47 +00:00
Alex Chi Z.
ad88ec9257 fix(pageserver): extend layer manager read guard threshold (#12211)
## Problem

Follow up of https://github.com/neondatabase/neon/pull/12194 to make the
benchmarks run without warnings.

## Summary of changes

Extend read guard hold timeout to 30s.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-12 08:39:54 +00:00
Dmitrii Kovalkov
60dfdf39c7 tests: prepare test_tenant_delete_stale_shards for --timelines-onto-safekeepers (#12198)
## Problem
The test creates an endpoint and deletes its tenant. The compute cannot
stop gracefully because it tries to write a checkpoint shutdown record
into the WAL, but the timeline had been already deleted from
safekeepers.

- Relates to https://github.com/neondatabase/neon/pull/11712

## Summary of changes
Stop the compute before deleting a tenant
2025-06-12 08:10:22 +00:00
Dmitrii Kovalkov
3d5e2bf685 storcon: add tenant_timeline_locate handler (#12203)
## Problem
Compatibility tests may be run against a compatibility snapshot
generated with `--timelines-onto-safekeepers=false`. We need to start
the compute without a generation (or with 0 generation) if the timeline
is not storcon-managed, otherwise the compute will hang.

This handler is needed to check if the timeline is storcon-managed.
It's also needed for better test coverage of safekeeper migration code.

- Relates to https://github.com/neondatabase/neon/pull/11712

## Summary of changes
- Implement `tenant_timeline_locate` handler in storcon to get
safekeeper info from storcon's DB
2025-06-12 08:09:57 +00:00
dependabot[bot]
54fdcfdfa8 build(deps): bump requests from 2.32.3 to 2.32.4 in the pip group across 1 directory (#12180)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-11 21:09:05 +00:00
Vlad Lazar
28e882a80f pageserver: warn on long layer manager locking intervals (#12194)
## Problem

We hold the layer map for too long on occasion.

## Summary of changes

This should help us identify the places where it's happening from.

Related https://github.com/neondatabase/neon/issues/12182
2025-06-11 16:16:30 +00:00
Konstantin Knizhnik
24038033bf Remove default from DROP FUNCTION (#12202)
## Problem

DROP FUNCTION doesn't allow to specify default for parameters.

## Summary of changes

Remove DEFAULT clause from pgxn/neon/neon--1.6--1.5.sql

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-06-11 13:16:58 +00:00
Mikhail
1b935b1958 endpoint_storage: add ?from_endpoint= to /lfc/prewarm (#12195)
Related: https://github.com/neondatabase/cloud/issues/24225
Add optional from_endpoint parameter to allow prewarming from other
endpoint
2025-06-10 19:25:32 +00:00
a-masterov
3f16ca2c18 Respect limits for projects for the Random Operations test (#12184)
## Problem
The project limits were not respected, resulting in errors.
## Summary of changes
Now limits are checked before running an action, and if the action is
not possible to run, another random action will be run.

---------

Co-authored-by: Peter Bendel <peterbendel@neon.tech>
2025-06-10 15:59:51 +00:00
Conrad Ludgate
67b94c5992 [proxy] per endpoint configuration for rate limits (#12148)
https://github.com/neondatabase/cloud/issues/28333

Adds a new `rate_limit` response type to EndpointAccessControl, uses it
for rate limiting, and adds a generic invalidation for the cache.
2025-06-10 14:26:08 +00:00
Folke Behrens
e38193c530 proxy: Move connect_to_compute back to proxy (#12181)
It's mostly responsible for waking, retrying, and caching. A new, thin
wrapper around compute_once will be PGLB's entry point
2025-06-10 11:23:03 +00:00
Konstantin Knizhnik
21949137ed Return last ring index instead of min_ring_index in prefetch_register_bufferv (#12039)
## Problem

See https://github.com/neondatabase/neon/issues/12018

Now `prefetch_register_bufferv` calculates min_ring_index of all vector
requests.
But because of pump prefetch state or connection failure, previous slots
can be already proceeded and reused.

## Summary of changes

Instead of returning minimal index, this function should return last
slot index.
Actually result of this function is used only in two places. A first
place just fort checking (and this check is redundant because the same
check is done in `prefetch_register_bufferv` itself.
And in the second place where index of filled slot is actually used,
there is just one request.
Sp fortunately this bug can cause only assert failure in debug build.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-06-10 10:09:46 +00:00
Trung Dinh
02f94edb60 Remove global static TENANTS (#12169)
## Problem
There is this TODO in code:
https://github.com/neondatabase/neon/blob/main/pageserver/src/tenant/mgr.rs#L300-L302
This is an old TODO by @jcsp.

## Summary of changes
This PR addresses the TODO. Specifically, it removes a global static
`TENANTS`. Instead the `TenantManager` now directly manages the tenant
map. Enhancing abstraction.

Essentially, this PR moves all module-level methods to inside the
implementation of `TenantManager`.
2025-06-10 09:26:40 +00:00
github-actions[bot]
24053ff4ca Proxy release 2025-06-10 08:56 UTC 2025-06-10 08:56:51 +00:00
Conrad Ludgate
58327ef74d [proxy] fix sql-over-http password setting (#12177)
## Problem

Looks like our sql-over-http tests get to rely on "trust"
authentication, so the path that made sure the authkeys data was set was
never being hit.

## Summary of changes

Slight refactor to WakeComputeBackends, as well as making sure auth keys
are propagated. Fix tests to ensure passwords are tested.
2025-06-10 08:46:29 +00:00
Dmitrii Kovalkov
73be6bb736 fix(compute): use proper safekeeper in VotesCollectedMset (#12175)
## Problem
`VotesCollectedMset` uses the wrong safekeeper to update truncateLsn.
This led to some failed assert later in the code during running
safekeeper migration tests.
- Relates to https://github.com/neondatabase/neon/issues/11823

## Summary of changes
Use proper safekeeper to update truncateLsn in VotesCollectedMset
2025-06-10 07:16:42 +00:00
Alex Chi Z.
40d7583906 feat(pageserver): use hostname as feature flag resolver property (#12141)
## Problem

part of https://github.com/neondatabase/neon/issues/11813

## Summary of changes

Collect pageserver hostname property so that we can use it in the
PostHog UI. Not sure if this is the best way to do that -- open to
suggestions.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-10 07:10:41 +00:00
Alex Chi Z.
7a68699abb feat(pageserver): support azure time-travel recovery (in an okay way) (#12140)
## Problem

part of https://github.com/neondatabase/neon/issues/7546

Add Azure time travel recovery support. The tricky thing is how Azure
handles deletes in its blob version API. For the following sequence:

```
upload file_1 = a
upload file_1 = b
delete file_1
upload file_1 = c
```

The "delete file_1" won't be stored as a version (as AWS did).
Therefore, we can never rollback to a state where file_1 is temporarily
invisible. If we roll back to the time before file_1 gets created for
the first time, it will be removed correctly.

However, this is fine for pageservers, because (1) having extra files in
the tenant storage is usually fine (2) for things like
timelines/X/index_part-Y.json, it will only be deleted once, so it can
always be recovered to a correct state. Therefore, I don't expect any
issues when this functionality is used on pageserver recovery.

TODO: unit tests for time-travel recovery.

## Summary of changes

Add Azure blob storage time-travel recovery support.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-10 05:32:58 +00:00
Konstantin Knizhnik
f42d44342d Increase statement timeout for test_pageserver_restarts_under_workload test (#12139)
\## Problem

See
https://github.com/neondatabase/neon/issues/12119#issuecomment-2942586090

Page server restarts with interval 1 seconds increases time of vacuum
especially off prefetch is enabled and so cause test failure because of
statement timeout expiration.

## Summary of changes

Increase statement timeout to 360 seconds.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Alexander Lakhin <alexander.lakhin@neon.tech>
2025-06-10 05:32:03 +00:00
Konstantin Knizhnik
d759fcb8bd Increase wait LFC prewarm timeout (#12174)
## Problem

See https://github.com/neondatabase/neon/issues/12171

## Summary of changes

Increase LFC prewarm wait timeout to 1 minute

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-06-09 18:01:30 +00:00
Alex Chi Z.
76f95f06d8 feat(pageserver): add global timeline count metrics (#12159)
## Problem

We are getting tenants with a lot of branches and num of timelines is a
good indicator of pageserver loads. I added this metrics to help us
better plan pageserver capacities.

## Summary of changes

Add `pageserver_timeline_states_count` with two labels: active +
offloaded.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-09 09:57:36 +00:00
Mikhail
7efd4554ab endpoint_storage: allow bypassing s3 write check on startup (#12165)
Related: https://github.com/neondatabase/cloud/issues/27195
2025-06-06 18:08:02 +00:00
Erik Grinaker
3c7235669a pageserver: don't delete parent shard files until split is committed (#12146)
## Problem

If a shard split fails and must roll back, the tenant may hit a cold
start as the parent shard's files have already been removed from local
disk.

External contribution with minor adjustments, see
https://neondb.slack.com/archives/C08TE3203RQ/p1748246398269309.

## Summary of changes

Keep the parent shard's files on local disk until the split has been
committed, such that they are available if the spilt is rolled back. If
all else fails, the files will be removed on the next Pageserver
restart.

This should also be fine in a mixed version:

* New storcon, old Pageserver: the Pageserver will delete the files
during the split, storcon will log an error when the cleanup detach
fails.

* Old storcon, new Pageserver: the Pageserver will leave the parent's
files around until the next Pageserver restart.

The change looks good to me, but shard splits are delicate so I'd like
some extra eyes on this.
2025-06-06 15:55:14 +00:00
Conrad Ludgate
6dd84041a1 refactor and simplify the invalidation notification structure (#12154)
The current cache invalidation messages are far too specific. They
should be more generic since it only ends up triggering a
`GetEndpointAccessControl` message anyway.

Mappings:
* `/allowed_ips_updated`, `/block_public_or_vpc_access_updated`, and
`/allowed_vpc_endpoints_updated_for_projects` ->
`/project_settings_update`.
* `/allowed_vpc_endpoints_updated_for_org` ->
`/account_settings_update`.
* `/password_updated` -> `/role_setting_update`.

I've also introduced `/endpoint_settings_update`.

All message types support singular or multiple entries, which allows us
to simplify things both on our side and on cplane side.

I'm opening a PR to cplane to apply the above mappings, but for now
using the old phrases to allow both to roll out independently.

This change is inspired by my need to add yet another cached entry to
`GetEndpointAccessControl` for
https://github.com/neondatabase/cloud/issues/28333
2025-06-06 12:49:29 +00:00
Arpad Müller
df7e301a54 safekeeper: special error if a timeline has been deleted (#12155)
We might delete timelines on safekeepers before we are deleting them on
pageservers. This should be an exceptional situation, but can occur. As
the first step to improve behaviour here, emit a special error that is
less scary/obscure than "was not found in global map".

It is for example emitted when the pageserver tries to run
`IDENTIFY_SYSTEM` on a timeline that has been deleted on the safekeeper.

Found when analyzing the failure of
`test_scrubber_physical_gc_timeline_deletion` when enabling
`--timelines-onto-safekeepers` on the pytests.

Due to safekeeper restarts, there is no hard guarantee that we will keep
issuing this error, so we need to think of something better if we start
encountering this in staging/prod. But I would say that the introduction
of `--timelines-onto-safekeepers` in the pytests and into staging won't
change much about this: we are already deleting timelines from there. In
`test_scrubber_physical_gc_timeline_deletion`, we'd just be leaking the
timeline before on the safekeepers.

Part of #11712
2025-06-06 11:54:07 +00:00
Mikhail
470c7d5e0e endpoint_storage: default listen port, allow inline config (#12152)
Related: https://github.com/neondatabase/cloud/issues/27195
2025-06-06 11:48:01 +00:00
Conrad Ludgate
4d99b6ff4d [proxy] separate compute connect from compute authentication (#12145)
## Problem

PGLB/Neonkeeper needs to separate the concerns of connecting to compute,
and authenticating to compute.

Additionally, the code within `connect_to_compute` is rather messy,
spending effort on recovering the authentication info after
wake_compute.

## Summary of changes

Split `ConnCfg` into `ConnectInfo` and `AuthInfo`. `wake_compute` only
returns `ConnectInfo` and `AuthInfo` is determined separately from the
`handshake`/`authenticate` process.

Additionally, `ConnectInfo::connect_raw` is in-charge or establishing
the TLS connection, and the `postgres_client::Config::connect_raw` is
configured to use `NoTls` which will force it to skip the TLS
negotiation. This should just work.
2025-06-06 10:29:55 +00:00
Alexander Sarantcev
590301df08 storcon: Introduce deletion tombstones to support flaky node scenario (#12096)
## Problem

Removed nodes can re-add themselves on restart if not properly
tombstoned. We need a mechanism (e.g. soft-delete flag) to prevent this,
especially in cases where the node is unreachable.

More details there: #12036

## Summary of changes

- Introduced `NodeLifecycle` enum to represent node lifecycle states.
- Added a string representation of `NodeLifecycle` to the `nodes` table.
- Implemented node removal using a tombstone mechanism.
- Introduced `/debug/v1/tombstone*` handlers to manage the tombstone
state.
2025-06-06 10:16:55 +00:00
Erik Grinaker
c511786548 pageserver: move spawn_grpc to GrpcPageServiceHandler::spawn (#12147)
Mechanical move, no logic changes.
2025-06-06 10:01:58 +00:00
Alex Chi Z.
fe31baf985 feat(build): add aws cli into the docker image (#12161)
## Problem

Makes it easier to debug AWS permission issues (i.e., storage scrubber)

## Summary of changes

Install awscliv2 into the docker image.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-06 09:38:58 +00:00
Alex Chi Z.
b23e75ebfe test(pageserver): ensure offload cleans up metrics (#12127)
Add a test to ensure timeline metrics are fully cleaned up after
offloading.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-06 06:50:54 +00:00
Arpad Müller
24d7c37e6e neon_local timeline import: create timelines on safekeepers (#12138)
neon_local's timeline import subcommand creates timelines manually, but
doesn't create them on the safekeepers. If a test then tries to open an
endpoint to read from the timeline, it will error in the new world with
`--timelines-onto-safekeepers`.

Therefore, if that flag is enabled, create the timelines on the
safekeepers.

Note that this import functionality is different from the fast import
feature (https://github.com/neondatabase/neon/issues/10188, #11801).

Part of #11670
As well as part of #11712
2025-06-05 18:53:14 +00:00
a-masterov
f64eb0cbaf Remove the Flaky Test computed-columns from postgis v16 (#12132)
## Problem
The `computed_columns` test assumes that computed columns are always
faster than the request itself. However, this is not always the case on
Neon, which can lead to flaky results.
## Summary of changes
The `computed_columns` test is excluded from the PostGIS test for
PostgreSQL v16, accompanied by related patch refactoring.
2025-06-05 15:02:38 +00:00
Alexey Kondratov
6ae4b89000 feat(compute_ctl): Implement graceful compute monitor exit (#11911)
## Problem

After introducing a naive downtime calculation for the Postgres process
inside compute in https://github.com/neondatabase/neon/pull/11346, I
noticed that some amount of computes regularly report short downtime.
After checking some particular cases, it looks like all of them report
downtime close to the end of the life of the compute, i.e., when the
control plane calls a `/terminate` and we are waiting for Postgres to
exit.

Compute monitor also produces a lot of error logs because Postgres stops
accepting connections, but it's expected during the termination process.

## Summary of changes

Regularly check the compute status inside the main compute monitor loop
and exit gracefully when the compute is in some terminal or
soon-to-be-terminal state.

---------

Co-authored-by: Tristan Partin <tristan@neon.tech>
2025-06-05 12:17:28 +00:00
Dmitrii Kovalkov
f7ec7668a2 pageserver, tests: prepare test_basebackup_cache for --timelines-onto-safekeepers (#12143)
## Problem
- `test_basebackup_cache` fails in
https://github.com/neondatabase/neon/pull/11712 because after the
timelines on safekeepers are managed by storage controller, they do
contain proper start_lsn and the compute_ctl tool sends the first
basebackup request with this LSN.
- `Failed to prepare basebackup` log messages during timeline
initialization, because the timeline is not yet in the global timeline
map.

- Relates to https://github.com/neondatabase/cloud/issues/29353

## Summary of changes
- Account for `timeline_onto_safekeepers` storcon's option in the test.
- Do not trigger basebackup prepare during the timeline initialization.
2025-06-05 12:04:37 +00:00
a-masterov
038e967daf Configure the dynamic loader for the extension-tests image (#12142)
## Problem
The same problem, fixed in
https://github.com/neondatabase/neon/issues/11857, but for the image
`neon-extesions-test`
## Summary of changes
The config file was added to use our library
2025-06-05 12:03:51 +00:00
Erik Grinaker
6a43f23eca pagebench: add batch support (#12133)
## Problem

The new gRPC page service protocol supports client-side batches. The
current libpq protocol only does best-effort server-side batching.

To compare these approaches, Pagebench should support submitting
contiguous page batches, similar to how Postgres will submit them (e.g.
with prefetches or vectored reads).

## Summary of changes

Add a `--batch-size` parameter specifying the size of contiguous page
batches. One batch counts as 1 RPS and 1 queue depth.

For the libpq protocol, a batch is submitted as individual requests and
we rely on the server to batch them for us. This will give a realistic
comparison of how these would be processed in the wild (e.g. when
Postgres sends 100 prefetch requests).

This patch also adds some basic validation of responses.
2025-06-05 11:52:52 +00:00
Vlad Lazar
868f194a3b pageserver: remove handling of vanilla protocol (#12126)
## Problem

We support two ingest protocols on the pageserver: vanilla and
interpreted.
Interpreted has been the only protocol in use for a long time.

## Summary of changes

* Remove the ingest handling of the vanilla protocol
* Remove tenant and pageserver configuration for it
* Update all tests that tweaked the ingest protocol

## Compatibility

Backward compatibility:
* The new pageserver version can read the existing pageserver
configuration and it will ignore the unknown field.
* When the tenant config is read from the storcon db or from the
pageserver disk, the extra field will be ignored.

Forward compatiblity:
* Both the pageserver config and the tenant config map missing fields to
their default value.

I'm not aware of any tenant level override that was made for this knob.
2025-06-05 11:43:04 +00:00
Konstantin Knizhnik
9c6c780201 Replica promote (#12090)
## Problem

This PR is part of larger computes support activity:

https://www.notion.so/neondatabase/Larger-computes-114f189e00478080ba01e8651ab7da90

Epic: https://github.com/neondatabase/cloud/issues/19010

In case of planned node restart, we are going to 
1. create new read-only replica
2. capture LFC state at primary
3. use this state to prewarm replica
4. stop old primary
5. promote replica to primary

Steps 1-3 are currently implemented and support from compute side.
This PR provides compute level implementation of replica promotion.

Support replica promotion

## Summary of changes

Right now replica promotion is done in three steps:
1. Set safekeepers list (now it is empty for replica)
2. Call `pg_promote()` top promote replica
3. Update endpoint setting to that it ids not more treated as replica.

May be all this three steps should be done by some function in
compute_ctl. But right now this logic is only implement5ed in test.

Postgres submodules PRs:
https://github.com/neondatabase/postgres/pull/648
https://github.com/neondatabase/postgres/pull/649
https://github.com/neondatabase/postgres/pull/650
https://github.com/neondatabase/postgres/pull/651

---------

Co-authored-by: Matthias van de Meent <matthias@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-06-05 11:27:14 +00:00
Konstantin Knizhnik
6123fe2d5e Add query execution time histogram (#10050)
## Problem


It will be useful to understand what kind of queries our clients are
executed.
And one of the most important characteristic of query is query execution
time - at least it allows to distinguish OLAP and OLTP queries. Also
monitoring query execution time can help to detect problem with
performance (assuming that workload is more or less stable).

## Summary of changes

Add query execution time histogram.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-06-05 11:23:39 +00:00
Folke Behrens
1577665c20 proxy: Move PGLB-related modules into pglb root module. (#12144)
Split the modules responsible for passing data and connecting to compute
from auth and waking for PGLB.
This PR just moves files. The waking is going to get removed from pglb
after this.
2025-06-05 11:00:23 +00:00
Alex Chi Z.
d8ebd1d771 feat(pageserver): report tenant properties to posthog (#12113)
## Problem

Part of https://github.com/neondatabase/neon/issues/11813

In PostHog UI, we need to create the properties before using them as a
filter. We report all variants automatically when we start the
pageserver. In the future, we can report all real tenants instead of
fake tenants (we do that now to save money + we don't need real tenants
in the UI).

## Summary of changes

* Collect `region`, `availability_zone`, `pageserver_id` properties and
use them in the feature evaluation.
* Report 10 fake tenants on each pageserver startup.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-05 07:48:36 +00:00
Conrad Ludgate
c8a96cf722 update proxy protocol parsing to not a rw wrapper (#12035)
## Problem

I believe in all environments we now specify either required/rejected
for proxy-protocol V2 as required. We no longer rely on the supported
flow. This means we no longer need to keep around read bytes incase
they're not in a header.

While I designed ChainRW to be fast (the hot path with an empty buffer
is very easy to branch predict), it's still unnecessary.

## Summary of changes

* Remove the ChainRW wrapper
* Refactor how we read the proxy-protocol header using read_exact.
Slightly worse perf but it's hardly significant.
* Don't try and parse the header if it's rejected.
2025-06-05 07:12:00 +00:00
Konstantin Knizhnik
56d505bce6 Update online_advisor (#12045)
## Problem

Investigate crash of online_advisor in image check

## Summary of changes

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-06-05 05:48:25 +00:00
Arpad Müller
dae203ef69 pgxn: support generations in safekeepers_cmp (#12129)
`safekeepers_cmp` was added by #8840 to make changes of the safekeeper
set order independent: a `sk1,sk2,sk3` specifier changed to
`sk3,sk1,sk2` should not cause a walproposer restart. However, this
check didn't support generations, in the sense that it would see the
`g#123:` as part of the first safekeeper in the list, and if the first
safekeeper changes, it would also restart the walproposer.

Therefore, parse the generation properly and make it not be a part of
the generation.

This PR doesn't add a specific test, but I have confirmed locally that
`test_safekeepers_reconfigure_reorder` is fixed with the changes of PR
#11712 applied thanks to this PR.

Part of https://github.com/neondatabase/neon/issues/11670
2025-06-04 23:02:31 +00:00
Conrad Ludgate
1fb1315aed compute-ctl: add spec for enable_tls, separate from compute-ctl config (#12109)
## Problem

Inbetween adding the TLS config for compute-ctl, and adding the TLS
config in controlplane, we switched from using a provision flag to a
bind flag. This happened to work in all of my testing in preview regions
as they have no VM pool, so each bind was also a provision. However, in
staging I found that the TLS config is still only processed during
provision, even though it's only sent on bind.

## Summary of changes

* Add a new feature flag value, `tls_experimental`, which tells
postgres/pgbouncer/local_proxy to use the TLS certificates on bind.
* compute_ctl on provision will be told where the certificates are,
instead of being told on bind.
2025-06-04 20:07:47 +00:00
Suhas Thalanki
838622c594 compute: Add manifest.yml for default Postgres configuration settings (#11820)
Adds a `manifest.yml` file that contains the default settings for
compute. Currently, it comes from cplane code
[here](0cda3d4b01/goapp/controlplane/internal/pkg/compute/computespec/pg_settings.go (L110)).

Related RFC:
https://github.com/neondatabase/neon/blob/main/docs/rfcs/038-independent-compute-release.md

Related Issue: https://github.com/neondatabase/cloud/issues/11698
2025-06-04 18:03:59 +00:00
Tristan Partin
3fd5a94a85 Use Url::join() when creating the final remote extension URL (#12121)
Url::to_string() adds a trailing slash on the base URL, so when we did
the format!(), we were adding a double forward slash.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-06-04 15:56:12 +00:00
Erik Grinaker
e7d6f525b3 pageserver: support get_vectored_concurrent_io with gRPC (#12131)
## Problem

The gRPC page service doesn't respect `get_vectored_concurrent_io` and
always uses sequential IO.

## Summary of changes

Spawn a sidecar task for concurrent IO when enabled.

Cancellation will be addressed separately.
2025-06-04 15:14:17 +00:00
a-masterov
e4ca3ac745 Fix codestyle for compute.sh for docker-compose (#12128)
## Problem
The script `compute.sh` had a non-consistent coding style and didn't
follow best practices for modern bash scripts
## Summary of changes
The coding style was fixed to follow best practices.
2025-06-04 15:07:48 +00:00
Vlad Lazar
b69d103b90 pageserver: make import job max byte range size configurable (#12117)
## Problem

We want to repro an OOM situation, but large partial reads are required.

## Summary of Changes

Make the max partial read size configurable for import jobs.
2025-06-04 10:44:23 +00:00
a-masterov
208cbd52d4 Add postgis to the test image (#11672)
## Problem
We don't currently run tests for PostGIS in our test environment.

## Summary of Changes
- Added PostGIS test support for PostgreSQL v16 and v17
- Configured different PostGIS versions based on PostgreSQL version:
  - PostgreSQL v17: PostGIS 3.5.0
  - PostgreSQL v14/v15/v16: PostGIS 3.3.3
- Added necessary test scripts and configurations

This ensures our PostgreSQL implementation remains compatible with this
widely-used extension.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-06-04 09:57:31 +00:00
Alex Chi Z.
c567ed0de0 feat(pageserver): feature flag counter metrics (#12112)
## Problem

Part of https://github.com/neondatabase/neon/issues/11813

## Summary of changes

Add a counter on the feature evaluation outcome and we will set up
alerts for too many failed evaluations in the future.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-04 06:41:42 +00:00
Mikhail
c698cee19a ComputeSpec: prewarm_lfc_on_startup -> autoprewarm (#12120)
https://github.com/neondatabase/cloud/pull/29472
https://github.com/neondatabase/cloud/issues/26346
2025-06-04 05:38:03 +00:00
Tristan Partin
4a3f32bf4a Clean up compute_tools::http::JsonResponse::invalid_status() (#12110)
JsonResponse::error() properly logs an error message which can be read
in the compute logs. invalid_status() was not going through that helper
function, thus not logging anything.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-06-03 16:00:56 +00:00
Vlad Lazar
a963aab14b pagserver: set default wal receiver proto to interpreted (#12100)
## Problem

This is already the default in production and in our test suite.

## Summary of changes

Set the default proto to interpreted to reduce friction when spinning up
new regions or cells.
2025-06-03 14:57:36 +00:00
Erik Grinaker
5bdba70f7d page_api: only validate Protobuf → domain type conversion (#12115)
## Problem

Currently, `page_api` domain types validate message invariants both when
converting Protobuf → domain and domain → Protobuf. This is annoying for
clients, because they can't use stream combinators to convert streamed
requests (needed for hot path performance), and also performs the
validation twice in the common case.

Blocks #12099.

## Summary of changes

Only validate the Protobuf → domain type conversion, i.e. on the
receiver side, and make domain → Protobuf infallible. This is where it
matters -- the Protobuf types are less strict than the domain types, and
receivers should expect all sorts of junk from senders (they're not
required to validate anyway, and can just construct an invalid message
manually).

Also adds a missing `impl From<CheckRelExistsRequest> for
proto::CheckRelExistsRequest`.
2025-06-03 13:50:41 +00:00
Trung Dinh
25fffd3a55 Validate max_batch_size against max_get_vectored_keys (#12052)
## Problem
Setting `max_batch_size` to anything higher than
`Timeline::MAX_GET_VECTORED_KEYS` will cause runtime error. We should
rather fail fast at startup if this is the case.

## Summary of changes
* Create `max_get_vectored_keys` as a new configuration (default to 32);
* Validate `max_batch_size` against `max_get_vectored_keys` right at
config parsing and validation.

Closes https://github.com/neondatabase/neon/issues/11994
2025-06-03 13:37:11 +00:00
Erik Grinaker
e00fd45bba page_api: remove smallvec (#12095)
## Problem

The gRPC `page_api` domain types used smallvecs to avoid heap
allocations in the common case where a single page is requested.

However, this is pointless: the Protobuf types use a normal vec, and
converting a smallvec into a vec always causes a heap allocation anyway.

## Summary of changes

Use a normal `Vec` instead of a `SmallVec` in `page_api` domain types.
2025-06-03 12:20:34 +00:00
Vlad Lazar
3b8be98b67 pageserver: remove backtrace in info level log (#12108)
## Problem

We print a backtrace in an info level log every 10 seconds while waiting
for the import data to land in the bucket.

## Summary of changes

The backtrace is not useful. Remove it.
2025-06-03 09:07:07 +00:00
a-masterov
3e72edede5 Use full hostname for ONNX URL (#12064)
## Problem
We should use the full host name for computes, according to
https://github.com/neondatabase/cloud/issues/26005 , but now a truncated
host name is used.
## Summary of changes
The URL for REMOTE_ONNX is rewritten using the FQDN.
2025-06-03 07:23:17 +00:00
Alex Chi Z.
a650f7f5af fix(pageserver): only deserialize reldir key once during get_db_size (#12102)
## Problem

fix https://github.com/neondatabase/neon/issues/12101; this is a quick
hack and we need better API in the future.

In `get_db_size`, we call `get_reldir_size` for every relation. However,
we do the same deserializing the reldir directory thing for every
relation. This creates huge CPU overhead.

## Summary of changes

Get and deserialize the reldir v1 key once and use it across all
get_rel_size requests.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-06-03 05:00:34 +00:00
Erik Grinaker
fc3994eb71 pageserver: initial gRPC page service implementation (#12094)
## Problem

We should expose the page service over gRPC.

Requires #12093.
Touches #11728.

## Summary of changes

This patch adds an initial page service implementation over gRPC. It
ties in with the existing `PageServerHandler` request logic, to avoid
the implementations drifting apart for the core read path.

This is just a bare-bones functional implementation. Several important
aspects have been omitted, and will be addressed in follow-up PRs:

* Limited observability: minimal tracing, no logging, limited metrics
and timing, etc.
* Rate limiting will currently block.
* No performance optimization.
* No cancellation handling.
* No tests.

I've only done rudimentary testing of this, but Pagebench passes at
least.
2025-06-02 17:15:18 +00:00
Conrad Ludgate
781bf4945d proxy: optimise future layout allocations (#12104)
A smaller version of #12066 that is somewhat easier to review.

Now that I've been using https://crates.io/crates/top-type-sizes I've
found a lot more of the low hanging fruit that can be tweaks to reduce
the memory usage.

Some context for the optimisations:

Rust's stack allocation in futures is quite naive. Stack variables, even
if moved, often still end up taking space in the future. Rearranging the
order in which variables are defined, and properly scoping them can go a
long way.

`async fn` and `async move {}` have a consequence that they always
duplicate the "upvars" (aka captures). All captures are permanently
allocated in the future, even if moved. We can be mindful when writing
futures to only capture as little as possible.

TlsStream is massive. Needs boxing so it doesn't contribute to the above
issue.

## Measurements from `top-type-sizes`:

### Before

```
10328 {async block@proxy::proxy::task_main::{closure#0}::{closure#0}} align=8
6120 {async fn body of proxy::proxy::handle_client<proxy::protocol2::ChainRW<tokio::net::TcpStream>>()} align=8
```

### After

```
4040 {async block@proxy::proxy::task_main::{closure#0}::{closure#0}}
4704 {async fn body of proxy::proxy::handle_client<proxy::protocol2::ChainRW<tokio::net::TcpStream>>()} align=8
```
2025-06-02 16:13:30 +00:00
Erik Grinaker
a21c1174ed pagebench: add gRPC support for get-page-latest-lsn (#12077)
## Problem

We need gRPC support in Pagebench to benchmark the new gRPC Pageserver
implementation.

Touches #11728.

## Summary of changes

Adds a `Client` trait to make the client transport swappable, and a gRPC
client via a `--protocol grpc` parameter. This must also specify the
connstring with the gRPC port:

```
pagebench get-page-latest-lsn --protocol grpc --page-service-connstring grpc://localhost:51051
```

The client is implemented using the raw Tonic-generated gRPC client, to
minimize client overhead.
2025-06-02 14:50:49 +00:00
Erik Grinaker
8d7ed2a4ee pageserver: add gRPC observability middleware (#12093)
## Problem

The page service logic asserts that a tracing span is present with
tenant/timeline/shard IDs. An initial gRPC page service implementation
thus requires a tracing span.

Touches https://github.com/neondatabase/neon/issues/11728.

## Summary of changes

Adds an `ObservabilityLayer` middleware that generates a tracing span
and decorates it with IDs from the gRPC metadata.

This is a minimal implementation to address the tracing span assertion.
It will be extended with additional observability in later PRs.
2025-06-02 11:46:50 +00:00
Vlad Lazar
5b62749c42 pageserver: reduce import memory utilization (#12086)
## Problem

Imports can end up allocating too much.

## Summary of Changes

Nerf them a bunch and add some logs.
2025-06-02 10:29:15 +00:00
Vlad Lazar
af5bb67f08 pageserver: more reactive wal receiver cancellation (#12076)
## Problem

If the wal receiver is cancelled, there's a 50% chance that it will
ingest yet more WAL.

## Summary of Changes

Always check cancellation first.
2025-06-02 08:59:21 +00:00
Conrad Ludgate
589bfdfd02 proxy: Changes to rate limits and GetEndpointAccessControl caches. (#12048)
Precursor to https://github.com/neondatabase/cloud/issues/28333.

We want per-endpoint configuration for rate limits, which will be
distributed via the `GetEndpointAccessControl` API. This lays some of
the ground work.

1. Allow the endpoint rate limiter to accept a custom leaky bucket
config on check.
2. Remove the unused auth rate limiter, as I don't want to think about
how it fits into this.
3. Refactor the caching of `GetEndpointAccessControl`, as it adds
friction for adding new cached data to the API.

That third one was rather large. I couldn't find any way to split it up.
The core idea is that there's now only 2 cache APIs.
`get_endpoint_access_controls` and `get_role_access_controls`.

I'm pretty sure the behaviour is unchanged, except I did a drive by
change to fix #8989 because it felt harmless. The change in question is
that when a password validation fails, we eagerly expire the role cache
if the role was cached for 5 minutes. This is to allow for edge cases
where a user tries to connect with a reset password, but the cache never
expires the entry due to some redis related quirk (lag, or
misconfiguration, or cplane error)
2025-06-02 08:38:35 +00:00
github-actions[bot]
b147439d6b Proxy release 2025-06-02 06:12 UTC 2025-06-02 06:12:02 +00:00
Conrad Ludgate
87179e26b3 completely rewrite pq_proto (#12085)
libs/pqproto is designed for safekeeper/pageserver with maximum
throughput.

proxy only needs it for handshakes/authentication where throughput is
not a concern but memory efficiency is. For this reason, we switch to
using read_exact and only allocating as much memory as we need to.

All reads return a `&'a [u8]` instead of a `Bytes` because accidental
sharing of bytes can cause fragmentation. Returning the reference
enforces all callers only hold onto the bytes they absolutely need. For
example, before this change, `pqproto` was allocating 8KiB for the
initial read `BytesMut`, and proxy was holding the `Bytes` in the
`StartupMessageParams` for the entire connection through to passthrough.
2025-06-01 18:41:45 +00:00
Shockingly Good
f05df409bd impr(compute): Remove the deprecated CLI arg alias for remote-ext-config. (#12087)
Also moves it from `String` to `Url`.
2025-05-30 17:45:24 +00:00
Alex Chi Z.
f6c0f6c4ec fix(ci): install build tools with --locked (#12083)
## Problem

Release pipeline failing due to some tools cannot be installed.

## Summary of changes

Install with `--locked`.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-30 17:00:41 +00:00
Shockingly Good
62cd3b8d3d fix(compute) Remove the hardcoded default value for PGXN HTTP URL. (#12030)
Removes the hardcoded value for the Postgres Extensions HTTP gateway URL
as it is always provided by the calling code.
2025-05-30 15:26:22 +00:00
Alexander Lakhin
8d26978ed9 Allow known pageserver errors in test_location_conf_churn (#12082)
## Problem
While a pageserver in the unreadable state could not be accessed by
postgres thanks to https://github.com/neondatabase/neon/pull/12059, it
may still receive WAL records and bump into the "layer file download
failed: No file found" error when trying to ingest them.

Closes: https://github.com/neondatabase/neon/issues/11348

## Summary of changes

Allow errors from wal_connection_manager, which are considered expected.
See https://github.com/neondatabase/neon/issues/11348.
2025-05-30 15:20:46 +00:00
Christian Schwarz
35372a8f12 adjust VirtualFile operation latency histogram buckets (#12075)
The expected operating range for the production NVMe drives is
in the range of 50 to 250us.

The bucket boundaries before this PR were not well suited
to reason about the utilization / queuing / latency variability
of those devices.

# Performance

There was some concern about perf impact of having so many buckets,
considering the impl does a linear search on each observe().

I added a benchmark and measured on relevant machines.

In any way, the PR is 40 buckets, so, won't make a meaningful
difference on production machines (im4gn.2xlarge),
going from 30ns -> 35ns.
2025-05-30 13:22:53 +00:00
Vlad Lazar
6d95a3fe2d pageserver: various import flow fixups (#12047)
## Problem

There's a bunch of TODOs in the import code.

## Summary of changes

1. Bound max import byte range to 128MiB. This might still be too high,
given the default job concurrency, but it needs to be balanced with
going back and forth to S3.
2. Prevent unsigned overflow when determining key range splits for
concurrent jobs
3. Use sharded ranges to estimate task size when splitting jobs
4. Bubble up errors that we might hit due to invalid data in the bucket
back to the storage controller.
5. Tweak the import bucket S3 client configuration.
2025-05-30 12:30:11 +00:00
Vlad Lazar
99726495c7 test: allow list overly eager storcon finalization (#12055)
## Problem

I noticed a small percentage of flakes on some import tests.
They were all instances of the storage controller being too eager on the
finalization.

As a refresher: the pageserver notifies the storage controller that it's
done from the import task
and the storage controller has to call back into it in order to finalize
the import. The pageserver
checks that the import task is done before serving that request. Hence,
we can get this race.
In practice, this has no impact since the storage controller will simply
retry.

## Summary of changes

Allow list such cases
2025-05-30 12:14:36 +00:00
Christian Schwarz
4a4a457312 fix(pageserver): frozen->L0 flush failure causes data loss (#12043)
This patch is a fixup for
- https://github.com/neondatabase/neon/pull/6788

Background
----------
That PR 6788  added artificial advancement of `disk_consistent_lsn`
and `remote_consistent_lsn` for shards that weren't written to
while other shards _were_ written to.
See the PR description for more context.

At the time of that PR, Pageservers shards were doing WAL filtering.
Nowadays, the WAL filtering happens in Safekeepers.
Shards learn about the WAL gaps via
`InterpretedWalRecords::next_record_lsn`.

The Bug
-------

That artificial advancement code also runs if the flush failed.
So, we advance the disk_consistent_lsn / remote_consistent_lsn,
without having the corresponding L0 to the `index_part.json`.
The frozen layer remains in the layer map until detach,
so we continue to serve data correctly.
We're not advancing flush loop variable `flushed_to_lsn` either,
so, subsequent flush requests will retry the flush and repair the
situation if they succeed.
But if there aren't any successful retries, eventually the tenant
will be detached and when it is attached somewhere else, the
`index_part.json` and therefore layer map...
1. ... does not contain the frozen layer that failed to flush and
2. ... won't re-ingest that WAL either because walreceiver
starts up with the advanced disk_consistent_lsn/remote_consistent_lsn.

The result is that the read path will have a gap in the reconstruct
data for the keys whose modifications were lost, resulting in
a) either walredo failure
b) or an incorrect page@lsn image if walredo doesn't error.

The Fix
-------

The fix is to only do the artificial advancement if `result.is_ok()`.

Misc
----

As an aside, I took some time to re-review the flush loop and its
callers.
I found one more bug related to error handling that I filed here:
- https://github.com/neondatabase/neon/issues/12025

## Problem

## Summary of changes
2025-05-30 11:22:37 +00:00
John Spray
e78d1e2ec6 tests: tighten readability rules in test_location_conf_churn (#12059)
## Problem

Checking the most recent state of pageservers was insufficient to
evaluate whether another pageserver may read in a particular generation,
since the latest state might mask some earlier AttachedSingle state.

Related: https://github.com/neondatabase/neon/issues/11348

## Summary of changes

- Maintain a history of all attachments
- Write out explicit rules for when a pageserver may read
2025-05-30 11:18:01 +00:00
Alex Chi Z.
af429b4a62 feat(pageserver): observability for feature flags (#12034)
## Problem

Part of #11813. This pull request adds misc observability improvements
for the functionality.

## Summary of changes

* Info span for the PostHog feature background loop.
* New evaluate feature flag API.
* Put the request error into the error message.
* Log when feature flag gets updated.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-30 08:02:25 +00:00
Gleb Novikov
3b4d4eb535 fast_import.rs: log number of jobs for pg_dump/pg_restore (#12068)
## Problem

I have a hypothesis that import might be using lower number of jobs than
max for the VM, where the job is running. This change will help finding
this out from logs

## Summary of changes

Added logging of number of jobs, which is passed into both `pg_dump` and
`pg_restore`
2025-05-29 18:25:42 +00:00
Arpad Müller
f060537a31 Add safekeeper reconciler metrics (#12062)
Adds two metrics to the storcon that are related to the safekeeper
reconciler:

* `storage_controller_safkeeper_reconciles_queued` to indicate currrent
queue depth
* `storage_controller_safkeeper_reconciles_complete` to indicate the
number of complete reconciles

Both metrics operate on a per-safekeeper basis (as reconcilers run on a
per-safekeeper basis too).

These metrics mirror the `storage_controller_pending_reconciles` and
`storage_controller_reconcile_complete` metrics, although those are not
scoped on a per-pageserver basis but are global for the entire storage
controller.

Part of #11670
2025-05-29 14:07:33 +00:00
Vlad Lazar
8a6fc6fd8c pageserver: hook importing timelines up into disk usage eviction (#12038)
## Problem

Disk usage eviction isn't sensitive to layers of imported timelines.

## Summary of changes

Hook importing timelines up into eviction and add a test for it.
I don't think we need any special eviction logic for this. These layers
will all be visible and
their access time will be their creation time. Hence, we'll remove
covered layers first
and get to the imported layers if there's still disk pressure.
2025-05-29 13:01:10 +00:00
Vlad Lazar
51639cd6af pageserver: allow for deletion of importing timelines (#12033)
## Problem

Importing timelines can't currently be deleted. This is problematic
because:
1. Cplane cannot delete failed imports and we leave the timeline behind.
2. The flow does not support user driven cancellation of the import

## Summary of changes

On the pageserver: I've taken the path of least resistance, extended
`TimelineOrOffloaded`
with a new variant and added handling in the right places. I'm open to
thoughts here,
but I think it turned out better than I was envisioning.

On the storage controller: Again, fairly simple business: when a DELETE
timeline request is
received, we remove the import from the DB and stop any finalization
tasks/futures. In order
to stop finalizations, we track them in-memory. For each finalizing
import, we associate a gate
and a cancellation token.

Note that we delete the entry from the database before cancelling any
finalizations. This is such
that a concurrent request can't progress the import into finalize state
and race with the deletion.
This concern about deleting an import with on-going finalization is
theoretical in the near future.
We are only going to delete importing timelines after the storage
controller reports the failure to
cplane. Alas, the design works for user driven cancellation too.

Closes https://github.com/neondatabase/neon/issues/11897
2025-05-29 11:13:52 +00:00
devin-ai-integration[bot]
529d661532 storcon: skip offline nodes in get_top_tenant_shards (#12057)
## Summary

The optimiser background loop could get delayed a lot by waiting for
timeouts trying to talk to offline nodes.

Fixes: #12056

## Solution

- Skip offline nodes in `get_top_tenant_shards`

Link to Devin run:
https://app.devin.ai/sessions/065afd6756734d33bbd4d012428c4b6e
Requested by: John Spray (john@neon.tech)

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: John Spray <john@neon.tech>
2025-05-29 11:07:09 +00:00
Alex Chi Z.
9e4cf52949 pageserver: reduce concurrency for gc-compaction (#12054)
## Problem

Temporarily reduce the concurrency of gc-compaction to 1 job at a time.
We are going to roll out in the largest AWS region next week. Having one
job running at a time makes it easier to identify what tenant causes
problem if it's not running well and pause gc-compaction for that
specific tenant.

(We can make this configurable via pageserver config in the future!)

## Summary of changes

Reduce `CONCURRENT_GC_COMPACTION_TASKS` from 2 to 1.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-29 09:32:19 +00:00
Arpad Müller
831f2a4ba7 Fix flakiness of test_storcon_create_delete_sk_down (#12040)
The `test_storcon_create_delete_sk_down` test is still flaky. This test
addresses two possible causes for flakiness. both causes are related to
deletion racing with `pull_timeline` which hasn't finished yet.

* the first cause is timeline deletion racing with `pull_timeline`:
* the first deletion attempt doesn't contain the line because the
timeline doesn't exist yet
* the subsequent deletion attempts don't contain it either, only a note
that the timeline is already deleted.
* so this patch adds the note that the timeline is already deleted to
the regex
* the second cause is about tenant deletion racing with `pull_timeline`:
* there were no tenant specific tombstones so if a tenant was deleted,
we only added tombstones for the specific timelines being deleted, not
for the tenant itself.
* This patch changes this, so we now have tenant specific tombstones as
well as timeline specific ones, and creation of a timeline checks both.
* we also don't see any retries of the tenant deletion in the logs. once
it's done it's done. so extend the regex to contain the tenant deletion
message as well.

One could wonder why the regex and why not using the API to check
whether the timeline is just "gone". The issue with the API is that it
doesn't allow one to distinguish between "deleted" and "has never
existed", and latter case might race with `pull_timeline`. I.e. the
second case flakiness helped in the discovery of a real bug (no tenant
tombstones), so the more precise check was helpful.

Before, I could easily reproduce 2-9 occurences of flakiness when
running the test with an additional `range(128)` parameter (i.e. 218
times 4 times). With this patch, I ran it three times, not a single
failure.

Fixes #11838
2025-05-28 18:20:38 +00:00
Vlad Lazar
eadabeddb8 pageserver: use the same job size throughout the import lifetime (#12026)
## Problem

Import planning takes a job size limit as its input. Previously, the job
size came from a pageserver config field. This field may change while
imports are in progress. If this happens, plans will no longer be
identical and the import would fail permanently.

## Summary of Changes

Bake the job size into the import progress reported to the storage
controller. For new imports, use the value from the pagesever config,
and, for existing imports, use the value present in the shard progress.

This value is identical for all shards, but we want it to be versioned
since future versions of the planner might split the jobs up
differently. Hence, it ends up in `ShardImportProgress`.

Closes https://github.com/neondatabase/neon/issues/11983
2025-05-28 15:19:41 +00:00
Alex Chi Z.
67ddf1de28 feat(pageserver): create image layers at L0-L1 boundary (#12023)
## Problem

Previous attempt https://github.com/neondatabase/neon/pull/10548 caused
some issues in staging and we reverted it. This is a re-attempt to
address https://github.com/neondatabase/neon/issues/11063.

Currently we create image layers at latest record LSN. We would create
"future image layers" (i.e., image layers with LSN larger than disk
consistent LSN) that need special handling at startup. We also waste a
lot of read operations to reconstruct from L0 layers while we could have
compacted all of the L0 layers and operate on a flat level of historic
layers.

## Summary of changes

* Run repartition at L0-L1 boundary.
* Roll out with feature flags.
* Piggyback a change that downgrades "image layer creating below
gc_cutoff" to debug level.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-28 07:00:52 +00:00
Nikita Kalyanov
541fcd8d2f chore: expose new mark_invisible API in openAPI spec for use in cplane (#12032)
## Problem
There is a new API that I plan to use. We generate client from the spec
so it should be in the spec
## Summary of changes
Document the existing API in openAPI format
2025-05-28 03:39:59 +00:00
Suhas Thalanki
e77961c1c6 background worker that collects installed extensions (#11939)
## Problem

Currently, we collect metrics of what extensions are installed on
computes at start up time. We do not have a mechanism that does this at
runtime.

## Summary of changes

Added a background thread that queries all DBs at regular intervals and
collects a list of installed extensions.
2025-05-27 19:40:51 +00:00
Tristan Partin
cdfa06caad Remove test-images compatibility hack for confirming library load paths (#11927)
This hack was needed for compatiblity tests, but after the compute
release is no longer needed.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-05-27 17:33:16 +00:00
Alex Chi Z.
f0bb93a9c9 feat(pageserver): support evaluate boolean flags (#12024)
## Problem

Part of https://github.com/neondatabase/neon/issues/11813

## Summary of changes

* Support evaluate boolean flags.
* Add docs on how to handle errors.
* Add test cases based on real PostHog config.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-27 14:29:15 +00:00
Vlad Lazar
30adf8e2bd pageserver: add tracing spans for time spent in batch and flushing (#12012)
## Problem

We have some gaps in our traces. This indicates missing spans.

## Summary of changes

This PR adds two new spans:
* WAIT_EXECUTOR: time a batched request spends in the batch waiting to
be picked up
* FLUSH_RESPONSE: time a get page request spends flushing the response
to the compute


![image](https://github.com/user-attachments/assets/41b3ddb8-438d-4375-9da3-da341fc0916a)
2025-05-27 13:57:53 +00:00
Erik Grinaker
5d538a9503 page_api: tweak errors (#12019)
## Problem

The page API gRPC errors need a few tweaks to integrate better with the
GetPage machinery.

Touches https://github.com/neondatabase/neon/issues/11728.

## Summary of changes

* Add `GetPageStatus::InternalError` for internal server errors.
* Rename `GetPageStatus::Invalid` to `InvalidRequest` for clarity.
* Rename `status` and `GetPageStatus` to `status_code` and
`GetPageStatusCode`.
* Add an `Into<tonic::Status>` implementation for `ProtocolError`.
2025-05-27 12:06:51 +00:00
Arpad Müller
f3976e5c60 remove safekeeper_proto_version = 3 from tests (#12020)
Some tests still explicitly specify version 3 of the safekeeper
walproposer protocol. Remove the explicit opt in from the tests as v3 is
the default now since #11518.

We don't touch the places where a test exercises both v2 and v3. Those
we leave for #12021.

Part of https://github.com/neondatabase/neon/issues/10326
2025-05-27 11:32:15 +00:00
Vlad Lazar
9657fbc194 pageserver: add and stabilize import chaos test (#11982)
## Problem

Test coverage of timeline imports is lacking.

## Summary of changes

This PR adds a chaos import test. It runs an import while injecting
various chaos events
in the environment. All the commits that follow the test fix various
issues that were surfaced by it.

Closes https://github.com/neondatabase/neon/issues/10191
2025-05-27 09:52:59 +00:00
a-masterov
dd501554c9 add a script to run the test for online-advisor as a regular user. (#12017)
## Problem
The regression test for the extension online_advisor fails on the
staging instance due to a lack of permission to alter the database.
## Summary of changes
A script was added to work around this problem.

---------

Co-authored-by: Alexander Lakhin <alexander.lakhin@neon.tech>
2025-05-27 08:54:59 +00:00
github-actions[bot]
54433c0839 Proxy release 2025-05-27 06:01 UTC 2025-05-27 06:01:30 +00:00
Tristan Partin
fe1513ca57 Add neon.safekeeper_conninfo_options GUC (#11901)
In order to enable TLS connections between computes and safekeepers, we
need to provide the control plane with a way to configure the various
libpq keyword parameters, sslmode and sslrootcert. neon.safekeepers is a
comma separated list of safekeepers formatted as host:port, so isn't
available for extension in the same way that neon.pageserver_connstring
is. This could be remedied in a future PR.

Part-of: https://github.com/neondatabase/cloud/issues/25823
Link:
https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-05-27 02:21:24 +00:00
Arpad Müller
3e86008e66 read-only timelines (#12015)
Support timeline creations on the storage controller to opt out from
their creation on the safekeepers, introducing the read-only timelines
concept. Read only timelines:

* will never receive WAL of their own, so it's fine to not create them
on the safekeepers
* the property is non-transitive. children of read-only timelines aren't
neccessarily read-only themselves.

This feature can be used for snapshots, to prevent the safekeepers from
being overloaded by empty timelines that won't ever get written to. In
the current world, this is not a problem, because timelines are created
implicitly by the compute connecting to a safekeeper that doesn't have
the timeline yet. In the future however, where the storage controller
creates timelines eagerly, we should watch out for that.

We represent read-only timelines in the storage controller database so
that we ensure that they never touch the safekeepers at all. Especially
we don't want them to cause a mess during the importing process of the
timelines from the cplane to the storcon database.

In a hypothetical future where we have a feature to detach timelines
from safekeepers, we'll either need to find a way to distinguish the
two, or if not, asking safekeepers to list the (empty) timeline prefix
and delete everything from it isn't a big issue either.

This patch will unconditionally hit the new safekeeper timeline creation
path for read-only timelines, without them needing the
`--timelines-onto-safekeepers` flag enabled. This is done because it's
lower risk (no safekeepers or computes involved at all) and gives us
some initial way to verify at least some parts of that code in prod.

https://github.com/neondatabase/cloud/issues/29435
https://github.com/neondatabase/neon/issues/11670
2025-05-26 23:23:58 +00:00
Lassi Pölönen
23fc611461 Add metadata to pgaudit log logline (#11933)
Previously we were using project-id/endpoint-id as SYSLOGTAG, which has
a
limit of 32 characters, so the endpoint-id got truncated.

The output is now in RFC5424 format, where the message is json encoded
with additional metadata `endpoint_id` and `project_id`

Also as pgaudit logs multiline messages, we now detect this by parsing
the timestamp in the specific format, and consider non-matching lines to
belong in the previous log message.

Using syslog structured-data would be an alternative, but leaning
towards json
due to being somewhat more generic.
2025-05-26 14:57:09 +00:00
Alex Chi Z.
dc953de85d feat(pageserver): integrate PostHog with gc-compaction rollout (#11917)
## Problem

part of https://github.com/neondatabase/neon/issues/11813

## Summary of changes

* Integrate feature store with tenant structure.
* gc-compaction picks up the current strategy from the feature store.
* We only log them for now for testing purpose. They will not be used
until we have more patches to support different strategies defined in
PostHog.
* We don't support property-based evaulation for now; it will be
implemented later.
* Evaluating result of the feature flag is not cached -- it's not
efficient and cannot be used on hot path right now.
* We don't report the evaluation result back to PostHog right now.

I plan to enable it in staging once we get the patch merged.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-26 13:09:37 +00:00
Alex Chi Z.
841517ee37 fix(pageserver): do not increase basebackup err counter when reconnect (#12016)
## Problem

We see unexpected basebackup error alerts in the alert channel.

https://github.com/neondatabase/neon/pull/11778 only fixed the alerts
for shutdown errors. However, another path is that tenant shutting down
while waiting LSN -> WaitLsnError::BadState -> QueryError::Reconnect.
Therefore, the reconnect error should also be discarded from the
ok/error counter.

## Summary of changes

Do not increase ok/err counter for reconnect errors.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-26 11:31:27 +00:00
a-masterov
1369d73dcd Add h3 to neon-extensions-test (#11946)
## Problem
We didn't test the h3 extension in our test suite.

## Summary of changes
Added tests for h3 and h3-postgis extensions
Includes upgrade test for h3

---------

Co-authored-by: Tristan Partin <tristan@neon.tech>
2025-05-26 11:29:39 +00:00
Erik Grinaker
7cd0defaf0 page_api: add Rust domain types (#11999)
## Problem

For the gRPC Pageserver API, we should convert the Protobuf types to
stricter, canonical Rust types.

Touches https://github.com/neondatabase/neon/issues/11728.

## Summary of changes

Adds Rust domain types that mirror the Protobuf types, with conversion
and validation.
2025-05-26 11:01:36 +00:00
Erik Grinaker
a082f9814a pageserver: add gRPC authentication (#12010)
## Problem

We need authentication for the gRPC server.

Requires #11972.
Touches #11728.

## Summary of changes

Add two request interceptors that decode the tenant/timeline/shard
metadata and authenticate the JWT token against them.
2025-05-26 10:24:45 +00:00
Erik Grinaker
ec991877f4 pageserver: add gRPC server (#11972)
## Problem

We want to expose the page service over gRPC, for use with the
communicator.

Requires #11995.
Touches #11728.

## Summary of changes

This patch wires up a gRPC server in the Pageserver, using Tonic. It
does not yet implement the actual page service.

* Adds `listen_grpc_addr` and `grpc_auth_type` config options (disabled
by default).
* Enables gRPC by default with `neon_local`.
* Stub implementation of `page_api.PageService`, returning unimplemented
errors.
* gRPC reflection service for use with e.g. `grpcurl`.

Subsequent PRs will implement the actual page service, including
authentication and observability.

Notably, TLS support is not yet implemented. Certificate reloading
requires us to reimplement the entire Tonic gRPC server.
2025-05-26 08:27:48 +00:00
Tristan Partin
abc6c84262 Update sql_exporter to 0.17.3 (#12013)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-05-23 20:16:13 +00:00
Conrad Ludgate
6768a71c86 proxy(tokio-postgres): refactor typeinfo query to occur earlier (#11993)
## Problem

For #11992 I realised we need to get the type info before executing the
query. This is important to know how to decode rows with custom types,
eg the following query:

```sql
CREATE TYPE foo AS ENUM ('foo','bar','baz');
SELECT ARRAY['foo'::foo, 'bar'::foo, 'baz'::foo] AS data;
```

Getting that to work was harder that it seems. The original
tokio-postgres setup has a split between `Client` and `Connection`,
where messages are passed between. Because multiple clients were
supported, each client message included a dedicated response channel.
Each request would be terminated by the `ReadyForQuery` message.

The flow I opted to use for parsing types early would not trigger a
`ReadyForQuery`. The flow is as follows:

```
PARSE ""    // parse the user provided query
DESCRIBE "" // describe the query, returning param/result type oids
FLUSH       // force postgres to flush the responses early

// wait for descriptions

  // check if we know the types, if we don't then
  // setup the typeinfo query and execute it against each OID:

  PARSE typeinfo    // prepare our typeinfo query
  DESCRIBE typeinfo
  FLUSH // force postgres to flush the responses early

  // wait for typeinfo statement

    // for each OID we don't know:
    BIND typeinfo
    EXECUTE
    FLUSH

    // wait for type info, might reveal more OIDs to inspect

  // close the typeinfo query, we cache the OID->type map and this is kinder to pgbouncer.
  CLOSE typeinfo 

// finally once we know all the OIDs:
BIND ""   // bind the user provided query - already parsed - to the user provided params
EXECUTE   // run the user provided query
SYNC      // commit the transaction
```

## Summary of changes

Please review commit by commit. The main challenge was allowing one
query to issue multiple sub-queries. To do this I first made sure that
the client could fully own the connection, which required removing any
shared client state. I then had to replace the way responses are sent to
the client, by using only a single permanent channel. This required some
additional effort to track which query is being processed. Lastly I had
to modify the query/typeinfo functions to not issue `sync` commands, so
it would fit into the desired flow above.

To note: the flow above does force an extra roundtrip into each query. I
don't know yet if this has a measurable latency overhead.
2025-05-23 19:41:12 +00:00
Peter Bendel
87fc0a0374 periodic pagebench on hetzner runners (#11963)
## Problem

- Benchmark periodic pagebench had inconsistent benchmarking results
even when run with the same commit hash.
Hypothesis is this was due to running on dedicated but virtualized EC
instance with varying CPU frequency.

- the dedicated instance type used for the benchmark is quite "old" and
we increasingly get `An error occurred (InsufficientInstanceCapacity)
when calling the StartInstances operation (reached max retries: 2):
Insufficient capacity.`

- periodic pagebench uses a snapshot of pageserver timelines to have the
same layer structure in each run and get consistent performance.
Re-creating the snapshot was a painful manual process (see
https://github.com/neondatabase/cloud/issues/27051 and
https://github.com/neondatabase/cloud/issues/27653)

## Summary of changes

- Run the periodic pagebench on a custom hetzner GitHub runner with
large nvme disk and governor set to defined perf profile
- provide a manual dispatch option for the workflow that allows to
create a new snapshot
- keep the manual dispatch option to specify a commit hash useful for
bi-secting regressions
- always use the newest created snapshot (S3 bucket uses date suffix in
S3 key, example
`s3://neon-github-public-dev/performance/pagebench/shared-snapshots-2025-05-17/`
- `--ignore`
`test_runner/performance/pageserver/pagebench/test_pageserver_max_throughput_getpage_at_latest_lsn.py`
in regular benchmarks run for each commit
- improve perf copying snapshot by using `cp` subprocess instead of
traversing tree in python


## Example runs with code in this PR:
- run which creates new snapshot
https://github.com/neondatabase/neon/actions/runs/15083408849/job/42402986376#step:19:55
- run which uses latest snapshot
-
https://github.com/neondatabase/neon/actions/runs/15084907676/job/42406240745#step:11:65
2025-05-23 09:37:19 +00:00
Erik Grinaker
06ce704041 Cargo.toml: upgrade Tonic to 0.13.1 (#11995)
## Problem

We're about to implement a gRPC interface for Pageserver. Let's upgrade
Tonic first, to avoid a more painful migration later. It's currently
only used by storage-broker.

Touches #11728.

## Summary of changes

Upgrade Tonic 0.12.3 → 0.13.1. Also opportunistically upgrade Prost
0.13.3 → 0.13.5. This transitively pulls in Indexmap 2.0.1 → 2.9.0, but
it doesn't appear to be used in any particularly critical code paths.
2025-05-23 08:57:35 +00:00
Konstantin Knizhnik
d5023f2b89 Restrict pump prefetch state only to regular backends (#12000)
## Problem

See https://github.com/neondatabase/neon/issues/11997

This guard prevents race condition with pump prefetch state (initiated
by timeout).
Assert checks that prefetching is also done under guard.
But prewarm knows nothing about it.

## Summary of changes

Pump prefetch state only in regular backends.
Prewarming is done by background workers now.
Also it seems to have not sense to pump prefetch state in any other
background workers: parallel executors, vacuum,... because they are
short living and can not leave unconsumed responses in socket.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-23 08:48:06 +00:00
Konstantin Knizhnik
8ff25dca8e Add online_advisor extension (#11898)
## Problem

Detect problems with Postgres optimiser: lack of indexes and statistics

## Summary of changes

https://github.com/knizhnik/online_advisor

Add online_advistor extension to docker image

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-23 05:08:32 +00:00
Alexey Kondratov
cf81330fbc fix(compute_ctl): Wait for rsyslog longer and with backoff (#12002)
## Problem

https://github.com/neondatabase/neon/pull/11988 waits only for max
~200ms, so we still see failures, which self-resolve after several
operation retries.

## Summary of changes

Change it to waiting for at least 5 seconds, starting with 2 ms sleep
between iterations and x2 sleep on each next iteration. It could be that
it's not a problem with a slow `rsyslog` start, but a longer wait won't
hurt. If it won't start, we should debug why `inittab` doesn't start it,
or maybe there is another problem.
2025-05-22 19:15:05 +00:00
Anastasia Lubennikova
e69ae739ff fix(compute_ctl): fix rsyslogd restart race. (#11988)
Add retry loop around waiting for rsyslog start

## Problem

## Summary of changes

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru>
Co-authored-by: Matthias van de Meent <matthias@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-22 15:20:50 +00:00
Dmitrii Kovalkov
136eaeb74a pageserver: basebackup cache (hackathon project) (#11989)
## Problem
Basebackup cache is on the hot path of compute startup and is generated
on every request (may be slow).

- Issue: https://github.com/neondatabase/cloud/issues/29353

## Summary of changes
- Add `BasebackupCache` which stores basebackups on local disk.
- Basebackup prepare requests are triggered by
`XLOG_CHECKPOINT_SHUTDOWN` records in the log.
- Limit the size of the cache by number of entries.
- Add `basebackup_cache_enabled` feature flag to TenantConfig.
- Write tests for the cache

## Not implemented yet
- Limit the size of the cache by total size in bytes

---------

Co-authored-by: Aleksandr Sarantsev <aleksandr@neon.tech>
2025-05-22 12:45:00 +00:00
Erik Grinaker
211b824d62 pageserver: add branch-local consumption metrics (#11852)
## Problem

For billing, we'd like per-branch consumption metrics.

Requires https://github.com/neondatabase/neon/pull/11984.
Resolves https://github.com/neondatabase/cloud/issues/28155.

## Summary of changes

This patch adds two new consumption metrics:

* `written_size_since_parent`: `written_size - ancestor_lsn`
* `pitr_history_size_since_parent`: `written_size - max(pitr_cutoff,
ancestor_lsn)`

Note that `pitr_history_size_since_parent` will not be emitted until the
PITR cutoff has been computed, and may or may not increase ~immediately
when a user increases their PITR window (depending on how much history
we have available and whether the tenant is restarted/migrated).
2025-05-22 12:26:32 +00:00
Peter Bendel
f9fdbc9618 remove auth_endpoint password from log and command line for local proxy mode (#11991)
## Problem

When testing local proxy the auth-endpoint password shows up in command
line and log

```bash
RUST_LOG=proxy LOGFMT=text cargo run --release --package proxy --bin proxy --features testing -- \
  --auth-backend postgres \
  --auth-endpoint 'postgresql://postgres:secret_password@127.0.0.1:5432/postgres' \
  --tls-cert server.crt \
  --tls-key server.key \
  --wss 0.0.0.0:4444
```

## Summary of changes

- Allow to set env variable PGPASSWORD
- fall back to use PGPASSWORD env variable when auth-endpoint does not
contain password
- remove auth-endpoint password from logs in `--features testing` mode

Example

```bash
export PGPASSWORD=secret_password

RUST_LOG=proxy LOGFMT=text cargo run --package proxy --bin proxy --features testing -- \
  --auth-backend postgres \
  --auth-endpoint 'postgresql://postgres@127.0.0.1:5432/postgres' \
  --tls-cert server.crt \
  --tls-key server.key \
  --wss 0.0.0.0:4444 
```
2025-05-21 20:26:05 +00:00
Erik Grinaker
95a5f749c8 pageserver: use an Option for GcCutoffs::time (#11984)
## Problem

It is not currently possible to disambiguate a timeline with an
uninitialized PITR cutoff from one that was created within the PITR
window -- both of these have `GcCutoffs::time == Lsn(0)`. For billing
metrics, we need to disambiguate these to avoid accidentally billing the
entire history when a tenant is initially loaded.

Touches https://github.com/neondatabase/cloud/issues/28155.

## Summary of changes

Make `GcCutoffs::time` an `Option<Lsn>`, and only set it to `Some` when
initialized. A `pitr_interval` of 0 will yield `Some(last_record_lsn)`.

This PR takes a conservative approach, and mostly retains the old
behavior of consumers by using `unwrap_or_default()` to yield 0 when
uninitialized, to avoid accidentally introducing bugs -- except in cases
where there is high confidence that the change is beneficial (e.g. for
the `pageserver_pitr_history_size` Prometheus metric and to return early
during GC).
2025-05-21 15:42:11 +00:00
Konstantin Merenkov
5db20af8a7 Keep the conn info cache on max_client_conn from pgbouncer (#11986)
## Problem
Hitting max_client_conn from pgbouncer would lead to invalidation of the
conn info cache.
Customers would hit the limit on wake_compute.

## Summary of changes
`should_retry_wake_compute` detects this specific error from pgbouncer
as non-retriable,
meaning we won't try to wake up the compute again.
2025-05-21 15:27:30 +00:00
Arpad Müller
136cf1979b Add metric for number of offloaded timelines (#11976)
We want to keep track of the number of offloaded timelines. It's a
per-tenant shard metric because each shard makes offloading decisions on
its own.
2025-05-21 11:28:22 +00:00
Vlad Lazar
08bb72e516 pageserver: allow in-mem reads to be planned during writes (#11937)
## Problem

Get page tracing revealed situations where planning an in-memory layer
is taking around 150ms. Upon investigation, the culprit is the inner
in-mem layer file lock. A batch being written holds the write lock and a
read being planned wants the read lock. See [this
trace](https://neonprod.grafana.net/explore?schemaVersion=1&panes=%7B%22j61%22:%7B%22datasource%22:%22JMfY_5TVz%22,%22queries%22:%5B%7B%22refId%22:%22traceId%22,%22queryType%22:%22traceql%22,%22query%22:%22412ec4522fe1750798aca54aec2680ac%22,%22datasource%22:%7B%22type%22:%22tempo%22,%22uid%22:%22JMfY_5TVz%22%7D,%22limit%22:20,%22tableType%22:%22traces%22,%22metricsQueryType%22:%22range%22%7D%5D,%22range%22:%7B%22to%22:%221746702606349%22,%22from%22:%221746681006349%22%7D,%22panelsState%22:%7B%22trace%22:%7B%22spanId%22:%2291e9f1879c9bccc0%22%7D%7D%7D,%226d0%22:%7B%22datasource%22:%22JMfY_5TVz%22,%22queries%22:%5B%7B%22refId%22:%22traceId%22,%22queryType%22:%22traceql%22,%22query%22:%2220a4757706b16af0e1fbab83f9d2e925%22,%22datasource%22:%7B%22type%22:%22tempo%22,%22uid%22:%22JMfY_5TVz%22%7D,%22limit%22:20,%22tableType%22:%22traces%22,%22metricsQueryType%22:%22range%22%7D%5D,%22range%22:%7B%22to%22:%221746702614807%22,%22from%22:%221746681014807%22%7D,%22panelsState%22:%7B%22trace%22:%7B%22spanId%22:%2260e7825512bc2a6b%22%7D%7D%7D%7D)
for example.

## Summary of changes

Lift the index into its own RwLock such that we can at least plan during
write IO.

I tried to be smarter in
https://github.com/neondatabase/neon/pull/11866: arc swap + structurally
shared datastructure
and that killed ingest perf for small keys.

## Benchmarking

* No statistically significant difference for rust inget benchmarks when
compared to main.
2025-05-21 11:08:49 +00:00
Alexander Sarantcev
6f4f3691a5 pageserver: Add tracing endpoint correctness check in config validation (#11970)
## Problem

When using an incorrect endpoint string - `"localhost:4317"`, it's a
runtime error, but it can be a config error
- Closes: https://github.com/neondatabase/neon/issues/11394

## Summary of changes

Add config parse time check via `request::Url::parse` validation.

---------

Co-authored-by: Aleksandr Sarantsev <ephemeralsad@gmail.com>
2025-05-21 09:03:26 +00:00
dependabot[bot]
a2b756843e chore(deps): bump setuptools from 70.0.0 to 78.1.1 in the pip group across 1 directory (#11977)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-20 23:00:49 +00:00
Conrad Ludgate
f3c9d0adf4 proxy(logging): significant changes to json logging internals for performance. (#11974)
#11962 

Please review each commit separately.

Each commit is rather small in goal. The overall goal of this PR is to
keep the behaviour identical, but shave away small inefficiencies here
and there.
2025-05-20 17:57:59 +00:00
Konstantin Knizhnik
2e3dc9a8c2 Add rel_size_replica_cache (#11889)
## Problem

See 
Discussion:
https://neondb.slack.com/archives/C033RQ5SPDH/p1746645666075799
Issue: https://github.com/neondatabase/cloud/issues/28609

Relation size cache is not correctly updated at PS in case of replicas.

## Summary of changes

1. Have two caches for relation size in timeline:
`rel_size_primary_cache` and `rel_size_replica_cache`.
2. `rel_size_primary_cache` is actually what we have now. The only
difference is that it is not updated in `get_rel_size`, only by WAL
ingestion
3. `rel_size_replica_cache` has limited size (LruCache) and it's key is
`(Lsn,RelTag)` . It is updated in `get_rel_size`. Only strict LSN
matches are accepted as cache hit.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-20 15:38:27 +00:00
Konstantin Merenkov
568779fa8a proxy/scram: avoid memory copy to improve performance (#11980)
Touches #11941

## Problem
Performance of our PBKDF2 was worse than reference.

## Summary of changes
Avoided memory copy when HMACing in a tight loop.
2025-05-20 15:23:54 +00:00
Alexey Kondratov
e94acbc816 fix(compute_ctl): Dollar escaping and tests (#11969)
## Problem

In the escaping path we were checking that `${tag}$` or `${outer_tag}$`
are present in the string, but that's not enough, as original string
surrounded by `$` can also form a 'tag', like `$x$xx$x$`, which is fine
on it's own, but cannot be used in the string escaped with `$xx$`.

## Summary of changes

Remove `$` from the checks, just check if `{tag}` or `{outer_tag}` are
present. Add more test cases and change the catalog test to stress the
`drop_subscriptions_before_start: true` path as well.

Fixes https://github.com/neondatabase/cloud/issues/29198
2025-05-20 09:03:36 +00:00
github-actions[bot]
40bb9ff62a Proxy release 2025-05-20 06:01 UTC 2025-05-20 06:01:25 +00:00
Erik Grinaker
f4150614d0 pageserver: don't pass config to PageHandler (#11973)
## Problem

The gRPC page service API will require decoupling the `PageHandler` from
the libpq protocol implementation. As preparation for this, avoid
passing in the entire server config to `PageHandler`, and instead
explicitly pass in the relevant fields.

Touches https://github.com/neondatabase/neon/issues/11728.

## Summary of changes

* Change `PageHandler` to take a `GetVectoredConcurrentIo` instead of
the entire config.
* Change `IoConcurrency::spawn_from_conf` to take a
`GetVectoredConcurrentIo`.
2025-05-19 15:47:40 +00:00
Erik Grinaker
38dbc5f67f pageserver/page_api: add binary Protobuf descriptor (#11968)
## Problem

A binary Protobuf schema descriptor can be used to expose an API
reflection service, which in turn allows convenient usage of e.g.
`grpcurl` against the gRPC server.

Touches #11728.

## Summary of changes

* Generate a binary schema descriptor as
`pageserver_page_api::proto::FILE_DESCRIPTOR_SET`.
* Opportunistically rename the Protobuf package from `page_service` to
`page_api`.
2025-05-19 11:17:45 +00:00
Folke Behrens
3685ad606d endpoint_storage: Fix metrics test by excluding assertion on macos (#11952) 2025-05-19 10:56:03 +00:00
Ivan Efremov
76a7d37f7e proxy: Drop cancellation ops if they don't fit into the queue (#11950)
Add a redis ops batch size argument for proxy and remove timeouts by
using try_send()
2025-05-19 10:10:55 +00:00
Erik Grinaker
cdb6479c8a pageserver: add gRPC page service schema (#11815)
## Problem

For the [communicator
project](https://github.com/neondatabase/company_projects/issues/352),
we want to move to gRPC for the page service protocol.

Touches #11728.

## Summary of changes

This patch adds an experimental gRPC Protobuf schema for the page
service. It is equivalent to the current page service, but with several
improvements, e.g.:

* Connection multiplexing.
* Reduced head-of-line blocking.
* Client-side batching.
* Explicit tenant shard routing.
* GetPage request classification (normal vs. prefetch).
* Explicit rate limiting ("slow down" response status).

The API is exposed as a new `pageserver/page_api` package. This is
separate from the `pageserver_api` package to reduce the dependency
footprint for the communicator. The longer-term plan is to also split
out e.g. the WAL ingestion service to a separate gRPC package, e.g.
`pageserver/wal_api`.

Subsequent PRs will: add Rust domain types for the Protobuf types,
expose a gRPC server, and implement the page service.

Preliminary prototype benchmarks of this gRPC API is within 10% of
baseline libpq performance. We'll do further benchmarking and
optimization as the implementation lands in `main` and is deployed to
staging.
2025-05-19 09:03:06 +00:00
Konstantin Knizhnik
81c557d87e Unlogged build get smgr (#11954)
## Problem

See https://github.com/neondatabase/neon/issues/11910
and https://neondb.slack.com/archives/C04DGM6SMTM/p1747314649059129

## Summary of changes

Do not change persistence in `start_unlogged_build`

Postgres PRs:
https://github.com/neondatabase/postgres/pull/642
https://github.com/neondatabase/postgres/pull/641
https://github.com/neondatabase/postgres/pull/640
https://github.com/neondatabase/postgres/pull/639

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-18 05:02:47 +00:00
Trung Dinh
e963129678 pagesteam_handle_batched_message -> pagestream_handle_batched_message (#11916)
## Problem
Found a typo in code.

## Summary of changes

Co-authored-by: Trung Dinh <tdinh@roblox.com>
Co-authored-by: Erik Grinaker <erik@neon.tech>
2025-05-17 22:30:29 +00:00
dependabot[bot]
4f0a9fc569 chore(deps): bump flask-cors from 5.0.0 to 6.0.0 in the pip group across 1 directory (#11960)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-17 22:06:32 +00:00
Emmanuel Ferdman
81c6a5a796 Migrate to correct logger interface (#11956)
## Problem
Currently the `logger` library throws annoying deprecation warnings:
```python
DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
```

## Summary of changes
This small PR resolves the annoying deprecation warnings by migrating to
`.warning` as suggested.

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-17 21:12:01 +00:00
Konstantin Knizhnik
8e05639dbf Invalidate LFC after unlogged build (#11951)
## Problem


See https://neondb.slack.com/archives/C04DGM6SMTM/p1747391617951239

LFC is not always properly updated during unlogged build so it can
contain stale content.

## Summary of changes

Invalidate LFC content at the end of unlogged build

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-17 19:06:59 +00:00
Alexander Bayandin
deed46015d CI(test-images): increase timeout from 20m to 60m (#11955)
## Problem

For some reason (unknown yet) 20m timeout is not enough for
`test-images` job on arm runners.
Ref:
https://github.com/neondatabase/neon/actions/runs/15075321681/job/42387530399?pr=11953

## Summary of changes
- Increase the timeout from 20m to 1h
2025-05-17 06:34:54 +00:00
Heikki Linnakangas
532d9b646e Add simple facility for an extendable shared memory area (#11929)
You still need to provide a max size up-front, but memory is only
allocated for the portion that is in use.

The module is currently unused, but will be used by the new compute
communicator project, in the neon Postgres extension. See
https://github.com/neondatabase/neon/issues/11729

---------

Co-authored-by: Erik Grinaker <erik@neon.tech>
2025-05-16 21:22:36 +00:00
Heikki Linnakangas
55f91cf10b Update 'nix' package (#11948)
There were some incompatible changes. Most churn was from switching from
the now-deprecated fcntl:flock() function to
fcntl::Flock::lock(). The new function returns a guard object, while
with the old function, the lock was associated directly with the file
descriptor.

It's good to stay up-to-date in general, but the impetus to do this now
is that in https://github.com/neondatabase/neon/pull/11929, I want to
use some functions that were added only in the latest version of 'nix',
and it's nice to not have to build multiple versions. (Although,
different versions of 'nix' are still pulled in as indirect dependencies
from other packages)
2025-05-16 14:45:08 +00:00
Folke Behrens
baafcc5d41 proxy: Fix misspelled flag value alias, swap names and aliases (#11949)
## Problem

There's a misspelled flag value alias that's not really used anywhere.

## Summary of changes

Fix the alias and make aliases the official flag values and keep old
values as aliases.
Also rename enum variant. No need for it to carry the version now.
2025-05-16 14:12:39 +00:00
Evan Fleming
aa22572d8c safekeeper: refactor static remote storage usage to use Arc (#10179)
Greetings! Please add `w=1` to github url when viewing diff
(sepcifically `wal_backup.rs`)

## Problem

This PR is aimed at addressing the remaining work of #8200. Namely,
removing static usage of remote storage in favour of arc. I did not opt
to pass `Arc<RemoteStorage>` directly since it is actually
`Optional<RemoteStorage>` as it is not necessarily always configured. I
wanted to avoid having to pass `Arc<Optional<RemoteStorage>>` everywhere
with individual consuming functions likely needing to handle unwrapping.

Instead I've added a `WalBackup` struct that holds
`Optional<RemoteStorage>` and handles initialization/unwrapping
RemoteStorage internally. wal_backup functions now take self and
`Arc<WalBackup>` is passed as a dependency through the various consumers
that need it.

## Summary of changes
- Add `WalBackup` that holds `Optional<RemoteStorage>` and handles
initialization and unwrapping
- Modify wal_backup functions to take `WalBackup` as self (Add `w=1` to
github url when viewing diff here)
- Initialize `WalBackup` in safekeeper root
- Store `Arc<WalBackup>` in `GlobalTimelineMap` and pass and store in
each Timeline as loaded
- use `WalBackup` through Timeline as needed

## Refs

- task to remove global variables
https://github.com/neondatabase/neon/issues/8200
- drive-by fixes https://github.com/neondatabase/neon/issues/11501 
by turning the panic reported there into an error `remote storage not
configured`

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2025-05-16 12:41:10 +00:00
Arpad Müller
2d247375b3 Update rust to 1.87.0 (#11938)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

The 1.87.0 release marks 10 years of Rust.

[Announcement blog
post](https://blog.rust-lang.org/2025/05/15/Rust-1.87.0/)

Prior update was in #11431
2025-05-16 12:21:24 +00:00
Christian Schwarz
a7ce323949 benchmarking: extend test_page_service_batching.py to cover concurrent IO + batching under random reads (#10466)
This PR commits the benchmarks I ran to qualify concurrent IO before we
released it.

Changes:
- Add `l0stack` fixture; a reusable abstraction for creating a stack of
L0 deltas
  each of which has 1 Value::Delta per page.
- Such a stack of L0 deltas is a good and understandable demo for
concurrent IO
because to reconstruct any page, $layer_stack_height` Values need to be
read.
  Before concurrent IO, the reads were sequential.
  With concurrent IO, they are executed concurrently.
- So, switch `test_latency` to use the l0stack.
- Teach `pagebench`, which is used by `test_latency`, to limit itself to
the blocks of the relation created by the l0stack abstraction.
- Additional parametrization of `test_latency` over dimensions
`ps_io_concurrency,l0_stack_height,queue_depth`
- Use better names for the tests to reflect what they do, leave
interpretation of the (now quite high-dimensional) results to the reader
  - `test_{throughput => postgres_seqscan}`
  - `test_{latency => random_reads}`
- Cut down on permutations to those we use in production. Runtime is
about 2min.

Refs
- concurrent IO epic https://github.com/neondatabase/neon/issues/9378 
- batching task: fixes https://github.com/neondatabase/neon/issues/9837

---------

Co-authored-by: Peter Bendel <peterbendel@neon.tech>
2025-05-15 17:48:13 +00:00
Vlad Lazar
31026d5a3c pageserver: support import schema evolution (#11935)
## Problem

Imports don't support schema evolution nicely. If we want to change the
stuff we keep in storcon,
we'd have to carry the old cruft around.

## Summary of changes

Version import progress. Note that the import progress version
determines the version of the import
job split and execution. This means that we can also use it as a
mechanism for deploying new import
implementations in the future.
2025-05-15 16:13:15 +00:00
Vlad Lazar
2621ce2daf pageserver: checkpoint import progress in the storage controller (#11862)
## Problem

Timeline imports do not have progress checkpointing. Any time that the
tenant is shut-down, all progress is lost
and the import restarts from the beginning when the tenant is
re-attached.

## Summary of changes

This PR adds progress checkpointing.


### Preliminaries

The **unit of work** is a `ChunkProcessingJob`. Each
`ChunkProcessingJob` deals with the import for a set of key ranges. The
job split is done by using an estimation of how many pages each job will
produce.

The planning stage must be **pure**: given a fixed set of contents in
the import bucket, it will always yield the same plan. This property is
enforced by checking that the hash of the plan is identical when
resuming from a checkpoint.

The storage controller tracks the progress of each shard in the import
in the database in the form of the **latest
job** that has has completed.

### Flow

This is the high level flow for the happy path:
1. On the first run of the import task, the import task queries storcon
for the progress and sees that none is recorded.
2. Execute the preparatory stage of the import
3. Import jobs start running concurrently in a `FuturesOrdered`. Every
time the checkpointing threshold of jobs has been reached, notify the
storage controller.
4. Tenant is detached and re-attached
5. Import task starts up again and gets the latest progress checkpoint
from the storage controller in the form of a job index.
6. The plan is computed again and we check that the hash matches with
the original plan.
7. Jobs are spawned from where the previous import task left off. Note
that we will not report progress after the completion of each job, so
some jobs might run twice.

Closes https://github.com/neondatabase/neon/issues/11568
Closes https://github.com/neondatabase/neon/issues/11664
2025-05-15 13:18:22 +00:00
Vlad Lazar
a703cd342b storage_controller: enforce generations in import upcalls (#11900)
## Problem

Import up-calls did not enforce the usage of the latest generation. The
import might have finished in one previous generation, but not in the
latest one. Hence, the controller might try to activate a timeline
before it is ready. In theory, that would be fine, but it's tricky to
reason about.

## Summary of Changes

Pageserver provides the current generation in the upcall to the storage
controller and the later validates the generation. If the generation is
stale, we return an error which stops progress of the import job. Note
that the import job will retry the upcall until the stale location is
detached.

I'll add some proper tests for this as part of the [checkpointing
PR](https://github.com/neondatabase/neon/pull/11862).

Closes https://github.com/neondatabase/neon/issues/11884
2025-05-15 10:02:11 +00:00
Alexander Bayandin
42e4cf18c9 CI(neon_extra_builds): fix workflow syntax (#11932)
## Problem

```
Error when evaluating 'strategy' for job 'build-pgxn'. neondatabase/neon/.github/workflows/build-macos.yml@7907a9e2bf898f3d22b98d9d4d2c6ffc4d480fc3 (Line: 45, Col: 27): Matrix vector 'postgres-version' does not contain any values
```
See
https://github.com/neondatabase/neon/actions/runs/15039594216/job/42268015127?pr=11929

## Summary of changes
- Fix typo: `.chnages` -> `.changes`
- Ensure JSON is JSON by moving step output to env variable
2025-05-15 09:53:59 +00:00
Alex Chi Z.
9e5a41a342 fix(scrubber): remote_storage error causes layers to be deleted as orphans (#11924)
## Problem

close https://github.com/neondatabase/neon/issues/11159 ; we get
occasional wrong deletions of layer files being used and errors in
staging. This patch fixed it.

Example errors:

```
Timeline metadata errors: ["index_part.json contains a layer .... (shard 0000) that is not present in remote storage (layer_is_l0: false) with error: Failed to download a remote file: s3 head object\n\nCaused by:\n    0: dispatch failure\n    1: timeout\n    2: error trying to connect: HTTP connect timeout occurred after 3.1s\n
```

This error should not be fired because the file could exist, but we
cannot know if it exists due to head request failure.

## Summary of changes

Only generate cannot find layer errors when the head_object return type
is `NotFound`.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-15 07:02:16 +00:00
Konstantin Knizhnik
48b870bc07 Use unlogged build in GIST for storing root page (#11892)
## Problem

See https://github.com/neondatabase/neon/issues/11891

Newly added assert is first when root page of GIST index is written to
the disk as part of sorted build.

## Summary of changes

Wrap writing of root page in unlogged build.

https://github.com/neondatabase/postgres/pull/632
https://github.com/neondatabase/postgres/pull/633
https://github.com/neondatabase/postgres/pull/634

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-15 04:45:22 +00:00
Christian Schwarz
32a12783fd pageserver: batching & concurrent IO: update binary-built-in defaults; reduce CI matrix (#11923)
Use the current production config for batching & concurrent IO.

Remove the permutation testing for unit tests from CI.
(The pageserver unit test matrix takes ~10min for debug builds).

Drive-by-fix use of `if cfg!(test)` inside crate `pageserver_api`.
It is ineffective for early-enabling new defaults for pageserver unit
tests only.
The reason is that the `test` cfg is only set for the crate under test
but not its dependencies.
So, `cargo test -p pageserver` will build `pageserver_api` with
`cfg!(test) == false`.
Resort to checking for feature flag `testing` instead, since all our
unit tests are run with `--feature testing`.

refs
- `scattered-lsn` batching has been implemented and rolled out in all
envs, cf https://github.com/neondatabase/neon/issues/10765
- preliminary for https://github.com/neondatabase/neon/pull/10466
- epic https://github.com/neondatabase/neon/issues/9377
- epic https://github.com/neondatabase/neon/issues/9378
- drive-by fix
https://neondb.slack.com/archives/C0277TKAJCA/p1746821515504219
2025-05-14 16:30:21 +00:00
a-masterov
68120cfa31 Fix Cloud Extensions Regression (#11907)
## Problem
The regression test on extensions relied on the admin API to set the
default endpoint settings, which is not stable and requires admin
privileges. Specifically:
- The workflow was using `default_endpoint_settings` to configure
necessary PostgreSQL settings like `DateStyle`, `TimeZone`, and
`neon.allow_unstable_extensions`
- This approach was failing because the API endpoint for setting
`default_endpoint_settings` was changed (referenced in a comment as
issue #27108)
- The admin API requires special privileges.
## Summary of changes
We get rid of the admin API dependency and use ALTER DATABASE statements
instead:
**Removed the default_endpoint_settings mechanism:**
- Removed the default_endpoint_settings input parameter from the
neon-project-create action
- Removed the API call that was attempting to set these settings at the
project level
- Completely removed the default_endpoint_settings configuration from
the cloud-extensions workflow
**Added database-level settings:**
- Created a new `alter_db.sh` script that applies the same settings
directly to each test database
- Modified all extension test scripts to call this script after database
creation
2025-05-14 13:19:53 +00:00
github-actions[bot]
4688b815b1 Proxy release 2025-05-14 09:39 UTC 2025-05-14 09:39:15 +00:00
Alex Chi Z.
a8e652d47e rfc: add bottommost garbage-collection compaction (#8425)
Add the RFC for bottommost garbage-collection compaction

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2025-05-14 09:25:57 +00:00
Alex Chi Z.
81fd652151 fix(pageserver): use better estimation for compaction memory usage (#11904)
## Problem

Hopefully resolves `test_gc_feedback` flakiness.

## Summary of changes

`accumulated_values` should not exceed 512MB to avoid OOM. Previously we
only use number of items, which is not a good estimation.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-14 08:32:55 +00:00
Elizabeth Murray
d47e88e353 Update the pgrag version in the compute dockerfile. (#11867)
## Problem

The extensions test are hanging because of pgrag. The new version of
pgrag contains a fix for the hang.

## Summary of changes
2025-05-14 07:00:59 +00:00
Vlad Lazar
045ae13e06 pageserver: make imports work with tenant shut downs (#11855)
## Problem

Lifetime of imported timelines (and implicitly the import background
task) has some shortcomings:
1. Timeline activation upon import completion is tricky. Previously, a
timeline that finished importing
after a tenant detach would not get activated and there's concerns about
the safety of activating
concurrently with shut-down.
2. Import jobs can prevent tenant shut down since they hold the tenant
gate

## Summary of Changes

Track the import tasks in memory and abort them explicitly on tenant
shutdown.

Integrate more closely with the storage controller:
1. When an import task has finished all of its jobs, it notifies the
storage controller, but **does not** mark the import as done in the
index_part. When all shards have finished importing, the storage
controller will call the `/activate_post_import` idempotent endpoint for
all of them. The handler, marks the import complete in index part,
resets the tenant if required and checks if the timeline is active yet.
2. Not directly related, but the import job now gets the starting state
from the storage controller instead of the import bucket. This paves the
way for progress checkpointing.

Related: https://github.com/neondatabase/neon/issues/11568
2025-05-13 17:49:49 +00:00
Folke Behrens
234c882a07 proxy: Expose handlers for cpu and heap profiling (#11912)
## Problem

It's difficult to understand where proxy spends most of cpu and memory.

## Summary of changes

Expose cpu and heap profiling handlers for continuous profiling.

neondatabase/cloud#22670
2025-05-13 14:58:37 +00:00
Konstantin Knizhnik
290369061f Check prefetch result in DEBUG_COMPARE_LOCAL mode (#11502)
## Problem

Prefetched and LFC results are not checked in DEBUG_COMPARE_LOCAL mode

## Summary of changes

Add check for this results as well.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-13 14:13:42 +00:00
Anastasia Lubennikova
25ab16ee24 chore(compute): Postgres 17.5, 16.9, 15.13 and 14.18 (#11886)
Bump all minor versions. 

the only conflict was
src/backend/storage/smgr/smgr.c in v17
where our smgr changes conflicted with

ee578921b6
but it was trivial to resolve.
2025-05-13 13:30:09 +00:00
Vlad Lazar
cfbef4d586 safekeeper: downgrade stream from future WAL log (#11909)
## Problem

1. Safekeeper selection on the pageserver side isn't very dynamic. Once
you connect to one safekeeper, you'll use that one for as long as the
safekeeper keeps the connection alive. In principle, we could be more
eager, since the wal receiver connection can be cancelled but we don't
do that. We wait until the "session" is done and then we pick a new SK.
2. Picking a new SK is quite conservative. We will switch if: 
a. We haven't received anything from the SK within the last 10 seconds
(wal_connect_timeout) or
b. The candidate SK is 1GiB ahead or
c. The candidate SK is in the same AZ as the PS or d. There's a
candidate that is ahead and we've not had any WAL within the last 10
seconds (lagging_wal_timeout)

Hence, we can end up with pageservers that are requesting WAL which
their safekeeper hasn't seen yet.

## Summary of changes

Downgrade warning log to info.
2025-05-13 13:02:25 +00:00
Alex Chi Z.
34a42b00ca feat(pageserver): add PostHog lite client (#11821)
## Problem

part of https://github.com/neondatabase/neon/issues/11813

## Summary of changes

Add a lite PostHog client that only uses the local flag evaluation
functionality. Added a test case that parses an example feature flag and
gets the evaluation result.

TODO: support boolean flag, remote config; implement all operators in
PostHog.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-13 09:49:14 +00:00
Alex Chi Z.
a9979620c5 fix(remote_storage): continue on Azure+AWS retryable error (#11903)
## Problem

We implemented the retry logic in AWS S3 but not in Azure. Therefore, if
there is an error during Azure listing, we will return an Err to the
caller, and the stream will end without fetching more tenants.

Part of https://github.com/neondatabase/neon/issues/11159

Without this fix, listing tenant will stop once we hit an error (could
be network errors -- that happens more frequent on Azure). If we happen
to stop at a point that we only listed part of the shards, we will hit
the "missed shards" error or even remove layers being used.

This bug (for Azure listing) was introduced as part of
https://github.com/neondatabase/neon/pull/9840

There is also a bug that stops the stream for AWS when there's a timeout
-- this is fixed along with this patch.

## Summary of changes

Retry the request on error. In the future, we should make such streams
return something like `Result<Result<T>>` where the outer result is the
error that ends the stream and the inner one is the error that should be
retried by the caller.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-13 08:53:35 +00:00
Conrad Ludgate
a113c48c43 proxy: fix redis batching support (#11905)
## Problem

For `StoreCancelKey`, we were inserting 2 commands, but we were not
inserting two replies. This mismatch leads to errors when decoding the
response.

## Summary of changes

Abstract the command + reply pipeline so that commands and replies are
registered at the same time.
2025-05-13 08:33:53 +00:00
Tristan Partin
9971fba584 Properly configure the dynamic loader to load our compiled libraries (#11858)
The first line in /etc/ld.so.conf is:

	/etc/ld.so.conf.d/*

We want to control library load order so that our compiled binaries are
picked up before others from system packages. The previous solution
allowed the system libraries to load before ours.

Part-of: https://github.com/neondatabase/neon/issues/11857

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-05-12 17:36:07 +00:00
Conrad Ludgate
a77919f4b2 merge pg-sni-router into proxy (#11882)
## Problem

We realised that pg-sni-router doesn't need to be separate from proxy.
just a separate port.

## Summary of changes

Add pg-sni-router config to proxy and expose the service.
2025-05-12 15:48:48 +00:00
github-actions[bot]
0982ca4636 Proxy release 2025-05-12 14:35 UTC 2025-05-12 14:35:27 +00:00
Jakub Kołodziejczak
a618056770 chore(compute): skip audit logs for pg_session_jwt extension (#11883)
references
https://github.com/neondatabase/cloud/issues/28480#issuecomment-2866961124

related https://github.com/neondatabase/cloud/issues/28863

cc @MihaiBojin @conradludgate
2025-05-12 11:24:33 +00:00
Alex Chi Z.
307e1e64c8 fix(scrubber): more logs wrt relic timelines (#11895)
## Problem

Further investigation on
https://github.com/neondatabase/neon/issues/11159 reveals that the
list_tenant function can find all the shards of the tenant, but then the
shard gets missing during the gc timeline list blob. One reason could be
that in some ways the timeline gets recognized as a relic timeline.

## Summary of changes

Add logging to help identify the issue.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-12 09:17:35 +00:00
Arpad Müller
a537b2ffd0 pull_timeline: check tombstones by default (#11873)
Make `pull_timeline` check tombstones by default. Otherwise, we'd be
recreating timelines if the order between creation and deletion got
mixed up, as seen in #11838.

Fixes #11838.
2025-05-12 07:25:54 +00:00
Christian Schwarz
64353b48db direct+concurrent IO: retroactive RFC (#11788)
refs
- direct IO epic: https://github.com/neondatabase/neon/issues/8130
- concurrent IO epic https://github.com/neondatabase/neon/issues/9378
- obsoletes direct IO proposal RFC:
https://github.com/neondatabase/neon/pull/8240
- discussion in
https://neondb.slack.com/archives/C07BZ38E6SD/p1746028030574349
2025-05-10 15:06:06 +00:00
Christian Schwarz
79ddc803af feat(direct IO): runtime alignment validation; support config flag on macOS; default to DirectRw (#11868)
This PR adds a runtime validation mode to check adherence to alignment
and size-multiple requirements at the VirtualFile level.

This can help prevent alignment bugs from slipping into production
because test systems may have more lax requirements than production.
(This is not the case today, but it could change in the future).

It also allows catching O_DIRECT bugs on systems that don't have
O_DIRECT (macOS).
Consequently, we can now accept
`virtual_file_io_mode={direct,direct-rw}` on macOS now.
This has the side benefit of removing some annoying conditional
compilation around `IoMode`.

A third benefit is that it helped weed out size-multiple requirement
violation bugs in how the VirtualFile unit tests exercise read and write
APIs.
I seized the opportunity to trim these tests down to what actually
matters, i.e., exercising of the `OpenFiles` file descriptor cache.

Lastly, this PR flips the binary-built-in default to `DirectRw` so that
when running Python regress tests and benchmarks without specifying
`PAGESERVER_VIRTUAL_FILE_IO_MODE`, one gets the production behavior.

Refs
- fixes https://github.com/neondatabase/neon/issues/11676
2025-05-10 14:19:52 +00:00
Christian Schwarz
f5070f6aa4 fixup(direct IO): PR #11864 broke test suite parametrization (#11887)
PR
- github.com/neondatabase/neon/pull/11864

committed yesterday rendered the `PAGESERVER_VIRTUAL_FILE_IO_MODE`
env-var-based parametrization ineffective.

As a consequence, the tests and benchmarks in `test_runner/` were using
the binary built-in-default, i.e., `buffered`.
2025-05-09 18:13:35 +00:00
Matthias van de Meent
3b7cc4234c Fix PS connect attempt timeouts when facing interrupts (#11880)
With the 50ms timeouts of pumping state in connector.c, we need to
correctly handle these timeouts that also wake up pg_usleep.

This new approach makes the connection attempts re-start the wait
whenever it gets woken up early; and CHECK_FOR_INTERRUPTS() is called to
make sure we don't miss query cancellations.

## Problem

https://neondb.slack.com/archives/C04DGM6SMTM/p1746794528680269

## Summary of changes

Make sure we start sleeping again if pg_usleep got woken up ahead of
time.
2025-05-09 17:02:24 +00:00
Arpad Müller
33abfc2b74 storcon: remove finished safekeeper reconciliations from in-memory hashmap (#11876)
## Problem

Currently there is a memory leak, in that finished safekeeper
reconciliations leave a cancellation token behind which is never cleaned
up.

## Summary of changes

The change adds cleanup after finishing of a reconciliation. In order to
ensure we remove the correct cancellation token, and we haven't raced
with another reconciliation, we introduce a `TokenId` counter to tell
tokens apart.

Part of https://github.com/neondatabase/neon/issues/11670
2025-05-09 13:34:22 +00:00
Alex Chi Z.
93b964f829 fix(pageserver): do not do image compaction if it's below gc cutoff (#11872)
## Problem

We observe image compaction errors after gc-compaction finishes
compacting below the gc_cutoff. This is because `repartition` returns an
LSN below the gc horizon as we (likely) determined that `distance <=
self.repartition_threshold`.

I think it's better to keep the current behavior of when to trigger
compaction but we should skip image compaction if the returned LSN is
below the gc horizon.

## Summary of changes

If the repartition returns an invalid LSN, skip image compaction.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-09 12:07:52 +00:00
Vlad Lazar
d0aaec2abb storage_controller: create imported timelines on safekeepers (#11801)
## Problem

SK timeline creations were skipped for imported timelines since we
didn't know the correct start LSN
of the timeline at that point.

## Summary of changes

Created imported timelines on the SK as part of the import finalize
step.
We use the last record LSN of shard 0 as the start LSN for the
safekeeper timeline.

Closes https://github.com/neondatabase/neon/issues/11569
2025-05-09 10:55:26 +00:00
Alex Chi Z.
d0dc65da12 fix(pageserver): give up gc-compaction if one key has too long history (#11869)
## Problem

The limitation we imposed last week
https://github.com/neondatabase/neon/pull/11709 is not enough to protect
excessive memory usage.

## Summary of changes

If a single key accumulated too much history, give up compaction. In the
future, we can make the `generate_key_retention` function take a stream
of keys instead of first accumulating them in memory, thus easily
support such long key history cases.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-09 10:12:49 +00:00
Konstantin Knizhnik
03d635b916 Add more guards for prefetch_pump_state (#11859)
## Problem

See https://neondb.slack.com/archives/C08PJ07BZ44/p1746566292750689

Looks like there are more cases when `prefetch_pump_state` can be called
in unexpected place and cause core dump.

## Summary of changes

Add more guards.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-09 09:07:08 +00:00
Conrad Ludgate
5cd7f936f9 fix(neon-rls): optimistically assume role grants are already assigned for replicas (#11811)
## Problem

Read replicas cannot grant permissions for roles for Neon RLS. Usually
the permission is already granted, so we can optimistically check. See
INC-509

## Summary of changes

Perform a permission lookup prior to actually executing any grants.
2025-05-09 07:48:30 +00:00
Konstantin Knizhnik
101e115b38 Change prefetch logic in vacuum (#11650)
## Problem
See https://neondb.slack.com/archives/C03QLRH7PPD/p1745003314183649

Vacuum doesn't use prefetch because this strange logic in
`lazy_scan_heap`:

```
			/* And only up to the next unskippable block */
			if (next_prefetch_block + prefetch_budget > vacrel->next_unskippable_block)
				prefetch_budget = vacrel->next_unskippable_block - next_prefetch_block;
```
## Summary of changes

Disable prefetch only if vacuum jumps to next skippable block (there is
SKIP_PAGES_THRESHOLD) which cancel seqscan and perform jump only if gap
is large enough).


Postgres PRs:
https://github.com/neondatabase/postgres/pull/620
https://github.com/neondatabase/postgres/pull/621
https://github.com/neondatabase/postgres/pull/622
https://github.com/neondatabase/postgres/pull/623

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-09 06:54:40 +00:00
Christian Schwarz
b37bb7d7ed pageserver: timeline shutdown: fully quiesce ingest path beforefreeze_and_flush (#11851)
# Problem 

Before this PR, timeline shutdown would
- cancel the walreceiver cancellation token subtree (child token of
Timeline::cancel)
- call freeze_and_flush
- Timeline::cancel.cancel()
- ... bunch of waiting for things ...
- Timeline::gate.close()

As noted by the comment that is deleted by this PR, this left a window
where, after freeze_and_flush, walreceiver could still be running and
ingest data into a new InMemoryLayer.

This presents a potential source of log noise during Timeline shutdown
where the InMemoryLayer created after the freeze_and_flush observes
that Timeline::cancel is cancelled, failing the ingest with some
anyhow::Error wrapping (deeply) a `FlushTaskError::Cancelled` instance
(`flush task cancelled` error message).

# Solution

It turns out that it is quite easy to shut down, not just cancel,
walreceiver completely
because the only subtask spawned by walreceiver connection manager is
the `handle_walreceiver_connection` task, which is properly shut down
and waited upon when the manager task observes cancellation and exits
its retry loop.

The alternative is to replace all the usage of `anyhow` on the ingest
path
with differentiated error types. A lot of busywork for little gain to
fix
a potential logging noise nuisance, so, not doing that for now.

# Correctness / Risk

We do not risk leaking walreceiver child tasks because existing
discipline
is to hold a gate guard.

We will prolong `Timeline::shutdown` to the degree that we're no longer
making
progress with the rest of shutdown while the walreceiver task hasn't yet
observed cancellation. In practice, this should be negligible.

`Timeline::shutdown` could fail to complete if there is a hidden
dependency
of walreceiver shutdown on some subsystem. The code certainly suggests
there
isn't, and I'm not aware of any such dependency. Anyway, impact will be
low
because we only shut down Timeline instances that are obsolete, either
because
there is a newer attachment at a different location, or because the
timeline
got deleted by the user. We would learn about this through stuck cplane
operations or stuck storcon reconciliations. We would be able to
mitigate by
cancelling such stuck operations/reconciliations and/or by rolling back
pageserver.

# Refs
- identified this while investigating
https://github.com/neondatabase/neon/issues/11762
- PR that _does_ fix a bunch _real_ `flush task cancelled` noise on the
compaction path: https://github.com/neondatabase/neon/pull/11853
2025-05-08 18:48:24 +00:00
Conrad Ludgate
bef5954fd7 feat(proxy): track SNI usage by protocol, including for http (#11863)
## Problem

We want to see how many users of the legacy serverless driver are still
using the old URL for SQL-over-HTTP traffic.

## Summary of changes

Adds a protocol field to the connections_by_sni metric. Ensures it's
incremented for sql-over-http.
2025-05-08 16:46:57 +00:00
Christian Schwarz
8477d15f95 feat(direct IO): remove special case in test suite for compat tests (#11864)
PR
- https://github.com/neondatabase/neon/pull/11558
adds special treatment for compat snapshot binaries which don't
understand the `direct-rw` mode.

A new compat snapshot has been published since, so,
we can remove the special case.

refs:
- fixes https://github.com/neondatabase/neon/issues/11598
2025-05-08 16:11:45 +00:00
Arpad Müller
622b3b2993 Fixes for enabling --timelines-onto-safekeepers in tests (#11854)
Second PR with fixes extracted from #11712, relating to
`--timelines-onto-safekeepers`. Does the following:

* Moves safekeeper registration to `neon_local` instead of the test
fixtures
* Pass safekeeper JWT token if `--timelines-onto-safekeepers` is enabled
* Allow some warnings related to offline safekeepers (similarly to how
we allow them for offline pageservers)
* Enable generations on the compute's config if
`--timelines-onto-safekeepers` is enabled
* fix parallel `pull_timeline` race condition (the one that #11786 put
for later)

Fixes #11424
Part of #11670
2025-05-08 15:13:11 +00:00
Santosh Pingale
659366060d Reuse remote_client from the SnapshotDownloader instead of recreating in download function (#11812)
## Problem
At the moment, remote_client and target are recreated in download
function. We could reuse it from SnapshotDownloader instance. This isn't
a problem per se, just a quality of life improvement but it caught my
attention when we were trying out snapshot downloading in one of the
older version and ran into a curious case of s3 clients behaving in two
different manners. One client that used `force_path_style` and other one
didn't.

**Logs from this run:**
```
2025-05-02T12:56:22.384626Z DEBUG /data/snappie/2739e7da34e625e3934ef0b76fa12483/timelines/d44b831adb0a6ba96792dc3a5cc30910/000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000014E8F20-00000000014E8F99-00000001 requires download...
2025-05-02T12:56:22.384689Z DEBUG invoke{service=s3 operation=ListObjectVersions sdk_invocation_id=7315885}:apply_configuration: timeout settings for this operation: TimeoutConfig { connect_timeout: Set(3.1s), read_timeout: Disabled, operation_timeout: Disabled, operation_attempt_timeout: Disabled }
2025-05-02T12:56:22.384730Z DEBUG invoke{service=s3 operation=ListObjectVersions sdk_invocation_id=7315885}:try_op: entering 'serialization' phase
2025-05-02T12:56:22.384784Z DEBUG invoke{service=s3 operation=ListObjectVersions sdk_invocation_id=7315885}:try_op: entering 'before transmit' phase
2025-05-02T12:56:22.384813Z DEBUG invoke{service=s3 operation=ListObjectVersions sdk_invocation_id=7315885}:try_op: retry strategy has OKed initial request
2025-05-02T12:56:22.384841Z DEBUG invoke{service=s3 operation=ListObjectVersions sdk_invocation_id=7315885}:try_op: beginning attempt #1
2025-05-02T12:56:22.384870Z DEBUG invoke{service=s3 operation=ListObjectVersions sdk_invocation_id=7315885}:try_op:try_attempt: resolving endpoint endpoint_params=EndpointResolverParams(TypeErasedBox[!Clone]:Params { bucket: Some("bucket"), region: Some("eu-north-1"), use_fips: false, use_dual_stack: false, endpoint: Some("https://s3.self-hosted.company.com"), force_path_style: false, accelerate: false, use_global_endpoint: false, use_object_lambda_endpoint: None, key: None, prefix: Some("/pageserver/tenants/2739e7da34e625e3934ef0b76fa12483/timelines/d44b831adb0a6ba96792dc3a5cc30910/000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000014E8F20-00000000014E8F99-00000001"), copy_source: None, disable_access_points: None, disable_multi_region_access_points: false, use_arn_region: None, use_s3_express_control_endpoint: None, disable_s3_express_session_auth: None }) endpoint_prefix=None
2025-05-02T12:56:22.384979Z DEBUG invoke{service=s3 operation=ListObjectVersions sdk_invocation_id=7315885}:try_op:try_attempt: will use endpoint Endpoint { url: "https://neon.s3.self-hosted.company.com", headers: {}, properties: {"authSchemes": Array([Object({"signingRegion": String("eu-north-1"), "disableDoubleEncoding": Bool(true), "name": String("sigv4"), "signingName": String("s3")})])} }
2025-05-02T12:56:22.385042Z DEBUG invoke{service=s3 operation=ListObjectVersions sdk_invocation_id=7315885}:try_op:try_attempt:lazy_load_identity:provide_credentials{provider=default_chain}: loaded credentials provider=Environment
2025-05-02T12:56:22.385066Z DEBUG invoke{service=s3 operation=ListObjectVersions sdk_invocation_id=7315885}:try_op:try_attempt:lazy_load_identity: identity cache miss occurred; added new identity (took 35.958µs) new_expiration=2025-05-02T13:11:22.385028Z valid_for=899.999961437s partition=IdentityCachePartition(5)
2025-05-02T12:56:22.385090Z DEBUG invoke{service=s3 operation=ListObjectVersions sdk_invocation_id=7315885}:try_op:try_attempt: loaded identity
2025-05-02T12:56:22.385162Z DEBUG invoke{service=s3 operation=ListObjectVersions sdk_invocation_id=7315885}:try_op:try_attempt: entering 'transmit' phase
2025-05-02T12:56:22.385211Z DEBUG invoke{service=s3 operation=ListObjectVersions sdk_invocation_id=7315885}:try_op:try_attempt: new TCP connector created in 361ns
2025-05-02T12:56:22.385288Z DEBUG resolving host="neon.s3.self-hosted.company.com"
2025-05-02T12:56:22.390796Z DEBUG invoke{service=s3 operation=ListObjectVersions sdk_invocation_id=7315885}:try_op:try_attempt: encountered orchestrator error; halting
```
2025-05-08 14:09:15 +00:00
Christian Schwarz
42d93031a1 fixup(#11819): broken macOS build (#11861)
refs
- fixes https://github.com/neondatabase/neon/issues/11860
2025-05-08 11:48:29 +00:00
Mark Novikov
d22377c754 Skip event triggers in dump-restore (#11794)
## Problem

Data import fails if the src db has any event triggers, because those
can only be restored by a superuser. Specifically imports from Heroku
and Supabase are guaranteed to fail.

Closes https://github.com/neondatabase/cloud/issues/27353

## Summary of changes

Depends on `pg_dump` patches per each supported PostgreSQL version:
- https://github.com/neondatabase/postgres/pull/630
- https://github.com/neondatabase/postgres/pull/629
- https://github.com/neondatabase/postgres/pull/627
- https://github.com/neondatabase/postgres/pull/628
2025-05-08 11:04:28 +00:00
Erik Grinaker
6c70789cfd storcon: increase drain+fill secondary warmup timeout from 20 to 30 seconds (#11848)
## Problem

During deployment drains/fills, we often see the storage controller
giving up on warmups after 20 seconds, when the warmup is nearly
complete (~90%). This can cause latency spikes for migrated tenants if
they block on layer downloads.

Touches https://github.com/neondatabase/cloud/issues/26193.

## Summary of changes

Increase the drain and fill secondary warmup timeout from 20 to 30
seconds.
2025-05-08 10:14:41 +00:00
Dmitrii Kovalkov
7e55497e13 tests: flush wal before waiting for last record lsn (#11726)
## Problem
Compute may flush WAL on page boundaries, leaving some records partially
flushed for a long time.
It leads to `wait_for_last_flush_lsn` stuck waiting for this partial
LSN.
- Closes: https://github.com/neondatabase/cloud/issues/27876

## Summary of changes
- Flush WAL via CHECKPOINT after requesting current_wal_lsn to make sure
that the record we point to is flushed in full
- Use proper endpoint in
`test_timeline_detach_with_aux_files_with_detach_v1`
2025-05-08 10:00:45 +00:00
Vlad Lazar
40f32ea326 pageserver: refactor import flow and add job concurrency limiting (#11816)
## Problem

Import code is one big block. Separating planning and execution will
help with reporting
progress of import to storcon (building block for resuming import).

## Summary of changes

Split up the import into planning and execution.
A concurrency limit driven by PS config is also added.
2025-05-08 09:19:14 +00:00
Christian Schwarz
1d1502bc16 fix(pageserver): flush task cancelled errors during timeline shutdown (#11853)
# Refs
- fixes https://github.com/neondatabase/neon/issues/11762

# Problem

PR #10993 introduced internal retries for BufferedWriter flushes.
PR #11052 added cancellation sensitivity to that retry loop.
That cancellation sensitivity is an error path that didn't exist before.

The result is that during timeline shutdown, after we
`Timeline::cancel`, compaction can now fail with error `flush task
cancelled`.
The problem with that:
1. We mis-classify this as an `error!`-worthy event.
2. This causes tests to become flaky because the error is not in global
`allowed_errors`.

Technically we also trip the `compaction_circuit_breaker` because the
resulting `CompactionError` is variant `::Other`.
But since this is Timeline shutdown, is doesn't matter practically
speaking.

# Solution / Changes

- Log the anyhow stack trace when classifying a compaction error as
`error!`.
  This was helpful to identify sources of `flush task cancelled` errors.
We only log at `error!` level in exceptional circumstances, so, it's ok
to have bit verbose logs.
- Introduce typed errors along the `BufferedWriter::write_*`=>
`BlobWriter::write_blob`
=> `{Delta,Image}LayerWriter::put_*` =>
`Split{Delta,Image}LayerWriter::put_{value,image}` chain.
- Proper mapping to `CompactionError`/`CreateImageLayersError` via new
`From` impls.

I am usually opposed to any magic `From` impls, but, it's how most of
the compaction code
works today.

# Testing

The symptoms are most prevalent in
`test_runner/regress/test_branch_and_gc.py::test_branch_and_gc`.
Before this PR, I was able to reproduce locally 1 or 2 times per 400
runs using
`DEFAULT_PG_VERSION=15 BUILD_TYPE=release poetry run pytest --count 400
-n 8`.
After this PR, it doesn't reproduce anymore after 2000 runs.

# Future Work

Technically the ingest path is also exposed to this new source of errors
because `InMemoryLayer` is backed by `BufferedWriter`.
But we haven't seen it occur in flaky tests yet.
Details and a fix in
- https://github.com/neondatabase/neon/pull/11851
2025-05-08 06:57:53 +00:00
Christian Schwarz
7eb85c56ac tokio-epoll-uring: avoid warn! noise due to ECANCELED during shutdowns (#11819)
# Problem

Before this PR, `test_pageserver_catchup_while_compute_down` would
occasionally fail due to scary-looking WARN log line

```
WARN ephemeral_file_buffered_writer{...}:flush_attempt{attempt=1}: \
 error flushing buffered writer buffer to disk, retrying after backoff err=Operation canceled (os error 125)
```

After lengthy investigation, the conclusion is that this is likely due
to a kernel bug related due to io_uring async workers (io-wq) and
signals.
The main indicator is that the error only ever happens in correlation
with pageserver shtudown when SIGTERM is received.
There is a fix that is merged in 6.14
kernels (`io-wq: backoff when retrying worker creation`).
However, even when I revert that patch, the issue is not reproducible
on 6.14, so, it remains a speculation.

It was ruled out that the ECANCELED is due to the executor thread
exiting before the async worker starts processing the operation.

# Solution

The workaround in this issue is to retry the operation on ECANCELED
once.
Retries are safe because the low-level io_engine operations are
idempotent.
(We don't use O_APPEND and I can't think of another flag that would make
 the APIs covered by this patch not idempotent.)

# Testing

With this PR, the warn! log no longer happens on [my reproducer
setup](https://github.com/neondatabase/neon/issues/11446#issuecomment-2843015111).
And the new rate-limited `info!`-level log line informing about the
internal retry shows up instead, as expected.

# Refs
- fixes https://github.com/neondatabase/neon/issues/11446
2025-05-08 06:33:29 +00:00
Dmitrii Kovalkov
24d62c647f storcon: add missing switch_timeline_membership method to sk client (#11850)
## Problem

`switch_timeline_membership` is implemented on safekeeper's server side,
but the is missing in the client.

- Part of https://github.com/neondatabase/neon/issues/11823

## Summary of changes
- Add `switch_timeline_membership` method to `SafekeeperClient`
2025-05-07 17:00:41 +00:00
Shockingly Good
4d2e4b19c3 fix(compute) Correct the PGXN s3 gateway URL. (#11796)
Corrects the postgres extension s3 gateway address to
be not just a domain name but a full base URL.

To make the code more readable, the option is renamed
to "remote_ext_base_url", while keeping the old name
also accessible by providing a clap argument alias.

Also provides a very simple and, perhaps, even redundant
unit test to confirm the logic behind parsing of the
corresponding CLI argument.

## Problem

As it is clearly stated in
https://github.com/neondatabase/cloud/issues/26005, using of the short
version of the domain name might work for now, but in the future, we
should get rid of using the `default` namespace and this is where it
will, most likely, break down.

## Summary of changes

The changes adjust the domain name of the extension s3 gateway to use
the proper base url format instead of the just domain name assuming the
"default" namespace and add a new CLI argument name for to reflect the
change and the expectance.
2025-05-07 16:34:08 +00:00
Alexey Kondratov
0691b73f53 fix(compute): Enforce cloud_admin role in compute_ctl connections (#11827)
## Problem

Users can override some configuration parameters on the DB level with
`ALTER DATABASE ... SET ...`. Some of these overrides, like `role` or
`default_transaction_read_only`, affect `compute_ctl`'s ability to
configure the DB schema properly.

## Summary of changes

Enforce `role=cloud_admin`, `statement_timeout=0`, and move
`default_transaction_read_only=off` override from control plane [1] to
`compute_ctl`. Also, enforce `search_path=public` just in case, although
we do not call any functions in user databases.

[1]:
133dd8c4db/goapp/controlplane/internal/pkg/compute/provisioner/provisioner_common.go (L70)

Fixes https://github.com/neondatabase/cloud/issues/28532
2025-05-07 12:14:24 +00:00
Vlad Lazar
3cf5e1386c pageserver: fix rough edges of pageserver tracing (#11842)
## Problem

There's a few rough edges around PS tracing.

## Summary of changes

* include compute request id in pageserver trace
* use the get page specific context for GET_REL_SIZE and GET_BATCH
* fix assertion in download layer trace


![image](https://github.com/user-attachments/assets/2ff6779c-7c2d-4102-8013-ada8203aa42f)
2025-05-07 10:13:26 +00:00
Alex Chi Z.
608afc3055 fix(scrubber): log download error (#11833)
## Problem

We use `head_object` to determine whether an object exists or not.
However, it does not always error due to a missing object.

## Summary of changes

Log the error so that we can have a better idea what's going on with the
scrubber errors in prod.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-07 09:21:17 +00:00
Tristan Partin
0ef6851219 Make the audience claim in compute JWTs a vector (#11845)
According to RFC 7519, `aud` is generally an array of StringOrURI, but
in special cases may be a single StringOrURI value. To accomodate future
control plane work where a single token may work for multiple services,
make the claim a vector.

Link: https://www.rfc-editor.org/rfc/rfc7519#section-4.1.3

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-05-06 22:19:15 +00:00
Mikhail
5c356c63eb endpoint_storage compute_ctl integration (#11550)
Add `/lfc/(prewarm|offload)` routes to `compute_ctl` which interact with
endpoint storage.

Add `prewarm_lfc_on_startup` spec option which, if enabled, downloads
LFC prewarm data on compute startup.

Resolves: https://github.com/neondatabase/cloud/issues/26343
2025-05-06 22:02:12 +00:00
Suhas Thalanki
384e3df2ad fix: pinned anon extension to v2.1.0 (#11844)
## Problem

Currently the setup for `anon` v2 in the compute image downloads the
latest version of the extension. This can be problematic as on a compute
start/restart it can download a version that is newer than what we have
tested and potentially break things, hence not giving us the ability to
control when the extension is updated.

We were also using `v2.2.0`, which is not ready for production yet and
has been clarified by the maintainer.

Additional context:
https://gitlab.com/dalibo/postgresql_anonymizer/-/issues/530

## Summary of changes

Changed the URL from which we download the `anon` extension to point to
`v2.1.0` instead of `latest`.
2025-05-06 21:52:15 +00:00
Tristan Partin
f9b3a2e059 Add scoping to compute_ctl JWT claims (#11639)
Currently we only have an admin scope which allows a user to bypass the
compute_id check. When the admin scope is provided, validate the
audience of the JWT to be "compute".

Closes: https://github.com/neondatabase/cloud/issues/27614

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-05-06 19:51:10 +00:00
Jakub Kołodziejczak
79ee78ea32 feat(compute): enable audit logs for pg_session_jwt extension (#11829)
related to https://github.com/neondatabase/cloud/issues/28480
related to https://github.com/neondatabase/pg_session_jwt/pull/36

cc @MihaiBojin @conradludgate @lneves12
2025-05-06 15:18:50 +00:00
Erik Grinaker
0e0ad073bf storcon: fix split aborts removing other tenants (#11837)
## Problem

When aborting a split, the code accidentally removes all other tenant
shards from the in-memory map that have the same shard count as the
aborted split, causing "tenant not found" errors. It will recover on a
storcon restart, when it loads the persisted state. This issue has been
present for at least a year.

Resolves https://github.com/neondatabase/cloud/issues/28589.

## Summary of changes

Only remove shards belonging to the relevant tenant when aborting a
split.

Also adds a regression test.
2025-05-06 13:57:34 +00:00
Alex Chi Z.
6827f2f58c fix(pageserver): only keep iter_with_options API, improve docs in gc-compact (#11804)
## Problem

Address comments in https://github.com/neondatabase/neon/pull/11709

## Summary of changes

- remove `iter` API, users always need to specify buffer size depending
on the expected memory usage.
- several doc improvements

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2025-05-06 12:27:16 +00:00
Peter Bendel
c82e363ed9 cleanup orphan projects created by python tests, too (#11836)
## Problem

- some projects are created during GitHub workflows but not by action
project_create but by python test scripts.

If the python test fails the project is not deleted

## Summary of changes

- make sure we cleanup those python created projects a few days after
they are no longer used, too
2025-05-06 12:26:13 +00:00
Alexander Bayandin
50dc2fae77 compute-node.Dockerfile: remove layer with duplicated name (#11807)
## Problem

Two `rust-extensions-build-pgrx14` layers were added independently in
two different PRs, and the layers are exactly the same

## Summary of changes
- Remove one of `rust-extensions-build-pgrx14` layers
2025-05-06 10:52:21 +00:00
github-actions[bot]
7272d9f7b3 Proxy release 2025-05-06 09:47 UTC 2025-05-06 09:47:48 +00:00
Folke Behrens
62ac5b94b3 proxy: Include the exp/nbf timestamps in the errors (#11828)
## Problem

It's difficult to tell when the JWT expired from current logs and error
messages.

## Summary of changes

Add exp/nbf timestamps to the respective error variants.
Also use checked_add when deserializing a SystemTime from JWT.

Related to INC-509
2025-05-06 09:28:25 +00:00
Konstantin Knizhnik
f0e7b3e0ef Use unlogged build for gist_indexsortbuild_flush_ready_pages (#11753)
## Problem

See https://github.com/neondatabase/neon/issues/11718

GIST index can be constructed in two ways: GIST_SORTED_BUILD and
GIST_BUFFERING.
We used unlogged build in the second case but not in the first.

## Summary of changes

Use unlogged build in `gist_indexsortbuild_flush_ready_pages`

Correspondent Postgres PRsL:
https://github.com/neondatabase/postgres/pull/624
https://github.com/neondatabase/postgres/pull/625
https://github.com/neondatabase/postgres/pull/626

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2025-05-06 07:24:27 +00:00
Dmitrii Kovalkov
c6ff18affc cosmetics(pgxn/neon): WP code small clean up (#11824)
## Problem
Some small cosmetic changes I made while reading the code. Should not
affect anything.

## Summary of changes
- Remove `n_votes` field because it's not used anymore
- Explicitly initialize `safekeepers_generation` with
`INVALID_GENERATION` if the generation is not present (the struct is
zero-initialized anyway, but the explicit initialization is better IMHO)
- Access SafekeeperId via pointer `sk_id` created above
2025-05-06 06:51:51 +00:00
Heikki Linnakangas
16ca74a3f4 Add SAFETY comment on libc::sysconf() call (#11581)
I got an 'undocumented_unsafe_blocks' clippy warning about it. Not sure
why I got the warning now and not before, but in any case a comment is a
good idea.
2025-05-06 06:49:23 +00:00
Peter Bendel
cb67f9a651 delete orphan left over projects (#11826)
## Problem

sometimes our benchmarking GitHub workflow is terminated by side-effects
beyond our control (e.g. GitHub runner looses connection to server) and
then we have left-over Neon projects created during the workflow

[Example where GitHub runner lost connection and project was not
deleted](https://github.com/neondatabase/neon/actions/runs/14017400543/job/39244816485)

Fixes https://github.com/neondatabase/cloud/issues/28546

## Summary of changes

- Add a cleanup step that cleans up left-over projects
- also give each project created during workflows a name that references
the testcase and GitHub runid

## Example run (test of new job steps)


https://github.com/neondatabase/neon/actions/runs/14837092399/job/41650741922#step:6:63

---------

Co-authored-by: a-masterov <72613290+a-masterov@users.noreply.github.com>
2025-05-05 14:30:13 +00:00
devin-ai-integration[bot]
baf425a2cd [pageserver/virtual_file] impr: Improve OpenOptions API ergonomics (#11789)
# Improve OpenOptions API ergonomics

Closes #11787

This PR improves the OpenOptions API ergonomics by:

1. Making OpenOptions methods take and return owned Self instead of &mut
self
2. Changing VirtualFile::open_with_options_v2 to take an owned
OpenOptions
3. Removing unnecessary .clone() and .to_owned() calls

These changes make the API more idiomatic Rust by leveraging the builder
pattern with owned values, which is cleaner and more ergonomic than the
previous approach.

Link to Devin run:
https://app.devin.ai/sessions/c2a4b24f7aca40a3b3777f4259bf8ee1
Requested by: christian@neon.tech

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: christian@neon.tech <christian@neon.tech>
2025-05-05 13:06:37 +00:00
Alex Chi Z.
0b243242df fix(test): allow flush error in gc-compaction tests (#11822)
## Problem

Part of https://github.com/neondatabase/neon/issues/11762

## Summary of changes

While #11762 needs some work to refactor the error propagating thing, we
can do a hacky fix for the gc-compaction tests to allow flush error
during shutdown. It does not affect correctness.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-05 12:15:22 +00:00
Conrad Ludgate
6131d86ec9 proxy: allow invalid SNI (#11792)
## Problem

Some PrivateLink customers are unable to use Private DNS. As such they
use an invalid domain name to address Neon. We currently are rejecting
those connections because we cannot resolve the correct certificate.

## Summary of changes

1. Ensure a certificate is always returned.
2. If there is an SNI field, use endpoint fallback if it doesn't match.

I suggest reviewing each commit separately.
2025-05-05 11:18:55 +00:00
Konstantin Knizhnik
4b9087651c Checked that stored LwLSN >= FirstNormalUnloggedLSN (#11750)
## Problem

Undo unintended change 60b9fb1baf

## Summary of changes

Add assert that we are not storing fake LSN in LwLSN.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-05-02 19:27:59 +00:00
Konstantin Knizhnik
79699aebc8 Reserve in file descriptor pool sockets used for connections to page servers (#11798)
## Problem

See https://github.com/neondatabase/neon/issues/11790

The neon extension opens extensions to the pageservers, which consumes
file descriptors. Postgres has a mechanism to count how many FDs are in
use, but it doesn't know about those FDs. We should call
ReserveExternalFD() or AcquireExternalFD() to account for them.

## Summary of changes

Call `ReserveExternalFD()` for each shard

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Mikhail Kot <mikhail@neon.tech>
2025-05-02 14:36:10 +00:00
Alexander Bayandin
22290eb7ba CI: notify relevant team about release deploy failures (#11797)
## Problem

We notify only Storage team about failed deploys, but Compute and Proxy
teams can also benefit from that

## Summary of changes
- Adjust `notify-storage-release-deploy-failure` to notify the relevant
team about failed deploy
2025-05-02 12:46:21 +00:00
Alex Chi Z.
bbc35e10b8 fix(test): increase timeouts for some tests (#11781)
## Problem

Those tests are timing out more frequently after
https://github.com/neondatabase/neon/pull/11585

## Summary of changes

Increase timeout for `test_pageserver_gc_compaction_smoke`

Increase rollback wait timeout for `test_tx_abort_with_many_relations`

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-01 18:36:26 +00:00
Alex Chi Z.
ae2c3ac12f test: revert relsizev2 config (#11759)
## Problem

part of https://github.com/neondatabase/neon/issues/9516

One thing I realized in the past few months is that "no-way-back" things
like this are scary to roll out without a fine-grained rollout infra.
The plan was to flip the flag in the repo and roll it out soon, but I
don't think rolling out would happen in the near future. So I'd rather
revert the flag to avoid creating a discrepancy between staging and the
regress tests.

## Summary of changes

Not using rel_size_v2 by default in unit tests; we still have a few
tests to explicitly test the new format so we still get some test
coverages.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-05-01 17:51:10 +00:00
Vlad Lazar
16d594b7b3 pagectl: list layers for given key in decreasing LSN order (#11799)
Adds an extra key CLI arg to `pagectl layer list-layer`. When provided,
only layers with key ranges containing the key will be listed in
decreasing LSN order (indices are preserved for `dump-layer`).
2025-05-01 15:56:43 +00:00
Suhas Thalanki
f999632327 Adding anon v2 support to the dockerfile (#11313)
## Problem

Removed `anon` v1 support as described here:
https://github.com/neondatabase/cloud/issues/22663

Adding `anon` v2 support to re-introduce the `pg_anon` extension. 
Related Issues: https://github.com/neondatabase/cloud/issues/20456


## Summary of changes

Adding `anon` v2 support by building it in the dockerfile
2025-05-01 15:22:01 +00:00
Shockingly Good
5bd850d15a Fix the leaked tracing context for the "compute_monitor:run". (#11791)
Removes the leaked tracing context for the "compute_monitor:run" log,
which either inherited the "start_compute" span or also the HTTP request
context.

## Problem

The problem is that the context of the monitor's trace is unnecessarily
populated with the span data inherited from previously within the same
thread.

## Summary of changes

The context is completely reset by moving the span from the thread
spawning the monitor into the thread where the monitor will actually
start working.

Addresses https://github.com/neondatabase/cloud/issues/28145

## Examples

### Before
```
2025-04-30T16:39:05.840298Z  INFO start_compute:compute_monitor:run: compute is not running, waiting before monitoring activity
```

### After

```
2025-04-30T16:39:05.840298Z  INFO compute_monitor:run: compute is not running, waiting before monitoring activity
```
2025-05-01 09:09:10 +00:00
Dmitrii Kovalkov
1b789e8d7c fix(pgxn/neon): Use proper member size in TermsCollectedMset and VotesCollectedMset (#11785)
## Problem
`TermsCollectedMset` and `VotesCollectedMset` accept a MemberSet
argument to find a quorum in. It may be either `wp->mconf.members` or
`wp->mconf.new_members`. But the loops inside always use
`wp->mconf.members.len`.

If the sizes of member sets are different, it may lead to these
functions not scanning all the safekeepers from `mset`.

We are not planning to change the member set size dynamically now, but
it's worth fixing anyway.

- Part of https://github.com/neondatabase/neon/issues/11669

## Summary of changes
- Use proper size of member set in `TermsCollectedMset` and
`VotesCollectedMset`
2025-04-30 16:50:21 +00:00
Arpad Müller
bec7427d9e pull_timeline and sk logging fixes (#11786)
This patch contains some fixes of issues I ran into for #11712:

* make `pull_timeline` return success for timeline that already exists.
This follows general API design of storage components: API endpoints are
retryable and converge to a status code, instead of starting to error.
We change the `pull_timeline`'s return type a little bit, because we
might not actually have a source sk to pull from. Note that the fix is
not enough, there is still a race when two `pull_timeline` instances
happen in parallel: we might try to enter both pulled timelines at the
same time. That can be fixed later.
* make `pull_timeline` support one safekeeper being down. In general, if
one safekeeper is down, that's not a problem. the added comment explains
a potential situation (found in the `test_lagging_sk` test for example)
* don't log very long errors when computes try to connect to safekeepers
that don't have the timeline yet, if `allow_timeline_creation` is false.
That flag is enabled when a sk connection string with generation numbers
is passed to the compute, so we'll hit this code path more often. E.g.
when a safekeeper missed a timeline creation, but the compute connects
to it first before the `pull_timeline` gets requested by the storcon
reconciler: this is a perfectly normal situation. So don't log the whole
error backtrace, and don't log it on the error log level, but only on
info.

part of #11670
2025-04-30 16:24:01 +00:00
Alex Chi Z.
e2db76b9be feat(pageserver): ondemand download reason observability (#11780)
## Problem

Part of https://github.com/neondatabase/neon/issues/11615

## Summary of changes

We don't understand the root cause of why we get resident size surge
every now and then. This patch adds observability for that, and in the
next week, we might have a better understanding of what's going on.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-30 16:04:00 +00:00
Alex Chi Z.
6b4b8e0d8b fix(pageserver): do not increase basebackup err counter when shutdown (#11778)
## Problem

We occasionally see basebackup errors alerts but there were no errors
logged. Looking at the code, the only codepath that will cause this is
shutting down.

## Summary of changes

Do not increase any counter (ok/err) when basebackup request gets
cancelled due to shutdowns.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-30 15:50:12 +00:00
Konstantin Knizhnik
1d68577fbd Check target slot state in prefetch_wait_for (#11779)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1745599814030679

Assume the following scenario: prefetch_wait_for is doing
`CHECK_FOR_INTERRUPTS` which tries to load prefetch responses.
In case of error is calls pageserver_disconnect which aborts all
in-flight requests. But such failure is not detected by
`prefetch_wait_for` which returns true. As a result
`communicator_read_at_lsnv` assumes that slot is received, but as far as
asserts are disables at prod, it is not actually checked.
Then it tries to interpret response and ... *SIGSEGV*

## Summary of changes

Check target slot state  in `prefetch_wait_for`.

Resolves https://github.com/neondatabase/cloud/issues/28258

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-30 12:44:59 +00:00
Arseny Sher
60f63c076f Make safekeeper proto version 3 default (#11518)
## Problem

We have been running compute <-> sk protocol version 3 for a while on
staging with no issues observed, and want to fully migrate to it
eventually.

## Summary of changes

Let's make v3 the default.

ref https://github.com/neondatabase/neon/issues/10326

---------

Co-authored-by: Arpad Müller <arpad@neon.tech>
2025-04-30 12:23:20 +00:00
Mikhail Kot
8da4ec9740 Postgres metrics for stuck getpage requests (#11710)
https://github.com/neondatabase/neon/issues/10327
Resolves: #11720 

New metrics:
- `compute_getpage_stuck_requests_total`
- `compute_getpage_max_inflight_stuck_time_ms`
2025-04-30 12:01:41 +00:00
Em Sharnoff
b48404952d Bump vm-builder: v0.42.2 -> v0.46.0 (#11782)
Bumped to pick up the changes from neondatabase/autoscaling#1366 —
specifically including `uname` in the logs.

Other changes included:

* neondatabase/autoscaling#1301
* neondatabase/autoscaling#1296
2025-04-30 11:32:25 +00:00
devin-ai-integration[bot]
1d06172d59 pageserver: remove resident size from billing metrics (#11699)
This is a rebase of PR #10739 by @henryliu2014 on the current main
branch.

## Problem

pageserver: remove resident size from billing metrics

Fixes #10388

## Summary of changes

The following changes have been made to remove resident size from
billing metrics:

* removed the metric "resident_size" and related codes in
consumption_metrics/metrics.rs
* removed the item of the description of metric "resident_size" in
consumption_metrics.md
* refactored the metric "resident_size" related test case

Requested by: John Spray (john@neon.tech)

---------

Co-authored-by: liuheqing <hq.liu@qq.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: John Spray <john@neon.tech>
2025-04-29 18:34:56 +00:00
Elizabeth Murray
a08c1a23eb Upgrade the pgrag version in the compute Dockerfile. (#11687)
Update the compute Dockerfile to use a new version of pgrag. The new
version of pgrag uses the latest pgrx, and has a fix that terminates
background workers on postmaster exit.
2025-04-29 16:50:18 +00:00
John Spray
a2adc7dbd3 storcon: avoid multiple initdbs when shard 0 has stale locations (#11760)
## Problem

In #11727 I overlooked the case of multiple attached locations for shard
0.

I misread the code and thought `create_one` acts on one location, but it
actually acts on one _shard_, which is potentially multiple locations.

This was not a regression, but it meant that the fix was incomplete.

## Summary of changes

- In `create_one`, when updating shard zero, have any "other" locations
use the initdb from shard 0
2025-04-29 15:31:52 +00:00
Vlad Lazar
768a580373 pageserver: add not modified since lsn to get page span (#11774)
It's useful when debugging.
2025-04-29 14:07:23 +00:00
Folke Behrens
09247de8d5 proxy: Enable JSON logging by default (#11772)
This does not affect local_proxy.
2025-04-29 13:11:24 +00:00
Arpad Müller
0b35929211 Make SafekeeperReconciler parallel via semaphore (#11757)
Right now we only support running one reconciliation per safekeeper.
This is of course usually way below of what a safekeeper can do.
Therefore, introduce a semaphore and spawn the tasks asynchronously as
they come in.

Part of #11670
2025-04-29 12:46:15 +00:00
Ivan Efremov
b3db7f66ac fix(compute): Change the local_proxy log level (#11770)
Related to the INC-496
2025-04-29 11:49:16 +00:00
a-masterov
498d852bde Fix the empty region if run on schedule (#11764)
## Problem

When the workflow ran on a schedule, the `region_id` input was not set.
As a result, an empty region value was used, which caused errors during
execution.

## Summary of Changes

- Added fallback logic to set a default region (`aws-us-east-2`) when
`region_id` is not provided.
- Ensures the workflow works correctly both when triggered manually
(`workflow_dispatch`) and on schedule (`cron`).
2025-04-29 09:12:14 +00:00
Busra Kugler
7f8b1d79c0 Replace dorny/paths-filter with step-security maintained version (#11663)
## Problem
Our CI/CD security tool StepSecurity maintains safer forks of popular
GitHub Actions with low security scores. We're replacing
dorny/paths-filter with the maintained step-security/paths-filter
version to reduce risk of supply chain breaches and potential CVEs.

## Summary of changes
replace
```uses: dorny/paths-filter@de90cc6fb3 ```  with ```uses: step-security/paths-filter@v3```

This PR will fix: neondatabase/cloud#26141
2025-04-29 09:02:01 +00:00
JC Grünhage
d15f2ff57a fix(lint-release-pr): adjust lint and action to match (#11766)
## Problem
The `lint-release-pr` workflow run for
https://github.com/neondatabase/neon/pull/11763 failed, because the new
action did not match the lint.

## Summary of changes
Include time in expected merge message regex.
2025-04-29 08:56:44 +00:00
Konstantin Knizhnik
3593356c10 Prewarm sql api (#11742)
## Problem

Continue work on prewarm, see 
https://github.com/neondatabase/neon/pull/11740
https://github.com/neondatabase/neon/pull/11741

## Summary of changes

Add SQL API to prewarm

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-29 06:44:28 +00:00
github-actions[bot]
37d555aa59 Proxy release 2025/04/29 06:01 UTC 2025-04-29 06:01:28 +00:00
Tristan Partin
9e8ab2ab4f Skip remote extensions WITH_LIB test when sanitizers are enabled (#11758)
In order for the test to work when sanitizers are enabled, we would need
to compile the dummy Postgres extension with the same sanitizer flags
that we compile Postgres and the neon extension with. Doing this work
would be a little more than trivial, so skipping is the best option, at
least for now.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-28 19:13:35 +00:00
Alex Chi Z.
c1ff7db187 fix(pageserver): consider tombstones in replorigin (#11752)
## Problem

We didn't consider tombstones in replorigin read path in the past. This
was fine because tombstones are stored as LSN::Invalid before we
universally define what the tombstone is for sparse keyspaces.

Now we remove non-inherited keys during detach ancestor and write the
universal tombstone "empty image". So we need to consider it across all
the read paths.

related: https://github.com/neondatabase/neon/pull/11299

## Summary of changes

Empty value gets ignored for replorigin scans.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-28 18:54:26 +00:00
Konstantin Knizhnik
6d6b83e737 Prewarm implementation (#11741)
## Problem

Continue work on prewarm started in PR
https://github.com/neondatabase/neon/pull/11740

## Summary of changes

Implement prewarm using prefetch

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-28 18:17:03 +00:00
John Spray
0482690534 pageserver: make control_plane_api & generations fully mandatory (#10715)
## Problem

We had retained the ability to run in a generation-less mode to support
test_generations_upgrade, which was replaced with a cleaner backward
compat test in https://github.com/neondatabase/neon/pull/10701

## Summary of changes

- Remove all the special cases for "if no generation" or "if no control
plane api"
- Make control_plane_api config mandatory

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2025-04-28 17:24:55 +00:00
Tristan Partin
a750026c2e Fix compiler warning in libpagestore.c when WITH_SANITIZERS=yes (#11755)
Postgres has a nice self-documenting macro called pg_unreachable() when
you want to assert that a location in code won't be hit.

Warning in question:

```
/home/tristan957/Projects/work/neon//pgxn/neon/libpagestore.c: In function ‘pageserver_connect’:
/home/tristan957/Projects/work/neon//pgxn/neon/libpagestore.c:739:1: warning: control reaches end of non-void function [-Wreturn-type]
  739 | }
      | ^
```

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-28 17:09:48 +00:00
John Spray
998d2c2ce9 storcon: use shard 0's initdb for timeline creation (#11727)
## Problem

In princple, pageservers with different postgres binaries might generate
different initdbs, resulting in inconsistency between shards. To avoid
that, we should have shard 0 generate the initdb and other shards re-use
it.

Fixes: https://github.com/neondatabase/neon/issues/11340

## Summary of changes

- For shards with index greater than zero, set
`existing_initdb_timeline_id` in timeline creation to consume the
existing initdb rather than creating a new one
2025-04-28 16:43:35 +00:00
JC Grünhage
b1fa68f659 impr(ci): switch release PR creation over to use python based action (#11679)
## Problem
Our different repositories had both had code to achieve very similar
results in terms of release PR creation, but they were structured
differently and had different extensions. This was likely to cause
maintainability problems in the long run.

## Summary of changes
Switch to a python cli based composite action for creating the release
PRs that will also be introduced in our other repos later.

## To Do
- [ ] Adjust our docs to reflect the changes from this.
2025-04-28 16:37:36 +00:00
devin-ai-integration[bot]
84bc3380cc Remove SAFEKEEPER_AUTH_TOKEN env var parsing from safekeeper (#11698)
# Remove SAFEKEEPER_AUTH_TOKEN env var parsing from safekeeper

This PR is a follow-up to #11443 that removes the parsing of the
`SAFEKEEPER_AUTH_TOKEN` environment variable from the safekeeper
codebase while keeping the `auth_token_path` CLI flag functionality.

## Changes:
- Removed code that checks for the `SAFEKEEPER_AUTH_TOKEN` environment
variable
- Updated comments to reflect that only the `auth_token_path` CLI flag
is now used

As mentioned in PR #11443, the environment variable approach was planned
to be deprecated and removed in favor of the file-based approach, which
is more secure since environment variables can be quite public in both
procfs and unit files.

Link to Devin run:
https://app.devin.ai/sessions/d6f56cf1b4164ea9880a9a06358a58ac

Requested by: arpad@neon.tech

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: arpad@neon.tech <arpad@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2025-04-28 15:34:47 +00:00
Alex Chi Z.
11f6044338 fix(pageserver): report synthetic size = 1 if all tls offloaded (2) (#11731)
## Problem

https://github.com/neondatabase/neon/pull/11648 did this for resident
size instead of synthetic size.

## Summary of changes

Report synthetic_size == 1 if all timelines are offloaded.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-28 13:45:45 +00:00
Konstantin Knizhnik
692c0f3fb8 Prepare to prewarm support (#11740)
## Problem

See
(original prewarm implementation)
https://github.com/neondatabase/neon/pull/9197
(functions for storing/restoring LFC state)
https://github.com/neondatabase/neon/pull/9587
(store prefetch results in LFC)
https://github.com/neondatabase/neon/pull/10442

## Summary of changes

Preparation for prewarm implementation.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-28 13:24:18 +00:00
Alexander Bayandin
2b1d2a55d6 CI: fix typo oicd -> oidc (#11747)
## Problem

It's OIDC (OpenID Connect), not OICD

## Summary of changes
- Rename actions input `aws-oicd-role-arn` -> `aws-oidc-role-arn`
2025-04-28 12:44:28 +00:00
Konstantin Knizhnik
60b9fb1baf Ignore unlogged LSNs in set last written LSN (#11743)
## Problem

See https://github.com/neondatabase/neon/issues/11718
and https://neondb.slack.com/archives/C033RQ5SPDH/p1745122797538509

GIST other indexes performing "unlogged build" are using so called fake
LSNs - not a real LSN, but something like 0/1. Been stored in lwlsn
cache they cause incorrect lookup at PS.

## Summary of changes

Do not store fake LSNs in LwLSN hash.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-28 12:16:29 +00:00
Erik Grinaker
606f14034e pageserver: improve pageserver_smgr_query_seconds buckets (#11680)
## Problem

The `pageserver_smgr_query_seconds` buckets are too coarse, using powers
of 10: 1 µs, 10 µs, 100 µs, 1 ms, 10 ms, 100 ms, 1 s, 10 s, 100 s. This
is one of our most crucial latency metrics, and needs better resolution.

Touches #11594.

## Summary of changes

This patch uses buckets with better resolution around 1 ms (the typical
latency):

* 0.6 ms
* 1 ms
* 3 ms
* 6 ms
* 10 ms
* 30 ms
* 100 ms
* 1 s
* 3 s

These will be the same as the compute's `compute_getpage_wait_seconds`,
to make them comparable across the compute and Pageserver:
https://github.com/neondatabase/flux-fleet/pull/579. We sacrifice
buckets above 3 s, since these can already be considered "too slow".

This does not change the previously used `CRITICAL_OP_BUCKETS`, which is
also used for other operations on different timescales (e.g. LSN waits).
We should consider replacing this with more appropriate buckets for
specific operations, since it covers a large span with low resolution.
2025-04-28 11:52:44 +00:00
Conrad Ludgate
32393b4393 pg-sni-router: support compute TLS on different port (#11732)
## Problem

pg-sni-router isn't aware of compute TLS

## Summary of changes

If connections come in on port 4433, we require TLS to compute from
pg-sni-router
2025-04-28 11:29:44 +00:00
Alexander Bayandin
1a29f5672a CI(check-macos-build): trigger workflow automatically for PRs (#11706)
## Problem

- if-conditions for the `check-macos-build` workflow don't trigger it on
PRs with relevant changes (in Rust code or Postgres submodules).
- Jobs in the workflow depend on the presence of a cache, which is not
guaranteed.

## Summary of changes
- Fix if-conditions
- Use artifacts on top of cache whenever the workflow depends on it —
the cache might not be available
2025-04-28 09:03:10 +00:00
a-masterov
b8d47b5acf Run the extensions' tests on staging (#11164)
## Problem
We currently don't run end-to-end tests for PostgreSQL extensions on our
cloud infrastructure, which means we might miss problems that only occur
in a real cloud environment.

## Summary of changes
- Added a workflow to run extension tests against a cloud staging
instance
- Set up proper project configuration for extension testing
- Implemented test execution with appropriate environment settings
- Added error handling and reporting for test failures

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2025-04-28 08:13:49 +00:00
Alexander Lakhin
97e01ae6fd Add workflow to run particular test(s) N times (#11050)
## Problem
Provide an easy way to run particular test(s) N times on CI.

## Summary of changes

* Allow for passing the test selection and the number of test runs to
the existing "Build and Test Locally" workflow
* Allow for running multiple selected tests by the "Pytest regression
tests" step
* Introduce a new workflow to run specified test(s) several times
* Store results in a separate database to distinguish between testing
tests for stability and usual testing
2025-04-28 04:04:37 +00:00
Lokesh
459d51974c doc: minor updates to consumption-metrics document (#7153)
## Problem
Proposed minor changes to the `consumption_metrics` document.

## Summary of changes
- Fixed minor typos in the document.
- Minor formatting in the description of metrics `timeline_logical_size`
and `synthetic_storage_size`. Makes this consistent as with description
  of other metrics in the document.

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Mikhail Kot <mikhail@neon.tech>
2025-04-25 19:46:40 +00:00
StepSecurity Bot
902d361107 CI/CD Hardening: Fixing StepSecurity Flagged Issues (#11724)
### Summary
I'm fixing one or more of the following CI/CD misconfigurations to
improve security. Please feel free to leave a comment if you think the
current permissions for the GITHUB_TOKEN should not be restricted so I
can take a note of it as accepted behaviour.

- Restrict permissions for GITHUB_TOKEN
- Add step-security/harden-runner
- Pin Actions to a full length commit SHA

### Security Fixes
will fix https://github.com/neondatabase/cloud/issues/26141
2025-04-25 14:36:45 +00:00
Dmitrii Kovalkov
ef53a76434 storage_broker: https handler (#11603)
## Problem
Broker supports only HTTP, no HTTPS
- Closes: https://github.com/neondatabase/cloud/issues/27492

## Summary of changes
- Add `listen_https_addr`, `ssl_key_file`, `ssl_cert_file`,
`ssl_cert_reload_period` arguments to storage broker
- Make `listen_addr` argument optional
- Listen https in storage broker
- Support https for storage broker request in neon_local
- Add `use_https_storage_broker_api` option to NeonEnvBuilder
2025-04-25 14:28:56 +00:00
Vlad Lazar
6f0046b688 storage_controller: ensure mutual exclusion for imports and shard splits (#11632)
## Problem

Shard splits break timeline imports.

## Summary of Changes

Ensure mutual exclusion for imports and shard splits.

On the shard split code path:
1. Right before shard splitting, check the database to ensure that
no-import is on-going for the tenant. Exclusion is guaranteed because
this validation is done while holding the exclusive tenant lock.
Timeline creation (and import creation implicitly) requires a shared
tenant lock.
2. When selecting a shard to split, use the in-mem state to exclude
shards with an on-going import. This is opportunistic since an import
might start after the check, but allows shard splits to make progres
instead of continously retrying to split the same shard.

On the timeline creation code path:
1. Check the in-memory splitting flag on all shards of the tenant. If
any of them are splitting, error out asking the client to retry. On the
happy path this is not required, due to the tenant lock set-up described
above, but it covers the case where we restart with a pending
shard-split.

Closes https://github.com/neondatabase/neon/issues/11567
2025-04-25 11:46:15 +00:00
Em Sharnoff
2b0248cd76 fix(proxy): s/Console/Control plane/ in cplane error (#11716)
I got bamboozled by the error message while debugging, seems no
objections to updating it.

ref https://neondb.slack.com/archives/C060N3SEF9D/p1745570961111509

ref https://neondb.slack.com/archives/C039YKBRZB4/p1745570811957019?thread_ts=1745393368.283599
2025-04-25 11:09:56 +00:00
Fedor Dikarev
7b03216dca CI(check-macos-build): use gh native cache (#11707)
## Problem
- using Hetzner buckets for cache requires secrets, we either need
`secrets: inherit` to make it works
- we don't have self-hosted MacOs runners, so actually GH native cache
is more optimal solution there

## Summary of changes
- switch to GH native cache for macos builds
2025-04-25 09:18:20 +00:00
a-masterov
992aa91075 Refresh the codestyle of docker compose test script (#11715)
## Problem
The docker compose test script (`docker_compose_test.sh`) had
inconsistent codestyle, mixing legacy syntax with modern approaches and
not following best practices at all. This inconsistency could lead to
potential issues with variable expansion, path handling, and
maintainability.
## Summary of changes
This PR modernizes the test script with several codestyle improvements:
* Variable scoping and exports:
  * Added proper export declarations for environment variables
  * Added explicit COMPOSE_PROFILES export to avoid repetitive flags
* Modern Bash syntax:
  * Replaced [ ] with [[ ]] for safer conditional testing
  * Used arithmetic operations (( cnt += 3 )) instead of expr
  * Added proper variable expansion with braces ${variable}
  * Added proper quoting around variables and paths with "${variable}"
* Docker Compose commands:
  * Replaced hardcoded container names with service names
  * Used docker compose exec instead of docker exec $CONTAINER_NAME
  * Removed repetitive flags by using environment variables
* Shell script best practices:
  * Added function keyword before function definition
  * Used safer path handling with "$(dirname "${0}")"
These changes make the script more maintainable, less error-prone, and
more consistent with modern shell scripting standards.
2025-04-25 09:13:35 +00:00
Conrad Ludgate
afe9b27983 fix(compute/tls): support for checking certificate chains (#11683)
## Problem

It seems are production-ready cert-manager setup now includes a full
certificate chain. This was not accounted for and the decoder would
error.

## Summary of changes

Change the way we decode certificates to support cert-chains, ignoring
all but the first cert.

This also changes a log line to not use multi-line errors.

~~I have tested this code manually against real certificates/keys, I
didn't want to embed those in a test just yet, not until the cert
expires in 24 hours.~~
2025-04-25 09:09:14 +00:00
Alex Chi Z.
5d91d4e843 fix(pageserver): reduce gc-compaction memory usage (#11709)
## Problem

close https://github.com/neondatabase/neon/issues/11694

We had the delta layer iterator and image layer iterator set to buffer
at most 8MB data. Note that 8MB is the compressed size, so it is
possible for those iterators contain more than 8MB data in memory.

For the recent OOM case, gc-compaction was running over 556 layers,
which means that we will have 556 active iterators. So in theory, it
could take up to 556*8=4448MB memory when the compaction is going on. If
images get compressed and the compression ratio is high (for that
tenant, we see 3x compression ratio across image layers), then that's
13344MB memory.

Also we have layer rewrites, which explains the memory taken by
gc-compaction itself (versus the iterators). We rewrite 424 out of 556
layers, and each of such rewrites need a pair of delta layer writer. So
we are buffering a lot of deltas in the memory.

The flamegraph shows that gc-compaction itself takes 6GB memory, delta
iterator 7GB, and image iterator 2GB, which can be explained by the
above theory.

## Summary of changes

- Reduce the buffer sizes.
- Estimate memory consumption and if it is too high.
- Also give up if the number of layers-to-rewrite is too high.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-25 08:45:31 +00:00
Alexander Bayandin
2465e9141f test_runner: bump httpcore to 1.0.9 and h11 to 0.16.0 (#11711)
## Problem

https://github.com/advisories/GHSA-vqfr-h8mv-ghfj

## Summary of changes
- Bump `h11` to 0.16.0 (required to bump `httpcore` to 1.0.9)
2025-04-25 08:44:40 +00:00
Tristan Partin
2526f6aea1 Add remote extension test with library component (#11301)
The current test was just SQL files only, but we also want to test a
remote extension which includes a loadable library. With both extensions
we should cover a larger portion of compute_ctl's remote extension code
paths.

Fixes: https://github.com/neondatabase/neon/issues/11146

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-24 22:33:46 +00:00
Vlad Lazar
5ba7315c84 storage_controller: reconcile completed imports at start-up (#11614)
## Problem

In https://github.com/neondatabase/neon/pull/11345 coordination of
imports moved to the storage controller.
It involves notifying cplane when the import has been completed by
calling an idempotent endpoint.

If the storage controller shuts down in the middle of finalizing an
import, it would never be retried.

## Summary of changes

Reconcile imports at start-up by fetching the complete imports from the
database and spawning a background
task which notifies cplane.

Closes: https://github.com/neondatabase/neon/issues/11570
2025-04-24 18:39:19 +00:00
Vlad Lazar
6f7e3c18e4 storage_controller: make leadership protocol more robust (#11703)
## Problem

We saw the following scenario in staging:
1. Pod A starts up. Becomes leader and steps down the previous pod
cleanly.
2. Pod B starts up (deployment).
3. Step down request from pod B to pod A times out. Pod A did not manage
to stop its reconciliations within 10 seconds and exited with return
code 1
([code](7ba8519b43/storage_controller/src/service.rs (L8686-L8702))).
4. Pod B marks itself as the leader and finishes start-up
5. k8s restarts pod A
6. k8s marks pod B as ready
7. pod A sends step down request to pod A - this succeeds => pod A is
now the leader
8. k8s kills pod A because it thinks pod B is healthy and pod A is part
of the old replica set

We end up in a situation where the only pod we have (B) is stepped down
and attempts to forward requests to a leader that doesn't exist. k8s
can't detect that pod B is in a bad state since the /status endpoint
simply returns 200 hundred if the pod is running.

## Summary of changes

This PR includes a number of robustness improvements to the leadership
protocol:
* use a single step down task per controller
* add a new endpoint to be used as k8s liveness probe and check
leadership status there
* handle restarts explicitly (i.e. don't step yourself down)
* increase the step down retry count
* don't kill the process on long step down since k8s will just restart
it
2025-04-24 16:59:56 +00:00
Christian Schwarz
8afb783708 feat: Direct IO for the pageserver write path (#11558)
# Problem

The Pageserver read path exclusively uses direct IO if
`virtual_file_io_mode=direct`.

The write path is half-finished. Here is what the various writing
components use:

|what|buffering|flags on <br/>`v_f_io_mode`<br/>=`buffered`|flags on
<br/>`virtual_file_io_mode`<br/>=`direct`|
|-|-|-|-|
|`DeltaLayerWriter`| BlobWriter<BUFFERED=true> | () | () |
|`ImageLayerWriter`| BlobWriter<BUFFERED=false> | () | () |
|`download_layer_file`|BufferedWriter|()|()|
|`InMemoryLayer`|BufferedWriter|()|O_DIRECT|


The vehicle towards direct IO support is `BufferedWriter` which
- largely takes care of O_DIRECT alignment & size-multiple requirements 
- double-buffering to mask latency

`DeltaLayerWriter`, `ImageLayerWriter` use `blob_io::BlobWriter` , which
has neither of these.

# Changes

## High-Level

At a high-level this PR makes the following primary changes:

- switch the two layer writer types to use `BufferedWriter` & make
sensitive to `virtual_file_io_mode` (via open_with_options_**v2**)
- make `download_layer_file` sensitive to `virtual_file_io_mode` (also
via open_with_options_**v2**)
- add `virtual_file_io_mode=direct-rw` as a feature gate
- we're hackish-ly piggybacking on OpenOptions's ask for write access
here
- this means with just `=direct` InMemoryLayer reads and writes no
longer uses O_DIRECT
- this is transitory and we'll remove the `direct-rw` variant once the
rollout is complete

(The `_v2` APIs for opening / creating VirtualFile are those that are
sensitive to `virtual_file_io_mode`)

The result is:

|what|uses <br/>`BufferedWriter`|flags on
<br/>`v_f_io_mode`<br/>=`buffered`|flags on
<br/>`v_f_io_mode`<br/>=`direct`|flags on
<br/>`v_f_io_mode`<br/>=`direct-rw`|
|-|-|-|-|-|
|`DeltaLayerWriter`| ~~Blob~~BufferedWriter | () | () |  O_DIRECT |
|`ImageLayerWriter`| ~~Blob~~BufferedWriter | () | () |  O_DIRECT |
|`download_layer_file`|BufferedWriter|()|()|O_DIRECT|
|`InMemoryLayer`|BufferedWriter|()|~~O_DIRECT~~()|O_DIRECT|


## Code-Level


The main change is:
- Switch `blob_io::BlobWriter` away from its own buffering method to use
`BufferedWriter`.

Additional prep for upholding `O_DIRECT` requirements:
- Layer writer `finish()` methods switched to use IoBufferMut for
guaranteed buffer address alignment. The size of the buffers is PAGE_SZ
and thereby implicitly assumed to fulfill O_DIRECT requirements.

For the hacky feature-gating via `=direct-rw`:
- Track `OpenOptions::write(true|false)` in a field; bunch of mechanical
churn.
- Consolidate the APIs in which we "open" or "create" VirtualFile for
better overview over which parts of the code use the `_v2` APIs.

Necessary refactorings & infra work:
- Add doc comments explaining how BufferedWriter ensures that writes are
compliant with O_DIRECT alignment & size constraints. This isn't new,
but should be spelled out.
- Add the concept of shutdown modes to `BufferedWriter::shutdown` to
make writer shutdown adhere to these constraints.
- The `PadThenTruncate` mode might not be necessary in practice because
I believe all layer files ever written are sized in multiples `PAGE_SZ`
and since `PAGE_SZ` is larger than the current alignment requirements
(512/4k depending on platform), it won't be necesary to pad.
- Some test (I believe `round_trip_test_compressed`?) required it though
- [ ] TODO: decide if we want to accept that complexity; if we do then
address TODO in the code to separate alignment requirement from buffer
capacity
- Add `set_len` (=`ftruncate`) VirtualFile operation to support the
above.
- Allow `BufferedWriter` to start at a non-zero offset (to make room for
the summary block).

Cleanups unlocked by this change:
- Remove non-positional APIs from VirtualFile (e.g. seek, write_full,
read_full)

Drive-by fixes:
- PR https://github.com/neondatabase/neon/pull/11585 aimed to run unit
tests for all `virtual_file_io_mode` combinations but didn't because of
a missing `_` in the env var.

# Performance

This section assesses this PR's impact on deployments with current
production setting (`=direct`) and anticipated impact of switching to
(`=direct-rw`).

For `DeltaLayerWriter`, `=direct` should remain unchanged to slightly
improved on throughput because the `BlobWriter`'s buffer had the same
size as the `BufferedWriter`'s buffer, but it didn't have the
double-buffering that `BufferedWriter` has.
The `=direct-rw` enables direct IO; throughput should not be suffering
because of double-buffering; benchmarks will show if this is true.

The `ImageLayerWriter` was previously not doing any buffering
(`BUFFERED=false`).
It went straight to issuing the IO operation to the underlying
VirtualFile and the buffering was done by the kernel.
The switch to `BufferedWriter` under `=direct` adds an additional memcpy
into the BufferedWriter's buffer.
We will win back that memcpy when enabling direct IO via `=direct-rw`.

A nice win from the switch to `BufferedWriter` is that ImageLayerWriter
performs >=16x fewer write operations to VirtualFile (the BlobWriter
performs one write per len field and one write per image value).
This should save low tens of microseconds of CPU overhead from doing all
these syscalls/io_uring operations, regardless of `=direct` or
`=direct-rw`.
Aside from problems with alignment, this write frequency without
double-buffering is prohibitive if we actually have to wait for the
disk, which is what will happen when we enable direct IO via
(`=direct-rw`).
Throughput should not be suffering because of BufferedWrite's
double-buffering; benchmarks will show if this is true.

`InMemoryLayer` at `=direct` will flip back to using buffered IO but
remain on BufferedWriter.
The buffered IO adds back one memcpy of CPU overhead.
Throughput should not suffer and will might improve on
not-memory-pressured Pageservers but let's remember that we're doing the
whole direct IO thing to eliminate global memory pressure as a source of
perf variability.

## bench_ingest

I reran `bench_ingest` on `im4gn.2xlarge` and `Hetzner AX102`.
Use `git diff` with `--word-diff` or similar to see the change.

General guidance on interpretation:
- immediate production impact of this PR without production config
change can be gauged by comparing the same `io_mode=Direct`
- end state of production switched over to `io_mode=DirectRw` can be
gauged by comparing old results' `io_mode=Direct` to new results'
`io_mode=DirectRw`

Given above guidance, on `im4gn.2xlarge`
- immediate impact is a significant improvement in all cases
- end state after switching has same significant improvements in all
cases
- ... except `ingest/io_mode=DirectRw volume_mib=128 key_size_bytes=8192
key_layout=Sequential write_delta=Yes` which only achieves `238 MiB/s`
instead of `253.43 MiB/s`
  - this is a 6% degradation
  - this workload is typical for image layer creation

# Refs
- epic https://github.com/neondatabase/neon/issues/9868
- stacked atop
  - preliminary refactor https://github.com/neondatabase/neon/pull/11549
- bench_ingest overhaul https://github.com/neondatabase/neon/pull/11667
- derived from https://github.com/neondatabase/neon/pull/10063

Co-authored-by: Yuchen Liang <yuchen@neon.tech>
2025-04-24 14:57:36 +00:00
Konstantin Knizhnik
1531712555 Undo commit d1728a6bcd because it causes problems with creating pg_search extension (#11700)
## Problem

See https://neondb.slack.com/archives/C03H1K0PGKH/p1745489241982209

pg_search extension now can not be created.

## Summary of changes

Undo d1728a6bcd

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-24 14:46:10 +00:00
Alexander Bayandin
5e989a3148 CI(build-tools): bump packages in build-tools image (#11697)
## Problem

`cargo-deny` 0.16.2 spits a bunch of warnings like:
```
warning[index-failure]: unable to check for yanked crates
```

The issue is fixed for the latest version of `cargo-deny` (0.18.2). And
while we're here, let's bump all the packages we have in `build-tools`
image

## Summary of changes
- bump cargo-hakari to 0.9.36
- bump cargo-deny to 0.18.2
- bump cargo-hack to 0.6.36
- bump cargo-nextest to 0.9.94
- bump diesel_cli to 2.2.9
- bump s5cmd to 2.3.0
- bump mold to 2.37.1
- bump python to 3.11.12
2025-04-24 14:13:04 +00:00
Alexey Kondratov
985056be37 feat(compute): Introduce Postgres downtime metrics (#11346)
## Problem

Currently, we only report the timestamp of the last moment we think
Postgres was active. The problem is that if Postgres gets completely
unresponsive, we still report some old timestamp, and it's impossible to
distinguish situations 'Postgres is effectively down' and 'Postgres is
running, but no client activity'.

## Summary of changes

Refactor the `compute_ctl`'s compute monitor so that it was easier to
track the connection errors and failed activity checks, and report
- `now() - last_successful_check` as current downtime on any failure
- cumulative Postgres downtime during the whole compute lifetime

After adding a test, I also noticed that the compute monitor may not
reconnect even though queries fail with `connection closed` or `error
communicating with the server: Connection reset by peer (os error 54)`,
but for some reason we do not catch it with `client.is_closed()`, so I
added an explicit reconnect in case of any failures.

Discussion:
https://neondb.slack.com/archives/C03TN5G758R/p1742489426966639
2025-04-24 13:51:09 +00:00
Christian Schwarz
9c6ff3aa2b refactor(BufferedWriter): flush task owns the VirtualFile & abstraction for cleanup on drop (#11549)
Main change:

- `BufferedWriter` owns the `W`; no more `Arc<W>`
- We introduce auto-delete-on-drop wrappers for `VirtualFile`.
  - `TempVirtualFile` for write-only users
- `TempVirtualFileCoOwnedByEphemeralFileAndBufferedWriter` for
EphemeralFile which requires read access to the immutable prefix of the
file (see doc comments for details)
- Users of `BufferedWriter` hand it such a wrapped `VirtualFile`.
- The wrapped `VirtualFile` moves to the background flush task.
- On `BufferedWriter` shutdown, ownership moves back.
- Callers remove the wrapper (`disarm_into_inner()`) after doing final
touches, e.g., flushing index blocks and summary for delta/image layer
writers.

If the BufferedWriter isn't shut down properly via
`BufferedWriter::shutdown`, or if there is an error during final
touches, the wrapper type ensures that the file gets unlinked.

We store a GateGuard inside the wrapper to ensure that the Timeline is
still alive when unlinking on drop.

Rust doesn't have async drop yet, so, the unlinking happens using a
synchronous syscall.
NB we don't fsync the surrounding directory.
This is how it's been before this PR; I believe it is correct because
all of these files are temporary paths that get cleaned up on timeline
load.
Again, timeline load does not need to fsync because the next timeline
load will unlink again if the file reappears.

The auto-delete-on-drop can happen after a higher-level mechanism
retries.
Therefore, we switch all users to monotonically increasing, never-reused
temp file disambiguators.

The aspects pointed out in the last two paragraphs will receive further
cleanup in follow-up task
- https://github.com/neondatabase/neon/issues/11692

Drive-by changes:
- It turns out we can remove the two-pronged code in the layer file
download code.
No need to make this a separate PR because all of production already
uses `tokio-epoll-uring` with the buffered writer for many weeks.


Refs
- epic https://github.com/neondatabase/neon/issues/9868
- alternative to https://github.com/neondatabase/neon/pull/11544
2025-04-24 13:07:57 +00:00
Folke Behrens
9d472c79ce Fix what's currently flagged by cargo deny (#11693)
* Replace yanked papaya version
* Remove unused allowed license: OpenSSL
* Remove Zlib license from general allow list since it's listed in the
exceptions section per crate
* Drop clarification for ring since they have separate LICENSE files now
* List the tower-otel repo as allowed source while we sort out the OTel
deps
2025-04-24 13:02:31 +00:00
Arpad Müller
b43203928f Switch tenant snapshot subcommand to remote_storage (#11685)
Switches the tenant snapshot subcommand of the storage scrubber to
`remote_storage`. As this is the last piece of the storage scrubber
still using the S3 SDK, this finishes the project started in #7547.

This allows us to do tenant snapshots on Azure as well.

Builds on #11671
Fixes #8830
2025-04-24 12:22:07 +00:00
Arpad Müller
c35d489539 versioning API for remote_storage (#11671)
Adds a versioning API to remote_storage. We want to use it in the
scrubber, both for tenant snapshot as well as for metadata checks.

for #8830
and for #11588
2025-04-24 11:41:48 +00:00
Vlad Lazar
3a50d95b6d storage_controller: coordinate imports across shards in the storage controller (#11345)
## Problem

Pageservers notify control plane directly when a shard import has
completed.
Control plane has to download the status of each shard from S3 and
figure out if everything is truly done,
before proceeding with branch activation.

Issues with this approach are:
* We can't control shard split behaviour on the storage controller side.
It's unsafe to split
during import.
* Control plane needs to know about shards and implement logic to check
all timelines are indeed ready.

## Summary of changes

In short, storage controller coordinates imports, and, only when
everything is done, notifies control plane.

Big rocks:
1. Store timeline imports in the storage controller database. Each
import stores the status of its shards in the database.
We hook into the timeline creation call as our entry point for this.
2. Pageservers get a new upcall endpoint to notify the storage
controller of shard import updates.
3. Storage controller handles these updates by updating persisted state.
If an update finalizes the import,
then poll pageservers until timeline activation, and, then, notify the
control plane that the import is complete.

Cplane side change with new endpoint is in
https://github.com/neondatabase/cloud/pull/26166

Closes https://github.com/neondatabase/neon/issues/11566
2025-04-24 11:26:06 +00:00
Arpad Müller
d43b8e73ae Update sentry to 0.37 (#11686)
Update the sentry crate to 0.37. This deduplicates the `webpki-roots`
crate in our crate graph, and brings another dependency onto newer
rustls `0.23.18`.
2025-04-24 11:20:41 +00:00
devin-ai-integration[bot]
1808dad269 Add --dev CLI flag to pageserver and safekeeper binaries (#11526)
# Add --dev CLI flag to pageserver and safekeeper binaries

This PR adds the `--dev` CLI flag to both the pageserver and safekeeper
binaries without implementing any functionality yet. This is a precursor
to PR #11517, which will implement the full functionality to require
authentication by default unless the `--dev` flag is specified.

## Changes
- Add `dev_mode` config field to pageserver binary
- Add `--dev` CLI flag to safekeeper binary

This PR is needed for forward compatibility tests to work properly, when
we try to merge #11517

Link to Devin run:
https://app.devin.ai/sessions/ad8231b4e2be430398072b6fc4e85d46
Requested by: John Spray (john@neon.tech)

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: John Spray <john@neon.tech>
2025-04-24 10:45:40 +00:00
Folke Behrens
7ba8519b43 proxy: Update zerocopy to 0.8 (#11681)
Also add some macros that might result in more efficient code.
2025-04-24 09:39:08 +00:00
Christian Schwarz
f8100d66d5 ci: extend 'Wait for extension build to finish' timeout (#11689)
Refs
- https://neondb.slack.com/archives/C059ZC138NR/p1745427571307149
2025-04-24 08:15:08 +00:00
Christian Schwarz
51cdb570eb bench_ingest: general overhaul & add parametrization over virtual_file_io_mode (#11667)
Changes:
- clean up existing parametrization & criterion `BenchmarkId`
- additional parametrization over `virtual_file_io_mode`
- switch to `multi_thread` to be closer to production ([Slack
thread](https://neondb.slack.com/archives/C033RQ5SPDH/p1745339543093159))

Refs
- epic https://github.com/neondatabase/neon/issues/9868
- extracted from https://github.com/neondatabase/neon/pull/11558
2025-04-24 07:38:18 +00:00
devin-ai-integration[bot]
8e09ecf2ab Fix KeyError in physical replication benchmark test (#11675)
# Fix KeyError in physical replication benchmark test

This PR fixes the failing physical replication benchmark test that was
encountering a KeyError: 'endpoints'.

The issue was in accessing `project["project"]["endpoints"][0]["id"]`
when it should be `project["endpoints"][0]["id"]`, consistent with how
endpoints are accessed elsewhere in the codebase.

Fixed the issue in both test functions:
- test_ro_replica_lag
- test_replication_start_stop

Link to Devin run:
https://app.devin.ai/sessions/be3fe9a9ee5942e4b12e74a7055f541b
Requested by: Peter Bendel

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: peterbendel@neon.tech <peterbendel@neon.tech>
2025-04-23 14:51:08 +00:00
Mikhail Kot
c3534cea39 Rename object_storage->endpoint_storage (#11678)
1. Rename service to avoid ambiguity as discussed in Slack
2. Ignore endpoint_id in read paths as requested in
https://github.com/neondatabase/cloud/issues/26346#issuecomment-2806758224
2025-04-23 14:03:19 +00:00
Folke Behrens
21d3d60cef proxy/pglb: Add in-process connection support (#11677)
Define a `Connection` and a `Stream` type that resemble simple QUIC
connections
and (multiplexed) streams.
2025-04-23 12:18:30 +00:00
Tristan Partin
b00db536bb Add CPU architecture to the remote extensions object key (#11590)
ARM computes are incoming and we need to account for that in remote
extensions. Previously, we just blindly assumed that all computes were
x86_64.

Note that we use the Go architecture naming convention instead of the
Rust one directly to do our best and be consistent across the stack.

Part-of: https://github.com/neondatabase/cloud/issues/23148

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-22 22:47:22 +00:00
Arpad Müller
149cbd1e0a Support single and two safekeeper scenarios (#11483)
In tests and when one safekeeper is down in small regions, we need to
contend with one or two safekeepers. Before, we gave an error in
`safekeepers_for_new_timeline`. Now we just silently allow the timeline
to be created on one or two safekeepers.

Part of #9011
2025-04-22 21:27:01 +00:00
Alexander Lakhin
7b949daf13 fix(test): allow reconcile errors in test_storage_controller_heartbeats (#11665)
## Problem

test_storage_controller_heartbeats is flaky because of unallowed
reconciler errors (#11625)

## Summary of changes

Allow reconcile errors as in other tests in test_storage_controller.py.
2025-04-22 18:13:16 +00:00
Konstantin Knizhnik
132b6154bb Unlogged build debug compare local v2 (#11554)
## Problem

Init fork is used in DEBUG_COMPARE_LOCAL to determine unlogged relation
or unlogged build.
But it is created only after the relation is initialized and so can be
swapped out, producing `Page is evicted with zero LSN` error.

## Summary of changes

Create init fork together with main fork for unlogged relations in
DEBUG_COMPARE_LOCAL mode.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-22 18:07:45 +00:00
Alex Chi Z.
ad3519ebcb fix(pageserver): report synthetic size = 1 if all tls offloaded (#11648)
## Problem

A quick workaround for https://github.com/neondatabase/neon/issues/11631

## Summary of changes

Report synthetic size == 1 if all timelines are offloaded.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-22 14:28:22 +00:00
Dmitrii Kovalkov
6173c0f44c safekeeper: add enable_tls_wal_service_api (#11520)
## Problem
Safekeeper doesn't use TLS in wal service
- Closes: https://github.com/neondatabase/cloud/issues/27302

## Summary of changes
- Add `enable_tls_wal_service_api` option to safekeeper's cmd arguments
- Propagate `tls_server_config` to `wal_service` if the option is
enabled
- Create `BACKGROUND_RUNTIME` for small background tasks and offload SSL
certificate reloader to it.

No integration tests for now because support from compute side is
required: https://github.com/neondatabase/cloud/issues/25823
2025-04-22 13:19:03 +00:00
a-masterov
fd916abf25 Remove NOTICE messages, which can make the pg_repack regression test fail. (#11659)
## Problem
The pg_repack test can be flaky due to unpredictable `NOTICE` messages
about waiting for some processes.
E.g., 
```
 INFO: repacking table "public.issue3_2"
+NOTICE: Waiting for 1 transactions to finish. First PID: 427
```
## Summary of changes
The `client_min_messages` set to `warning` for the regression tests.
2025-04-22 11:43:45 +00:00
Alexander Bayandin
cd2e1fbc7c CI(benchmarks): upload perf results for passed tests (#11649)
## Problem

We run benchmarks in batches (five parallel jobs on different runners).
If any test in a batch fails, we won’t upload any results for that
batch, even for the tests that passed.

## Summary of changes
- Move the results upload to a separate step in the run-python-test-set
action, and execute this step even if tests fail.
2025-04-22 09:41:28 +00:00
github-actions[bot]
cae3e2976b Proxy release 2025-04-22 2025-04-22 06:02:06 +00:00
Tristan Partin
5df4a747e6 Update pgbouncer in compute images to 1.24.1 (#11651)
Fixes CVE-2025-2291.

Link:
https://www.postgresql.org/about/news/pgbouncer-1241-released-fixes-cve-2025-2291-3059/

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-21 17:49:17 +00:00
Vlad Lazar
cbf442292b pageserver: handle empty get vectored queries (#11652)
## Problem

If all batched requests are excluded from the query by
`Timeine::get_rel_page_at_lsn_batched` (e.g. because they are past the
end of the relation), the read path would panic since it doesn't expect
empty queries. This is a change in behaviour that was introduced with
the scattered query implementation.

## Summary of Changes

Handle empty queries explicitly.
2025-04-21 17:45:16 +00:00
Heikki Linnakangas
4d0c1e8b78 refactor: Extract some code in pagebench getpage command to function (#11563)
This makes it easier to add a different client implementation alongside
the current one. I started working on a new gRPC-based protocol to
replace the libpq protocol, which will introduce a new function like
`client_libpq`, but for the new protocol.

It's a little more readable with less indentation anyway.
2025-04-19 08:38:03 +00:00
JC Grünhage
3158442a59 fix(ci): set token for fast-forward failure comments and allow merging with state unstable (#11647)
## Problem

https://github.com/neondatabase/neon/actions/runs/14538136318/job/40790985693?pr=11645
failed, even though the relevant parts of the CI had passed and
auto-merge determined the PR is ready to merge. After that, commenting
failed.

## Summary of changes
- set GH_TOKEN for commenting after fast-forward failure
- allow merging with mergeable_state unstable
2025-04-18 17:49:34 +00:00
JC Grünhage
f006879fb7 fix(ci): make regex to find rc branches less strict (#11646)
## Problem

https://github.com/neondatabase/neon/actions/runs/14537161022/job/40787763965
failed to find the correct RC PR run, preventing artifact re-use. This
broke in https://github.com/neondatabase/neon/pull/11547.

There's a hotfix release containing this in
https://github.com/neondatabase/neon/pull/11645.

## Summary of changes
Make the regex for finding the RC PR run less strict, it was needlessly
precise.
2025-04-18 16:39:18 +00:00
Dmitrii Kovalkov
a0d844dfed pageserver + safekeeper: pass ssl ca certs to broker client (#11635)
## Problem
Pageservers and safakeepers do not pass CA certificates to broker
client, so the client do not trust locally issued certificates.
- Part of https://github.com/neondatabase/cloud/issues/27492

## Summary of changes
- Change `ssl_ca_certs` type in PS/SK's config to `Pem` which may be
converted to both `reqwest` and `tonic` certificates.
- Pass CA certificates to storage broker client in PS and SK
2025-04-18 06:27:23 +00:00
Alex Chi Z.
5073e46df4 feat(pageserver): use rfc3339 time and print ratio in gc-compact stats (#11638)
## Problem

follow-up on https://github.com/neondatabase/neon/pull/11601

## Summary of changes

- serialize the start/end time using rfc3339 time string
- compute the size ratio of the compaction

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-18 05:28:01 +00:00
Alexander Bayandin
182bd95a4e CI(regress-tests): run tests on large-metal (#11634)
## Problem

Regression tests are more flaky on virtualised (`qemu-x64-*`) runners

See https://neondb.slack.com/archives/C069Z2199DL/p1744891865307769
Ref https://github.com/neondatabase/neon/issues/11627

## Summary of changes
- Switch `regress-tests` to metal-only large runners to mitigate flaky
behaviour
2025-04-18 01:25:38 +00:00
Anastasia Lubennikova
ce7795a67d compute: use project_id, endpoint_id as tag (#11556)
for compute audit logs

part of https://github.com/neondatabase/cloud/issues/21955
2025-04-17 23:32:38 +00:00
Suhas Thalanki
134d01c771 remove pg_anon.patch (#11636)
This PR removes `pg_anon.patch` as the `anon` v1 extension has been
removed and the patch is not being used anywhere
2025-04-17 22:08:16 +00:00
Arpad Müller
c1e4befd56 Additional fixes and improvements to storcon safekeeper timelines (#11477)
This delivers some additional fixes and improvements to storcon managed
safekeeper timelines:

* use `i32::MAX` for the generation number of timeline deletion
* start the generation for new timelines at 1 instead of 0: this ensures
that the other components actually are generation enabled
* fix database operations we use for metrics
* use join in list_pending_ops to prevent the classical ORM issue where
one does many db queries
* use enums in `test_storcon_create_delete_sk_down`. we are adding a
second parameter, and having two bool parameters is weird.
* extend `test_storcon_create_delete_sk_down` with a test of whole
tenant deletion. this hasn't been tested before.
* remove some redundant logging contexts
* Don't require mutable access to the service lock for scheduling
pending ops in memory. In order to pull this off, create reconcilers
eagerly. The advantage is that we don't need mutable access to the
service lock that way any more.

Part of #9011

---------

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2025-04-17 20:25:30 +00:00
a-masterov
6c2e5c044c random operations test (#10986)
## Problem
We need to test the stability of Neon.

## Summary of changes
The test runs random operations on a Neon project. It performs via the
Public API calls the following operations: `create a branch`, `delete a
branch`, `add a read-only endpoint`, `delete a read-only endpoint`,
`restore a branch to a random position in the past`. All the branches
and endpoints are loaded with `pgbench`.

---------

Co-authored-by: Peter Bendel <peterbendel@neon.tech>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2025-04-17 19:59:35 +00:00
Alex Chi Z.
748539b222 fix(pageserver): lower L0 compaction threshold (#11617)
## Problem

We saw OOMs due to L0 compaction happening simultaneously for all shards
of the same tenant right after the shard split.

## Summary of changes

Lower the threshold so that we compact fewer files.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-17 19:51:28 +00:00
Alex Chi Z.
ad0c5fdae7 fix(test): allow stale generation warnings in storcon (#11624)
## Problem

https://github.com/neondatabase/neon/pull/11531 did not fully fix the
problem because the warning is part of the storcon instead of
pageserver.

## Summary of changes

Allow stale generation error in storcon.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-17 16:12:24 +00:00
Christian Schwarz
2b041964b3 cover direct IO + concurrent IO in unit, regression & perf tests (#11585)
This mirrors the production config.

Thread that discusses the merits of this:
- https://neondb.slack.com/archives/C033RQ5SPDH/p1744742010740569

# Refs
- context
https://neondb.slack.com/archives/C04BLQ4LW7K/p1744724844844589?thread_ts=1744705831.014169&cid=C04BLQ4LW7K
- prep for https://github.com/neondatabase/neon/pull/11558 which adds
new io mode `direct-rw`

# Impact on CI turnaround time

Spot-checking impact on CI timings

- Baseline: [some recent main
commit](https://github.com/neondatabase/neon/actions/runs/14471549758/job/40587837475)
- Comparison: [this
commit](https://github.com/neondatabase/neon/actions/runs/14471945087/job/40589613274)
in this PR here

Impact on CI turnaround time

- Regression tests:
  - x64: very minor, sometimes better; likely in the noise
  - arm64: substantial  30min => 40min
- Benchmarks (x86 only I think): very minor; noise seems higher than
regress tests

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Alex Chi Z. <4198311+skyzh@users.noreply.github.com>
Co-authored-by: Peter Bendel <peterbendel@neon.tech>
Co-authored-by: Alex Chi Z <chi@neon.tech>
2025-04-17 15:53:10 +00:00
John Spray
d4c059a884 tests: use endpoint http wrapper to get auth (#11628)
## Problem

`test_compute_startup_simple` and `test_compute_ondemand_slru_startup`
are failing.

This test implicitly asserts that the metrics.json endpoint succeeds and
returns all expected metrics, but doesn't make it easy to see what went
wrong if it doesn't (e.g. in this failure
https://neon-github-public-dev.s3.amazonaws.com/reports/main/14513210240/index.html#suites/13d8e764c394daadbad415a08454c04e/b0f92a86b2ed309f/)

In this case, it was failing because of a missing auth token, because it
was using `requests` directly instead of using the endpoint http client
type.

## Summary of changes

- Use endpoint http wrapper to get raise_for_status & auth token
2025-04-17 15:03:23 +00:00
Folke Behrens
2c56c46d48 compute: Set max log level for local proxy sql_over_http mod to WARN (#11629)
neondatabase/cloud#27738
2025-04-17 14:38:19 +00:00
Tristan Partin
d1728a6bcd Remove old compatibility hack for remote extensions (#11620)
Control plane has long since been updated to send the right value.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-17 14:08:42 +00:00
John Spray
0a27973584 pageserver: rename Tenant to TenantShard (#11589)
## Problem

`Tenant` isn't really a whole tenant: it's just one shard of a tenant.

## Summary of changes

- Automated rename of Tenant to TenantShard
- Followup commit to change references in comments
2025-04-17 13:29:16 +00:00
Alexander Bayandin
07c2411f6b tests: remove mentions of ALLOW_*_COMPATIBILITY_BREAKAGE (#11618)
## Problem

There are mentions of `ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE` and
`ALLOW_FORWARD_COMPATIBILITY_BREAKAGE`, but in reality, this mechanism
doesn't work, so let's remove it to avoid confusion.

The idea behind it was to allow some breaking changes by adding a
special label to a PR that would `xfail` the test. However, in practice,
this means we would need to carry this label through all subsequent PRs
until the release (and artifact regeneration). This approach isn't
really viable, as it increases the risk of missing a compatibility break
in another PR.

## Summary of changes
- Remove mentions and handling of
`ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE` /
`ALLOW_FORWARD_COMPATIBILITY_BREAKAGE`
2025-04-17 10:03:21 +00:00
Alexander Bayandin
5819938c93 CI(pg-clients): fix workflow permissions (#11623)
## Problem

`pg-clients` can't start:

```
The workflow is not valid. .github/workflows/pg-clients.yml (Line: 44, Col: 3): Error calling workflow 'neondatabase/neon/.github/workflows/build-build-tools-image.yml@aa19f10e7e958fbe0e0641f2e8c5952ce3be44b3'. The nested job 'check-image' is requesting 'packages: read', but is only allowed 'packages: none'. .github/workflows/pg-clients.yml (Line: 44, Col: 3): Error calling workflow 'neondatabase/neon/.github/workflows/build-build-tools-image.yml@aa19f10e7e958fbe0e0641f2e8c5952ce3be44b3'. The nested job 'build-image' is requesting 'packages: write', but is only allowed 'packages: none'.
```

## Summary of changes
- Grant required `packages: write` permissions to the workflow
2025-04-17 08:54:23 +00:00
Konstantin Knizhnik
b7548de814 Disable autovacuum and increase limit for WS approximation (#11583)
## Problem

Test lfc working set approximation becomes flaky after recent changes in
prefetch.
May be it is caused by updating HLL in `lfc_write`, may be by some other
reasons.

## Summary of changes

1. Disable autovacuum in this test (as possible source of extra page
accesses).
2. Increase upper boundary for WS approximation from 12 to 20.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-17 05:07:45 +00:00
Tristan Partin
9794f386f4 Make Postgres 17 the default version (#11619)
This is mostly a documentation update, but a few updates with regard to
neon_local, pageserver, and tests.

17 is our default for users in production, so dropping references to 16
makes sense.

Signed-off-by: Tristan Partin <tristan@neon.tech>

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-16 23:23:37 +00:00
Tristan Partin
79083de61c Remove forward compatibility hacks related to compute_ctl auth (#11621)
These various hacks were needed for the forward compatibility tests.
Enough time has passed since the merge that these are no longer needed.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-16 23:14:24 +00:00
Folke Behrens
ec9079f483 Allow unwrap() in tests when clippy::unwrap_used is denied (#11616)
## Problem

The proxy denies using `unwrap()`s in regular code, but we want to use
it in test code
and so have to allow it for each test block.

## Summary of changes

Set `allow-unwrap-in-tests = true` in clippy.toml and remove all
exceptions.
2025-04-16 20:05:21 +00:00
Ivan Efremov
b9b25e13a0 feat(proxy): Return prefixed errors to testodrome (#11561)
Testodrome measures uptime based on the failed requests and errors. In
case of testodrome request we send back error based on the service. This
will help us distinguish error types in testodrome and rely on the
uptime SLI.
2025-04-16 19:03:23 +00:00
Alex Chi Z.
cf2e695f49 feat(pageserver): gc-compaction meta statistics (#11601)
## Problem

We currently only have gc-compaction statistics for each single
sub-compaction job.

## Summary of changes

Add meta statistics across all sub-compaction jobs scheduled.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-16 18:51:48 +00:00
Conrad Ludgate
fc233794f6 fix(proxy): make sure that sql-over-http is TLS aware (#11612)
I noticed that while auth-broker -> local-proxy is TLS aware, and TCP
proxy -> postgres is TLS aware, HTTP proxy -> postgres is not 😅
2025-04-16 18:37:17 +00:00
Tristan Partin
c002236145 Remove compute_ctl authorization bypass if testing feature was enable (#11596)
We want to exercise the authorization middleware in our regression
tests.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-16 17:54:51 +00:00
Erik Grinaker
4af0b9b387 pageserver: don't recompress images in ImageLayerInner::filter() (#11592)
## Problem

During shard ancestor compaction, we currently recompress all page
images as we move them into a new layer file. This is expensive and
unnecessary.

Resolves #11562.
Requires #11607.

## Summary of changes

Pass through compressed page images in `ImageLayerInner::filter()`.
2025-04-16 17:10:15 +00:00
Vlad Lazar
0e00faf528 tests: stability fixes for test_migration_to_cold_secondary (#11606)
1. Compute may generate WAL on shutdown. The test assumes that after
shutdown,
no further ingest happens. Tweak the compute shutdown to make the
assumption true.
2. Assertion of local layer count post cold migration is not right since
we may have downloaded
layers due to ingest. Remove it.

Closes https://github.com/neondatabase/neon/issues/11587
2025-04-16 16:31:23 +00:00
Anastasia Lubennikova
7747a9619f compute: fix copy-paste typo for neon GUC parameters check (#11610)
fix for commit
[5063151](5063151271)
2025-04-16 15:55:11 +00:00
Erik Grinaker
46100717ad pageserver: add VectoredBlob::raw_with_header (#11607)
## Problem

To avoid recompressing page images during layer filtering, we need
access to the raw header and data from vectored reads such that we can
pass them through to the target layer.

Touches #11562.

## Summary of changes

Adds `VectoredBlob::raw_with_header()` to return a raw view of the
header+data, and updates `read()` to track it.

Also adds `blob_io::Header` with header metadata and decode logic, to
reuse for tests and assertions. This isn't yet widely used.
2025-04-16 15:38:10 +00:00
Erik Grinaker
00eeff9b8d pageserver: add compaction_shard_ancestor to disable shard ancestor compaction (#11608)
## Problem

Splits of large tenants (several TB) can cause a huge amount of shard
ancestor compaction work, which can overload Pageservers.

Touches https://github.com/neondatabase/cloud/issues/22532.

## Summary of changes

Add a setting `compaction_shard_ancestor` (default `true`) to disable
shard ancestor compaction on a per-tenant basis.
2025-04-16 14:41:02 +00:00
Matthias van de Meent
2a46426157 Update neon GUCs with new default settings (#11595)
Staging and prod both have these settings configured like this, so let's
update this so we can eventually drop the overrides in prod.
2025-04-16 13:42:22 +00:00
Tristan Partin
edc11253b6 Fix neon_local public key parsing when create compute JWKS (#11602)
Finally figured out the right incantation. I had had this in my original
go, but due to some refactoring and apparently missed testing, I
committed a mistake. The reason this doesn't currently break anything is
that we bypass the authorization middleware when the "testing" cargo
feature is enabled.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-16 12:51:48 +00:00
Heikki Linnakangas
b4e26a6284 Set last-written LSN as part of smgr_end_unlogged_build() (#11584)
This way, the callers don't need to do it, reducing the footprint of
changes we've had to made to various index AM's build functions.
2025-04-16 12:34:18 +00:00
Vlad Lazar
96b46365e4 tests: attach final metrics to allure report (#11604)
## Problem

Metrics are saved in https://github.com/neondatabase/neon/pull/11559,
but the file is not matched by the attachment regex.

## Summary of changes

Make attachment regex match the metrics file.
2025-04-16 10:26:47 +00:00
Alex Chi Z.
aa19f10e7e fix(test): allow shutdown warning in preempt tests (#11600)
## Problem

test_gc_compaction_preempt is still flaky

## Summary of changes

- allow shutdown warning logs

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-15 21:50:28 +00:00
Konstantin Knizhnik
35170656fe Allocate WalProposerConn using TopMemoryAllocator (#11577)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1744659631698609
`WalProposerConn` is allocated using current memory context which life
time is not long enough.

## Summary of changes

Allocate `WalProposerConn`  using `TopMemoryContext`.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-15 19:13:12 +00:00
Tristan Partin
cd9ad75797 Remove compute_ctl authorization bypass on localhost (#11597)
For whatever reason, this never worked in production computes anyway.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-15 19:12:34 +00:00
Tristan Partin
eadb05f78e Teach neon_local to pass the Authorization header to compute_ctl (#11490)
This allows us to remove hacks in the compute_ctl authorization
middleware which allowed for bypasses of auth checks.

Fixes: https://github.com/neondatabase/neon/issues/11316

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-15 17:27:49 +00:00
Fedor Dikarev
c5115518e9 remove temp file from repo (#11586)
## Problem
In https://github.com/neondatabase/neon/pull/11409 we added temp file to
the repo.

## Summary of changes
Remove temp file from the repo.
2025-04-15 15:29:15 +00:00
Alex Chi Z.
931f8c4300 fix(pageserver): check if cancelled before waiting logical size (2/2) (#11575)
## Problem

close https://github.com/neondatabase/neon/issues/11486, proceeding
https://github.com/neondatabase/neon/pull/11531

## Summary of changes

This patch fixes the rest 50% of instability of
`test_create_churn_during_restart`. During tenant warmup, we'll request
logical size; however, if the startup gets cancelled, we won't be able
to spawn the initial logical size calculation task that sets the
`cancel_wait_for_background_loop_concurrency_limit_semaphore`.

Therefore, we check `cancelled` before proceeding to get
`cancel_wait_for_background_loop_concurrency_limit_semaphore`. There
will still be a race if the timeline shutdown happens after L5710 and
before L5711, but it should be enough to reduce the flakiness of the
test.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-15 15:16:16 +00:00
Alexander Bayandin
0f7c2cc382 CI(release): add time to RC PR branch names (#11547)
## Problem

We can't have more than one open release PR created on the same day (due
to non-unique enough branch names).

## Summary of changes
- Add time (hours and minutes) to RC PR branch names
- Also make sure we use UTC for releases
2025-04-15 15:08:05 +00:00
Erik Grinaker
983d56502b pageserver: reduce shard ancestor rewrite threshold to 30% (#11582)
## Problem

When doing power-of-two shard splits (i.e. 4 → 8 → 16), we end up
rewriting all layers since half of the pages will be local due to
striping. This causes a lot of resource usage when splitting large
tenants.

## Summary of changes

Drop the threshold of local/total pages to 30%, to reduce the amount of
layer rewrites after splits.
2025-04-15 14:26:29 +00:00
Erik Grinaker
bcef542d5b pageserver: don't rewrite invisible layers during ancestor compaction (#11580)
## Problem

Shard ancestor compaction can be very expensive following shard splits
of large tenants. We currently rewrite garbage layers after shard splits
as well, which can be a significant amount of data.

Touches https://github.com/neondatabase/cloud/issues/22532.

## Summary of changes

Don't rewrite invisible layers after shard splits.
2025-04-15 14:25:58 +00:00
a-masterov
e31455d936 Add the tests for the extensions pg_jsonschema and pg_session_jwt (#11323)
## Problem
`pg_jsonschema` and `pg_session_jwt` are not yet covered by tests
## Summary of changes
Added the tests for these extensions.
2025-04-15 14:06:01 +00:00
Alex Chi Z.
a4ea7d6194 fix(pageserver): gc-compaction verification false failure (#11564)
## Problem

https://github.com/neondatabase/neon/pull/11515 introduced a bug that
some key history cannot be verified.

If a key only exists above the horizon, the verification will fail for
its first occurrence because the history does not exist at that point.

As gc-compaction skips a key range whenever an error occurs, it might be
doing some wasted work in staging/prod now. But I'm not planning a
hotfix this week as the bug doesn't affect correctness/performance.

## Summary of changes

Allow keys with only above horizon history in the verification.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-15 13:58:32 +00:00
Alexander Bayandin
19bea5fd0c CI: do not wait for tests to trigger deploy job (#11548)
## Problem

There is too much delay between merging a PR into `main` and deploying
the changes to staging

## Summary of changes
- Trigger `deploy` job without waiting for `build-and-test-locally` job
2025-04-15 11:23:41 +00:00
a-masterov
5be94e28c4 Update the documentation of the cloud regress test (#11539)
## Problem
The information in the README.md contained errors, and some information
was missing.
## Summary of changes
Found errors are fixed, and new information is added.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2025-04-15 11:00:25 +00:00
Alexander Bayandin
63a106021a CI(allure-report-generate): Install allure to /tmp (#11579)
## Problem

The `/__w/neon/neon` directory is mounted from host to container and
persists between runs.
Sometimes the next workflow run fails to delete it:

```
Deleting the contents of '/__w/neon/neon'
Error: File was unable to be removed Error: EACCES: permission denied, rmdir '/__w/neon/neon/allure-2.32.2/bin'
```

## Summary of changes
- Download and install allure to `/tmp` which exists in container only

Ref https://github.com/neondatabase/cloud/issues/27186
2025-04-15 09:29:36 +00:00
Fedor Dikarev
9a6ace9bde introduce new runners: unit-perf and use them for benchmark jobs (#11409)
## Problem
Benchmarks results are inconsistent on existing small-metal runners

## Summary of changes
Introduce new `unit-perf` runners, and lets run benchmark on them.

The new hardware has slower, but consistent, CPU frequency - if run with
default governor schedutil.
Thus we needed to adjust some testcases' timeouts and add some retry
steps where hard-coded timeouts couldn't be increased without changing
the system under test.
-
[wait_for_last_record_lsn](6592d69a67/test_runner/fixtures/pageserver/utils.py (L193))
1000s -> 2000s
-
[test_branch_creation_many](https://github.com/neondatabase/neon/pull/11409/files#diff-2ebfe76f89004d563c7e53e3ca82462e1d85e92e6d5588e8e8f598bbe119e927)
1000s
-
[test_ingest_insert_bulk](https://github.com/neondatabase/neon/pull/11409/files#diff-e90e685be4a87053bc264a68740969e6a8872c8897b8b748d0e8c5f683a68d9f)
- with back throttling disabled compute becomes unresponsive for more
than 60 seconds (PG hard-coded client authentication connection timeout)
-
[test_sharded_ingest](https://github.com/neondatabase/neon/pull/11409/files#diff-e8d870165bd44acb9a6d8350f8640b301c1385a4108430b8d6d659b697e4a3f1)
600s -> 1200s

Right now there are only 2 runners of that class, and if we decide to go
with them, we have to check how much that type of runners we need, so
jobs not stuck with waiting for that type of runners available.

However we now decided to run those runners with governor performance
instead of schedutil.
This achieves almost same performance as previous runners but still
achieves consistent results for same commit

Related issue to activate performance governor on these runners
https://github.com/neondatabase/runner/pull/138

## Verification that it helps

### analyze runtimes on new runner for same commit

Table of runtimes for the same commit on different runners in
[run](https://github.com/neondatabase/neon/actions/runs/14417589789)

| Run | Benchmarks (1) | Benchmarks (2) |Benchmarks (3) |Benchmarks (4)
| Benchmarks (5) |
|--------|--------|---------|---------|---------|---------|
| 1 | 1950.37s | 6374.55s |  3646.15s |  4149.48s |  2330.22s | 
| 2 | - | 6369.27s |  3666.65s |  4162.42s |  2329.23s | 
| Delta % |  - |  0,07 %  | 0,5 %   |   0,3 % | 0,04 %   |
| with governor performance | 1519.57s |  4131.62s |  - | -  |  - |
| second run gov. perf. | 1513.62s |  4134.67s |  - | -  |  - |
| Delta % |  0,3 % |  0,07 %  |  -  |  - | -   |
| speedup gov. performance | 22 % |  35 % |  - | -  |  - |
| current desktop class hetzner runners (main) | 1487.10s | 3699.67s | -
| - | - |
| slower than desktop class | 2 % |  12 % |  - | -  |  - |


In summary, the runtimes for the same commit on this hardware varies
less than 1 %.

---------

Co-authored-by: BodoBolero <peterbendel@neon.tech>
2025-04-15 08:21:44 +00:00
Erik Grinaker
8c77ccfc01 pageserver: log total progress during shard ancestor compaction (#11565)
## Problem

Shard ancestor compaction doesn't currently log any global progress
information, only for the current batch.

## Summary of changes

Log the number of layers checked for eligibility this iteration, and the
total number of layers to check. This will indicate how far along the
total shard ancestor compaction has gotten for this iteration.
2025-04-15 07:25:09 +00:00
github-actions[bot]
51ecd1bb37 Proxy release 2025-04-15 2025-04-15 06:01:10 +00:00
Tristan Partin
cbd2fc2395 Clean up logs and error messages in compute_ctl authorize middleware (#11576)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-15 01:21:18 +00:00
Tristan Partin
028a191040 Continue with s/spec/config changes (#11574)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-14 21:18:21 +00:00
Vlad Lazar
8cce27bedb pageserver: add a randomized read path test (#11519)
## Problem

Every time we make changes to the read path to fix a bug or add a
feature,
we end up adding another incomprehensible test.

## Summary of changes

Add some generic infrastructure for generating a layer map from a type
spec
and use that for a read path test. The test is randomized but uses a
fixed seed
by default. A fuzzing mode is available for confidence building.

See [Notion
page](https://www.notion.so/neondatabase/Read-Path-Unit-Testing-Fuzzing-1d1f189e0047806c8e5cd37781b0a350?pvs=4)
for a diagram of the layer map
used.

Just for fun I tried removing [this
commit](9990199cb4)
from https://github.com/neondatabase/neon/pull/11494
and it caught the bug in the normal mode (no fuzzing required).
2025-04-14 15:31:32 +00:00
Vlad Lazar
90b706cd96 tests: save pageserver metrics at the end of the test (#11559)
## Problem

Sometimes it's useful to see the pageserver metrics after a test in
order to debug stuff.
For example, for https://github.com/neondatabase/neon/issues/11465 I'd
like to know
what the remote storage latencies are from the client.

## Summary of changes

When stopping the env, record the pageserver metrics into a file in the
pageserver's workdir.
2025-04-14 15:13:20 +00:00
Alex Chi Z.
057ce115de fix(test): allow stale generation errors (1/2) (#11531)
## Problem

Part of https://github.com/neondatabase/neon/issues/11486

## Summary of changes

50% of the test instability of `test_create_churn_during_restart` are
due to error message gets changed. Allow the new error message.

Still need to fix other errors due to failure to acquire semaphore in
this or the next patch.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-14 14:51:17 +00:00
Vlad Lazar
e85607eed8 tests: remove config tweak allowing old versions to start with a batching config (#11560)
## Problem

Pageservers now ignore unknown config fields, so this config tweaking is
no longer needed.

## Summary of changes

Get rid of the hack.

Closes https://github.com/neondatabase/neon/issues/11524
2025-04-14 14:42:35 +00:00
Tristan Partin
437071888e Fix logging in nightly physical replication benchmarks (#11541)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-14 13:57:33 +00:00
Vlad Lazar
148b3701cf pageserver: add metrics for get page batch breaking reasons (#11545)
## Problem

https://github.com/neondatabase/neon/pull/11494 changes the batching
logic, but we don't have a way to evaluate it.

## Summary of changes

This PR introduces a global and per timeline metric which tracks the
reason for
which a batch was broken.
2025-04-14 13:24:47 +00:00
Christian Schwarz
daebe50e19 refactor: plumb gate and cancellation down to to blob_io::BlobWriter (#11543)
In #10063 we will switch BlobWriter to use the owned buffers IO buffered
writer, which implements double-buffering by virtue of a background task
that performs the flushing.

That task's lifecylce must be contained within the Timeline lifecycle,
so, it must hold the timeline gate open and respect Timeline::cancel.

This PR does the noisy plumbing to reduce the #10063 diff.

Refs
- extracted from https://github.com/neondatabase/neon/pull/10063
- epic https://github.com/neondatabase/neon/issues/9868
2025-04-14 11:51:01 +00:00
Arpad Müller
e0ee6fbeff Remove deprecated --compute-hook-url storcon param (#11551)
We have already migrated the storage controller to
`--control-plane-url`, added in #11173. The new param was added to
support also safekeeper specific endpoints. See the docs changes in
#11195 for further details.

Part of #11163
2025-04-14 10:36:40 +00:00
Konstantin Knizhnik
307fa2ceb7 Remove unused n_synced variable from HandleSafekeeperResponse (#11553)
## Problem

clang produce warning about unused variable `n_synced` in
HandleSafekeeperResponse

## Summary of changes

Remove local variable.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-14 09:45:13 +00:00
Vlad Lazar
a338984dc7 pageserver: support keys at different LSNs in one get page batch (#11494)
## Problem

Get page batching stops when we encounter requests at different LSNs.
We are leaving batching factor on the table.

## Summary of changes

The goal is to support keys with different LSNs in a single batch and
still serve them with a single vectored get.
Important restriction: the same key at different LSNs is not supported
in one batch. Returning different key
versions is a much more intrusive change.

Firstly, the read path is changed to support "scattered" queries. This
is a conceptually simple step from
https://github.com/neondatabase/neon/pull/11463. Instead of initializing
the fringe for one keyspace,
we do it for multiple at different LSNs and let the logic already
present into the fringe handle selection.

Secondly, page service code is updated to support batching at different
LSNs. Eeach request parsed from the wire determines its effective
request LSN and keeps it in mem for the batcher toinspect. The batcher
allows keys at
different LSNs in one batch as long one key is not requested at
different LSNs.

I'd suggest doing the first pass commit by commit to get a feel for the
changes.

## Results

I used the batching test from [Christian's
PR](https://github.com/neondatabase/neon/pull/11391) which increases the
change of batch breaks. Looking at the logs I think the new code is at
the max batching factor for the workload (we
only break batches due to them being oversized or because the executor
is idle).

```
Main:
Reasons for stopping batching: {'LSN changed': 22843, 'of batch size': 33417}
test_throughput[release-pg16-50-pipelining_config0-30-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 14.6662

My branch:
Reasons for stopping batching: {'of batch size': 37024}
test_throughput[release-pg16-50-pipelining_config0-30-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 19.8333
```

Related: https://github.com/neondatabase/neon/issues/10765
2025-04-14 09:05:29 +00:00
Konstantin Knizhnik
8936a7abd8 Increase limit for worker processes for isolation test (#11504)
## Problem

See https://github.com/neondatabase/neon/issues/10652

Neon extension launches 2 BGW which reduce limit for parallel workers
and so affecting parallel_deadlock isolation test.

## Summary of changes

Increase `max_worker_processes` from default 8 to 16 for isolation test.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-12 18:09:12 +00:00
Conrad Ludgate
946e971df8 feat(proxy): add batching to cancellation queue processing (#10607)
Add batching to the redis queue, which allows us to clear it out quicker
should it slow down temporarily.
2025-04-12 09:16:22 +00:00
Dmitrii Kovalkov
d109bf8c1d neon_local: use ed25519 to gen local ssl certs (#11542)
## Problem
neon_local uses rsa to generate local SSL certs, which is slow
Follow-up on:
- https://github.com/neondatabase/neon/pull/11025#discussion_r1989453785
- https://github.com/neondatabase/neon/pull/11538

## Summary of changes
- Change key from rsa to ed25519 in neon_local
2025-04-11 17:49:15 +00:00
Alex Chi Z.
4f7b2cdd4f feat(pageserver): gc-compaction result verification (#11515)
## Problem

Part of #9114 

There was a debug-mode verification mode that verifies at every
retain_lsn. However, the code was tangled within the actual history
generation itself and it's hard to reason about correctness. This patch
adds a separate post-verification of the gc-compaction result that redos
logs at every retain_lsn and every record above the GC horizon. This
ensures that all key history we produce with gc-compaction is readable,
and if there're read errors after gc-compaction, it can only be
read-path errors instead of gc-compaction bugs.

## Summary of changes

* Add gc_compaction_verification flag, default to true.
* Implement a post-verification process.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-11 15:50:29 +00:00
Alex Chi Z.
66f56ddaec fix(pageserver): allow shutdown errors for gc compaction tests (#11530)
## Problem

`test_pageserver_compaction_preempt` is flaky.

## Summary of changes

Allow the shutdown errors.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-11 15:20:51 +00:00
Erik Grinaker
fd16caa7d0 pageserver: yield for L0 during ancestor compaction (#11536)
## Problem

Shard ancestor compaction does not yield for L0 compaction, potentially
starving it.

close https://github.com/neondatabase/neon/issues/11125

## Summary of changes

* Yield for L0 during shard ancestor compaction.
* Return `CompactionOutcome::Pending` when limited by `rewrite_max`, for
eager rescheduling.
2025-04-11 15:09:28 +00:00
Tristan Partin
ff5a527167 Consolidate compute_ctl configuration structures (#11514)
Previously, the structure of the spec file was just the compute spec.
However, the response from the control plane get spec request included
the compute spec and the compute_ctl config. This divergence was
hindering other work such as adding regression tests for compute_ctl
HTTP authorization.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-11 15:06:29 +00:00
Arpad Müller
c66444ea15 Add timeline_import http endpoint (#11484)
The added `timleine_import` endpoint allows us to migrate safekeeper
timelines from control plane managed to storcon managed.
 
Part of #9011
2025-04-11 14:10:27 +00:00
Arpad Müller
88f01c1ca1 Introduce WalIngestError (#11506)
Introduces a `WalIngestError` struct together with a
`WalIngestErrorKind` enum, to be used for walingest related failures and
errors.

* the enum captures backtraces, so we don't regress in comparison to
`anyhow::Error`s (backtraces might be a bit shorter if we use one of the
`anyhow::Error` wrappers)
* it explicitly lists most/all of the potential cases that can occur.

I've originally been inspired to do this in #11496, but it's a
longer-term TODO.
2025-04-11 14:08:46 +00:00
Erik Grinaker
a6937a3281 pageserver: improve shard ancestor compaction logging (#11535)
## Problem

Shard ancestor compaction always logs "starting shard ancestor
compaction", even if there is no work to do. This is very spammy (every
20 seconds for every shard). It also has limited progress logging.

## Summary of changes

* Only log "starting shard ancestor compaction" when there's work to do.
* Include details about the amount of work.
* Log progress messages for each layer, and when waiting for uploads.
* Log when compaction is completed, with elapsed duration and whether
there is more work for a later iteration.
2025-04-11 12:14:08 +00:00
Erik Grinaker
3c8565a194 test_runner: propagate config via attach_hook for test fix (#11529)
## Problem

The `pagebench` benchmarks set up an initial dataset by creating a
template tenant, copying the remote storage to a bunch of new tenants,
and attaching them to Pageservers.

In #11420, we found that
`test_pageserver_characterize_throughput_with_n_tenants` had degraded
performance because it set a custom tenant config in Pageservers that
was then replaced with the default tenant config by the storage
controller.

The initial fix was to register the tenants directly in the storage
controller, but this created the tenants with generation 1. This broke
`test_basebackup_with_high_slru_count`, where the template tenant was at
generation 2, leading to all layer files at generation 2 being ignored.

Resolves #11485.
Touches #11381.

## Summary of changes

This patch addresses both test issues by modifying `attach_hook` to also
take a custom tenant config. This allows attaching tenants to
Pageservers from pre-existing remote storage, specifying both the
generation and tenant config when registering them in the storage
controller.
2025-04-11 11:31:12 +00:00
Christian Schwarz
979fa0682b tests: update batching perf test workload to include scattered LSNs (#11391)
The batching perf test workload is currently read-only sequential scans.
However, realistic workloads have concurrent writes (to other pages)
going on.

This PR simulates concurrent writes to other pages by emitting logical
replication messages.

These degrade the achieved batching factor, for the reason see
- https://github.com/neondatabase/neon/issues/10765

PR 
- https://github.com/neondatabase/neon/pull/11494

will fix this problem and get batching factor back up.

---------

Co-authored-by: Vlad Lazar <vlad@neon.tech>
2025-04-11 09:55:49 +00:00
Christian Schwarz
8884865bca tests: make test_pageserver_getpage_throttle less flaky (#11482)
# Refs

- fixes https://github.com/neondatabase/neon/issues/11395

# Problem

Since 2025-03-10, we have observed increased flakiness of
`test_pageserver_getpage_throttle`.

The test is timing-dependent by nature, and was hitting the

```
 assert duration_secs >= 10 * actual_smgr_query_seconds, (
        "smgr metrics should not include throttle wait time"
    )
```

quite frequently.

# Analysis

These failures are not reproducible.

In this PR's history is a commit that reran the test 100 times without
requiring a single retry.

In https://github.com/neondatabase/neon/issues/11395 there is a link to
a query to the test results database.
It shows that the flakiness was not constant, but rather episodic:
2025-03-{10,11,12,13} 2025-03-{19,20,21} 2025-03-31 and 2025-04-01.

To me, this suggests variability in available CPU.

# Solution

The point of the offending assertion is to ensure that most of the
request latency is spent on throttling, because testing of the
throttling mechanism is the point of the test.
The `10` magic number means at most 10% of mean latency may be spent on
request processing.

Ideally we would control the passage of time (virtual clock source) to
make this test deterministic.

But I don't see that happening in our regression test setup.

So, this PR de-flakes the test as follows:
- allot up to 66% of mean latency for request processing
- increase duration from 10s to 20s, hoping to get better protection
from momentary CPU spikes in noisy neighbor tests or VMs on the runner
host

As a drive-by, switch to `pytest.approx` and remove one self-test
assertion I can't make sense of anymore.
2025-04-11 09:38:05 +00:00
Dmitrii Kovalkov
4c4e33bc2e storage: add http/https server and cert resover metrics (#11450)
## Problem
We need to export some metrics about certs/connections to configure
alerts and make sure that all HTTP requests are gone before turning
https-only mode on.
- Closes: https://github.com/neondatabase/cloud/issues/25526

## Summary of changes
- Add started connection and connection error metrics to http/https
Server.
- Add certificate expiration time and reload metrics to
ReloadingCertificateResolver.
2025-04-11 06:11:35 +00:00
Tristan Partin
342607473a Make Endpoint::respec_deep() infinitely deep (#11527)
Because it wasn't recursive, there was a limit to the depth of updates.
This work is necessary because as we teach neon_local and compute_ctl
that the content in --spec-path should match a similar structure we get
from the control plane, the spec object itself will no longer be
toplevel. It will be under the "spec" key.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-10 19:55:51 +00:00
John Spray
9c37bfc90a pageserver/tests: make image_layer_rewrite write less data (#11525)
## Problem

This test is slow to execute, particularly if you're on a slow
environment like vscode in a browser. Might have got much slower when we
switched to direct IO?

## Summary of changes

- Reduce the scale of the test by 10x, since there was nothing special
about the original size.
2025-04-10 17:03:22 +00:00
John Spray
52dee408dc storage controller: improve safety of shard splits coinciding with controller restarts (#11412)
## Problem

The graceful leadership transfer process involves calling step_down on
the old controller, but this was not waiting for shard splits to
complete, and the new controller could therefore end up trying to abort
a shard split while it was still going on.

We mitigated this already in #11256 by avoiding the case where shard
split completion would update the database incorrectly, but this was a
fragile fix because it assumes that is the only problematic part of the
split running concurrently.

Precursors:
- #11290 
- #11256

Closes: #11254 

## Summary of changes

- Hold the reconciler gate from shard splits, so that step_down will
wait for them. Splits should always be fairly prompt, so it is okay to
wait here.
- Defense in depth: if step_down times out (hardcoded 10 second limit),
then fully terminate the controller process rather than letting it
continue running, potentially doing split-brainy things. This makes
sense because the new controller will always declare itself leader
unilaterally if step_down fails, so leaving an old controller running is
not beneficial.
- Tests: extend
`test_storage_controller_leadership_transfer_during_split` to separately
exercise the case of a split holding up step_down, and the case where
the overall timeout on step_down is hit and the controller terminates.
2025-04-10 16:55:37 +00:00
Anastasia Lubennikova
5487a20b72 compute: Set log_parameter=off for audit logging. (#11500)
Log -> Base,
pgaudit.log = 'ddl', pgaudit.log_parameter='off'

Hipaa -> Extended.
pgaudit.log = 'all, -misc', pgaudit.log_parameter='off'
    
add new level Full:
pgaudit.log='all', pgaudit.log_parameter='on'

Keep old parameter names for compatibility,
until cplane side changes are implemented and released.

closes https://github.com/neondatabase/cloud/issues/27202
2025-04-10 15:28:28 +00:00
Alex Chi Z.
f06d721a98 test(pageserver): ensure gc-compaction does not fire critical errors (#11513)
## Problem

Part of https://github.com/neondatabase/neon/issues/10395

## Summary of changes

Add a test case to ensure gc-compaction doesn't fire any critical errors
if the key history is invalid due to partial GC.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-10 14:53:37 +00:00
Christian Schwarz
2e35f23085 tests: remove ignored fair field (#11521)
Pageserver has been ignoring field
`tenant_config.timeline_get_throttle.fair`
for many monhts, since we removed it from the config struct in
neondatabase/neon#8539.

Refs
- epic https://github.com/neondatabase/cloud/issues/27320
2025-04-10 14:24:30 +00:00
Anastasia Lubennikova
5063151271 compute: Add more neon ids to compute (#11366)
Pass more neon ids to compute_ctl.
Expose them to postgres as neon extension GUCs:
neon.project_id, neon.branch_id, neon.endpoint_id.


This is the compute side PR, not yet supported by cplane.
2025-04-10 13:04:18 +00:00
Erik Grinaker
0122d97f95 test_runner: only use last gen in test_location_conf_churn (#11511)
## Problem

`test_location_conf_churn` performs random location updates on
Pageservers. While doing this, it could instruct the compute to connect
to a stale generation and execute queries. This is invalid, and will
fail if a newer generation has removed layer files used by the stale
generation.

Resolves #11348.

## Summary of changes

Only connect to the latest generation when executing queries.
2025-04-10 10:07:16 +00:00
Arseny Sher
fae7528adb walproposer: make it aware of membership (#11407)
## Problem

Walproposer should get elected and commit WAL on safekeepers specified
by the membership configuration.

## Summary of changes

- Add to wp `members_safekeepers` and `new_members_safekeepers` arrays
mapping configuration members to connection slots. Establish this
mapping (by node id) when safekeeper sends greeting, giving its id and
when mconf becomes known / changes.
- Add to TermsCollected, VotesCollected,
GetAcknowledgedByQuorumWALPosition membership aware logic. Currently it
partially duplicates existing one, but we'll drop the latter eventually.
- In python, rename Configuration to MembershipConfiguration for
clarity.
- Add test_quorum_sanity testing new logic.

ref https://github.com/neondatabase/neon/issues/10851
2025-04-10 09:55:37 +00:00
Dmitrii Kovalkov
8a72e6f888 pageserver: add enable_tls_page_service_api (#11508)
## Problem
Page service doesn't use TLS for incoming requests.
- Closes: https://github.com/neondatabase/cloud/issues/27236

## Summary of changes
- Add option `enable_tls_page_service_api` to pageserver config
- Propagate `tls_server_config` to `page_service` if the option is
enabled

No integration tests for now because I didn't find out how to call page
service API from python and AFAIK computes don't support TLS yet
2025-04-10 08:45:17 +00:00
Tristan Partin
a04e33ceb6 Remove --spec-json argument from compute_ctl (#11510)
It isn't used by the production control plane or neon_local. The removal
simplifies compute spec logic just a little bit more since we can remove
any notion of whether we should allow live reconfigurations.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-09 22:39:54 +00:00
Alex Chi Z.
af0be11503 fix(pageserver): ensure gc-compaction gets preempted by L0 (#11512)
## Problem

Part of #9114 

## Summary of changes

Gc-compaction flag was not correctly set, causing it not getting
preempted by L0.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-09 21:41:11 +00:00
Alex Chi Z.
405a17bf0b fix(pageserver): ensure gc-compaction gets preempted by L0 (#11512)
## Problem

Part of #9114 

## Summary of changes

Gc-compaction flag was not correctly set, causing it not getting
preempted by L0.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-09 20:57:50 +00:00
Erik Grinaker
63ee8e2181 test_runner: ignore .___temp files in evict_random_layers (#11509)
## Problem

`test_location_conf_churn` often fails with `neither image nor delta
layer`, but doesn't say what the file actually is. However, past local
failures have indicated that it might be `.___temp` files.

Touches https://github.com/neondatabase/neon/issues/11348.

## Summary of changes

Ignore `.___temp` files when evicting local layers, and include the file
name in the error message.
2025-04-09 19:03:49 +00:00
Alex Chi Z.
2c21a65b0b feat(pageserver): add gc-compaction time-to-first-item stats (#11475)
## Problem

In some cases gc-compaction doesn't respond to the L0 compaction yield
notifier. I suspect it's stuck on getting the first item, and if so, we
probably need to let L0 yield notifier preempt `next_with_trace`.

## Summary of changes

- Add `time_to_first_kv_pair` to gc-compaction statistics.
- Inverse the ratio so that smaller ratio -> better compaction ratio.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-09 18:07:58 +00:00
Alex Chi Z.
ec66b788e2 fix(pageserver): use different walredo retry setting for gc-compaction (#11497)
## Problem

Not a complete fix for https://github.com/neondatabase/neon/issues/11492
but should work for a short term.

Our current retry strategy for walredo is to retry every request exactly
once. This retry doesn't make sense because it retries all requests
exactly once and each error is expected to cause process restart and
cause future requests to fail. I'll explain it with a scenario of two
threads requesting redos: one with an invalid history (that will cause
walredo to panic) and another that has a correct redo sequence.

First let's look at how we handle retries right now in
do_with_walredo_process. At the beginning of the function it will spawn
a new process if there's no existing one. Then it will continue to redo.
If the process fails, the first process that encounters the error will
remove the walredo process object from the OnceCell, so that the next
time it gets accessed, a new process will be spawned; if it is the last
one that uses the old walredo process, it will kill and wait the process
in `drop(proc)`. I'm skeptical whether this works under races but I
think this is not the root cause of the problem. In this retry handler,
if there are N requests attached to a walredo process and the i-th
request fails (panics the walredo), all other N-i requests will fail and
they need to retry so that they can access a new walredo process.

```
time       ---->
proc        A                 None   B
request 1   ^-----------------^ fail
            uses A for redo   replace with None
request 2      ^-------------------- fail
               uses A for redo
request 3             ^----------------^ fail
                      uses A for redo  last ref, wait for A to be killed
request 4                            ^---------------
                                     None, spawn new process B
```

The problem is with our retry strategy. Normally, for a system that we
want to retry on, the probability of errors for each of the requests are
uncorrelated. However, in walredo, a prior request that panics the
walredo process will cause all future walredo on that process to fail
(that's correlated).

So, back to the situation where we have 2 requests where one will
definitely fail and the other will succeed and we get the following
sequence, where retry attempts = 1,

* new walredo process A starts.
* request 1 (invalid) being processed on A and panics A, waiting for
retry, remove process A from the process object.
* request 2 (valid) being processed on A and receives pipe broken /
poisoned process error, waiting for retry, wait for A to be killed --
this very likely takes a while and cannot finish before request 1 gets
processed again
* new walredo process B starts.
* request 1 (invalid) being processed again on B and panics B, the whole
request fail.
* request 2 (valid) being processed again on B, and get a poisoned error
again.

```
time       ---->
proc        A                 None           B                    None
request 1   ^-----------------^--------------^--------------------^
            spawn A for redo  fail          spawn B for redo     fail
request 2      ^--------------------^-------------------------^------------^
               use A for redo       fail, wait to kill A      B for redo   fail again
```

In such cases, no matter how we set n_attempts, as long as the retry
count applies to all requests, this sequence is bound to fail both
requests because of how they get sequenced; while we could potentially
make request 2 successful.

There are many solutions to this -- like having a separate walredo
manager for compactions, or define which errors are retryable (i.e.,
broken pipe can be retried, while real walredo error won't be retried),
or having a exclusive big lock over the whole redo process (the current
one is very fine-grained). In this patch, we go with a simple approach:
use different retry attempts for different types of requests.

For gc-compaction, the attempt count is set to 0, so that it never
retries and consequently stops the compaction process -- no more redo
will be issued from gc-compaction. Once the walredo process gets
restarted, the normal read requests will proceed normally.

## Summary of changes

Add redo_attempt for each reconstruct value request to set different
retry policies.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Erik Grinaker <erik@neon.tech>
2025-04-09 18:01:31 +00:00
Peter Bendel
af12647b9d large tenant oltp benchmark: reindex with downtime (remove concurrently) (#11498)
## Problem

our large oltp benchmark runs very long - we want to remove the duration
of the reindex step.
we don't run concurrent workload anyhow but added "concurrently" only to
have a "prod-like" approach. But if it just doubles the time we report
because it requires two instead of one full table scan we can remove it

## Summary of changes

remove keyword concurrently from the reindex step
2025-04-09 17:11:00 +00:00
Tristan Partin
1c237d0c6d Move compute_ctl claims struct into public API (#11505)
This is preparatory work for teaching neon_local to pass the
Authorization header to compute_ctl.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-09 16:58:44 +00:00
Tristan Partin
afd34291ca Make neon_local token generation generic over claims (#11507)
Instead of encoding a certain structure for claims, let's allow the
caller to specify what claims be encoded.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-09 16:41:29 +00:00
Vlad Lazar
66f80e77ba tests/performance: reconcile until idle before benchmark (#11435)
We'd like to run benchmarks starting from a steady state. To this end,
do a reconciliation round before proceeding with the benchmark.

This is useful for benchmarks that use tenant dir snapshots since a
non-standard tenant configuration is used to generate the snapshot. The
storage controller is not aware of the non default tenant configuration
and will reconcile while the bench is running.
2025-04-09 16:32:19 +00:00
Conrad Ludgate
72832b3214 chore: fix clippy lints from nightly-2025-03-16 (#11273)
I like to run nightly clippy every so often to make our future rust
upgrades easier. Some notable changes:

* Prefer `next_back()` over `last()`. Generic iterators will implement
`last()` to run forward through the iterator until the end.

* Prefer `io::Error::other()`.

* Use implicit returns

One case where I haven't dealt with the issues is the now
[more-sensitive "large enum variant"
lint](https://github.com/rust-lang/rust-clippy/pull/13833). I chose not
to take any decisions around it here, and simply marked them as allow
for now.
2025-04-09 15:04:42 +00:00
Vlad Lazar
d11f23a341 pageserver: refactor read path for multi LSN batching support (#11463)
## Problem

We wish to improve pageserver batching such that one batch can contain
requests for
pages at different LSNs. The current shape of the code doesn't lend
itself to the change.

## Summary of changes

Refactor the read path such that the fringe gets initialized upfront.
This is where the multi LSN
change will plug in. A couple other small changes fell out of this.

There should be NO behaviour change here. If you smell one, shout!

I recommend reviewing commits individually (intentionally made them as
small as possible).

Related: https://github.com/neondatabase/neon/issues/10765
2025-04-09 13:17:02 +00:00
Dmitrii Kovalkov
e7502a3d63 pageserver: return 412 PreconditionFailed in get_timestamp_of_lsn if timestamp is not found (#11491)
## Problem
Now `get_timestamp_of_lsn` returns `404 NotFound` if there is no clog
pages for given LSN, and it's difficult to distinguish from other 404
errors. A separate status code for this error will allow the control
plane to handle this case.
- Closes: https://github.com/neondatabase/neon/issues/11439
- Corresponding PR in control plane:
https://github.com/neondatabase/cloud/pull/27125

## Summary of changes
- Return `412 PreconditionFailed` instead of `404 NotFound` if no
timestamp is fond for given LSN.

I looked briefly through the current error handling code in cloud.git
and the status code change should not affect anything for the existing
code. Change from the corresponding PR also looks fine and should work
with the current PS status code. Additionally, here is OK to merge it
from control plane team:
https://github.com/neondatabase/neon/issues/11439#issuecomment-2789327552

---------

Co-authored-by: John Spray <john@neon.tech>
2025-04-09 13:16:15 +00:00
Heikki Linnakangas
ef8101a9be refactor: Split "communicator" routines to a separate source file (#11459)
pagestore_smgr.c had grown pretty large. Split into two parts, such
that the smgr routines that PostgreSQL code calls stays in
pagestore_smgr.c, and all the prefetching logic and other lower-level
routines related to communicating with the pageserver are moved to a
new source file, "communicator.c".

There are plans to replace communicator parts with a new
implementation. See https://github.com/neondatabase/neon/pull/10799.
This commit doesn't implement any of the new things yet, but it is
good preparation for it. I'm imagining that the new implementation
will approximately replace the current "communicator.c" code, exposing
roughly the same functions to pagestore_smgr.c.

This commit doesn't change any functionality or behavior, or make any
other changes to the existing code: It just moves existing code
around.
2025-04-09 12:28:59 +00:00
Arpad Müller
d2825e72ad Add is_stopping check around critical macro in walreceiver (#11496)
The timeline stopping state is set much earlier than the cancellation
token is fired, so by checking for the stopping state, we can prevent
races with timeline shutdown where we issue a cancellation error but the
cancellation token hasn't been fired yet.

Fix #11427.
2025-04-09 12:17:45 +00:00
Erik Grinaker
a6ff8ec3d4 storcon: change default stripe size to 16 MB (#11168)
## Problem

The current stripe size of 256 MB is a bit large, and can cause load
imbalances across shards. A stripe size of 16 MB appears more reasonable
to avoid hotspots, although we don't see evidence of this in benchmarks.

Resolves https://github.com/neondatabase/cloud/issues/25634.
Touches https://github.com/neondatabase/cloud/issues/21870.

## Summary of changes

* Change the default stripe size to 16 MB.
* Remove `ShardParameters::DEFAULT_STRIPE_SIZE`, and only use
`pageserver_api::shard::DEFAULT_STRIPE_SIZE`.
* Update a bunch of tests that assumed a certain stripe size.
2025-04-09 08:41:38 +00:00
Dmitrii Kovalkov
cf62017a5b storcon: add https metrics for pageservers/safekeepers (#11460)
## Problem
Storcon will not start up if `use_https` is on and there are some
pageservers or safekeepers without https port in the database. Metrics
"how many nodes with https we have in DB" will help us to make sure that
`use_https` may be turned on safely.
- Part of https://github.com/neondatabase/cloud/issues/25526

## Summary of changes
- Add `storage_controller_https_pageserver_nodes`,
`storage_controller_safekeeper_nodes` and
`storage_controller_https_safekeeper_nodes` Prometheus metrics.
2025-04-09 08:33:49 +00:00
Erik Grinaker
c610f3584d test_runner: tweak test_create_snapshot compaction (#11495)
## Problem

With the recent improvements to L0 compaction responsiveness,
`test_create_snapshot` now ends up generating 10,000 layer files
(compared to 1,000 in previous snapshots). This increases the snapshot
size by 4x, and significantly slows down tests.

## Summary of changes

Increase the target layer size from 128 KB to 256 KB, and the L0
compaction threshold from 1 to 5. This reduces the layer count from
about 10,000 to 1,000.
2025-04-09 06:52:49 +00:00
Konstantin Knizhnik
c9ca8b7c4a One more fix for unlogged build support in DEBUG_COMPARE_LOCAL (#11474)
## Problem

Support of unlogged build in DEBUG_COMPARE_LOCAL.
Neon SMGR treats present of local file as indicator of unlogged
relations.
But it doesn't work in  DEBUG_COMPARE_LOCAL mode.

## Summary of changes

Use INIT_FORKNUM as indicator of unlogged file and create this file
while unlogged index build.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-09 05:14:29 +00:00
Erik Grinaker
7679b63a2c pageserver: persist stripe size in tenant manifest for tenant_import (#11181)
## Problem

`tenant_import`, used to import an existing tenant from remote storage
into a storage controller for support and debugging, assumed
`DEFAULT_STRIPE_SIZE` since this can't be recovered from remote storage.
In #11168, we are changing the stripe size, which will break
`tenant_import`.

Resolves #11175.

## Summary of changes

* Add `stripe_size` to the tenant manifest.
* Add `TenantScanRemoteStorageShard::stripe_size` and return from
`tenant_scan_remote` if present.
* Recover the stripe size during`tenant_import`, or fall back to 32768
(the original default stripe size).
* Add tenant manifest compatibility snapshot:
`2025-04-08-pgv17-tenant-manifest-v1.tar.zst`

There are no cross-version concerns here, since unknown fields are
ignored during deserialization where relevant.
2025-04-08 20:43:27 +00:00
Erik Grinaker
d177654e5f gitignore: add /artifact_cache (#11493)
## Problem

This is generated e.g. by `test_historic_storage_formats`, and causes
VSCode to list all the contained files as new.

## Summary of changes

Add `/artifact_cache` to `.gitignore`.
2025-04-08 16:57:10 +00:00
Alex Chi Z.
a09c933de3 test(pageserver): add conditional append test record (#11476)
## Problem

For future gc-compaction tests when we support
https://github.com/neondatabase/neon/issues/10395

## Summary of changes

Add a new type of neon test WAL record that is conditionally applied
(i.e., only when image == the specified value). We can use this to mock
the situation where we lose some records in the middle, firing an error,
and see how gc-compaction reacts to it.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-08 16:08:44 +00:00
Mikhail Kot
6138d61592 Object storage proxy (#11357)
Service targeted for storing and retrieving LFC prewarm data.
Can be used for proxying S3 access for Postgres extensions like
pg_mooncake as well.

Requests must include a Bearer JWT token.
Token is validated using a pemfile (should be passed in infra/).

Note: app is not tolerant to extra trailing slashes, see app.rs
`delete_prefix` test for comments.

Resolves: https://github.com/neondatabase/cloud/issues/26342
Unrelated changes: gate a `rename_noreplace` feature and disable it in
`remote_storage` so as `object_storage` can be built with musl
2025-04-08 14:54:53 +00:00
Roman Zaynetdinov
a7142f3bc6 Configure rsyslog for logs export using the spec (#11338)
- Work on https://github.com/neondatabase/cloud/issues/24896
- Cplane part https://github.com/neondatabase/cloud/pull/26808

Instead of reconfiguring rsyslog via an API endpoint [we have
agreed](https://neondb.slack.com/archives/C04DGM6SMTM/p1743513810964509?thread_ts=1743170369.865859&cid=C04DGM6SMTM)
to have a new `logs_export_host` field as part of the compute spec.

---------

Co-authored-by: Tristan Partin <tristan@neon.tech>
2025-04-08 14:03:09 +00:00
Dmitrii Kovalkov
7791a49dd4 fix(tests): improve test_scrubber_tenant_snapshot stability (#11471)
## Problem
`test_scrubber_tenant_snapshot` is flaky with `request was dropped`
errors. More details are in the issue.
- Closes: https://github.com/neondatabase/neon/issues/11278

## Summary of changes
- Disable shard scheduling during pageservers restart
- Add `reconcile_until_idle` in the end of the test
2025-04-08 10:03:38 +00:00
dependabot[bot]
8a6d0dccaa build(deps): bump tokio from 1.38.0 to 1.38.2 in /test_runner/pg_clients/rust/tokio-postgres in the cargo group across 1 directory (#11478)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-08 10:01:15 +00:00
Heikki Linnakangas
7ffcbfde9a refactor: Move LFC function prototypes to separate header file (#11458)
Also, move the call to the lfc_init() function. It was weird to have it
in libpagestore.c, when libpagestore.c otherwise had nothing to do with
the LFC. Move it directly into _PG_init()
2025-04-08 09:03:56 +00:00
github-actions[bot]
1e6bb48076 Proxy release 2025-04-08 2025-04-08 06:01:37 +00:00
Konstantin Knizhnik
b2a0b2e9dd Skip hole tags in local_cache view (#11454)
## Problem

If the local file cache is shrunk, so that we punch some holes in the
underlying file, the local_cache view displays the holes incorrectly.
See https://github.com/neondatabase/neon/issues/10770

## Summary of changes

Skip hole tags in the local_cache view.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-08 03:52:50 +00:00
Alex Chi Z.
0875dacce0 fix(pageserver): more aggressively yield in gc-compaction, degrade errors to warnings (#11469)
## Problem

Fix various small issues discovered during gc-compaction rollout.

## Summary of changes

- Log level changes: if errors are from gc-compaction, fire a warning
instead of errors or critical errors.
- Yield to L0 compaction more aggressively. Instead of checking every 1k
keys, we check on every key. Sometimes a single key reconstruct takes a
long time.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-07 21:19:06 +00:00
Erik Grinaker
99d8788756 pageserver: improve tenant manifest lifecycle (#11328)
## Problem

Currently, the tenant manifest is only uploaded if there are offloaded
timelines. The checks are also a bit loose (e.g. only checks number of
offloaded timelines). We want to start using the manifest for other
things too (e.g. stripe size).

Resolves #11271.

## Summary of changes

This patch ensures that a tenant manifest always exists. The lifecycle
is:

* During preload, fetch the existing manifest, if any.
* During attach, upload a tenant manifest if it differs from the
preloaded one (or does not exist).
* Upload a new manifest as needed, if it differs from the last-known
manifest (ignoring version number).
* On splits, pre-populate the manifest from the parent.
* During Pageserver physical GC, remove old manifests but keep the
latest 2 generations.

This will cause nearly all existing tenants to upload a new tenant
manifest on their first attach after this change. Attaches are
concurrency-limited in the storage controller, so we expect this will be
fine.

Also updates `make_broken` to automatically log at `INFO` level when the
tenant has been cancelled, to avoid spurious error logs during shutdown.
2025-04-07 19:10:36 +00:00
Erik Grinaker
26c5c7e942 pageserver: set Stopping state on attach cancellation (#11462)
## Problem

If a tenant is cancelled (e.g. due to Pageserver shutdown) during
attach, it is set to `Broken`. This results both in error log spam and
500 responses for clients -- shutdown is supposed to return 503
responses which can be retried.

This becomes more likely to happen with #11328, where we perform tenant
manifest downloads/uploads during attach.

## Summary of changes

Set tenant state to `Stopping` when attach fails and the tenant is
cancelled, downgrading the log messages to INFO. This introduces two
variants of `Stopping` -- with and without a caller barrier -- where the
latter is used to signal attach cancellation.
2025-04-07 17:56:56 +00:00
Arpad Müller
8a2b19f467 Allow potential warning in test_storcon_create_delete_sk_down (#11466)
Since merging #11400 and addition of
`test_storcon_create_delete_sk_down`, we've seen an error occur multiple
times.

https://github.com/neondatabase/neon/pull/11400#issuecomment-2782528369
2025-04-07 16:52:54 +00:00
Arpad Müller
486872dd28 Add support to specify auth token via --auth-token-path (#11443)
Before we specified the JWT via `SAFEKEEPER_AUTH_TOKEN`, but env vars
are quite public, both in procfs as well as the unit files. So add a way
to put the auth token into a file directly.

context: https://neondb.slack.com/archives/C033RQ5SPDH/p1743692566311099
2025-04-07 16:12:04 +00:00
Alex Chi Z.
d37e90f430 fix(pageserver): allow shard ancestor compaction to be cancelled (#11452)
## Problem

https://github.com/neondatabase/neon/issues/11330
https://github.com/neondatabase/neon/issues/11358

## Summary of changes

Looking at the staging log, a few tenants right after shard split are
stuck on shutdown because they are running shard ancestor compaction.
The compaction does not respect the cancellation token.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-07 16:01:21 +00:00
Konstantin Knizhnik
8eb701d706 Save FSM/VM pages on normal shutdown (#11449)
## Problem

See https://neondb.slack.com/archives/C03QLRH7PPD/p1743746717119179

We wallow FSM/VM pages when they are written to disk to persist them in
PS.
But it is not happen during shutdown checkpoint, because writing to WAL
during checkpoint cause Postgres panic.

## Summary of changes

Move `CheckPointBuffers` call to `PreCheckPointGuts`

Postgres PRs:
https://github.com/neondatabase/postgres/pull/615
https://github.com/neondatabase/postgres/pull/614
https://github.com/neondatabase/postgres/pull/613
https://github.com/neondatabase/postgres/pull/612

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-07 13:56:55 +00:00
Conrad Ludgate
85a515c176 update tokio for RUSTSEC-2025-0023 (#11464) 2025-04-07 13:33:56 +00:00
Christian Schwarz
aa88279681 fix(storcon/http): node status API returns serialized runtime object (#11461)
The Serialize impl on the `Node` type is for the `/debug` endpoint only.
Committed APIs should use the `NodeDescribeResponse`.

Refs
- fixes https://github.com/neondatabase/neon/issues/11326
- found while working on admin UI change
https://github.com/neondatabase/cloud/pull/26207
2025-04-07 12:23:40 +00:00
Heikki Linnakangas
b2a670c765 refactor: Use same prototype for neon_read_at_lsn on all PG versions (#11457)
The 'neon_read' function needs to have a different prototype on PG < 16,
because it's part of the smgr interface. But neon_read_at_lsn doesn't
have that restriction.
2025-04-07 11:04:36 +00:00
a-masterov
ad9655bb01 Fix the errors in pg_regress test running on the staging. (#11432)
## Problem
The shared libraries preloaded by default interfered with the
`pg_regress` tests on staging, causing wrong results
## Summary of changes
The projects used for these tests are now free from unnecessary
extensions. Some changes were made in patches.
2025-04-06 19:30:21 +00:00
Heikki Linnakangas
1a87975d95 Misc cleanup of #includes and comments in the neon extension (#11456)
Remove useless and often wrong IDENTIFICATION comments. PostgreSQL
sources have them, mostly for historical reasons, but there's no need
for us to copy that style.

Remove unnecessary #includes in header files, putting the #includes
directly in the .c files that need them. The principle is that a header
file should #include other header files if they need definitions from
them, such that each header file can be compiled on its own, but not
other #includes. (There are tools to enforce that, but this was just a
manual clean up of violations that I happened to spot.)
2025-04-06 15:34:13 +00:00
dependabot[bot]
417b2781d9 build(deps): bump openssl from 0.10.70 to 0.10.72 in /test_runner/pg_clients/rust/tokio-postgres in the cargo group across 1 directory (#11455)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-05 13:00:51 +00:00
Suhas Thalanki
2841f1ffa5 removal of pg_embedding (#11440)
## Problem

The `pg_embedding` extension has been deprecated and can cause issues
with recent changes such as with
https://github.com/neondatabase/neon/issues/10973

Issue: `PG:2025-04-03 15:39:25.498 GMT
ttid=a4de5bee50225424b053dc64bac96d87/d6f3891b8f968458b3f7edea58fb3c6f
sqlstate=58P01 [15526] ERROR: could not load library
"/usr/local/lib/embedding.so": /usr/local/lib/embedding.so: undefined
symbol: SetLastWrittenLSNForRelation`

## Summary of changes

Removed `pg_embedding` extension from the compute image.
2025-04-04 18:21:23 +00:00
Christian Schwarz
aad410c8f1 improve ondemand-download latency observability (#11421)
## Problem

We don't have metrics to exactly quantify the end user impact of
on-demand downloads.

Perf tracing is underway (#11140) to supply us with high-resolution
*samples*.

But it will also be useful to have some aggregate per-timeline and
per-instance metrics that definitively contain all observations.

## Summary of changes

This PR consists of independent commits that should be reviewed
independently.

However, for convenience, we're going to merge them together.

- refactor(metrics): measure_remote_op can use async traits
- impr(pageserver metrics): task_kind dimension for
remote_timeline_client latency histo
  - implements https://github.com/neondatabase/cloud/issues/26800
- refs
https://github.com/neondatabase/cloud/issues/26193#issuecomment-2769705793
- use the opportunity to rename the metric and add a _global suffix;
checked grafana export, it's only used in two personal dashboards, one
of them mine, the other by Heikki
- log on-demand download latency for expensive-to-query but precise
ground truth
- metric for wall clock time spent waiting for on-demand downloads

## Refs

- refs https://github.com/neondatabase/cloud/issues/26800
- a bunch of minor investigations / incidents into latency outliers
2025-04-04 18:04:39 +00:00
Christian Schwarz
4f94751b75 pageserver config: ignore+warn about unknown fields (instead of deny_unknown_fields) (#11275)
# Refs
- refs https://github.com/neondatabase/neon/issues/8915
- discussion thread:
https://neondb.slack.com/archives/C033RQ5SPDH/p1742406381132599
- stacked atop https://github.com/neondatabase/neon/pull/11298
- corresponding internal docs update that illustrates how this PR
removes friction: https://github.com/neondatabase/docs/pull/404

# Problem

Rejecting `pageserver.toml`s with unknown fields adds friction,
especially when using `pageserver.toml` fields as feature flags that
need to be decommissioned.

See the added paragraphs on `pageserver_api::models::ConfigToml` for
details on what kind of friction it causes.

Also read the corresponding internal docs update linked above to see a
more imperative guide for using `pageserver.toml` flags as feature
flags.

# Solution

## Ignoring unknown fields

Ignoring is the serde default behavior.

So, just remove `serde(deny_unknown_fields)` from all structs in
`pageserver_api::config::ConfigToml`
`pageserver_api::config::TenantConfigToml`.

I went through all the child fields and verified they don't use
`deny_unknown_fields` either, including those shared with
`pageserver_api::models`.

## Warning about unknown fields

We still want to warn about unknown fields to 
- be informed about typos in the config template
- be reminded about feature-flag style configs that have been cleaned up
in code but not yet in config templates

We tried `serde_ignore` (cf draft #11319) but it doesn't work with
`serde(flatten)`.

The solution we arrived at is to compare the on-disk TOML with the TOML
that we produce if we serialize the `ConfigToml` again.
Any key specified in the on-disk TOML but not present in the serialized
TOML is flagged as an ignored key.
The mechanism to do it is a tiny recursive decent visitor on the
`toml_edit::DocumentMut`.

# Future Work

Invalid config _values_ in known fields will continue to fail pageserver
startup.
See
- https://github.com/neondatabase/cloud/issues/24349
for current worst case impact to deployments & ideas to improve.
2025-04-04 17:30:58 +00:00
Christian Schwarz
6ee84d985a impr(perf tracing): ability to correlate with page_service logs (#11398)
# Problem

Current perf tracing fields do not allow answering the question what a
specific Postgres backend was waiting for.

# Background

For Pageserver logs, we set the backend PID as the libpq
`application_name` on the compute side, and funnel that into the a
tracing field for the spans that emit to the global tracing subscriber.

# Solution

Funnel `application_name`, and the other fields that we use in the
logging spans, into the root span for perf tracing.

# Refs

- fixes https://github.com/neondatabase/neon/issues/11393
- stacked atop https://github.com/neondatabase/neon/pull/11433
- epic: https://github.com/neondatabase/neon/issues/9873
2025-04-04 15:13:54 +00:00
JC Grünhage
295be03a33 impr(ci): send clearer notifications to slack when retrying container image pushes (#11447)
## Problem
We've started sending slack notifications for failed container image
pushes that are being retried. There are more messages coming in than
expected, so clicking through the link to see what image failed is
happening more often than we hoped.

## Summary of changes
- Make slack notifications clearer, including whether the job succeeded
and what retries have happened.
- Log failures/retries in step more clearly, so that you can easily see
when something fails.
2025-04-04 14:56:41 +00:00
Alexander Bayandin
8e1b5a9727 Fix Postgres build on macOS (#11442)
## Problem
Postgres build fails with the following error on macOS:

```
/Users/bayandin/work/neon//vendor/postgres-v14/src/port/snprintf.c:424:27: error: 'strchrnul' is only available on macOS 15.4 or newer [-Werror,-Wunguarded-availability-new]
  424 |                         const char *next_pct = strchrnul(format + 1, '%');
      |                                                ^~~~~~~~~
/Users/bayandin/work/neon//vendor/postgres-v14/src/port/snprintf.c:376:14: note: 'strchrnul' has been marked as being introduced in macOS 15.4 here, but the deployment target is macOS 15.0.0
  376 | extern char *strchrnul(const char *s, int c);
      |              ^
/Users/bayandin/work/neon//vendor/postgres-v14/src/port/snprintf.c:424:27: note: enclose 'strchrnul' in a __builtin_available check to silence this warning
  424 |                         const char *next_pct = strchrnul(format + 1, '%');
      |                                                ^~~~~~~~~
  425 |
  426 |                         /* Dump literal data we just scanned over */
  427 |                         dostr(format, next_pct - format, target);
  428 |                         if (target->failed)
  429 |                                 break;
  430 |
  431 |                         if (*next_pct == '\0')
  432 |                                 break;
  433 |                         format = next_pct;
      |
1 error generated.
```

## Summary of changes
- Update Postgres fork to include changes from
6da2ba1d8a

Corresponding Postgres PRs:
- https://github.com/neondatabase/postgres/pull/608
- https://github.com/neondatabase/postgres/pull/609
- https://github.com/neondatabase/postgres/pull/610
- https://github.com/neondatabase/postgres/pull/611
2025-04-04 14:09:15 +00:00
Vlad Lazar
1ef4258f29 pageserver: add tenant level performance tracing sampling ratio (#11433)
## Problem

https://github.com/neondatabase/neon/pull/11140 introduces performance
tracing with OTEL
and a pageserver config which configures the sampling ratio of get page
requests.

Enabling a non-zero sampling ratio on a per region basis is too
aggressive and comes with perf
impact that isn't very well understood yet.

## Summary of changes

Add a `sampling_ratio` tenant level config which overrides the
pageserver level config.
Note that we do not cache the config and load it on every get page
request such that changes propagate
timely.

Note that I've had to remove the `SHARD_SELECTION` span to get this to
work. The tracing library doesn't
expose a neat way to drop a span if one realises it's not needed at
runtime.

Closes https://github.com/neondatabase/neon/issues/11392
2025-04-04 13:41:28 +00:00
Vlad Lazar
65e2aae6e4 pageserver/secondary: deregister IO metrics (#11283)
## Problem

IO metrics for secondary locations do not get deregistered when the
timeline is removed.

## Summary of changes

Stash the request context to be used for downloads in
`SecondaryTimelineDetail`. These objects match the lifetime of the
secondary timeline location pretty well.

When the timeline is removed, deregister the metrics too.

Closes https://github.com/neondatabase/neon/issues/11156
2025-04-04 10:52:59 +00:00
a-masterov
edc874e1b3 Use the same test image version as the computer one (#11448)
## Problem
Changes in compute can cause errors in tests if another version of
`neon-test-extensions` image is used.
## Summary of changes
Use the same version of `neon-test-extensions` image as `compute` one
for docker-compose based extension tests.
2025-04-04 10:13:00 +00:00
Dmitrii Kovalkov
181af302b5 storcon + safekeeper + scrubber: propagate root CA certs everywhere (#11418)
## Problem
There are some places in the code where we create `reqwest::Client`
without providing SSL CA certs from `ssl_ca_file`. These will break
after we enable TLS everywhere.
- Part of https://github.com/neondatabase/cloud/issues/22686

## Summary of changes
- Support `ssl_ca_file` in storage scrubber.
- Add `use_https_safekeeper_api` option to safekeeper to use https for
peer requests.
- Propagate SSL CA certs to storage_controller/client, storcon's
ComputeHook, PeerClient and maybe_forward.
2025-04-04 06:30:48 +00:00
Tristan Partin
497116b76d Download extension if it does not exist on the filesystem (#11315)
Previously we attempted to download all extensions in CREATE EXTENSION
statements. Extensions like pg_stat_statements and neon are not remote
extensions, but still we were requesting them when
skip_pg_catalog_updates was set to false.

Fixes: https://github.com/neondatabase/neon/issues/11127

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-04 01:06:22 +00:00
Arpad Müller
a917952b30 Add test_storcon_create_delete_sk_down and make it work (#11400)
Adds a test `test_storcon_create_delete_sk_down` which tests the
reconciler and pending op persistence if faced with a temporary
safekeeper downtime during timeline creation or deletion. This is in
contrast to `test_explicit_timeline_creation_storcon`, which tests the
happy path.

We also do some fixes:

* timeline and tenant deletion http requests didn't expect a body, but
`()` sent one.
* we got the tenant deletion http request's return type wrong: it's
supposed to be a hash map
* we add some logging to improve observability
* We fix `list_pending_ops` which had broken code meant to make it
possible to restrict oneself to a single pageserver. But diesel doesn't
support that sadly, or at least I couldn't figure out a way to make it
work. We don't need that functionality, so remove it.
* We add an info span to the heartbeater futures with the node id, so
that there is no context-free msgs like "Backoff: waiting 1.1 seconds
before processing with the task" in the storcon logs. we could also add
the full base url of the node but don't do it as most other log lines
contain that information already, and if we do duplication it should at
least not be verbose. One can always find out the base url from the node
id.

Successor of #11261
Part of #9011
2025-04-04 00:17:40 +00:00
Tristan Partin
e581b670f4 Improve nightly physical replication benchmark (#11389)
Log the created project and endpoint IDs and improve typing in the
source code to improve readability.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-03 23:00:58 +00:00
Alexander Bayandin
8ed79ed773 build(deps): bump h2 to 4.2.0 (#11437)
## Problem

We switched `h2` from 4.1.0 to a git commit to fix stubgen (in
https://github.com/neondatabase/neon/pull/10491). `h2` 4.2.0 was
released soon after that, so we can switch back to a pinned version.

Expected no changes, as 4.2.0 is the right next commit after the commit
we currently use:
dacd614fed%5E

## Summary of changes
- Bump `h2` to 4.2.0
2025-04-03 21:42:34 +00:00
Alex Chi Z.
381f42519e fix(pageserver): skip gc-compaction over sparse keyspaces (#11404)
## Problem

Part of https://github.com/neondatabase/neon/issues/11318

It's not 100% safe for now to run gc-compaction over the sparse
keyspace. It might cause deleted file to re-appear if a specific
sequence of operations are done as in the issue, which in reality
doesn't happen due to how we split delta/image layers based on the key
range.

A long-term fix would be either having a separate gc-compaction code
path for metadata keys (as how we have a different code path for
metadata image layer generation), or let the compaction process aware of
the information of "there's an image layer that doesn't contain a key"
so that we can skip the keys.

## Summary of changes

* gc-compaction auto trigger only triggers compaction over the normal
data range.
* do not hold gc_block_guard across the full compaction job, only hold
it during each subcompaction.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-03 19:40:44 +00:00
Arpad Müller
375df517a0 storcon: return 503 instead of 500 if there is no new leader yet (#11417)
The leadership transfer protocol between storage controller instances is
as follows, listing the steps for the new pod:

The new pod does these things:

1. new pod comes online. looks in database if there is a leader. if
there is, it asks that leader to step down.
2. the new pod does some operations to come online. they should be
fairly short timed, but it's not zero.
3. the new pod updates the leader entry in the database.

The old pod, once it gets the step down request, changes its internal
state to stepped down. It treats all incoming requests specially now:
instead of processing, it wants to forward them to the new pod. The
forwarding however only works if the new pod is online already, so
before forwarding it reads from the db for a leader (also to get the
address to forward to in the first place).

If the new pod is not online yet, i.e. during step 2 above, the old pod
might legitimately land in the branch which this patch is editing: the
leader in the database is a stepped down instance.

Before, we've returned a `ApiError::InternalServerError`, but that would
print the full backtrace plus an error log. With this patch, we cut down
on the noise, as it's an expected situation to have a short storcon
downtime while we are cutting over to the new instance. A
`ResourceUnavailable` error is not just more fitting, it also doesn't
print a backtrace once encountered, and only prints on the INFO log
level (see `api_error_handler` function).

Fixes #11320
cc #8954
2025-04-03 18:43:16 +00:00
Vlad Lazar
9db63fea7a pageserver: optionally export perf traces in OTEL format (#11140)
Based on https://github.com/neondatabase/neon/pull/11139

## Problem

We want to export performance traces from the pageserver in OTEL format.
End goal is to see them in Grafana.

## Summary of changes

https://github.com/neondatabase/neon/pull/11139 introduces the
infrastructure required to run the otel collector alongside the
pageserver.

### Design

Requirements:
1. We'd like to avoid implementing our own performance tracing stack if
possible and use the `tracing` crate if possible.
2. Ideally, we'd like zero overhead of a sampling rate of zero and be a
be able to change the tracing config for a tenant on the fly.
3. We should leave the current span hierarchy intact. This includes
adding perf traces without modifying existing tracing.

To satisfy (3) (and (2) in part) a separate span hierarchy is used.
`RequestContext` gains an optional `perf_span` member
that's only set when the request was chosen by sampling. All perf span
related methods added to `RequestContext` are no-ops for requests that
are not sampled.

This on its own is not enough for (3), so performance spans use a
separate tracing subscriber. The `tracing` crate doesn't have great
support for this, so there's a fair amount of boilerplate to override
the subscriber at all points of the perf span lifecycle.

### Perf Impact

[Periodic
pagebench](https://neonprod.grafana.net/d/ddqtbfykfqfi8d/e904990?orgId=1&from=2025-02-08T14:15:59.362Z&to=2025-03-10T14:15:59.362Z&timezone=utc)
shows no statistically significant regression with a sample ratio of 0.
There's an annotation on the dashboard on 2025-03-06.

### Overview of changes:
1. Clean up the `RequestContext` API a bit. Namely, get rid of the
`RequestContext::extend` API and use the builder instead.
2. Add pageserver level configs for tracing: sampling ratio, otel
endpoint, etc.
3. Introduce some perf span tracking utilities and expose them via
`RequestContext`. We add a `tracing::Span` wrapper to be used for perf
spans and a `tracing::Instrumented` equivalent for it. See doc comments
for reason.
4. Set up OTEL tracing infra according to configuration. A separate
runtime is used for the collector.
5. Add perf traces to the read path.

## Refs

- epic https://github.com/neondatabase/neon/issues/9873

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2025-04-03 17:56:51 +00:00
Alex Chi Z.
bfc767d60d fix(test): wait for shard split complete for test_lsn_lease_storcon (#11436)
## Problem

close https://github.com/neondatabase/neon/issues/11397
ref https://github.com/neondatabase/cloud/issues/23667

## Summary of changes

We need to wait until the shard split is complete, otherwise it will
print warning like waiting for shard split exclusive lock for 30s.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-03 17:49:45 +00:00
Alex Chi Z.
109c54a300 fix(pageserver): avoid gc-compaction triggering circuit breaker (#11403)
## Problem

There are some cases where traditional gc might collect some layer files
causing gc-compaction cannot read the full history of the key. This
needs to be resolved in the long-term by improving the compaction
process. For now, let's simply avoid such errors triggering the circuit
breaker.

## Summary of changes

* Move the place where we trigger the circuit breaker. We only trigger
it during compactions other than L0 compactions. We added the trigger a
year ago due to file cleanup concerns in image layer compaction.
* For gc-compaction, only return errors to the upper
compaction_iteration if it's a shutdown error. Otherwise, just log it
and skip the compaction for a key range.

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2025-04-03 17:18:37 +00:00
Vlad Lazar
74920d8cd8 storcon: notify compute if correct observed state was refreshed (#11342)
## Problem

Previously, if the observed state was refreshed and matching the intent,
we wouldn't send
a compute notification. This is unsafe. There's no guarantee that the
location landed on the
pageserver _and_ a compute notification for it was delivered.

See
https://github.com/neondatabase/neon/issues/11291#issuecomment-2743205411
for one such example.

## Summary of changes

Add a reproducer and notify the compute if the correct observed state
required a refresh.

Closes https://github.com/neondatabase/neon/issues/11291
2025-04-03 16:35:55 +00:00
Alex Chi Z.
131b32ef48 fix(pageserver): clean up aux files before detaching (#11299)
## Problem

Related to https://github.com/neondatabase/cloud/issues/26091 and
https://github.com/neondatabase/cloud/issues/25840

Close https://github.com/neondatabase/neon/issues/11297

Discussion on Slack:
https://neondb.slack.com/archives/C033RQ5SPDH/p1742320666313969

## Summary of changes

* When detaching, scan all aux files within
`sparse_non_inherited_keyspace` in the ancestor timeline and create an
image layer exactly at the ancestor LSN. All scanned keys will map to an
empty value, which is a delete tombstone.
- Note that end_lsn for rewritten delta layers = ancestor_lsn + 1, so
the image layer will have image_end_lsn=end_lsn. With the current
`select_layer` logic, the read path will always first read the image
layer.
* Add a test case.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2025-04-03 15:55:22 +00:00
Suhas Thalanki
581bb5d7d5 removed pg_anon setup from compute dockerfile (#10960)
## Problem

Removing the `anon` v1 extension in postgres as described in
https://github.com/neondatabase/cloud/issues/22663. This extension is
not built for postgres v17 and is out of date when compared to the
upstream variant which is v2 (we have v1.4).

## Summary of changes

Removed the `anon` v1 extension from the compute docker image

Related to https://github.com/neondatabase/cloud/issues/22663
2025-04-03 15:26:35 +00:00
JC Grünhage
3c78133477 feat(ci): add 'released' tag to container images from release runs (#11425)
## Problem
We had a problem with https://github.com/neondatabase/neon/pull/11413
having e2e tests failing, because an e2e test
(8d271bed47)
depended on an unreleased pageserver fix
(0ee5bfa2fc).
This came up because neon release CI runs against the most recent
releases of the other components, but cloud e2e tests run against
latest, which is tagged from main.

## Summary of changes
Add an additional `released` tag for released versions.

## Alternative to consider
We could (and maybe should) instead switch to `latest` being used for
released versions and `main` being used where we use `latest` right now.
That'd also mean we don't have to adjust the CI in the cloud repo.
2025-04-03 14:57:44 +00:00
Suhas Thalanki
46e046e779 Exporting file_cache_used to calculate LFC utilization (#11384)
## Problem

Exporting `file_cache_used` which specifies the number of used chunks in
the LFC. This helps calculate LFC utilization as: `file_cache_used_pages
/ (file_cache_used * file_cache_chunk_size_pages)`

## Summary of changes

Exporting `file_cache_used`.

Related Issue: https://github.com/neondatabase/cloud/issues/26688
2025-04-03 14:54:45 +00:00
Arpad Müller
d8cee52637 Update rust to 1.86.0 (#11431)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

[Announcement blog
post](https://blog.rust-lang.org/2025/04/03/Rust-1.86.0.html).

Prior update was in #10914.
2025-04-03 14:53:28 +00:00
Dmitrii Kovalkov
2e11d129d0 tests: suppress mgm api timeout error in sotrcon (#11428)
## Problem
Since
0f367cb665
the timeout in `with_client_retries` is implemented via `tokio::timeout`
instead of `reqwest::ClientBuilder::timeout` (because we reuse the
client). It changed the error representation if the timeout is exceeded.
Such errors were suppressed in `allowed_errors.py`, but old regexps do
not match the new error.
Discussion:
https://neondb.slack.com/archives/C033RQ5SPDH/p1743533184736319

## Summary of changes
- Add new `Timeout` error to `allowed_errors.py`
2025-04-03 14:18:50 +00:00
Luís Tavares
43a7423f72 feat: bump pg_session_jwt extension to 0.3.0 (#11399)
## Problem

Bumps https://github.com/neondatabase/pg_session_jwt to the latest
release
[v0.3.0](https://github.com/neondatabase/pg_session_jwt/releases/tag/v0.3.0)
that introduces PostgREST fallback mechanisms.

## Summary of changes

Updates the extension download tar and the extension version in the
proxy constant.

## Subscribers
@mrl5
2025-04-03 13:01:18 +00:00
Arpad Müller
374736a4de Print remote_addr span for Failed to serve HTTP connection error (#11423)
I've encountered this error in #11422. Ideally we'd have the URL as well
to associate it with a tenant, but at this level we only have the remote
addr I guess. Better than nothing.
2025-04-03 11:58:12 +00:00
Alexander Bayandin
5e507776bc Compute: update plv8 patch (#11426)
## Problem

https://github.com/neondatabase/cloud/issues/26866

## Summary of changes
- Update plv8 patch

Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>
2025-04-03 11:29:53 +00:00
Alexander Lakhin
4e8e0951be Increase timeout for test_pageserver_gc_compaction_smoke (#11410)
## Problem
The test_pageserver_gc_compaction_smoke fails rather often due to a
timeout on slow machines.
See https://github.com/neondatabase/neon/issues/11355.

## Summary of changes
Increase the timeout for the test.
2025-04-03 11:23:30 +00:00
JC Grünhage
64a8d0c2e6 impr(ci): retry container image pushing and send slack messages for failures (#11416)
## Problem
We've seen quite a few CI failures related to pushes to docker hub
failing with weird error messages that indicate maybe docker hub is just
not reliable.

## Summary of changes
Retry container image pushing up to 10 times, and send a slack message
if we had to retry, regardless of the job succeeding or not.
2025-04-03 08:59:42 +00:00
Tristan Partin
7602e6ffc0 Skip compute_ctl authorization checks in testing builds (#11186)
We will require authorization in production. We need to skip in testing
builds for now because regression tests would fail. See
https://github.com/neondatabase/neon/issues/11316 for more information.

Signed-off-by: Tristan Partin <tristan@neon.tech>

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-03 00:00:28 +00:00
Erik Grinaker
17193d6a33 test_runner: fix pagebench tenant configs (#11420)
## Problem

Pagebench creates a bunch of tenants by first creating a template tenant
and copying its remote storage, then attaching the copies to the
Pageserver.

These tenants had custom configurations to disable GC and compaction.
However, these configs were only picked up by the Pageserver on attach,
and not registered with the storage controller. This caused the storage
controller to replace the tenant configs with the default tenant config,
re-enabling GC and compaction which interferes with benchmark
performance.

Resolves #11381.

## Summary of changes

Register the copied tenants with the storage controller, instead of
directly attaching them to the Pageserver.
2025-04-02 20:11:39 +00:00
Erik Grinaker
03ae57236f docs: add compaction notes (#11415)
Lifted from
https://www.notion.so/neondatabase/Rough-Notes-on-Compaction-1baf189e004780859e65ef63b85cfa81?pvs=4.
2025-04-02 19:55:08 +00:00
Arpad Müller
e3d27b2f68 Start safekeeper node IDs with 0 and forbid 0 from registering (#11419)
Right now we start safekeeper node ids at 0. However, other code treats
0 as invalid (see #11407). We decided on latter. Therefore, make the
register python tests register safekeepers starting at node id 1 instead
of 0, and forbid safekeepers with id 0 from registering.

Context:
https://github.com/neondatabase/neon/pull/11407#discussion_r2024852328
2025-04-02 18:36:50 +00:00
Alex Chi Z.
dd1299f337 feat(storcon): passthrough mark invisible and add tests (#11401)
## Problem

close https://github.com/neondatabase/neon/issues/11279

## Summary of changes

* Allow passthrough of other methods in tenant timeline shard0
passthrough of storcon.
* Passthrough mark invisible API in storcon.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-02 17:11:49 +00:00
John Spray
cb19e4e05d pageserver: remove legacy TimelineInfo::latest_gc_cutoff field (2/2) (#11136)
## Problem

This field was retained for backward compat only in #10707.

Once https://github.com/neondatabase/cloud/pull/25233 is released,
nothing will be reading this field.

Related: https://github.com/neondatabase/cloud/issues/24250

## Summary of changes

- Remove TimelineInfo::latest_gc_cutoff_lsn
2025-04-02 15:21:58 +00:00
Erik Grinaker
9df230c837 storcon: improve autosplit defaults (#11332)
## Problem

In #11122, we changed the autosplit behavior to allow repeated and
initial splits. The defaults were set such that they retain the current
production settings (8 shards at 64 GB). However, these defaults don't
really make sense by themselves.

Once we deploy new settings to production, we should change the defaults
to something more reasonable.

## Summary of changes

Changes the following default settings:

* `max_split_shards`: 8 → 16
* `initial_split_threshold`: 64 GB → disabled
* `initial_split_shards`: 8 → 2
2025-04-02 15:11:52 +00:00
JC Grünhage
3c2bc5baba fix(ci): run checks on release PRs (#11375)
## Problem
Hotfix releases mean that sometimes changes in release PRs haven't been
tested and linted yet. Disabling tests and lints is therefore not
necessarily safe. In the future we will check whether tests have run on
the same git tree already to speed things up, but for now we need to
turn tests back on fully. This partially reverts:
https://github.com/neondatabase/neon/pull/11272

## Summary of changes
Run checks on `.*-rc-pr` runs.
2025-04-02 14:32:53 +00:00
Alexey Kondratov
6667810800 chore(compute_ctl): Minor code and comment fixes (#11411)
## Problem

In #11376 I mistakenly reworded one comment and also forgot to commit
one of the suggestions.

## Summary of changes

Fix it here.
2025-04-02 14:20:52 +00:00
Erik Grinaker
47f5bcf2bc pageserver: don't periodically flush layers for stale attachments (#11317)
## Problem

Tenants in attachment state `Stale` can't upload layers, and don't run
compaction, but still do periodic L0 layer flushes in the tenant
housekeeping loop. If the tenant remains stuck in stale mode, this
causes a large buildup of L0 layers, causing logging, metrics increases,
and possibly alerts.

Resolves #11245.

## Summary of changes

Don't perform periodic layer flushes in stale attachment state.
2025-04-02 12:55:15 +00:00
a-masterov
c179d098ef Increase the timeout for extensions upgrade tests (on schedule). (#11406)
## Problem
Sometimes the forced extension upgrade test fails (on schedule) due to a
timeout.
## Summary of changes
The timeout is increased to 60 mins.
2025-04-02 11:18:37 +00:00
Peter Bendel
4bc6dbdd5f use a prod-like shared_buffers size for some perf unit tests (#11373)
## Problem

In Neon DBaaS we adjust the shared_buffers to the size of the compute,
or better described we adjust the max number of connections to the
compute size and we adjust the shared_buffers size to the number of max
connections according to about the following sizes
`2 CU: 225mb; 4 CU: 450mb; 8 CU: 900mb`

[see](877e33b428/goapp/controlplane/internal/pkg/compute/computespec/pg_settings.go (L405))

## Summary of changes

We should run perf unit tests with settings that is realistic for a
paying customer and select 8 CU as the reference for those tests.
2025-04-02 10:43:05 +00:00
John Spray
7dc8370848 storcon: do graceful migrations from chaos injector (#11028)
## Problem

Followup to https://github.com/neondatabase/neon/pull/10913

Existing chaos injection just does simple cutovers to secondary
locations. Let's also exercise code for doing graceful migrations. This
should implicitly test how such migrations cope with overlapping with
service restarts.

## Summary of changes
2025-04-02 10:08:05 +00:00
Alex Chi Z.
c4fc602115 feat(pageserver): support synthetic size calculation for invisible branches (#11335)
## Problem

ref https://github.com/neondatabase/neon/issues/11279


Imagine we have a branch with 3 snapshots A, B, and C:
```
base---+---+---+---main
        \-A \-B \-C
base=100G, base-A=1G, A-B=1G, B-C=1G, C-main=1G
```
at this point, the synthetic size should be 100+1+1+1+1=104G.

after the deletion, the structure looks like:
```
base---+---+---+
       \-A \-B \-C
```
If we simply assume main never exists, the size will be calculated as
size(A) + size(B) + size(C)=300GB, which obviously is not what the user
would expect.

The correct way to do this is to assume part of main still exists, that
is to say, set C-main=1G:
```
base---+---+---+main
       \-A \-B \-C
```
And we will get the correct synthetic size of 100G+1+1+1=103G.


## Summary of changes

* Do not generate gc cutoff point for invisible branches.
* Use the same LSN as the last branchpoint for branch end.
* Remove test_api_handler for mark_invisible.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-01 18:50:58 +00:00
Konstantin Knizhnik
02936b82c5 Fix effective_lsn calculation for prefetch (#11219)
## Problem

See  https://neondb.slack.com/archives/C04DGM6SMTM/p1741594233757489

Consider the following scenario:

1. Backend A wants to prefetch some block B
2. Backend A checks that block B is not present in shared buffer
3. Backend A registers new prefetch request and calls
prefetch_do_request
4. prefetch_do_request calls neon_get_request_lsns
5. neon_get_request_lsns obtains LwLSN for block B
6. Backend B downloads B, updates and wallogs it (let say to Lsn1)
7. Block B is once again thrown from shared buffers, its LwLSN is set to
Lsn1
8. Backend A obtains current flush LSN, let's say that it is Lsn1
9. Backend A stores Lsn1 as effective_lsn in prefetch slot.
10. Backend A reads page B with LwLSN=Lsn1
11. Backend A finds in prefetch ring response for prefetch request for
block B with effective_lsn=Lsn1, so that it satisfies
neon_prefetch_response_usable condition
12. Backend A uses deteriorated version of the page!

## Summary of changes

Use `not_modified_since` as `effective_lsn`. 
It should not cause some degrade of performance because we store LwLSN
when it was not found in LwLSN hash, so if page is not changed till
prefetch response is arrived, then LwLSN should not be changed.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-01 15:48:02 +00:00
Arpad Müller
1fad1abb24 storcon: timeline deletion improvements and fixes (#11334)
This PR contains a bunch of smaller followups and fixes of the original
PR #11058. most of these implement suggestions from Arseny:

* remove `Queryable, Selectable` from `TimelinePersistence`: they are
not needed.
* no `Arc` around `CancellationToken`: it itself is an arc wrapper
* only schedule deletes instead of scheduling excludes and deletes
* persist and delete deletion ops
* delete rows in timelines table upon tenant and timeline deletion
* set `deleted_at` for timelines we are deleting before we start any
reconciles: this flag will help us later to recognize half-executed
deletions, or when we crashed before we could remove the timeline row
but after we removed the last pending op (handling these situations are
left for later).

Part of #9011
2025-04-01 15:16:33 +00:00
Arpad Müller
016068b966 API spec: add safekeepers field returned by storcon (#11385)
Add an optional `safekeepers` field to `TimelineInfo` which is returned
by the storcon upon timeline creation if the
`--timelines-onto-safekeepers` flag is enabled. It contains the list of
safekeepers chosen.

Other contexts where we return `TimelineInfo` do not contain the
`safekeepers` field, sadly I couldn't make this more type safe like done
in Rust via `TimelineCreateResponseStorcon`, as there is no way of
flattening or inheritance (and I don't that duplicating the entire type
for some minor type safety improvements is worth it).

The storcon side has been done in #11058.

Part of https://github.com/neondatabase/cloud/issues/16176
cc https://github.com/neondatabase/cloud/issues/16796
2025-04-01 12:39:10 +00:00
Erik Grinaker
80596feeaa pageserver: invert CompactFlags::NoYield as YieldForL0 (#11382)
## Problem

`CompactFlags::NoYield` was a bit inconvenient, since every caller
except for the background compaction loop should generally set it (e.g.
HTTP API calls, tests, etc). It was also inconsistent with
`CompactionOutcome::YieldForL0`.

## Summary of changes

Invert `CompactFlags::NoYield` as `CompactFlags::YieldForL0`. There
should be no behavioral changes.
2025-04-01 11:43:58 +00:00
Erik Grinaker
225cabd84d pageserver: update upload queue TODOs (#11377)
Update some upload queue TODOs, particularly to track
https://github.com/neondatabase/neon/issues/10283, which I won't get
around to.
2025-04-01 11:38:12 +00:00
Alexey Kondratov
557127550c feat(compute): Add compute_ctl_up metric (#11376)
## Problem

For computes running inside NeonVM, the actual compute image tag is
buried inside the NeonVM spec, and we cannot get it as part of standard
k8s container metrics (it's always an image and a tag of the NeonVM
runner container). The workaround we currently use is to extract the
running computes info from the control plane database with SQL. It has
several drawbacks: i) it's complicated, separate DB per region; ii) it's
slow; iii) it's still an indirect source of info, i.e. k8s state could
be different from what the control plane expects.

## Summary of changes

Add a new `compute_ctl_up` gauge metric with `build_tag` and `status`
labels. It will help us to both overview what are the tags/versions of
all running computes; and to break them down by current status (`empty`,
`running`, `failed`, etc.)

Later, we could introduce low cardinality (no endpoint or compute ids)
streaming aggregates for such metrics, so they will be blazingly fast
and usable for monitoring the fleet-wide state.
2025-04-01 08:51:17 +00:00
github-actions[bot]
1470af0b42 Proxy release 2025-04-01 2025-04-01 06:01:27 +00:00
Konstantin Knizhnik
cfe3e6d4e1 Remove loop from pageserver_try_receive (#11387)
## Problem

Commit
3da70abfa5
cause noticeable performance regression (40% in update-with-prefetch in
test_bulk_update):
https://neondb.slack.com/archives/C04BLQ4LW7K/p1742633167580879

## Summary of changes

Remove loop from pageserver_try_receive to make it fetch not more than
one response. There is still loop in `pump_prefetch_state` which can
fetch as many responses as available.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-03-31 19:49:32 +00:00
Alex Chi Z.
47d47000df fix(pageserver): passthrough lsn lease in storcon API (#11386)
## Problem

part of https://github.com/neondatabase/cloud/issues/23667

## Summary of changes

lsn_lease API can only be used on pageservers. This patch enables
storcon passthrough.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-31 19:16:42 +00:00
Matthias van de Meent
e5b95bc9dc Neon LFC/prefetch: Improve page read handling (#11380)
Previously we had different meanings for the bitmask of vector IOps.
That has now been unified to "bit set = final result, no more
scribbling".

Furthermore, the LFC read path scribbled on pages that were already
read; that's probably not a good thing so that's been fixed too. In
passing, the read path of LFC has been updated to read only the
requested pages into the provided buffers, thus reducing the IO size of
vectorized IOs.

## Problem

## Summary of changes
2025-03-31 17:04:00 +00:00
Alex Chi Z.
0ee5bfa2fc fix(pageserver): allow sibling archived branch for detaching (#11383)
## Problem

close https://github.com/neondatabase/neon/issues/11379

## Summary of changes

Remove checks around archived branches for detach v2. I also updated the
comments `ancestor_retain_lsn`.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-31 16:32:55 +00:00
Fedor Dikarev
00bcafe82e chore(ci): upgrade stats action with docker images from ghcr.io (#11378)
## Problem
Current version of GitHub Workflow Stats action pull docker images from
DockerHub, that could be an issue with the new pull limits on DockerHub
side.

## Summary of changes
Switch to version `v0.2.2`, with docker images hosted on `ghcr.io`
2025-03-31 14:21:07 +00:00
Conrad Ludgate
ed117af73e chore(proxy/tokio-postgres): remove phf from sqlstate and switch to tracing (#11249)
In sqlstate, we have a manual `phf` construction, which is not
explicitly guaranteed to be stable - you're intended to use a build.rs
or the macro to make sure it's constructed correctly each time. This was
inherited from tokio-postgres upstream, which has the same issue
(https://github.com/rust-phf/rust-phf/pull/321#issuecomment-2724521193).

We don't need this encoding of sqlstate, so I've switched it to simply
parse 5 bytes
(https://www.postgresql.org/docs/current/errcodes-appendix.html).

While here, I switched out log for tracing.
2025-03-31 12:35:51 +00:00
Konstantin Knizhnik
21a891a06d Fix IS_LOCAL_REL macro (first class has oid=FirstNormalObjectId) (#11369)
## Problem

Macro IS_LOCAL_REL used for DEBUG_COMPARE_LOCAL mode use greater-than
rather than greater-or-equal comparison while first table really is
assigned FirstNormalObjectId.

## Summary of changes

Replace strict greater with greater-or-equal comparison.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-03-31 11:16:35 +00:00
Alexander Bayandin
30a7dd630c ruff: enable TC — flake8-type-checking (#11368)
## Problem

`TYPE_CHECKING` is used inconsistently across Python tests.

## Summary of changes
- Update `ruff`: 0.7.0 -> 0.11.2
- Enable TC (flake8-type-checking):
https://docs.astral.sh/ruff/rules/#flake8-type-checking-tc
- (auto)fix all new issues
2025-03-30 18:58:33 +00:00
Erik Grinaker
db5384e1b0 pageserver: remove L0 flush upload wait (#11196)
## Problem

Previously, L0 flushes would wait for uploads, as a simple form of
backpressure. However, this prevented flush pipelining and upload
parallelism. It has since been disabled by default and replaced by L0
compaction backpressure.

Touches https://github.com/neondatabase/cloud/issues/24664.

## Summary of changes

This patch removes L0 flush upload waits, along with the
`l0_flush_wait_upload`. This can't be merged until the setting has been
removed across the fleet.
2025-03-30 13:14:04 +00:00
JC Grünhage
5cb6a4bc8b fix(ci): use the right sha in release PRs (#11365)
## Problem

`github.sha` contains a merge commit of `head` and `base` if we're in a
PR. In release PRs, this makes no sense, because we fast-forward the
`base` branch to contain the changes from `head`.

Even though we correctly use `${{ github.event.pull_request.head.sha ||
github.sha }}` to reference the git commit when building artifacts, we
don't use that when checking out code, because we want to test the merge
of head and base usually. In the case of release PRs, we definitely
always want to test on the head sha though, because we're going to
forward that, and it already has the base sha as a parent, so the merge
would end up with the same tree anyway.

As a side effect, not checking out `${{
github.event.pull_request.head.sha || github.sha }}` also caused
https://github.com/neondatabase/neon/actions/runs/13986389780/job/39173256184#step:6:49
to say `release-tag=release-compute-8187`, while
https://github.com/neondatabase/neon/actions/runs/14084613121/job/39445314780#step:6:48
is talking about `build-tag=release-compute-8186`

## Summary of changes
Run a few things on `github.event.pull_request.head.sha`, if we're in a
release PR.
2025-03-28 11:56:24 +00:00
Folke Behrens
1dbf40ee2c proxy: Update redis crate (#11372) 2025-03-28 11:43:52 +00:00
Fedor Dikarev
939354abea chore(ci): pin python base images to sha (#11367)
Similar to how we pin base `debian` images, also pin `python` base
images, so we better cache them and have reproducible builds.
2025-03-27 17:42:28 +00:00
Fedor Dikarev
1d5d168626 impr(ci): use hetzner buckets for cache (#11364)
## Problem
Occasionally getting data from GH cache could be slow, with less than
10MB/s and taking 5+ minutes to download cache:
```
Received 20971520 of 2987085791 (0.7%), 9.9 MBs/sec
Received 50331648 of 2987085791 (1.7%), 15.9 MBs/sec
...
Received 1065353216 of 2987085791 (35.7%), 4.8 MBs/sec
Received 1065353216 of 2987085791 (35.7%), 4.7 MBs/sec
...
```

https://github.com/neondatabase/neon/actions/runs/13956437454/job/39068664599#step:7:17

Resulting in getting cache even longer that build time.

## Summary of changes
Switch to the caches, that are closer to the runners, and they provided
stable throughput about 70-80MB/s
2025-03-27 11:11:45 +00:00
Folke Behrens
b40dd54732 compute-node: Add some debugging tools to image (#11352)
## Problem

Some useful debugging tools are missing from the compute image and
sometimes it's impossible to install them because memory is tightly
packed.

## Summary of changes

Add the following tools: iproute2, lsof, screen, tcpdump.
The other changes come from sorting the packages alphabetically.

```bash
$ docker image inspect ghcr.io/neondatabase/vm-compute-node-v16:7555 | jaq '.[0].Size'
1389759645
$ docker image inspect ghcr.io/neondatabase/vm-compute-node-v16:14083125313 | jaq '.[0].Size'
1396051101
$ echo $((1396051101 - 1389759645))
6291456
```
2025-03-27 11:09:27 +00:00
Folke Behrens
4bb7087d4d proxy: Fix some clippy warnings coming in next versions (#11359) 2025-03-26 10:50:16 +00:00
Arpad Müller
5f3551e405 Add "still waiting for task" for slow shutdowns (#11351)
To help with narrowing down
https://github.com/neondatabase/cloud/issues/26362, we make the case
more noisy where we are wait for the shutdown of a specific task (in the
case of that issue, the `gc_loop`).
2025-03-24 17:29:44 +00:00
Anastasia Lubennikova
3e5884ff01 Revert "feat(compute_ctl): allow to change audit_log_level for existi… (#11343)
…ng (#11308)"

This reverts commit e5aef3747c.

The logic of this commit was incorrect:
enabling audit requires a restart of the compute,
because audit extensions use shared_preload_libraries.
So it cannot be done in the configuration phase,
require endpoint restart instead.
2025-03-21 18:09:34 +00:00
Vlad Lazar
9fc7c22cc9 storcon: add use_local_compute_notifications flag (#11333)
## Problem

While working on bulk import, I want to use the `control-plane-url` flag
for a different request.
Currently, the local compute hook is used whenever no control plane is
specified in the config.
My test requires local compute notifications and a configured
`control-plane-url` which isn't supported.

## Summary of changes

Add a `use-local-compute-notifications` flag. When this is set, we use
the local flow regardless of other config values.
It's enabled by default in neon_local and disabled by default in all
other envs. I had to turn the flag off in tests
that wish to bypass the local flow, but that's expected.

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2025-03-21 15:31:06 +00:00
Folke Behrens
23ad228310 pgxn: Increase the pageserver response timeout a bit (#11339)
Increase the PS response timeout slightly but noticeably,
so it does not coincide with the default TCP_RTO_MAX.
2025-03-21 14:21:53 +00:00
Dmitrii Kovalkov
aeb53fea94 storage: support multiple SSL CA certificates (#11341)
## Problem
- We need to support multiple SSL CA certificates for graceful root CA
certificate rotation.
- Closes: https://github.com/neondatabase/cloud/issues/25971

## Summary of changes
- Parses `ssl_ca_file` as a pem bundle, which may contain multiple
certificates. Single pem cert is a valid pem bundle, so the change is
backward compatible.
2025-03-21 13:43:38 +00:00
Dmitrii Kovalkov
0f367cb665 storcon: reuse reqwest http client (#11327)
## Problem

- Part of https://github.com/neondatabase/neon/issues/11113
- Building a new `reqwest::Client` for every request is expensive
because it parses CA certs under the hood. It's noticeable in storcon's
flamegraph.

## Summary of changes
- Reuse one `reqwest::Client` for all API calls to avoid parsing CA
certificates every time.
2025-03-21 11:48:22 +00:00
John Spray
76088c16d2 storcon: reproduce shard split issue (#11290)
## Problem

Issue https://github.com/neondatabase/neon/issues/11254 describes a case
where restart during a shard split can result in a bad end state in the
database.

## Summary of changes

- Add a reproducer for the issue
- Tighten an existing safety check around updated row counts in
complete_shard_split
2025-03-21 08:48:56 +00:00
John Spray
0d99609870 docs: storage controller retro-RFC (#11218)
## Problem

Various aspects of the controller's job are already described in RFCs,
but the overall service didn't have an RFC that records design tradeoffs
and the top level structure.

## Summary of changes

- Add a retrospective RFC that should be useful for anyone understanding
storage controller functionality
2025-03-21 08:32:11 +00:00
Alex Chi Z.
bae9b9acdc feat(pageserver): persist timeline invisible flag (#11331)
## Problem

part of https://github.com/neondatabase/neon/issues/11279

## Summary of changes

The invisible flag is used to exclude a timeline from synthetic size
calculation. For the first step, let's persist this flag. Most of the
code are following the `is_archived` modification flow.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-20 18:39:08 +00:00
Nikita Kalyanov
53f54ba37a chore: expose detach_v2 (#11325)
we need this exposed in the spec to use it in cplane. extracted from
https://github.com/neondatabase/cloud/pull/26167

## Problem

## Summary of changes
2025-03-20 18:04:17 +00:00
Dmitrii Kovalkov
28fc051dcc storage: live ssl certificate reload (#11309)
## Problem
SSL certs are loaded only during start up. It doesn't allow the rotation
of short-lived certificates without server restart.

- Closes: https://github.com/neondatabase/cloud/issues/25525

## Summary of changes
- Implement `ReloadingCertificateResolver` which reloads certificates
from disk periodically.
2025-03-20 16:26:27 +00:00
Folke Behrens
d0102a473a pgxn: Include local port in no-response log messages (#11321)
## Problem

Now that stuck connections are quickly terminated it's not easy to
quickly find the right port from the pid to correlate the connection
with the one seen on pageserver side.

## Summary of changes

Call getsockname() and include the local port number in the
no-response-from-pageserver log messages.
2025-03-20 16:06:00 +00:00
Alex Chi Z.
78502798ae feat(compute_ctl): pass compute type to pageserver with pg_options (#11287)
## Problem

second try of https://github.com/neondatabase/neon/pull/11185, part of
https://github.com/neondatabase/cloud/issues/24706

## Summary of changes

Tristan reminded me of the `options` field of the pg wire protocol,
which can be used to pass configurations. This patch adds the parsing on
the pageserver side, and supplies `neon.endpoint_type` as part of the
`options`.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-20 15:48:40 +00:00
Erik Grinaker
65d690b21d storcon: add repeated auto-splits and initial splits (#11122)
## Problem

Currently, we only split tenants into 8 shards once, at the 64 GB split
threshold. For very large tenants, we need to keep splitting to avoid
huge shards. And we also want to eagerly split at a lower threshold to
improve throughput during initial ingestion.

See
https://github.com/neondatabase/cloud/issues/22532#issuecomment-2706215907
for details.

Touches https://github.com/neondatabase/cloud/issues/22532.
Requires #11157.

## Summary of changes

This adds parameters and logic to enable repeated splits when a tenant's
largest timeline divided by shard count exceeds `split_threshold`, as
well as eager initial splits at a lower threshold to speed up initial
ingestion. The default parameters are all set such that they retain the
current behavior in production (only split into 8 shards once, at 64
GB).

* `split_threshold` now specifies a maximum shard size. When a shard
exceeds it, all tenant shards are split by powers of 2 such that all
tenant shards fall below `split_threshold`. Disabled by default, like
today.
* Add `max_split_shards` to specify a max shard count for autosplits.
Defaults to 8 to retain current behavior.
* Add `initial_split_threshold` and `initial_split_shards` to specify a
threshold and target count for eager splits of unsharded tenants.
Defaults to 64 GB and 8 shards to retain current production behavior.

Because this PR sets `initial_split_threshold` to 64 GB by default, it
has the effect of enabling autosplits by default. This was not the case
previously, since `split_threshold` defaults to None, but it is already
enabled across production and staging. This is temporary until we
complete the production rollout.

For more details, see code comments.

This must wait until #11157 has been deployed to Pageservers.

Once this has been deployed to production, we plan to change the
parameters to:

* `split-threshold`: 256 GB
* `initial-split-threshold`: 16 GB
* `initial-split-shards`: 4
* `max-split-shards`: 16

The final split points will thus be:

* Start: 1 shard
* 16 GB: 4 shards
* 1 TB: 8 shards
* 2 TB: 16 shards

We will then change the default settings to be disabled by default.

---------

Co-authored-by: John Spray <john@neon.tech>
2025-03-20 15:43:57 +00:00
Arpad Müller
5dd60933d3 Mark min_readable_lsn as required in the pageserver API spec (#11324)
Fully applies the changes of
https://github.com/neondatabase/cloud/pull/25233 to neon.git. The field
is always present in the Rust struct definition, so it can be marked as
required.

cc #10707
2025-03-20 15:25:09 +00:00
Gleb Novikov
2065074559 fast_import: put job status to s3 (#11284)
## Problem

`fast_import` binary is being run inside neonvms, and they do not
support proper `kubectl describe logs` now, there are a bunch of other
caveats as well: https://github.com/neondatabase/autoscaling/issues/1320

Anyway, we needed a signal if job finished successfully or not, and if
not — at least some error message for the cplane operation. And after [a
short
discussion](https://neondb.slack.com/archives/C07PG8J1L0P/p1741954251813609),
that s3 object is the most convenient at the moment.

## Summary of changes

If `s3_prefix` was provided to `fast_import` call, any job run puts a
status object file into `{s3_prefix}/status/fast_import` with contents
`{"done": true}` or `{"done": false, "error": "..."}`. Added a test as
well
2025-03-20 15:23:35 +00:00
Konstantin Knizhnik
3da70abfa5 Fix pageserver_try_receive (#11096)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1741176713523469

The problem is that this function is using `PQgetCopyData(shard->conn,
&resp_buff.data, 1 /* async = true */)`
to try to fetch next message. But this function returns 0 if the whole
message is not present in the buffer.
And input buffer may contain only part of message so result is not
fetched.

## Summary of changes

Use `PQisBusy` + `WaitEventSetWait` to check if data is available and
`PQgetCopyData(shard->conn, &resp_buff.data, 0)` to read whole message
in this case.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-03-20 15:21:00 +00:00
Anastasia Lubennikova
e5aef3747c feat(compute_ctl): allow to change audit_log_level for existing (#11308)
projects.

Preserve the information about the current audit log level in compute
state, so that we don't relaunch rsyslog on every spec change

https://github.com/neondatabase/cloud/issues/25349

---------

Co-authored-by: Tristan Partin <tristan@neon.tech>
2025-03-20 11:23:20 +00:00
Arpad Müller
91dad2514f storcon: also support tenant deletion for safekeepers (#11289)
If a tenant gets deleted, delete also all of its timelines. We assume
that by the time a tenant is being deleted, no new timelines are being
created, so we don't need to worry about races with creation in this
situation.

Unlike #11233, which was very simple because it listed the timelines and
invoked timeline deletion, this PR obtains a list of safekeepers to
invoke the tenant deletion on, and then invokes tenant deletion on each
safekeeper that has one or multiple timelines.

Alternative to #11233
Builds on #11288
Part of #9011
2025-03-20 10:52:21 +00:00
Dmitrii Kovalkov
9bf59989db storcon: add https API (#11239)
## Problem

Pageservers use unencrypted HTTP requests for storage controller API.

- Closes: https://github.com/neondatabase/cloud/issues/25524

## Summary of changes

- Replace hyper0::server::Server with http_utils::server::Server in
storage controller.
- Add HTTPS handler for storage controller API.
- Support `ssl_ca_file` in pageserver.
2025-03-20 08:22:02 +00:00
Tristan Partin
c6f5a58d3b Remove potential for SQL injection (#11260)
Timeline IDs do not contain characters that may cause a SQL injection,
but best to always play it safe.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-03-19 19:19:38 +00:00
Suhas Thalanki
5589efb6de moving LastWrittenLSNCache to Neon Extension (#11031)
## Problem

We currently have this code duplicated across different PG versions.
Moving this to an extension would reduce duplication and simplify
maintenance.

## Summary of changes

Moving the LastWrittenLSN code from PG versions to the Neon extension
and linking it with hooks.

Related Postgres PR: https://github.com/neondatabase/postgres/pull/590

Closes: https://github.com/neondatabase/neon/issues/10973

---------

Co-authored-by: Tristan Partin <tristan@neon.tech>
2025-03-19 17:29:40 +00:00
Folke Behrens
019a29748d proxy: Move release PR creation to Tuesday (#11306)
Move the creation of the proxy release PR to Tuesday mornings.
2025-03-19 16:45:29 +00:00
StepSecurity Bot
88ea855cff fix(ci): Fixing StepSecurity Flagged Issues (#11311)
This pull request is created by
[StepSecurity](https://app.stepsecurity.io/securerepo) at the request of
@areyou1or0.
 ## Summary

This pull request is created by
[StepSecurity](https://app.stepsecurity.io/securerepo) at the request of
@areyou1or0. Please merge the Pull Request to incorporate the requested
changes. Please tag @areyou1or0 on your message if you have any
questions related to the PR.
## Summary

This pull request is created by
[StepSecurity](https://app.stepsecurity.io/securerepo) at the request of
@areyou1or0. Please merge the Pull Request to incorporate the requested
changes. Please tag @areyou1or0 on your message if you have any
questions related to the PR.

## Security Fixes

### Least Privileged GitHub Actions Token Permissions

The GITHUB_TOKEN is an automatically generated secret to make
authenticated calls to the GitHub API. GitHub recommends setting minimum
token permissions for the GITHUB_TOKEN.

- [GitHub Security
Guide](https://docs.github.com/en/actions/security-guides/automatic-token-authentication#using-the-github_token-in-a-workflow)
- [The Open Source Security Foundation (OpenSSF) Security
Guide](https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions)
### Pinned Dependencies

GitHub Action tags and Docker tags are mutable. This poses a security
risk. GitHub's Security Hardening guide recommends pinning actions to
full length commit.

- [GitHub Security
Guide](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-third-party-actions)
- [The Open Source Security Foundation (OpenSSF) Security
Guide](https://github.com/ossf/scorecard/blob/main/docs/checks.md#pinned-dependencies)
### Harden Runner

[Harden-Runner](https://github.com/step-security/harden-runner) is an
open-source security agent for the GitHub-hosted runner to prevent
software supply chain attacks. It prevents exfiltration of credentials,
detects tampering of source code during build, and enables running jobs
without `sudo` access. See how popular open-source projects use
Harden-Runner
[here](https://docs.stepsecurity.io/whos-using-harden-runner).

<details>
<summary>Harden runner usage</summary>

You can find link to view insights and policy recommendation in the
build log

<img
src="https://github.com/step-security/harden-runner/blob/main/images/buildlog1.png?raw=true"
width="60%" height="60%">

Please refer to
[documentation](https://docs.stepsecurity.io/harden-runner) to find more
details.
</details>



will fix https://github.com/neondatabase/cloud/issues/26141
2025-03-19 16:44:22 +00:00
Vlad Lazar
9ce3704ab5 pageseserver: rename cplane api to storage controller api (#11310)
## Problem

The pageserver upcall api was designed to work with control plane or the
storage controller.
We have completed the transition period and now the upcall api only
targets the storage controller.

## Summary of changes

Rename types accordingly and tweak some comments.
2025-03-19 16:29:52 +00:00
Alexey Kondratov
518269ea6a feat(compute): Add perf test for compute startup time breakdown (#11198)
## Problem

We had a recent Postgres startup latency (`start_postgres_ms`)
degradation, but it was only caught with SLO alerts. There was actually
an existing test for the same purpose -- `start_postgres_ms`, but it's
doing only two starts, so it's a bit noisy.

## Summary of changes

Add new compute startup latency test that does 100 iterations and
reports p50, p90 and p99 latencies.

Part of https://github.com/neondatabase/cloud/issues/24882
2025-03-19 16:11:33 +00:00
a-masterov
cf9d817a21 Add tests for some extensions currently not covered by the regression tests (#11191)
## Problem
Some extensions do not contain tests, which can be easily run on top of
docker-compose or staging.

## Summary of changes
Added the pg_regress based tests for `pg_tiktoken`, `pgx_ulid`, `pg_rag`
Now they will be run on top of docker-compose, but I intend to adopt
them to be run on top staging in the next PRs
2025-03-19 15:38:33 +00:00
John Spray
55cb07f680 pageserver: improve debuggability of timeline creation failures during chaos testing (#11300)
## Problem

We're seeing timeline creation failures that look suspiciously like some
race with the cleanup-deletion of initdb temporary directories. I
couldn't spot the bug, but we can make it a bit easier to debug.

Related: https://github.com/neondatabase/neon/issues/11296

## Summary of changes

- Avoid surfacing distracting ENOENT failure to delete as a log error --
this is fine, and can happen if timeline is cancelled while doing
initdb, or if initdb itself has an error where it doesn't write the dir
(this error is surfaced separately)
- Log after purging initdb temp directories
2025-03-19 13:38:03 +00:00
Christian Schwarz
0f20dae3c3 impr: merge pageserver_api::models::TenantConfig and pageserver::tenant::config::TenantConfOpt (#11298)
The only difference between
- `pageserver_api::models::TenantConfig` and
- `pageserver::tenant::config::TenantConfOpt`

at this point is that `TenantConfOpt` serializes with
`skip_serializing_if = Option::is_none`.
That is an efficiency improvement for all the places that currently
serde `models::TenantConfig` because new serializations will no longer
write `$fieldname: null` for each field that is `None` at runtime.

This should be particularly beneficial for Storcon, which stores
JSON-serialized `models::TenantConfig` in its DB.

# Behavior Changes


This PR changes the serialization behavior: we omit `None` fields
instead of serializing `$fieldname: null`).

So it's a data format change (see section on compatibility below).

And it changes API responses from Storcon and Pageserver.

## API Response Compatibility

Storcon returns the location description.
Afaik it is passed through into
- storcon_cli output
- storcon UI in console admin UI

These outputs will no longer contain `$fieldname: null` values,
which de-bloats the output (good).
But in storcon UI, it also serves as an editor "default", which
will be eliminated after a storcon with this PR is released.


## Data Format Compatibility


Backwards compat: new software reading old serialized data will
deserialize to the same runtime value because all the field types
are exactly the same and `skip_serializing_if` does not affect
deserialization.

Forward compat: old software reading data serialized by new software
will map absence fields in the serialized form to runtime value
`Option::None`. This is serde default behavior, see this playground
to convince yourself:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=f7f4e1a169959a3085b6158c022a05eb

The `serde(with="humantime_serde")` however behaves strangely:
if used on an `Option<Duration>`, it still requires the field to be
present,
unlike the serde default behavior shown in the previous paragraph.
The workaround is to set `serde(default)`.
Previously it was set on each individual field, but, we do have the
container attribute, so, set it there.
This requires deriving a `Default` impl, which, because all fields are
`Option`,
is non-magic.
See my notes here:
https://gist.github.com/problame/eddbc225a5d12617e9f2c6413e0cf799

# Future Work

We should have separate types (& crates) for
- runtime types configuration (e.g. PageServerConf::tenant_config,
AttachedLocationConf)
- `config-v1` file pageserver local disk file format
- `mgmt API`
- `pageserver.toml`

Right now they all use the same, which is convenient but makes it hard
to reason about compatibility breakage.

# Refs

- corresponding docs.neon.build PR
https://github.com/neondatabase/docs/pull/470
2025-03-19 12:47:17 +00:00
JC Grünhage
aedeb37220 fix(ci): put the BUILD_TAG of the upcoming release into RC PR artifacts (#11304)
## Problem
#11061 changed how artifacts for releases are built, by
reusing/retagging the artifacts from release PRs. This resulted in the
BUILD_TAG that's baked into the images to not be as expected.
Context: https://neondb.slack.com/archives/C08JBTT3R1Q/p1742333300129069

## Summary of changes
Set BUILD_TAG to the release tag of the upcoming release when running
inside release PRs.
2025-03-19 09:34:28 +00:00
Anastasia Lubennikova
6af974548e feat(compute_ctl): Add basic audit logging for computes. (#11170)
if `audit_log_level` is set to Log, 
preload pgaudit extension and log DDL with masked parameters into
standard postgresql log
2025-03-19 00:13:36 +00:00
Christian Schwarz
9fb77d6cdd buffered writer: add cancellation sensitivity (#11052)
In
-
https://github.com/neondatabase/neon/pull/10993#issuecomment-2690428336

I added infinite retries for buffered writer flush IOs, primarily to
gracefully handle ENOSPC but more generally so that the buffered writer
is not left in a state where reads from the surrounding InMemoryLayer
cause panics.

However, I didn't add cancellation sensitivity, which is concerning
because then there is no way to detach a timeline/tenant that is
encountering the write IO errors.
That’s a legitimate scenario in the case of some edge case bug. 
See the #10993 description for details.


This PR
- first makes flush loop infallible, enabled by infinite retries
- then adds sensitivity to `Timeline::cancel` to the flush loop, thereby
making it fallible in one specific way again
- finally fixes the InMemoryLayer/EphemeralFile/BufferedWriter
amalgamate to remain read-available after flush loop is cancelled.

The support for read-availability after cancellation is necessary so
that reads from the InMemoryLayer that are already queued up behind the
RwLock that wraps the BufferedWriter won't panic because of the
`mutable=None` that we leave behind in case the flush loop gets
cancelled.

# Alternatives

One might think that we can only ship the change for read-availability
if flush encounters an error, without the infinite retrying and/or
cancellation sensitivity complexity.

The problem with that is that read-availability sounds good but is
really quite useless, because we cannot ingest new WAL without a
writable InMemoryLayer. Thus, very soon after we transition to read-only
mode, reads from compute are going to wait anyway, but on `wait_lsn`
instead of the RwLock, because ingest isn't progressing.

Thus, having the infinite flush retries still makes more sense because
they're just "slowness" to the user, whereas wait_lsn is hard errors.
2025-03-18 18:48:43 +00:00
JC Grünhage
99639c26b4 fix(ci): update build-tools image references (#11293)
## Problem
https://github.com/neondatabase/neon/pull/11210 migrated pushing images
to ghcr. Unfortunately, it was incomplete in using images from ghcr,
which resulted in a few places referencing the ghcr build-tools image,
while trying to use docker hub credentials.

## Summary of changes
Use build-tools image from ghcr consistently.
2025-03-18 15:21:22 +00:00
Ivan Efremov
86fe26c676 fix(proxy): Fix testodrome HTTP header handling in proxy (#11292)
Relates to #22486
2025-03-18 15:14:08 +00:00
JC Grünhage
eb6efda98b impr(ci): move some kinds of tests to PR runs only (#11272)
## Problem
The pipelines after release merges are slower than they need to be at
the moment. This is because some kinds of tests/checks run on all kinds
of pipelines, even though they only matter in some of those.

## Summary of changes
Run `check-codestyle-{rust,python,jsonnet}`, `build-and-test-locally`
and `trigger-e2e-tests` only on regular PRs, not release PR or pushes to
main or release branches.
2025-03-18 13:49:34 +00:00
Conrad Ludgate
fd41ab9bb6 chore: remove x509-parser (#11247)
Both crates seem well maintained. x509-cert is part of the high quality
RustCrypto project that we already make heavy use of, and I think it
makes sense to reduce the dependencies where possible.
2025-03-18 13:05:08 +00:00
JC Grünhage
2dfff6a2a3 impr(ci): use ghcr.io as the default container registry (#11210)
## Problem
Docker Hub has new rate limits coming up, and to avoid problems coming
with those we're switching to GHCR.

## Summary of changes
- Push images to GHCR initially and distribute them from there
- Use images from GHCR in docker-compose
2025-03-18 11:30:49 +00:00
Arpad Müller
2cf6ae76fc storcon: move safekeeper related stuff out of service.rs (#11288)
There is no functional change here. We move safekeeper related code from
`service.rs` to `service/safekeeper_service.rs`, so that safekeeper
related stuff is contained in a single file. This also helps with
preventing `service.rs` from growing even further.

Part of #9011.
2025-03-18 09:00:53 +00:00
Dmitrii Kovalkov
57d51e949d tests: suppress excessive pageserver errors in test_timeline_ancestor_detach_errors (#11277)
## Problem

The test is flaky because of the same reasons as described in
https://github.com/neondatabase/neon/issues/11177.
The test has already suppressed these `WARN` and `ERROR` log messages,
but the regexp didn't match all possible errors.

## Summary of changes
- Change regexp to suppress all possible allowed error log messages.
2025-03-18 07:10:11 +00:00
Arpad Müller
0d3d639ef3 storcon: remove timeouts for safekeeper heartbeating (#11232)
PRs #10891 and #10902 have time-bounded the safekeeper heartbeating of
the storage controller. Those timeouts were not meant to be permanent,
but temporary until we figured out the reasons for the safekeeper
heartbeating causing problems.

Now they are better understood and resolved. A comment is
[here](https://github.com/neondatabase/cloud/issues/24396#issuecomment-2679342929),
but most importantly, we've had:

* #10954 to send heartbeats concurrently (before the issue was we sent
them sequentially, so the total time time was number of nodes times time
for timeout to be hit, now the total time is the maximum of all things
we are heartbeating)
* work to actually make heartbeats work and not error, i.e. JWT rollout
for storcon, not sending heartbeats to decomissioned safekeepers,
removal of decomissioned safekeepers from the databases

Part of https://github.com/neondatabase/cloud/issues/25473
2025-03-18 03:37:45 +00:00
Alex Chi Z.
05ca27c981 fix(pagectl/benches): scope context with debug tools (#11285)
## Problem


7c462b3417
requires all contexts have scopes. pagectl/benches don't have such
scopes.

close https://github.com/neondatabase/neon/issues/11280

## Summary of changes

Adding scopes for the tools.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-17 21:27:27 +00:00
Alex Chi Z.
bb64beffbb fix(pageserver): log compaction errors with timeline ids (#11231)
## Problem

Makes it easier to debug.

## Summary of changes

Log compaction errors with timeline ids.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-17 19:42:02 +00:00
Konstantin Knizhnik
24f41bee5c Update LFC in case of unlogged build (#11262)
## Problem

Unlogged build is used for GIST/SPGIST/GIN/HNSW indexes.
In this mode we first change relation class to `RELPERSISTENCE_UNLOGGED`
and save them on local disk.
But we do not save unlogged relations in LFC.
It may cause fetching incorrect value from LFC if relfilenode is reused.

## Summary of changes

Save modified pages in LFC on second stage of unlogged build (when
modified pages are walloged).
There is no need to save pages in LFC at first phase because the will be
in any case overwritten with assigned LSN at second phase.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-03-17 19:06:42 +00:00
Suhas Thalanki
a05c99f487 fix: removed anon pg extension (#10936)
## Problem

Removing the `anon` v1 extension in postgres as described in
https://github.com/neondatabase/cloud/issues/22663. This extension is
not built for postgres v17 and is out of date when compared to the
upstream variant which is v2 (we have v1.4).

## Summary of changes

Removed the `anon` v1 extension from being built or preloaded

Related to https://github.com/neondatabase/cloud/issues/22663
2025-03-17 18:23:32 +00:00
JC Grünhage
486ffeef6d fix(ci): don't have neon-test-extensions release tag push depend on compute-node-image build (#11281)
## Problem
Failures like
https://github.com/neondatabase/neon/actions/runs/13901493608/job/38896940612?pr=11272
are caused by the dependency on `compute-node-image`, which was wrong on
release jobs anyway.

## Summary of changes
Remove dependency on `compute-node-image` from the job
`add-release-tag-to-neon-test-extension-image`.
2025-03-17 16:31:49 +00:00
Arpad Müller
56149a046a Add test_explicit_timeline_creation_storcon and make it work (#11261)
Adds a basic test that makes the storcon issue explicit creation of a
timeline on safeekepers (main storcon PR in #11058). It was adapted from
`test_explicit_timeline_creation` from #11002.

Also, do a bunch of fixes needed to get the test work (the API
definitions weren't correct), and log more stuff when we can't create a
new timeline due to no safekeepers being active.

Part of #9011

---------

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2025-03-17 16:28:21 +00:00
Roman Zaynetdinov
db30e1669c Add /configure_telemetry API endpoint (#11117)
Work on https://github.com/neondatabase/cloud/issues/23721 and
https://github.com/neondatabase/cloud/issues/23714

Depends on https://github.com/neondatabase/neon/pull/11111

- Add `/configure_telemetry` API endpoint
- Support second rsyslog configuration for Postgres logs export
- Enable logs export when compute feature is enabled and configure
Postgres to send logs to syslog

I have used `/configure_telemetry` name because in the future I see it
also being used for configuring a `pg_tracing` extension to export
traces. Let me know if you'd rather have these APIs separate. In this
case we can rename it to `/configure_rsyslog`.
2025-03-17 13:53:23 +00:00
JC Grünhage
fdf04d4d81 fix(ci): use correct branch ref for checking whether this is a release merge queue (#11270)
## Problem

https://github.com/neondatabase/neon/actions/runs/13894288475/job/38871819190
shows the "Add fast-fordward label to PR to trigger fast-forward merge"
job being skipped. This is due to not using the right variable for
checking which branch the merge queue is merging into.

## Summary of changes
Use the `branch` output of the `meta` task for checking the target
branch of a merge group.
2025-03-17 09:26:45 +00:00
Alexander Bayandin
136cae76c2 fix(ci): correct regex to detect release-compute RC PRs (#11269)
## Problem
The regex in `_meta.yml` workflow doesn't detect RC PRs for compute
releases:
https://neondb.slack.com/archives/C059ZC138NR/p1742164884669389

## Summary of changes
- Fix regex

---------

Co-authored-by: Peter Bendel <peterbendel@neon.tech>
2025-03-17 07:25:12 +00:00
Konstantin Knizhnik
15e63afe7d Support DEBUG_COMPARE_LOCAL mode for unloggedindex build (#11257)
## Problem

In unlogged index build (used fir GIST/SPGIST/GIN indexes) files is
created on disk and then removed at the end.
It contradicts to the logic of DEBUG_COMPARE_LOCAL mode.

## Summary of changes

Do not create and unlink files in unlogged build in DEBUG_COMPARE_LOCAL
mode.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-03-17 06:07:24 +00:00
Alexey Kondratov
966abd3bd6 fix(compute_ctl): Dollar escaping helper fixes (#11263)
## Problem

In the previous PR #11045, one edge-case wasn't covered, when an ident
contains only one `$`, we were picking `$$` as a 'wrapper'. Yet, when
this `$` is at the beginning or at the end of the ident, then we end up
with `$$$` in a row which breaks the escaping.

## Summary of changes

Start from `x` tag instead of a blank string.

Slack:
https://neondb.slack.com/archives/C08HV951W2W/p1742076675079769?thread_ts=1742004205.461159&cid=C08HV951W2W
2025-03-16 18:39:54 +00:00
Alexey Kondratov
8566cad23b chore(docs): Refresh RFC guide to suggest using YYYY-MM-DD prefix (#11252)
## Problem

Serial/numeric IDs lead to collisions, which is not critical but looks
awkward.
Previous discussion:
https://neondb.slack.com/archives/C033A2WE6BZ/p1741891345869979

## Summary of changes

Suggest using the `YYYY-MM-DD` prefix, which i) has less chance of
collision; ii) provides out-of-the-box lexicographic sorting; iii) even
if it collides, it's not a big deal -- just two RFCs have been started
on the same day.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2025-03-16 17:17:58 +00:00
Peter Bendel
228bb75354 Extend large tenant OLTP workload ... (#11166)
... to better match the workload characteristics of real Neon customers

## Problem

We analyzed workloads of large Neon users and want to extend the oltp
workload to include characteristics seen in those workloads.

## Summary of changes

- for re-use branch delete inserted rows from last run
- adjust expected run-time (time-outs) in GitHub workflow
- add queries that exposes the prefetch getpages path
- add I/U/D transactions for another table (so far the workload was
insert/append-only)
- add an explicit vacuum analyze step and measure its time
- add reindex concurrently step and measure its time (and take care that
this step succeeds even if prior reindex runs have failed or were
canceled)
- create a second connection string for the pooled connection that
removes the `-pooler` suffix from the hostname because we want to run
long-running statements (database maintenance) and bypass the pooler
which doesn't support unlimited statement timeout

## Test run


https://github.com/neondatabase/neon/actions/runs/13851772887/job/38760172415
2025-03-16 14:04:48 +00:00
Cihan Demirci
a5b00b87ba CI(pre-merge-checks): use step-security/changed-files (#11265)
Use Step Security maintained version of `tj-actions/changed-files`.

https://www.stepsecurity.io/blog/harden-runner-detection-tj-actions-changed-files-action-is-compromised#use-the-stepsecurity-maintained-changed-files-action
2025-03-16 13:53:27 +00:00
John Spray
a674ed8caf storcon: safety check when completing shard split (#11256)
## Problem

There is a rare race between controller graceful deployment and shard
splitting where we may incorrectly both abort _and_ complete the split
(on different pods), and thereby leave no shards at all in the database.

Related: #11254

## Summary of changes

- In complete_shard_split, refuse to delete anything if child shards are
not found
2025-03-14 20:08:24 +00:00
Erik Grinaker
53d50c7ea5 pageserver: deflake compaction tests (#11246)
These need to set `NoYield`, otherwise they may be preempted by pending
L0 compaction.
2025-03-14 17:45:18 +00:00
Dmitrii Kovalkov
3168bd0e3a tests: suppress "Cancelled request finished with an error" in test_timeline_archive (#11241)
## Problem

Previous PR https://github.com/neondatabase/neon/pull/11190 didn't
suppress `Cancelled request finished with an error` messages, which are
also expected, so the test
https://github.com/neondatabase/neon/issues/11177 is still flaky.

## Summary of changes
- Suppress `Cancelled request finished with an error` in
`test_timeline_archive`
2025-03-14 17:42:09 +00:00
Alexander Bayandin
4a97cd0b7e test_runner: fix tests with jsonnet for Python 3.13 (#11240)
## Problem
Python's `jsonnet` 0.20.0 doesn't support Python 3.13, so we have a
couple of tests xfailed because of that.

## Summary of changes
- Bump `jsonnet` to `0.21.0rc2` which supports Python 3.13
- Unxfail `test_sql_exporter_metrics_e2e` and
`test_sql_exporter_metrics_smoke` on Python 3.13
2025-03-14 17:02:55 +00:00
Anastasia Lubennikova
b7c6738524 feat(compute_ctl): add pgaudt log gc to compute_ctl (#11169)
- add pgaudt_gc thread to compute_ctl
to cleanup old pgaudit logs if they exist.
pgaudit can rotate files, but it doesn't delete the old files
  
- Add AUDIT_LOG_DIR_SIZE metric to compute_ctl
to track the size of the audit log directory in bytes.

- Fix permissions for rsyslog state files directory
2025-03-14 14:08:16 +00:00
Conrad Ludgate
7fe5a689b4 feat(proxy): export ingress metrics (#11244)
## Problem

We exposed the direction tag in #10925 but didn't actually include the
ingress tag in the export to allow for an adaption period.

## Summary of changes

We now export the ingress direction
2025-03-14 13:54:57 +00:00
Dmitrii Kovalkov
b0922967e0 Bump humantime version and remove advisories.ignore (#11242)
## Problem

- Closes:
https://github.com/neondatabase/neon/issues/11179#issuecomment-2724222041

## Summary of changes
- Bump humantime version to `2.2`
- Remove `RUSTSEC-2025-0014` from `advisories.ignore`
2025-03-14 11:51:11 +00:00
Dmitrii Kovalkov
f68be2b5e2 safekeeper: https for management API (#11171)
## Problem

Storage controller uses unencrypted HTTP requests for safekeeper
management API.

- Closes: https://github.com/neondatabase/cloud/issues/24836

## Summary of changes

- Replace `hyper0::server::Server` with `http_utils::server::Server` in
safekeeper.
- Add HTTPS handler for safekeeper management API.
2025-03-14 11:41:22 +00:00
Christian Schwarz
04370b48b3 fix(storcon): optimization validation makes decisions based on wrong SecondaryProgress (#11229)
# Refs

- fixes https://github.com/neondatabase/neon/issues/11228

# Problem High-Level

When storcon validates whether a `ScheduleOptimizationAction` should be
applied, it retrieves the `tenant_secondary_status` to determine whether
a secondary is ready for the optimization.

When collecting results, it associates secondary statuses with the wrong
optimization actions in the batch of optimizations that we're
validating.

The result is that we make the decision for shard/location X based on
the SecondaryStatus of a random secondary location Y in the current
batch of optimizations.

A possible symptom is an early cutover, as seen in this engineering
investigation here:
- https://github.com/neondatabase/cloud/issues/25734

# Problem Code-Level

This code here in `optimize_all_validate`


97e2e27f68/storage_controller/src/service.rs (L7012-L7029)

zips the `want_secondary_status` with the Vec returned from
`tenant_for_shards_api` .

However, the Vec returned from `want_secondary_status` is not ordered
(it uses FuturesUnordered internally).

# Solution

Sort the Vec in input order before returning it.

`optimize_all_validate` was the only caller affected by this problem

While at it, also future-proof similar-looking function
`tenant_for_shards`.
None of its callers care about the order, but this type of function
signature is easy to use incorrectly.

# Future Work

Avoid the additional iteration, map, and allocation.

Change API to leverage AsyncFn (async closure).
And/or invert `tenant_for_shards_api` into a Future ext trait / iterator
adaptor thing.
2025-03-14 11:21:16 +00:00
Arpad Müller
5359cf717c storcon: add API definitions for exclude_timeline and term_bump (#11197)
Adds API definitions for the safekeeper API endpoints `exclude_timeline`
and `term_bump`. Also does a bugfix to return the correct type from
`delete_timeline`.

Part of #8614
2025-03-14 00:00:37 +00:00
Erik Grinaker
d6d78a050f pageserver: disable l0_flush_wait_upload by default (#11215)
## Problem

This is already disabled in production, as it is replaced by L0 flush
delays. It will be removed in a later PR, once the config option is no
longer specified in production.

## Summary of changes

Disable `l0_flush_wait_upload` by default.
2025-03-13 21:08:28 +00:00
Erik Grinaker
4ff000c042 pageserver: deflake test_metadata_image_creation (#11230)
## Problem

`test_metadata_image_creation ` became flaky with #11212, since image
compaction may yield to L0 compaction.

## Summary of changes

Set `NoYield` when compacting in tenant tests.
2025-03-13 20:46:21 +00:00
Conrad Ludgate
9a3020d2ce chore(proxy): pre-initialise metricvecs (#11226)
## Problem

We noticed that error metrics didn't show for some services with light
load. This is not great and can cause problems for dashboards/alerts

## Summary of changes

Pre-initialise some metricvecs.
2025-03-13 20:23:53 +00:00
Alex Chi Z.
23b713900e feat(storcon): passthrough ancestor detach behavior (#11199)
## Problem

https://github.com/neondatabase/neon/issues/10310
https://github.com/neondatabase/neon/pull/11158

## Summary of changes

We need to passthrough the new detach behavior through the storcon API.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-13 20:21:23 +00:00
Arpad Müller
b1a1be6a4c switch pytests and neon_local to control_plane_hooks_api (#11195)
We want to switch away from and deprecate the `--compute-hook-url` param
for the storcon in favour of `--control-plane-url` because it allows us
to construct urls with `notify-safekeepers`.

This PR switches the pytests and neon_local from a
`control_plane_compute_hook_api` to a new param named
`control_plane_hooks_api` which is supposed to point to the parent of
the `notify-attach` URL.

We still support reading the old url from disk to not be too disruptive
with existing deployments, but we just ignore it.

Also add docs for the `notify-safekeepers` upcall API.

Follow-up of #11173
Part of https://github.com/neondatabase/neon/issues/11163
2025-03-13 19:50:52 +00:00
Erik Grinaker
8afae9d03c pageserver: enable l0_flush_delay_threshold by default (#11214)
## Problem

`l0_flush_delay_threshold` has already been set to 30 in production for
a couple of weeks. Let's harmonize the default.

## Summary of changes

Update `DEFAULT_L0_FLUSH_DELAY_FACTOR` to 3 such that the default
`l0_flush_delay_threshold` is `3 * compaction_threshold`.

This differs from the production setting, which is hardcoded to 30 (with
`compaction_threshold` at 10), and is more appropriate for any tenants
that have custom `compaction_threshold` overrides.
2025-03-13 19:15:22 +00:00
JC Grünhage
066b0a1be9 fix(ci): correctly push neon-test-extensions in releases and to ghcr (#11225)
## Problem
ef0d4a48a adjusted how we build container images and how we push them,
and the neon-test-extensions image was overlooked. Additionally, is was
also missed in 1f0dea9a1, which pushed our container images to GHCR.

## Summary of changes
Push neon-test-extensions to GHCR and also push release tags for it.
2025-03-13 18:18:55 +00:00
Konstantin Knizhnik
398d2794eb Handle DEBUG_COMPARE_LOCAL mode in neon_zeroextend (#11220)
## Problem

DEBUG_COMPARE_LOCAL is not supported in neon_zeroextend added in PG16

## Summary of changes

Add support of DEBUG_COMPARE_LOCAL in neon_zeroextend

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-03-13 16:30:32 +00:00
Erik Grinaker
3c3b9dc919 pageserver: enable image_creation_preempt_threshold by default (#11216)
## Problem

This is already set in production, we should harmonize the default.

## Summary of changes

Default `image_creation_preempt_threshold` to 3.
2025-03-13 16:28:21 +00:00
Christian Schwarz
ed31dd2a3c pageserver: better observability for slow wait_lsn (#11176)
# Problem

We leave too few observability breadcrumbs in the case where wait_lsn is
exceptionally slow.

# Changes

- refactor: extract the monitoring logic out of `log_slow` into
`monitor_slow_future`
- add global + per-timeline counter for time spent waiting for wait_lsn
- It is updated while we're still waiting, similar to what we do for
page_service response flush.
- add per-timeline counterpair for started & finished wait_lsn count
- add slow-logging to leave breadcrumbs in logs, not just metrics

For the slow-logging, we need to consider not flooding the logs during a
broker or network outage/blip.
The solution is a "log-streak-level" concurrency limit per timeline.
At any given time, there is at most one slow wait_lsn that is logging
the "still running" and "completed" sequence of logs.
Other concurrent slow wait_lsn's don't log at all.
This leaves at least one breadcrumb in each timeline's logs if some
wait_lsn was exceptionally slow during a given period.
The full degree of slowness can then be determined by looking at the
per-timeline metric.

# Performance

Reran the `bench_log_slow` benchmark, no difference, so, existing call
sites are fine.

We do use a Semaphore, but only try_acquire it _after_ things have
already been determined to be slow. So, no baseline overhead
anticipated.

# Refs

-
https://github.com/neondatabase/cloud/issues/23486#issuecomment-2711587222
2025-03-13 15:03:53 +00:00
Conrad Ludgate
3dec117572 feat(compute_ctl): use TLS if configured (#10972)
Closes: https://github.com/neondatabase/cloud/issues/22998

If control-plane reports that TLS should be used, load the certificates
(and watch for updates), make sure postgres use them, and detects
updates.

Procedure:
1. Load certificates
2. Reconfigure postgres/pgbouncer
3. Loop on a timer until certificates have loaded
4. Go to 1

Notes:
1. We only run this procedure if requested on startup by control plane.
2. We needed to compile pgbouncer with openssl enabled
3. Postgres doesn't allow tls keys to be globally accessible - must be
read only to the postgres user. I couldn't convince the autoscaling team
to let me put this logic into the VM settings, so instead compute_ctl
will copy the keys to be read-only by postgres.
4. To mitigate a race condition, we also verify that the key matches the
cert.
2025-03-13 15:03:22 +00:00
Alex Chi Z.
b2286f5bcb fix(pageserver): don't panic if gc-compaction find no keys (#11200)
## Problem

There was a panic on staging that compaction didn't find any keys. This
is possible if all layers selected for compaction does not contain any
keys within the current shard.

## Summary of changes

Make panic an error. In the future, we can try creating an empty image
layer so that GC can clean up those layers. Otherwise, for now, we can
only rely on shard ancestor compaction to remove these data.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-13 14:38:45 +00:00
Erik Grinaker
c036fec065 pageserver: enable compaction_l0_first by default (#11212)
## Problem

`compaction_l0_first` has already been enabled in production for a
couple of weeks.

## Summary of changes

Enable `compaction_l0_first` by default.

Also set `CompactFlags::NoYield` in `timeline_checkpoint_handler`, to
ensure explicitly requested compaction runs to completion. This endpoint
is mainly used in tests, and caused some flakiness where tests expected
compaction to complete.
2025-03-13 14:28:42 +00:00
github-actions[bot]
f92f92b91b Proxy release 2025-03-13 2025-03-13 13:43:01 +00:00
JC Grünhage
89c7e4e917 fix(ci): use paranthesis for error handling in jq when fetching release PRs (#11217)
## Problem
#11061 introduced code fetching previous releases. #11151 introduced jq
error handling, which has also been applied in #11061, but parenthesis
have been missed.

## Summary of changes
Add parenthesis around error handling code.
2025-03-13 13:40:43 +00:00
Erik Grinaker
5a245a837d storcon: retain stripe size when autosplitting sharded tenants (#11194)
## Problem

Autosplits always request `DEFAULT_STRIPE_SIZE` for splits. However,
splits do not allow changing the stripe size of already-sharded tenants,
and will error out if it differs.

In #11168, we are changing the stripe size, which could hit this when
attempting to autosplit already sharded tenants.

Touches #11168.

## Summary of changes

Pass `new_stripe_size: None` when autosplitting already sharded tenants.
Otherwise, pass `DEFAULT_STRIPE_SIZE` instead of the shard identity's
stripe size, since we want to use the current default rather than an
old, persisted default.
2025-03-13 13:28:10 +00:00
devin-ai-integration[bot]
efb1df4362 fix: Change metric_unit from 'microseconds' to 'μs' in test_compute_ctl_api.py (#11209)
# Fix metric_unit length in test_compute_ctl_api.py

## Description
This PR changes the metric_unit from "microseconds" to "μs" in
test_compute_ctl_api.py to fix the issue where perf test results were
not being stored in the database due to the string exceeding the 10
character limit of the metric_unit column in the perf_test_results
table.

## Problem
As reported in Slack, the perf test results were not being uploaded to
the database because the "microseconds" string (12 characters) exceeds
the 10 character limit of the metric_unit column in the
perf_test_results table.

## Solution
Replace "microseconds" with "μs" in all metric_unit parameters in the
test_compute_ctl_api.py file.

## Testing
The changes have been committed and pushed. The PR is ready for review.

Link to Devin run:
https://app.devin.ai/sessions/e29edd672bd34114b059915820e8a853
Requested by: Peter Bendel

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: peterbendel@neon.tech <peterbendel@neon.tech>
2025-03-13 10:17:01 +00:00
github-actions[bot]
dbb205ae92 Proxy release 2025-03-13 2025-03-13 09:50:35 +00:00
JC Grünhage
803e6f908a fix(ci): fix syntax of lint-release-pr (#11208)
## Problem
A small adjustment in #11061 broke the lint-release-pr.sh script, and
the new version was neither tested nor linted. This has been done now,
the script is once again tested and passing `shellcheck`.

## Summary of changes
Add missing `el` of `elif` condition chain.
2025-03-13 09:42:38 +00:00
JC Grünhage
afc9524bc7 fix(ci): run lint-release-pr on head-ref (#11206)
## Problem
#11061 changed release pr creation, and I missed that the workflow will
checkout a would-be-merge of the rc branch and the release branch
instead of the head ref, unless explicitly instructed otherwise.

## Summary of changes
Check out head ref for linting the release PRs.
2025-03-13 08:17:33 +00:00
JC Grünhage
507353404c fix(ci): pass emtpy body when creating release PRs (#11203)
## Problem
#11061 changed release pr creation, and I missed that creating PRs using
`gh` in non-interactive environments *requires* `--body` instead of
defaulting to an empty body.

## Summary of changes
Explicitly set an empty body when creating release PRs.
2025-03-12 23:54:43 +00:00
JC Grünhage
48be4df3f3 fix(ci): fetch all refs in release PR creation (#11201)
## Problem
#11061 changed release PR creation, and I missed that we need to
explicitly fetch the whole history so that the relevant git refs and
objects are available.

## Summary of changes
- Fetch all git refs including history by setting fetch-depth to 0
- Reference release branch as a remote branch, because we haven't
checked it out locally
2025-03-12 22:32:38 +00:00
Alex Chi Z.
c3b3b507f7 feat(pageserver): support detaching behavior v2 (#11158)
## Problem

close https://github.com/neondatabase/neon/issues/10310

## Summary of changes

This patch adds a new behavior for the detach_ancestor API: detach with
multi-level ancestor and no reparenting. Though we can potentially
support multi-level + do reparenting / single-level + no-reparenting in
the future, as it's not required for the recovery/snapshot epic, I'd
prefer keeping things simple now that we only handle the old one and the
new one instead of supporting the full feature matrix.

I only added a test case of successful detaching instead of testing
failures. I'd like to make this into staging and add more tests in the
future.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-12 22:27:23 +00:00
JC Grünhage
ef0d4a48a8 Reuse artifacts from release PRs (#11061)
## Problem
When we release our components, we perform builds in the release PR,
then test the components, then merge the PR, and then build everything
*again*, run tests *again*, and only then start deployments.

To speed things up, we want to perform builds and run tests in the PR,
and start deployments using the existing artifacts from the release PR.

To make that possible, we need to have both CI pipelines running on the
same commit hash, which requires fast forwarding release. That only
works, if we have a commit in the PR that has the current release branch
state as an ancestor.

## Summary of changes
- Changes to release PR creation:
- Remove templates and automatic bodies for release PRs. The previous
template wasn't used anymore, and the automatic body we created in the
pipeline didn't contain any useful content anymore after the changees
here.
- Make it possible to select the source branch. For releases that aren't
cut from `main`, like https://github.com/neondatabase/neon/pull/11051,
we need a way to trigger the new flow from a different branch.
- Determine `release-branch` automatically from the component name
instead of passing that as well.
- Changes to the merge queue job:
- Rename `get-changed-files` to `meta` in preparation of additional data
being fetched as part of that job
- Fail the merge queue if we're trying to merge into a branch other than
main - this is to prevent non-fast-forward merges.
- Label PRs to branches other than main as `fast-forward`, to trigger
the fast-forward job
- Add a fast-forward job that can be triggered with the `fast-forward`
label that performs a fast-forward merge. This only happens if the PR
has `mergeable_state == clean`, so CI having passed.
- Build and Test on releases now skips building images, skips testing
images and skips triggering e2e tests. We add new tags to the images
from the release PR to tag them as release images, and we push them to
the prod registries.
2025-03-12 21:00:59 +00:00
Alex Chi Z.
8a5a739af0 test(pageserver): add small tenant compaction (#11049)
## Problem

close https://github.com/neondatabase/neon/issues/10881

## Summary of changes

Mock a tenant with very small amount of data.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-12 20:34:19 +00:00
Tristan Partin
5eed0e4b94 Add docs to performance/test_logical_replication.py on how to run the suite (#10175)
These docs are in tandem with what was recently published on the
internal docs site.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-03-12 17:31:09 +00:00
Tristan Partin
bb3c0ff251 Make collecting the installed extensions metric async (#11071)
If the goal is to make compute_ctl completely asynchronous, then this is
one step to getting there.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-03-12 16:09:02 +00:00
Conrad Ludgate
7aec1364dd chore(proxy): remove enum and composite type queries (#11178)
In our json encoding, we only need to know about array types.
Information about composites or enums are not actually used.

Enums are quite popular, needing to type query them when not needed can
add some latency cost for no gain.
2025-03-12 15:47:17 +00:00
Tristan Partin
40672b739e Move maybe_add_request_id_header middleware into middleware module (#11187)
This matches the authorization middleware.

---------

Signed-off-by: Tristan Partin <tristan@neon.tech>
Co-authored-by: Mikhail Kot <mikhail@neon.tech>
2025-03-12 15:34:46 +00:00
Vlad Lazar
02a83913ec storcon: do not update observed state on node activation (#11155)
## Problem

When a node becomes active, we query its locations and update the
observed state in-place.
This can race with the observed state updates done when processing
reconcile results.

## Summary of changes

The argument for this reconciliation step is that is reduces the need
for background reconciliations.
I don't think is actually true anymore. There's two cases.

1. Restart of node after drain. Usually the node does not go through the
offline state here, so observed locations
were not marked as none. In any case, there should be a handful of
shards max on the node since we've just drained it.
2. Node comes back online after failure or network partition. When the
node is marked offline, we reschedule everything away from it. When it
later becomes active, the previous observed location is extraneous and
requires a reconciliation anyway.

Closes https://github.com/neondatabase/neon/issues/11148
2025-03-12 15:31:28 +00:00
Erik Grinaker
c7717c85c7 storcon,pageserver: use persisted stripe size when loading unsharded tenants (#11193)
## Problem

When the storage controller and Pageserver loads tenants from persisted
storage, it uses `ShardIdentity::unsharded()` for unsharded tenants.
However, this replaces the persisted stripe size of unsharded tenants
with the default stripe size.

This doesn't really matter for practical purposes, since the stripe size
is meaningless for unsharded tenants anyway, but can cause consistency
check failures if the persisted stripe size differs from the default.
This was seen in #11168, where we change the default stripe size.

Touches #11168.

## Summary of changes

Carry over the persisted stripe size from `TenantShardPersistence` for
unsharded tenants, and from `LocationConf` on Pageservers.

Also add bounds checks for type casts when loading persisted shard
metadata.
2025-03-12 15:16:54 +00:00
Erik Grinaker
1436b8469c pageserver: appease unused lint on macOS (#11192)
## Problem

`info_span!` is only used in a `linux` branch, causing the unused lint
to fire on macOS.

## Summary of changes

Fully qualify the `info_span!` use.
2025-03-12 14:34:29 +00:00
JC Grünhage
fc515e7be2 chore(deps): bump env_logger to 0.11.7 (#11188)
## Problem
`humantime` is unmaintained, we want to migrate to `jiff`, see
https://github.com/neondatabase/neon/issues/11179.

`env_logger` in older versions depend on `humantime`, and newer versions
depend on `jiff`, so we need to update it.

## Summary of changes
Update `env_logger` to the most recent release, which does not depend on
`humantime` anymore.
2025-03-12 14:26:52 +00:00
John Spray
7015dbbdf0 storcon_cli: remove pre-warm helper (#11183)
## Problem

This command was used when onboarding tenants to the storage controller.
We no longer do that, so the command can go.

## Summary of changes

- Remove `storcon_cli tenant-warmup` command
2025-03-12 14:02:11 +00:00
Dmitrii Kovalkov
73e37ae388 Suppress "request was dropped" errors in test_timeline_archive (#11190)
## Problem

Test `test_timeline_archive` is flaky because it makes requests that are
intended to fail. It sometimes leads to warning in pageserver's logs.
More details are in the issue.

- Closes: https://github.com/neondatabase/neon/issues/11177

## Summary of changes
- Suppress such errors.
2025-03-12 13:23:31 +00:00
Vlad Lazar
1c0ff3c04d utils: explicit OTEL export config and OTEL enablement via common entry point (#11139)
We want to export performance traces from the pageserver in OTEL format.
End goal is to see them in Grafana.

To this end, there are two changes here:
1. Update the `tracing-utils` crate to allow for explicitly specifying
the export configuration. Pageserver configuration is loaded from a file
on start-up. This allows us to use the same flow for export configs
there.
2. Update the `utils::logging::init` common entry point to set up OTEL
tracing infrastructure if requested. Note that an entirely different
tracing subscriber is used. This is to avoid interference with the
existing tracing set-up. For now, no service uses this functionality.

PR to plug this into the pageserver is
[here](https://github.com/neondatabase/neon/pull/11140).

Related https://github.com/neondatabase/neon/issues/9873
2025-03-12 11:07:49 +00:00
John Spray
7bf6397334 pageserver: remove legacy TimelineInfo::latest_gc_cutoff field (1/2) (#11149)
## Problem

This field was retained for backward compat only in
https://github.com/neondatabase/neon/pull/10707.

Once https://github.com/neondatabase/cloud/pull/25233 is released,
nothing external will be reading this field.

Internally, this was a mandatory field so storage controller is still
trying to decode it, so we must do this removal in two steps: this PR
makes the field optional, and after one release we can fully remove it.

Related: https://github.com/neondatabase/cloud/issues/24250

## Summary of changes

- Rename field to `_unused`
- Remove field from swagger
- Make field optional
2025-03-12 10:23:41 +00:00
Konstantin Knizhnik
f60ffe3021 Rebase compare local debug mode (#11174)
## Problem

DEBUG_COMPARE_LOCAL mode is broken

See
https://neondb.slack.com/archives/C03QLRH7PPD/p1732862608323269?thread_ts=1732711054.862919&cid=C03QLRH7PPD

## Summary of changes

Fix compile errors and unlogged build issues.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-03-12 05:52:18 +00:00
Arpad Müller
da2431f11f storcon: add --control-plane-url config option (#11173)
Adds the `--control-plane-url` config option to the storcon, which we
want to migrate to instead of using `notify-attach`.

Part of #11163
2025-03-12 02:30:56 +00:00
JC Grünhage
e8396034ac fix(ci): fail meta using jq halt_error if data is unexpectedly missing (#11151)
## Problem
When the githb API is having problems, we might not get data back, and
are happily setting vars as empty. This causes problems down the line.
See
https://github.com/neondatabase/neon/actions/runs/13718859397/job/38381946590?pr=11132#step:5:1
for example.

## Summary of changes
Fail the `meta` job if we don't get expected data back from github.
2025-03-11 22:59:30 +00:00
Tristan Partin
decd265c99 Revert notify to 6.0.0 (#11162)
The upgrade to 8.0.0 caused severe performance regressions in the
start_postgres_ms metric, which measures the time it takes from execing
Postgres to the time Postgres marks itself as ready in the
postmaster.pid file. We use the notify crate to watch for changes in the
pgdata directory and the postmaster.pid file.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-03-11 22:18:09 +00:00
Christian Schwarz
158db414bf buffered writer: handle write errors by retrying all write IO errors indefinitely (#10993)
# Problem

If the Pageserver ingest path
(InMemoryLayer=>EphemeralFile=>BufferedWriter)
encounters ENOSPC or any other write IO error when flushing the mutable
buffer
of the BufferedWriter, the buffered writer is left in a state where
subsequent _reads_
from the InMemoryLayer it will cause a `must not use after we returned
an error` panic.

The reason is that
1. the flush background task bails on flush failure, 
2. causing the `FlushHandle::flush` function to fail at channel.recv()
and
3. causing the `FlushHandle::flush` function to bail with the flush
error,
4. leaving its caller `BufferedWriter::flush` with
`BufferedWriter::mutable = None`,
5. once the InMemoryLayer's RwLock::write guard is dropped, subsequent
reads can enter,
6. those reads find `mutable = None` and cause the panic.

# Context

It has always been the contract that writes against the BufferedWriter
API
must not be retried because the writer/stream-style/append-only
interface makes no atomicity guarantees ("On error, did nothing or a
piece of the buffer get appended?").

The idea was that the error would bubble up to upper layers that can
throw away the buffered writer and create a new one. (See our [internal
error handling policy document on how to handle e.g.
`ENOSPC`](c870a50bc0/src/storage/handling_io_and_logical_errors.md (L36-L43))).

That _might_ be true for delta/image layer writers, I haven't checked.

But it's certainly not true for the ingest path: there are no provisions
to throw away an InMemoryLayer that encountered a write error an
reingest the WAL already written to it.

Adding such higher-level retries would involve either resetting
last_record_lsn to a lower value and restarting walreceiver. The code
isn't flexible enough to do that, and such complexity likely isn't worth
it given that write errors are rare.

# Solution

The solution in this PR is to retry _any_ failing write operation
_indefinitely_ inside the buffered writer flush task, except of course
those that are fatal as per `maybe_fatal_err`.

Retrying indefinitely ensures that `BufferedWriter::mutable` is never
left `None` in the case of IO errors, thereby solving the problem
described above.

It's a clear improvement over the status quo.

However, while we're retrying, we build up backpressure because the
`flush` is only double-buffered, not infinitely buffered.

Backpressure here is generally good to avoid resource exhaustion, **but
blocks reads** and hence stalls GetPage requests because InMemoryLayer
reads and writes are mutually exclusive.
That's orthogonal to the problem that is solved here, though.

## Caveats

Note that there are some remaining conditions in the flush background
task where it can bail with an error. I have annotated one of them with
a TODO comment.
Hence the `FlushHandle::flush` is still fallible and hence the overall
scenario of leaving `mutable = None` on the bail path is still possible.
We can clean that up in a later commit.

Note also that retrying indefinitely is great for temporary errors like
ENOSPC but likely undesirable in case the `std::io::Error` we get is
really due to higher-level logic bugs.
For example, we could fail to flush because the timeline or tenant
directory got deleted and VirtualFile's reopen fails with ENOENT.

Note finally that cancellation is not respected while we're retrying.
This means we will block timeline/tenant/pageserver shutdown.
The reason is that the existing cancellation story for the buffered
writer background task was to recv from flush op channel until the
sending side (FlushHandle) is explicitly shut down or dropped.

Failing to handle cancellation carries the operational risk that even if
a single timeline gets stuck because of a logic bug such as the one laid
out above, we must still restart the whole pageserver process.

# Alternatives Considered

As pointed out in the `Context` section, throwing away a InMemoryLayer
that encountered an error and reingesting the WAL is a lot of complexity
that IMO isn't justified for such an edge case.
Also, it's wasteful.
I think it's a local optimum.

A more general and simpler solution for ENOSPC is to `abort()` the
process and run eviction on startup before bringing up the rest of
pageserver.
I argued for it in the past, the pro arguments are still valid and
complete:
https://neondb.slack.com/archives/C033RQ5SPDH/p1716896265296329
The trouble at the time was implementing eviction on startup.
However, maybe things are simpler now that we are fully storcon-managed
and all tenants have secondaries.
For example, if pageserver `abort()`s on ENOSPC and then simply don't
respond to storcon heartbeats while we're running eviction on startup,
storcon will fail tenants over to the secondary anyway, giving us all
the time we need to clean up.

The downside is that if there's a systemic space management bug, above
proposal will just propagate the problem to other nodes. But I imagine
that because of the delays involved with filling up disks, the system
might reach a half-stable state, providing operators more time to react.

# Demo

Intermediary commit `a03f335121480afc0171b0f34606bdf929e962c5` is demoed
in this (internal) screen recording:
https://drive.google.com/file/d/1nBC6lFV2himQ8vRXDXrY30yfWmI2JL5J/view?usp=drive_link

# Perf Testing

Ran `bench_ingest` on tmpfs, no measurable difference.

Spans are uniquely owned by the flush task, and the span stack isn't too
deep, so, enter and exit should be cheap.
Plus, each flush takes ~150us with direct IO enabled, so, not _that_
high frequency event anyways.

# Refs
- fixes https://github.com/neondatabase/neon/issues/10856
2025-03-11 20:40:23 +00:00
Christian Schwarz
083a30b1e2 storage broker: disable deploy by default (#11172)
context
-
https://github.com/neondatabase/cloud/issues/23486#issuecomment-2711587222
- companion infra.git PR:
https://github.com/neondatabase/infra/pull/3249
2025-03-11 19:45:06 +00:00
Alex Chi Z.
7d221214bb feat(pageserver): support no-yield for gc-compaction (#11184)
## Problem

This should also resolve the test flakiness of `test_gc_feedback`.

close https://github.com/neondatabase/neon/issues/11144

## Summary of changes

If `NoYield` is set, do not yield in gc-compaction.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-11 19:13:52 +00:00
Dmitrii Kovalkov
8983677f29 Ignore cargo deny advisory RUSTSEC-2025-0014 for humantime (#11180)
## Problem

`humantime` is not maintained and `cargo deny check` fails

- Will be addressed in https://github.com/neondatabase/neon/issues/11179

## Summary of changes

Ignore RUSTSEC-2025-0014 advisory for now
2025-03-11 19:09:32 +00:00
Ivan Efremov
011f7c21a3 fix(proxy): Add testodrome query id HTTP header (#11167)
Handle "X-Neon-Query-ID" header to glue data with testodrome queries.

Relates to the #22486
2025-03-11 17:17:30 +00:00
Alex Chi Z.
7588983168 fix(scrubber): log even if no refs are found (#11160)
## Problem

Investigate https://github.com/neondatabase/neon/issues/11159

## Summary of changes

This doesn't fix the issue, but at least we can narrow down the cause
next time it happens by logging ancestor referenced layer cnt even if
it's 0.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-11 14:33:35 +00:00
Arseny Sher
359c64c779 walproposer: pre generations refactoring (#11060)
## Problem

https://github.com/neondatabase/neon/issues/10851

## Summary of changes

Do some refactoring before making walproposer generations aware.

- Rename SS_VOTING to SS_WAIT_VOTING, SS_IDLE to SS_WAIT_ELECTED
- Continue to get rid of epochs: rename GetEpoch to GetLastLogTerm,
donorEpoch to donorLastLogTerm
- Instead of counting n_votes, n_connected, introduce explicit
WalProposerState (collecting terms / voting / elected). Refactor out
TermsCollected and VotesCollected; they will determine state transition
differently depending whether generations are enabled or not.

There is no new logic in this PR and thus no new tests.
2025-03-11 14:01:00 +00:00
Erik Grinaker
f466c01995 pageserver: add max_logical_size_per_shard for get_top_tenants (#11157)
## Problem

In #11122, we want to split shards once the logical size of the largest
timeline exceeds a split threshold. However, `get_top_tenants` currently
only returns `max_logical_size`, which tracks the max _total_ logical
size of a timeline across all shards.

This is problematic, because the storage controller needs to fetch a
list of N tenants that are eligible for splits, but the API doesn't
currently have a way to express this. For example, with a split
threshold of 1 GB, a tenant with `max_logical_size` of 4 GB is eligible
to split if it has 1 or 2 shards, but not if it already has 4 shards. We
need to express this in per-shard terms, otherwise the `get_top_tenants`
endpoint may end up only returning tenants that can't be split, blocking
splits entirely.

Touches https://github.com/neondatabase/neon/pull/11122.
Touches https://github.com/neondatabase/cloud/issues/22532.

## Summary of changes

Add `TenantShardItem::max_logical_size_per_shard` containing
`max_logical_size / shard_count`, and
`TenantSorting::MaxLogicalSizePerShard` to order and filter by it.
2025-03-11 11:43:55 +00:00
Conrad Ludgate
d1b60fa0b6 fix(proxy): delete prepared statements when discarding (#11165)
Fixes https://github.com/neondatabase/serverless/issues/144

When tables have enums, we need to perform type queries for that data.
We cache these query statements for performance reasons. In Neon RLS, we
run "discard all" for security reasons, which discards all the
statements. When we need to type check again, the statements are no
longer valid.

This fixes it to discard the statements as well.

I've also added some new logs and error types to monitor this. Currently
we don't see the prepared statement errors in our logs.
2025-03-11 10:48:50 +00:00
Christian Schwarz
7c462b3417 impr: propagate VirtualFile metrics via RequestContext (#7202)
# Refs

- fixes https://github.com/neondatabase/neon/issues/6107

# Problem

`VirtualFile` currently parses the path it is opened with to identify
the `tenant,shard,timeline` labels to be used for the `STORAGE_IO_SIZE`
metric.

Further, for each read or write call to VirtualFile, it uses
`with_label_values` to retrieve the correct metrics object, which under
the hood is a global hashmap guarded by a parking_lot mutex.

We perform tens of thousands of reads and writes per second on every
pageserver instance; thus, doing the mutex lock + hashmap lookup is
wasteful.

# Changes

Apply the technique we use for all other timeline-scoped metrics to
avoid the repeat `with_label_values`: add it to `TimelineMetrics`.

Wrap `TimelineMetrics` into an `Arc`.

Propagate the `Arc<TimelineMetrics>` down do `VirtualFile`, and use
`Timeline::metrics::storage_io_size`.

To avoid contention on the `Arc<TimelineMetrics>`'s refcount atomics
between different connection handlers for the same timeline, we wrap it
into another Arc.

To avoid frequent allocations, we store that Arc<Arc<TimelineMetrics>>
inside the per-connection timeline cache.

Preliminary refactorings to enable this change:
- https://github.com/neondatabase/neon/pull/11001
- https://github.com/neondatabase/neon/pull/11030


# Performance

I ran the benchmarks in
`test_runner/performance/pageserver/pagebench/test_pageserver_max_throughput_getpage_at_latest_lsn.py`
on an `i3en.3xlarge` because that's what we currently run them on.

None of the benchmarks shows a meaningful difference in latency or
throughput or CPU utilization.

I would have expected some improvement in the
many-tenants-one-client-each workload because they all hit that hashmap
constantly, and clone the same `UintCounter` / `Arc` inside of it.

But apparently the overhead is miniscule compared to the remaining work
we do per getpage.

Yet, since the changes are already made, the added complexity is
manageable, and the perf overhead of `with_label_values` demonstrable in
micro-benchmarks, let's have this change anyway.
Also, propagating TimelineMetrics through RequestContext might come in
handy down the line.

The micro-benchmark that demonstrates perf impact of
`with_label_values`, along with other pitfalls and mitigation techniques
around the `metrics`/`prometheus` crate:
- https://github.com/neondatabase/neon/pull/11019

# Alternative Designs

An earlier iteration of this PR stored an `Arc<Arc<Timeline>>` inside
`RequestContext`.
The problem is that this risks reference cycles if the RequestContext
gets stored in an object that is owned directly or indirectly by
`Timeline`.

Ideally, we wouldn't be using this mess of Arc's at all and propagate
Rust references instead.
But tokio requires tasks to be `'static`, and so, we wouldn't be able to
propagate references across task boundaries, which is incompatible with
any sort of fan-out code we already have (e.g. concurrent IO) or future
code (parallel compaction).
So, opt for Arc for now.
2025-03-11 07:23:06 +00:00
Christian Schwarz
420f7b07b4 add benchmark demonstrating metrics/prometheus crate multicore scalability pitfalls & workarounds (#11019)
We use the `metrics` / `prometheus` crate in the Pageserver code base.

This PR demonstrates
- typical performance pitfalls with that crate
- our current set of techniques to avoid most of these pitfalls.

refs
- https://github.com/neondatabase/neon/issues/10948
- https://github.com/neondatabase/neon/pull/7202 
- I applied the `label_values__cache_label_values_lookup` technique
there.
  - It didn't yield measurable results in high-level benchmarks though.
2025-03-11 07:22:56 +00:00
Arpad Müller
4d3c477689 storcon: timetime table, creation and deletion (#11058)
This PR extends the storcon with basic safekeeper management of
timelines, mainly timeline creation and deletion. We want to make the
storcon manage safekeepers in the future. Timeline creation is
controlled by the `--timelines-onto-safekeepers` flag.

1. it adds the `timelines` and `safekeeper_timeline_pending_ops` tables
to the storcon db
2. extend code for the timeline creation and deletion
4. it adds per-safekeeper reconciler tasks 

TODO:

* maybe not immediately schedule reconciliations for deletions but have
a prior manual step
* tenant deletions
* add exclude API definitions (probably separate PR)
* how to choose safekeeper to do exclude on vs deletion? this can be a
bit hairy because the safekeeper might go offline in the meantime.
* error/failure case handling
* tests (cc test_explicit_timeline_creation from #11002)
* single safekeeper mode: we often only have one SK (in tests for
example)
* `notify-safekeepers` hook:
https://github.com/neondatabase/neon/issues/11163

TODOs implemented:

* cancellations of enqueued reconciliations on a per-timeline basis,
helpful if there is an ongoing deletion
* implement pending ops overwrite behavior
* load pending operations from db

RFC section for important reading:
[link](https://github.com/neondatabase/neon/blob/main/docs/rfcs/035-safekeeper-dynamic-membership-change.md#storage_controller-implementation)

Implements the bulk of #9011
Successor of #10440.

---------

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2025-03-11 02:31:22 +00:00
Alex Chi Z.
3451bdd3d2 fix(test): force L0 compaction before gc-compaction (#11143)
## Problem

Fix test flakyness of `test_gc_feedback`

Closes: https://github.com/neondatabase/neon/issues/11153

## Summary of changes

Looking at the log, gc-compaction is interrupted by L0 compaction.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-10 20:03:49 +00:00
Konstantin Knizhnik
fb1957936c Fix caclulation of LFC used_pages (#11095)
## Problem

Async prefetch in LFC PR cause incorrect calculation of LFC
`used_pages`when page is overwritten

## Summary of changes

Decrement `used_pages` is page is overwritten.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Matthias van de Meent <matthias@neon.tech>
2025-03-10 18:28:55 +00:00
Matthias van de Meent
bc052fd0fc Add configuration options to disable prevlink checks (#11138)
This allows for improved decoding of otherwise broken WAL.

## Problem

Currently, if (or when) a WAL record has a wrong prevptr, that breaks
decoding. With this, we don't have to break on that if we decide it's OK
to proceed after that.

## Summary of changes

Use a Neon GUC to allow the system to enable the NEON-specific
skip_lsn_checks option in XLogReader.
2025-03-10 17:02:30 +00:00
Vlad Lazar
8c553297cb safekeeper: use max end lsn as start of next batch (#11152)
## Problem

Partial reads are still problematic. They are stored in the buffer of
the wal decoder and result in gaps being reported too eagerly on the
pageserver side.

## Summary of changes

Previously, we always used the start LSN of the chunk of WAL that was
just read. This patch switches to using the end LSN of the last record
that was decoded in the previous iteration.
2025-03-10 16:33:28 +00:00
Dmitrii Kovalkov
63b22d3fb1 pageserver: https for management API (#11025)
## Problem

Storage controller uses unencrypted HTTP requests for pageserver
management API.

Closes: https://github.com/neondatabase/cloud/issues/24283


## Summary of changes
- Implement `http_utils::server::Server` with TLS support.
- Replace `hyper0::server::Server` with `http_utils::server::Server` in
pageserver.
- Add HTTPS handler for pageserver management API.
- Generate local SSL certificates in neon local.
2025-03-10 15:07:59 +00:00
JC Grünhage
f17931870f fix(ci): use <!subteam^ID> syntax for pinging groups on slack (#11135)
## Problem
Pinging groups on slack didn't work, because I didn't use the correct
syntax.

## Summary of changes
Use the correct syntax for pinging groups.
2025-03-10 13:27:23 +00:00
Arpad Müller
33c3c34c95 Appease cargo deny errors (#11142)
* pprof can also use `prost` as a backend, switch to it as `protobuf`
has no update available but a security issue.
* `paste` is a build time dependency, so add the unmaintained warning as
an exception.
2025-03-10 13:24:14 +00:00
Ivan Efremov
5d38fd6c43 fix(proxy): Use testodrome query id for latency measurement (#11150)
Add a new neon option "neon_query_id" to glue data with testodrome
queries. Log latency in microseconds always.

Relates to the #22486
2025-03-10 12:55:16 +00:00
Dmitrii Kovalkov
66881b4394 storcon: update scheduler stats when changing node's preferred az (#11147)
## Problem

`home_shard_count` is not updated on the preferred AZ change.

Closes: https://github.com/neondatabase/neon/issues/10493

## Summary of changes
- Update scheduler stats (node ref counts) on preferred AZ change.
2025-03-10 11:33:00 +00:00
Konstantin Knizhnik
c87d307e8c Print state of connection buffer when no response is receioved from PS for a long time (#11145)
## Problem

See https://neondb.slack.com/archives/C08DE6Q9C3B

Sometimes compute is not able to receive responses from PS for a long
time (minutes).
I do not think that the problem is at compute side, but in order to
exclude this possibility I wan to see more information about connection
state at compute side, particularly amount of data cached in connection
buffer.

## Summary of changes

Right now we are dumping state of socket buffer.
This PR adds printing state of connection buffer.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-03-09 18:36:36 +00:00
Tristan Partin
1b8c4286c4 Fetch remote extension in ALTER EXTENSION UPDATE statements (#11102)
Previously, remote extensions were not fetched unless they were used in
some other manner. For instance, loading a BM25 index in pg_search
fetches the pg_search extension. However, if on a fresh compute with
pg_search 0.15.5 installed, the user ran `ALTER EXTENSION pg_search
UPDATE TO '0.15.6'` without first using the pg_search extension, we
would not fetch the extension and fail to find an update path.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-03-09 17:29:44 +00:00
Tristan Partin
3fe5650039 Fix dropping role with table privileges granted by non-neon_superuser (#10964)
We were previously only revoking privileges granted by neon_superuser.
However, we need to do it for all grantors.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-03-07 19:00:11 +00:00
Alex Chi Z.
cd438406fb feat(pageserver): add force patch index_part API (#11119)
## Problem

As part of the disaster recovery tool. Partly for
https://github.com/neondatabase/neon/issues/9114.

## Summary of changes

* Add a new pageserver API to force patch the fields in index_part and
modify the timeline internal structures.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-07 17:42:52 +00:00
Dmitrii Kovalkov
e876794ce5 storcon: use https safekeeper api (#11065)
## Problem

Storage controller uses http for requests to safekeeper management API.

Closes: https://github.com/neondatabase/cloud/issues/24835

## Summary of changes
- Add `use_https_safekeeper_api` option to storcon to use https api
- Use https for requests to safekeeper management API if this option is
enabled
- Add `ssl_ca_file` option to storcon for ability to specify custom root
CA certificate
2025-03-07 17:22:47 +00:00
John Spray
87e6117dfd storage controller: API-driven graceful migrations (#10913)
## Problem

The current migration API does a live migration, but if the destination
doesn't already have a secondary, that live migration is unlikely to be
able to warm up a tenant properly within its timeout (full warmup of a
big tenant can take tens of minutes).

Background optimisation code knows how to do this gracefully by creating
a secondary first, but we don't currently give a human a way to trigger
that.

Closes: https://github.com/neondatabase/neon/issues/10540

## Summary of changes

- Add `prefererred_node` parameter to TenantShard, which is respected by
optimize_attachment
- Modify migration API to have optional prewarm=true mode, in which we
set preferred_node and call optimize_attachment, rather than directly
modifying intentstate
- Require override_scheduler=true flag if migrating somewhere that is a
less-than-optimal scheduling location (e.g. wrong AZ)
- Add `origin_node_id` to migration API so that callers can ensure
they're moving from where they think they're moving from
- Add tests for the above

The storcon_cli wrapper for this has a 'watch' mode that waits for
eventual cutover. This doesn't show the warmth of the secondary evolve
because we don't currently have an API for that in the controller, as
the passthrough API only targets attached locations, not secondaries. It
would be straightforward to add later as a dedicated endpoint for
getting secondary status, then extend the storcon_cli to consume that
and print a nice progress indicator.
2025-03-07 17:02:38 +00:00
Vlad Lazar
084fc4a757 pageserver: enable previous heatmaps by default (#11132)
We add the off by default configs in
https://github.com/neondatabase/neon/pull/11088 because
the unarchival heatmap was causing oversized secondary locations. That
was fixed in https://github.com/neondatabase/neon/pull/11098, so let's
turn them on by default.
2025-03-07 16:05:31 +00:00
Vlad Lazar
937876cbe2 safekeeper: don't skip empty records for shard zero (#11137)
## Problem

Shard zero needs to track the start LSN of the latest record
in adition to the LSN of the next record to ingest. The former
is included in basebackup persisted by the compute in WAL.

Previously, empty records were skipped for all shards. This caused
the prev LSN tracking on the PS to fall behind and led to logical
replication
issues.

## Summary of changes

Shard zero now receives emtpy interpreted records for LSN tracking
purposes.
A test is included too.
2025-03-07 15:52:01 +00:00
Alexander Lakhin
a4ce20db5c Support workflow_dispatch event in _meta.yml (#11133)
## Problem
Allow for using _meta.yml with workflow_dispatch event.

## Summary of changes
Handle this event in the run-kind step; fix and update the description
of the run-kind output.
2025-03-07 15:00:06 +00:00
Erik Grinaker
eedd179f0c storcon: initial autosplit tweaks (#11134)
## Problem

This patch makes some initial tweaks as preparation for
https://github.com/neondatabase/cloud/issues/22532, where we will be
introducing additional autosplit logic.

The plan is outlined in
https://github.com/neondatabase/cloud/issues/22532#issuecomment-2706215907.

## Summary of changes

Minor code cleanups and behavioral changes:

* Decide that we'll split based on `max_logical_size` (later possibly
`total_logical_size`).
* Fix a bug that would split the smallest candidate rather than the
largest.
* Pick the largest candidate by `max_logical_size` rather than
`resident_size`, for consistency (debatable).
* Split out `get_top_tenant_shards()` to fetch split candidates.
* Fetch candidates concurrently from all nodes.
* Make `TenantShard.get_scheduling_policy()` return a copy instead of a
reference.
2025-03-07 14:38:01 +00:00
Arpad Müller
f1b18874c3 storcon: require safekeeper jwt's in strict mode (#11116)
We have introduced the ability to specify safekeeper JWTs for the
storage controller. It now does heartbeats. We now want to also require
the presence of those JWTs.

Let's merge this PR shortly after the release cutoff.

Part of / follow-up of
https://github.com/neondatabase/cloud/issues/24727
2025-03-07 13:29:48 +00:00
Fedor Dikarev
db77896e92 remove CODEOWNER assignement for the test_runner/ (#11130)
## Problem
That adds an extra unnecessary load to the PerfCorr team

## Summary of changes
Remove `CODEOWNERS` assignment for the `test_runner/` folder
2025-03-07 12:38:27 +00:00
Alexey Kondratov
f5aa8c3eac feat(compute_ctl): Add a basic HTTP API benchmark (#11123)
## Problem

We just had a regression reported at
https://neondb.slack.com/archives/C08EXUJF554/p1741102467515599, which
clearly came with one of the releases. It's not a huge problem yet, but
it's annoying that we cannot quickly attribute it to a specific commit.

## Summary of changes

Add a very simple `compute_ctl` HTTP API benchmark that does 10k
requests to `/status` and `metrics.json` and reports p50 and p99.

---------

Co-authored-by: Peter Bendel <peterbendel@neon.tech>
2025-03-07 12:35:42 +00:00
Arpad Müller
cea67fc062 update ring to 0.17.13 (#11131)
Update ring from 0.17.6 to 0.17.13. Addresses the advisory:
https://rustsec.org/advisories/RUSTSEC-2025-0009
2025-03-07 12:17:04 +00:00
Alex Chi Z.
e825974a2d feat(pageserver): yield gc-compaction to L0 compaction (#11120)
## Problem

Part of https://github.com/neondatabase/neon/issues/9114

## Summary of changes

gc-compaction could take a long time in some cases, for example, if the
job split heuristics is wrong and we selected a too large region for
compaction that can't be finished within a reasonable amount of time. We
will give up such tasks and yield to L0 compaction. Each gc-compaction
sub-compaction job is atomic and cannot be split further so we have to
give up (instead of storing a state and continue later as in image
creation).

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-06 20:30:11 +00:00
Fedor Dikarev
50d883d516 Add performance-correctness to the CODEOWNERS (#11124)
## Problem
After splitting teams it became a bit more complicated for the PerfCorr
team to work on tests changes.

## Summary of changes
1. Add PerfCorr team co-owners for `.github/` folder
2. Add PerCorr team as owner for `test_runner/` folder
2025-03-06 19:59:17 +00:00
Alexey Kondratov
a485022300 fix(compute_ctl): Properly escape identifiers inside PL/pgSQL blocks (#11045)
## Problem

In f37eeb56, I properly escaped the identifier, but I haven't noticed
that the resulting string is used in the `format('...')`, so it needs
additional escaping. Yet, after looking at it closer and with Heikki's
and Tristan's help, it appeared to be that it's a full can of worms and
we have problems all over the code in places where we use PL/pgSQL
blocks.

## Summary of changes

Add a new `pg_quote_dollar()` helper to deal with it, as dollar-quoting
of strings seems to be the only robust way to escape strings in dynamic
PL/pgSQL blocks. We mimic the Postgres' `pg_get_functiondef` logic here
[1].

While on it, I added more tests and caught a couple of more bugs with
string escaping:

1. `get_existing_dbs_async()` was wrapping `owner` in additional
double-quotes if it contained special characters
2. `construct_superuser_query()` was flawed in even more ways than the
rest of the code. It wasn't realistic to fix it quickly, but after
thinking about it more, I realized that we could drop most of it
altogether. IIUC, it was added as some sort of migration, probably back
when we haven't had migrations yet. So all the complicated code was
needed to properly update existing roles and DBs. In the current Neon,
this code only runs before we create the very first DB and role. When we
create roles and DBs, all `neon_superuser` grants are added in the
different places. So the worst thing that could happen is that there is
an ancient branch somewhere, so when users poke it, they will realize
that not all Neon features work as expected. Yet, the fix is simple and
self-serve -- just create a new role via UI or API, and it will get a
proper `neon_superuser` grant.

[1]:
8b49392b27/src/backend/utils/adt/ruleutils.c (L3153)

Closes neondatabase/cloud#25048
2025-03-06 19:54:29 +00:00
Anastasia Lubennikova
3dee29eb00 Spawn rsyslog from neonvm (#11111)
then configure it from compute_ctl.
to make it more robust in case of restarts and rsyslogd crashes.
2025-03-06 19:14:19 +00:00
Peter Bendel
3bb318a295 run periodic page bench more frequently to simplify bi-secting regressions (#11121)
## Problem

When periodic pagebench runs only once a day a lot of commits can be in
between a good run and a regression.

## Summary of changes

Run the workflow every 3 hours
2025-03-06 17:47:54 +00:00
Alex Chi Z.
11334a2cdb feat(pageserver): more statistics for gc-compaction (#11103)
## Problem

part of https://github.com/neondatabase/neon/issues/9114

## Summary of changes

* Add timers for each phase of the gc-compaction.
* Add a final ratio computation to directly show the garbage collection
ratio in the logs.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-06 16:44:00 +00:00
Alexey Kondratov
4b77807de9 fix(compute/sql_exporter): Ignore invalid DBs when collecting size (#11097)
## Problem

Original Slack discussion:
https://neondb.slack.com/archives/C04DGM6SMTM/p1739915430147169

TL;DR in Postgres, it's totally normal to have 'invalid' DBs (state
after the interrupted `DROP DATABASE`). Yet, some of our metrics
collected with `sql_exporter` try to get the size of such invalid DBs.

Typical log lines:
```
time=2025-03-05T16:30:32.368Z level=ERROR source=promhttp.go:52 msg="Error gathering metrics" error="[from Gatherer #1] [collector=neon_collector,query=pg_stats_userdb] pq: [NEON_SMGR] [reqid 0] could not read db size of db 173228 from page server at lsn 0/44A0E8C0"
time=2025-03-05T16:30:32.369Z level=ERROR source=promhttp.go:52 msg="Error gathering metrics" error="[from Gatherer #1] [collector=neon_collector,query=db_total_size] pq: [NEON_SMGR] [reqid 0] could not read db size of db 173228 from page server at lsn 0/44A0E8C0"
```

## Summary of changes

Ignore invalid DBs in these two metrics -- `pg_stats_userdb` and
`db_total_size`
2025-03-06 15:32:17 +00:00
Vlad Lazar
5ceb8c994d pageserver: mark unarchival heatmap layers as cold (#11098)
## Problem

On unarchival, we update the previous heatmap with all visible layers.
When the primary generates a new heatmap it includes all those layers,
so the secondary will download them. Since they're not actually resident
on the primary (we didn't call the warm up API), they'll never be
evicted,
so they remain in the heatmap.

We want these layers in the heatmap, since we might wish to warm-up an
unarchived timeline after a shard migration. However, we don't want them
to be downloaded on the secondary until we've warmed up the primary.

## Summary of Changes

Include these layers in the heatmap and mark them as cold. All heatmap
operations act on non-cold layers apart from the attached location
warming up API,
which will download the cold layers. Once the cold layers are downloaded
on the primary,
they'll be included in the next heatmap as hot and the secondary starts
fetching them too.
2025-03-06 11:25:02 +00:00
Vlad Lazar
43cea0df91 pageserver: allow for unit test stress test (#11112)
## Problem

I like using `cargo stress` to hammer on a test, but it doesn't work out
of the box because it does parallel runs by default and tests always use
the same repo dir.

## Summary of changes

Add an uuid to the test repo dir when generating it.
2025-03-06 11:23:25 +00:00
Erik Grinaker
ab7efe9e47 pageserver: add amortized read amp metrics (#11093)
## Problem

In a batch, `pageserver_layers_per_read_global` counts all layer visits
towards every read in the batch, since this directly affects the
observed latency of the read. However, this doesn't give a good picture
of the amortized read amplification due to batching.

## Summary of changes

Add two more global read amp metrics:

* `pageserver_layers_per_read_batch_global`: number of layers visited
per batch.
* `pageserver_layers_per_read_amortized_global`: number of layers
divided by reads in a batch.
2025-03-06 10:23:48 +00:00
Folke Behrens
16b8a3f598 Update Jinja2 to 3.1.6 (#11109)
https://github.com/neondatabase/neon/security/dependabot/89
2025-03-06 09:55:41 +00:00
Conrad Ludgate
85072b715f Merge pull request #11106 from neondatabase/rc/release-proxy/2025-03-06
Proxy release 2025-03-06
2025-03-06 09:53:00 +00:00
Folke Behrens
f343537e4d proxy: Small adjustments to json logging (#11107)
* Remove callsite identifier registration on span creation. Forgot to
remove from last PR. Was part of alternative idea.
* Move "spans" object to right after "fields", so event and span fields
are listed together.
2025-03-06 09:18:28 +00:00
github-actions[bot]
6c86fe7143 Proxy release 2025-03-06 2025-03-06 06:02:15 +00:00
Alex Chi Z.
78b322f616 rfc: add 041-rel-sparse-keyspace (#10412)
Based on the PoC patch I've done in #10316, I'd like to put an RFC in
advance to ensure everyone is on the same page, and start incrementally
port the code to the main branch.

https://github.com/neondatabase/neon/issues/9516


[Rendered](https://github.com/neondatabase/neon/blob/skyzh/rfc-041-rel-sparse-keyspace/docs/rfcs/041-rel-sparse-keyspace.md)

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Erik Grinaker <erik@neon.tech>
2025-03-05 21:43:16 +00:00
Alex Chi Z.
2de3629b88 test(pageserver): use reldirv2 by default in regress tests (#11081)
## Problem

For pg_regress test, we do both v1 and v2; for all the rest, we default
to v2.

part of https://github.com/neondatabase/neon/issues/9516

## Summary of changes

Use reldir v2 across test cases by default.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-05 21:02:44 +00:00
Em Sharnoff
1fe23fe8d2 compute/lfc: Add chunk size to neon_lfc_stats (#11100)
This PR adds a new key to neon.neon_lfc_stats —
'file_cache_chunk_size_pages'. It just returns the value of
BLOCKS_PER_CHUNK from the LFC implementation.

The new value should (eventually) allow changing the chunk size without
breaking any places that rely on LFC stats values measured in number of
chunks.

See neondatabase/cloud#25170 for more.
2025-03-05 20:35:08 +00:00
Peter Bendel
604eb5e8d4 fix grafana dashboard link for pooler endoints (#11099)
## Problem

Our benchmarking workflows contain links to grafana dashboards to
troubleshoot problems.
This works fine for non-pooled endpoints.
For pooled endpoints we need to remove the `-pooler` suffix from the
endpoint's hostname to get a valid endpoint ID.

Example link that doesn't work in this run


https://github.com/neondatabase/neon/actions/runs/13678933253/job/38246028316#step:8:311

## Summary of changes

Check if connection string is a -pooler connection string and if so
remove this suffix from the endpoint ID.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2025-03-05 20:01:17 +00:00
Tristan Partin
d599d2df80 Update postgres_exporter to 0.17.1 (#11094)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-03-05 18:32:45 +00:00
Alexey Kondratov
8263107f6c feat(compute): Add filename label to remote ext requests metric (#11091)
## Problem

We realized that we may use this metric for more 'live' info about
extension installations vs. what we have with installed extensions
metric, which is only updated at start, atm.

## Summary of changes

Add `filename` label to `compute_ctl_remote_ext_requests_total`. Note
that it contains the raw archive name with `.tar.zst` at the end, so the
consumer may need to strip this suffix.

Closes https://github.com/neondatabase/cloud/issues/24694
2025-03-05 18:17:57 +00:00
Anastasia Lubennikova
d94fc75cfc Setup compute_ctl pgaudit and rsyslog (#10615)
Setup pgaudit and pgauditlogtofile extensions
in compute_ctl when the ComputeAuditLogLevel is
set to 'hipaa'.

See cloud PR https://github.com/neondatabase/cloud/pull/24568

Add rsyslog setup for compute_ctl.
Spin up a rsyslog server in the compute VM,
and configure it to send logs to the endpoint
specified in AUDIT_LOGGING_ENDPOINT env.
2025-03-05 18:01:00 +00:00
Alex Chi Z.
9cdc8c0e6c feat(pageserver): revisit error types for gc-compaction (#11082)
## Problem

part of https://github.com/neondatabase/neon/issues/9114

We used anyhow::Error everywhere and it's time to fix.

## Summary of changes

* Make sure that cancel errors are correctly propagated as
CompactionError::ShuttingDown.
* Skip all the trigger computation work if gc_cutoff is not generated
yet.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-05 15:57:38 +00:00
Arpad Müller
2d45522fa6 storcon db: load safekeepers from DB again (#11087)
Earlier PR #11041 soft-disabled the loading code for safekeepers from
the storcon db. This PR makes us load the safekeepers from the database
again, now that we have [JWTs available on
staging](https://github.com/neondatabase/neon/pull/11087) and soon on
prod.

This reverts commit 23fb8053c5.

Part of https://github.com/neondatabase/cloud/issues/24727
2025-03-05 15:45:43 +00:00
JC Grünhage
94e6897ead fix(ci): make deploy job depend on pushing images to dev registries (#11089)
## Problem
If an image fails to push to dev registries, we shouldn't trigger the
deploy job, because that depends on images existing in dev registries.
To ensure this is the case, the deploy job needs to depend on pushing to
dev registries.

## Summary of changes

Make `deploy` depend on `push-neon-image-dev` and
`push-compute-image-dev`.
2025-03-05 14:28:43 +00:00
Erik Grinaker
332aae1484 test_runner/regress: speed up test_check_visibility_map (#11086)
## Problem

`test_check_visibility_map` is the slowest test in CI, and can cause
timeouts under particularly slow configurations (`debug` and
`without-lfc`).

## Summary of changes

* Reduce the `pgbench` scale factor from 10 to 8.
* Omit a redundant vacuum during `pgbench` init.
* Remove a final `vacuum freeze` + `pg_check_visible` pass, which has
questionable value (we've already done a vacuum freeze previously, and
we don't flush the compute cache before checking anyway).
2025-03-05 13:50:35 +00:00
Vlad Lazar
8c12ccf729 pageserver: gate previous heatmap behind config flag (#11088)
## Problem

On unarchival, we update the previous heatmap with all visible layers.
When the primary generates a new heatmap it includes all those layers,
so the secondary will download them. Since they're not actually resident
on the primary (we didn't call the warm up API), they'll never be
evicted, so they remain in the heatmap.

This leads to oversized secondary locations like we saw in pre-prod.

## Summary of changes

Gate the loading of the previous heatmaps and the heatmap generation on
unarchival behind configuration
flags. They are disabled by default, but enabled in tests.
2025-03-05 12:20:18 +00:00
Vlad Lazar
abae7637d6 pageserver: do big reads to fetch slru segment (#11029)
## Problem

Each page of the slru segment is fetched individually when it's loaded
on demand.

## Summary of Changes

Use `Timeline::get_vectored` to fetch 16 at a time.
2025-03-05 11:55:55 +00:00
Anastasia Lubennikova
38a883118a Skip dropping tablesync replication slots on the publisher from branch (#11073)
fixes https://github.com/neondatabase/cloud/issues/24292

Do not drop tablesync replication slots on the publisher, 
when we're in the process of dropping subscriptions inherited by a neon
branch.
Because these slots are still needed by the parent branch subscriptions.

For regular slots we handle this by setting the slot_name to NONE
before calling DROP SUBSCRIPTION, but tablesync slots are not exposed to
SQL.

rely on GUC disable_logical_replication_subscribers=true 
to know that we're in the Neon-specific process of dropping
subscriptions.
2025-03-05 11:29:46 +00:00
Erik Grinaker
40aa4d7151 utils: log Sentry initialization (#11077)
## Problem

We don't have any logging for Sentry initialization. This makes it hard
to verify that it has been configured correctly.

## Summary of changes

Log some basic info when Sentry has been initialized, but omit the
public key (which allows submitting events). Also log when `SENTRY_DSN`
isn't specified at all, and when it fails to initialize (which is
supposed to panic, but we may as well).
2025-03-05 11:23:07 +00:00
Folke Behrens
8e51bfc597 proxy: JSON logging field refactor (#11078)
## Problem

Grafana Loki's JSON handling is somewhat limited and the log message
should be structured in a way that it's easy to sift through logs and
filter.

## Summary of changes

* Drop span_id. It's too short lived to be of value and only bloats the
logs.
* Use the span's name as the object key, but append a unique numeric
value to prevent name collisions.
* Extract interesting span fields into a separate object at the root.

New format:
```json
{
  "timestamp": "2025-03-04T18:54:44.134435Z",
  "level": "INFO",
  "message": "connected to compute node at 127.0.0.1 (127.0.0.1:5432) latency=client: 22.002292ms, cplane: 0ns, compute: 5.338875ms, retry: 0ns",
  "fields": {
    "cold_start_info": "unknown"
  },
  "process_id": 56675,
  "thread_id": 9122892,
  "task_id": "24",
  "target": "proxy::compute",
  "src": "proxy/src/compute.rs:288",
  "trace_id": "5eb89b840ec63fee5fc56cebd633e197",
  "spans": {
    "connect_request#1": {
      "ep": "endpoint",
      "role": "proxy",
      "session_id": "b8a41818-12bd-4c3f-8ef0-9a942cc99514",
      "protocol": "tcp",
      "conn_info": "127.0.0.1"
    },
    "connect_to_compute#6": {},
    "connect_once#8": {
      "compute_id": "compute",
      "pid": "853"
    }
  },
  "extract": {
    "session_id": "b8a41818-12bd-4c3f-8ef0-9a942cc99514"
  }
}
```
2025-03-05 10:27:46 +00:00
Peter Bendel
906d7468cc exclude separate perf tests from bench step (#11084)
## Problem

Our benchmarking workflow has a job step `bench`which runs all tests in
test_runner/performance/* except those that we want to run separately.
We recently added two test cases to that testcase directory that we want
to run separately but forgot to ignore them during the bench step. This
is now causing
[failures](https://github.com/neondatabase/neon/actions/runs/13667689340/job/38212087331#step:7:392).

## Summary of changes

Ignore the separately run tests in the bench step.
2025-03-05 10:14:51 +00:00
Konstantin Knizhnik
438f7bb726 Check response status in prefetch_lookup (#11080)
## Problem

New async prefetch introduces `prefetch+lookup[` function which is
called before LFC lookup to check if prefetch request is already
completed.

This function is not containing now check that response is actually
`T_NeonGetPageResponse` (and not error).

## Summary of changes

Add checks for response tag.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-03-05 10:03:09 +00:00
Peter Bendel
f62ddb11ed Distinguish manually submitted runs for periodic pagebench in grafana dashboard (#11079)
## Problem

Periodic pagebench workflow runs periodically from latest main commit
and also allows to dispatch it manually for a given commit hash to
bi-sect regressions.
However in the dashboards we can not distinguish manual runs from
periodic runs which makes it harder to follow the trend.

## Summary of changes

Send an additional flag commit type to the benchmark runner instance to
distinguish the run type.

Note: this needs a follow-up PR on the receiving side.
2025-03-04 18:11:43 +00:00
Tristan Partin
7b7e4a9fd3 Authorize compute_ctl requests from the control plane (#10530)
The compute should only act if requests come from the control plane.

Signed-off-by: Tristan Partin <tristan@neon.tech>

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-03-04 18:08:00 +00:00
Erik Grinaker
4bbdb758ec compute_tools: appease unused lint on macOS (#11074)
## Problem

On macOS, the `unused` lint complains about two variables not used in
`!linux` builds.

These were introduced in #11007.

## Summary of changes

Appease the linter by explicitly using the variables in `!linux`
branches.
2025-03-04 16:39:32 +00:00
Alex Chi Z.
20af9cef17 fix(test): use the same value for reldir v1+v2 (#11070)
## Problem

part of https://github.com/neondatabase/neon/issues/11067

My observation is that with the current value of settings, x86-v1
usually takes 30s, arm-v1 1m30s, x86-v2 1m, arm-v2 3m. But sometimes the
system could run too slow and cause test to timeout on arm with reldir
v2.

While I investigate what's going on and further improve the performance,
I'd like to set both of them to use the same test input, so that it
doesn't timeout and we don't abuse this test case as a performance test.

## Summary of changes

Use the same settings for both test cases.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-04 14:55:50 +00:00
Erik Grinaker
a2902e774a http-utils: generate heap profiles with jemalloc_pprof (#11075)
## Problem

The code to generate symbolized pprof heap profiles and flamegraph SVGs
has been upstreamed to the `jemalloc_pprof` crate:

* https://github.com/polarsignals/rust-jemalloc-pprof/pull/22
* https://github.com/polarsignals/rust-jemalloc-pprof/pull/23

## Summary of changes

Use `jemalloc_pprof` to generate symbolized pprof heap profiles and
flamegraph SVGs.

This reintroduces a bunch of internal jemalloc stack frames that we'd
previously strip, e.g. each stack now always ends with
`prof_backtrace_impl` (where jemalloc takes a stack trace for heap
profiling), but that seems ok.
2025-03-04 12:13:41 +00:00
Vlad Lazar
435bf452e6 tests: remove obsolete err log whitelisting (#11069)
The pageserver read path now supports overlapped in-memory and image
layers via
https://github.com/neondatabase/neon/pull/11000. These allowed errors
are now obsolete.
2025-03-04 08:18:19 +00:00
Erik Grinaker
65addfc524 storcon: add per-tenant rate limiting for API requests (#10924)
## Problem

Incoming requests often take the service lock, and sometimes even do
database transactions. That creates a risk that a rogue client can
starve the controller of the ability to do its primary job of
reconciling tenants to an available state.

## Summary of changes

* Use the `governor` crate to rate limit tenant requests at 10 requests
per second. This is ~10-100x lower than the worst "attack" we've seen
from a client bug. Admin APIs are not rate limited.
* Add a `storage_controller_http_request_rate_limited` histogram for
rate limited requests.
* Log a warning every 10 seconds for rate limited tenants.

The rate limiter is parametrized on TenantId, because the kinds of
client bug we're protecting against generally happen within tenant
scope, and the rates should be somewhat stable: we expect the global
rate of requests to increase as we do more work, but we do not expect
the rate of requests to one tenant to increase.

---------

Co-authored-by: John Spray <john@neon.tech>
2025-03-03 22:04:59 +00:00
Alex Chi Z.
6d0976dad5 feat(pageserver): persist reldir v2 migration status (#10980)
## Problem

part of https://github.com/neondatabase/neon/issues/9516

## Summary of changes

Similar to the aux v2 migration, we persist the relv2 migration status
into index_part, so that even the config item is set to false, we will
still read from the v2 storage to avoid loss of data.

Note that only the two variants `None` and
`Some(RelSizeMigration::Migrating)` are used for now. We don't have full
migration implemented so it will never be set to
`RelSizeMigration::Migrated`.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-03 21:05:43 +00:00
Alex Chi Z.
dbf9a80261 fix(pageserver): avoid flooding gc-compaction logs (#11024)
## Problem

The "did not trigger" gets logged at 10k/minute in staging.

## Summary of changes

Change it to debug level.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-03 20:23:20 +00:00
Vlad Lazar
6ca49b4d0c safekeeper: fix a gap tracking edge case (#11054)
The interpreted reader tracks a record aligned current position in the
WAL stream.
Partial reads move the stream internally, but not from the pov of the
interpreted WAL reader.

Hence, where new shards subscribe with a start position that matches the
reader's current position,
but we've also done some partial reads. This confuses the gap tracking.
To make it more robust,
update the current batch start to the min between the new start position
and its current value.

Since no record has been decoded yet (position matches), we can't have
lost it
2025-03-03 19:16:03 +00:00
Vlad Lazar
5197e43396 pageserver: add recurse flag to layer download spec (#11068)
I missed updating the open api spec in the original PR.
We need this so that the cplane auto-generated client sees the flag.
2025-03-03 19:04:01 +00:00
Alexander Lakhin
9a4e2eab61 Fix artifact name for build with sanitizers (#11066)
## Problem
When a build is made with sanitizers, this is not reflected in the
artifact name, which can lead to overriding normal builds with sanitized
ones.

## Summary of changes
Take this property of a build into account when constructing the
artifact name.
2025-03-03 18:00:53 +00:00
Vlad Lazar
8298bc903c pageserver: handle in-memory layer overlaps with persistent layers (#11000)
## Problem

Image layers may be nested inside in-memory layers as diagnosed
[here](https://github.com/neondatabase/neon/issues/10720#issuecomment-2649419252).
The read path doesn't support this and may skip over the image layer,
resulting in a failure to reconstruct the page.

## Summary of changes

We already support nesting of image layers inside delta layers. The
logic lives in `LayerMap::select_layer`.
The main goal of this PR is to propagate the candidate in-memory layer
down to that point and update
the selection logic.

Important changes are:
1. Support partial reads for the in-memory layer. Previously, we could
only specify the start LSN of the read.
We need to control the end LSN too.
2. `LayerMap::ranged_search` considers in-memory layers too. Previously,
the search for in-memory layers
was done explicitly in `Timeline::get_reconstruct_data_timeline`. Note
that `LayerMap::ranged_search` now returns
a weak readable layer which the `LayerManager` can upgrade. This dance
is such that we can unit test the layer selection logic.
3. Update `LayerMap::select_layer` to consider the candidate in-memory
layer too

Loosely related drive bys:
1. Remove the "keys not found" tracking in the ranged search. This
wasn't used anywhere and it just complicates things.
2. Remove the difficulty map stuff from the layer map. Again, not used
anywhere.

Closes https://github.com/neondatabase/neon/issues/9185
Closes https://github.com/neondatabase/neon/issues/10720
2025-03-03 17:52:59 +00:00
John Spray
b953daa21f safekeeper: allow remote deletion to proceed after dropped requests (#11042)
## Problem

If a caller times out on safekeeper timeline deletion on a large
timeline, and waits a while before retrying, the deletion will not
progress while the retry is waiting. The net effect is very very slow
deletion as it only proceeds in 30 second bursts across 5 minute idle
periods.

Related: https://github.com/neondatabase/neon/issues/10265

## Summary of changes

- Run remote deletion in a background task
- Carry a watch::Receiver on the Timeline for other callers to join the
wait
- Restart deletion if the API is called again and the previous attempt
failed
2025-03-03 16:03:51 +00:00
Peter Bendel
a07599949f First version of a new benchmark to test larger OLTP workload (#11053)
## Problem

We want to support larger tenants (regarding logical database size,
number of transactions per second etc.) and should increase our test
coverage of OLTP transactions at larger scale.

## Summary of changes

Start a new benchmark that over time will add more OLTP tests at larger
scale.
This PR covers the first version and will be extended in further PRs.

Also fix some infrastructure:
- default for new connections and large tenants is to use connection
pooler pgbouncer, however our fixture always added
`statement_timeout=120` which is not compatible with pooler
[see](https://neon.tech/docs/connect/connection-errors#unsupported-startup-parameter)
- action to create branch timed out after 10 seconds and 10 retries but
for large tenants it can take longer so use increasing back-off for
retries

## Test run

https://github.com/neondatabase/neon/actions/runs/13593446706
2025-03-03 15:25:48 +00:00
Vlad Lazar
38277497fd pageserver: log shutdown at info level for basebackup (#11046)
## Problem

Timeline shutdown during basebackup logs at error level because the the
canecellation error is smushed into BasebackupError::Server.

## Summary of changes
Introduce BasebackupError::Shutdown and use it. `log_query_error` will
now see `QueryError::Shutdown` and log at info level.
2025-03-03 13:46:50 +00:00
Arseny Sher
ef2b50994c walproposer: basic infra to enable generations (#11002)
## Problem

Preparation for https://github.com/neondatabase/neon/issues/10851

## Summary of changes

Add walproposer `safekeepers_generations` field which can be set by
prefixing `neon.safekeepers` GUC with `g#n:`. Non zero value (n) forces
walproposer to use generations. In particular, this also disables
implicit timeline creation as timeline will be created by storcon. Add
test checking this. Also add missing infra: `--safekeepers-generation`
flag to neon_local endpoint start + fix `--start-timeout` flag: it
existed but value wasn't used.
2025-03-03 13:20:20 +00:00
Konstantin Knizhnik
8669bfe493 Do not store zero pages in inmem SMGR for walredo (#11043)
## Problem


See https://neondb.slack.com/archives/C033RQ5SPDH/p1740157873114339

smgrextend for FSM fork is called during page reconstruction by walredo
process causing overflow of inmem SMGR (64 pages).

## Summary of changes

Do not store zero pages in inmem SMGR because `inmem_read` returns zero
page if it is not able to locate specified block.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-03-03 12:50:07 +00:00
Misha Sakhnov
625c526bdd ci: create multiarch vm images (#11017)
## Problem

We build compute-nodes as multi-arch images, but not the
vm-compute-nodes. The PR adds multiarch vm images the same way as in
autoscaling repo.

## Summary of changes

Add architecture to the matrix for vm compute build steps
Add merge job

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2025-03-03 11:47:09 +00:00
a-masterov
df0767176a Change the tags names according to the curent state (#11059)
## Problem
We have not synced `force-test-extensions-upgrade.yml` with the last
changes.
The variable `TEST_EXTENSIONS_UPGRADE` was ignored in the script and
actually set to `NEW_COMPUTE_TAG` while it should be set to
`OLD_COMPUTE_TAG` as we are about to run compatibility tests.
## Summary of changes
The tag names were synced, the logic was fixed.
2025-03-03 09:40:49 +00:00
Heikki Linnakangas
38ddfab643 compute_ctl: Perform more startup actions in parallel (#11008)
To speed up compute startup. Resizing swap in particular takes about 100
ms on my laptop. By performing it in parallel with downloading the
basebackup, that latency is effectively hidden. I would imagine that
downloading remote extensions can also take a non-trivial amount of
time, although I didn't try to measure that. In any case that's now also
performed in parallel with downloading the basebackup.
2025-03-03 00:29:37 +00:00
Heikki Linnakangas
066324d6ec compute_ctl: Rearrange startup code (#11007)
Move most of the code to compute.rs, so that all the major startup steps
are visible in one place. You can now get a pretty good picture of what
happens in the latency-critical path at compute startup by reading
ComputeNode::start_compute().

This also clarifies the error handling in start_compute. Previously, the
start_postgres function sometimes returned an Err, and sometimes Ok but
with the compute status already set to Failed. Now the start_compute
function always returns Err on failure, and it's the caller's
responsibility to change the compute status to Failed. Separately from
that, it returns a handle to the Postgres process via a `&mut` reference
if it had already started Postgres (i.e. on success, or if the failure
happens after launching the Postgres process).

---------

Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>
2025-02-28 22:48:05 +00:00
Suhas Thalanki
ee0c8ca8fd Add -fsigned-char for cross platform signed chars (#10852)
## Problem

In multi-character keys, the GIN index creates a CRC Hash of the first 3
bytes of the key.
The hash can have the first bit to be set or unset, needing to have a
consistent representation
of `char` across architectures for consistent results. GIN stores these
keys by their hashes
which determines the order in which the keys are obtained from the GIN
index.

By default, chars are signed in x86 and unsigned in arm, leading to
inconsistent behavior across different platform architectures. Adding
the `-fsigned-char` flag to the GCC compiler forces chars to be treated
as signed across platforms, ensuring the ordering in which the keys are
obtained consistent.

## Summary of changes

Added `-fsigned-char` to the `CFLAGS` to force GCC to use signed chars
across platforms.
Added a test to check this across platforms.

Fixes: https://github.com/neondatabase/cloud/issues/23199
2025-02-28 21:07:21 +00:00
Ivan Efremov
56033189c1 feat(proxy): Log latency after connect to compute (#11048)
## Problem
To measure latency accurate we should associate the testodrome role
within a latency data

## Summary of changes
Add latency logging to associate different roles within a latency.

Relates to the #22486
2025-02-28 17:58:42 +00:00
Erik Grinaker
d857f63e3b pageserver: fix race that can wedge background tasks (#11047)
## Problem

`wait_for_active_tenant()`, used when starting background tasks, has a
race condition that can cause it to wait forever (until cancelled). It
first checks the current tenant state, and then subscribes for state
updates, but if the state changes between these then it won't be
notified about it.

We've seen this wedge compaction tasks, which can cause unbounded layer
file buildup and read amplification.

## Summary of changes

Use `watch::Receiver::wait_for()` to check both the current and new
tenant states.
2025-02-28 17:00:22 +00:00
Alex Chi Z.
f79ee0bb88 fix(storcon): loop in chaos injection (#11004)
## Problem

Somehow the previous patch loses the loop in the chaos injector function
so everything will only run once.
https://github.com/neondatabase/neon/pull/10934

## Summary of changes

Add back the loop.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-28 15:49:15 +00:00
Vlad Lazar
23fb8053c5 storcon: soft disable SK heartbeats (#11041)
## Problem

JWT tokens aren't in place, so all SK heartbeats fail. This is
equivalent to a wait before applying the PS heartbeats and makes things
more flaky.

## Summary of Changes

Add a flag that skips loading SKs from the db on start-up and at
runtime.
2025-02-28 15:49:09 +00:00
Conrad Ludgate
d9ced89ec0 feat(proxy): require TLS to compute if prompted by cplane (#10717)
https://github.com/neondatabase/cloud/issues/23008

For TLS between proxy and compute, we are using an internally
provisioned CA to sign the compute certificates. This change ensures
that proxy will load them from a supplied env var pointing to the
correct file - this file and env var will be configured later, using a
kubernetes secret.

Control plane responds with a `server_name` field if and only if the
compute uses TLS. This server name is the name we use to validate the
certificate. Control plane still sends us the IP to connect to as well
(to support overlay IP).

To support this change, I'd had to split `host` and `host_addr` into
separate fields. Using `host_addr` and bypassing `lookup_addr` if
possible (which is what happens in production). `host` then is only used
for the TLS connection.

There's no blocker to merging this. The code paths will not be triggered
until the new control plane is deployed and the `enableTLS` compute flag
is enabled on a project.
2025-02-28 14:20:25 +00:00
Erik Grinaker
c7ff3c4c9b safekeeper: downgrade interpreted reader errors (#11034)
## Problem

This `critical!` could fire on IO errors, which is just noisy.

Resolves #11027.

## Summary of changes

Downgrade to error, except for decode errors. These could be either data
corruption or a bug, but seem worth investigating either way.
2025-02-28 14:06:56 +00:00
Christian Schwarz
7c53fd0d56 refactor(page_service / timeline::handle): the GateGuard need not be a special case (#11030)
# Changes

While working on
- https://github.com/neondatabase/neon/pull/7202

I found myself needing to cache another expensive Arc::clone inside
inside the timeline::handle::Cache by wrapping it in another Arc.

Before this PR, it seemed like the only expensive thing we were caching
was the
connection handler tasks' clone of `Arc<Timeline>`.

But in fact the GateGuard was another such thing, but it was
special-cased
in the implementation.

So, this refactoring PR de-special-cases the GateGuard.

# Performance

With this PR we are doing strictly _less_ operations per `Cache::get`.
The reason is that we wrap the entire `Types::Timeline` into one Arc.
Before this PR, it was a separate Arc around the Arc<Timeline> and
one around the Arc<GateGuard>.

With this PR, we avoid an allocation per cached item, namely,
the separate Arc around the GateGuard.

This PR does not change the amount of shared mutable state.

So, all in all, it should be a net positive, albeit probably not
noticable
with our small non-NUMA instances and generally high CPU usage per
request.

# Reviewing

To understand the refactoring logistics, look at the changes to the unit
test types first.
Then read the improved module doc comment.
Then the remaining changes.

In the future, we could rename things to be even more generic.
For example, `Types::TenantMgr` could really be a `Types::Resolver`.
And `Types::Timeline` should, to avoid constant confusion in the doc
comment,
be called `Types::Cached` or `Types::Resolved`.
Because the `handle` module, after this PR, really doesn't care that
we're
using it for storing Arc's and GateGuards.

Then again, specicifity is sometimes more useful than being generic.
And writing the module doc comment in a totally generic way would
probably also be more confusing than helpful.
2025-02-28 12:31:52 +00:00
a-masterov
7607686f25 Make test extensions upgrade work with absent images (#11036)
## Problem
CI does not pass for the compute release due to the absence of some
images

## Summary of changes
Now we use the images from the old non-compute releases for non-compute
images
2025-02-28 11:16:22 +00:00
Vlad Lazar
0d6d58bd3e pageserver: make heatmap layer download API more cplane friendly (#10957)
## Problem

We intend for cplane to use the heatmap layer download API to warm up
timelines after unarchival. It's tricky for them to recurse in the
ancestors,
and the current implementation doesn't work well when unarchiving a
chain
of branches and warming them up.

## Summary of changes

* Add a `recurse` flag to the API. When the flag is set, the operation
recurses into the parent
timeline after the current one is done.
* Be resilient to warming up a chain of unarchived branches. Let's say
we unarchived `B` and `C` from
the `A -> B -> C` branch hierarchy. `B` got unarchived first. We
generated the unarchival heatmaps
and stash them in `A` and `B`. When `C` unarchived, it dropped it's
unarchival heatmap since `A` and `B`
already had one. If `C` needed layers from `A` and `B`, it was out of
luck. Now, when choosing whether
to keep an unarchival heatmap we look at its end LSN. If it's more
inclusive than what we currently have,
keep it.
2025-02-28 10:36:53 +00:00
John Spray
55633ebe3a storcon: enable API passthrough to nonzero shards (#11026)
## Problem

Storage controller will proxy GETs to pageserver-like tenant/timeline
paths through to the pageserver.

Usually GET passthroughs make sense to go to shard 0, e.g. if you want
to list timelines.

But sometimes you really want to know about a particular shard, e.g.
reading its cache state or similar.

## Summary of changes

- Accept shard IDs as well as tenant IDs in the passthrough route
- Refactor node lookup to take a shard ID and make the tenant ID case a
layer on top of that. This is one more lock take-drop during these
requests, but it's not particularly expensive and these requests
shouldn't be terribly frequent

This is not immediately used by anything, but will be there any time we
want to e.g. do a pass-through query to check the warmth of a tenant
cache on a particular shard or somesuch.
2025-02-28 08:42:08 +00:00
Heikki Linnakangas
a4b2009800 compute_ctl: Refactor, moving spec_apply functions to spec_apply.rs (#11006)
Seems nice to have the function and all its subroutines in the same
source file.
2025-02-27 20:13:06 +00:00
JC Grünhage
66d5fe7f5b Merge pull request #11023 from neondatabase/rc/release-proxy/2025-02-27
Proxy release 2025-02-27
2025-02-27 19:10:58 +01:00
Alex Chi Z.
ab1f22b7d1 fix(pageserver): correctly access layer map in gc-compaction (#11021)
## Problem

layer_map access was unwrapped. It might return an error during
shutdown.

## Summary of changes

Propagate the layer_map access error back to the compaction loop.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-27 16:26:55 +00:00
github-actions[bot]
a1b9528757 Proxy release 2025-02-27 2025-02-27 16:18:42 +00:00
JC Grünhage
7ed236e17e fix(ci): push prod container images again (#11020)
## Problem
https://github.com/neondatabase/neon/pull/10841 made building compute
and neon images optional on releases that don't need them. The
`push-<component>-image-prod` jobs had transitive dependencies that were
skipped due to that, causing the images not to be pushed to production
registries.

## Summary of changes

Add `!failure() && !cancelled() &&` to the beginning of the conditions
for these jobs to ensure they run even if some of their transitive
dependencies are skipped.
2025-02-27 16:16:14 +00:00
Konstantin Knizhnik
e58f264a05 Increase inmem SMGR size for walredo process to 100 pagees (#10937)
## Problem

We see `Inmem storage overflow` in page server logs:

https://neondb.slack.com/archives/C033RQ5SPDH/p1740157873114339

walked process is using inseam SMGR with storage size limited by 64
pages with warning watermark 32 (based ion the assumption that
XLR_MAX_BLOCK_ID is 32, so WAL record can not access more than 32
pages).

Actually it is not true. We can update up to 3 forks for each block
(including update of FSM and VM forks).

## Summary of changes

This PR increases inseam SMGR size for walled process to 100 pages and
print stack trace in case of overflow.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-02-27 14:31:05 +00:00
Matthias van de Meent
a283edaccf PS/Prefetch: Use a timeout for reading data from TCP (#10834)
This reduces pressure on OS TCP buffers, reducing flush times in other
systems like PageServer.

## Problem

## Summary of changes
2025-02-27 14:00:18 +00:00
a-masterov
ad37199745 Separate the upgrade tests in timelines (#10974)
## Problem
We created extensions in a single database. The tests could interfere,
i.e., discover some service tables left by other extensions and produce
unexpected results.
## Summary of changes
The tests are now run in a separate timeline, so only one extension owns
the database, which prevents interference.
2025-02-27 13:45:18 +00:00
Erik Grinaker
93b59e65a2 pageserver: remove stale comment (#11016)
No longer true now that we eagerly notify the compaction loop.
2025-02-27 12:56:28 +00:00
Ivan Efremov
1423bb8aa2 Merge pull request #11011 from neondatabase/rc/release-proxy/2025-02-27
Proxy release 2025-02-27
2025-02-27 13:57:49 +02:00
Christian Schwarz
e35f7758d8 impr(controller_upcall_client): clean up copy-pasta code & add context to retries (#10991)
Before this PR, re-attach and validate would log the same warning
```
calling control plane generation validation API failed
```
on retry errors.

This can be confusing.

This PR makes the message generically valid for any upcall and adds
additional tracing spans to capture context.

Along the way, clean up some copy-pasta variable naming.

refs
-
https://github.com/neondatabase/neon/issues/10381#issuecomment-2684755827

---------

Co-authored-by: Alexander Lakhin <alexander.lakhin@neon.tech>
2025-02-27 10:59:43 +00:00
Peter Bendel
3a3d62dc4f Bodobolero/test cum stats persistence (#10995)
## Problem

So far cumulative statistics have not been persisted when Neon scales to
zero (suspends endpoint).
With PR https://github.com/neondatabase/neon/pull/6560 the cumulative
statistics should now survive endpoint restarts and correctly trigger
the auto- vacuum and auto analyze maintenance

So far we did not have a testcase that validates that improvement in our
dev cloud environment with a real project.

## Summary of changes

Introduce testcase `test_cumulative_statistics_persistence`in the
benchmarking workflow running daily to verify:

- Verifies that the cumulative statistics are correctly persisted across
restarts.
- Cumulative statistics are important to persist across restarts because
they are used
-  when auto-vacuum an auto-analyze trigger conditions are met.
-  The test performs the following steps:
    - Seed a new project using pgbench
    - insert tuples that by itself are not enough to trigger auto-vacuum
    - suspend the endpoint
    - resume the endpoint
- insert additional tuples that by itself are not enough to trigger
auto-vacuum but in combination with the previous tuples are
- verify that autovacuum is triggered by the combination of tuples
inserted before and after endpoint suspension

## Test run


https://github.com/neondatabase/neon/actions/runs/13546879714/job/37860609089#step:6:282
2025-02-27 10:45:13 +00:00
Arpad Müller
a22be5af72 Migrate the last crates to edition 2024 (#10998)
Migrates the remaining crates to edition 2024. We like to stay on the
latest edition if possible. There is no functional changes, however some
code changes had to be done to accommodate the edition's breaking
changes.

Like the previous migration PRs, this is comprised of three commits:

* the first does the edition update and makes `cargo check`/`cargo
clippy` pass. we had to update bindgen to make its output [satisfy the
requirements of edition
2024](https://doc.rust-lang.org/edition-guide/rust-2024/unsafe-extern.html)
* the second commit does a `cargo fmt` for the new style edition.
* the third commit reorders imports as a one-off change. As before, it
is entirely optional.

Part of #10918
2025-02-27 09:40:40 +00:00
Christian Schwarz
f09843ef17 refactor(pageserver): propagate RequestContext to layer downloads (#11001)
For some reason the layer download API never fully got
`RequestContext`-infected.

This PR fixes that as a precursor to
- https://github.com/neondatabase/neon/issues/6107
2025-02-27 09:26:25 +00:00
JC Grünhage
c92a36740b fix(ci): support PR-on-top-of-PR usecase again (#11013)
## Problem
https://github.com/neondatabase/neon/pull/10841 broke CI on PRs that
aren't based on main or a release branch but want to merge into another
PR.

## Summary of changes
Replace `run-kind=pr-main` with `run-kind=pr`, so that all PRs that
aren't release PRs are treated equally.
2025-02-27 09:05:15 +00:00
Arseny Sher
8b86cd1154 safekeeper: follow membership configuration rules (#10781)
## Problem

safekeepers must ignore walproposer messages with non matching
membership conf.

## Summary of changes

Make safekeepers reject vote request, proposer elected and append
request messages with non matching generation. Switch to the
configuration in the greeting message if it is higher.

In passing, fix one comment and WAL truncation.

Last part of https://github.com/neondatabase/neon/issues/9965
2025-02-27 06:13:30 +00:00
github-actions[bot]
332f064a42 Proxy release 2025-02-27 2025-02-27 00:17:57 +00:00
Heikki Linnakangas
c50b38ab72 compute_ctl: Fix comment on start_postgres (#11005)
The comment was woefully outdated and outright wrong. It applied a long
time ago (before commit e5cc2f92c4 to be precise), but nowadays the
function just launches postgres and waits until it starts accepting
connections. The other things the comment talked about are done in other
functions.
2025-02-26 23:38:45 +00:00
Fedor Dikarev
4f4a3910d0 fix error (Line: 74, Col: 26): Unexpected value 'false' (#10999)
## Problem
Check neon with extra platform builds is failing on main with:
```
The template is not valid. .github/workflows/neon_extra_builds.yml (Line: 74, Col: 26): Unexpected value 'false'
```
https://github.com/neondatabase/neon/actions/runs/13549634905

## Summary of changes
Use `fromJson()` to have `false` as boolean value.

thanks to @skyzh for pointing on the issue
2025-02-26 19:54:46 +00:00
Alex Chi Z.
11aab9f0de fix(pageserver): further stablize gc-compaction tests (#10975)
## Problem

Yet another source of flakyness for
https://github.com/neondatabase/neon/issues/10517

## Summary of changes

The test scenario we want to create is that we have an image layer in
index_part and then overwrite it, so we have to ensure it gets persisted
in index_part by doing a force checkpoint.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-26 19:50:10 +00:00
Heikki Linnakangas
5cfdb1244f compute_ctl: Add OTEL tracing to incoming HTTP requests and startup (#10971)
We lost this with the switch to axum for the HTTP server. Add it back.

In addition to just resurrecting the functionality we had before, pass
the tracing context of the /configure HTTP request to the start_postgres
operation that runs in the main thread. This way, the 'start_postgres'
and all its sub-spans like getting the basebackup become children of the
HTTP request span. This allows end-to-end tracing of a compute start,
all the way from the proxy to the SQL queries executed by compute_ctl as
part of compute startup.
2025-02-26 19:27:16 +00:00
Arseny Sher
643a48210f safekeeper: exclude API (#10757)
## Problem

https://github.com/neondatabase/neon/pull/10241 added configuration
switch endpoint, but it didn't delete timeline if node was excluded.

## Summary of changes

Add separate /exclude API endpoint which similarly accepts membership
configuration where sk is supposed by be excluded. Implementation
deletes the timeline locally.

Some more small related tweaks:
- make mconf switch API PUT instead of POST as it is idempotent;
- return 409 if switch was refused instead of 200 with requested &
current;
- remove unused was_active flag from delete response;
- remove meaningless _force suffix from delete functions names;
- reuse timeline.rs delete_dir function in timelines_global_map instead
of its own copy.

part of https://github.com/neondatabase/neon/issues/9965
2025-02-26 19:26:33 +00:00
Arseny Sher
c1a040447d walproposer: send valid timeline_start_lsn in v2 (#10994)
## Problem

https://github.com/neondatabase/neon/pull/10647 dropped
timeline_start_lsn from protocol messages as it can be taken from term
history. In v2 0 was sent in the placeholder. However, until safekeepers
are deployed with that PR they still use the value, setting
timeline_start_lsn to 0, which confuses WAL reading; problem appears
only when compute includes 10647 but safekeepers don't.

ref
https://neondb.slack.com/archives/C04DGM6SMTM/p1740577649644269?thread_ts=1740572363.541619&cid=C04DGM6SMTM

## Summary of changes

Send real value instead of 0 in v2.
2025-02-26 17:38:44 +00:00
Alex Chi Z.
30f3be9840 fix(test): reduce number of relations in test_tx_abort_with_many_relations (#10997)
## Problem

I see a lot of timeout errors, which indicates that this test is too
slow. It seems that create relations are fast, but the subsequent
truncating step is slow.

## Summary of changes

Reduce number of relations for now, and investigate later.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-26 17:19:14 +00:00
JC Grünhage
8dfa8f0b94 feat(ci): don't build storage on compute-releases and vice versa (#10841)
## Problem
Release CI is slow, because we're doing unnecessary work, for example
building compute images on storage releases and vice versa.

## Summary of changes
- Extract tag generation into reusable workflow and extend it with
fetching of previous component releases
- Don't build neon images on compute releases and don't build compute
images on proxy and storage releases
- Reuse images from previous releases for tests on branches where we
don't build those images

## Open questions
- We differentiate between `TAG` and `COMPUTE_TAG` in a few places, but
we don't differentiate between storage and proxy releases. Since they
use the same image, this will continue to work, but I'm not sure this is
what we want.
2025-02-26 17:17:26 +00:00
Alex Chi Z.
a138a6de9b fix(pageserver): correctly handle collect_keyspace errors (#10976)
## Problem

ref https://github.com/neondatabase/neon/issues/10927

## Summary of changes

* Implement `is_critical` and `is_cancel` over `CompactionError`.
* Revisit all places that uses `CollectKeyspaceError` to ensure they are
handled correctly.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-26 17:09:50 +00:00
Arpad Müller
14347630a4 ancestor detach: delete hardlinked layers on error (#10977)
Delete layers that we have hardlinked so far when there is an error in
`remote_copy`. This prevents a retry of the ancestor detach from
stumbling over already present layer files: the hardlink would fail with
an error.

If there is a crash, we already clean up during the timeline attach: we
loop over all layer files and purge all layers that are not referenced
by the `index_part.json`.

Make sure to hold the timeline gate to prevent races with
detach&attach&read from the layer file.

These cleanups aren't completely enough however, as there is code after
`prepare` as well. To handle errors there, we add a special case for
`AlreadyExists` errors during the hardlink, where we check if the layer
is an orphan, and if yes, we delete it from local disk. That is ideally
not the case we hit, as it is less clear in that scenario where the
layer came from, but it provides good defense in depth.

Related #10729
Fixes #10970
2025-02-26 16:11:15 +00:00
Erik Grinaker
86b9703f06 pageserver: set SO_KEEPALIVE on the page service socket (#10992)
## Problem

If the client connection goes dead without an explicit close (e.g. due
to network infrastructure dropping the connection) then we currently
won't detect it for a long time, which may e.g. block GetPage flushes
and keep the task running.

Touches https://github.com/neondatabase/cloud/issues/23515.

## Summary of changes

Enable `SO_KEEPALIVE` on the page service socket, to enable periodic TCP
keepalive probes. These are configured via Linux sysctls, which will be
deployed separately. By default, the first probe is sent after 2 hours,
so this doesn't have a practical effect until we change the sysctls.
2025-02-26 14:36:05 +00:00
Arseny Sher
01581f3af5 safekeeper: drop json_ctrl (#10722)
## Problem

json_ctrl.rs is an obsolete attempt to have tests with fine control of
feeding messages into safekeeper superseded by desim framework.

## Summary of changes

Drop it.
2025-02-26 13:32:37 +00:00
Arpad Müller
f94286f0c9 Upgrade compute_tools and compute_api to edition 2024 (#10983)
Updates `compute_tools` and `compute_api` crates to edition 2024. We
like to stay on the latest edition if possible. There is no functional
changes, however some code changes had to be done to accommodate the
edition's breaking changes.

The PR has three commits:

* the first commit updates the named crates to edition 2024 and appeases
`cargo clippy` by changing code.
* the second commit performs a `cargo fmt` that does some minor changes
(not many)
* the third commit performs a cargo fmt with nightly options to reorder
imports as a one-time thing. it's completely optional, but I offer it
here for the compute team to review it.

I'd like to hear opinions about the third commit, if it's wanted and
felt worth the diff or not. I think most attention should be put onto
the first commit.

Part of #10918
2025-02-26 13:12:26 +00:00
Fedor Dikarev
c2a768086d add credentials for pulling containers for the jobs (#10987)
Ref: https://github.com/neondatabase/cloud/issues/24939

## Problem
I found that we are missing authorization for some container jobs, that
will make them use anonymous pulls. It's not an issue for now, with high
enough limits, but that could be an issue when new limits introduced in
DockerHub (10 pulls / hour)

## Summary of changes
- add credentials for the jobs that run in containers
2025-02-26 12:50:06 +00:00
Vlad Lazar
622a9def6f tests: use generated record lsn instead of hardcoded one (#10990)
... and start the initial reader with the correct lsn

Closes https://github.com/neondatabase/neon/issues/10978
2025-02-26 12:47:13 +00:00
Arpad Müller
26bda17551 storcon: use the SchedulingPolicy enum in SafekeeperPersistence (#10897)
We don't want to serialize to/from string all the time, so use
`SchedulingPolicy` in `SafekeeperPersistence` via the use of a wrapper.

Stacked atop #10891
2025-02-26 12:12:50 +00:00
Folke Behrens
0d36f52a6c proxy: Record and export user-agent header (#10955)
neondatabase/cloud#24464
2025-02-26 11:39:34 +00:00
Heikki Linnakangas
40ad42d556 Silence "sudo: unable to resolve host" messages at compute startup (#10985) 2025-02-26 10:10:05 +00:00
Heikki Linnakangas
e452f2a5a3 Remove some redundant log lines at postgres startup (#10958) 2025-02-26 10:06:42 +00:00
Heikki Linnakangas
43b109af69 compute_ctl: Add more detailed tracing spans to startup subroutines (#10979)
In local dev environment, these steps take around 100 ms, and they are
in the critical path of a compute startup on a compute pool hit. I don't
know if it's like that in production, but as first step, add tracing
spans to the functions so that they can be measured more easily.
2025-02-26 09:51:07 +00:00
Arthur Petukhovsky
3684162d9f Bump vm-builder v0.37.1 -> v0.42.2 (#10981)
Bump version to pick up changes introduced in
https://github.com/neondatabase/autoscaling/pull/1286

It's better to have a compute release for this change first, because:
- vm-runner changes kernel loglevel from 7 to 6
- vm-builder has a change to bring it back to 7 after startup

Previous update: https://github.com/neondatabase/neon/pull/10015
2025-02-26 09:19:19 +00:00
Arpad Müller
920040e402 Update storage components to edition 2024 (#10919)
Updates storage components to edition 2024. We like to stay on the
latest edition if possible. There is no functional changes, however some
code changes had to be done to accommodate the edition's breaking
changes.

The PR has two commits:

* the first commit updates storage crates to edition 2024 and appeases
`cargo clippy` by changing code. i have accidentially ran the formatter
on some files that had other edits.
* the second commit performs a `cargo fmt`

I would recommend a closer review of the first commit and a less close
review of the second one (as it just runs `cargo fmt`).

part of https://github.com/neondatabase/neon/issues/10918
2025-02-25 23:51:37 +00:00
Konstantin Knizhnik
dc975d554a Incremenet getpage histogram in prefetch_lookup (#10965)
## Problem

PR https://github.com/neondatabase/neon/pull/10442 added prefetch_lookup
function.
It changed handling of getpage requests in compute.

Before:
1. Lookup in LFC (return if found)
2. Register prefetch buffer
3. Wait prefetch result (increment getpage_hist)
Now:

1. Lookup prefetch ring (return if prefetch request is already
completed)
2. Lookup in LFC (return if found)
3. Register prefetch buffer
4. Wait prefetch result (increment getpage_hist)

So if prefetch result is already available, then get page histogram is
not incremented.
It case failure of some test_throughtput benchmarks:
https://neondb.slack.com/archives/C033RQ5SPDH/p1740425527249499

## Summary of changes

Increment getpage histogram in `prefetch_lookup`

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-02-25 19:51:38 +00:00
Suhas Thalanki
d05606252d fix: only showing LSN for static computes in neon endpoint list (#10931)
## Problem

`neon endpoint list` shows a different LSN than what the state of the
replica is. This is mainly down to what we define as LSN in this output.
If we define it as the LSN that a compute was started with, it only
makes sense to show it for static computes.

## Summary of changes

Removed the output of `last_record_lsn` for primary/hot standby
computes.

Closes: https://github.com/neondatabase/neon/issues/5825

---------

Co-authored-by: Tristan Partin <tristan@neon.tech>
2025-02-25 19:26:14 +00:00
Alex Chi Z.
c69ebb4486 fix(ci): extend timeout to 75min (#10963)
60min is not enough for debug builds

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-25 17:37:23 +00:00
a-masterov
1fb2faab5b Rename the patch files for the semver test (#10966)
## Problem
The patch for `semver` extensions relies on `PG_VERSION` environment
variable. The files were named without the letter `v` so script cannot
find them.
## Summary of changes
The patch files were renamed.
2025-02-25 16:00:43 +00:00
Alex Chi Z.
015092d259 feat(pageserver): add automatic trigger for gc-compaction (#10798)
## Problem

part of https://github.com/neondatabase/neon/issues/9114

## Summary of changes

Add the auto trigger for gc-compaction. It computes two values: L1 size
and L2 size. When L1 size >= initial trigger threshold, we will trigger
an initial gc-compaction. When l1_size / l2_size >=
gc_compaction_ratio_percent, we will trigger the "tiered" gc-compaction.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-25 14:50:39 +00:00
Alex Chi Z.
b7fcf2c7a7 test(pageserver): add reldir v2 into tests (#10750)
## Problem

We have `test_perf_many_relations` but it only runs on remote clusters,
and we cannot directly modify tenant config. Therefore, I patched one of
the current tests to benchmark relv2 performance.

close https://github.com/neondatabase/neon/issues/9986

## Summary of changes

* Add `v1/v2` selector to `test_tx_abort_with_many_relations`.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-25 14:50:22 +00:00
Erik Grinaker
8deeddd4f0 pageserver: ignore CollectKeySpaceError::Cancelled during compaction (#10968)
This pops up a few times during deployment. Not sure why it fires
without `self.cancel` being cancelled, but could be e.g. ancestor
timelines or sth.
2025-02-25 14:49:41 +00:00
a-masterov
f78ac44748 Use the Dockerfile COPY instead of docker cp (#10943)
## Problem
We use `docker cp` to copy the files required for the extension tests
now.
It causes problems if we run older images with the newer source tree.
## Summary of changes
Copying the files was moved to the compute Dockerfile.
2025-02-25 12:44:06 +00:00
Folke Behrens
f4fefd9f2f pre-commit: Switch to cargo fmt to handle per-crate editions (#10969)
cargo knows what edition each crate uses.
2025-02-25 12:29:27 +00:00
Konstantin Knizhnik
8f82c661d4 Move neon_pgstat_file_size_limit to the extension (#10959)
## Problem

PG14 uses separate backend for stats collector having no access to
shaerd memory.
As far as AUX mechanism requires access to shared memory, persisting
pgstat.stat file
is not supported at pg14. And so there is no definition of
`neon_pgstat_file_size_limit`
variable. It makes it impossible to provide same config for all Postgres
version.

## Summary of changes

Move neon_pgstat_file_size_limit to Neon extension.

Postgres submodules PR:
https://github.com/neondatabase/postgres/pull/587
https://github.com/neondatabase/postgres/pull/588
https://github.com/neondatabase/postgres/pull/589

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Tristan Partin <tristan@neon.tech>
2025-02-25 12:23:04 +00:00
Arseny Sher
758f597280 compute <-> sk protocol v3 (#10647)
## Problem

As part of https://github.com/neondatabase/neon/issues/8614 we need to
pass membership configurations between compute and safekeepers.

## Summary of changes

Add version 3 of the protocol carrying membership configurations.
Greeting message in both sides gets full conf, and other messages
generation number only. Use protocol bump to include other accumulated
changes:
- stop packing whole structs on the wire as is;
- make the tag u8 instead of u64;
- send all ints in network order;
- drop proposer_uuid, we can pass it in START_WAL_PUSH and it wasn't
much useful anyway.
Per message changes, apart from mconf:
- ProposerGreeting: tenant / timeline id is sent now as hex cstring.
Remove proto version, it is passed outside in START_WAL_PUSH. Remove
postgres timeline, it is unused. Reorder fields a bit.
- AcceptorGreeting: reorder fields
- VoteResponse: timeline_start_lsn is removed. It can be taken from
first member of term history, and later we won't need it at all when all
timelines will be explicitly created. Vote itself is u8 instead of u64.
- ProposerElected: timeline_start_lsn is removed for the same reasons.
- AppendRequest: epoch_start_lsn removed, it is known from term history
in ProposerElected.

Both compute and sk are able to talk v2 and v3 to make rollbacks (in
case we need them) easier; neon.safekeeper_proto_version GUC sets the
client version. v2 code can be dropped later.

So far empty conf is passed everywhere, future PRs will handle them.

To test, add param to some tests choosing proto version; we want to test
both 2 and 3 until we fully migrate.

ref https://github.com/neondatabase/neon/issues/10326

---------

Co-authored-by: Arthur Petukhovsky <petuhovskiy@yandex.ru>
2025-02-25 11:56:05 +00:00
Vlad Lazar
0d9a45a475 safekeeper: invalidate start of interpreted batch on reader resets (#10951)
## Problem

The interpreted WAL reader tracks the start of the current logical
batch.
This needs to be invalidated when the reader is reset.

This bug caused a couple of WAL gap alerts in staging.

## Summary of changes

* Refactor to make it possible to write a reproducer
* Add repro unit test
* Fix by resetting the start with the reader

Related https://github.com/neondatabase/cloud/issues/23935
2025-02-25 10:21:35 +00:00
Vlad Lazar
5d17640944 storcon: send heartbeats concurrently (#10954)
## Problem

While looking at logs I noticed that heartbeats are sent sequentially.
The loop polling the UnorderedSet is at the wrong level of identation.
Instead of doing it after we have the full set, we did after each entry.

## Summary of Changes

Poll the UnorderedSet properly.
2025-02-25 09:33:08 +00:00
Erik Grinaker
6621be6b7b pageserver: tweak slow GetPage logging (#10956)
## Problem

We recently added slow GetPage request logging. However, this
unintentionally included the flush time when logging (which we already
have separate logging for). It also logs at WARN level, which is a bit
aggressive since we see this fire quite frequently.

Follows https://github.com/neondatabase/neon/pull/10906.

## Summary of changes

* Only log the request execution time, not the flush time.
* Extract a `pagestream_dispatch_batched_message()` helper.
* Rename `warn_slow()` to `log_slow()` and downgrade to INFO.
2025-02-24 22:01:14 +00:00
Heikki Linnakangas
565a9e62a1 compute: Disconnect if no response to a pageserver request is received (#10882)
We've seen some cases in production where a compute doesn't get a
response to a pageserver request for several minutes, or even more. We
haven't found the root cause for that yet, but whatever the reason is,
it seems overly optimistic to think that if the pageserver hasn't
responded for 2 minutes, we'd get a response if we just wait patiently a
little longer. More likely, the pageserver is dead or there's some kind
of a network glitch so that the TCP connection is dead, or at least
stuck for a long time. Either way, it's better to disconnect and
reconnect. I set the default timeout to 2 minutes, which should be
enough for any GetPage request under normal circumstances, even if the
pageserver has to download several layer files from remote storage.

Make the disconnect timeout configurable. Also make the "log interval",
after which we print a message to the log configurable, so that if you
change the disconnect timeout, you can set the log timeout
correspondingly. The default log interval is still 10 s. The new GUCs
are called "neon.pageserver_response_log_timeout" and
"neon.pageserver_response_disconnect_timeout".

Includes a basic test for the log and disconnect timeouts.

Implements issue #10857
2025-02-24 20:16:37 +00:00
Peter Bendel
8fd0f89b94 rename libduckdb.so in pg_duckdb context to avoid conflict with pg_mooncake (#10915)
## Problem

Introducing pg_duckdb caused a conflict with pg_mooncake.
Both use libduckdb.so in different versions.

## Summary of changes

- Rename the libduckdb.so to libduckdb_pg_duckdb.so in the context of
pg_duckdb so that it doesn't conflict with libduckdb.so referenced by
pg_mooncake.
- use a version map to rename the duckdb symbols to a version specific
name
  - DUCKDB_1.1.3 for pg_mooncake
  - DUCKDB_1.2.0 for pg_duckdb 

For the concept of version maps see
- https://www.man7.org/conf/lca2006/shared_libraries/slide19a.html
-
https://peeterjoot.com/2019/09/20/an-example-of-linux-glibc-symbol-versioning/
- https://akkadia.org/drepper/dsohowto.pdf
2025-02-24 17:50:49 +00:00
JC Grünhage
1f0dea9a1a feat(ci): push container images to ghcr.io as well (#10945)
## Problem
There's new rate-limits coming on docker hub. To reduce our reliance on
docker hub and the problems the limits are going to cause for us, we
want to prepare for this by also pushing our container images to ghcr.io

## Summary of changes
Push our images to ghcr.io as well and not just docker hub.
2025-02-24 17:45:23 +00:00
Heikki Linnakangas
40acb0c06d Fix usage of WaitEventSetWait() with timeout (#10947)
WaitEventSetWait() returns the number of "events" that happened, and
only that many events in the WaitEvent array are updated. When the
timeout is reached, it returns 0 and does not modify the WaitEvent array
at all. We were reading 'event.events' without checking the return
value, which would be uninitialized when the timeout was hit.

No test included, as this is harmless at the moment. But this caused the
test I'm including in PR #10882 to fail. That PR changes the logic to
loop back to retry the PQgetCopyData() call if WL_SOCKET_READABLE was
set. Currently, making an extra call to PQconsumeInput() is harmless,
but with that change in logic, it turns into a busy-wait.
2025-02-24 17:15:07 +00:00
Arpad Müller
df362de0dd Reject basebackup requests for archived timelines (#10828)
For archived timelines, we would like to prevent all non-pageserver
issued getpage requests, as users are not supposed to send these.
Instead, they should unarchive a timeline before issuing any external
read traffic.

As this is non-trivial to do, at least prevent launches of new computes,
by errorring on basebackup requests for archived timelines. In #10688,
we started issuing a warning instead of an error, because an error would
mean a stuck project. Now after we can confirm the the warning is not
present in the logs for about a week, we can issue errors.

Follow-up of #10688 
Related: #9548
2025-02-24 16:38:13 +00:00
Alex Chi Z.
5fad4a4cee feat(storcon): chaos injection of force exit (#10934)
## Problem

close https://github.com/neondatabase/cloud/issues/24485

## Summary of changes

This patch adds a new chaos injection mode for the storcon. The chaos
injector reads the crontab and exits immediately at the configured time.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-24 15:30:21 +00:00
Arpad Müller
fdde58120c Upgrade proxy crates to edition 2024 (#10942)
This upgrades the `proxy/` crate as well as the forked libraries in
`libs/proxy/` to edition 2024.

Also reformats the imports of those forked libraries via:

```
cargo +nightly fmt -p proxy -p postgres-protocol2 -p postgres-types2 -p tokio-postgres2 -- -l --config imports_granularity=Module,group_imports=StdExternalCrate,reorder_imports=true
```

It can be read commit-by-commit: the first commit has no formatting
changes, only changes to accomodate the new edition.

Part of #10918
2025-02-24 15:26:28 +00:00
Vlad Lazar
459446fcb8 pagesever: include visible layers in heatmaps after unarchival (#10880)
## Problem

https://github.com/neondatabase/neon/pull/10788 introduced an API for
warming up attached locations
by downloading all layers in the heatmap. We intend to use it for
warming up timelines after unarchival too,
but it doesn't work. Any heatmap generated after the unarchival will not
include our timeline, so we've lost
all those layers.

## Summary of changes

Generate a cheeky heatmap on unarchival. It includes all the visible
layers.
Use that as the `PreviousHeatmap` which inputs into actual heatmap
generation.

Closes: https://github.com/neondatabase/neon/issues/10541
2025-02-24 15:21:17 +00:00
Alexander Bayandin
17724a19e6 CI(allure-reports): update dependencies and cleanup code (#10794)
## Problem

There are a bunch of minor improvements that are too small and
insignificant as is, so collecting them in one PR.

## Summary of changes
- Add runner arch to artifact name to make it easier to distinguish
files on S3
([ref](https://neondb.slack.com/archives/C059ZC138NR/p1739365938371149))
- Use `github.event.pull_request.number` instead of parsing
`$GITHUB_EVENT_PATH` file
- Update Allure CLI and `allure-pytest`
2025-02-24 15:07:14 +00:00
John Spray
2a5d7e5a78 tests: improve compat test coverage of controller-pageserver interaction (#10848)
## Problem

We failed to detect https://github.com/neondatabase/neon/pull/10845
before merging, because the tests we run with a matrix of component
versions didn't include the ones that did live migrations.

## Summary of changes

- Do a live migration during the storage controller smoke test, since
this is a pretty core piece of functionality
- Apply a compat version matrix to the graceful cluster restart test,
since this is the functionality that we most urgently need to work
across versions to make deploys work.

I expect the first CI run of this to fail, because
https://github.com/neondatabase/neon/pull/10845 isn't merged yet.
2025-02-24 12:22:22 +00:00
Conrad Ludgate
fb77f28326 feat(proxy): add direction and private link id to billing export (#10925)
ref: https://github.com/neondatabase/cloud/issues/23385

Adds a direction flag as well as private-link ID to the traffic
reporting pipeline. We do not yet actually count ingress, but we include
the flag anyway.

I have additionally moved vpce_id string parsing earlier, since we
expect it to be utf8 (ascii).
2025-02-24 11:49:11 +00:00
Heikki Linnakangas
a6f315c9c9 Remove unnecessary dependencies to synchronous 'postgres' crate (#10938)
The synchronous 'postgres' crate is just a wrapper around the async
'tokio_postgres' crate. Some places were unnecessarily using the
re-exported NoTls and Error from the synchronous 'postgres' crate, even
though they were otherwise using the 'tokio_postgres' crate. Tidy up by
using the tokio_postgres types directly.
2025-02-24 09:40:25 +00:00
Alexey Kondratov
df264380b9 fix(compute_ctl): Skip invalid DBs in PerDatabasePhase (#10910)
## Problem

After refactoring the configuration code to phases, it became a bit
fuzzy who filters out DBs that are not present in Postgres, are invalid,
or have `datallowconn = false`. The first 2 are important for the DB
dropping case, as we could be in operation retry, so DB could be already
absent in Postgres or invalid (interrupted `DROP DATABASE`).

Recent case:
https://neondb.slack.com/archives/C03H1K0PGKH/p1740053359712419

## Summary of changes

Add a common code that filters out inaccessible DBs inside
`ApplySpecPhase::RunInEachDatabase`.
2025-02-21 21:50:50 +00:00
Arpad Müller
4bbe75de8c Update vm_monitor to edition 2024 (#10916)
Updates `vm_monitor` to edition 2024. We like to stay on the latest
edition if possible. There is no functional changes, it's only changes
due to the rustfmt edition.

part of https://github.com/neondatabase/neon/issues/10918
2025-02-21 20:29:05 +00:00
Tristan Partin
c0c3ed94a9 Fix flaky test_compute_installed_extensions_metric test (#10933)
There was a race condition with compute_ctl and the metric being
collected related to whether the neon extension had been updated or not.
compute_ctl will run `ALTER EXTENSION neon UPDATE` on compute start in
the postgres database.

Fixes: https://github.com/neondatabase/neon/issues/10932

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-02-21 18:29:48 +00:00
Konstantin Knizhnik
b1d8771d5f Store prefetch results in LFC cache once as soon as they are received (#10442)
## Problem

Prefetch is performed locally, so different backers can request the same
pages form PS.
Such duplicated request increase load of page server and network
traffic.
Making prefetch global seems to be very difficult and undesirable,
because different queries can access chunks on different speed. Storing
prefetch chunks in LFC will not completely eliminate duplicates, but can
minimise such requests.

The problem with storing prefetch result in LFC is that in this case
page is not protected by share buffer lock.
So we will have to perform extra synchronisation at LFC side.

See:

https://neondb.slack.com/archives/C0875PUD0LC/p1736772890602029?thread_ts=1736762541.116949&cid=C0875PUD0LC

@MMeent implementation of prewarm:
See https://github.com/neondatabase/neon/pull/10312/


## Summary of changes

Use conditional variables to sycnhronize access to LFC entry.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-02-21 16:56:16 +00:00
Vlad Lazar
3e82addd64 storcon: use Duration for duration's in the storage controller tenant config (#10928)
## Problem

The storage controller treats durations in the tenant config as strings.
These are loaded from the db.
The pageserver maps these durations to a seconds only format and we
always get a mismatch compared
to what's in the db.

## Summary of changes

Treat durations as durations inside the storage controller and not as
strings.
Nothing changes in the cross service API's themselves or the way things
are stored in the db.

I also added some logging which I would have made the investigation a
10min job:
1. Reason for why the reconciliation was spawned
2. Location config diff between the observed and wanted states
2025-02-21 15:45:00 +00:00
John Spray
5e3c234edc storcon: do more concurrent optimisations (#10929)
## Problem

Now that we rely on the optimisation logic to handle fixing things up
after some tenants are in the wrong AZ (e.g. after node failure), it's
no longer appropriate to treat optimisations as an ultra-low-priority
task. We used to reflect that low priority with a very low limit on
concurrent execution, such that we would only migrate 2 things every 20
seconds.

## Summary of changes

- Increase MAX_OPTIMIZATIONS_EXEC_PER_PASS from 2 to 16
- Increase MAX_OPTIMIZATIONS_PLAN_PER_PASS from 8 to 64.

Since we recently gave user-initiated actions their own semaphore, this
should not risk starving out API requests.
2025-02-21 14:58:49 +00:00
Arpad Müller
ff3819efc7 storcon: infrastructure for safekeeper specific JWT tokens (#10905)
Safekeepers only respond to requests with the per-token scope, or the
`safekeeperdata` JWT scope. Therefore, add infrastructure in the storage
controller for safekeeper JWTs. Also, rename the ambiguous `jwt_token`
to `pageserver_jwt_token`.

Part of #9011
Related: https://github.com/neondatabase/cloud/issues/24727
2025-02-21 11:02:02 +00:00
Arpad Müller
f927ae6e15 Return a json response in scheduling_policy handler (#10904)
Return an empty json response in the `scheduling_policy` handler.

This prevents errors of the form:

```
Error: receive body: error decoding response body: EOF while parsing a value at line 1 column 0
```

when setting the scheduling policy via the `storcon_cli`.

part of #9011.
2025-02-21 11:01:57 +00:00
Heikki Linnakangas
61d385caea Split plv8 build into two parts (#10920)
Plv8 consists of two parts:
1. the V8 engine, which is built from vendored sources, and
2. the PostgreSQL extension.

Split those into two separate steps in the Dockerfile. The first step
doesn't need any PostgreSQL sources or any other files from the neon
repository, just the build tools and the upstream plv8 sources. Use the
build-deps image as the base for that step, so that the layer can be
cached and doesn't need to be rebuilt every time. This is worthwhile
because the V8 build takes a very long time.
2025-02-21 09:03:54 +00:00
Alex Chi Z.
c214c32d3f fix(pageserver): avoid creating empty job for gc-compaction (#10917)
## Problem

This should be one last fix for
https://github.com/neondatabase/neon/issues/10517.

## Summary of changes

If a keyspace is empty, we might produce a gc-compaction job which
covers no layer files. We should avoid generating such jobs so that the
gc-compaction image layer can cover the full key range.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-21 01:56:30 +00:00
Erik Grinaker
9b42d1ce1a pageserver: periodically log slow ongoing getpage requests (#10906)
## Problem

We don't have good observability for "stuck" getpage requests.

Resolves https://github.com/neondatabase/cloud/issues/23808.

## Summary of changes

Log a periodic warning (every 30 seconds) if GetPage request execution
is slow to complete, to aid in debugging stuck GetPage requests.

This does not cover response flushing (we have separate logging for
that), nor reading the request from the socket and batching it (expected
to be insignificant and not straightforward to handle with the current
protocol).

This costs 95 nanoseconds on the happy path when awaiting a
`tokio::task::yield_now()`:

```
warn_slow/enabled=false time:   [45.716 ns 46.116 ns 46.687 ns]
warn_slow/enabled=true  time:   [141.53 ns 141.83 ns 142.18 ns]
```
2025-02-20 21:38:42 +00:00
Konstantin Knizhnik
0b9b391ea0 Fix caclulation of prefetch ring position to fit in-flight request in resized ring buffer (#10899)
## Problem

Refer https://github.com/neondatabase/neon/issues/10885

Wait position in ring buffer to restrict number of in-flight requests is
not correctly calculated.

## Summary of changes

Update condition and remove redundant assertion

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-02-20 20:21:44 +00:00
Heikki Linnakangas
3f376e44ba Temporarily disable pg_duckdb (#10909)
It clashed with pg_mooncake

This is the same as the hotfix #10908 , but for the main branch, to keep
the release and main branches in sync. In particular, we don't want to
accidentally revert this temporary fix, if we cut a new release from
main.
2025-02-20 19:48:06 +00:00
Arpad Müller
5b81a774fc Update rust to 1.85.0 (#10914)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

[Announcement blog
post](https://blog.rust-lang.org/2025/02/20/Rust-1.85.0.html).

Prior update was in #10618.
2025-02-20 19:16:22 +00:00
Konstantin Knizhnik
bd335fa751 Fix prototype of CheckPointReplicationState (#10907)
## Problem

Occasionally removed (void) from definition of
`CheckPointReplicationState` function

## Summary of changes

Restore function prototype.

https://github.com/neondatabase/postgres/pull/585
https://github.com/neondatabase/postgres/pull/586

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-02-20 18:29:14 +00:00
Vlad Lazar
34996416d6 pageserver: guard against WAL gaps in the interpreted protocol (#10858)
## Problem

The interpreted SK <-> PS protocol does not guard against gaps (neither
does the Vanilla one, but that's beside the point).

## Summary of changes

Extend the protocol to include the start LSN of the PG WAL section from
which the records were interpreted.
Validation is enabled via a config flag on the pageserver and works as
follows:

**Case 1**: `raw_wal_start_lsn` is smaller than the requested LSN
There can't be gaps here, but we check that the shard received records
which it hasn't seen before.

**Case 2**: `raw_wal_start_lsn` is equal to the requested LSN
This is the happy case. No gap and nothing to check

**Case 3**: `raw_wal_start_lsn` is greater than the requested LSN
This is a gap.

To make Case 3 work I had to bend the protocol a bit.
We read record chunks of WAL which aren't record aligned and feed them
to the decoder.
The picture below shows a shard which subscribes at a position somewhere
within Record 2.
We already have a wal reader which is below that position so we wait to
catch up.
We read some wal in Read 1 (all of Record 1 and some of Record 2). The
new shard doesn't
need Record 1 (it has already processed it according to the starting
position), but we read
past it's starting position. When we do Read 2, we decode Record 2 and
ship it off to the shard,
but the starting position of Read 2 is greater than the starting
position the shard requested.
This looks like a gap.


![image](https://github.com/user-attachments/assets/8aed292e-5d62-46a3-9b01-fbf9dc25efe0)

To make it work, we extend the protocol to send an empty
`InterpretedWalRecords` to shards
if the WAL the records originated from ends the requested start
position. On the pageserver,
that just updates the tracking LSNs in memory (no-op really). This gives
us a workaround for
the fake gap.

As a drive by, make `InterpretedWalRecords::next_record_lsn` mandatory
in the application level definition.
It's always included.

Related: https://github.com/neondatabase/cloud/issues/23935
2025-02-20 17:49:05 +00:00
Tristan Partin
d571553d8a Remove hacks in compute_ctl related to compute ID (#10751) 2025-02-20 17:31:52 +00:00
Tristan Partin
f7474d3f41 Remove forward compatibility hacks related to compute HTTP servers (#10797)
These hacks were added to appease the forward compatibility tests and
can be removed.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-02-20 17:31:42 +00:00
Dmitrii Kovalkov
e808e9432a storcon: use https for pageservers (#10759)
## Problem

Storage controller uses unsecure http for pageserver API.

Closes: https://github.com/neondatabase/cloud/issues/23734
Closes: https://github.com/neondatabase/cloud/issues/24091

## Summary of changes

- Add an optional `listen_https_port` field to storage controller's Node
state and its API (RegisterNode/ListNodes/etc).
- Allow updating `listen_https_port` on node registration to gradually
add https port for all nodes.
- Add `use_https_pageserver_api` CLI option to storage controller to
enable https.
- Pageserver doesn't support https for now and always reports
`https_port=None`. This will be addressed in follow-up PR.
2025-02-20 17:16:04 +00:00
Anastasia Lubennikova
7c7180a79d Fix deadlock in drop_subscriptions_before_start (#10806)
ALTER SUBSCRIPTION requires AccessExclusive lock
which conflicts with iteration over pg_subscription when multiple
databases are present
and operations are applied concurrently.

Fix by explicitly locking pg_subscription
in the beginning of the transaction in each database.

## Problem
https://github.com/neondatabase/cloud/issues/24292
2025-02-20 17:14:16 +00:00
Erik Grinaker
07bee60037 pageserver: make compaction walredo errors critical (#10884)
Mark walredo errors as critical too.

Also pull the pattern matching out into the outer `match`.

Follows #10872.
2025-02-20 12:08:54 +00:00
Erik Grinaker
f7edcf12e3 pageserver: downgrade ephemeral layer roll wait message (#10883)
We already log a message for this during the L0 flush, so the additional
message is mostly noise.
2025-02-20 12:08:30 +00:00
a-masterov
1d9346f8b7 Add pg_repack test (#10638)
## Problem
We don't test `pg_repack`
## Summary of changes
The test for `pg_repack` is added
2025-02-20 10:05:01 +00:00
Folke Behrens
c962f2b447 Merge pull request #10903 from neondatabase/rc/release-proxy/2025-02-20
Proxy release 2025-02-20
2025-02-20 10:37:47 +01:00
Konstantin Knizhnik
a6d8640d6f Persist pg_stat information in pageserver (#6560)
## Problem

Statistic is saved in local file and so lost on compute restart.

Persist in in page server using the same AUX file mechanism used for
replication slots

See more about motivation in
https://neondb.slack.com/archives/C04DGM6SMTM/p1703077676522789

## Summary of changes

Persist postal file using AUX mechanism


Postgres PRs:
https://github.com/neondatabase/postgres/pull/547
https://github.com/neondatabase/postgres/pull/446
https://github.com/neondatabase/postgres/pull/445

Related to #6684 and #6228

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-02-20 06:38:55 +00:00
github-actions[bot]
446b3f9d28 Proxy release 2025-02-20 2025-02-20 06:02:01 +00:00
Arpad Müller
bb7e244a42 storcon: fix heartbeats timing out causing a panic (#10902)
Fix an issue caused by PR
https://github.com/neondatabase/neon/pull/10891: we introduced the
concept of timeouts for heartbeats, where we would hang up on the other
side of the oneshot channel if a timeout happened (future gets
cancelled, receiver is dropped).

This hang up would make the heartbeat task panic when it did obtain the
response, as we unwrap the result of the result sending operation. The
panic would lead to the heartbeat task panicing itself, which is then
according to logs the last sign of life we of that process invocation.
I'm not sure what brings down the process, in theory tokio [should
continue](https://docs.rs/tokio/latest/tokio/runtime/enum.UnhandledPanic.html#variant.Ignore),
but idk.

Alternative to #10901.
2025-02-19 23:04:05 +00:00
Arpad Müller
787b98f8f2 storcon: log all safekeepers marked as offline (#10898)
Doing this to help debugging offline safekeepers.

Part of https://github.com/neondatabase/neon/issues/9011
2025-02-19 20:45:22 +00:00
Vlad Lazar
f148d71d9b test: disable background heatmap uploads and downloads in cold migration test (#10895)
## Problem

Background heatmap uploads and downloads were blocking the ones done
manually by the test.

## Summary of changes

Disable Background heatmap uploads and downloads for the cold migration
test. The test does
them explicitly.
2025-02-19 19:30:17 +00:00
JC Grünhage
aad817d806 refactor(ci): use reusable push-to-container-registry workflow for pinning the build-tools image (#10890)
## Problem
Pinning build tools still replicated the ACR/ECR/Docker Hub login and
pushing, even though we have a reusable workflow for this. Was mentioned
as a TODO in https://github.com/neondatabase/neon/pull/10613.

## Summary of changes
Reuse `_push-to-container-registry.yml` for pinning the build-tools
images.
2025-02-19 17:26:09 +00:00
Erik Grinaker
0b3db74c44 libs: remove unnecessary regex in pprof::symbolize (#10893)
`pprof::symbolize()` used a regex to strip the Rust monomorphization
suffix from generic methods. However, the `backtrace` crate can do this
itself if formatted with the `:#` flag.

Also tighten up the code a bit.
2025-02-19 17:11:12 +00:00
Arpad Müller
9ba2a87e69 storcon: sk heartbeat fixes (#10891)
This PR does the following things:

* The initial heartbeat round blocks the storage controller from
becoming online again. If all safekeepers are unresponsive, this can
cause storage controller startup to be very slow. The original intent of
#10583 was that heartbeats don't affect normal functionality of the
storage controller. So add a short timeout to prevent it from impeding
storcon functionality.

* Fix the URL of the utilization endpoint.

* Don't send heartbeats to safekeepers which are decomissioned.

Part of https://github.com/neondatabase/neon/issues/9011

context: https://neondb.slack.com/archives/C033RQ5SPDH/p1739966807592589
2025-02-19 16:57:11 +00:00
Alex Chi Z.
1f9511dbd9 feat(pageserver): yield image creation to L0 compactions across timelines (#10877)
## Problem

A simpler version of https://github.com/neondatabase/neon/pull/10812

## Summary of changes

Image layer creation will be preempted by L0 accumulated on other
timelines. We stop image layer generation if there's a pending L0
compaction request.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-19 15:10:12 +00:00
Erik Grinaker
aab5482fd5 storcon: add CPU/heap profiling endpoints (#10894)
Adds CPU/heap profiling for storcon.

Also fixes allowlists to match on the path only, since profiling
endpoints take query parameters.

Requires #10892 for heap profiling.
2025-02-19 14:43:29 +00:00
Erik Grinaker
3720cf1c5a storcon: use jemalloc (#10892)
## Problem

We'd like to enable CPU/heap profiling for storcon. This requires
jemalloc.

## Summary of changes

Use jemalloc as the global allocator, and enable heap sampling for
profiling.
2025-02-19 14:20:51 +00:00
Erik Grinaker
0453eaf65c pageserver: reduce default compaction_upper_limit to 20 (#10889)
## Problem

We've seen the previous default of 50 cause OOMs. Compacting many L0
layers at once now has limited benefit, since the cost is mostly linear
anyway. This is already being reduced to 20 in production settings.

## Summary of changes

Reduce `DEFAULT_COMPACTION_UPPER_LIMIT` to 20.

Once released, let's remove the config overrides.
2025-02-19 14:12:05 +00:00
Heikki Linnakangas
2d96134a4e Remove unused dependencies (#10887)
Per cargo machete.
2025-02-19 14:09:01 +00:00
JC Grünhage
e52e93797f refactor(ci): use variables for AWS account IDs (#10886)
## Problem
Our AWS account IDs are copy-pasted all over the place. A wrong paste
might only be caught late if we hardcode them, but will get flagged
instantly by actionlint if we access them from github actions variables.
Resolves https://github.com/neondatabase/neon/issues/10787, follow-up
for https://github.com/neondatabase/neon/pull/10613.

## Summary of changes
Access AWS account IDs using Github Actions variables.
2025-02-19 12:34:41 +00:00
Erik Grinaker
aa115a774c storcon: eagerly attempt autosplits (#10849)
## Problem

Autosplits are crucial for bulk ingest performance. However, autosplits
were only attempted when there was no other pending work. This could
cause e.g. mass AZ affinity violations following Pageserver restarts to
starve out autosplits for hours.

Resolves #10762.

## Summary of changes

Always attempt autosplits in the background reconciliation loop,
regardless of other pending work.
2025-02-19 09:01:02 +00:00
Peter Bendel
2f0d6571a9 add a variant to ingest benchmark with shard-splitting disabled (#10876)
## Problem

we measure ingest performance for a few variants (stripe-sizes,
pre-sharded, shard-splitted).
However some phenomena (e.g. related to L0 compaction) in PS can be
better observed and optimized with un-sharded tenants.

## Summary of changes

- Allow to create projects with a policy that disables sharding
(`{"scheduling": "Essential"}`)
- add a variant to ingest_benchmark that uses that policy for the new
project

## Test run
https://github.com/neondatabase/neon/actions/runs/13396325970
2025-02-19 08:43:53 +00:00
a-masterov
7199919f04 Fix the problems discovered in the upgrade test (#10826)
## Problem
The nightly test discovered problems in the extensions upgrade test.
1. `PLv8` has different versions on PGv17 and PGv16 and a different test
set, which was not implemented correctly
[sample](https://github.com/neondatabase/neon/actions/runs/13382330475/job/37372930271)
2. The same for `semver`
[sample](https://github.com/neondatabase/neon/actions/runs/13382330475/job/37372930017)
3. `pgtap` interfered with the other tests, e.g. tables, created by
other extensions caused the tests to fail.

## Summary of changes
The discovered problems were fixed.
1. The tests list for `PLv8` is now generated using the original
Makefile
2. The patches for `semver` are now split for PGv16 and PGv17.
3. `pgtap` is being tested in a separate database now.

---------

Co-authored-by: Mikhail Kot <mikhail@neon.tech>
2025-02-19 06:40:09 +00:00
Alex Chi Z.
a4e3989c8d fix(pageserver): make repartition error critical (#10872)
## Problem

Read errors during repartition should be a critical error.

## Summary of changes

<del>We only have one call site</del> We have two call sites of
`repartition` where one of them is during the initial image upload
optimization and another is during image layer creation, so I added a
`critical!` here instead of inside `collect_keyspace`.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-18 20:19:23 +00:00
Peter Bendel
9d074db18d Use link to cross-service-endpoint dashboard in allure reports and benchmarking workflow logs (#10874)
## Problem

We have links to deprecated dashboards in our logs

Example
https://github.com/neondatabase/neon/actions/runs/13382454571/job/37401983608#step:8:348

## Summary of changes

Use link to cross service endpoint instead.

Example:
https://github.com/neondatabase/neon/actions/runs/13395407925/job/37413056148#step:7:345
2025-02-18 19:54:21 +00:00
Alex Chi Z.
538ea03f73 feat(pageserver): allow read path debug in getpagelsn API (#10748)
## Problem

The usual workflow for me to debug read path errors in staging is:
download the tenant to my laptop, import, and then run some read tests.

With this patch, we can do this directly over staging pageservers.

## Summary of changes

* Add a new `touchpagelsn` API that does a page read but does not return
page info back.
* Allow read from latest record LSN from get/touchpagelsn
* Add read_debug config in the context.
* The read path will read the context config to decide whether to enable
read path tracing or not.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-18 18:54:53 +00:00
Erik Grinaker
cb8060545d pageserver: don't log noop image compaction (#10873)
## Problem

We log image compaction stats even when no image compaction happened.
This is logged every 10 seconds for every timeline.

## Summary of changes

Only log when we actually performed any image compaction.
2025-02-18 17:49:01 +00:00
JC Grünhage
9151d3a318 feat(ci): notify storage oncall if deploy job fails on release branch (#10865)
## Problem
If the deploy job on the release branch doesn't succeed, the preprod
deployment will not have happened. It was requested that this triggers a
notification in https://github.com/neondatabase/neon/issues/10662.

## Summary of changes
If we're on the release branch and the deploy job doesn't end up in
"success", notify storage oncall on slack.
2025-02-18 17:20:03 +00:00
Anastasia Lubennikova
381115b68e Add pgaudit and pgauditlogtofile extensions (#10763)
to compute image.

This commit doesn't enable anything yet. 
It is a preparatory work for enabling audit logging in computes.
2025-02-18 16:32:32 +00:00
Vlad Lazar
1a69a8cba7 storage: add APIs for warming up location after cold migrations (#10788)
## Problem

We lack an API for warming up attached locations based on the heatmap
contents.
This is problematic in two places:
1. If we manually migrate and cut over while the secondary is still cold
2. When we re-attach a previously offloaded tenant

## Summary of changes

https://github.com/neondatabase/neon/pull/10597 made heatmap generation
additive
across migrations, so we won't clobber it a after a cold migration. This
allows us to implement:

1. An endpoint for downloading all missing heatmap layers on the
pageserver:

`/v1/tenant/:tenant_shard_id/timeline/:timeline_id/download_heatmap_layers`.
Only one such operation per timeline is allowed at any given time. The
granularity is tenant shard.
2. An endpoint to the storage controller to trigger the downloads on the
pageserver:

`/v1/tenant/:tenant_shard_id/timeline/:timeline_id/download_heatmap_layers`.
This works both at
tenant and tenant shard level. If an unsharded tenant id is provided,
the operation is started on
all shards, otherwise only the specified shard.
3. A storcon cli command. Again, tenant and tenant-shard level
granularities are supported.

Cplane will call into storcon and trigger the downloads for all shards.
When we want to rescue a migration, we will use storcon cli targeting
the specific tenant shard.

Related:  https://github.com/neondatabase/neon/issues/10541
2025-02-18 16:09:06 +00:00
Alex Chi Z.
ed98f6d57e feat(pageserver): log lease request (#10832)
## Problem

To investigate https://github.com/neondatabase/cloud/issues/23650

## Summary of changes

We log lease requests to see why there are clients accessing things
below gc_cutoff.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-18 16:06:39 +00:00
Alex Chi Z.
f9a063e2e9 test(pageserver): fix test_pageserver_gc_compaction_idempotent (#10833)
## Problem

ref https://github.com/neondatabase/neon/issues/10517

## Summary of changes

For some reasons the job split algorithm decides to have different image
coverage range for two compactions before/after restart. So we remove
the subcompaction key range and let it generate an image covering the
full range, which should make the test more stable.

Also slightly tuned the logging span.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-18 16:06:20 +00:00
Heikki Linnakangas
f36ec5c84b chore(compute): Postgres 17.4, 16.8, 15.12 and 14.17 (#10868)
Update all minor versions. No conflicts.

Postgres repository PRs:
- https://github.com/neondatabase/postgres/pull/584
- https://github.com/neondatabase/postgres/pull/583
- https://github.com/neondatabase/postgres/pull/582
- https://github.com/neondatabase/postgres/pull/581
2025-02-18 15:56:43 +00:00
Alexander Bayandin
274cb13293 test_runner: fix mismatch versions tests on linux (#10869)
## Problem

Tests with mixed-version binaries always use the latest binaries on CI
([an
example](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10848/13378137061/index.html#suites/8fc5d1648d2225380766afde7c428d81/1ccefc4cfd4ef176/)):

The versions of new `storage_broker` and old `pageserver` are the same:
`b45254a5605f6fdafdf475cdd3e920fe00898543`.

This affects only Linux, on macOS the version mixed correctly.

## Summary of changes
- Use hardlinks instead of symlinks to create a directory with
mixed-version binaries
2025-02-18 15:52:00 +00:00
Alex Chi Z.
290f007b8e Revert "feat(pageserver): repartition on L0-L1 boundary (#10548)" (#10870)
This reverts commit 443c8d0b4b.

## Problem

We observe a massive amount of compaction errors.

## Summary of changes

If the tenant did not write any L1 layers (i.e., they accumulate L0
layers where number of them is below L0 threshold), image creation will
always fail. Therefore, it's not correct to simply use the
disk_consistent_lsn or L0/L1 boundary for the image creation.
2025-02-18 15:43:33 +00:00
Alexander Lakhin
29e4ca351e Pass asan/ubsan options to pg_dump/pg_restore started by fast_import (#10866) 2025-02-18 15:41:20 +00:00
Arpad Müller
caece02da7 move pull_timeline to safekeeper_api and add SafekeeperGeneration (#10863)
Preparations for a successor of #10440: 

* move `pull_timeline` to `safekeeper_api` and add it to
`SafekeeperClient`. we want to do `pull_timeline` on any creations that
we couldn't do initially.
* Add a `SafekeeperGeneration` type instead of relying on a type alias.
we want to maintain a safekeeper specific generation number now in the
storcon database. A separate type is important to make it impossible to
mix it up with the tenant's pageserver specific generation number. We
absolutely want to avoid that for correctness reasons. If someone mixes
up a safekeeper and pageserver id (both use the `NodeId` type), that's
bad but there is no wrong generations flying around.

part of #9011
2025-02-18 14:02:22 +00:00
Arseny Sher
d36baae758 Add gc_blocking and restore latest_gc_cutoff in openapi spec (#10867)
## Problem

gc_blocking is missing in the tenant info, but cplane wants to use it.
Also, https://github.com/neondatabase/neon/pull/10707/ removed
latest_gc_cutoff from the spec, renaming it to applied_gc_cutoff.
Temporarily get it back until cplane migrates.

## Summary of changes

Add them.

ref https://neondb.slack.com/archives/C03438W3FLZ/p1739877734963979
2025-02-18 13:57:12 +00:00
Alexander Lakhin
f81259967d Add test to make sure sanitizers really work when expected (#10838) 2025-02-18 13:23:18 +00:00
Conrad Ludgate
719ec378cd fix(local_proxy): discard all in tx (#10864)
## Problem

`discard all` cannot run in a transaction (even if implicit)

## Summary of changes

Split up the query into two, we don't need transaction support.
2025-02-18 08:54:20 +00:00
Alexander Bayandin
27241f039c test_runner: fix neon_local usage for version mismatch tests (#10859)
## Problem

Tests with mixed versions of binaries always pick up new versions if
services are started using `neon_local`.

## Summary of changes
- Set `neon_local_binpath` along with `neon_binpath` and
`pg_distrib_dir` for tests with mixed versions
2025-02-17 20:29:14 +00:00
Heikki Linnakangas
811506aaa2 fast_import: Use rust s3 client for uploading (#10777)
This replaces the use of the awscli utility. awscli binary is massive,
it added about 200 MB to the docker image size, while the s3 client was
already a dependency so using that is essentially free, as far as binary
size is concerned.

I implemented a simple upload function that tries to keep 10 uploads
going in parallel. I believe that's the default behavior of the "aws s3
sync" command too.
2025-02-17 20:07:31 +00:00
Heikki Linnakangas
2884917bd4 compute: Allow postgres user to power off the VM also on <= v16 (#10860)
I did this for debian bookworm variant in PR #10710, but forgot to
update the "bullseye" dockerfile that is used to build older PostgreSQL
versions.
2025-02-17 19:42:57 +00:00
Tristan Partin
b34598516f Warn when PR may require regenerating cloud PG settings (#10229)
These generated Postgres settings JSON files can get out of sync causing
the control plane to reject updated to an endpoint or project's Postgres
settings.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-02-17 19:02:16 +00:00
Erik Grinaker
84bbe87d60 pageserver: tweak pageserver_layers_per_read histogram resolution (#10847)
## Problem

The current `pageserver_layers_per_read` histogram buckets don't
represent the current reality very well. For the percentiles we care
about (e.g. p50 and p99), we often see fairly high read amp, especially
during ingestion, and anything below 4 can be considered very good.
 
## Summary of changes

Change the per-timeline read amp histogram buckets to `[4.0, 8.0, 16.0,
32.0, 64.0, 128.0, 256.0]`.
2025-02-17 17:24:17 +00:00
Vlad Lazar
b10890b81c tests: compare digests in test_peer_recovery (#10853)
## Problem

Test fails when comparing the first WAL segment because the system id in
the segment header is different. The system id is not consistently set
correctly since segments are usually inited on the safekeeper sync step
with sysid 0.

## Summary of Chnages

Compare timeline digests instead. This skips the header.

Closes https://github.com/neondatabase/neon/issues/10596
2025-02-17 16:32:24 +00:00
Conrad Ludgate
3204efc860 chore(proxy): use specially named prepared statements for type-checking (#10843)
I was looking into
https://github.com/neondatabase/serverless/issues/144, I recall previous
cases where proxy would trigger these prepared statements which would
conflict with other statements prepared by our client downstream.

Because of that, and also to aid in debugging, I've made sure all
prepared statements that proxy needs to make have specific names that
likely won't conflict and makes it clear in a error log if it's our
statements that are causing issues
2025-02-17 16:19:57 +00:00
Tristan Partin
da79cc5eee Add neon.extension_server_{connect,request}_timeout (#10801)
Instead of hardcoding the request timeout, let's make it configurable as
a PGC_SUSET GUC.

Additionally, add a connect timeout GUC. Although the extension server
runs on the compute, it is always best to keep operations from hanging.
Better to present a timeout error to the user than a stuck backend.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-02-17 15:40:43 +00:00
John Spray
39d42d846a pageserver_api: fix decoding old-version TimelineInfo (#10845)
## Problem

In #10707 some new fields were introduced in TimelineInfo.

I forgot that we do not only use TimelineInfo for encoding, but also
decoding when the storage controller calls into a pageserver, so this
broke some calls from controller to pageserver while in a mixed-version
state.

## Summary of changes

- Make new fields have default behavior so that they are optional
2025-02-17 15:04:47 +00:00
Arpad Müller
0330b61729 Azure SDK: use neon branch again (#10844)
Originally I wanted to switch back to the `neon` branch before merging
#10825, but I forgot to do it. Do it in a separate PR now.

No actual change of the source code, only changes the branch name (so
that maybe in a few weeks we can delete the temporary branch
`arpad/neon-rebase`).
2025-02-17 14:59:01 +00:00
Erik Grinaker
8a2d95b4b5 pageserver: appease unused lint on macOS (#10846)
## Problem

`SmgrOpFlushInProgress::measure()` takes a `socket_fd` argument which is
only used on Linux. This causes linter warnings on macOS.

Touches #10823.

## Summary of changes

Add a noop use of `socket_fd` on non-Linux branch.
2025-02-17 14:41:22 +00:00
Konstantin Knizhnik
8c6d133d31 Fix out-of-boundaries access in addSHLL function (#10840)
## Problem

See https://github.com/neondatabase/neon/issues/10839

rho(x,b) functions returns values in range [1,b+1] and addSHLL tries to
store it in array of size b+1.

## Summary of changes

Subtract 1 fro value returned by rho

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-02-17 12:54:17 +00:00
Arpad Müller
81f08d304a Rebase Azure SDK and apply newest patch (#10825)
The [upstream PR](https://github.com/Azure/azure-sdk-for-rust/pull/1997)
has been merged with some changes to use threads with async, so apply
them to the neon specific fork to be nice to the executor (before, we
had the state as of filing of that PR). Also, rebase onto the latest
version of upstream's `legacy` branch.

current SDK commits:
[link](https://github.com/neondatabase/azure-sdk-for-rust/commits/neon-2025-02-14)
now:
[link](https://github.com/neondatabase/azure-sdk-for-rust/commits/arpad/neon-refresh)

Prior update was in #10790
2025-02-17 10:44:44 +00:00
Peter Bendel
d566d604cf feat(compute) add pg_duckdb extension v0.3.1 (#10829)
We want to host pg_duckdb (starting with v0.3.1) on Neon.

This PR replaces https://github.com/neondatabase/neon/pull/10350 which
was for older pg_duckdb v0.2.0

Use cases
- faster OLAP queries
- access to datelake files (e.g. parquet) on S3 buckets from Neon
PostgreSQL

Because neon does not provide superuser role to neon customers we need
to grant some additional permissions to neon_superuser:

Note: some grants that we require are already granted to `PUBLIC` in new
release of pg_duckdb
[here](3789e4c509/sql/pg_duckdb--0.2.0--0.3.0.sql (L1054))

```sql
GRANT ALL ON FUNCTION duckdb.install_extension(TEXT) TO neon_superuser;
GRANT ALL ON TABLE duckdb.extensions TO neon_superuser;
GRANT ALL ON SEQUENCE duckdb.extensions_table_seq TO neon_superuser;
```
2025-02-17 10:43:16 +00:00
Alexander Lakhin
f739773edd Fix format of milliseconds in pytest output (#10836)
## Problem

The timestamp prefix of pytest log lines contains milliseconds without
leading zeros, so values of milliseconds less than 100 printed
incorrectly.

For example:
```
2025-02-15 12:02:51.997 INFO [_internal.py:97] 127.0.0.1 - - ...
2025-02-15 12:02:52.4   INFO [_internal.py:97] 127.0.0.1 - - ...
2025-02-15 12:02:52.9   INFO [_internal.py:97] 127.0.0.1 - - ...
2025-02-15 12:02:52.23  INFO [_internal.py:97] 127.0.0.1 - - ...
```

## Summary of changes
Fix log_format for pytest so that milliseconds are printed with leading
zeros.
2025-02-16 04:59:52 +00:00
Heikki Linnakangas
2dae0612dd fast_import: Fix shared_buffers setting (#10837)
In commit 9537829ccd I made shared_buffers be derived from the system's
available RAM. However, I failed to remove the old hard-coded
shared_buffers=10GB settings, shared_buffers was set twice. Oopsie.
2025-02-16 00:01:19 +00:00
Alexander Bayandin
2ec8dff6f7 CI(build-and-test-locally): set session-timeout for pytest (#10831)
## Problem

Sometimes, a regression test run gets stuck (taking more than 60
minutes) and is killed by GitHub's `timeout-minutes` without leaving any
traces in the test results database.
I find no correlation between this and either the build type, the
architecture, or the Postgres version.

See: https://neonprod.grafana.net/goto/nM7ih7cHR?orgId=1

## Summary of changes
- Bump `pytest-timeout` to the version that supports `--session-timeout`
- Set `--session-timeout` to (timeout-minutes - 10 minutes) * 60 seconds
in Attempt to stop tests gracefully to generate test reports until they
are forcibly stopped by the stricter `timeout-minutes` limit.
2025-02-15 10:34:11 +00:00
Alex Chi Z.
ae091c6913 feat(pageserver): store reldir in sparse keyspace (#10593)
## Problem

Part of https://github.com/neondatabase/neon/issues/9516

## Summary of changes

This patch adds the support for storing reldir in the sparse keyspace.
All logic are guarded with the `rel_size_v2_enabled` flag, so if it's
set to false, the code path is exactly the same as what's currently in
prod.

Note that we did not persist the `rel_size_v2_enabled` flag and the
logic around it will be implemented in the next patch. (i.e., what if we
enabled it, restart the pageserver, and then it gets set to false? we
should still read from v2 using the rel_size_v2_migration_status in the
index_part). The persistence logic I'll implement in the next patch will
disallow switching from v2->v1 via config item.

I also refactored the metrics so that it can work with the new reldir
store. However, this metric is not correctly computed for reldirs (see
the comments) before. With the refactor, the value will be computed only
when we have an initial value for the reldir size. The refactor keeps
the incorrectness of the computation when there are more than 1
database.

For the tests, we currently run all the tests with v2, and I'll set it
to false and add some v2-specific tests before merging, probably also
v1->v2 migration tests.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-14 20:31:54 +00:00
Christian Schwarz
a32e8871ac compute/pageserver: correlation of logs through backend PID (via application_name) (#10810)
This PR makes compute set the `application_name` field to the
PG backend process PID which is also included in each compute log line.

This allows correlation of Pageserver connection logs with compute logs
in a way that was guesswork before this PR.

In future, we can switch for a more unique identifier for a page_service
session.

Refs
- discussion in
https://neondb.slack.com/archives/C08DE6Q9C3B/p1739465208296169?thread_ts=1739462628.361019&cid=C08DE6Q9C3B
- fixes https://github.com/neondatabase/neon/issues/10808
2025-02-14 20:11:42 +00:00
Christian Schwarz
9177312ba6 basebackup: use Timeline::get for get_rel instead of get_rel_page_at_lsn (#10476)
I noticed the opportunity to simplify here while working on
https://github.com/neondatabase/neon/pull/9353 .

The only difference is the zero-fill behavior: if one reads past rel
size,
`get_rel_page_at_lsn` returns a zeroed page whereas `Timeline::get`
returns an error.

However, the `endblk` is at most rel size large, because `nblocks` is eq
`get_rel_size`, see a few lines above this change.

We're using the same LSN (`self.lsn`) for everything, so there is no
chance of non-determinism.

Refs:

- Slack discussion debating correctness:
https://neondb.slack.com/archives/C033RQ5SPDH/p1737457010607119
2025-02-14 17:57:18 +00:00
Christian Schwarz
b992a1a62a page_service: include socket send & recv queue length in slow flush log mesage (#10823)
# Summary

In 
- https://github.com/neondatabase/neon/pull/10813

we added slow flush logging but it didn't log the TCP send & recv queue
length.
This PR adds that data to the log message.

I believe the implementation to be safe & correct right now, but it's
brittle and thus this PR should be reverted or improved upon once the
investigation is over.

Refs:
- stacked atop https://github.com/neondatabase/neon/pull/10813
- context:
https://neondb.slack.com/archives/C08DE6Q9C3B/p1739464533762049?thread_ts=1739462628.361019&cid=C08DE6Q9C3B
- improves  https://github.com/neondatabase/neon/issues/10668
- part of https://github.com/neondatabase/cloud/issues/23515

# How It Works

The trouble is two-fold:
1. getting to the raw socket file descriptor through the many Rust types
that wrap it and
2. integrating with the `measure()` function

Rust wraps it in types to model file descriptor lifetimes and ownership,
and usually one can get access using `as_raw_fd()`.
However, we `split()` the stream and the resulting
[`tokio::io::WriteHalf`](https://docs.rs/tokio/latest/tokio/io/struct.WriteHalf.html)
.
Check the PR commit history for my attempts to do it.

My solution is to get the socket fd before we wrap it in our protocol
types, and to store that fd in the new `PostgresBackend::socket_fd`
field.
I believe it's safe because the lifetime of `PostgresBackend::socket_fd`
value == the lifetime of the `TcpStream` that wrap and store in
`PostgresBackend::framed`.
Specifically, the only place that close()s the socket is the `impl Drop
for TcpStream`.
I think the protocol stack calls `TcpStream::shutdown()`, but, that
doesn't `close()` the file descriptor underneath.

Regarding integration with the `measure()` function, the trouble is that
`flush_fut` is currently a generic `Future` type. So, we just pass in
the `socket_fd` as a separate argument.

A clean implementation would convert the `pgb_writer.flush()` to a named
future that provides an accessor for the socket fd while not being
polled.
I tried (see PR history), but failed to break through the `WriteHalf`.


# Testing

Tested locally by running

```
./target/debug/pagebench get-page-latest-lsn --num-clients=1000 --queue-depth=1000
```
in one terminal, waiting a bit, then
```
pkill -STOP pagebench
```
then wait for slow logs to show up in `pageserver.log`.
Pick one of the slow log message's port pairs, e.g., `127.0.0.1:39500`, 
and then checking sockstat output
```
ss -ntp | grep '127.0.0.1:39500'
```

to ensure that send & recv queue size match those in the log message.
2025-02-14 16:20:07 +00:00
Gleb Novikov
3d7a32f619 fast import: allow restore to provided connection string (#10407)
Within https://github.com/neondatabase/cloud/issues/22089 we decided
that would be nice to start with import that runs dump-restore into a
running compute (more on this
[here](https://www.notion.so/neondatabase/2024-Jan-13-Migration-Assistant-Next-Steps-Proposal-Revised-17af189e004780228bdbcad13eeda93f?pvs=4#17af189e004780de816ccd9c13afd953))
We could do it by writing another tool or by extending existing
`fast_import.rs`, we chose the latter.

In this PR, I have added optional `restore_connection_string` as a cli
arg and as a part of the json spec. If specified, the script will not
run postgres and will just perform restore into provided connection
string.

TODO:
- [x] fast_import.rs:
	- [x] cli arg in the fast_import.rs
	- [x] encoded connstring in json spec 
- [x] simplify `fn main` a little, take out too verbose stuff to some
functions
- [ ] ~~allow streaming from dump stdout to restore stdin~~ will do in a
separate PR
- [ ] ~~address
https://github.com/neondatabase/neon/pull/10251#pullrequestreview-2551877845~~
will do in a separate PR
- [x] tests:
    - [x] restore with cli arg in the fast_import.rs
    - [x] restore with encoded connstring in json spec in s3
    - [ ] ~~test with custom dbname~~ will do in a separate PR
- [ ] ~~test with s3 + pageserver + fast import binary~~
https://github.com/neondatabase/neon/pull/10487
- [ ]
~~https://github.com/neondatabase/neon/pull/10271#discussion_r1923715493~~
will do in a separate PR

neondatabase/cloud#22775

---------

Co-authored-by: Eduard Dykman <bird.duskpoet@gmail.com>
2025-02-14 16:10:06 +00:00
Christian Schwarz
fac5db3c8d page_service: emit periodic log message while response flush is slow (#10813)
The logic might seem a bit intricate / over-optimized, but I recently
spent time benchmarking this code path in the context of a nightly
pagebench regression
(https://github.com/neondatabase/cloud/issues/21759)
and I want to avoid regressing it any further.

Ideally would also log the socket send & recv queue length like we do on
the compute side in
- https://github.com/neondatabase/neon/pull/10673

But that is proving difficult due to the Rust abstractions that wrap the
socket fd.
Work in progress on that is happening in 
- https://github.com/neondatabase/neon/pull/10823

Regarding production impact, I am worried at a theoretical level that
the additional logging may cause a downward spiral in the case where a
pageserver is slow to flush because there is not enough CPU. The logging
would consume more CPU and thereby slow down flushes even more. However,
I don't think this matters practically speaking.


# Refs

- context:
https://neondb.slack.com/archives/C08DE6Q9C3B/p1739464533762049?thread_ts=1739462628.361019&cid=C08DE6Q9C3B
- fixes https://github.com/neondatabase/neon/issues/10668
- part of https://github.com/neondatabase/cloud/issues/23515

# Testing

Tested locally by running

```
./target/debug/pagebench get-page-latest-lsn --num-clients=1000 --queue-depth=1000
```
in one terminal, waiting a bit, then
```
pkill -STOP pagebench
```
then wait for slow logs to show up in `pageserver.log`.
To see that the completion log message is logged, run
```
pkill -CONT pagebench
```
2025-02-14 14:37:03 +00:00
John Spray
a82a6631fd storage controller: prioritize reconciles for user-facing operations (#10822)
## Problem

Some situations may produce a large number of pending reconciles. If we
experience an issue where reconciles are processed more slowly than
expected, that can prevent us responding promptly to user requests like
tenant/timeline CRUD.

This is a cleaner implementation of the hotfix in
https://github.com/neondatabase/neon/pull/10815

## Summary of changes

- Introduce a second semaphore for high priority tasks, with
configurable units (default 256). The intent is that in practical
situations these user-facing requests should never have to wait.
- Use the high priority semaphore for: tenant/timeline CRUD, and shard
splitting operations. Use normal priority for everything else.
2025-02-14 13:25:43 +00:00
Folke Behrens
da7496e1ee proxy: Post-refactor + future clippy lint cleanup (#10824)
* Clean up deps and code after logging and binary refactor
* Also include future clippy lint cleanup
2025-02-14 12:34:09 +00:00
a-masterov
646e011c4d Tests the test-upgrade scripts themselves (#10664)
## Problem
We run the compatibility tests only if we are upgrading the extension.
An accidental code change may break the test itself, so we have to check
this code as well.
## Summary of changes
The test is scheduled once a day to save time and resources.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2025-02-14 11:41:57 +00:00
Arpad Müller
878c1c7110 offload_timeline: check if the timeline is archived on HasChildren error (#10776)
PR #10305 makes sure that there is no *actual* race, i.e. we will never
attempt to offload a timeline that has just been unarchived, or similar.

However, if a timeline has been unarchived and has children that are
unarchived too, we will get an error log line. Such races can occur as
in compaction we check if the timeline can be offloaded way before we
attempt to offload it: the result might change in the meantime.

This patch checks if the delete guard can't be obtained because the
timeline has unarchived children, and if yes, it does another check for
whether the timeline has become unarchived or not. If it is unarchived,
it just prints an info log msg and integrates itself into the error
suppression logic of the compaction calling into it.

If you squint at it really closely, there is still a possible race in
which we print an error log, but this one is unlikely because the
timeline and its children need to be archived right after the check for
whether the timeline has any unarchived children, and right before the
check whether the timeline is archived. Archival involves a network
operation while nothing between these two checks does that, so it's very
unlikely to happen in real life.


https://github.com/neondatabase/cloud/issues/23979#issuecomment-2651265729
2025-02-14 10:21:50 +00:00
John Spray
996f0a3753 storcon: fix eliding parameters from proxied URL labels (#10817)
## Problem

We had code for stripping IDs out of proxied paths to reduce cardinality
of metrics, but it was only stripping out tenant IDs, and leaving in
timeline IDs and query parameters (e.g. LSN in lsn->timestamp lookups).

## Summary of changes

- Use a more general regex approach.

There is still some risk that a future pageserver API might include a
parameter in `/the/path/`, but we control that API and it is not often
extended. We will also alert on metrics cardinality in staging so that
if we made that mistake we would notice.
2025-02-14 09:57:19 +00:00
Konstantin Knizhnik
8bdb1828c8 Perform seqscan to fill LFC chunks with data so that on-disk file size included size of table (#10775)
## Problem

See https://github.com/neondatabase/neon/issues/10755

Random access pattern of pgbench leaves sparse chunks, which makes the
on-disk size of file.cache unpredictable.

## Summary of changes

Perform seqscan to fill LFC chunks with data so that on-disk file size
included size of table.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-02-14 08:19:56 +00:00
Alexander Bayandin
3e8bf2159d CI(build-and-test): run benchmarks after deploy job (#10791)
## Problem

`benchmarks` is a long-running and non-blocking job. If, on Staging, a
deploy-blocking job fails, restarting it requires cancelling any running
`benchmarks` jobs, which is a waste of CI resources and requires a
couple of extra clicks for a human to do.

Ref: https://neondb.slack.com/archives/C059ZC138NR/p1739292995400899

## Summary of changes
- Run `benchmarks` after `deploy` job
- Handle `benchmarks` run in PRs with `run-benchmarks` label but without
`deploy` job.
2025-02-13 22:03:47 +00:00
Arpad Müller
5008324460 Fix utilization URL and ensure heartbeats work (#10811)
There was a typo in the name of the utilization endpoint URL, fix it.
Also, ensure that the heartbeat mechanism actually works.

Related: #10583, #10429

Part of #9011
2025-02-13 20:55:53 +00:00
Christian Schwarz
487f3202fe pageserver read path: abort on fatal IO errors from disk / filesystem (#10786)
Before this PR, an IO error returned from the kernel, e.g., due to a bad
disk, would get bubbled up, all the way to a user-visible query failing.

This is against the IO error handling policy where we have established
and is hence being rectified in this PR.
[[(internal Policy document
link)]](bef44149f7/src/storage/handling_io_and_logical_errors.md (L33-L35))

The practice on the write path seems to be that we call
`maybe_fatal_err()` or `fatal_err()` fairly high up the stack.
That is, regardless of whether std::fs, tokio::fs, or VirtualFile is
used to perform the IO.

For the read path, I choose a centralized approach in this PR by
checking for errors as close to the kernel interface as possible.
I believe this is better for long-term consistency.

To mitigate the problem of missing context if we abort so far down in
the stack, the `on_fatal_io_error` now captures and logs a backtrace.

I grepped the pageserver code base for `fs::read` to convince myself
that all non-VirtualFile reads already handle IO errors according to
policy.

Refs

- fixes https://github.com/neondatabase/neon/issues/10454
2025-02-13 20:53:39 +00:00
Alex Chi Z.
6a741fd1c2 fix(pageserver): ensure all basebackup client errors are caught (#10793)
## Problem

We didn't catch all client errors causing alerts.

## Summary of changes

Client errors should be wrapped with ClientError so that it doesn't fire
alerts.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-13 19:38:02 +00:00
a-masterov
7ac7755dad Add tests for pgtap (#10589)
## Problem
We do not test `pgtap` which is shipped with Neon
## Summary of changes
Test and binaries for `pgtap` are added.
2025-02-13 19:04:08 +00:00
Arseny Sher
98e18e9a54 Add s3 storage to test_s3_wal_replay (#10809)
## Problem

The test is flaky: WAL in remote storage appears to be corrupted. One of
hypotheses so far is that corruption is the result of local fs
implementation being non atomic, and safekeepers may concurrently PUT
the same segment. That's dubious though because by looking at local_fs
impl I'd expect then early EOF on segment read rather then observed
zeros in test failures, but other directions seem even less probable.

## Summary of changes

Let's add s3 backend as well and see if it is also flaky. Also add some
more logging around segments uploads.

ref https://github.com/neondatabase/neon/issues/10761
2025-02-13 18:05:15 +00:00
Tristan Partin
0cf9157adc Handle new compute_ctl_config parameter in compute spec requests (#10746)
There is now a compute_ctl_config field in the response that currently
only contains a JSON Web Key set. compute_ctl currently doesn't do
anything with the keys, but will in the future.

The reasoning for the new field is due to the nature of empty computes.
When an empty compute is created, it does not have a tenant. A compute
spec is the primary means of communicating the details of an attached
tenant. In the empty compute state, there is no spec. Instead we wait
for the control plane to pass us one via /configure. If we were to
include the jwks field in the compute spec, we would have a partial
compute spec, which doesn't logically make sense.

Instead, we can have two means of passing settings to the compute:

- spec: tenant specific config details
- compute_ctl_config: compute specific settings

For instance, the JSON Web Key set passed to the compute is independent
of any tenant. It is a setting of the compute whether it is attached or
not.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-02-13 18:04:36 +00:00
Tristan Partin
b6f972ed83 Increase the extension server request timeout to 1 minute (#10800)
pg_search is 46ish MB. All other remote extensions are around hundeds of
KB. 3 seconds is not long enough to download the tarball if the S3
gateway cache doesn't already contain a copy. According to our setup,
the cache is limited to 10 GB in size and anything that has not been
accessed for an hour is purged.

This is really bad for scaling to 0, even more so if you're the only
project actively using the extension in a production Kubernetes cluster.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-02-13 17:33:27 +00:00
John Spray
a4d0a34591 tests: flush in test_isolation (#10658)
## Problem

This test occasionally fails while the test teardown tries to do a
graceful shutdown, because the test has quickly written lots of data
into the pageserver.

Closes: #10654 

## Summary of changes

- Call `post_checks` at the end of `test_isolation`, as we already do
for test_pg_regress -- this improves our detection of issues, and as a
nice side effect flushes the pageserver.
- Ignore pg_notify files when validating state at end of test, these are
not expected to be the same
2025-02-13 16:23:51 +00:00
John Spray
ae463f366b tests: broaden allow-list for #10720 workaround (#10807)
## Problem

In #10752 I used an overly-strict regex that only ignored error on a
particular key.

## Summary of changes

- Drop key from regex so it matches all such errors
2025-02-13 16:15:04 +00:00
Alexey Kondratov
8c2f85b209 chore(compute): Postgres 17.3, 16.7, 15.11 and 14.16 (#10771)
## Summary of changes

Bump all minor versions. The only non-trivial conflict was between
-
0350b876b0
- and
bd09a752f4

It seems that just adding this extra argument is enough.

I also got conflict with

c1c9df3159
but for some reason only in PG 15. Yet, that was a trivial one around
```c
		if (XLogCtl)
			LWLockRelease(ControlFileLock);
		/* durable_rename already emitted log message */
		return false;
```
in `xlog.c`

## Postgres PRs

- https://github.com/neondatabase/postgres/pull/580
- https://github.com/neondatabase/postgres/pull/579
- https://github.com/neondatabase/postgres/pull/577
- https://github.com/neondatabase/postgres/pull/578
2025-02-13 13:28:05 +00:00
JC Grünhage
e37ba8642d Integrate cargo-chef into Dockerfile (#10782)
## Problem
The build of the neon container image is not caching any part of the
rust build, making it fairly slow.

## Summary of changes
Cache dependency building using cargo-chef.
2025-02-13 13:08:46 +00:00
Vlad Lazar
8fea43a5ba pageserver: make heatmap generation additive (#10597)
## Problem

Previously, when cutting over to cold secondary locations,
we would clobber the previous, good, heatmap with a cold one.
This is because heatmap generation used to include only resident layers.

Once this merges, we can add an endpoint which triggers full heatmap
hydration on attached locations to heal cold migrations.

## Summary of changes

With this patch, heatmap generation becomes additive. If we have a
heatmap from when this location was secondary, the new uploaded heatmap
will be the result of a reconciliation between the old one and the on
disk resident layers.

More concretely, when we have the previous heatmap:
1. Filter the previous heatmap and keep layers that are (a) present
in the current layer map, (b) visible, (c) not resident. Call this set
of layers `visible_non_resident`.
2. From the layer map, select all layers that are resident and visible.
Call this set of layers `resident`.
3. The new heatmap is the result of merging the two disjoint sets.

Related https://github.com/neondatabase/neon/issues/10541
2025-02-13 12:48:47 +00:00
Arpad Müller
536bdb3209 storcon: track safekeepers in memory, send heartbeats to them (#10583)
In #9011, we want to schedule timelines to safekeepers. In order to do
such scheduling, we need information about how utilized a safekeeper is
and if it's available or not.

Therefore, send constant heartbeats to the safekeepers and try to figure
out if they are online or not.

Includes some code from #10440.
2025-02-13 11:06:30 +00:00
John Spray
b8095f84a0 pageserver: make true GC cutoff visible in admin API, rebrand latest_gc_cutoff as applied_gc_cutoff (#10707)
## Problem

We expose `latest_gc_cutoff` in our API, and callers understandably were
using that to validate LSNs for branch creation. However, this is _not_
the true GC cutoff from a user's point of view: it's just the point at
which we last actually did GC. The actual cutoff used when validating
branch creations and page_service reads is the min() of latest_gc_cutoff
and the planned GC lsn in GcInfo.

Closes: https://github.com/neondatabase/neon/issues/10639

## Summary of changes

- Expose the more useful min() of GC cutoffs as `gc_cutoff_lsn` in the
API, so that the most obviously named field is really the one people
should use.
- Retain the ability to read the LSN at which GC was actually done, in
an `applied_gc_cutoff_lsn` field.
- Internally rename `latest_gc_cutoff_lsn` to `applied_gc_cutoff_lsn`
("latest" was a confusing name, as the value in GcInfo is more up to
date in terms of what a user experiences)
- Temporarily preserve the old `latest_gc_cutoff_lsn` field for compat
with control plane until we update it to use the new field.

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2025-02-13 10:33:47 +00:00
Ivan Efremov
356cca23a5 fix(proxy): Change HSet to HDel for cancellation key metric (#10789) 2025-02-13 10:22:13 +00:00
JC Grünhage
7b966a2b71 CI(trigger-e2e-tests): fix checking for successful image pushes (#10803)
## Problem
https://github.com/neondatabase/neon/pull/10613 changed how images are
pushed, and therefore also how we have to wait for images to be pushed
in `trigger-e2e-tests`. The `trigger-e2e-tests` workflow is triggered in
three different ways:
- When a pull request is pushed to that is already ready to review, here
we call the workflow from `build_and_test`
- When a pull request is marked ready for review, then the workflow is
triggered directly
- When a push to `main` or `release(-.*)?` triggers `build_and_test` and
that indirectly calls `trigger-e2e-tests`.

The second of these paths had a bug, which was not tested in the PR,
because this path being different wasn't clear to me.

## Summary of changes
Fix the jq statement that caused the bug.
2025-02-13 10:13:26 +00:00
Conrad Ludgate
23352dc2e9 Merge pull request #10802 from neondatabase/rc/release-proxy/2025-02-13
Proxy release 2025-02-13
2025-02-13 08:41:01 +00:00
github-actions[bot]
c65fc5a955 Proxy release 2025-02-13 2025-02-13 06:02:01 +00:00
JC Grünhage
e38694742c fix(ci): don't try pushing to prod container registries from main (#10795)
## Problem
https://github.com/neondatabase/neon/pull/10613 changed how images are
pushed, and there was a small mismatch between the github workflow and
the script generating what to push where. This resulted in the workflow
trying to push images to prod registries from the main branch, even
though we don't do that and therefore didn't generate a mapping for
those registries in the script that decides what to push where.

This misconception happened because promote-images-dev pushed to dev
registries, and promote-images-prod pushed to prod registries, but
promote-images-prod also updated the latest tag in the dev registries if
and only if we are on the main branch. This last bit is why the
push-<component>-image-prod jobs were trying to run on the main branch.

## Summary of changes
Don't try pushing to prod registries from the main branch.
2025-02-12 20:26:05 +00:00
Arpad Müller
922f3ee17d Compress git history of Azure SDK (#10790)
Switch the Azure SDK git fork to one with a compressed git history. This
helps with download speed of the git repository.

closes #10732
2025-02-12 19:48:11 +00:00
Arpad Müller
61d2474632 Also check by the planned gc cutoff for lease creation (#10764)
We don't want to allow new leases below the planned gc cutoff either.
Other APIs like branch creation or getpage requests already enforce
this.
2025-02-12 19:29:17 +00:00
JC Grünhage
b77dd66bc4 refactor(ci): overhaul container image pushing (#10613)
## Problem
Retagging container images and pushing container images taken from one
registry to another is very tangled up with artifact building and not
separated by component. This makes not building compute for storage
releases and vice versa pretty tricky. To enable that, I want to clean
up retagging and pushing of container images and then continue on making
the pipelines for releases leaner by not building unnecessary things.

## Summary of changes
- Add a reusable workflow that can push to ACR, ECR and Docker Hub,
while being very flexible in terms of source and target images. This
allows for retagging and pushing images between container registries.
- Stop pushing images to registries aside of docker hub in the jobs that
build the images
- Split image pushing into 4 different jobs (not mentioning special
cases):
  - neon-dev
  - neon-prod
  - compute-dev
  - compute-prod

## TODO
- Consider also using this for `pin-build-tools-image`, as it's
basically another instance of the same thing.

## Known limitations
- The ECR part of this workflow supports authenticating to multiple AWS
accounts and therefore multiple ECR endpoints, but the ACR part only
supports one Azure Account. If someone with more knowledge on Azure can
tell me whether an equivalent to
https://github.com/aws-actions/amazon-ecr-login?tab=readme-ov-file#login-to-ecr-on-multiple-aws-accounts
is easily possible, that'd be great.
- The `image_map` input is a bit complex. It expects something along the
lines of
  ```
  {
    "docker.io/neondatabase/compute-node-v14:13196061314": [
      "docker.io/neondatabase/compute-node-v14:13196061314",

"369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-node-v14:13196061314",
      "neoneastus2.azurecr.io/neondatabase/compute-node-v14:13196061314"
    ],
    "docker.io/neondatabase/compute-node-v15:13196061314": [
      "docker.io/neondatabase/compute-node-v15:13196061314",

"369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-node-v15:13196061314",
      "neoneastus2.azurecr.io/neondatabase/compute-node-v15:13196061314"
    ]
  }
  ```
to map from source to target image. We have a small python step to
generate this map for the 4 main image pushing jobs. The concrete
example is taken from
https://github.com/neondatabase/neon/actions/runs/13196061314/job/36838584098?pr=10613#step:3:6
and shortened to two images.
2025-02-12 17:54:51 +00:00
Alexey Kondratov
49775d28e4 fix(compute): Respect skip_pg_catalog_updates in reconfigure() (#10696)
## Problem

We respect `skip_pg_catalog_updates` at the initial start, but ignore at
the follow-up `/configure`. Yet, it's used for storage->cplane->compute
notify requests after migrations, shard split, etc. So every time we get
them, applying the new config takes much longer than it should because
we go through Postgres catalog checks. Cplane sets this flag, when it
does serves notify attach call
9068c7d743

Related to `inc-403`, for example

## Summary of changes

Look at `skip_pg_catalog_updates` in `compute.reconfigure()`
2025-02-12 17:54:21 +00:00
Alexander Bayandin
f45f9209b9 CI(trigger-e2e-tests): check permissions before running jobs (#10785)
## Problem

PRs created by external contributors, in some cases might list failed
jobs
- `Trigger E2E Tests / cancel-previous-e2e-tests`
- `Trigger E2E Tests / tag`

They don't block the merge, and tests in fact pass (their counterparts
in internal PR), but because jobs are triggered from an external PR (and
not from the corresponding internal one) they still present as red
marks.

For example https://github.com/neondatabase/neon/pull/10778

## Summary of changes
- Check permissions before triggering e2e tests
2025-02-12 17:00:23 +00:00
Cheng Chen
20fe4b8ec3 chore(compute): pg_mooncake v0.1.2 (#10778)
## Problem
Upgrade pg_mooncake to v0.1.2

## Summary of changes

https://github.com/Mooncake-Labs/pg_mooncake/blob/main/CHANGELOG.md#012-2025-02-11
2025-02-12 16:29:19 +00:00
Erik Grinaker
f62047ae97 pageserver: add separate semaphore for L0 compaction (#10780)
## Problem

L0 compaction frequently gets starved out by other background tasks and
image/GC compaction. L0 compaction must be responsive to keep read
amplification under control.

Touches #10694.
Resolves #10689.

## Summary of changes

Use a separate semaphore for the L0-only compaction pass.

* Add a `CONCURRENT_L0_COMPACTION_TASKS` semaphore and
`BackgroundLoopKind::L0Compaction`.
* Add a setting `compaction_l0_semaphore` (default off via
`compaction_l0_first`).
* Use the L0 semaphore when doing an `OnlyL0Compaction` pass.
* Use the background semaphore when doing a regular compaction pass
(which includes an initial L0 pass).
* While waiting for the background semaphore, yield for L0 compaction if
triggered.
* Add `CompactFlags::NoYield` to disable L0 yielding, and set it for the
HTTP API route.
* Remove the old `use_compaction_semaphore` setting and
compaction-scoped semaphore.
* Remove the warning when waiting for a semaphore; it's noisy and we
have metrics.
2025-02-12 16:12:21 +00:00
Fedor Dikarev
ec354884ea Feat/pin docker images to sha (#10730)
## Problem

With current approach for the base images in `Dockerfiles`, it's hard to
track when image is updated, and as they are base, than update will
invalidate all the layers, as base image changed.
That also becomes more complicated, as we have a number of runners, and
they may have different images with the tag `bookworm-slim`, so that
will lead to invalidate caches, when image build on one runner will be
used on another runners.

To fix that problem, we could pin our base images to the specific sha,
and that not only align images across runners, and also will allow us to
have reproducible build and don't depend on any spontaneous changes in
upstream.

Fix: https://github.com/neondatabase/cloud/issues/24084

## Summary of changes
Beside of the main goal, that PR also included some small changes around
Dockerfiles:

1. Main change: use `SHA` for `bookworm-slim` and `bullseye-slim` debian
images
2. For the layers requiring `curl` we could add `curl` and `unzip` to
the `build-deps` image, and use it as a base image for all the steps,
removing extra dependency on `alpine/curl`
3. added `retry-on-host-error=on` for the `wgetrc` as it happened to me:
fail to resolve hostname
2025-02-12 14:03:10 +00:00
John Spray
9989d8bfae tests: make Workload more determinstic (#10741)
## Problem

Previously, Workload was reconfiguring the compute before each run of
writes, which was meant to be a no-op when nothing changed, but was
actually writing extra data due to an issue being fixed in
https://github.com/neondatabase/neon/pull/10696.

The row counts in tests were too low in some cases, these tests were
only working because of those extra writes that shouldn't have been
happening, and moreover were relying on checkpoints happening.

## Summary of changes

- Only reconfigure compute if the attached pageserver actually changed.
If pageserver is set to None, that means controller is managing
everything, so never reconfigure compute.
- Update tests that wrote too few rows.

---------

Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>
2025-02-12 12:35:29 +00:00
Heikki Linnakangas
9537829ccd fast_import: Make CPU & memory size configurable (#10709)
The old values assumed that you have at least about 18 GB of RAM
available (shared_buffers=10GB and maintenance_work_mem=8GB). That's a
lot when testing locally. Make it configurable, and make the default
assumption much smaller: 256 MB.

This is nice for local testing, but it's also in preparation for
starting to use VMs to run these jobs. When launched in a VM, the
control plane can set these env variables according to the max size of
the VM.

Also change the formula for how RAM is distributed: use 10% of RAM for
shared_buffers, and 70% for maintenance_work_mem. That leaves a good
amount for misc. other stuff and the OS. A very large shared_buffers
setting won't typically help with bulk loading. It won't help with the
network and I/O of processing all the tables, unless maybe if the whole
database fits in shared buffers, but even then it's not much faster than
using local disk. Bulk loading is all sequential I/O. It also won't help
much with index creation, which is also sequential I/O. A large
maintenance_work_mem can be quite useful, however, so that's where we
put most of the RAM.
2025-02-12 11:43:23 +00:00
Mikhail Kot
2c4c6e6330 fix(neon): Add tests clarifying postgres sigabrt on pageserver unavailability (#10666)
Resolves: https://github.com/neondatabase/neon/issues/5734

When we query pageserver and it's unavailable after some retries,
postgres sigabrt's. This is intended behavior so I've added tests
checking it
2025-02-12 10:52:26 +00:00
Erik Grinaker
71c30e52fa pageserver: properly yield for L0 compaction (#10769)
## Problem

When image compaction yields for L0 compaction, it may not immediately
schedule L0 compaction, because it just goes on to compact the next
pending timeline.

Touches #10694.
Requires #10744.

## Summary of changes

Extend `CompactionOutcome` with `YieldForL0` and `Skipped` variants, and
immediately schedule an L0 compaction pass in the `YieldForL0` case.
2025-02-11 23:43:58 +00:00
Erik Grinaker
6c83ac3fd2 pageserver: do all L0 compaction before image compaction (#10744)
## Problem

Image compaction can starve out L0 compaction if a tenant has several
timelines with L0 debt.

Touches #10694.
Requires #10740.

## Summary of changes

* Add an initial L0 compaction pass, in order of L0 count.
* Add a tenant option `compaction_l0_first` to control the L0 pass
(disabled by default).
* Add `CompactFlags::OnlyL0Compaction` to run an L0-only compaction
pass.
* Clean up the compaction iteration logic.

A later PR will use separate semaphores for the L0 and image compaction
passes to avoid cross-tenant L0 starvation. That PR will also make image
compaction yield if _any_ of the tenant's timelines have pending L0
compaction to further avoid starvation.
2025-02-11 22:08:46 +00:00
Heikki Linnakangas
635b67508b Split utils::http to separate crate (#10753)
Avoids compiling the crate and its dependencies into binaries that don't
need them. Shrinks the compute_ctl binary from about 31MB to 28MB in the
release-line-debug-size-lto profile.
2025-02-11 22:06:53 +00:00
dependabot[bot]
9491154eae build(deps): bump cryptography from 43.0.1 to 44.0.1 in the pip group (#10773)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2025-02-11 21:23:17 +00:00
Heikki Linnakangas
b5e09fdaf3 Re-order Dockerfile steps for putting together final compute image (#10736)
Run "apt install" first, and only then COPY the files from the
intermediary build layers to the final image. This way, if you modify
any of the sources that trigger e.g. rebuilding compute_ctl, the "apt
install" step can still be cached.
2025-02-11 20:10:06 +00:00
John Spray
cd51ed2f86 tests: parametrize test_graceful_cluster_restart on AZ count (#10427)
## Problem

In https://github.com/neondatabase/neon/pull/10411 fill logic changes
such that it benefits us to test it with & without AZs set up. I didn't
extend the test inline in that PR because there were overlapping test
changes in flight to add `num_az` parameter.

## Summary of changes

- Parameterise test on AZ count (1 or 2)
- When AZ count is 2, use a different balance check that just asserts
the _tenants_ are balanced (since AZ affinity is chosen on a per-tenant
basis)
2025-02-11 20:09:41 +00:00
Folke Behrens
f62bc28086 proxy: Move binaries into the lib (#10758)
* This way all clippy lints defined in the lib also cover the binary
code.
* It's much easier to detect unused code.
* Fix all discovered lints.
2025-02-11 19:46:23 +00:00
Tristan Partin
da9c101939 Implement a second HTTP server within compute_ctl (#10574)
The compute_ctl HTTP server has the following purposes:

- Allow management via the control plane
- Provide an endpoint for scaping metrics
- Provide APIs for compute internal clients
  - Neon Postgres extension for installing remote extensions
  - local_proxy for installing extensions and adding grants

The first two purposes require the HTTP server to be available outside
the compute.

The Neon threat model is a bad actor within our internal network. We
need to reduce the surface area of attack. By exposing unnecessary
unauthenticated HTTP endpoints to the internal network, we increase the
surface area of attack. For endpoints described in the third bullet
point, we can just run an extra HTTP server, which is only bound to the
loopback interface since all consumers of those endpoints are within the
compute.
2025-02-11 18:02:22 +00:00
Arpad Müller
f7b2293317 Hardlink resident layers during detach ancestor (#10729)
After a detach ancestor operation, we don't want to on-demand download
layers that are already resident. This has shown to impede performance,
sometimes quite a lot (50 seconds:
https://github.com/neondatabase/neon/issues/8828#issuecomment-2643735644)

Fixes #8828.
2025-02-11 16:58:34 +00:00
Arpad Müller
be447ba4f8 Change timeline_offloading setting default to true (#10760)
This changes the default value of the `timeline_offloading` pageserver
and tenant configs to true, now that offloading has been rolled out
without problems.

There is also a small fix in the tenant config merge function, where we
applied the `lazy_slru_download` value instead of `timeline_offloading`.

Related issue: https://github.com/neondatabase/cloud/issues/21353
2025-02-11 16:36:54 +00:00
Christian Schwarz
9247331c67 fix(page_service / batching): smgr op latency metric of dropped responses include flush time (#10756)
# Problem

Say we have a batch of 10 responses to send out.

Then, even with

- #10728

we've still only called observe_execution_end_flush_start for the first
3 responses.

The remaining 7 response timers are still ticking.

When compute now closes the connection, the waiting flush fails with an
error and we `drop()` the remaining 7 responses' smgr op timers. The
`impl Drop for SmgrOpTimer` will observe an execution time that includes
the flush time.

In practice, this is supsected to produce the `+Inf` observations in the
smgr op latency histogram we've seen since the introduction of
pipelining, even after shipping #10728.

refs:
- fixup of https://github.com/neondatabase/neon/pull/10042
- fixup of https://github.com/neondatabase/neon/pull/10728
- fixes https://github.com/neondatabase/neon/issues/10754
2025-02-11 14:05:59 +00:00
John Spray
fcedd10226 tests: temporarily permit a log error (#10752)
## Problem

These tests can encounter a bug in the pageserver read path (#9185)
which occurs under the very specific circumstances that the tests
create, but is very unlikely to happen in the field.

We will fix the bug, but in the meantime let's un-flake the tests.

Related: https://github.com/neondatabase/neon/issues/10720

## Summary of changes

- Permit "could not find data for key" errors in tests affected by #9185
2025-02-11 12:37:09 +00:00
Arpad Müller
a4ea1e53ae Apply Azure SDK patch to periodically load workload identity file (#10415)
The SDK bug https://github.com/Azure/azure-sdk-for-rust/issues/1739 was
originally worked around via #10378, but now upstream has provided a fix
in [this](https://github.com/Azure/azure-sdk-for-rust/pull/1997) PR,
which we've been asked to test.

So this is what this PR is doing: revert #10378 (to make sure we fail if
the bug isn't fixed by the SDK PR), and apply the SDK PR to our fork.

Currently pointing to my local branch to check CI. I'd like to merge the
[SDK fork PR](https://github.com/neondatabase/azure-sdk-for-rust/pull/2)
before merging this to main.
2025-02-11 09:40:22 +00:00
Heikki Linnakangas
c26131c2b3 Link pgbouncer dynamically (#10749)
I don't see the point of static linking, postgres itself and many of the
extensions are already built dynamically.

One reason for the change is that I'm working on bigger changes to start
using systemd in the compute, and as part of that I wanted to add the
--with-systemd configure option to pgbouncer, and there doesn't seem to
be a static version of libsystemd (at least not on Debian).
2025-02-11 07:48:54 +00:00
Andrew Rudenko
4ab18444ec compute_ctl: database_schema should keep process::Child as part of returned value (#10273)
## Problem

/database_schema endpoint returns incomplete output from `pg_dump`

## Summary of changes

The Tokio process was not used properly. The returned stream does not
include `process::Child`, and the process is scheduled to be killed
immediately after the `get_database_schema` call when `cmd` goes out of
scope.

The solution in this PR is to return a special Stream implementation
that retains `process::Child`.
2025-02-11 07:02:13 +00:00
Heikki Linnakangas
98883e4b30 compute_ctl: Use a single tokio runtime (#10743)
compute_ctl is mostly written in synchronous fashion, intended to run in
a single thread. However various parts had become async, and they
launched their own tokio runtimes to run the async code. For example, VM
monitor ran in its own multi-threaded runtime, and apply_spec_sql()
launched another multi-threaded runtime to run the per-database SQL
commands in parallel. In addition to that, a few places used a
current-thread runtime to run async code in the main thread, or launched
a current-thread runtime in a *different* thread to run background
tasks.

Unify the runtimes so that there is only one tokio runtime. It's created
very early at process startup, and the main thread "enters" the runtime,
so that it's always available for tokio::spawn() and runtime.block_on()
calls. All code that needs to run async code uses the same runtime.

The main thread still mostly runs in a synchronous fashion. When it
needs to run async code, it uses rt.block_on().

Spawn fewer additional threads, prefer to spawn tokio tasks instead.
Convert some code that ran synchronously in background threads into
async. I didn't go all the way, though, some background threads are
still spawned.
2025-02-11 00:39:44 +00:00
Tristan Partin
3d143ad799 Unbrick the forward compatibility test failures (#10747)
Since the merge of https://github.com/neondatabase/neon/pull/10523,
forward compatibility tests have been broken everywhere.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-02-10 22:22:10 +00:00
Alex Chi Z.
b0c7ee0175 feat(pageserver): better gc_compaction_split heuristics (#10727)
## Problem

close https://github.com/neondatabase/neon/issues/10213

`range_search` only returns the top-most layers that may satisfy the
search, so it doesn't include all layers that might be accessed (the
user needs to recursively call this function). We need to retrieve the
full layer map and find overlaps in order to have a correct heuristics
of the job split.

## Summary of changes

Retrieve all layers and find overlaps instead of doing `range_search`.
The patch also reduces the time holding the layer map read guard.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-10 19:33:34 +00:00
Erik Grinaker
8c4e94107d pageserver: notify compaction loop at threshold (#10740)
## Problem

The compaction loop currently runs periodically, which can cause it to
wait for up to 20 seconds before starting L0 compaction by default.

Also, when we later separate the semaphores for L0 compaction and image
compaction, we want to give up waiting for the image compaction
semaphore if L0 compaction is needed on any timeline.

Touches #10694.

## Summary of changes

Notify the compaction loop when an L0 flush (on any timeline) exceeds
`compaction_threshold`.

Also do some opportunistic cleanups in the area.
2025-02-10 17:48:09 +00:00
Heikki Linnakangas
c368b0fe14 Use a cache mount to speed up rebuilding compute node image (#10737)
Building the compute rust binaries from scratch is pretty slow, it takes
between 4-15 minutes on my laptop, depending on which compiler flags and
other tricks I use. A cache mount allows caching the dependencies and
incremental builds, which speeds up rebuilding significantly when you
only makes a small change in a source file.
2025-02-10 16:58:29 +00:00
Heikki Linnakangas
aba61a3712 Download awscli in separate layer in Dockerfile, to allow caching (#10733)
The awscli was downloaded at the last stages of the overall compute
image build, which meant that if you modified any part of the build, it
would trigger a re-download of the awscli. That's a bit annoying when
developing locally and rebuilding the compute image repeatedly. Move it
to a separate layer, to cache separately and to avoid the spurious
rebuilds.
2025-02-10 16:48:28 +00:00
Tristan Partin
946da3f7e2 Require --compute-id when running compute_ctl (#10523)
The compute_id will be used when verifying claims sent by the control
plane.

Signed-off-by: Tristan Partin <tristan@neon.tech>

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-02-10 16:46:20 +00:00
Ivan Efremov
73633e27ed fix(proxy): Log errors from the local proxy in auth-broker (#10659)
Handle errors from local proxy by parsing HTTP response in auth broker
code

Closes [#19476](https://github.com/neondatabase/cloud/issues/19476)
2025-02-10 16:06:13 +00:00
Konstantin Knizhnik
0cf0119751 Add --save_records option to pg_waldump (#10626)
## Problem

Make it possible to dump WAL records in format recognised by walredo
process.
Intended usage:

```
pg_waldump -R 1663/5/16396  -B 771727 000000010000000100000034 --save-records=/tmp/walredo.records
postgres --wal-redo < /tmp/walredo.records > /tmp/page.img
```

## Summary of changes

Related Postgres PRs:
https://github.com/neondatabase/postgres/pull/575
https://github.com/neondatabase/postgres/pull/572

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-02-10 15:48:03 +00:00
Alex Chi Z.
b37f52fdf1 feat(pageserver): dump read path on missing key error (#10528)
## Problem

helps investigate https://github.com/neondatabase/neon/issues/10482

## Summary of changes

In debug mode and testing mode, we will record all files visited by a
read operation, and print it out when it errors.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-10 14:25:56 +00:00
Alex Chi Z.
443c8d0b4b feat(pageserver): repartition on L0-L1 boundary (#10548)
## Problem

Reduce the read amplification when doing `repartition`.

## Summary of changes

Compute the L0-L1 boundary LSN and do repartition here.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-10 14:25:48 +00:00
Alexander Bayandin
2f36bdb218 CI(build-neon): fix duplicated builds (#10731)
## Problem

Parameterising `build-neon` job with `test-cfg` makes it to build
exactly the same thing several times.

See
-
874accd6ed/.github/workflows/_build-and-test-locally.yml (L51-L52)
-
https://github.com/neondatabase/neon/actions/runs/13215068271/job/36893373038

## Summary of changes
- Extract `sanitizers` to a separate input from `test-cfg` and set it
separately
- Don't parametrise `build-neon` with `test-cfg`
2025-02-10 12:29:39 +00:00
Ivan Efremov
e7118213ab impr(proxy): Set TTL for Redis cancellation map keys (#10671)
Use expire() op to set TTL for Redis cancellation key
2025-02-10 10:51:53 +00:00
a-masterov
d204d51faf Fix the upgrade test for pg_jwt by adding the database name (#10738)
## Problem
The upgrade test for pg_jwt does not work correctly.
## Summary of changes
The script for the upgrade test is modified to use the database
`contrib_regression`.
2025-02-10 09:56:46 +00:00
Erik Grinaker
ac55e2dbe5 pageserver: improve tenant housekeeping task (#10725)
# Problem

walredo shutdown is done in the compaction task. Let's move it to tenant
housekeeping.

# Summary of changes

* Rename "ingest housekeeping" to "tenant housekeeping".
* Move walredo shutdown into tenant housekeeping.
* Add a constant `WALREDO_IDLE_TIMEOUT` set to 3 minutes (previously 10x
compaction threshold).
2025-02-08 12:42:55 +00:00
Erik Grinaker
874accd6ed pageserver: misc task cleanups (#10723)
This patch does a bunch of superficial cleanups of `tenant::tasks` to
avoid noise in subsequent PRs. There are no functional changes.

PS: enable "hide whitespace" when reviewing, due to the unindentation of
large async blocks.
2025-02-08 11:02:13 +00:00
Christian Schwarz
6cd3b501ec fix(page_service / batching): smgr op latency metrics includes the flush time of preceding requests (#10728)
Before this PR, if a batch contains N responses, the smgr op latency
reported for response (N-i) would include the time we spent flushing
the preceding requests.

refs:
- fixup of https://github.com/neondatabase/neon/pull/10042
- fixes https://github.com/neondatabase/neon/issues/10674
2025-02-08 09:28:09 +00:00
Christian Schwarz
bf20d78292 fix(page_service): page reconstruct error log does not include shard_id label (#10680)
# Problem

Before this PR, the `shard_id` field was missing when page_service logs
a reconstruct error.

This was caused by batching-related refactorings.

Example from staging:

```
2025-01-30T07:10:04.346022Z ERROR page_service_conn_main{peer_addr=...}:process_query{tenant_id=... timeline_id=...}:handle_pagerequests:request:handle_get_page_at_lsn_request_batched{req_lsn=FFFFFFFF/FFFFFFFF}: error reading relation or page version: Read error: whole vectored get request failed because one or more of the requested keys were missing: could not find data for key  ...
```

# Changes

Delay creation of the handler-specific span until after shard routing

This also avoids the need for the record() call in the pagestream hot
path.

# Testing

Manual testing with a failpoint that is part of this PR's history but
will be squashed away.


# Refs

- fixes https://github.com/neondatabase/neon/issues/10599
2025-02-07 19:45:39 +00:00
Arpad Müller
2656c713a4 Revert recent AWS SDK update (#10724)
We've been seeing some regressions in staging since the AWS SDK updates:
https://github.com/neondatabase/neon/issues/10695 . We aren't sure the
regression was caused by the SDK update, but the issues do involve S3,
so it's not unlikely. By reverting the SDK update we find out whether it
was really the SDK update, or something else.

Reverts the two PRs:

* https://github.com/neondatabase/neon/pull/10588
* https://github.com/neondatabase/neon/pull/10699

https://neondb.slack.com/archives/C08C2G15M6U/p1738576986047179
2025-02-07 17:37:53 +00:00
John Spray
5e95860e70 tests: wait for manifest persistence in test_timeline_archival_chaos (#10719)
## Problem

This test would sometimes fail its assertion that a timeline does not
revert to active once archived. That's because it was using the
in-memory offload state, not the persistent state, so this was sometimes
lost across a pageserver restart.

Closes: https://github.com/neondatabase/neon/issues/10389

## Summary of changes

- When reading offload status, read from pageserver API _and_ remote
storage before considering the timeline offloaded
2025-02-07 16:27:39 +00:00
Heikki Linnakangas
0abff59e97 compute: Allow postgres user to power off the VM (#10710)
I plan to use this when launching a fast_import job in a VM. There's
currently no good way for an executable running in a NeonVM to exit
gracefully and have the VM shut down. The inittab we use always respawns
the payload command. The idea is that the control plane can use
"fast_import ... && poweroff" as the command, so that when fast_import
completes successfully, the VM is terminated, and the k8s Pod and
VirtualMachine object are marked as completed successfully.

I'm working on bigger changes to how we launch VMs, and will try to come
up with a nicer system for that, but in the meanwhile, this quick hack
allows us to proceed with using VMs for one-off jobs like fast_import.
2025-02-07 16:03:01 +00:00
John Spray
9609f7547e tests: address warnings in timeline shutdown (#10702)
## Problem

There are a couple of log warnings tripping up
`test_timeline_archival_chaos`

- `[stopping left-over name="timeline_delete"
tenant_shard_id=2d526292b67dac0e6425266d7079c253
timeline_id=Some(44ba36bfdee5023672c93778985facd9)
kind=TimelineDeletionWorker\n')](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10672/13161357302/index.html#/testresult/716b997bb1d8a021)`
- `ignoring attempt to restart exited flush_loop
503d8f401d8887cfaae873040a6cc193/d5eed0673ba37d8992f7ec411363a7e3\n')`

Related: https://github.com/neondatabase/neon/issues/10389

## Summary of changes

- Downgrade the 'ignoring attempt to restart' to info -- there's nothing
in the design that forbids this happening, i.e. someone calling
maybe_spawn_flush_loop concurrently with shutdown()
- Prevent timeline deletion tasks outliving tenants by carrying a
gateguard. This logically makes sense because the deletion process does
call into Tenant to update manifests.
2025-02-07 15:29:34 +00:00
Erik Grinaker
d6e87a3a9c pageserver: add separate, disabled compaction semaphore (#10716)
## Problem

L0 compaction can get starved by other background tasks. It needs to be
responsive to avoid read amp blowing up during heavy write workloads.

Touches #10694.

## Summary of changes

Add a separate semaphore for compaction, configurable via
`use_compaction_semaphore` (disabled by default). This is primarily for
testing in staging; it needs further work (in particular to split
image/L0 compaction jobs) before it can be enabled.
2025-02-07 15:11:31 +00:00
Arpad Müller
f5243992fa safekeeper: make timeline deletions a bit more verbose (#10721)
Make timeline deletion print the sub-steps, so that we can narrow down
some stuck timeline deletion issues we are observing.

https://neondb.slack.com/archives/C08C2G15M6U/p1738930694716009
2025-02-07 15:06:26 +00:00
John Spray
95220ba43e tests: fix flaky endpoint in test_ingest_logical_message (#10700)
## Problem

Endpoint kept running while timeline was deleted, causing forbidden
warnings on the pageserver when the tenant is not found.

## Summary of changes

- Explicitly stop the endpoint before the end of the test, so that it
isn't trying to talk to the pageserver in the background while things
are torn down
2025-02-07 14:51:36 +00:00
John Spray
08f92bb916 pageserver: clean up DeletionQueue push_layers_sync (#10701)
## Problem

This is tech debt. While we introduced generations for tenants, some
legacy situations without generations needed to delete things inline
(async operation) instead of enqueing them (sync operation).

## Summary of changes

- Remove the async code, replace calls with the sync variant, and assert
that the generation is always set
2025-02-07 13:03:01 +00:00
Fedor Dikarev
8f651f9582 switch from localtest.me to local.neon.build (#10714)
## Problem
Ref: https://github.com/neondatabase/neon/issues/10632

We use dns named `*.localtest.me` in our test, and that domain is
well-known and widely used for that, with all the records there resolve
to the localhost, both IPv4 and IPv6: `127.0.0.1` and `::1`

In some cases on our runners these addresses resolves only to `IPv6`,
and so components fail to connect when runner doesn't have `IPv6`
address. We suspect issue in systemd-resolved here
(https://github.com/systemd/systemd/issues/17745)
To workaround that and improve test stability, we introduced our own
domain `*.local.neon.build` with IPv4 address `127.0.0.1` only

See full details and troubleshoot log in referred issue.

p.s.
If you're FritzBox user, don't forget to add that domain
`local.neon.build` to the `DNS Rebind Protection` section under `Home
Network -> Network -> Network Settings`, otherwise FritzBox will block
addresses, resolving to the local addresses.
For other devices/vendors, please check corresponding documentation, if
resolving `local.neon.build` will produce empty answer for you.

## Summary of changes
Replace all the occurrences of `localtest.me` with `local.neon.build`
2025-02-07 12:25:16 +00:00
Arseny Sher
b5a239c4ae Add reconciliation details to sk membership change rfc (#10514)
## Problem

RFC pointed out the need of reconciliation, but wasn't detailed how it
can be done.

## Summary of changes

Add these details.
2025-02-07 11:20:49 +00:00
Alexander Lakhin
de05258419 Adjust diesel schema check for build with sanitizers (#10711)
We need to disable the detection of memory leaks when running
``neon_local init` for build with sanitizers to avoid an error thrown by
AddressSanitizer.
2025-02-07 08:56:39 +00:00
Peter Bendel
e73d681a0e Patch pgcopydb and fix another segfault (#10706)
## Problem

Found another pgcopydb segfault in error handling

```bash
2025-02-06 15:30:40.112 51299 ERROR  pgsql.c:2330              [TARGET -738302813] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.112 51298 ERROR  pgsql.c:2330              [TARGET -1407749748] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.112 51297 ERROR  pgsql.c:2330              [TARGET -2073308066] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.112 51300 ERROR  pgsql.c:2330              [TARGET 1220908650] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.432 51300 ERROR  pgsql.c:2536              [Postgres] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.513 51290 ERROR  copydb.c:773              Sub-process 51300 exited with code 0 and signal Segmentation fault
2025-02-06 15:30:40.578 51299 ERROR  pgsql.c:2536              [Postgres] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:40.613 51290 ERROR  copydb.c:773              Sub-process 51299 exited with code 0 and signal Segmentation fault
2025-02-06 15:30:41.253 51298 ERROR  pgsql.c:2536              [Postgres] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:41.314 51290 ERROR  copydb.c:773              Sub-process 51298 exited with code 0 and signal Segmentation fault
2025-02-06 15:30:43.133 51297 ERROR  pgsql.c:2536              [Postgres] FATAL:  terminating connection due to administrator command
2025-02-06 15:30:43.215 51290 ERROR  copydb.c:773              Sub-process 51297 exited with code 0 and signal Segmentation fault
2025-02-06 15:30:43.215 51290 ERROR  indexes.c:123             Some INDEX worker process(es) have exited with error, see above for details
2025-02-06 15:30:43.215 51290 ERROR  indexes.c:59              Failed to create indexes, see above for details
2025-02-06 15:30:43.232 51271 ERROR  copydb.c:768              Sub-process 51290 exited with code 12
```

```bashadmin@ip-172-31-38-164:~/pgcopydb$ gdb /usr/local/pgsql/bin/pgcopydb core
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/pgsql/bin/pgcopydb...
[New LWP 51297]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `pgcopydb: create index ocr.ocr_pipeline_step_results_version_pkey             '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000aaaac3a4b030 in splitLines (lbuf=lbuf@entry=0xffffd8b86930, buffer=<optimized out>) at string_utils.c:630
630				*newLinePtr = '\0';
(gdb) bt
#0  0x0000aaaac3a4b030 in splitLines (lbuf=lbuf@entry=0xffffd8b86930, buffer=<optimized out>) at string_utils.c:630
#1  0x0000aaaac3a3a678 in pgsql_execute_log_error (pgsql=pgsql@entry=0xffffd8b87040, result=result@entry=0x0, 
    sql=sql@entry=0xffff81fe9be0 "CREATE UNIQUE INDEX IF NOT EXISTS ocr_pipeline_step_results_version_pkey ON ocr.ocr_pipeline_step_results_version USING btree (id, transaction_id);", 
    debugParameters=debugParameters@entry=0xaaaaec5f92f0, context=context@entry=0x0) at pgsql.c:2322
#2  0x0000aaaac3a3bbec in pgsql_execute_with_params (pgsql=pgsql@entry=0xffffd8b87040, 
    sql=0xffff81fe9be0 "CREATE UNIQUE INDEX IF NOT EXISTS ocr_pipeline_step_results_version_pkey ON ocr.ocr_pipeline_step_results_version USING btree (id, transaction_id);", paramCount=paramCount@entry=0, 
    paramTypes=paramTypes@entry=0x0, paramValues=paramValues@entry=0x0, context=context@entry=0x0, parseFun=parseFun@entry=0x0) at pgsql.c:1649
#3  0x0000aaaac3a3c468 in pgsql_execute (pgsql=pgsql@entry=0xffffd8b87040, sql=<optimized out>) at pgsql.c:1522
#4  0x0000aaaac3a245f4 in copydb_create_index (specs=specs@entry=0xffffd8b8ec98, dst=dst@entry=0xffffd8b87040, index=index@entry=0xffff81f71800, ifNotExists=<optimized out>) at indexes.c:846
#5  0x0000aaaac3a24ca8 in copydb_create_index_by_oid (specs=specs@entry=0xffffd8b8ec98, dst=dst@entry=0xffffd8b87040, indexOid=<optimized out>) at indexes.c:410
#6  0x0000aaaac3a25040 in copydb_index_worker (specs=specs@entry=0xffffd8b8ec98) at indexes.c:297
#7  0x0000aaaac3a25238 in copydb_start_index_workers (specs=specs@entry=0xffffd8b8ec98) at indexes.c:209
#8  0x0000aaaac3a252f4 in copydb_index_supervisor (specs=specs@entry=0xffffd8b8ec98) at indexes.c:112
#9  0x0000aaaac3a253f4 in copydb_start_index_supervisor (specs=0xffffd8b8ec98) at indexes.c:57
#10 copydb_start_index_supervisor (specs=specs@entry=0xffffd8b8ec98) at indexes.c:34
#11 0x0000aaaac3a51ff4 in copydb_process_table_data (specs=specs@entry=0xffffd8b8ec98) at table-data.c:146
#12 0x0000aaaac3a520dc in copydb_copy_all_table_data (specs=specs@entry=0xffffd8b8ec98) at table-data.c:69
#13 0x0000aaaac3a0ccd8 in cloneDB (copySpecs=copySpecs@entry=0xffffd8b8ec98) at cli_clone_follow.c:602
#14 0x0000aaaac3a0d2cc in start_clone_process (pid=0xffffd8b743d8, copySpecs=0xffffd8b8ec98) at cli_clone_follow.c:502
#15 start_clone_process (copySpecs=copySpecs@entry=0xffffd8b8ec98, pid=pid@entry=0xffffd8b89788) at cli_clone_follow.c:482
#16 0x0000aaaac3a0d52c in cli_clone (argc=<optimized out>, argv=<optimized out>) at cli_clone_follow.c:164
#17 0x0000aaaac3a53850 in commandline_run (command=command@entry=0xffffd8b9eb88, argc=0, argc@entry=22, argv=0xffffd8b9edf8, argv@entry=0xffffd8b9ed48) at /home/admin/pgcopydb/src/bin/pgcopydb/../lib/subcommands.c/commandline.c:71
#18 0x0000aaaac3a01464 in main (argc=22, argv=0xffffd8b9ed48) at main.c:140
(gdb) 

```

The problem is most likely that the following call returned a message in
a read-only memory segment where we cannot replace \n with \0 in
string_utils.c splitLines() function
```C
char *message = PQerrorMessage(pgsql->connection);
```

## Summary of changes

modified the patch to also address this problem
2025-02-06 20:21:18 +00:00
Anastasia Lubennikova
44b905d14b Fix remote extension lookup (#10708)
when library name doesn't match extension name.
The bug was introduced by recent commit ebc55e6a
2025-02-06 19:21:38 +00:00
Arseny Sher
186199f406 Update aws sdk (#10699)
## Problem

We have unclear issue with stuck s3 client, probably after partial aws
sdk update without updating sdk-s3.
https://github.com/neondatabase/neon/pull/10588
Let's try to update s3 as well.

## Summary of changes

Result of running

cargo update -p aws-types -p aws-sigv4 -p aws-credential-types -p
aws-smithy-types -p aws-smithy-async -p aws-sdk-kms -p aws-sdk-iam -p
aws-sdk-s3 -p aws-config

ref https://github.com/neondatabase/neon/issues/10695
2025-02-06 17:28:27 +00:00
OBBO67
82cbab7512 Switch reqlsns[0].request_lsn to arrow operator in neon_read_at_lsnv() (#10620) (#10687)
## Problem

Currently the following line below uses array subscript notation which
is confusing since `reqlsns` is not an array but just a pointer to a
struct.

```
XLogWaitForReplayOf(reqlsns[0].request_lsn);
```

## Summary of changes

Switch from array subscript notation to arrow operator to improve
readability of code.

Close #10620.
2025-02-06 17:26:26 +00:00
Erik Grinaker
2943590694 pageserver: use histogram for background job semaphore waits (#10697)
## Problem

We don't have visibility into how long an individual background job is
waiting for a semaphore permit.

## Summary of changes

* Make `pageserver_background_loop_semaphore_wait_seconds` a histogram
rather than a sum.
* Add a paced warning when a task takes more than 10 minutes to get a
permit (for now).
* Drive-by cleanup of some `EnumMap` usage.
2025-02-06 17:17:47 +00:00
John Spray
df06c41085 tests: don't detach from controller in test_issue_5878 (#10675)
## Problem

This test called NeonPageserver.tenant_detach, which as well as
detaching locally on the pageserver, also updates the storage controller
to put the tenant into Detached mode. When the test runs slowly in debug
mode, it sometimes takes long enough that the background_reconcile loop
wakes up and drops the tenant from memory in response, such that the
pageserver can't validate its deletions and the test does not behave as
expected.

Closes: https://github.com/neondatabase/neon/issues/10513

## Summary of changes

- Call the pageserver HTTP client directly rather than going via
NeonPageserver.tenant_detach
2025-02-06 15:18:50 +00:00
Alexander Bayandin
ddd7c36343 CI(approved-for-ci-run): Use internal CI_ACCESS_TOKEN for cloning repo (#10693)
## Problem

The default `GITHUB_TOKEN` is used to push changes created with
`approved-for-ci-run`, which doesn't work:
```
Run git push --force origin "${BRANCH}"
remote: Permission to neondatabase/neon.git denied to github-actions[bot].
fatal: unable to access 'https://github.com/neondatabase/neon/': The requested URL returned error: 403
```
Ref:
https://github.com/neondatabase/neon/actions/runs/13166108303/job/36746518291?pr=10687

## Summary of changes
- Use `CI_ACCESS_TOKEN` to clone an external repo
- Remove unneeded `actions/checkout`
2025-02-06 14:40:22 +00:00
Peter Bendel
839f41f5bb fix pgcopydb seg fault and -c idle_in_transaction_session_timeout=0 (#10692)
## Problem

During ingest_benchmark which uses `pgcopydb`
([see](https://github.com/dimitri/pgcopydb))we sometimes had outages.
- when PostgreSQL COPY step failed we got a segfault (reported
[here](https://github.com/dimitri/pgcopydb/issues/899))
- the root cause was Neon idle_in_transaction_session_timeout is set to
5 minutes which is suboptimal for long-running tasks like project import
(reported [here](https://github.com/dimitri/pgcopydb/issues/900))

## Summary of changes

Patch pgcopydb to avoid segfault.
override idle_in_transaction_session_timeout and set it to "unlimited"
2025-02-06 14:39:45 +00:00
Alex Chi Z.
f22d41eaec feat(pageserver): num of background job metrics (#10690)
## Problem

We need a metrics to know what's going on in pageserver's background
jobs.

## Summary of changes

* Waiting tasks: task still waiting for the semaphore.
* Running tasks: tasks doing their actual jobs.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Erik Grinaker <erik@neon.tech>
2025-02-06 14:39:37 +00:00
Alexander Lakhin
977781e423 Enable sanitizers for postgres v17 (#10401)
Add a build with sanitizers (asan, ubsan) to the CI pipeline and run
tests on it.

See https://github.com/neondatabase/neon/issues/6053

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2025-02-06 12:53:43 +00:00
Arpad Müller
67b71538d0 Limit returned lsn for timestamp by the planned gc cutoff (#10678)
Often the output of the timestamp->lsn API is used as input for branch
creation, and branch creation takes the planned lsn into account, i.e.
rejects lsn's as branch lsns that are before the planned lsn.

This patch doesn't fix all race conditions, it's still racy. But at
least it is a step into the right direction.

For #10639
2025-02-06 11:17:08 +00:00
Erik Grinaker
f4cfa725b8 pageserver: add a few critical errors (#10657)
## Problem

Following #10641, let's add a few critical errors.

Resolves #10094.

## Summary of changes

Adds the following critical errors:

* WAL sender read/decode failure.
* WAL record ingestion failure.
* WAL redo failure.
* Missing key during compaction.

We don't add an error for missing keys during GetPage requests, since
we've seen a handful of these in production recently, and the cause is
still unclear (most likely a benign race).
2025-02-06 10:30:27 +00:00
Arpad Müller
05326cc247 Skip gc cutoff lsn check at timeline creation if lease exists (#10685)
Right now, branch creation doesn't care if a lsn lease exists or not, it
just fails if the passed lsn is older than either the last or the
planned gc cutoff.

However, if an lsn lease exists for a given lsn, we can actually create
a branch at that point: nothing has been gc'd away.

This prevents race conditions that #10678 still leaves around.

Related: #10639
https://github.com/neondatabase/cloud/issues/23667
2025-02-06 10:10:11 +00:00
Arpad Müller
b66fbd6176 Warn on basebackups for archived timelines (#10688)
We don't want any external requests for an archived timeline. This
includes basebackup requests, i.e. when a compute is being started up.

Therefore, we'd like to forbid such basebackup requests: any attempt to
get a basebackup on an archived timeline (or any getpage request really)
is a cplane bug. Make this a warning for now so that, if there is
potentially a bug, we can detect cases in the wild before they cause
stuck operations, but the intention is to return an error eventually.

Related: #9548
2025-02-06 10:09:20 +00:00
Vlad Lazar
95588dab98 safekeeper: fix wal fan-out shard subscription data race (#10677)
## Problem

[This select
arm](https://github.com/neondatabase/neon/blob/main/safekeeper/src/send_interpreted_wal.rs#L414)
runs when we want to attach a new reader to the current cursor.
It checks the current position of the cursor and resets it if required.

The current position of the cursor is updated in the [other select
arm](https://github.com/neondatabase/neon/blob/main/safekeeper/src/send_interpreted_wal.rs#L336-L345).
That runs when we get some WAL to send.

Now, what happens if we want to attach two shards consecutively to the
cursor?
Let's say [this select
arm](https://github.com/neondatabase/neon/blob/main/safekeeper/src/send_interpreted_wal.rs#L397)
runs twice in a row.

Let's assume cursor is currently at LSN X. First shard wants to attach
at position V
and the other one at W. Assume X > W > V.

First shard resets the stream to position V. Second shard comes in, 
sees stale cursor position X and resets it to W. This means that the 
first shard doesn't get wal in the [V, W) range.

## Summary of changes

Ultimately, this boils down to the current position not being kept in
sync with the reset of the WAL stream. This patch fixes the race by
updating it when resetting the WAL stream and adds a unit test repro.

Closes https://github.com/neondatabase/cloud/issues/23750
2025-02-06 09:24:28 +00:00
Christian Schwarz
1686d9e733 perf(page_service): dont .instrument(span.clone()) the response flush (#10686)
On my AX102 Hetzner box, removing this line removes about 20us from the
`latency_mean` result in

`test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant`.

If the same 20us can be removed in the nightly benchmark run, this will
be a ~10% improvement because there, mean latencies are about ~220us.

This span was added during batching refactors, we didn't have it before,
and I don't think it's terribly useful.

refs
- https://github.com/neondatabase/cloud/issues/21759
2025-02-06 08:33:37 +00:00
Ivan Efremov
3e624581cd Merge pull request #10691 from neondatabase/rc/release-proxy/2025-02-06
Proxy release 2025-02-06
2025-02-06 10:23:43 +02:00
Erik Grinaker
abcd00181c pageserver: set a concurrency limit for LocalFS (#10676)
## Problem

The local filesystem backend for remote storage doesn't set a
concurrency limit. While it can't/won't enforce a concurrency limit
itself, this also bounds the upload queue concurrency. Some tests create
thousands of uploads, which slows down the quadratic scheduling of the
upload queue, and there is no point spawning that many Tokio tasks.

Resolves #10409.

## Summary of changes

Set a concurrency limit of 100 for the LocalFS backend.

Before: `test_layer_map[release-pg17].test_query: 68.338 s`
After: `test_layer_map[release-pg17].test_query: 5.209 s`
2025-02-06 07:24:36 +00:00
Konstantin Knizhnik
01f0be03b5 Fix bugs in lfc_cache_containsv (#10682)
## Problem

Incorrect manipulations with iteration index in `lfc_cache_containsv`

## Summary of changes

```
-		int		this_chunk = Min(nblocks, BLOCKS_PER_CHUNK - chunk_offs);
+		int		this_chunk = Min(nblocks - i, BLOCKS_PER_CHUNK - chunk_offs);		int		this_chunk = ```
 -		if (i + 1 >= nblocks)
+		if (i >= nblocks)
```

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-02-06 07:00:00 +00:00
github-actions[bot]
fedf4f169c Proxy release 2025-02-06 2025-02-06 06:02:11 +00:00
Konstantin Knizhnik
81cd30e4d6 Use #ifdef instead of #if USE_ASSERT_CHECKING (#10683)
## Problem

USE_ASSERT _CHECKING is defined as empty entity. but it is checked using
#if

## Summary of changes

Replace `#if USE_ASSERT _CHECKING` with `#ifdef USE_ASSERT _CHECKING` as
done in other places in Postgres

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-02-06 05:47:56 +00:00
Konstantin Knizhnik
7fc6953da4 Is neon superuser (#10625)
## Problem

is_neon_superuser() fiunction is public in pg14/pg15
but statically defined in publicationcmd.c in pg16/pg17

## Summary of changes

Make this function public for all Postgres version.
It is intended to be used not only in  publicationcmd.c

See
https://github.com/neondatabase/postgres/pull/573
https://github.com/neondatabase/postgres/pull/576

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-02-06 05:42:14 +00:00
Christian Schwarz
77f9e74d86 pgxn: include socket send & recv queue size in slow response logs (#10673)
# Problem

When we see an apparent slow request, one possible cause is that the
client is failing to consume responses, but we don't have a clear way to
see that.

# Solution

- Log the socket queue depths on slow/stuck connections, so that we have
an indication of whether the compute is keeping up with processing the
connection's responses.

refs
- slack https://neondb.slack.com/archives/C036U0GRMRB/p1738652644396329
- refs https://github.com/neondatabase/cloud/issues/23515
- refs https://github.com/neondatabase/cloud/issues/23486
2025-02-06 01:14:29 +00:00
Alex Chi Z.
0ceeec9be3 fix(pageserver): schedule compaction immediately if pending (#10684)
## Problem

The code is intended to reschedule compaction immediately if there are
pending tasks. We set the duration to 0 before if there are pending
tasks, but this will go through the `if period == Duration::ZERO {`
branch and sleep for another 10 seconds.

## Summary of changes

Set duration to 1 so that it doesn't sleep for too long.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-05 22:11:50 +00:00
Alex Chi Z.
733a57247b fix(pageserver): disallow gc-compaction produce l0 layer (#10679)
## Problem

Any compaction should never produce l0 layers. This never happened in my
experiments, but would be good to guard it early.

## Summary of changes

Disallow gc-compaction to produce l0 layers.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-05 20:44:28 +00:00
Heikki Linnakangas
6699a30a49 Make it easy to build only a subset of extensions into compute image (#10655)
The full build of all extensions takes a long time. When working locally
on parts that don't need extensions, you can iterate more quickly by
skipping the unnecessary extensions.

This adds a build argument to the dockerfile to specify extensions to
build. There are three options:

- EXTENSIONS=all (default)
- EXTENSIONS=minimal: Build only a few extensions that are listed in
shared_preload_libraries in the default neon config.
- EXTENSIONS=none: Build no extensions (except for the mandatory 'neon'
extension).
2025-02-05 18:07:51 +00:00
Alex Chi Z.
133b89a83d feat(pageserver): continue from last incomplete image layer creation (#10660)
## Problem

close https://github.com/neondatabase/neon/issues/10651

## Summary of changes

* Image layer creation starts from the next partition of the last
processed partition if the previous attempt was not complete.
* Add tests.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-05 17:35:39 +00:00
Arseny Sher
fba22a7123 Record more timings in test_layer_map (#10670)
## Problem

It it is not very clear how much time take different operations.

## Summary of changes

Record more timings.

ref https://github.com/neondatabase/neon/issues/10409
2025-02-05 17:00:26 +00:00
John Spray
14e05276a3 storcon: fix a case where optimise could get stuck on unschedulable node (#10648)
## Problem

When a shard has two secondary locations, but one of them is on a node
with MaySchedule::No, the optimiser would get stuck, because it couldn't
decide which secondary to remove.

This is generally okay if a node is offline, but if a node is in Pause
mode for a long period of time, it's a problem.

Closes: https://github.com/neondatabase/neon/issues/10646

## Summary of changes

- Instead of insisting on finding a node in the wrong AZ to remove, find
an available node in the _right_ AZ, and remove all the others. This
ensures that if there is one live suitable node, then other
offline/paused nodes cannot hold things up.
2025-02-05 16:05:12 +00:00
Tristan Partin
ebc55e6ae8 Fix logic for checking if a compute can install a remote extension (#10656)
Given a remote extensions manifest of the following:

```json
  {
    "public_extensions": [],
    "custom_extensions": null,
    "library_index": {
      "pg_search": "pg_search"
    },
    "extension_data": {
      "pg_search": {
        "control_data": {
          "pg_search.control": "comment = 'pg_search: Full text search for PostgreSQL using BM25'\ndefault_version = '0.14.1'\nmodule_pathname = '$libdir/pg_search'\nrelocatable = false\nsuperuser = true\nschema = paradedb\ntrusted = true\n"
        },
        "archive_path": "13117844657/v14/extensions/pg_search.tar.zst"
      }
    }
  }
```

We were allowing a compute to install a remote extension that wasn't
listed in either public_extensions or custom_extensions.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-02-05 14:58:33 +00:00
Erik Grinaker
f07119cca7 pageserver: add pageserver_wal_ingest_values_committed metric (#10653)
## Problem

We don't have visibility into the ratio of image vs. delta pages
ingested in Pageservers. This might be useful to determine whether we
should compress WAL records before storing them, which in turn might
make compaction more efficient.

## Summary of changes

Add `pageserver_wal_ingest_values_committed` metric with dimensions
`class=metadata|data` and `kind=image|delta`.
2025-02-05 14:33:04 +00:00
Vlad Lazar
47975d06d9 storcon: silence cplane 404s on tenant creation (#10665)
## Problem

We get WARN log noise on tenant creations. Cplane creates tenants via
/location_config. That returns the attached locations in the response
and spawns a reconciliation which will also attempt to notify cplane. If
the notification is attempted before cplane persists the shards to its
database, storcon gets back a 404. The situation is harmless, but
annoying.

## Summary of Changes

* Add a tenant creation hint to the reconciler config
* If the hint is true and we get back a 404 on the notification from
cplane, ignore the error, but still queue the reconcile up for a retry.

Closes https://github.com/neondatabase/cloud/issues/20732
2025-02-05 12:41:09 +00:00
Fedor Dikarev
472007dd7c ci: unify Dockerfiles, set bash as SHELL for debian layers, make cpan step as separate RUN (#10645)
## Problem
Ref: https://github.com/neondatabase/cloud/issues/23461

and follow-up after: https://github.com/neondatabase/neon/pull/10553

we used `echo` to set-up `.wgetrc` and `.curlrc`, and there we used `\n`
to make these multiline configs with one echo command.

The problem is that Debian `/bin/sh`'s built-in echo command behaves
differently from the `/bin/echo` executable and from the `echo` built-in
in `bash`. Namely, it does not support the`-e` option, and while it does
treat `\n` as a newline, passing `-e` here will add that `-e` to the
output.
At the same time, when we use different base images, for example
`alpine/curl`, their `/bin/sh` supports and requires `-e` for treating
escape sequences like `\n`.
But having different `echo` and remembering difference in their
behaviour isn't best experience for the developer and makes bad
experience maintaining Dockerfiles.

Work-arounds:

- Explicitly use `/bin/bash` (like in this PR)
- Use `/bin/echo` instead of the shell's built-in echo function
- Use printf "foo\n" instead of echo -e "foo\n"

## Summary of changes
1. To fix that, we process with the option setting `/bin/bash` as a
SHELL for the debian-baysed layers
2. With no changes for `alpine/curl` based layers.
3. And one more change here: in `extensions` layer split to the 2 steps:
installing dependencies from `CPAN` and installing `lcov` from github,
so upgrading `lcov` could reuse previous layer with installed cpan
modules.
2025-02-04 18:58:02 +00:00
Vlad Lazar
f9009d6b80 pageserver: write heatmap to disk after uploading it (#10650)
## Problem

We wish to make heatmap generation additive in
https://github.com/neondatabase/neon/pull/10597.
However, if the pageserver restarts and has a heatmap on disk from when
it was a secondary long ago,
we can end up keeping extra layers on the secondary's disk.

## Summary of changes

Persist the heatmap after a successful upload.
2025-02-04 17:52:54 +00:00
Alex Chi Z.
cab60b6d9f fix(pagesever): stablize gc-compaction tests (#10621)
## Problem

Hopefully this can resolve
https://github.com/neondatabase/neon/issues/10517. The reason why the
test is flaky is that after restart the compute node might write some
data so that the pageserver flush some layers, and in the end, causing
L0 compaction to run, and we cannot get the test scenario as we want.

## Summary of changes

Ensure all L0 layers are compacted before starting the test.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-04 16:11:31 +00:00
Erik Grinaker
06090bbccd pageserver: log critical error on ClearVmBits for unknown pages (#10634)
## Problem

In #9895, we fixed some issues where `ClearVmBits` were broadcast to all
shards, even those not owning the VM relation. As part of that, we found
some ancient code from #1417, which discarded spurious incorrect
`ClearVmBits` records for pages outside of the VM relation. We added
observability in #9911 to see how often this actually happens in the
wild.

After two months, we have not seen this happen once in production or
staging. However, out of caution, we don't want a hard error and break
WAL ingestion.

Resolves #10067.

## Summary of changes

Log a critical error when ingesting `ClearVmBits` for unknown VM
relations or pages.
2025-02-04 14:55:11 +00:00
Folke Behrens
dcf335a251 proxy: Switch proxy to JSON logging (#9857)
## Problem

We want to switch proxy and ideally all Rust services to structured JSON
logging to support better filtering and cross-referencing with tracing.

## Summary of changes

* Introduce a custom tracing-subscriber to write the JSON. In a first
attempt a customized tracing::fmt::FmtSubscriber was used, but it's very
inefficient and can still generate invalid JSON. It's also doesn't allow
us to add important fields to the root object.
* Make this opt in: the `LOGFMT` env var can be set to `"json"` to
enable to new logger at startup.
2025-02-04 14:50:53 +00:00
Arpad Müller
b6e9daea9a storcon: only allow errrors of the server cert verification (#10644)
This PR does a bunch of things:

* only allow errors of the server cert verification, not of the TLS
handshake. The TLS handshake doesn't cause any errors for us so we can
just always require it to be valid. This simplifies the code a little.
* As the solution is more permanent than originally anticipated, I think
it makes sense to move the `AcceptAll` verifier outside.
* log the connstr information. this helps with figuring out which domain
names are configured in the connstr, etc. I think it is generally useful
to print it. make extra sure that the password is not leaked.

Follow-up of #10640
2025-02-04 14:01:57 +00:00
a-masterov
d5c3a4e2b9 Add support for pgjwt test (#10611)
## Problem
We don't currently test pgjwt, while it is based on pg_prove and can be
easily added
## Summary of changes
The test for pgjwt was added.
2025-02-04 13:49:44 +00:00
Heikki Linnakangas
8107140f7f Refactor compute dockerfile (#10371)
Refactor how extensions are built in compute Dockerfile
    
1. Rename some of the extension layers, so that names correspond more
   precisely to the upstream repository name and the source directory
   name. For example, instead of "pg-jsonschema-pg-build", spell it
   "pg_jsonschema-build". Some of the layer names had the extra "pg-"
   part, and some didn't; harmonize on not having it. And use an
   underscore if the upstream project name uses an underscore.

2. Each extension now consists of two dockerfile targets:
   [extension]-src and [extension]-build. By convention, the -src
   target downloads the sources and applies any neon-specific patches
   if necessary. The source tarball is downloaded and extracted under
   /ext-src. For example, the 'pgvector' extension creates the
   following files and directory:

        /ext-src/pgvector.tar.gz  # original tarball
/ext-src/pgvector.patch # neon-specific patch, copied from patches/ dir
/ext-src/pgvector-src/ # extracted tarball, with patch applied

    This separation avoids re-downloading the sources every time the
    extension is recompiled. The 'extension-tests' target also uses the
    [extension]-src layers, by copying the /ext-src/ dirs from all
    the extensions together into one image.

This refactoring came about when I was experimenting with different
ways of splitting up the Dockerfile so that each extension would be in
a separate file. That's not part of this PR yet, but this is a good
step in modularizing the extensions.
2025-02-04 10:35:43 +00:00
Alex Chi Z.
e219d48bfe refactor(pageserver): clearify compaction return value (#10643)
## Problem

## Summary of changes

Make the return value of the set of compaction functions less confusing.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-03 21:56:55 +00:00
Alex Chi Z.
c1be84197e feat(pageserver): preempt image layer generation if L0 piles up (#10572)
## Problem

Image layer generation could block L0 compactions for a long time.

## Summary of changes

* Refactored the return value of `create_image_layers_for_*` functions
to make it self-explainable.
* Preempt image layer generation in `Try` mode if L0 piles up.

Note that we might potentially run into a state that only the beginning
part of the keyspace gets image coverage. In that case, we either need
to implement something to prioritize some keyspaces with image coverage,
or tune the image_creation_threshold to ensure that the frequency of
image creation could keep up with L0 compaction.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Erik Grinaker <erik@neon.tech>
2025-02-03 20:55:47 +00:00
dependabot[bot]
d80cbb2443 build(deps): bump openssl from 0.10.66 to 0.10.70 in /test_runner/pg_clients/rust/tokio-postgres in the cargo group across 1 directory (#10642)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-03 19:42:40 +00:00
Erik Grinaker
06b45fd0fd utils/logging: add critical! macro and metric (#10641)
## Problem

We don't currently have good alerts for critical errors, e.g. data
loss/corruption.

Touches #10094.

## Summary of changes

Add a `critical!` macro and corresponding
`libmetrics_tracing_event_count{level="critical"}` metric. This will:

* Emit an `ERROR` log message with prefix `"CRITICAL:"` and a backtrace.
* Increment `libmetrics_tracing_event_count{level="critical"}`, and
indirectly `level="error"`.
* Trigger a pageable alert (via the metric above).
* In debug builds, panic the process.

I'll add uses of the macro separately.
2025-02-03 19:23:12 +00:00
John Spray
715e20343a storage controller: improve scheduling of tenants created in PlacementPolicy::Secondary (#10590)
## Problem

I noticed when onboarding lots of tenants that the AZ scheduling
violation stat was climbing, before falling later as optimisations
happened. This was happening because we first add the tenant with
PlacementPolicy::Secondary, and then later go to
PlacementPolicy::Attached, and the scheduler's behavior led to a bad AZ
choice:
1. Create a secondary location in the non-preferred AZ
2. Upgrade to Attached where we promote that non-preferred-AZ location
to attached and then create another secondary
3. Optimiser later realises we're in the wrong AZ and moves us

## Summary of changes

- Extend some logging to give more information about AZs
- When scheduling secondary location in PlacementPolicy::Secondary,
select it as if we were attached: in this mode, our business goal is to
have a warm pageserver location that we can make available as attached
quickly if needed, therefore we want it to be in the preferred AZ.
- Make optimize_secondary logic the same, so that it will consider a
secondary location in the preferred AZ to be optimal when in
PlacementPolicy::Secondary
- When transitioning to from PlacementPolicy::Attached(N) to
PlacementPolicy::Secondary, instead of arbitrarily picking a location to
keep, prefer to keep the location in the preferred AZ
2025-02-03 19:01:16 +00:00
Arpad Müller
c774f0a147 storcon db: allow accepting any TLS certificate (#10640)
We encountered some TLS validation errors for the storcon since applying
#10614. Add an option to downgrade them to logged errors instead to
allow us to debug with more peace.

cc issue https://github.com/neondatabase/cloud/issues/23583
2025-02-03 18:21:01 +00:00
Folke Behrens
628a9616c4 fix(proxy): Don't use --is-private-access-proxy to disable IP check (#10633)
## Problem
* The behavior of this flag changed. Plus, it's not necessary to disable
the IP check as long as there are no IPs listed in the local postgres.

## Summary of changes
* Drop the flag from the command in the README.md section.
* Change the postgres URL passed to proxy to not use the endpoint
hostname.
* Also swap postgres creation and proxy startup, so the DB is running
when proxy comes up.
2025-02-03 14:12:41 +00:00
Alexander Bayandin
43682624b5 CI(pg-clients): fix logical replication tests (#10623)
## Problem

Tests for logical replication (on Staging) have been failing for some
time because logical replication is not enabled for them. This issue
occurred after switching to an org API key with a different default
setting, where logical replication was not enabled by default.

## Summary of changes
- Add `enable_logical_replication` input to
`actions/neon-project-create`
- Enable logical replication in `test-logical-replication` job
2025-02-03 13:41:41 +00:00
Em Sharnoff
e617a3a075 vm-monitor: Improve error display (#10542)
Logging errors with the debug format specifier causes multi-line errors,
which are sometimes a pain to deal with. Instead, we should use anyhow's
alternate display format, which shows the same information on a single
line.

Also adjusted a couple of error messages that were stale.

Fixes neondatabase/cloud#14710.
2025-02-03 13:34:11 +00:00
Fedor Dikarev
23ca8b061b Use actions/checkout for checkout (#10630)
## Problem
1. First of all it's more correct
2. Current usage allows ` Time-of-Check-Time-of-Use (TOCTOU) 'Pwn
Request' vulnerabilities`. Please check security slack channel or reach
me for more details. I will update PR description after merge.

## Summary of changes
1. Use `actions/checkout` with `ref: ${{
github.event.pull_request.head.sha }}`

Discovered by and Co-author: @varunsh-coder
2025-02-03 12:55:48 +00:00
Anastasia Lubennikova
b1bc33eb4d Fix logical_replication_sync test fixture (#10531)
Fixes flaky test_lr_with_slow_safekeeper test #10242

Fix query to `pg_catalog.pg_stat_subscription` catalog to handle table
synchronization and parallel LR correctly.
2025-02-03 12:44:47 +00:00
OBBO67
b1e451091a pageserver: clean up references to timeline delete marker, uninit marker (#5718) (#10627)
## Problem

Since [#5580](https://github.com/neondatabase/neon/pull/5580) the delete
and uninit file markers are no longer needed.

## Summary of changes

Remove the remaining code for the delete and uninit markers.

Additionally removes the `ends_with_suffix` function as it is no longer
required.

Closes [#5718](https://github.com/neondatabase/neon/issues/5718).
2025-02-03 11:54:07 +00:00
Arpad Müller
87ad50c925 storcon: use diesel-async again, now with tls support (#10614)
Successor of #10280 after it was reverted in #10592.

Re-introduce the usage of diesel-async again, but now also add TLS
support so that we connect to the storcon database using TLS. By
default, diesel-async doesn't support TLS, so add some code to make us
explicitly request TLS.

cc https://github.com/neondatabase/cloud/issues/23583
2025-02-03 11:53:51 +00:00
Alexander Bayandin
89b9f74077 CI(pre-merge-checks): do not run conclusion job for PRs (#10619)
## Problem

While working on https://github.com/neondatabase/neon/pull/10617 I
(unintentionally) merged the PR before the main CI pipeline has
finished.
I suspect this happens because we have received all the required job
results from the pre-merge-checks workflow, which runs on PRs that
include changes to relevant files.

## Summary of changes
- Skip the `conclusion` job in `pre-merge-checks` workflows for PRs
2025-02-03 09:40:12 +00:00
John Spray
f071800979 tests: stabilize shard locations earlier in test_scrubber_tenant_snapshot (#10606)
## Problem

This test would sometimes emit unexpected logs from the storage
controller's requests to do migrations, which overlap with the test's
restarts of pageservers, where those migrations are happening some time
after a shard split as the controller moves load around.

Example:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10602/13067323736/index.html#testresult/f66f1329557a1fc5/retries

## Summary of changes

- Do a reconcile_until_idle after shard split, so that the rest of the
test doesn't run concurrently with migrations
2025-02-03 09:02:21 +00:00
Peter Bendel
4dfe60e2ad revert https://github.com/neondatabase/neon/pull/10616 (#10631)
## Problem

https://github.com/neondatabase/neon/pull/10616 was only intended
temparily during the weekend, want to reset to prior state

## Summary of changes

revert https://github.com/neondatabase/neon/pull/10616 but keep fixes in
https://github.com/neondatabase/neon/pull/10622
2025-02-03 09:00:23 +00:00
Arpad Müller
8ae6f656a6 Don't require partial backup semaphore capacity for deletions (#10628)
In the safekeeper, we block deletions on the timeline's gate closing,
and any `WalResidentTimeline` keeps the gate open (because it owns a
gate lock object). Thus, unless the `main_task` function of a partial
backup doesn't return, we can't delete the associated timeline.

In order to make these tasks exit early, we call the cancellation token
of the timeline upon its shutdown. However, the partial backup task
wasn't looking for the cancellation while waiting to acquire a partial
backup permit.

On a staging safekeeper we have been in a situation in the past where
the semaphore was already empty for a duration of many hours, rendering
all attempted deletions unable to proceed until a restart where the
semaphore was reset:
https://neondb.slack.com/archives/C03H1K0PGKH/p1738416586442029
2025-02-03 04:11:06 +00:00
Peter Bendel
b9e1a67246 fix generate matrix for olap for saturdays (#10622)
## Problem

when introducing pg17 for job step `Generate matrix for OLAP benchmarks`
I introduced a syntax error that only hits on Saturdays.

## Summary of changes

Remove trailing comma

## successful test run

https://github.com/neondatabase/neon/actions/runs/13086363907
2025-02-01 11:09:45 +00:00
Folke Behrens
6318828c63 Update rust to 1.84.1 (#10618)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

[Release notes](https://releases.rs/docs/1.84.1/).

Prior update was in https://github.com/neondatabase/neon/pull/10328.

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2025-01-31 20:52:17 +00:00
Stefan Radig
6dd48ba148 feat(proxy): Implement access control with VPC endpoint checks and block for public internet / VPC (#10143)
- Wired up filtering on VPC endpoints
- Wired up block access from public internet / VPC depending on per
project flag
- Added cache invalidation for VPC endpoints (partially based on PR from
Raphael)
- Removed BackendIpAllowlist trait

---------

Co-authored-by: Ivan Efremov <ivan@neon.tech>
2025-01-31 20:32:57 +00:00
Conrad Ludgate
ad1a41157a feat(proxy): optimizing the chances of large write in copy_bidirectional (#10608)
We forked copy_bidirectional to solve some issues like fast-shutdown
(disallowing half-open connections) and to introduce better error
tracking (which side of the conn closed down).

A change recently made its way upstream offering performance
improvements: https://github.com/tokio-rs/tokio/pull/6532. These seem
applicable to our fork, thus it makes sense to apply them here as well.
2025-01-31 19:14:27 +00:00
Tristan Partin
fcd195c2b6 Migrate compute_ctl arg parsing to clap derive (#10497)
The primary benefit is that all the ad hoc get_matches() calls are no
longer necessary. Now all it takes to get at the CLI arguments is
referencing a struct member. It's also great the we can replace the ad
hoc CLI struct we had with this more formal solution.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-31 19:04:26 +00:00
Peter Bendel
bc7822d90c temporarily disable some steps and run more often to expose more pgbench --initialize in benchmarking workflow (#10616)
## Problem

we want to disable some steps in benchmarking workflow that do not
initialize new projects and instead run the test more frequently

Test run
 https://github.com/neondatabase/neon/actions/runs/13077737888
2025-01-31 18:41:17 +00:00
Alexander Bayandin
48c87dc458 CI(pre-merge-checks): fix condition (#10617)
## Problem

Merge Queue fails if changes include Rust code.

## Summary of changes
- Fix condition for `build-build-tools-image`
- Add a couple of no-op `false ||` to make predicates look 
symmetric
2025-01-31 18:07:26 +00:00
John Spray
aedeb1c7c2 pageserver: revise logging of cancelled request results (#10604)
## Problem

When a client dropped before a request completed, and a handler returned
an ApiError, we would log that at error severity. That was excessive in
the case of a request erroring on a shutdown, and could cause test
flakes.

example:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/13067651123/index.html#suites/ad9c266207b45eafe19909d1020dd987/6021ce86a0d72ae7/

```
Cancelled request finished with an error: ShuttingDown
```

## Summary of changes

- Log a different info-level on ShuttingDown and ResourceUnavailable API
errors from cancelled requests
2025-01-31 17:43:54 +00:00
John Spray
a93e9f22fc pageserver: remove faulty debug assertion in compaction (#10610)
## Problem

This assertion is incorrect: it is legal to see another shard's data at
this point, after a shard split.

Closes: https://github.com/neondatabase/neon/issues/10609

## Summary of changes

- Remove faulty assertion
2025-01-31 17:43:31 +00:00
JC Grünhage
10cf5e7a38 Move cargo-deny into a separate workflow on a schedule (#10289)
## Problem
There are two (related) problems with the previous handling of
`cargo-deny`:
- When a new advisory is added to rustsec that affects a dependency,
unrelated pull requests will fail.
- New advisories rely on pushes or PRs to be surfaced. Problems that
already exist on main will only be found if we try to merge new things
into main.

## Summary of changes
We split out `cargo-deny` into a separate workflow that runs on all PRs
that touch `Cargo.lock`, and on a schedule on `main`, `release`,
`release-compute` and `release-proxy` to find new advisories.
2025-01-31 13:42:59 +00:00
Arpad Müller
dce617fe07 Update to rebased rust-postgres (#10584)
Update to a rebased version of our rust-postgres patches, rebased on
[this](98f5a11bc0)
commit this time.

With #10280 reapplied, this means that the rust-postgres crates will be
deduplicated, as the new crate versions are finally compatible with the
requirements of diesel-async.

Earlier update: #10561

rust-postgres PR: https://github.com/neondatabase/rust-postgres/pull/39
2025-01-31 12:40:20 +00:00
Alexander Bayandin
503bc72d31 CI: add diesel print-schema check (#10527)
## Problem

We want to check that `diesel print-schema` doesn't generate any changes
(`storage_controller/src/schema.rs`) in comparison with the list of
migration.

## Summary of changes
- Add `diesel_cli` to `build-tools` image
- Add `Check diesel schema` step to `build-neon` job, at this stage we
have all required binaries, so don't need to compile anything
additionally
- Check runs only on x86 release builds to be sure we do it at least
once per CI run.
2025-01-31 11:48:46 +00:00
Fedor Dikarev
89cff08354 unify pg-build-nonroot-with-cargo base layer and config retries in curl (#10575)
Ref: https://github.com/neondatabase/cloud/issues/23461

## Problem

Just made changes around and see these 2 base layers could be optimised.

and after review comment from @myrrc setting up timeouts and retries in
`alpine/curl` image

## Summary of changes
2025-01-31 11:46:33 +00:00
Erik Grinaker
afbcebe7f7 test_runner: force-compact in test_sharding_autosplit (#10605)
## Problem

This test may not fully detect data corruption during splits, since we
don't force-compact the entire keyspace.

## Summary of changes

Force-compact all data in `test_sharding_autosplit`.
2025-01-31 11:31:58 +00:00
Arpad Müller
7d5c70c717 Update AWS SDK crates (#10588)
We want to keep the AWS SDK up to date as that way we benefit from new
developments and improvements.

Prior update was in #10056
2025-01-31 11:23:12 +00:00
John Spray
f09cfd11cb pageserver: exclude archived timelines from freeze+flush on shutdown (#10594)
## Problem

If offloading races with normal shutdown, we get a "failed to freeze and
flush: cannot flush frozen layers when flush_loop is not running, state
is Exited". This is harmless but points to it being quite strange to try
and freeze and flush such a timeline. flushing on shutdown for an
archived timeline isn't useful.

Related: https://github.com/neondatabase/neon/issues/10389

## Summary of changes

- During Timeline::shutdown, ignore ShutdownMode::FreezeAndFlush if the
timeline is archived
2025-01-31 10:54:14 +00:00
Arseny Sher
765ba43438 Allow pageserver unreachable errors in test_scrubber_tenant_snapshot (#10585)
## Problem

test_scrubber_tenant_snapshot restarts pageservers, but log validation
fails tests on any non white listed storcon warnings, making the test
flaky.

## Summary of changes

Allow warns like
2025-01-29T12:37:42.622179Z WARN reconciler{seq=1
tenant_id=2011077aea9b4e8a60e8e8a19407634c shard_id=0004}: Call to node
2 (localhost:15352) management API failed, will retry (attempt 1):
receive body: error sending request for url
(http://localhost:15352/v1/tenant/2011077aea9b4e8a60e8e8a19407634c-0004/location_config):
client error (Connect)

ref https://github.com/neondatabase/neon/issues/10462
2025-01-31 10:33:24 +00:00
Folke Behrens
6041a93591 Update tokio base crates (#10556)
Update `tokio` base crates and their deps. Pin `tokio` to at least 1.41
which stabilized task ID APIs.

To dedup `mio` dep the `notify` crate is updated. It's used in
`compute_tools`.

9f81828429/compute_tools/src/pg_helpers.rs (L258-L367)
2025-01-31 09:54:31 +00:00
Conrad Ludgate
738bf83583 chore: replace dashmap with clashmap (#10582)
## Problem

Because dashmap 6 switched to hashbrown RawTable API, it required us to
use unsafe code in the upgrade:
https://github.com/neondatabase/neon/pull/8107

## Summary of changes

Switch to clashmap, a fork maintained by me which removes much of the
unsafe and ultimately switches to HashTable instead of RawTable to
remove much of the unsafe requirement on us.
2025-01-31 09:53:43 +00:00
Anna Stepanyan
423e239617 [infra/notes] impr: add issue types to issue templates (#10018)
refs #0000

---------

Co-authored-by: Fedor Dikarev <fedor@neon.tech>
2025-01-31 06:29:06 +00:00
Heikki Linnakangas
df87a55609 tests: Speed up test_pgdata_import_smoke on Postgres v17 (#10567)
The test runs this query:

    select count(*), sum(data::bigint)::bigint from t

to validate the test results between each part of the test. It performs
a simple sequential scan and aggregation, but was taking an order of
magnitude longer on v17 than on previous Postgres versions, which
sometimes caused the test to time out. There were two reasons for that:

1. On v17, the planner estimates the table to have only only one row. In
reality it has 305790 rows, and older versions estimated it at 611580,
which is not too bad given that the table has not been analyzed so the
planner bases that estimate just on the number of pages and the widths
of the datatypes. The new estimate of 1 row is much worse, and it leads
the planner to disregard parallel plans, whereas on older versions you
got a Parallel Seq Scan.

I tracked this down to upstream commit 29cf61ade3, "Consider fillfactor
when estimating relation size". With that commit,
table_block_relation_estimate_size() function calculates that each page
accommodates less than 1 row when the fillfactor is taken into account,
which rounds down to 0. In reality, the executor will always place at
least one row on a page regardless of fillfactor, but the new estimation
formula doesn't take that into account.

I reported this to pgsql-hackers
(https://www.postgresql.org/message-id/2bf9d973-7789-4937-a7ca-0af9fb49c71e%40iki.fi),
we don't need to do anything more about it in neon. It's OK to not use
parallel scans here; once issue 2. below is addressed, the queries are
fast enough without parallelism..

2. On v17, prefetching was not happening for the sequential scan. That's
because starting with v17, buffers are reserved in the shared buffer
cache before prefetching is initiated, and we use a tiny
shared_buffers=1MB setting in the tests. The prefetching is effectively
disabled with such a small shared_buffers setting, to protect the system
from completely starving out of buffers.

   To address that, simply bump up shared_buffers in the test.

This patch addresses the second issue, which is enough to fix the
problem.
2025-01-30 22:55:17 +00:00
John Spray
5e0c40709f storcon: refine chaos selection logic (#10600)
## Problem

In https://github.com/neondatabase/neon/pull/10438 it was pointed out
that it would be good to avoid picking tenants in ID order, and also to
avoid situations where we might double-select the same tenant.

There was an initial swing at this in
https://github.com/neondatabase/neon/pull/10443, where Chi suggested a
simpler approach which is done in this PR

## Summary of changes

- Split total set of tenants into in and out of home AZ
- Consume out of home AZ first, and if necessary shuffle + consume from
out of home AZ
2025-01-30 22:45:43 +00:00
John Spray
e1273acdb1 pageserver: handle shutdown cleanly in layer download API (#10598)
## Problem

This API is used in tests and occasionally for support. It cast all
errors to 500.

That can cause a failure on the log checks:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/13056992876/index.html#suites/ad9c266207b45eafe19909d1020dd987/683a7031d877f3db/

## Summary of changes

- Avoid using generic anyhow::Error for layer downloads
- Map shutdown cases to 503 in http route
2025-01-30 22:43:36 +00:00
John Spray
d18f6198e1 storcon: fix AZ-driven tenant selection in chaos (#10443)
## Problem

In https://github.com/neondatabase/neon/pull/10438 I had got the
function for picking tenants backwards, and it was preferring to move
things _away_ from their preferred AZ.

## Summary of changes

- Fix condition in `is_attached_outside_preferred_az`
2025-01-30 22:17:07 +00:00
John Spray
6da7c556c2 pageserver: fix race cleaning up timeline files when shut down during bootstrap (#10532)
## Problem

Timeline bootstrap starts a flush loop, but doesn't reliably shut down
the timeline (incl. waiting for flush loop to exit) before destroying
UninitializedTimeline, and that destructor tries to clean up local
storage. If local storage is still being written to, then this is
unsound.

Currently the symptom is that we see a "Directory not empty" error log,
e.g.
https://neon-github-public-dev.s3.amazonaws.com/reports/main/12966756686/index.html#testresult/5523f7d15f46f7f7/retries

## Summary of changes

- Move fallible IO part of bootstrap into a function (notably, this is
fallible in the case of the tenant being shut down while creation is
happening)
- When that function returns an error, call shutdown() on the timeline
2025-01-30 20:33:22 +00:00
a-masterov
bf6d5e93ba Run tests of the contrib extensions (#10392)
## Problem
We don't test the extensions, shipped with contrib
## Summary of changes
The tests are now running
2025-01-30 19:32:35 +00:00
Arpad Müller
4d2c2e9460 Revert "storcon: switch to diesel-async and tokio-postgres (#10280)" (#10592)
There was a regression of #10280, tracked in
[#23583](https://github.com/neondatabase/cloud/issues/23583).

I have ideas how to fix the issue, but we are too close to the release
cutoff, so revert #10280 for now. We can revert the revert later :).
2025-01-30 19:23:25 +00:00
John Spray
bae0de643e tests: relax constraints on test_timeline_archival_chaos (#10595)
## Problem

The test asserts that it completes at least 10 full timeline lifecycles,
but the noisy CI environment sometimes doesn't meet that goal.

Related: https://github.com/neondatabase/neon/issues/10389

## Summary of changes

- Sleep for longer between pageserver restarts, so that the timeline
workers have more chance to make progress
- Sleep for shorter between retries from timeline worker, so that they
have better chance to get in while a pageserver is up between restarts
- Relax the success condition to complete at least 5 iterations instead
of 10
2025-01-30 19:22:59 +00:00
Cheng Chen
8293b252b2 chore(compute): pg_mooncake v0.1.1 (#10578)
## Problem
Upgrade pg_mooncake to v0.1.1

## Summary of changes

https://github.com/Mooncake-Labs/pg_mooncake/blob/main/CHANGELOG.md#011-2025-01-29
2025-01-30 18:33:25 +00:00
Peter Bendel
6c8fc909d6 Benchmarking PostgreSQL17: for OLAP need specific connstr secrets (#10587)
## Problem

for OLAP benchmarks we need specific connstr secrets with different
database names for each job step

This is a follow-up for https://github.com/neondatabase/neon/pull/10536
In previous PR we used a common GitHub secret for a shared re-use
project that has 4 databases: neondb, tpch, clickbench and userexamples.

[Failure
example](https://neon-github-public-dev.s3.amazonaws.com/reports/main/13044872855/index.html#suites/54d0af6f403f1d8611e8894c2e07d023/fc029330265e9f6e/):


```log
# /tmp/neon/pg_install/v17/bin/psql user=neondb_owner dbname=neondb host=ep-broad-brook-w2luwzzv.us-east-2.aws.neon.build sslmode=require options='-cstatement_timeout=0 ' -c -- $ID$
-- TPC-H/TPC-R Pricing Summary Report Query (Q1)
-- Functional Query Definition
-- Approved February 1998
...
ERROR:  relation "lineitem" does not exist

```

## Summary of changes

We need dedicated GitHub secrets and dedicated connection strings for
each of the use cases.

## Test run
https://github.com/neondatabase/neon/actions/runs/13053968231
2025-01-30 16:41:46 +00:00
Heikki Linnakangas
efe42db264 tests: test_pgdata_import_smoke requires the 'testing' cargo feature (#10569)
It took me ages to figure out why it was failing on my laptop. What I
saw was that when the test makes the 'import_pgdata' in the pageserver,
the pageserver actually performs a regular 'bootstrap' timeline creation
by running initdb, with no importing. It boiled down to the json request
that the test uses:

```
        {
            "new_timeline_id": str(timeline_id),
            "import_pgdata": {
                "idempotency_key": str(idempotency),
                "location": {"LocalFs": {"path": str(importbucket.absolute())}},
            },
        },
```

and how serde deserializes into rust structs. The 'LocalFs' enum variant
in `models.rs` is gated on the 'testing' cargo feature. On a non-testing
build, that got deserialized into the default Bootstrap enum variant, as
a valid TimelineCreateRequestModeImportPgdata variant could not be
formed.

PS. IMHO we should get rid of the testing feature, compile in all the
functionality, and have a runtime flag to disable anything dangeorous.
With that, you would've gotten a nice "feature only enabled in testing
mode" error in this case, or the test would've simply worked. But that's
another story.
2025-01-30 16:11:26 +00:00
Alex Chi Z.
cf6dee946e fix(pageserver): gc-compaction race with read (#10543)
## Problem

close https://github.com/neondatabase/neon/issues/10482

## Summary of changes

Add an extra lock on the read path to protect against races. The read
path has an implication that only certain kind of compactions can be
performed. Garbage keys must first have an image layer covering the
range, and then being gc-ed -- they cannot be done in one operation. An
alternative to fix this is to move the layers read guard to be acquired
at the beginning of `get_vectored_reconstruct_data_timeline`, but that
was intentionally optimized out and I don't want to regress.

The race is not limited to image layers. Gc-compaction will consolidate
deltas automatically and produce a flat delta layer (i.e., when we have
retain_lsns below the gc-horizon). The same race would also cause
behaviors like getting an un-replayable key history as in
https://github.com/neondatabase/neon/issues/10049.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-30 15:25:29 +00:00
Alexey Kondratov
be51b10da7 chore(compute): Print some compute_ctl errors in debug mode (#10586)
## Problem

In some cases, we were returning a very shallow error like `error
sending request for url (XXX)`, which made it very hard to figure out
the actual error.

## Summary of changes

Use `{:?}` in a few places, and remove it from places where we were
printing a string anyway.
2025-01-30 14:31:49 +00:00
Arpad Müller
93714c4c7b secondary downloader: load metadata on loading of timeline (#10539)
Related to #10308, we might have legitimate changes in file size or
generation. Those changes should not cause warn log lines.

In order to detect changes of the generation number while the file size
stayed the same, load the metadata that we store on disk on loading of
the timeline.

Still do a comparison with the on-disk layer sizes to find any
discrepancies that might occur due to race conditions (new metadata file
gets written but layer file has not been updated yet, and PS shuts
down). However, as it's possible to hit it in a race conditon, downgrade
it to a warning.

Also fix a mistake in #10529: we want to compare the old with the new
metadata, not the old metadata with itself.
2025-01-30 12:03:36 +00:00
John Spray
ab627ad9fd storcon_cli: fix spurious error setting preferred AZ (#10568)
## Problem

The client code for `tenant-set-preferred-az` declared response type
`()`, so printed a spurious error on each use:
```
Error: receive body: error decoding response body: invalid type: map, expected unit at line 1 column 0
```

The requests were successful anyway.

## Summary of changes

- Declare the proper return type, so that the command succeeds quietly.
2025-01-30 11:54:02 +00:00
Erik Grinaker
6a2afa0c02 pageserver: add per-timeline read amp histogram (#10566)
## Problem

We don't have per-timeline observability for read amplification.

Touches https://github.com/neondatabase/cloud/issues/23283.

## Summary of changes

Add a per-timeline `pageserver_layers_per_read` histogram.

NB: per-timeline histograms are expensive, but probably worth it in this
case.
2025-01-30 11:24:49 +00:00
Alexander Bayandin
8804d58943 Nightly Benchmarks: use pgbench from artifacts (#10370)
We don't use statically linked OpenSSL anymore (#10302), 
it's ok to switch to Neon's pgbench for pgvector benchmarks
2025-01-30 11:18:07 +00:00
Erik Grinaker
d3db96c211 pageserver: add pageserver_deltas_per_read_global metric (#10570)
## Problem

We suspect that Postgres checkpoints will limit the number of page
deltas necessary to reconstruct a page, but don't know for certain.

Touches https://github.com/neondatabase/cloud/issues/23283.

## Summary of changes

Add `pageserver_deltas_per_read_global` metric.

This pairs with `pageserver_layers_per_read_global` from #10573.
2025-01-30 10:55:07 +00:00
Erik Grinaker
b24727134c pageserver: improve read amp metric (#10573)
## Problem

The current global `pageserver_layers_visited_per_vectored_read_global`
metric does not appear to accurately measure read amplification. It
divides the layer count by the number of reads in a batch, but this
means that e.g. 10 reads with 100 L0 layers will only measure a read amp
of 10 per read, while the actual read amp was 100.

While the cost of layer visits are amortized across the batch, and some
layers may not intersect with a given key, each visited layer
contributes directly to the observed latency for every read in the
batch, which is what we care about.

Touches https://github.com/neondatabase/cloud/issues/23283.
Extracted from #10566.

## Summary of changes

* Count the number of layers visited towards each read in the batch,
instead of the average across the batch.
* Rename `pageserver_layers_visited_per_vectored_read_global` to
`pageserver_layers_per_read_global`.
* Reduce the read amp log warning threshold down from 512 to 100.
2025-01-30 09:27:40 +00:00
Alexander Lakhin
a7a706cff7 Fix submodule reference after #10473 (#10577) 2025-01-30 09:09:43 +00:00
Folke Behrens
86d5798108 Merge pull request #10576 from neondatabase/rc/release-proxy/2025-01-30
Proxy release 2025-01-30
2025-01-30 08:52:09 +01:00
github-actions[bot]
8b4088dd8a Proxy release 2025-01-30 2025-01-30 06:02:00 +00:00
Alex Chi Z.
77ea9b16fe fix(pageserver): use the larger one of upper limit and threshold (#10571)
## Problem

Follow up of https://github.com/neondatabase/neon/pull/10550 in case the
upper limit is set larger than threshold. It does not make sense for
someone to enforce the behavior like "if there are >= 50 L0s, only
compact 10 of them".

## Summary of changes

Use the maximum of compaction threshold and upper limit when selecting
L0 files to compact.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-30 00:05:40 +00:00
Alex Chi Z.
9dff6cc2a4 fix(pageserver): skip repartition if we need L0 compaction (#10547)
## Problem

Repartition is slow, but it's only used in image layer creation. We can
skip it if we have a lot of L0 layers to ingest.

## Summary of changes

If L0 compaction is not complete, do not repartition and do not create
image layers.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-29 21:32:50 +00:00
Erik Grinaker
ff298afb97 pageserver: add level for timeline layer metrics (#10563)
## Problem

We don't have good observability for per-timeline compaction debt,
specifically the number of delta layers in the frozen, L0, and L1
levels.

Touches https://github.com/neondatabase/cloud/issues/23283.

## Summary of changes

* Add a `level` label for `pageserver_layer_{count,size}` with values
`l0`, `l1`, and `frozen`.
* Track metrics for frozen layers.

There is already a `kind={delta,image}` label. `kind=image` is only
possible for `level=l1`.

We don't include the currently open ephemeral layer, only frozen layers.
There is always exactly 1 ephemeral layer, with a dynamic size which is
already tracked in `pageserver_timeline_ephemeral_bytes`.
2025-01-29 21:10:56 +00:00
Fedor Dikarev
de1c35fab3 add retries for apt, wget and curl (#10553)
Ref: https://github.com/neondatabase/cloud/issues/23461

## Problem
> recent CI failure due to apt-get:
```
4.266 E: Failed to fetch http://deb.debian.org/debian/pool/main/g/gcc-10/libgfortran5_10.2.1-6_arm64.deb  Error reading from server - read (104: Connection reset by peer) [IP: 146.75.122.132 80]
```

https://github.com/neondatabase/neon/actions/runs/11144974698/job/30973537767?pr=9186
thinking about if there should be a mirror-selector at the beginning of
the dockerfile so that it uses a debian mirror closer to the build
server?
## Summary of changes
We could consider adding local mirror or proxy and keep it close to our
self-hosted runners.
For now lets just add retries for `apt`, `wget` and `curl`

thanks to @skyzh for reporting that in October 2024, I just finally
found time to take a look here :)
2025-01-29 21:02:54 +00:00
Peter Bendel
62819aca36 Add PostgreSQL version 17 benchmarks (#10536)
## Problem

benchmarking.yml so far is only running benchmarks with PostgreSQL
version 16.
However neon recently changed the default for new customers to
PostgreSQL version 17.

See related [epic](https://github.com/neondatabase/cloud/issues/23295)

## Summary of changes

We do not want to run every job step with both pg 16 and 17 because this
would need excessive resources (runners, computes) and extend the
benchmarking run wall clock time too much.

So we select an opinionated subset of testcases that we also report in
weekly reporting and add a postgres v17 job step.

For re-use projects associated Neon projects have been created and
connection strings have been added to neon database organization
secrets.

A follow up is to add the reporting for these new runs to some grafana
dashboards.
2025-01-29 20:21:42 +00:00
Tristan Partin
707a926057 Remove unused compute_ctl HTTP routes (#10544)
These are not used anywhere within the platform, so let's remove dead
code.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-29 19:22:01 +00:00
Alex Chi Z.
5bcefb4ee1 fix(pageserver): compaction perftest wrt upper limit (#10564)
## Problem

The config is added in https://github.com/neondatabase/neon/pull/10550
causing behavior change for l0 compaction.

close https://github.com/neondatabase/neon/issues/10562

## Summary of changes

Fix the test case to consider the effect of upper_limit.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-29 18:43:39 +00:00
Alexey Kondratov
34322b2424 chore(compute): Simplify new compute_ctl metrics and fix flaky test (#10560)
## Problem

1. d04d924 added separate metrics for total requests and failures
separately, but it doesn't make much sense. We could just have a unified
counter with `http_status`.
2. `test_compute_migrations_retry` had a race, i.e., it was waiting for
the last successful migration, not an actual failure. This was revealed
after adding an assert on failure metric in d04d924.

## Summary of changes

1. Switch to unified counters for `compute_ctl` requests.
2. Add a waiting loop into `test_compute_migrations_retry` to eliminate
the race.

Part of neondatabase/cloud#17590
2025-01-29 18:09:25 +00:00
Vlad Lazar
fdfbc7b358 pageserver: hold GC while reading from a timeline (#10559)
## Problem

If we are GC-ing because a new image layer was added while traversing
the timeline, then it will remove layers that are required for
fulfilling the current get request (read-path cannot "look back" and
notice the new image layer).

## Summary of Changes

Prevent GC from progressing on the current timeline while it is being
visited for a read.

Epic: https://github.com/neondatabase/neon/issues/9376
2025-01-29 17:08:25 +00:00
Conrad Ludgate
190c19c034 chore: update rust-postgres on rebase (#10561)
I tried a full update of our tokio-postgres fork before. We hit some
breaking change. This PR only pulls in ~50% of the changes from
upstream: https://github.com/neondatabase/rust-postgres/pull/38.
2025-01-29 17:02:07 +00:00
Mikhail Kot
34e560fe37 download exporters from releases rather than using docker images (#10551)
Use releases for postgres-exporter, pgbouncer-exporter, and sql-exporter
2025-01-29 15:52:00 +00:00
Tristan Partin
7922458b98 Use num_cpus from the workspace in pageserver (#10545)
Luckily they were the same version, so we didn't spend time compiling
two versions, which could have been the case in the future.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-29 15:45:36 +00:00
a-masterov
34d9e2d8e3 Add a test for GrapgQL (#10156)
## Problem
We currently don't run the tests shipped with `pg_graphql`.
## Summary of changes
The tests for `pg_graphql` are added.
2025-01-29 15:01:56 +00:00
Conrad Ludgate
2f82c21c63 chore: update rust-postgres fork (#10557)
I updated the fork to fix some lints. Cargo keeps getting confused by it
so let's just update the lockfile here
2025-01-29 12:55:24 +00:00
Ivan Efremov
222cc181e9 impr(proxy): Move the CancelMap to Redis hashes (#10364)
## Problem
The approach of having CancelMap as an in-memory structure increases
code complexity,
as well as putting additional load for Redis streams.

## Summary of changes
- Implement a set of KV ops for Redis client;
- Remove cancel notifications code;
- Send KV ops over the bounded channel to the handling background task
for removing and adding the cancel keys.


Closes #9660
2025-01-29 11:19:10 +00:00
alexanderlaw
4d2328ebe3 Fix C code to satisfy sanitizers (#10473) 2025-01-29 10:05:43 +00:00
a-masterov
9f81828429 Test extension upgrade compatibility (#10244)
## Problem
We have to test the extensions, shipped with Neon for compatibility
before the upgrade.
## Summary of changes
Added the test for compatibility with the upgraded extensions.
2025-01-29 09:19:11 +00:00
Arseny Sher
9ab13d6e2c Log statements in test_layer_map (#10554)
## Problem

test_layer_map doesn't log statements and it is not clear how long they
take.

## Summary of changes

Do log them.

ref https://github.com/neondatabase/neon/issues/10409
2025-01-29 09:16:00 +00:00
Alex Chi Z.
983e18e63e feat(pageserver): add compaction_upper_limit config (#10550)
## Problem

Follow-up of the incident, we should not use the same bound on
lower/upper limit of compaction files. This patch adds an upper bound
limit, which is set to 50 for now.

## Summary of changes

Add `compaction_upper_limit`.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2025-01-28 23:18:32 +00:00
Alex Chi Z.
b735df6ff0 fix(pageserver): make image layer generation atomic (#10516)
## Problem

close https://github.com/neondatabase/neon/issues/8362

## Summary of changes

Use `BatchLayerWriter` to ensure we clean up image layers after failed
compaction.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-28 21:29:51 +00:00
Fedor Dikarev
68cf0ba439 run benchmark tests on small-metal runners (#10549)
## Problem
Ref: https://github.com/neondatabase/cloud/issues/23314

We suspect some inconsistency in Benchmark tests runs could be due to
different type of runners they are landed in.
To have that aligned in both terms: failure rates and benchmark results,
lets run them for now on `small-metal` servers and see the progress for
the tests stability.
 
## Summary of changes
2025-01-28 21:26:38 +00:00
Alexey Kondratov
d04d924649 feat(compute): Add some basic compute_ctl metrics (#10504)
## Problem

There are several parts of `compute_ctl` with a very low visibility of
errors:
1. DB migrations that run async in the background after compute start.
2. Requests made to control plane (currently only `GetSpec`).
3. Requests made to the remote extensions server.

## Summary of changes

Add new counters to quickly evaluate the amount of errors among the
fleet.

Part of neondatabase/cloud#17590
2025-01-28 19:24:07 +00:00
JC Grünhage
f5fdaa6dc6 feat(ci): generate basic release notes with links (#10511)
## Problem
https://github.com/neondatabase/neon/pull/10448 removed release notes,
because if their generation failed, the whole release was failing.
People liked them though, and wanted some basic release notes as a
fall-back instead of completely removing them.

## Summary of changes
Include basic release notes that link to the release PR and to a diff to
the previous release.
2025-01-28 19:13:39 +00:00
Vlad Lazar
c54cd9e76a storcon: signal LSN wait to pageserver during live migration (#10452)
## Problem

We've seen the ingest connection manager get stuck shortly after a
migration.

## Summary of changes

A speculative mitigation is to use the same mechanism as get page
requests for kicking LSN ingest. The connection manager monitors
LSN waits and queries the broker if no updates are received for the
timeline.

Closes https://github.com/neondatabase/neon/issues/10351
2025-01-28 17:33:07 +00:00
Erik Grinaker
1010b8add4 pageserver: add l0_flush_wait_upload setting (#10534)
## Problem

We need a setting to disable the flush upload wait, to test L0 flush
backpressure in staging.

## Summary of changes

Add `l0_flush_wait_upload` setting.
2025-01-28 17:21:05 +00:00
Folke Behrens
ae4b2af299 fix(proxy): Use correct identifier for usage metrics upload (#10538)
## Problem

The request data and usage metrics S3 requests use the same identifier
shown in logs, causing confusion about what type of upload failed.

## Summary of changes

Use the correct identifier for usage metrics uploads.

neondatabase/cloud#23084
2025-01-28 17:08:17 +00:00
Tristan Partin
15fecb8474 Update axum to 0.8.1 (#10332)
Only a few things that needed updating:

- async_trait was removed
- Message::Text takes a Utf8Bytes object instead of a String

Signed-off-by: Tristan Partin <tristan@neon.tech>
Co-authored-by: Conrad Ludgate <connor@neon.tech>
2025-01-28 15:32:59 +00:00
Erik Grinaker
47677ba578 pageserver: disable L0 backpressure by default (#10535)
## Problem

We'll need further improvements to compaction before enabling L0 flush
backpressure by default. See:
https://neondb.slack.com/archives/C033RQ5SPDH/p1738066068960519?thread_ts=1737818888.474179&cid=C033RQ5SPDH.

Touches #5415.

## Summary of changes

Disable `l0_flush_delay_threshold` by default.
2025-01-28 14:51:30 +00:00
Arpad Müller
83b6bfa229 Re-download layer if its local and on-disk metadata diverge (#10529)
In #10308, we noticed many warnings about the local layer having
different sizes on-disk compared to the metadata.

However, the layer downloader would never redownload layer files if the
sizes or generation numbers change. This is obviously a bug, which we
aim to fix with this PR.

This change also moves the code deciding what to do about a layer to a
dedicated function: before we handled the "routing" via control flow,
but now it's become too complicated and it is nicer to have the
different verdicts for a layer spelled out in a list/match.
2025-01-28 13:39:53 +00:00
Erik Grinaker
ed942b05f7 Revert "pageserver: revert flush backpressure" (#10402)" (#10533)
This reverts commit 9e55d79803.

We'll still need this until we can tune L0 flush backpressure and
compaction. I'll add a setting to disable this separately.
2025-01-28 13:33:58 +00:00
Vlad Lazar
62a717a2ca pageserver: use PS node id for SK appname (#10522)
## Problem

This one is fairly embarrassing. Safekeeper node id was used in the
pageserver application name
when connecting to safekeepers.

## Summary of changes

Use the right node id.

Closes https://github.com/neondatabase/neon/issues/10461
2025-01-28 13:11:51 +00:00
Peter Bendel
c8fbbb9b65 Test ingest_benchmark with different stripe size and also PostgreSQL version 17 (#10510)
We want to verify if pageserver stripe size has an impact on ingest
performance.
We want to verify if ingest performance has improved or regressed with
postgres version 17.

## Summary of changes

- Allow to create new project with different postgres versions
- allow to pre-shard new project with different stripe sizes instead of
relying on storage manager to shard_split the project once a threshold
is exceeded

Replaces https://github.com/neondatabase/neon/pull/10509

Test run https://github.com/neondatabase/neon/actions/runs/12986410381
2025-01-27 21:06:05 +00:00
John Spray
d73f4a6470 pageserver: retry wrapper on manifest upload (#10524)
## Problem

On remote storage errors (e.g. I/O timeout) uploading tenant manifest,
all of compaction could fail. This is a problem IRL because we shouldn't
abort compaction on a single IO error, and in tests because it generates
spurious failures.

Related:
https://github.com/orgs/neondatabase/projects/51/views/2?sliceBy%5Bvalue%5D=jcsp&pane=issue&itemId=93692919&issue=neondatabase%7Cneon%7C10389

## Summary of changes

- Use `backoff::retry` when uploading tenant manifest
2025-01-27 21:02:25 +00:00
Heikki Linnakangas
5477d7db93 fast_import: fixes for Postgres v17 (#10414)
Now that the tests are run on v17, they're also run in debug mode, which
is slow. Increase statement_timeout in the test to work around that.
2025-01-27 19:47:49 +00:00
Arpad Müller
eb9832d846 Remove PQ_LIB_DIR env var (#10526)
We now don't need libpq any more for the build of the storage
controller, as we use `diesel-async` since #10280. Therefore, we remove
the env var that gave cargo/rustc the location for libpq.

Follow-up of #10280
2025-01-27 19:38:18 +00:00
Christian Schwarz
3d36dfe533 fix: noisy broker subscription failed error during storage broker deploys (#10521)
During broker deploys, pageservers log this noisy WARN en masse.

I can trivially reproduce the WARN message in neon_local by SIGKILLing
broker during e.g. `pgbench -i`.

I don't understand why tonic is not detecting the error as
`Code::Unavailable`.

Until we find time to understand that / fix upstream, this PR adds the
error message to the existing list of known error messages that get
demoted to INFO level.

Refs:
-  refs https://github.com/neondatabase/neon/issues/9562
2025-01-27 19:19:55 +00:00
John Spray
ebf44210ba remote_storage: less sensitive timeout logging in ABS listings (#10518)
## Problem

We were logging a warning after a single request timeout, while listing
objects.

Closes: https://github.com/neondatabase/neon/issues/10166

## Summary of changes

- These timeouts are a pretty normal part of life, so back it off to
only log a warning after two in a row.
2025-01-27 17:44:18 +00:00
John Spray
aabf455dfb README: clarify that neon_local is a dev/test tool (#10512)
## Problem

From time to time, folks discover our `control_plane/` folder and make
the (reasonable) mistake of thinking it's a tool for running full-sized
Neon systems, whereas in reality it is a tool for dev/test.

## Summary of changes

- Change control_plane's readme title to "Local Development Control
Plane (`neon_local`)`
- Change "Running local installation" to "Running a local development
environment" in the main readme
2025-01-27 17:24:42 +00:00
John Spray
aec92bfc34 pageserver: decrease utilization MAX_SHARDS (#10489)
## Problem

The intent of this parameter is to have pageservers consider themselves
"full" if they've got lots of shards, even if they have plenty of
capacity. It works, but because we typically successfully oversubscribe
capacity up to 200%, the MAX_SHARDS limit is effectively doubled, so
this 20,000 value ends up meaning 40,000, whereas the original intent
was to limit nodes to ~10000 shards.

## Summary of changes

- Change MAX_SHARDS to 5000, so that a node with 5000 will get a 100%
utilization, which is equivalent in practice to being considered "half
full" by the storage controller in capacity terms.

This is all a bit subtle and indiret. Originally the limit was baked
into the pageserver with the idea that the pageserver knows better what
its own resources tolerate than the storage controller does, but in
practice it would be probably be easier to understand all this if we
just did it controller-side. So there's scope to refactor here in
future.
2025-01-27 17:03:32 +00:00
Arpad Müller
b0b4b7dd8f storcon: switch to diesel-async and tokio-postgres (#10280)
Switches the storcon away from using diesel's synchronous APIs in favour
of `diesel-async`.

Advantages:

* less C dependencies, especially no openssl, which might be behind the
bug: https://github.com/neondatabase/cloud/issues/21010
* Better to only have async than mix of async plus `spawn_blocking`

We had to turn off usage of the connection pool for migrations, as
diesel migrations don't support async APIs. Thus we still use
`spawn_blocking` in that one place. But this is explicitly done in one
of the `diesel-async` examples.
2025-01-27 14:25:11 +00:00
Mikhail Kot
4dd4096f11 Pgbouncer exporter in compute image (#10503)
https://github.com/neondatabase/cloud/issues/19081
Include pgbouncer_exporter in compute image and run it at port 9127
2025-01-27 14:09:21 +00:00
Erik Grinaker
be718ed121 pageserver: disable L0 flush stalls, tune delay threshold (#10507)
## Problem

In ingest benchmarks, we see L0 compaction delays of over 10 minutes due
to image compaction. We can't stall L0 flushes for that long.

## Summary of changes

Disable L0 flush stalls, and bump the default L0 flush delay threshold
from 20 to 30 L0 layers.
2025-01-25 16:51:54 +00:00
Konstantin Knizhnik
9f1408fdf3 Do not assign max(lsn) to maxLastWrittenLsn in SetLastWrittenLSNForblokv (#10474)
## Problem

See https://github.com/neondatabase/neon/issues/10281

`SetLastWrittenLSNForBlockv` is assigning max(lsn) to
`maxLastWrittenLsn` while its should contain only max LSN not present in
LwLSN cache. It case unnecessary waits in PS.

## Summary of changes

Restore status-quo for pg17.

Related Postgres PR: https://github.com/neondatabase/postgres/pull/563

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-01-24 14:57:32 +00:00
Conrad Ludgate
7000aaaf75 chore: fix h2 stubgen (#10491)
## Problem

## Summary of changes

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2025-01-24 14:55:48 +00:00
Erik Grinaker
ef2a2555b1 pageserver: tighten compaction failure detection (#10502)
## Problem

If compaction fails, we disable L0 flush stalls to avoid persistent
stalls. However, the logic would unset the failure marker on offload
failures or shutdown. This can lead to sudden L0 flush stalls if we try
and fail to offload a timeline with compaction failures, or if there is
some kind of shutdown race.

Touches #10405.

## Summary of changes

Don't touch the compaction failure marker on offload failures or
shutdown.
2025-01-24 13:55:05 +00:00
Konstantin Knizhnik
d8ab6ddb0f Check if relation has storage in calculate_relation_size (#10477)
## Problem

Parent of partitioned table has no storage, it relfilelocator is zero.
It cab be incorrectly hashed and produce wrong results.

See https://github.com/neondatabase/postgres/pull/518

## Summary of changes

This problem is already addressed in pg17.
Add the same check for all other PG versions.

Postgres PRs:
https://github.com/neondatabase/postgres/pull/566
https://github.com/neondatabase/postgres/pull/565
https://github.com/neondatabase/postgres/pull/564

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-01-24 12:43:52 +00:00
JC Grünhage
dcc437da1d Make promote-images-prod depend on promote-images-dev (#10494)
## Problem
After talking about it again with @bayandin again this should replace
the changes from https://github.com/neondatabase/neon/pull/10475. While
the previous changes worked, they are less visually clear in what
happens, and we might end up in a situation where we update `latest`,
but don't actually have the tagged image pushed that contains the same
changes. The latter would result in potentially hard to debug
situations.

## Summary of changes
Revert c283aaaf8d and make
promote-images-prod depend on promote-images-dev instead.
2025-01-24 11:03:39 +00:00
a-masterov
c286fea018 Print logs in extensions test in another step to improve readability (#10483)
## Problem
The containers' log output is mixed with the tests' output, so you must
scroll up to find the error.
## Summary of changes
Printing of containers' logs moved to a separate step.
2025-01-24 10:44:48 +00:00
Vlad Lazar
de8276488d tests: enable wal reader fanout in tests (#10301)
Note: this has to merge after the release is cut on `2025-01-17` for
compat tests to start passing.

## Problem

SK wal reader fan-out is not enabled in tests by default.

## Summary of changes

Enable it.
2025-01-24 10:34:57 +00:00
Erik Grinaker
ddb9ae1214 pageserver: add compaction backpressure for layer flushes (#10405)
## Problem

There is no direct backpressure for compaction and L0 read
amplification. This allows a large buildup of compaction debt and read
amplification.

Resolves #5415.
Requires #10402.

## Summary of changes

Delay layer flushes based on the number of level 0 delta layers:

* `l0_flush_delay_threshold`: delay flushes such that they take 2x as
long (default `2 * compaction_threshold`).
* `l0_flush_stall_threshold`: stall flushes until level 0 delta layers
drop below threshold (default `4 * compaction_threshold`).

If either threshold is reached, ephemeral layer rolls also synchronously
wait for layer flushes to propagate this backpressure up into WAL
ingestion. This will bound the number of frozen layers to 1 once
backpressure kicks in, since all other frozen layers must flush before
the rolled layer.

## Analysis

This will significantly change the compute backpressure characteristics.
Recall the three compute backpressure knobs:

* `max_replication_write_lag`: 500 MB (based on Pageserver
`last_received_lsn`).
* `max_replication_flush_lag`: 10 GB (based on Pageserver
`disk_consistent_lsn`).
* `max_replication_apply_lag`: disabled (based on Pageserver
`remote_consistent_lsn`).

Previously, the Pageserver would keep ingesting WAL and build up
ephemeral layers and L0 layers until the compute hit
`max_replication_flush_lag` at 10 GB and began backpressuring. Now, once
we delay/stall WAL ingestion, the compute will begin backpressuring
after `max_replication_write_lag`, i.e. 500 MB. This is probably a good
thing (we're not building up a ton of compaction debt), but we should
consider tuning these settings.

`max_replication_flush_lag` probably doesn't serve a purpose anymore,
and we should consider removing it.

Furthermore, the removal of the upload barrier in #10402 will mean that
we no longer backpressure flushes based on S3 uploads, since
`max_replication_apply_lag` is disabled. We should consider enabling
this as well.

### When and what do we compact?

Default compaction settings:

* `compaction_threshold`: 10 L0 delta layers.
* `compaction_period`: 20 seconds (between each compaction loop check).
* `checkpoint_distance`: 256 MB (size of L0 delta layers).
* `l0_flush_delay_threshold`: 20 L0 delta layers.
* `l0_flush_stall_threshold`: 40 L0 delta layers.

Compaction characteristics:

* Minimum compaction volume: 10 layers * 256 MB = 2.5 GB.
* Additional compaction volume (assuming 128 MB/s WAL): 128 MB/s * 20
seconds = 2.5 GB (10 L0 layers).
* Required compaction bandwidth: 5.0 GB / 20 seconds = 256 MB/s.

### When do we hit `max_replication_write_lag`?

Depending on how fast compaction and flushes happens, the compute will
backpressure somewhere between `l0_flush_delay_threshold` or
`l0_flush_stall_threshold` + `max_replication_write_lag`.

* Minimum compute backpressure lag: 20 layers * 256 MB + 500 MB = 5.6 GB
* Maximum compute backpressure lag: 40 layers * 256 MB + 500 MB = 10.0
GB

This seems like a reasonable range to me.
2025-01-24 09:47:28 +00:00
Erik Grinaker
9e55d79803 Reapply "pageserver: revert flush backpressure" (#10270) (#10402)
This reapplies #10135. Just removing this flush backpressure without
further mitigations caused read amp increases during bulk ingestion
(predictably), so it was reverted. We will replace it by
compaction-based backpressure.

## Problem

In #8550, we made the flush loop wait for uploads after every layer.
This was to avoid unbounded buildup of uploads, and to reduce compaction
debt. However, the approach has several problems:

* It prevents upload parallelism.
* It prevents flush and upload pipelining.
* It slows down ingestion even when there is no need to backpressure.
* It does not directly backpressure based on compaction debt and read
amplification.

We will instead implement compaction-based backpressure in a PR
immediately following this removal (#5415).

Touches #5415.
Touches #10095.

## Summary of changes

Remove waiting on the upload queue in the flush loop.
2025-01-24 08:35:35 +00:00
Alex Chi Z.
8d47a60de2 fix(pageserver): handle dup layers during gc-compaction (#10430)
## Problem

If gc-compaction decides to rewrite an image layer, it will now cause
index_part to lose reference to that layer. In details,

* Assume there's only one image layer of key 0000...AAAA at LSN 0x100
and generation 0xA in the system.
* gc-compaction kicks in at gc-horizon 0x100, and then produce
0000...AAAA at LSN 0x100 and generation 0xB.
* It submits a compaction result update into the index part that unlinks
0000-AAAA-100-A and adds 0000-AAAA-100-B

On the remote storage / local disk side, this is fine -- it unlinks
things correctly and uploads the new file. However, the
`index_part.json` itself doesn't record generations. The buggy procedure
is as follows:

1. upload the new file
2. update the index part to remove the old file and add the new file
3. remove the new file

Therefore, the correct update result process for gc-compaction should be
as follows:

* When modifying the layer map, delete the old one and upload the new
one.
* When updating the index, uploading the new one in the index without
deleting the old one.

## Summary of changes

* Modify `finish_gc_compaction` to correctly order insertions and
deletions.
* Update the way gc-compaction uploads the layer files.
* Add new tests.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-23 21:54:44 +00:00
Alexey Kondratov
6166482589 feat(compute): Automatically create release PRs (#10495)
We've finally transitioned to using a separate `release-compute` branch.
Now, we can finally automatically create release PRs on Fri and release
them during the following week.

Part of neondatabase/cloud#11698
2025-01-23 20:47:20 +00:00
Arpad Müller
ca6d72ba2a Increase reconciler timeout after shard split (#10490)
Sometimes, especially when the host running the tests is overloaded, we
can run into reconcile timeouts in
`test_timeline_ancestor_detach_idempotent_success`, making the test
flaky. By increasing the timeouts from 30 seconds to 120 seconds, we can
address the flakiness.

Fixes #10464
2025-01-23 16:43:04 +00:00
a-masterov
b6c0f66619 CI(autocomment): add the lfc state (#10121)
## Problem
Currently, the report does not contain the LFC state of the failed
tests.
## Summary of changes
Added the LFC state to the link to the allure report.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2025-01-23 14:52:07 +00:00
Mikhail Kot
3702ec889f Enable postgres_fdw (#10426)
Update compute image to include postgres_fdw #3720
2025-01-23 13:22:31 +00:00
Anastasia Lubennikova
8e8df1b453 Disable logical replication subscribers (#10249)
Drop logical replication subscribers 
before compute starts on a non-main branch.

Add new compute_ctl spec flag: drop_subscriptions_before_start
If it is set, drop all the subscriptions from the compute node
before it starts.

To avoid race on compute start, use new GUC
neon.disable_logical_replication_subscribers
to temporarily disable logical replication workers until we drop the
subscriptions.

Ensure that we drop subscriptions exactly once when endpoint starts on a
new branch.
It is essential, because otherwise, we may drop not only inherited, but
newly created subscriptions.

We cannot rely only on spec.drop_subscriptions_before_start flag,
because if for some reason compute restarts inside VM,
it will start again with the same spec and flag value.

To handle this, we save the fact of the operation in the database
in the neon.drop_subscriptions_done table.
If the table does not exist, we assume that the operation was never
performed, so we must do it.
If table exists, we check if the operation was performed on the current
timeline.

fixes: https://github.com/neondatabase/neon/issues/8790
2025-01-23 11:02:15 +00:00
Alex Chi Z.
92d95b08cf fix(pageserver): extend split job key range to the end (#10484)
## Problem

Not really a bug fix, but hopefully can reproduce
https://github.com/neondatabase/neon/issues/10482 more.

If the layer map does not contain layers that end at exactly the end
range of the compaction job, the current split algorithm will produce
the last job that ends at the maximum layer key. This patch extends it
all the way to the compaction job end key.

For example, the user requests a compaction of 0000...FFFF. However, we
only have a layer 0000..3000 in the layer map, and the split job will
have a range of 0000..3000 instead of 0000..FFFF.

This is not a correctness issue but it would be better to fix it so that
we can get consistent job splits.

## Summary of changes

Compaction job split will always cover the full specified key range.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-23 00:15:46 +00:00
Arpad Müller
0af40b5494 Only churn rows once in test_scrubber_physical_gc_ancestors (#10481)
## Problem

PR #10457 was supposed to fix the flakiness of
`test_scrubber_physical_gc_ancestors`, but instead it made it even more
flaky. However, the original error causes disappeared, now to be
replaced by key not found errors.

See this for a longer explanation:
https://github.com/neondatabase/neon/issues/10391#issuecomment-2608018967

## Solution

This does one churn rows after all compactions, and before we do any
timeline gc's. That way, we remain more accessible at older lsn's.
2025-01-22 19:45:12 +00:00
Arpad Müller
c60b91369a Expose safekeeper APIs for creation and deletion (#10478)
Add APIs for timeline creation and deletion to the safekeeper client
crate. Going to be used later in #10440.

Split off from #10440.

Part of https://github.com/neondatabase/neon/issues/9011
2025-01-22 18:52:16 +00:00
a-masterov
f1473dd438 Fix the connection error for extension tests (#10480)
## Problem
The trust connection to the compute required for `pg_anon` was removed.
However, the PGPASSWORD environment variable was not added to
`docker-compose.yml`.
This caused connection errors, which were interpreted as success due to
errors in the bash script.
## Summary of changes
The environment variable was added, and the logic in the bash script was
fixed.
2025-01-22 16:34:57 +00:00
JC Grünhage
c283aaaf8d Tag images from docker-hub in promote-images-prod (#10475)
## Problem

https://github.com/neondatabase/neon/actions/runs/12896686483/job/35961290336#step:5:107
showed that `promote-images-prod` was missing another dependency.

## Summary of changes
Modify `promote-images-prod` to tag based on docker-hub images, so that
`promote-images-prod` does not rely on `promote-images-dev`. The result
should be the exact same, but allows the two jobs to run in parallel.
2025-01-22 16:09:41 +00:00
Vlad Lazar
414ed82c1f pageserver: issue concurrent IO on the read path (#9353)
## Refs

- Epic: https://github.com/neondatabase/neon/issues/9378

Co-authored-by: Vlad Lazar <vlad@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>

## Problem

The read path does its IOs sequentially.
This means that if N values need to be read to reconstruct a page,
we will do N IOs and getpage latency is `O(N*IoLatency)`.

## Solution

With this PR we gain the ability to issue IO concurrently within one
layer visit **and** to move on to the next layer without waiting for IOs
from the previous visit to complete.

This is an evolved version of the work done at the Lisbon hackathon,
cf https://github.com/neondatabase/neon/pull/9002.

## Design

### `will_init` now sourced from disk btree index keys

On the algorithmic level, the only change is that the
`get_values_reconstruct_data`
now sources `will_init` from the disk btree index key (which is
PS-page_cache'd), instead
of from the `Value`, which is only available after the IO completes.

### Concurrent IOs, Submission & Completion 

To separate IO submission from waiting for its completion, while
simultaneously
feature-gating the change, we introduce the notion of an `IoConcurrency`
struct
through which IO futures are "spawned".

An IO is an opaque future, and waiting for completions is handled
through
`tokio::sync::oneshot` channels.
The oneshot Receiver's take the place of the `img` and `records` fields
inside `VectoredValueReconstructState`.

When we're done visiting all the layers and submitting all the IOs along
the way
we concurrently `collect_pending_ios` for each value, which means
for each value there is a future that awaits all the oneshot receivers
and then calls into walredo to reconstruct the page image.
Walredo is now invoked concurrently for each value instead of
sequentially.
Walredo itself remains unchanged.

The spawned IO futures are driven to completion by a sidecar tokio task
that
is separate from the task that performs all the layer visiting and
spawning of IOs.
That tasks receives the IO futures via an unbounded mpsc channel and
drives them to completion inside a `FuturedUnordered`.

(The behavior from before this PR is available through
`IoConcurrency::Sequential`,
which awaits the IO futures in place, without "spawning" or "submitting"
them
anywhere.)

#### Alternatives Explored

A few words on the rationale behind having a sidecar *task* and what
alternatives were considered.

One option is to queue up all IO futures in a FuturesUnordered that is
polled
the first time when we `collect_pending_ios`.

Firstly, the IO futures are opaque, compiler-generated futures that need
to be polled at least once to submit their IO. "At least once" because
tokio-epoll-uring may not be able to submit the IO to the kernel on
first
poll right away.

Second, there are deadlocks if we don't drive the IO futures to
completion
independently of the spawning task.
The reason is that both the IO futures and the spawning task may hold
some
_and_ try to acquire _more_ shared limited resources.
For example, both spawning task and IO future may try to acquire
* a VirtualFile file descriptor cache slot async mutex (observed during
impl)
* a tokio-epoll-uring submission slot (observed during impl)
* a PageCache slot (currently this is not the case but we may move more
code into the IO futures in the future)

Another option is to spawn a short-lived `tokio::task` for each IO
future.
We implemented and benchmarked it during development, but found little
throughput improvement and moderate mean & tail latency degradation.
Concerns about pressure on the tokio scheduler made us discard this
variant.

The sidecar task could be obsoleted if the IOs were not arbitrary code
but a well-defined struct.
However,
1. the opaque futures approach taken in this PR allows leaving the
existing
   code unchanged, which
2. allows us to implement the `IoConcurrency::Sequential` mode for
feature-gating
   the change.

Once the new mode sidecar task implementation is rolled out everywhere,
and `::Sequential` removed, we can think about a descriptive submission
& completion interface.
The problems around deadlocks pointed out earlier will need to be solved
then.
For example, we could eliminate VirtualFile file descriptor cache and
tokio-epoll-uring slots.
The latter has been drafted in
https://github.com/neondatabase/tokio-epoll-uring/pull/63.

See the lengthy doc comment on `spawn_io()` for more details.

### Error handling

There are two error classes during reconstruct data retrieval:
* traversal errors: index lookup, move to next layer, and the like
* value read IO errors

A traversal error fails the entire get_vectored request, as before this
PR.
A value read error only fails that value.

In any case, we preserve the existing behavior that once
`get_vectored` returns, all IOs are done. Panics and failing
to poll `get_vectored` to completion will leave the IOs dangling,
which is safe but shouldn't happen, and so, a rate-limited
log statement will be emitted at warning level.
There is a doc comment on `collect_pending_ios` giving more code-level
details and rationale.

### Feature Gating

The new behavior is opt-in via pageserver config.
The `Sequential` mode is the default.
The only significant change in `Sequential` mode compared to before
this PR is the buffering of results in the `oneshot`s.

## Code-Level Changes

Prep work:
  * Make `GateGuard` clonable.

Core Feature:
* Traversal code: track  `will_init` in `BlobMeta` and source it from
the Delta/Image/InMemory layer index, instead of determining `will_init`
  after we've read the value. This avoids having to read the value to
  determine whether traversal can stop.
* Introduce `IoConcurrency` & its sidecar task.
  * `IoConcurrency` is the clonable handle.
  * It connects to the sidecar task via an `mpsc`.
* Plumb through `IoConcurrency` from high level code to the
  individual layer implementations' `get_values_reconstruct_data`.
  We piggy-back on the `ValuesReconstructState` for this.
   * The sidecar task should be long-lived, so, `IoConcurrency` needs
     to be rooted up "high" in the call stack.
   * Roots as of this PR:
     * `page_service`: outside of pagestream loop
     * `create_image_layers`: when it is called
     * `basebackup`(only auxfiles + replorigin + SLRU segments)
   * Code with no roots that uses `IoConcurrency::sequential`
     * any `Timeline::get` call
       * `collect_keyspace` is a good example
       * follow-up: https://github.com/neondatabase/neon/issues/10460
* `TimelineAdaptor` code used by the compaction simulator, unused in
practive
     * `ingest_xlog_dbase_create`
* Transform Delta/Image/InMemoryLayer to
  * do their values IO in a distinct `async {}` block
  * extend the residence of the Delta/Image layer until the IO is done
  * buffer their results in a `oneshot` channel instead of straight
    in `ValuesReconstructState` 
* the `oneshot` channel is wrapped in `OnDiskValueIo` /
`OnDiskValueIoWaiter`
    types that aid in expressiveness and are used to keep track of
    in-flight IOs so we can print warnings if we leave them dangling.
* Change `ValuesReconstructState` to hold the receiving end of the
 `oneshot` channel aka `OnDiskValueIoWaiter`.
* Change `get_vectored_impl` to `collect_pending_ios` and issue walredo
concurrently, in a `FuturesUnordered`.

Testing / Benchmarking:
* Support queue-depth in pagebench for manual benchmarkinng.
* Add test suite support for setting concurrency mode ps config
   field via a) an env var and b) via NeonEnvBuilder.
* Hacky helper to have sidecar-based IoConcurrency in tests.
   This will be cleaned up later.

More benchmarking will happen post-merge in nightly benchmarks, plus in
staging/pre-prod.
Some intermediate helpers for manual benchmarking have been preserved in
https://github.com/neondatabase/neon/pull/10466 and will be landed in
later PRs.
(L0 layer stack generator!)

Drive-By:
* test suite actually didn't enable batching by default because
`config.compatibility_neon_binpath` is always Truthy in our CI
environment
  => https://neondb.slack.com/archives/C059ZC138NR/p1737490501941309
* initial logical size calculation wasn't always polled to completion,
which was
  surfaced through the added WARN logs emitted when dropping a 
  `ValuesReconstructState` that still has inflight IOs.
* remove the timing histograms
`pageserver_getpage_get_reconstruct_data_seconds`
and `pageserver_getpage_reconstruct_seconds` because with planning,
value read
IO, and walredo happening concurrently, one can no longer attribute
latency
to any one of them; we'll revisit this when Vlad's work on
tracing/sampling
  through RequestContext lands.
* remove code related to `get_cached_lsn()`.
  The logic around this has been dead at runtime for a long time,
  ever since the removal of the materialized page cache in #8105.

## Testing

Unit tests use the sidecar task by default and run both modes in CI.
Python regression tests and benchmarks also use the sidecar task by
default.
We'll test more in staging and possibly preprod.

# Future Work

Please refer to the parent epic for the full plan.

The next step will be to fold the plumbing of IoConcurrency
into RequestContext so that the function signatures get cleaned up.

Once `Sequential` isn't used anymore, we can take the next
big leap which is replacing the opaque IOs with structs
that have well-defined semantics.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2025-01-22 15:30:23 +00:00
Alexey Kondratov
881e351f69 feat(compute): Allow installing both 0.8.0 and 0.7.4 pgvector (#10345)
## Problem

Both these versions are binary compatible, but the way pgvector
structures the SQL files forbids installing 0.7.4 if you have a 0.8.0
distribution. Yet, some users may need a previous version for backward
compatibility, e.g., restoring the dump.

See this thread for discussion

https://neondb.slack.com/archives/C04DGM6SMTM/p1735911490242919?thread_ts=1731343604.259169&cid=C04DGM6SMTM

## Summary of changes

Put `vector--0.7.4.sql` file into compute image to allow installing this
version as well.

Tested on staging and it seems to be working as expected:
```sql
select * from pg_available_extensions where name = 'vector';
  name  | default_version | installed_version |                       comment                        
--------+-----------------+-------------------+------------------------------------------------------
 vector | 0.8.0           | (null)            | vector data type and ivfflat and hnsw access methods

create extension vector version '0.7.4';

select * from pg_available_extensions where name = 'vector';
  name  | default_version | installed_version |                       comment                        
--------+-----------------+-------------------+------------------------------------------------------
 vector | 0.8.0           | 0.7.4             | vector data type and ivfflat and hnsw access methods

alter extension vector update;

select * from pg_available_extensions where name = 'vector';
  name  | default_version | installed_version |                       comment                        
--------+-----------------+-------------------+------------------------------------------------------
 vector | 0.8.0           | 0.8.0             | vector data type and ivfflat and hnsw access methods

drop extension vector;
create extension vector;

select * from pg_available_extensions where name = 'vector';
  name  | default_version | installed_version |                       comment                        
--------+-----------------+-------------------+------------------------------------------------------
 vector | 0.8.0           | 0.8.0             | vector data type and ivfflat and hnsw access methods
```

If we find out it's a good approach, we can adopt the same for other
extensions with a stable ABI -- support both `current` and `current - 1`
releases.
2025-01-22 12:38:23 +00:00
Christian Schwarz
b31ce14083 initial logical size calculation: always poll to completion (#10471)
# Refs

- extracted from https://github.com/neondatabase/neon/pull/9353

# Problem

Before this PR, when task_mgr shutdown is signalled, e.g. during
pageserver shutdown or Tenant shutdown, initial logical size calculation
stops polling and drops the future that represents the calculation.

This is against the current policy that we poll all futures to
completion.

This became apparent during development of concurrent IO which warns if
we drop a `Timeline::get_vectored` future that still has in-flight IOs.

We may revise the policy in the future, but, right now initial logical
size calculation is the only part of the codebase that doesn't adhere to
the policy, so let's fix it.

## Code Changes

- make sensitive exclusively to `Timeline::cancel`
- This should be sufficient for all cases of shutdowns; the sensitivity
to task_mgr shutdown is unnecessary.
- this broke the various cancel tests in `test_timeline_size.py`, e.g.,
`test_timeline_initial_logical_size_calculation_cancellation`
- the tests would time out because the await point was not sensitive to
cancellation
- to fix this, refactor `pausable_failpoint` so that it accepts a
cancellation token
- side note: we _really_ should write our own failpoint library; maybe
after we get heap-allocated RequestContext, we can plumb failpoints
through there.
2025-01-22 12:28:26 +00:00
Christian Schwarz
b4d87b9dfe fix(tests): actually enable pipelinig by default in the test suite (#10472)
## Problem

PR #9993 was supposed to enable `page_service_pipelining` by default for
all `NeonEnv`s, but this was ineffective in our CI environment.

Thus, CI Python-based tests and benchmarks, unless explicitly
configuring pipelining, were still using serial protocol handling.

## Analysis

The root cause was that in our CI environment,
`config.compatibility_neon_binpath` is always Truthy.
It's not in local environments, which is why this slipped through in
local testing.

Lesson: always add a log line ot pageserver startup and spot-check tests
to ensure the intended default is picked up.

## Summary of changes

Fix it. Since enough time has passed, the compatiblity snapshot contains
a recent enough software version so we don't need to worry about
`compatibility_neon_binpath` anymore.

## Future Work

The question how to add a new default except for compatibliity tests,
which is what the broken code was supposed to do, is still unsolved.

Slack discussion:
https://neondb.slack.com/archives/C059ZC138NR/p1737490501941309
2025-01-22 10:10:43 +00:00
Conrad Ludgate
2b49d6ee05 feat: adjust the tonic features to remove axum dependency (#10348)
To help facilitate an upgrade to axum 0.8
(https://github.com/neondatabase/neon/pull/10332#pullrequestreview-2541989619)
this massages the tonic dependency features so that tonic does not
depend on axum.
2025-01-22 09:15:52 +00:00
Erik Grinaker
14e1f89053 pageserver: eagerly notify flush waiters (#10469)
## Problem

Currently, the layer flush loop will continue flushing layers as long as
any are pending, and only notify waiters once there are no further
layers to flush. This can cause waiters to wait longer than necessary,
and potentially starve them if pending layers keep arriving faster than
they can be flushed. The impact of this will increase when we add
compaction backpressure and propagate it up into the WAL receiver.

Extracted from #10405.

## Summary of changes

Break out of the layer flush loop once we've flushed up to the requested
LSN. If further flush requests have arrived in the meanwhile, flushing
will resume immediately after.
2025-01-21 22:01:27 +00:00
Erik Grinaker
8a8c656c06 pageserver: add LayerMap::watch_layer0_deltas() (#10470)
## Problem

For compaction backpressure, we need a mechanism to signal when
compaction has reduced the L0 delta layer count below the backpressure
threshold.

Extracted from #10405.

## Summary of changes

Add `LayerMap::watch_level0_deltas()` which returns a
`tokio::sync::Receiver` signalling the current L0 delta layer
count.
2025-01-21 21:18:09 +00:00
Erik Grinaker
a75e11cc00 pageserver: return duration from StorageTimeMetricsTimer (#10468)
## Problem

It's sometimes useful to obtain the elapsed duration from a
`StorageTimeMetricsTimer` for purposes beyond just recording it in
metrics (e.g. to log it).

Extracted from #10405.

## Summary of changes

Add `StorageTimeMetricsTimer.elapsed()` and return the duration from
`stop_and_record()`.
2025-01-21 20:56:34 +00:00
Alex Chi Z.
7d4bfcdc47 feat(pageserver): add config items for gc-compaction auto trigger (#10455)
## Problem

part of https://github.com/neondatabase/neon/issues/9114

The automatic trigger is already implemented at
https://github.com/neondatabase/neon/pull/10221 but I need to write some
tests and finish my experiments in staging before I can merge it with
confidence. Given that I have some other patches that will modify the
config items, I'd like to get the config items merged first to reduce
conflicts.

## Summary of changes

* add `l2_lsn` to index_part.json -- below that LSN, data have been
processed by gc-compaction
* add a set of gc-compaction auto trigger control items into the config

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-21 19:29:38 +00:00
a-masterov
737888e5c9 Remove the tests for pg_anon (#10382)
## Problem
We are removing the `pg_anon` v1 extension from Neon. So we don't need
to test it anymore and can remove the code for simplicity.
## Summary of changes
The code required for testing `pg_anon` is removed.
2025-01-21 19:17:14 +00:00
Gleb Novikov
19bf7b78a0 fast import: basic python test (#10271)
We did not have any tests on fast_import binary yet.

In this PR I have introduced:
- `FastImport` class and tools for testing in python
- basic test that runs fast import against vanilla postgres and checks
that data is there

Should be merged after https://github.com/neondatabase/neon/pull/10251
2025-01-21 16:50:44 +00:00
Arpad Müller
7e4a39ea53 Fix two flakiness sources in test_scrubber_physical_gc_ancestors (#10457)
We currently have some flakiness in
`test_scrubber_physical_gc_ancestors`, see #10391.

The first flakiness kind is about the reconciler not actually becoming
idle within the timeout of 30 seconds. We see continuous forward
progress so this is likely not a hang. We also see this happen in
parallel to a test failure, so is likely due to runners being
overloaded. Therefore, we increase the timeout.

The second flakiness kind is an assertion failure. This one is a little
bit more tricky, but we saw in the successful run that there was some
advance of the lsn between the compaction ran (which created layer
files) and the gc run. Apparently gc rejects reductions to the single
image layer setting if the cutoff lsn is the same as the lsn of the
image layer: it will claim that that layer is newer than the space
cutoff and therefore skip it, while thinking the old layer (that we want
to delete) is the latest one (so it's not deleted).

We address the second flakiness kind by inserting a tiny amount of WAL
between the compaction and gc. This should hopefully fix things.

Related issue: #10391

(not closing it with the merger of the PR as we'll need to validate that
these changes had the intended effect).

Thanks to Chi for going over this together with me in a call.
2025-01-21 15:40:04 +00:00
JC Grünhage
624a507544 Create Github releases with empty body for now (#10448)
## Problem
When releasing `release-7574`, the Github Release creation failed with
"body is too long" (see
https://github.com/neondatabase/neon/actions/runs/12834025431/job/35792346745#step:5:77).
There's lots of room for improvement of the release notes, but for now
we'll disable them instead.

## Summary of changes
- Disable automatic generation of release notes for Github releases
- Enable creation of Github releases for proxy/compute
2025-01-21 12:45:21 +00:00
Arpad Müller
2ab9f69825 Simplify pageserver_physical_gc function (#10104)
This simplifies the code in `pageserver_physical_gc` a little bit after
the feedback in #10007 that the code is too complicated.

Most importantly, we don't pass around `GcSummary` any more in a
complicated fashion, and we save on async stream-combinator-inception in
one place in favour of `try_stream!{}`.

Follow-up of #10007
2025-01-20 21:57:15 +00:00
Alex Chi Z.
2de2b26c62 feat(pageserver): add reldir migration configs (#10439)
## Problem

Part of #9516 per RFC at https://github.com/neondatabase/neon/pull/10412

## Summary of changes

Adding the necessary config items and index_part items for the large
relation count work.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-20 20:44:12 +00:00
Matthias van de Meent
e781cf6dd8 Compute/LFC: Apply limits consistently (#10449)
Otherwise we might hit ERRORs in otherwise safe situations (such as user
queries), which isn't a great user experience.

## Problem

https://github.com/neondatabase/neon/pull/10376

## Summary of changes

Instead of accepting internal errors as acceptable, we ensure we don't
exceed our allocated usage.
2025-01-20 18:29:21 +00:00
Christian Schwarz
72130d7d6c fix(page_service / handle): panic when parallel client disconnect & Timeline shutdown (#10445)
## Refs
- fixes https://github.com/neondatabase/neon/issues/10444

## Problem

We're seeing a panic `handles are only shut down once in their lifetime`
in our performance testbed.

## Hypothesis

Annotated code in
https://github.com/neondatabase/neon/issues/10444#issuecomment-2602286415.

```
T1: drop Cache, executes up to (1)
=> HandleInner is now in state ShutDown
T2: Timeline::shutdown => PerTimelineState::shutdown  executes shutdown() again => panics
```

Likely this snuck in the final touches of #10386 where I narrowed down
the locking rules.

## Summary of changes

Make duplicate shutdowns a no-op.
2025-01-20 17:51:30 +00:00
John Spray
2657b7ec75 rfcs: add sharded ingest RFC (#8754)
## Summary

Whereas currently we send all WAL to all pageserver shards, and each
shard filters out the data that it needs,
in this RFC we add a mechanism to filter the WAL on the safekeeper, so
that each shard receives
only the data it needs.

This will place some extra CPU load on the safekeepers, in exchange for
reducing the network bandwidth
for ingesting WAL back to scaling as O(1) with shard count, rather than
O(N_shards).

Touches #9329.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Vlad Lazar <vlalazar.vlad@gmail.com>
Co-authored-by: Vlad Lazar <vlad@neon.tech>
2025-01-20 17:33:07 +00:00
Christian Schwarz
02fc58b878 impr(timeline handles): add more tests covering reference cyle (#10446)
The other test focus on the external interface usage while the tests
added in this PR add some testing around HandleInner's lifecycle,
ensuring we don't leak it once either connection gets dropped or
per-timeline-state is shut down explicitly.
2025-01-20 14:37:24 +00:00
Arpad Müller
b312a3c320 Move DeleteTimelineFlow::prepare to separate function and use enum (#10334)
It was requested by review in #10305 to use an enum or something like it
for distinguishing the different modes instead of two parameters,
because two flags allow four combinations, and two of them don't really
make sense/ aren't used.

follow-up of #10305
2025-01-20 12:50:44 +00:00
John Spray
7d761a9d22 storage controller: make chaos less disruptive to AZ locality (#10438)
## Problem

Since #9916 , the chaos code is actively fighting the optimizer: tenants
tend to be attached in their preferred AZ, so most chaos migrations were
moving them to a non-preferred AZ.

## Summary of changes

- When picking migrations, prefer to migrate things _toward_ their
preferred AZ when possible. Then pick shards to move the other way when
necessary.

The resulting behavior should be an alternating "back and forth" where
the chaos code migrates thiings away from home, and then migrates them
back on the next iteration.

The side effect will be that the chaos code actively helps to push
things into their home AZ. That's not contrary to its purpose though: we
mainly just want it to continuously migrate things to exercise
migration+notification code.
2025-01-20 09:47:23 +00:00
John Spray
8bdaee35f3 pageserver: safety checks on validity of uploaded indices (#10403)
## Problem

Occasionally, we encounter bugs in test environments that can be
detected at the point of uploading an index, but we proceed to upload it
anyway and leave a tenant in a broken state that's awkward to handle.

## Summary of changes

- Validate index when submitting it for upload, so that we can see the
issue quickly e.g. in an API invoking compaction
- Validate index before executing the upload, so that we have a hard
enforcement that any code path that tries to upload an index will not
overwrite a valid index with an invalid one.
2025-01-20 09:20:31 +00:00
Arpad Müller
b0f34099f9 Add safekeeper utilization endpoint (#10429)
Add an endpoint to obtain the utilization of a safekeeper. Future
changes to the storage controller can use this endpoint to find the most
suitable safekeepers for newly created timelines, analogously to how
it's done for pageservers already.

Initially we just want to assign by timeline count, then we can iterate
from there.

Part of https://github.com/neondatabase/neon/issues/9011
2025-01-17 21:43:52 +00:00
Vlad Lazar
6975228a76 pageserver: add initdb metrics (#10434)
## Problem

Initdb observability is poor.

## Summary of changes

Add some metrics so we can figure out which part, if any, is slow.

Closes https://github.com/neondatabase/neon/issues/10423
2025-01-17 14:51:33 +00:00
JC Grünhage
053abff71f Fix dependency on neon-image in promote-images-dev (#10437)
## Problem
871e8b325f failed CI on main because a job
ran to soon. This was caused by
ea84ec357f. While `promote-images-dev`
does not inherently need `neon-image`, a few jobs depending on
`promote-images-dev` do need it, and previously had it when it was
`promote-images`, which depended on `test-images`, which in turn
depended on `neon-image`.

## Summary of changes
To ensure jobs depending `docker.io/neondatabase/neon` images get them,
`promote-images-dev` gets the dependency to `neon-image` back which it
previously had transitively through `test-images`.
2025-01-17 14:21:30 +00:00
Tristan Partin
871e8b325f Use the request ID given by the control plane in compute_ctl (#10418)
Instead of generating our own request ID, we can just use the one
provided by the control plane. In the event, we get a request from a
client which doesn't set X-Request-ID, then we just generate one which
is useful for tracing purposes.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-16 20:46:53 +00:00
Christian Schwarz
c47c5f4ace fix(page_service pipelining): tenant cannot shut down because gate kept open while flushing responses (#10386)
# Refs

- fixes https://github.com/neondatabase/neon/issues/10309
- fixup of batching design, first introduced in
https://github.com/neondatabase/neon/pull/9851
- refinement of https://github.com/neondatabase/neon/pull/8339

# Problem

`Tenant::shutdown` was occasionally taking many minutes (sometimes up to
20) in staging and prod if the
`page_service_pipelining.mode="concurrent-futures"` is enabled.

# Symptoms

The issue happens during shard migration between pageservers.
There is page_service unavailability and hence effectively downtime for
customers in the following case:
1. The source (state `AttachedStale`) gets stuck in `Tenant::shutdown`,
waiting for the gate to close.
2. Cplane/Storcon decides to transition the target `AttachedMulti` to
`AttachedSingle`.
3. That transition comes with a bump of the generation number, causing
the `PUT .../location_config` endpoint to do a full `Tenant::shutdown` /
`Tenant::attach` cycle for the target location.
4. That `Tenant::shutdown` on the target gets stuck, waiting for the
gate to close.
5. Eventually the gate closes (`close completed`), correlating with a
`page_service` connection handler logging that it's exiting because of a
network error (`Connection reset by peer` or `Broken pipe`).

While in (4):
- `Tenant::shutdown` is stuck waiting for all `Timeline::shutdown` calls
to complete.
  So, really, this is a `Timeline::shutdown` bug.
- retries from Cplane/Storcon to complete above state transitions, fail
with errors related to the tenant mgr slot being in state
`TenantSlot::InProgress`, the tenant state being
`TenantState::Stopping`, and the timelines being in
`TimelineState::Stopping`, and the `Timeline::cancel` being cancelled.
- Existing (and/or new?) page_service connections log errors `error
reading relation or page version: Not found: Timed out waiting 30s for
tenant active state. Latest state: None`

# Root-Cause

After a lengthy investigation ([internal
write-up](https://www.notion.so/neondatabase/2025-01-09-batching-deadlock-Slow-Log-Analysis-in-Staging-176f189e00478050bc21c1a072157ca4?pvs=4))
I arrived at the following root cause.

The `spsc_fold` channel (`batch_tx`/`batch_rx`) that connects the
Batcher and Executor stages of the pipelined mode was storing a `Handle`
and thus `GateGuard` of the Timeline that was not shutting down.
The design assumption with pipelining was that this would always be a
short transient state.
However, that was incorrect: the Executor was stuck on writing/flushing
an earlier response into the connection to the client, i.e., socket
write being slow because of TCP backpressure.

The probable scenario of how we end up in that case:
1. Compute backend process sends a continuous stream of getpage prefetch
requests into the connection, but never reads the responses (why this
happens: see Appendix section).
2. Batch N is processed by Batcher and Executor, up to the point where
Executor starts flushing the response.
3. Batch N+1 is procssed by Batcher and queued in the `spsc_fold`.
4. Executor is still waiting for batch N flush to finish.
5. Batcher eventually hits the `TimeoutReader` error (10min).
From here on it waits on the
`spsc_fold.send(Err(QueryError(TimeoutReader_error)))`
which will never finish because the batch already inside the `spsc_fold`
is not
being read by the Executor, because the Executor is still stuck in the
flush.
   (This state is not observable at our default `info` log level)
6. Eventually, Compute backend process is killed (`close()` on the
socket) or Compute as a whole gets killed (probably no clean TCP
shutdown happening in that case).
7. Eventually, Pageserver TCP stack learns about (6) through RST packets
and the Executor's flush() call fails with an error.
8. The Executor exits, dropping `cancel_batcher` and its end of the
spsc_fold.
   This wakes Batcher, causing the `spsc_fold.send` to fail.
   Batcher exits.
   The pipeline shuts down as intended.
We return from `process_query` and log the `Connection reset by peer` or
`Broken pipe` error.

The following diagram visualizes the wait-for graph at (5)

```mermaid
flowchart TD
   Batcher --spsc_fold.send(TimeoutReader_error)--> Executor
   Executor --flush batch N responses--> socket.write_end
   socket.write_end --wait for TCP window to move forward--> Compute
```

# Analysis

By holding the GateGuard inside the `spsc_fold` open, the pipelining
implementation
violated the principle established in
(https://github.com/neondatabase/neon/pull/8339).
That is, that `Handle`s must only be held across an await point if that
await point
is sensitive to the `<Handle as Deref<Target=Timeline>>::cancel` token.

In this case, we were holding the Handle inside the `spsc_fold` while
awaiting the
`pgb_writer.flush()` future.

One may jump to the conclusion that we should simply peek into the
spsc_fold to get
that Timeline cancel token and be sensitive to it during flush, then.

But that violates another principle of the design from
https://github.com/neondatabase/neon/pull/8339.
That is, that the page_service connection lifecycle and the Timeline
lifecycles must be completely decoupled.
Tt must be possible to shut down one shard without shutting down the
page_service connection, because on that single connection we might be
serving other shards attached to this pageserver.
(The current compute client opens separate connections per shard, but,
there are plans to change that.)

# Solution

This PR adds a `handle::WeakHandle` struct that does _not_ hold the
timeline gate open.
It must be `upgrade()`d to get a `handle::Handle`.
That `handle::Handle` _does_ hold the timeline gate open.

The batch queued inside the `spsc_fold` only holds a `WeakHandle`.
We only upgrade it while calling into the various `handle_` methods,
i.e., while interacting with the `Timeline` via `<Handle as
Deref<Target=Timeline>>`.
All that code has always been required to be (and is!) sensitive to
`Timeline::cancel`, and therefore we're guaranteed to bail from it
quickly when `Timeline::shutdown` starts.
We will drop the `Handle` immediately, before we start
`pgb_writer.flush()`ing the responses.
Thereby letting go of our hold on the `GateGuard`, allowing the timeline
shutdown to complete while the page_service handler remains intact.

# Code Changes

* Reproducer & Regression Test
* Developed and proven to reproduce the issue in
https://github.com/neondatabase/neon/pull/10399
* Add a `Test` message to the pagestream protocol (`cfg(feature =
"testing")`).
* Drive-by minimal improvement to the parsing code, we now have a
`PagestreamFeMessageTag`.
* Refactor `pageserver/client` to allow sending and receiving
`page_service` requests independently.
  * Add a Rust helper binary to produce situation (4) from above
* Rationale: (4) and (5) are the same bug class, we're holding a gate
open while `flush()`ing.
* Add a Python regression test that uses the helper binary to
demonstrate the problem.
* Fix
   * Introduce and use `WeakHandle` as explained earlier.
* Replace the `shut_down` atomic with two enum states for `HandleInner`,
wrapped in a `Mutex`.
* To make `WeakHandle::upgrade()` and `Handle::downgrade()`
cache-efficient:
     * Wrap the `Types::Timeline` in an `Arc`
     * Wrap the `GateGuard` in an `Arc`
* The separate `Arc`s enable uncontended cloning of the timeline
reference in `upgrade()` and `downgrade()`.
If instead we were `Arc<Timeline>::clone`, different connection handlers
would be hitting the same cache line on every upgrade()/downgrade(),
causing contention.
* Please read the udpated module-level comment in `mod handle`
module-level comment for details.

# Testing & Performance

The reproducer test that failed before the changes now passes, and
obviously other tests are passing as well.

We'll do more testing in staging, where the issue happens every ~4h if
chaos migrations are enabled in storcon.

Existing perf testing will be sufficient, no perf degradation is
expected.
It's a few more alloctations due to the added Arc's, but, they're low
frequency.

# Appendix: Why Compute Sometimes Doesn't Read Responses

Remember, the whole problem surfaced because flush() was slow because
Compute was not reading responses. Why is that?

In short, the way the compute works, it only advances the page_service
protocol processing when it has an interest in data, i.e., when the
pagestore smgr is called to return pages.

Thus, if compute issues a bunch of requests as part of prefetch but then
it turns out it can service the query without reading those pages, it
may very well happen that these messages stay in the TCP until the next
smgr read happens, either in that session, or possibly in another
session.

If there’s too many unread responses in the TCP, the pageserver kernel
is going to backpressure into userspace, resulting in our stuck flush().

All of this stems from the way vanilla Postgres does prefetching and
"async IO":
it issues `fadvise()` to make the kernel do the IO in the background,
buffering results in the kernel page cache.
It then consumes the results through synchronous `read()` system calls,
which hopefully will be fast because of the `fadvise()`.

If it turns out that some / all of the prefetch results are not needed,
Postgres will not be issuing those `read()` system calls.
The kernel will eventually react to that by reusing page cache pages
that hold completed prefetched data.
Uncompleted prefetch requests may or may not be processed -- it's up to
the kernel.

In Neon, the smgr + Pageserver together take on the role of the kernel
in above paragraphs.
In the current implementation, all prefetches are sent as GetPage
requests to Pageserver.
The responses are only processed in the places where vanilla Postgres
would do the synchronous `read()` system call.
If we never get to that, the responses are queued inside the TCP
connection, which, once buffers run full, will backpressure into
Pageserver's sending code, i.e., the `pgb_writer.flush()` that was the
root cause of the problems we're fixing in this PR.
2025-01-16 20:34:02 +00:00
Tristan Partin
b0838a68e5 Enable pgx_ulid on Postgres 17 (#10397)
The extension now supports Postgres 17. The release also seems to be
binary compatible with the previous version.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-16 19:49:04 +00:00
John Spray
8f2ebc0684 tests: stabilize test_storage_controller_node_deletion (#10420)
## Problem

`test_storage_controller_node_deletion` sometimes failed because shards
were moving around during timeline creation, and neon_local isn't
tolerant of that. The movements were unexpected because the shards had
only just been created.

This was a regression from #9916

Closes: #10383 

## Summary of changes

- Make this test use multiple AZs -- this makes the storage controller's
scheduling reliably stable

Why this works: in #9916 , I made a simplifying assumption that we would
have multiple AZs to get nice stable scheduling -- it's much easier,
because each tenant has a well defined primary+secondary location when
they have an AZ preference and nodes have different AZs. Everything
still works if you don't have multiple AZs, but you just have this quirk
that sometimes the optimizer can disagree with initial scheduling, so
once in a while a shard moves after being created -- annoying for tests,
harmless IRL.
2025-01-16 19:00:16 +00:00
Vlad Lazar
3a285a046b pageserver: include node id when subscribing to SK (#10432)
## Problem

All pageserver have the same application name which makes it hard to
distinguish them.

## Summary of changes

Include the node id in the application name sent to the safekeeper. This
should gives us
more visibility in logs. There's a few metrics that will increase in
cardinality by `pageserver_count`,
but that's fine.
2025-01-16 18:51:56 +00:00
John Spray
da13154791 storcon: revise fill logic to prioritize AZ (#10411)
## Problem

Node fills were limited to moving (total shards / node_count) shards. In
systems that aren't perfectly balanced already, that leads us to skip
migrating some of the shards that belong on this node, generating work
for the optimizer later to gradually move them back.

## Summary of changes

- Where a shard has a preferred AZ and is currently attached outside
this AZ, then always promote it during fill, irrespective of target fill
count
2025-01-16 17:33:46 +00:00
John Spray
2e13a3aa7a storage controller: handle legacy TenantConf in consistency_check (#10422)
## Problem

We were comparing serialized configs from the database with serialized
configs from memory. If fields have been added/removed to TenantConfig,
this generates spurious consistency errors. This is fine in test
environments, but limits the usefulness of this debug API in the field.

Closes: https://github.com/neondatabase/neon/issues/10369

## Summary of changes

- Do a decode/encode cycle on the config before comparing it, so that it
will have exactly the expected fields.
2025-01-16 16:56:44 +00:00
Alex Chi Z.
cccc196848 refactor(pageserver): make partitioning an ArcSwap (#10377)
## Problem

gc-compaction needs the partitioning data to decide the job split. This
refactor allows concurrent access/computing the partitioning.

## Summary of changes

Make `partitioning` an ArcSwap so that others can access the
partitioning while we compute it. Fully eliminate the `repartition is
called concurrently` warning when gc-compaction is going on.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-16 15:33:37 +00:00
Arpad Müller
e436dcad57 Rename "disabled" safekeeper scheduling policy to "pause" (#10410)
Rename the safekeeper scheduling policy "disabled" to "pause".

A rename was requested in
https://github.com/neondatabase/neon/pull/10400#discussion_r1916259124,
as the "disabled" policy is meant to be analogous to the "pause" policy
for pageservers.

Also simplify the `SkSchedulingPolicyArg::from_str` function, relying on
the `from_str` implementation of `SkSchedulingPolicy`. Latter is used
for the database format as well, so it is quite stable. If we ever want
to change the UI, we'll need to duplicate the function again but this is
cheap.
2025-01-16 14:30:49 +00:00
John Spray
21d7b6a258 tests: refactor test_tenant_delete_races_timeline_creation (#10425)
## Problem

Threads spawned in `test_tenant_delete_races_timeline_creation` are not
joined before the test ends, and can generate
`PytestUnhandledThreadExceptionWarning` in other tests.


https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10419/12805365523/index.html#/testresult/53a72568acd04dbd

## Summary of changes

- Wrap threads in ThreadPoolExecutor which will join them before the
test ends
- Remove a spurious deletion call -- the background thread doing
deletion ought to succeed.
2025-01-16 14:11:33 +00:00
JC Grünhage
86dbc44db1 CI: Run check-codestyle-rust as part of pre-merge-checks (#10387)
## Problem

When multiple changes are grouped in a merge group to be merged as part
of the merge queue, the changes might individually pass
`check-codestyle-rust` but not in their combined form.

## Summary of changes

- Move `check-codestyle-rust` into a reusable workflow that is called
from it's previous location in `build_and_test.yml`, and additionally
call it from `pre_merge_checks.yml`. The additional call does not run on
ARM, only x86, to ensure the merge queue continues being responsive.
- Trigger `pre_merge_checks.yml` on PRs that change any of the workflows
running in `pre_merge_checks.yml`, so that we get feedback on those
early an not only after trying to merge those changes.
2025-01-16 09:20:24 +00:00
Tristan Partin
58f6af6c9a Clean up compute_ctl extension server code (#10417) 2025-01-16 08:35:36 +00:00
Matthias van de Meent
7be971081a Make sure we request pages with a known-flushed LSN. (#10413)
This should fix the largest source of flakyness of
test_nbtree_pagesplit_cycleid.

## Problem

https://github.com/neondatabase/neon/issues/10390

## Summary of changes

By using a guaranteed-flushed LSN, we ensure that PS won't have to wait
forever.

(If it does wait forever, we know the issue can't be with Compute's WAL)
2025-01-16 08:34:11 +00:00
Ivan Efremov
c91905e643 Merge pull request #10416 from neondatabase/rc/release-proxy/2025-01-16
Proxy release 2025-01-16
2025-01-16 10:04:38 +02:00
Arseny Sher
6fe4c6798f Add START_WAL_PUSH proto_version and allow_timeline_creation options. (#10406)
## Problem

As part of https://github.com/neondatabase/neon/issues/8614 we need to
pass options to START_WAL_PUSH.

## Summary of changes

Add two options. `allow_timeline_creation`, default true, disables
implicit timeline creation in the connection from compute. Eventually
such creation will be forbidden completely, but as we migrate to
configurations we need to support both: current mode and configurations
enabled where creation by compute is disabled.

`proto_version` specifies compute <-> sk protocol version. We have it
currently in the first greeting package also, but I plan to change tag
size from u64 to u8, which would make it hard to use. Command is more
appropriate place for it anyway.
2025-01-16 08:01:19 +00:00
github-actions[bot]
44b4e355a2 Proxy release 2025-01-16 2025-01-16 06:02:04 +00:00
Matthias van de Meent
2eda484ef6 prefetch: Read more frequently from TCP buffer (#10394)
This reduces pressure on the OS TCP read buffer by increasing the
moments we read data out of the receive buffer, and increasing the
number of bytes we can pull from that buffer when we do reads.

## Problem

A backend may not always consume its prefetch data quick enough

## Summary of changes

We add a new function `prefetch_pump_state` which pulls as many prefetch
requests from the OS TCP receive buffer as possible, but without
blocking.

This thus reduces pressure on OS-level TCP buffers, thus increasing
throughput by limiting throttling caused by full TCP buffers.
2025-01-16 02:43:47 +00:00
Mikhail Kot
c7429af8a0 Enable dblink (#10358)
Update compute image to include dblink #3720
2025-01-15 22:29:18 +00:00
Alex Chi Z.
a753349cb0 feat(pageserver): validate data integrity during gc-compaction (#10131)
## Problem

part of https://github.com/neondatabase/neon/issues/9114
part of investigation of
https://github.com/neondatabase/neon/issues/10049

## Summary of changes

* If `cfg!(test) or cfg!(feature = testing)`, then we will always try
generating an image to ensure the history is replayable, but not put the
image layer into the final layer results, therefore discovering wrong
key history before we hit a read error.
* I suspect it's easier to trigger some races if gc-compaction is
continuously run on a timeline, so I increased the frequency to twice
per 10 churns.
* Also, create branches in gc-compaction smoke tests to get more test
coverage.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad@neon.tech>
2025-01-15 22:04:06 +00:00
Gleb Novikov
55a68b28a2 fast import: restore to neondb (not postgres) database (#10251)
## Problem

`postgres` is system database at neon, so we need to do `pg_restore`
into `neondb` instead

https://github.com/neondatabase/cloud/issues/22100

## Summary of changes

Changed fast_import a little bit:
1. After succesfull connection creating `neondb` in postgres instance
2. Changed restore connstring to use new db
3. Added optional `source_connection_string`, which allows to skip
`s3_prefix` and just connect directly.
4. Added `-i` that stops process until sigterm 

## TODO
- [x] test image in cplane e2e
- [ ] Change import job image back to latest after this merged (partial
revert of https://github.com/neondatabase/cloud/pull/22338)
2025-01-15 20:51:09 +00:00
John Spray
fb0e2acb2f pageserver: add page_trace API for debugging (#10293)
## Problem

When a pageserver is receiving high rates of requests, we don't have a
good way to efficiently discover what the client's access pattern is.

Closes: https://github.com/neondatabase/neon/issues/10275

## Summary of changes

- Add
`/v1/tenant/x/timeline/y/page_trace?size_limit_bytes=...&time_limit_secs=...`
API, which returns a binary buffer.
- Add `pagectl page-trace` tool to decode and analyze the output.

---------

Co-authored-by: Erik Grinaker <erik@neon.tech>
2025-01-15 19:07:22 +00:00
Arpad Müller
efaec6cdf8 Add endpoint and storcon cli cmd to set sk scheduling policy (#10400)
Implementing the last missing endpoint of #9981, this adds support to
set the scheduling policy of an individual safekeeper, as specified in
the RFC. However, unlike in the RFC we call the endpoint
`scheduling_policy` not `status`

Closes #9981.

As for why not use the upsert endpoint for this: we want to have the
safekeeper upsert endpoint be used for testing and for deploying new
safekeepers, but not for changes of the scheduling policy. We don't want
to change any of the other fields when marking a safekeeper as
decommissioned for example, so we'd have to first fetch them only to
then specify them again. Of course one can also design an endpoint where
one can omit any field and it doesn't get modified, but it's still not
great for observability to put everything into one big "change something
about this safekeeper" endpoint.
2025-01-15 18:15:30 +00:00
Tristan Partin
3d41069dc4 Update pgrx in extension builds to 0.12.9 (#10372)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-15 16:26:58 +00:00
Vlad Lazar
dbebede7bf safekeeper: fan out from single wal reader to multiple shards (#10190)
## Problem

Safekeepers currently decode and interpret WAL for each shard
separately.
This is wasteful in terms of CPU memory usage - we've seen this in
profiles.

## Summary of changes

Fan-out interpreted WAL to multiple shards. 
The basic is that wal decoding and interpretation happens in a separate
tokio task and senders
attach to it. Senders only receive batches concerning their shard and
only past the Lsn they've last seen.

Fan-out is gated behind the `wal_reader_fanout` safekeeper flag
(disabled by default for now).

When fan-out is enabled, it might be desirable to control the absolute
delta between the
current position and a new shard's desired position (i.e. how far behind
or ahead a shard may be).
`max_delta_for_fanout` is a new optional safekeeper flag which dictates
whether to create a new
WAL reader or attach to the existing one. By default, this behaviour is
disabled. Let's consider enabling
it if we spot the need for it in the field.

## Testing

Tests passed [here](https://github.com/neondatabase/neon/pull/10301)
with wal reader fanout enabled
as of
34f6a71718.

Related: https://github.com/neondatabase/neon/issues/9337
Epic: https://github.com/neondatabase/neon/issues/9329
2025-01-15 15:33:54 +00:00
Tristan Partin
3e529f124f Remove leading slashes when downloading remote files (#10396)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-15 15:29:52 +00:00
Arseny Sher
05a71c7d6a safekeeper: add membership configuration switch endpoint (#10241)
## Problem

https://github.com/neondatabase/neon/issues/9965

## Summary of changes

Add to safekeeper http endpoint to switch membership configuration. Also
add it to python client for tests, and add simple test itself.
2025-01-15 14:16:04 +00:00
Alexander Bayandin
b9464865b6 benchmarks: report successful runs to slack as well (#10393)
## Problem

Successful `benchmarks` runs doesn't have enough visibility

Ref https://neondb.slack.com/archives/C069Z2199DL/p1736868055094539

## Summary of changes
- Report both successful and failed `benchmarks` to Slack
- Update `slackapi/slack-github-action` action
2025-01-15 13:05:05 +00:00
Vlad Lazar
1577430408 safekeeper: decode and interpret for multiple shards in one go (#10201)
## Problem

Currently, we call `InterpretedWalRecord::from_bytes_filtered`
from each shard. To serve multiple shards at the same time,
the API needs to allow for enquiring about multiple shards.

## Summary of changes

This commit tweaks it a pretty brute force way. Naively, we could
just generate the shard for a key, but pre and post split shards
may be subscribed at the same time, so doing it efficiently is more
complex.
2025-01-15 11:10:24 +00:00
Erik Grinaker
05d17a10ae rfc: add CPU and heap profiling RFC (#10085)
This document proposes a standard cross-team pattern for CPU and memory
profiling across applications and languages, using the
[pprof](https://github.com/google/pprof) profile format.

It enables both ad hoc profiles via HTTP endpoints, and continuous
profiling across the fleet via [Grafana Cloud
Profiles](https://grafana.com/docs/grafana-cloud/monitor-applications/profiles/).
Continuous profiling incurs an overhead of about 0.1% CPU usage and 3%
slower heap allocations.


[Rendered](https://github.com/neondatabase/neon/blob/erik/profiling-rfc/docs/rfcs/040-profiling.md)

Touches #9534.
Touches https://github.com/neondatabase/cloud/issues/14888.
2025-01-15 10:35:38 +00:00
Arseny Sher
2d0ea08524 Add safekeeper membership conf to control file. (#10196)
## Problem

https://github.com/neondatabase/neon/issues/9965

## Summary of changes

Add safekeeper membership configuration struct itself and storing it in
the control file. In passing also add creation timestamp to the control
file (there were cases where I wanted it in the past).

Remove obsolete unused PersistedPeerInfo struct from control file (still
keep it control_file_upgrade.rs to have it in old upgrade code).

Remove the binary representation of cfile in the roundtrip test.
Updating it is annoying, and we still test the actual roundtrip.

Also add configuration to timeline creation http request, currently used
only in one python test. In passing, slightly change LSNs meaning in the
request: normally start_lsn is passed (the same as ancestor_start_lsn in
similar pageserver call), but we allow specifying higher commit_lsn for
manual intervention if needed. Also when given LSN initialize
term_history with it.
2025-01-15 09:45:58 +00:00
Arseny Sher
c98cbbeac1 Add migration details to safekeeper membership RFC. (#10272)
## Problem

https://github.com/neondatabase/neon/pull/8455 wasn't specific enough on
migration from current situation to enabling generations.

## Summary of changes

Describe the missing parts, including control plane pushing generation
to compute, which also defines whether generations are enabled -- non
zero value does it.
2025-01-15 09:41:49 +00:00
John Spray
47c1640acc storage controller: pagination for tenant listing API (#10365)
## Problem

For large deployments, the `control/v1/tenant` listing API can time out
transmitting a monolithic serialized response.

## Summary of changes

- Add `limit` and `start_after` parameters to listing API
- Update storcon_cli to use these parameters and limit requests to 1000
items at a time
2025-01-14 21:37:32 +00:00
Erik Grinaker
6debb49b87 pageserver: coalesce index uploads when possible (#10248)
## Problem

With upload queue reordering in #10218, we can easily get into a
situation where multiple index uploads are queued back to back, which
can't be parallelized. This will happen e.g. when multiple layer flushes
enqueue layer/index/layer/index/... and the layers skip the queue and
are uploaded in parallel.

These index uploads will incur serial S3 roundtrip latencies, and may
block later operations.

Touches #10096.

## Summary of changes

When multiple back-to-back index uploads are ready to upload, only
upload the most recent index and drop the rest.
2025-01-14 21:10:17 +00:00
Erik Grinaker
e58e29e639 pageserver: limit number of upload queue tasks (#10384)
## Problem

The upload queue can currently schedule an arbitrary number of tasks.
This can both spawn an unbounded number of Tokio tasks, and also
significantly slow down upload queue scheduling as it's quadratic in
number of operations.

Touches #10096.

## Summary of changes

Limit the number of inprogress tasks to the remote storage upload
concurrency. While this concurrency limit is shared across all tenants,
there's certainly no point in scheduling more than this -- we could even
consider setting the limit lower, but don't for now to avoid
artificially constraining tenants.
2025-01-14 18:01:14 +00:00
Heikki Linnakangas
d36112d20f Simplify compute dockerfile by setting PATH just once (#10357)
By setting PATH in the 'pg-build' layer, all the extension build layers
will inherit. No need to pass PG_CONFIG to all the various make
invocations either: once pg_config is in PATH, the Makefiles will pick
it up from there.
2025-01-14 17:02:35 +00:00
Erik Grinaker
ffaa52ff5d pageserver: reorder upload queue when possible (#10218)
## Problem

The upload queue currently sees significant head-of-line blocking. For
example, index uploads act as upload barriers, and for every layer flush
we schedule a layer and index upload, which effectively serializes layer
uploads.

Resolves #10096.

## Summary of changes

Allow upload queue operations to bypass the queue if they don't conflict
with preceding operations, increasing parallelism.

NB: the upload queue currently schedules an explicit barrier after every
layer flush as well (see #8550). This must be removed to enable
parallelism. This will require a better mechanism for compaction
backpressure, see e.g. #8390 or #5415.
2025-01-14 16:31:59 +00:00
John Spray
aa7323a384 storage controller: quality of life improvements for AZ handling (#10379)
## Problem

Since https://github.com/neondatabase/neon/pull/9916, the preferred AZ
of a tenant is much more impactful, and we would like to make it more
visible in tooling.

## Summary of changes

- Include AZ in node describe API
- Include AZ info in node & tenant outputs in CLI
- Add metrics for per-node shard counts, labelled by AZ
- Add a CLI for setting preferred AZ on a tenant
- Extend AZ-setting API+CLI to handle None for clearing preferred AZ
2025-01-14 15:30:43 +00:00
Christian Schwarz
2466a2f977 page_service: throttle individual requests instead of the batched request (#10353)
## Problem

Before this PR, the pagestream throttle was applied weighted on a
per-batch basis.
This had several problems:

1. The throttle occurence counters were only bumped by `1` instead of
`batch_size`.
2. The throttle wait time aggregator metric only counted one wait time,
irrespective
of `batch_size`. That makes sense in some ways of looking at it but not
in others.
3. If the last request in the batch runs into the throttle, the other
requests in the
batch are also throttled, i.e., over-throttling happens (theoretical,
didn't measure
   it in practice).

## Solution

It occured to me that we can simply push the throttling upwards into
`pagestream_read_message`.

This has the added benefit that in pipeline mode, the `executor` stage
will, if it is idle,
steal whatever requests already made it into the `spsc_fold` and execute
them; before this
change, that was not the case - the throttling happened in the
`executor` stage instead of
the `batcher` stage.
   
## Code Changes

There are two changes in this PR:

1. Lifting up the throttling into the `pagestream_read_message` method.
2. Move the throttling metrics out of the `Throttle` type into
`SmgrOpMetrics`.
Unlike the other smgr metrics, throttling is per-tenant, hence the Arc.
3. Refactor the `SmgrOpTimer` implementation to account for the new
observation states,
   and simplify its design.
4. Drive-by-fix flush time metrics. It was using the same `now` in the
`observe_guard` every time.

The `SmgrOpTimer` is now a state machine.
Each observation point moves the state machine forward.
If a timer object is dropped early some "pair"-like metrics still
require an increment or observation.
That's done in the Drop implementation, by driving the state machine to
completion.
2025-01-14 15:28:01 +00:00
Alex Chi Z.
9bdb14c1c0 fix(pageserver): ensure initial image layers have correct key ranges (#10374)
## Problem

Discovered during the relation dir refactor work.

If we do not create images as in this patch, we would get two set of
image layers:

```
0000...METADATA_KEYS
0000...REL_KEYS
```

They overlap at the same LSN and would cause data loss for relation
keys. This doesn't happen in prod because initial image layer generation
is never called, but better to be fixed to avoid future issues with the
reldir refactors.

## Summary of changes

* Consolidate create_image_layers call into a single one.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-14 15:27:48 +00:00
Conrad Ludgate
df4abd8b14 fix: force-refresh azure identity token (#10378)
## Problem

Because of https://github.com/Azure/azure-sdk-for-rust/issues/1739, our
identity token file was not being refreshed. This caused our uploads to
start failing when the storage token expired.

## Summary of changes

Drop and recreate the remote storage config every time we upload in
order to force reload the identity token file.
2025-01-14 12:53:32 +00:00
Konstantin Knizhnik
a039f8381f Optimize vector get last written LSN (#10360)
## Problem

See https://github.com/neondatabase/neon/issues/10281

pg17 performs extra lock/unlock operation when fetching LwLSN.

## Summary of changes

Perform all lookups under one lock, moving initialization of not found
keys to separate loop.

Related Postgres PR:
https://github.com/neondatabase/postgres/pull/553

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-01-14 05:54:30 +00:00
Tristan Partin
430b556b34 Update postgres-exporter and sql_exporter in computes (#10349)
The postgres-exporter was much further out of date, but let's just bump
both.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-14 00:44:39 +00:00
Konstantin Knizhnik
1783501eaa Increase max connection for replica to prevent test flukyness (#10306)
## Problem

See https://github.com/neondatabase/neon/issues/10167
Too small number of `max_connections` (2) can cause failures of
test_physical_replication_config_mismatch_too_many_known_xids test

## Summary of changes

Increase `max_connections` to 5

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-01-13 20:01:03 +00:00
John Spray
fd1368d31e storcon: rework scheduler optimisation, prioritize AZ (#9916)
## Problem

We want to do a more robust job of scheduling tenants into their home
AZ: https://github.com/neondatabase/neon/issues/8264.

Closes:  https://github.com/neondatabase/neon/issues/8969

## Summary of changes

### Scope

This PR combines prioritizing AZ with a larger rework of how we do
optimisation. The rationale is that just bumping AZ in the order of
Score attributes is a very tiny change: the interesting part is lining
up all the optimisation logic to respect this properly, which means
rewriting it to use the same scores as the scheduler, rather than the
fragile hand-crafted logic that we had before. Separating these changes
out is possible, but would involve doing two rounds of test updates
instead of one.

### Scheduling optimisation

`TenantShard`'s `optimize_attachment` and `optimize_secondary` methods
now both use the scheduler to pick a new "favourite" location. Then
there is some refined logic for whether + how to migrate to it:
- To decide if a new location is sufficiently "better", we generate
scores using some projected ScheduleContexts that exclude the shard
under consideration, so that we avoid migrating from a node with
AffinityScore(2) to a node with AffinityScore(1), only to migrate back
later.
- Score types get a `for_optimization` method so that when we compare
scores, we will only do an optimisation if the scores differ by their
highest-ranking attributes, not just because one pageserver is lower in
utilization. Eventually we _will_ want a mode that does this, but doing
it here would make scheduling logic unstable and harder to test, and to
do this correctly one needs to know the size of the tenant that one is
migrating.
- When we find a new attached location that we would like to move to, we
will create a new secondary location there, even if we already had one
on some other node. This handles the case where we have a home AZ A, and
want to migrate the attachment between pageservers in that AZ while
retaining a secondary location in some other AZ as well.
- A unit test is added for
https://github.com/neondatabase/neon/issues/8969, which is implicitly
fixed by reworking optimisation to use the same scheduling scores as
scheduling.
2025-01-13 19:33:00 +00:00
Alex Chi Z.
e9ed53b14f feat(pageserver): support inherited sparse keyspace (#10313)
## Problem

In preparation to https://github.com/neondatabase/neon/issues/9516. We
need to store rel size and directory data in the sparse keyspace, but it
does not support inheritance yet.

## Summary of changes

Add a new type of keyspace "sparse but inherited" into the system.

On the read path: we don't remove the key range when we descend into the
ancestor. The search will stop when (1) the full key range is covered by
image layers (which has already been implemented before), or (2) we
reach the end of the ancestor chain.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-13 15:43:01 +00:00
Conrad Ludgate
a338aee132 feat(local_proxy): use ed25519 signatures with pg_session_jwt (#10290)
Generally ed25519 seems to be much preferred for cryptographic strength
to P256 nowadays, and it is NIST approved finally. We should use it
where we can as it's also faster than p256.

This PR makes the re-signed JWTs between local_proxy and pg_session_jwt
use ed25519.

This does introduce a new dependency on ed25519, but I do recall some
Neon Authorise customers asking for support for ed25519, so I am
justifying this dependency addition in the context that we can then
introduce support for customer ed25519 keys

sources:
* https://csrc.nist.gov/pubs/fips/186-5/final subsection 7 (EdDSA)
* https://datatracker.ietf.org/doc/html/rfc8037#section-3.1
2025-01-13 15:20:46 +00:00
Heikki Linnakangas
96243af651 Stop building unnecessary extension tarballs (#10355)
We build "custom extensions" from a different repository nowadays.
2025-01-13 15:01:13 +00:00
John Spray
ef8bfacd6b storage controller: API + CLI for migrating secondary locations (#10284)
## Problem

Currently, if we want to move a secondary there isn't a neat way to do
that: we just have migration API for the attached location, and it is
only clean to use that if you've manually created a secondary via
pageserver API in the place you're going to move it to.

Secondary migration API enables:
- Moving the secondary somewhere because we would like to later move the
attached location there.
- Move the secondary location because we just want to reclaim some disk
space from its current location.

## Summary of changes

- Add `/migrate_secondary` API
- Add `tenant-shard-migrate-secondary` CLI
- Add tests for above
2025-01-13 14:52:43 +00:00
Konstantin Knizhnik
ceacc29609 Start with minimal prefetch distance to minimize prefetch overhead for exact or limited index scans (#10359)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1736526089437179

In case of queries index scan with LIMIT clause, multiple backends can
concurrently send larger number of duplicated prefetch requests which
are not stored in LFC and so actually do useless job.

Current implementation of index prefetch starts with maximal prefetch
distance (10 by default now) when there are no key bounds, so in queries
with LIMIT clause like `select * from T order by pk limit 1` compute can
send a lot of useless prefetch requests to page server.

## Summary of changes

Always start with minimal prefetch distance even if there are not key
boundaries.

Related Postgres PRs:
https://github.com/neondatabase/postgres/pull/552
https://github.com/neondatabase/postgres/pull/551
https://github.com/neondatabase/postgres/pull/550
https://github.com/neondatabase/postgres/pull/549

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-01-13 14:26:11 +00:00
Erik Grinaker
b31ed0acd1 utils: add ?force=true hint for CPU profiler (#10368)
This makes it less annoying to try to take a CPU profile when a
continuous profile is already running.
2025-01-13 14:23:42 +00:00
Alexander Bayandin
b2d0e1a519 Link OpenSSL dynamically (#10302)
## Problem
Statically linked OpenSSL is buggy in multithreaded environment:
- https://github.com/neondatabase/cloud/issues/16155
- https://github.com/neondatabase/neon/issues/8275

## Summary of changes
- Link OpenSSL dynamically (revert OpenSSL part from
https://github.com/neondatabase/neon/pull/8074)

Before:
```
ldd /usr/local/v17/lib/libpq.so
        linux-vdso.so.1 (0x0000ffffb5ce4000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffb5c10000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffb5650000)
        /lib/ld-linux-aarch64.so.1 (0x0000ffffb5ca7000)
```

After:
```
ldd /usr/local/v17/lib/libpq.so
        linux-vdso.so.1 (0x0000ffffbf3e8000)
        libssl.so.3 => /lib/aarch64-linux-gnu/libssl.so.3 (0x0000ffffbf260000)
        libcrypto.so.3 => /lib/aarch64-linux-gnu/libcrypto.so.3 (0x0000ffffbec00000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffbf1c0000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffbea50000)
        /lib/ld-linux-aarch64.so.1 (0x0000ffffbf3ab000)
```
2025-01-13 14:13:02 +00:00
John Spray
d1bc36f536 storage controller: fix retries of compute hook notifications while a secondary node is offline (#10352)
## Problem

We would sometimes fail to retry compute notifications:
1. Try and send, set compute_notify_failure if we can't
2. On next reconcile, reconcile() fails for some other reason (e.g.
tried to talk to an offline node), and we fail the `result.is_ok() &&
must_notify` condition around the re-sending.

Closes: https://github.com/neondatabase/cloud/issues/22612

## Summary of changes

- Clarify the meaning of the reconcile result: it should be Ok(()) if
configuring attached location worked, even if secondary or detach
locations cannot be reached.
- Skip trying to talk to secondaries if they're offline
- Even if reconcile fails and we can't send the compute notification (we
can't send it because we're not sure if it's really attached), make sure
we save the `compute_notify_failure` flag so that subsequent reconciler
runs will try again
- Add a regression test for the above
2025-01-13 13:31:57 +00:00
Erik Grinaker
0b9032065e utils: allow 60-second CPU profiles (#10367)
Taking continuous profiles every 20 seconds is likely too expensive (in
dollar terms). Let's try 60-second profiles. We can now interrupt
running profiles via `?force=true`, so this should be fine.
2025-01-13 13:14:23 +00:00
Heikki Linnakangas
09fe3b025c Add a websockets tunnel and a test for the proxy's websockets support. (#3823)
For testing the proxy's websockets support.

I wrote this to test https://github.com/neondatabase/neon/issues/3822.
Unfortunately, that bug can *not* be reproduced with this tunnel. The
bug only appears when the client pipelines the first query with the
authentication messages. The tunnel doesn't do that.

---

Update (@conradludgate 2025-01-10):

We have since added some websocket tests, but they manually implemented
a very simplistic setup of the postgres protocol. Introducing the tunnel
would make more complex testing simpler in the future.

---------

Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
2025-01-13 11:35:39 +00:00
John Spray
12053cf832 storage controller: improve consistency_check_api (#10363)
## Problem

Limitations found while using this to investigate
https://github.com/neondatabase/neon/issues/10234:
- If we hit a node consistency issue, we drop out and don't check shards
for consistency
- The messages printed after a shard consistency issue are huge, and
grafana appears to drop them.

## Summary of changes

- Defer node consistency errors until the end of the function, so that
we always proceed to check shards for consistency
- Print out smaller log lines that just point out the diffs between
expected and persistent state
2025-01-13 11:18:14 +00:00
Conrad Ludgate
de199d71e1 chore: Address lints introduced in rust 1.85.0 beta (#10340)
With a new beta build of the rust compiler, it's good to check out the
new lints. Either to find false positives, or find flaws in our code.
Additionally, it helps reduce the effort required to update to 1.85 in 6
weeks.
2025-01-13 10:34:36 +00:00
Erik Grinaker
22a6460010 libs/utils: add force parameter for /profile/cpu (#10361)
## Problem

It's only possible to take one CPU profile at a time. With Grafana
continuous profiling, a (low-frequency) CPU profile will always be
running, making it hard to take an ad hoc CPU profile at the same time.

Resolves #10072.

## Summary of changes

Add a `force` parameter for `/profile/cpu` which will end and return an
already running CPU profile, starting a new one for the current caller.
2025-01-13 10:01:18 +00:00
Erik Grinaker
cd982a82ec pageserver,safekeeper: increase heap profiling frequency to 2 MB (#10362)
## Problem

Currently, the heap profiling frequency is every 1 MB allocated. Taking
a profile stack trace takes about 1 µs, and allocating 1 MB takes about
15 µs, so the overhead is about 6.7% which is a bit high. This is a
fixed cost regardless of whether heap profiles are actually accessed.

## Summary of changes

Increase the heap profiling sample frequency from 1 MB to 2 MB, which
reduces the overhead to about 3.3%. This seems acceptable, considering
performance-sensitive code will avoid allocations as far as possible
anyway.
2025-01-13 09:44:59 +00:00
Heikki Linnakangas
8327f68043 Minor cleanup of extension build commands (#10356)
There used to be some pg version dependencies in these extensions, but
now that there isn't, follow the simpler pattern used in other
extensions. No change in the produced images.
2025-01-11 17:39:27 +00:00
Heikki Linnakangas
846e8fdce4 Remove obsolete hnsw extension (#8008)
This has been deprecated and disabled for new installations for a long
time. Let's remove it for good.
2025-01-11 14:20:50 +00:00
Heikki Linnakangas
70a3bf37a0 Stop building 'compute-tools' image (#10333)
It's been unused from time immemorial.

---------

Co-authored-by: Matthias van de Meent <matthias@neon.tech>
2025-01-11 13:09:55 +00:00
Arpad Müller
23c0748cdd Remove active column (#10335)
We don't need or want the `active` column. Remove it. Vlad pointed out
that this is safe.

Thanks to the separation of the schemata in earlier PRs, this is easy.

follow-up of #10205

Part of https://github.com/neondatabase/neon/issues/9981
2025-01-11 02:52:45 +00:00
Alex Chi Z.
b5d54ba52a refactor(pageserver): move queue logic to compaction.rs (#10330)
## Problem

close https://github.com/neondatabase/neon/issues/10031, part of
https://github.com/neondatabase/neon/issues/9114

## Summary of changes

Move the compaction job generation to `compaction.rs`, thus making the
code more readable and debuggable. We now also return running job
through the get compaction job API, versus before we only return
scheduled jobs.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-10 20:53:00 +00:00
Christian Schwarz
58332cb361 pageserver: remove unused metric pageserver_layers_visited_per_read_global (#10141)
As of commit "pageserver: remove legacy read path" (#8601) we always use
vectored get, which has a separate metric.
2025-01-10 20:35:50 +00:00
Christian Schwarz
9b43204893 fix(page_service): Timeline::gate held open while throttling (#10314)
When we moved throttling up from Timeline::get into page_service,
we stopped being sensitive to `Timeline::cancel`, even though we're
holding a Handle and thus a guard on the `Timeline::gate` open.

This PR rectifies the situation.

Refs

- Found while investigating #10309 (hung detach because gate kept open),
  but not expected to be the root cause of that issue because the
  affected tenants are not being throttled according to their metrics.
2025-01-10 19:21:01 +00:00
Christian Schwarz
cdd34dfc12 impr(utils/spsc_fold): add another test case (#10319)
Wondered about the case covered here while investigating #10309.
2025-01-10 19:19:48 +00:00
Vlad Lazar
3cd5034eac storcon: don't assume host:port format in storcon client (#10347)
## Problem

https://github.com/neondatabase/infra/pull/2725 updated the scrubber to
use a non-host
port endpoint for storcon. That breaks when unwrapping the port.

## Summary of changes

Support both `host:port` and `host` formats for the storcon api.
2025-01-10 18:35:16 +00:00
Cheng Chen
425b777840 chore(compute): pg_mooncake v0.1.0 (#10337)
## Problem
Upgrade pg_mooncake to v0.1.0

## Summary of changes
2025-01-10 16:38:13 +00:00
John Spray
4398051385 tests: smaller datasets in LFC tests (#10346)
## Problem

These two tests came up in #9537 as doing multi-gigabyte I/O, and from
inspection of the tests it doesn't seem like they need that to fulfil
their purpose.

## Summary of changes

- In test_local_file_cache_unlink, run fewer background threads with a
smaller number of rows. These background threads AFAICT exist to make
sure some I/O is going on while we unlink the LFC directory, but 5
threads should be enough for "some".
- In test_lfc_resize, tweak the test to validate that the cache size is
larger than the final size before resizing it, so that we're sure we're
writing enough data to really be doing something. Then decrease the
pgbench scale.
2025-01-10 15:53:23 +00:00
Folke Behrens
71bca6f580 poetry: Update packaging for poetry v2 (#10344)
## Problem

When poetry v2 (released Jan 5) is used it needs `packaging.metadata`
module, but we downgrade `packaging` to 23.0. `packaging==23.1`
introduced the metadata submodule.

## Summary of changes

Update `packaging` to 24.2.
2025-01-10 14:32:26 +00:00
John Spray
105f66c4ce tests: move test_parallel_copy into performance tree (#10343)
## Problem

This test writes ~5GB of data. It is not suitable to run in parallel
with all the other small tests in test_runner/regress.

via #9537 

## Summary of changes

- Move test_parallel_copy into the performance directory, so that it
does not run in parallel with other tests
2025-01-10 13:57:26 +00:00
John Spray
0d4fce2d35 tests: refine how compat snapshot is generated (#10342)
## Problem

I noticed in https://github.com/neondatabase/neon/pull/9537 that tests
which work with compat snapshots were writing several hundred MB of
data, which isn't really necessary.

Also, the snapshots are large but don't have the proper variety of
storage format features, e.g. they could just have L0 deltas.

## Summary of changes

- Use smaller scale factor and runtime to generate less data
- Configure a small layer size and use force image layer generation so
that our output contains L1 deltas and image layers, and has a decent
number of entries in the layer map
2025-01-10 13:57:23 +00:00
Erik Grinaker
2b8ea1e768 utils: add flamegraph for heap profiles (#10223)
## Problem

Unlike CPU profiles, the `/profile/heap` endpoint can't automatically
generate SVG flamegraphs. This requires the user to install and use
`pprof` tooling, which is unnecessary and annoying.

Resolves #10203.

## Summary of changes

Add `format=svg` for the `/profile/heap` route, and generate an SVG
flamegraph using the `inferno` crate, similarly to what `pprof-rs`
already does for CPU profiles.
2025-01-10 12:14:29 +00:00
Christian Schwarz
db00eb41a1 fix(spsc_fold): potentially missing wakeup when send()ing in state SenderWaitsForReceiverToConsume (#10318)
# Problem

Before this PR, there were cases where send() in state
SenderWaitsForReceiverToConsume would never be woken up
by the receiver, because it never registered with `wake_sender`.

Example Scenario 1: we stop polling a send() future A that was waiting
for the receiver to consume. We drop A and create a new send() future B.
B would return Poll::Pending and never regsister a waker.

Example Scenario 2: a send() future A transitions from HasData
to SenderWaitsForReceiverToConsume. This registers the context X
with `wake_sender`. But before the Receiver consumes the data,
we poll A from a different context Y.
The state is still SenderWaitsForReceiverToConsume, but we wouldn't
register the new context with `wake_sender`.
When the Receiver comes around to consume and `wake_sender.notify()`s,
it wakes the old context X instead of Y.

# Fix

Register the waker in the case where we're polled in
state `SenderWaitsForReceiverToConsume`.

# Relation to #10309

I found this bug while investigating #10309.
There was never proof that this bug here is the root cause for #10309.
In the meantime we found a more probably hypothesis
for the root cause than what is being fixed here.
Regardless, let's walk through my thought process about
how it might have been relevant:

There (in page_service), Scenario 1 does not apply because
we poll the send() future to completion.

Scenario 2 (`tokio::join!`) also does not apply with the
current `tokio::join!()` impl, because it will just poll each
future every time, each with the same context.
Although if we ever used something like a FuturesUnordered anywhere,
that will be using a different context, so, in that case,
the bug might materialize.

Regarding tokio & spurious poll in general:
@conradludgate is not aware of any spurious wakeup cases in current
tokio,
but within a `tokio::join!()`, any wake meant for one future will poll
all
the futures, so that can appear as a spurious wake up to the N-1 futures
of the `tokio::join!()`.
2025-01-10 11:06:03 +00:00
Conrad Ludgate
735c66dc65 fix(proxy): propagate the existing ComputeUserInfo to connect for cancellation (#10322)
## Problem

We were incorrectly constructing the ComputeUserInfo, used for
cancellation checks, based on the return parameters from postgres. This
didn't contain the correct info.

## Summary of changes

Propagate down the existing ComputeUserInfo.
2025-01-10 09:36:51 +00:00
Folke Behrens
77660f3d88 proxy: Fix parsing of UnknownTopic with payload (#10339)
## Problem

When the proxy receives a `Notification` with an unknown topic it's
supposed to use the `UnknownTopic` unit variant. Unfortunately, in
adjacently tagged enums serde will not simply ignore the configured
content if found and try to deserialize a map/object instead.

## Summary of changes

* Use a custom deserialize function to ignore variant content.
* Add a little unit test covering both cases.
2025-01-10 09:12:31 +00:00
Folke Behrens
b6205af4a5 Update tracing/otel crates (#10311)
Update the tracing(-x) and opentelemetry(-x) crates.

Some breaking changes require updating our code:
* Initialization is done via builders now

https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-otlp/CHANGELOG.md#0270
* Errors from OTel SDK are logged via tracing crate as well.

https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry/CHANGELOG.md#0270
2025-01-10 08:48:03 +00:00
Arpad Müller
6149ac8834 Handle race between auto-offload and unarchival (#10305)
## Problem

Auto-offloading as requested by the compaction task is racy with
unarchival, in that the compaction task might attempt to offload an
unarchived timeline. By that point it will already have set the timeline
to the `Stopping` state however, which makes it unusable for any
purpose. For example:

1. compaction task decides to offload timeline
2. timeline gets unarchived
3. `offload_timeline` gets called by compaction task
  * sets timeline's state to `Stopping`
  * realizes that the timeline can't be unarchived, errors out
6. endpoint can't be started as the timeline is `Stopping` and thus
'can't be found'.

A future iteration of the compaction task can't "heal" this state either
as the timeline will still not be archived, same goes for other
automatic stuff. The only way to heal this is a tenant detach+attach, or
alternatively a pageserver restart.

Furthermore, the compaction task is especially amenable for such races
as it first stores `can_offload` into a variable, figures out whether
compaction is needed (which takes some time), and only then does it
attempt an offload operation: the time difference between "check" and
"use" is non-trivially small.

To make it even worse, we start the compaction task right after attach
of a tenant, and it is a common pattern by pageserver users to attach a
tenant to then immediately unarchive a timeline, so that an endpoint can
be started.

## Solutions not adopted

The simplest solution is to move the `can_offload` check to right before
attempting of the offload. But this is not a good solution, as no lock
is held between that check and timeline shutdown. So races would still
be possible, just become less likely.

I explored using the timeline state for this, as in adding an additional
enum variant. But `Timeline::set_state` is racy (#10297).

## Adopted solution

We use the lock on the timeline's upload queue as an arbiter: either
unarchival gets to it first and sours the state for auto-offloading, or
auto-offloading shuts it down, which stops any parallel unarchival in
its tracks. The key part is not releasing the upload queue's lock
between the check whether the timeline is archived or not, and shutting
it down (the actual implementation only sets `shutting_down` but it has
the same effect on `initialized_mut()` as a full shutdown). The rest of
the patch is stuff that follows from this.

We also move the part where we set the state to `Stopping` to after that
arbiter has decided the fate of the timeline. For deletions, we do keep
it inside `DeleteTimelineFlow::prepare` however, so that it is called
with all of the the timelines locks held that the function allocates
(timelines lock most importantly). This is only a precautionary measure
however, as I didn't want to analyze deletion related code for possible
races.

## Future changes

It might make sense to move `can_offload` to right before the offload
attempt. Maybe some other properties might have changed as well.
Although this will not be perfect either as no lock is held. I want to
keep it out of this change to emphasize that this move wasn't the main
reason we are race free now.

Fixes #10220
2025-01-09 20:41:49 +00:00
Tristan Partin
49756a0d01 Implement compute_ctl management API in Axum (#10099)
This is a refactor to create better abstractions related to our
management server. It cleans up the code, and prepares everything for
authorized communication to and from the control plane.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-09 20:08:26 +00:00
Arpad Müller
99b5a6705f Update rust to 1.84.0 (#10328)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

[Release notes](https://releases.rs/docs/1.84.0/).

Prior update was in #9926.
2025-01-09 18:29:09 +00:00
Alexey Kondratov
f37eeb56ad fix(compute_ctl): Resolve issues with dropping roles having dangling permissions (#10299)
## Problem

In Postgres, one cannot drop a role if it has any dependent objects in
the DB. In `compute_ctl`, we automatically reassign all dependent
objects in every DB to the corresponding DB owner. Yet, it seems that it
doesn't help with some implicit permissions. The issue is reproduced by
installing a `postgis` extension because it creates some views and
tables in the public schema.

## Summary of changes

Added a repro test without using a `postgis`: i) create a role via
`compute_ctl` (with `neon_superuser` grant); ii) create a test role, a
table in schema public, and grant permissions via the role in
`neon_superuser`.

To fix the issue, I added a new `compute_ctl` code that removes such
dangling permissions before dropping the role. It's done in the least
invasive way, i.e., only touches the schema public, because i) that's
the problem we had with PostGIS; ii) it creates a smaller chance of
messing anything up and getting a stuck operation again, just for a
different reason.

Properly, any API-based catalog operations should fail gracefully and
provide an actionable error and status code to the control plane,
allowing the latter to unwind the operation and propagate an error
message and hint to the user. In this sense, it's aligned with another
feature request https://github.com/neondatabase/cloud/issues/21611

Resolve neondatabase/cloud#13582
2025-01-09 16:39:53 +00:00
Arpad Müller
bebc46e713 Add scheduling_policy column to safekeepers table (#10205)
Add a `scheduling_policy` column to the safekeepers table of the storage
controller.

Part of #9981
2025-01-09 15:55:02 +00:00
John Spray
ad51622568 remote_storage: enable Azure connection pooling by default (#10324)
## Problem

Initially we defaulted this to zero to reduce risk. We have now been
using pooling in staging for some time without issues, so let's make it
the default for anyone using this software without setting the config
explicitly.

Closes: https://github.com/neondatabase/cloud/issues/20971

## Summary of changes

- Set Azure blob storage connection pool size to 8 by default
2025-01-09 15:34:06 +00:00
John Spray
ac6cca17ac storcon: don't log a heartbeat error during shutdown (#10325)
## Problem

Occasionally we see an unexpected error like:
```
ERROR spawn_heartbeat_driver: Failed to update node state 1 after heartbeat round: Shutting down\n')
Hint: use scripts/check_allowed_errors.sh to test any new allowed_error you add
```


https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10324/12690404952/index.html#/testresult/63406a0687bf6eca

## Summary of changes

- Explicitly handle ApiError::ShuttingDown as a no-op when mutating node
status
2025-01-09 15:33:44 +00:00
Alex Chi Z.
640ac4fc9e fix(pageserver): report timestamp is in the past if the key is missing (#10210)
## Problem

If for some reasons we already garbage-collected the data under an LSN
but the caller uses a past LSN for the find_time_cutoff function, now we
will report a missing key error and GC will never proceed.

Note that missing key error can also happen if the key is really missing
(i.e., during the past offload incidents)

## Summary of changes

Make sure GC proceeds by bumping the LSN. When time_cutoff=None, we will
not increase the time_cutoff (it will be set to latest_gc_cutoff). If we
really need to bump the GC LSN for maintenance purpose, we need a
separate API to do that.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-09 14:43:20 +00:00
Konstantin Knizhnik
20c40eb733 Add response tag to getpage request in V3 protocol version (#8686)
## Problem

We have several serious data corruption incidents caused by mismatch of
get-age requests:
https://neondb.slack.com/archives/C07FJS4QF7V/p1723032720164359

We hope that the problem is fixed now. But it is better to prevent such
kind of problems in future.

Part of https://github.com/neondatabase/cloud/issues/16472

## Summary of changes

This PR introduce new V3 version of compute<->pageserver protocol,
adding tag to getpage response.
So now compute is able to check if it really gets response to the
requested page.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2025-01-09 13:12:04 +00:00
Vlad Lazar
f4739d49e3 pageserver: tweak interpreted ingest record metrics (#10291)
## Problem
The filtered record metric doesn't make sense for interpreted ingest. 

## Summary of changes
While of dubious utility in the first place, this patch replaces them
with records received and records observed metrics for interpreted
ingest:
* received records cause the pageserver to do _something_: write a key,
value pair to storage, update some metadata or flush pending
modifications
* observed records are a shard 0 concept and contain only key metadata
used in tracking relation sizes (received records include observed
records)
2025-01-09 12:31:02 +00:00
Arseny Sher
030ab1c0e8 TLA+ spec for safekeeper membership change (#9966)
## Problem

We want to define the algorithm for safekeeper membership change.

## Summary of changes

Add spec for it, several models and logs of checking them.

ref https://github.com/neondatabase/neon/issues/8699
2025-01-09 12:26:17 +00:00
John Spray
5baa4e7f0a docker: don't set LD_LIBRARY_PATH (#10321)
## Problem

This was causing storage controller to still use neon-built libpq
instead of vanilla libpq.

Since https://github.com/neondatabase/neon/pull/10269 we have a vanilla
postgres in the system path -- anything that wants a postgres library
will use that.

## Summary of changes

- Remove LD_LIBRARY_PATH assignment in Dockerfile
2025-01-09 11:47:55 +00:00
Folke Behrens
03666a1f37 Merge pull request #10320 from neondatabase/rc/release-proxy/2025-01-09
Proxy release 2025-01-09
2025-01-09 10:19:07 +01:00
Tristan Partin
5b2751397d Refactor MigrationRunner::run_migrations() to call a helper (#10232)
This will make it easier to add per-db migrations, such as that for
CVE-2024-4317.

Link: https://www.postgresql.org/support/security/CVE-2024-4317/
Signed-off-by: Tristan Partin <tristan@neon.tech>

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-09 07:05:07 +00:00
github-actions[bot]
9c92242ca0 Proxy release 2025-01-09 2025-01-09 06:02:06 +00:00
Ivan Efremov
fcfff72454 impr(proxy): Decouple ip_allowlist from the CancelClosure (#10199)
This PR removes the direct dependency of the IP allowlist from
CancelClosure, allowing for more scalable and flexible IP restrictions
and enabling the future use of Redis-based CancelMap storage.

Changes:
- Introduce a new BackendAuth async trait that retrieves the IP
allowlist through existing authentication methods;
- Improve cancellation error handling by instrument() async
cancel_sesion() rather than dropping it.
- Set and store IP allowlist for SCRAM Proxy to consistently perform IP
allowance check
 
 Relates to #9660
2025-01-08 19:34:53 +00:00
Anastasia Lubennikova
0ad0db6ff8 compute: dropdb DROP SUBSCRIPTION fix (#10066)
## Problem
Project gets stuck if database with subscriptions was deleted via API /
UI.

https://github.com/neondatabase/cloud/issues/18646

## Summary of changes
Before dropping the database, drop all the subscriptions in it.
Do not drop slot on publisher, because we have no guarantee that the
slot still exists or that the publisher is reachable.

Add `DropSubscriptionsForDeletedDatabases` phase to run these operations
in all databases, we're about to delete.
Ignore the error if the database does not exist.
2025-01-08 18:55:04 +00:00
John Spray
68d8acfd05 storage controller: don't hold detached tenants in memory (#10264)
## Problem

Typical deployments of neon have some tenants that stay in use
continuously, and a background churning population of tenants that are
created and then fall idle, and are configured to Detached state.
Currently, this churn of short lived tenants results in an
ever-increasing memory footprint.

Closes: https://github.com/neondatabase/neon/issues/9712

## Summary of changes

- At startup, filter to only load shards that don't have Detached policy
- In process_result, check if a tenant's shards are all Detached and
observed=={}, and if so drop them from memory
- In tenant_location_conf and other tenant mutators, load the tenants'
shards on-demand if they are not present
2025-01-08 18:12:09 +00:00
Vlad Lazar
dc284247a5 storage_controller: fix node flap detach race (#10298)
## Problem

The observed state removal may race with the inline updates of the
observed state done from `Service::node_activate_reconcile`.

This was intended to work as follows:
1. Detaches while the node is unavailable remove the entry from the
   observed state.
2. `Service::node_activate_reconcile` diffs the locations returned
   by the pageserver with the observed state and detaches in-line
   when required.

## Summary of changes

This PR removes step (1) and lets background reconciliations
deal with the mismatch between the intent and observed state.
A follow up will attempt to remove `Service::node_activate_reconcile`
altogether.

Closes https://github.com/neondatabase/neon/issues/10253
2025-01-08 10:26:53 +00:00
Alex Chi Z.
5c76e2a983 fix(storage-scrubber): ignore errors if index_part is not consistent (#10304)
## Problem

Consider the pageserver is doing the following sequence of operations:

* upload X files
* update index_part to add X and remove Y
* delete Y files

When storage scrubber obtains the initial timeline snapshot before
"update index_part" (that is the old version that contains Y but not X),
and then obtains the index_part file after it gets updated, it will
report all Y files are missing.

## Summary of changes

Do not report layer file missing if index_part listed and downloaded are
not the same (i.e. different last_modified times)

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-01-07 23:24:17 +00:00
Erik Grinaker
237dae71a1 Revert "pageserver,safekeeper: disable heap profiling (#10268)" (#10303)
This reverts commit b33299dc37.

Heap profiles weren't the culprit after all.

Touches #10225.
2025-01-07 22:49:00 +00:00
Fedor Dikarev
43a5e575d6 ci: use reusable workflow for MacOs build (#9889)
Closes: https://github.com/neondatabase/cloud/issues/17784

## Problem
Currently, we run the whole CI pipeline for any changes. It's slow and
expensive.

## Suggestion
Starting with MacOs builds:
- check what files were changed
- rebuild only needed parts
- reuse results from previous builds when available
- run builds in parallel when possible

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2025-01-07 20:00:56 +00:00
Folke Behrens
0a117fb1f1 proxy: Parse Notification twice only for unknown topic (#10296)
## Problem

We currently parse Notification twice even in the happy path.

## Summary of changes

Use `#[serde(other)]` to catch unknown topics and defer the second
parsing.
2025-01-07 15:24:54 +00:00
JC Grünhage
4aa9786c6b Fix promote-images-prod after splitting it out (#10292)
## Problem
`promote-images` was split into `promote-images-dev` and
`promote-images-prod` in
https://github.com/neondatabase/neon/pull/10267.

`dev` credentials were loaded in `promote-images-dev` and `prod`
credentials were loaded in `promote-images-prod`, but
`promote-images-prod` needs `dev` credentials as well to access the
`dev` images to replicate them from `dev` to `prod`.

## Summary of changes
Load `dev` credentials in `promote-images-prod` as well.
2025-01-07 13:45:18 +00:00
Matthias van de Meent
be38123e62 Fix accounting of dropped prefetched GetPage requests (#10276)
Apparently, we failed to do this bookkeeping in quite a few places...

## Problem

Fixes https://github.com/neondatabase/cloud/issues/22364

## Summary of changes

Add accounting of dropped requests. Note that this includes prefetches
dropped due to things like "PS connection dropped unexpectedly" or
"prefetch queue is already full", but *not* (yet?) "dropped due to
backend shutdown".
2025-01-07 10:41:52 +00:00
JC Grünhage
ea84ec357f Split promote-images into promote-images-dev and promote-images-prod (#10267)
## Problem
`trigger-e2e-tests` waits half an hour before starting to run. Nearly
half of that time can be saved by promoting images before tests on them
are complete, so the e2e tests can run in parallel.

On `main` and `release{,-proxy,-compute}`, `promote-images` updates
`latest` and pushes things to prod ecr, so we want to run
`promote-images` only after `test-images` is done, but on other
branches, there is no harm in promoting images that aren't tested yet.

## Summary of changes

To promote images into dev container registries sooner, `promote-images`
is split into `promote-images-dev` and `promote-images-prod`. The former
pushes to dev container registries, the latter to prod ones. The latter
also waits for `test-images`, while the former doesn't. This allows to
run `trigger-e2e-tests` sooner.
2025-01-07 10:36:05 +00:00
Matthias van de Meent
30863c0104 libpagestore: timeout = max(0, difference), not min(0, difference) (#10274)
Using `min(0, ...)` causes us to fail to wait in most situations, so a
lack of data would be a hot wait loop, which is bad.

## Problem

We noticed high CPU usage in some situations
2025-01-07 09:07:38 +00:00
Alexander Bayandin
02f81b6469 Fix clippy warning on macOS (#10282)
## Problem

On macOS:

```
error: unused variable: `disable_lfc_resizing`
   --> compute_tools/src/bin/compute_ctl.rs:431:9
    |
431 |         disable_lfc_resizing,
    |         ^^^^^^^^^^^^^^^^^^^^ help: try ignoring the field: `disable_lfc_resizing: _`
    |
    = note: `-D unused-variables` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(unused_variables)]`
```

## Summary of changes
- Initialise `disable_lfc_resizing` only on Linux (because it's used on
Linux only in further bloc)
2025-01-06 20:28:33 +00:00
Alexander Bayandin
ad7f14d526 test_runner: update packages for Python 3.13 (#10285)
## Problem

It's impossible to run regression tests with Python 3.13 as some
dependencies don't support it (some of them are outdated, and `jsonnet`
doesn't support it at all yet)

## Summary of changes
- Update dependencies for Python 3.13
- Install `jsonnet` only on Python < 3.13 and skip relevant tests on
Python 3.13

Closes #10237
2025-01-06 20:25:31 +00:00
Erik Grinaker
b342a02b1c Dockerfile: build with force-frame-pointers=yes (#10286)
See
https://github.com/neondatabase/neon/pull/10226#issuecomment-2573725182.
2025-01-06 20:17:43 +00:00
Alex Chi Z.
4a6556e269 fix(pageserver): ensure GC computes time cutoff using the same start time (#10193)
## Problem

close https://github.com/neondatabase/neon/issues/10192

## Summary of changes

* `find_gc_time_cutoff` takes `now` parameter so that all branches
compute the cutoff based on the same start time, avoiding races.
* gc-compaction uses a single `get_gc_compaction_watermark` function to
get the safe LSN to compact.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2025-01-06 19:29:18 +00:00
Erik Grinaker
95f1920231 cargo: build with frame pointers (#10226)
## Problem

Frame pointers are typically disabled by default (depending on CPU
architecture), to improve performance. This frees up a CPU register, and
avoids a couple of instructions per function call. However, it makes
stack unwinding much more inefficient, since it has to use DWARF debug
information instead, and gives worse results with e.g. `perf` and eBPF
profiles. The `backtrace` implementation of `libunwind` is also
suspected to cause seg faults.

The performance benefit of frame pointer omission doesn't appear to
matter that much on modern 64-bit CPU architectures (which have plenty
of registers and optimized instruction execution), and benchmarks did
not show measurable overhead.

The Rust standard library and jemalloc already enable frame pointers by
default.

For more information, see
https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html.

Resolves #10224.
Touches #10225.

## Summary of changes

Enable frame pointers in all builds, and use frame pointers for pprof-rs
stack sampling.
2025-01-06 17:27:08 +00:00
Conrad Ludgate
fda52a0005 feat(proxy): dont trigger error alerts for unknown topics (#10266)
## Problem

Before the holidays, and just before our code freeze, a change to cplane
was made that started publishing the topics from #10197. This triggered
our alerts and put us in a sticky situation as it was not an error, and
we didn't want to silence the alert for the entire holidays, and we
didn't want to release proxy 2 days in a row if it was not essential.

We fixed it eventually by rewriting the alert based on logs, but this is
not a good solution.

## Summary of changes

Introduces an intermediate parsing step to check the topic name first,
to allow us to ignore parsing errors for any topics we do not know
about.
2025-01-06 13:05:35 +00:00
Busra Kugler
406cca643b Update neon_fixtures.py - remove logs (#10219)
We need to remove this line to prevent aws keys exposing in the public
s3 buckets
2025-01-06 10:44:23 +00:00
dependabot[bot]
b368e62cfc build(deps): bump jinja2 from 3.1.4 to 3.1.5 in the pip group (#10236)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-04 15:40:50 +00:00
John Spray
4b2f56862d docker: include vanilla debian postgres client (#10269)
## Problem

We are chasing down segfaults in the storage controller
https://github.com/neondatabase/cloud/issues/21010

This is for use by the storage controller, which links dynamically with
`libpq`. We currently use the neon-built libpq, but this may be unsafe
for use from multi-threaded programs like the controller, as it uses a
statically linked openssl

Precursor to https://github.com/neondatabase/neon/pull/10258

## Summary of changes

- Include `postgresql-15` in container builds.

The reason for using version 15 is simply because that is what's
available in Debian 12 without adding any extra repositories, and we
don't have any special need for latest version in our libpq usage.
2025-01-03 16:16:04 +00:00
Erik Grinaker
a77e87a48a pageserver: assert that uploads don't modify indexed layers (#10228)
## Problem

It's not legal to modify layers that are referenced by the current layer
index. Assert this in the upload queue, as preparation for upload queue
reordering.

Touches #10096.

## Summary of changes

Add a debug assertion that the upload queue does not modify layers
referenced by the current index.

I could be convinced that this should be a plain assertion, but will be
conservative for now.
2025-01-03 16:03:19 +00:00
Erik Grinaker
1393cc668b Revert "pageserver: revert flush backpressure (#8550) (#10135)" (#10270)
This reverts commit f3ecd5d76a.

It is
[suspected](https://neondb.slack.com/archives/C033RQ5SPDH/p1735907405716759)
to have caused significant read amplification in the [ingest
benchmark](https://neonprod.grafana.net/d/de3mupf4g68e8e/perf-test3a-ingest-benchmark?orgId=1&from=now-30d&to=now&timezone=utc&var-new_project_endpoint_id=ep-solitary-sun-w22bmut6&var-large_tenant_endpoint_id=ep-holy-bread-w203krzs)
(specifically during index creation).

We will revisit an intermediate improvement here to unblock [upload
parallelism](https://github.com/neondatabase/neon/issues/10096) before
properly addressing [compaction
backpressure](https://github.com/neondatabase/neon/issues/8390).
2025-01-03 15:38:51 +00:00
Erik Grinaker
b33299dc37 pageserver,safekeeper: disable heap profiling (#10268)
## Problem

Since enabling continuous profiling in staging, we've seen frequent seg
faults. This is suspected to be because jemalloc and pprof-rs take a
stack trace at the same time, and the handlers aren't signal safe.
jemalloc does this probabilistically on every allocation, regardless of
whether someone is taking a heap profile, which means that any CPU
profile has a chance to cause a seg fault.

Touches #10225.

## Summary of changes

For now, just disable heap profiles -- CPU profiles are more important,
and we need to be able to take them without risking a crash.
2025-01-03 15:21:31 +00:00
John Spray
e9d30edc7f pageserver: fix a 500 during timeline creation + shutdown (#10259)
## Problem

The test_create_churn_during_restart test fails if timeline creation
calls return 500 errors (because the API shouldn't do it), and it's
sometimes failing, for example:

https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10256/12582034135/index.html#/testresult/3ce2e7045465012e

## Summary of changes

- Avoid handling UploadQueueShutDownOrStopped case as an Other (i.e.
500)
2025-01-03 13:13:22 +00:00
Arpad Müller
1303cd5d05 Fix defusing race between Tenant::shutdown and offload_timeline (#10150)
There is a race condition between `Tenant::shutdown`'s `defuse_for_drop`
loop and `offload_timeline`, where timeline offloading can insert into a
tenant that is in the process of shutting down, in fact so far
progressed that the `defuse_for_drop` has already been called.

This prevents warn log lines of the form:

```
offloaded timeline <hash> was dropped without having cleaned it up at the ancestor
```

The solution piggybacks on the `offloaded_timelines` lock: both the
defuse loop and the offloaded timeline insertion need to acquire the
lock, and we know that the defuse loop only runs after the tenant has
set its `TenantState` to `Stopping`.

So if we hold the `offloaded_timelines` lock, and know that the
`TenantState` is not `Stopping`, then we know that the defuse loop has
not ran yet, and holding the lock ensures that it doesn't start running
while we are inserting the offloaded timeline.

Fixes #10070
2025-01-03 12:36:01 +00:00
John Spray
c08759f367 storcon: verbose logs in rare case of shards not attached yet (#10262)
## Problem

When we do a timeline CRUD operation, we check that the shards we need
to mutate are currently attached to a pageserver, by reading
`generation` and `generation_pageserver` from the database.

If any don't appear to be attached, we respond with a a 503 and "One or
more shards in tenant is not yet attached".

This is happening more often than expected, and it's not obvious with
current logging what's going on: specifically which shard has a problem,
and exactly what we're seeing in these persistent generation columns.

(Aside: it's possible that we broke something with the change in #10011
which clears generation_pageserver when we detach a shard, although if
so the mechanism isn't trivial: what should happen is that if we stamp
on generation_pageserver if a reconciler is running, then it shouldn't
matter because we're about to

## Summary of changes

- When we are in Attached mode but find that
generation_pageserver/generation are unset, output details while looping
over shards.
2025-01-03 10:55:15 +00:00
John Spray
ba9722a2fd tests: add upload wait in test_scrubber_physical_gc_ancestors (#10260)
## Problem

We see periodic failures in `test_scrubber_physical_gc_ancestors`, where
the logs show that the pageserver is creating image layers that should
cause child shards to no longer reference their parents' layers, but
then the scrubber runs and doesn't find any unreferenced layers.[


https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10256/12582034135/index.html#/testresult/78ea06dea6ba8dd3

From inspecting the code & test, it seems like this could be as simple
as the test failing to wait for uploads before running the scrubber. It
had a 2 second delay built in to satisfy the scrubbers time threshold
checks, which on a lightly loaded machine would also have been easily
enough for uploads to complete, but our test machines are more heavily
loaded all the time.

## Summary of changes

- Wait for uploads to complete after generating images layers in
test_scrubber_physical_gc_ancestors, so that the scrubber should
reliably see the post-compaction metadata.
2025-01-03 10:55:07 +00:00
John Spray
2d4f267983 cargo: update diesel, pq-sys (#10256)
## Problem

Versions of `diesel` and `pq-sys` were somewhat stale. I was checking on
libpq->openssl versions while investigating a segfault via
https://github.com/neondatabase/cloud/issues/21010. I don't think these
rust bindings are likely to be the source of issues, but we might as
well freshen them as a precaution.

## Summary of changes

- Update diesel to 2.2.6
- Update pq-sys to 0.6.3
2025-01-03 10:20:18 +00:00
Ivan Efremov
7a598b9842 [proxy/docs]imprv: Add local testing section to proxy README (#10230)
Add commands to run proxy locally with the mocked control plane
2025-01-03 10:04:58 +00:00
Tristan Partin
eefad27538 Inline various migration queries (#10231)
There was no value in saving them off to temporary variables.

Signed-off-by: Tristan Partin <tristan@neon.tech>

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-02 22:12:56 +00:00
Em Sharnoff
cd10c719f9 compute: Add spec support for disabling LFC resizing (#10132)
ref neondatabase/cloud#21731

## Problem

When we manually override the LFC size for particular computes,
autoscaling will typically undo that because vm-monitor will resize LFC
itself.

So, we'd like a way to make vm-monitor not set LFC size — this actually
already exists, if we just don't give vm-monitor a postgres connection
string.

## Summary of changes

Add a new field to the compute spec, `disable_lfc_resizing`. When set to
`true`, we pass in `None` for its postgres connection string. That
matches the configuration tested in `neondatabase/autoscaling` CI.
2025-01-02 19:45:59 +00:00
Tristan Partin
363ea97f69 Add more substantial tests for compute migrations (#9811)
The previous tests really didn't do much. This set should be quite a bit
more encompassing.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-01-02 18:37:50 +00:00
Conrad Ludgate
56e6ebfe17 chore: building compute_tools and local_proxy together (#10257)
## Problem

Building local_proxy and compute_tools features the same dependency
tree, but as they are currently built in separate clean layers all that
progress is wasted. For our arm builds that's an extra 10 minutes.

## Summary of changes

Combines the compute_tools and local_proxy build layers.
2025-01-02 16:05:14 +00:00
Raphael 'kena' Poss
1622fd8bda proxy: recognize but ignore the 3 new redis message types (#10197)
## Problem

https://neondb.slack.com/archives/C085MBDUSS2/p1734604792755369

## Summary of changes

Recognize and ignore the 3 new broadcast messages:
- `/block_public_or_vpc_access_updated`
- `/allowed_vpc_endpoints_updated_for_org`
- `/allowed_vpc_endpoints_updated_for_projects`
2025-01-02 16:02:48 +00:00
Konstantin Knizhnik
8c7dcd2598 Set heartbeat interval for chaos test (#10222)
## Problem

See https://neondb.slack.com/archives/C033RQ5SPDH/p1734707873215729

test_timeline_archival_chaos becomes more flaky with increased heartbeat
interval

Resolves #10250.

## Summary of changes

Override heatbeat interval for `test_timelirn_archival_chaos.py`

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-01-02 14:14:18 +00:00
Folke Behrens
ee22d4c9ef proxy: Set TCP_NODELAY for compute connections (#10240)
neondatabase/cloud#19184
2025-01-02 13:32:24 +00:00
JC Grünhage
26600f2973 Skip running clippy without default features (#10098)
## Problem

Running clippy with `cargo hack --feature-powerset` in CI isn't
particularly fast. This PR follows-up on
https://github.com/neondatabase/neon/pull/8912 to improve the speed of
our clippy runs.

Parallelism as suggested in
https://github.com/neondatabase/neon/issues/9901 was tested, but didn't
show consistent enough improvements to be worth it. It actually
increased the amount of work done, as there's less cache hits when
clippy runs are spread out over multiple target directories.
Additionally, parallelism makes it so caching needs to be thought about
more actively and copying around target directories to enable
parallelism eats the rest of the performance gains from parallel
execution.

After some discussion, the decision was to instead cut down on the
number of jobs that are running further. The easiest way to do this is
to not run clippy *without* default features. The list of default
features is empty for all crates, and I haven't found anything using
`cfg(feature = "default")` either, so this is likely not going to change
anything except speeding the runs up.

## Summary of changes

Reduce the amount of feature combinations tried by `cargo hack` (as
suggested in
https://github.com/neondatabase/neon/pull/8912#pullrequestreview-2286482368)
by never disabling default features.

## Alternatives

- We can split things out into different jobs which reduces the time
until everything is finished by running more things in parallel. This
does however decreases the amount of cache hits and increases the amount
of time spent on overhead tasks like repo cloning and restoring caches
by doing those multiple times instead of once.
- We could replace `cargo hack [...] clippy` with `cargo clippy [...];
cargo clippy --features testing`. I'm not 100% sure how this compares to
the change here in the PR, but it does seem to run a bit faster. That
likely means it's doing less work, but without understanding what
exactly we loose by that I'd rather not do that for now. I'd appreciate
input on this though.
2025-01-02 11:33:42 +00:00
Konstantin Knizhnik
b3cd883f93 Unlock LFC mutex when LFC cache is disabled (#10235)
## Problem

See https://github.com/neondatabase/neon/issues/10233
`lfc_containsv` returns with holding lock when LFC was disabled.
This bug was introduced in commit  78938d1b59

## Summary of changes

Release lock before return.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-01-02 11:28:15 +00:00
Conrad Ludgate
38c7a2abfc chore(proxy): pre-load native tls certificates and propagate compute client config (#10182)
Now that we construct the TLS client config for cancellation as well as
connect, it feels appropriate to construct the same config once and
re-use it elsewhere. It might also help should #7500 require any extra
setup, so we can easily add it to all the appropriate call sites.
2025-01-02 09:36:13 +00:00
Conrad Ludgate
f94248a594 chore(libs/proxy): refactor tokio-postgres connection control flow (#10247)
In #10207 it was clear there was some confusion with the current
connection logic. To analyse the flow to make sure there was no poll
stalling, I ended up with the following refactor.

Notable changes:
1. Now all functions called `poll_xyz` and that have a `cx: &mut
Context` argument must return a `Poll<_>` type, and can only return
`Pending` iff an internal poll call also returned `Pending`
2. State management is handled entirely by `poll_messages`. There are
now only 2 states which makes it much easier to keep track of.

Each commit should be self-reviewable and should be simple to verify
that it keeps the same behaviour
2025-01-02 09:35:28 +00:00
Alex Chi Z.
9c53b41245 fix(pageserver): update remote latest_gc_cutoff after gc-compaction (#10209)
## Problem

close https://github.com/neondatabase/neon/issues/10208
part of #9114 

## Summary of changes

* Ensure remote `latest_gc_cutoff` is up-to-date before removing any
files for gc-compaction.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-19 18:40:20 +00:00
Konstantin Knizhnik
197a89ab3d Increase default stotrage controller heartbeat interval from 100msec … (#10206)
## Problem

Currently default value of storage controller heartbeat interval is
100msec. It means that 10 times per second it establish connection to
PS. And it seems to be quite expensive.
At MacOS right now storage_controller consumes 70% CPU and trusts - 30%.
So together they completely utilize one core.
A lot of us has Macs. Let's save environment a little bit and do not
waste electricity and contribute to global warming.

By the way, on prod we have interval  10seconds 

## Summary of changes

Increase heartbeat interval from 100msec to 1 second.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-19 18:32:32 +00:00
Alex Chi Z.
b89e02f3e8 fix(pageserver): consider partial compaction layer map in layer check (#10044)
## Problem

In https://github.com/neondatabase/neon/pull/9897 we temporarily
disabled the layer valid check because the current one only considers
the end result of all compaction algorithms, but partial gc-compaction
would temporarily produce an "invalid" layer map.

part of https://github.com/neondatabase/neon/issues/9114

## Summary of changes

Allow LSN splits to overlap in the slow path check. Currently, the valid
check is only used in storage scrubber (background job) and during
gc-compaction (without taking layer lock). Therefore, it's fine for such
checks to be a little bit inefficient but more accurate.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-12-19 18:04:53 +00:00
Konstantin Knizhnik
04517c6ff3 Do not reload config file on PS reconnect (#10204)
## Problem

See https://github.com/neondatabase/neon/issues/10184
and
https://neondb.slack.com/archives/C04DGM6SMTM/p1733997259898819

Reloading config file inside parallel worker cause it's termination

## Summary of changes

Remove call of `HandleMainLoopInterrupts()` 
Update of page server URL is propagated by postmaster through shared
memory and we should not reload config for it.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-19 15:22:39 +00:00
Vlad Lazar
628451d68e safekeeper: short-circuit interpreted wal sender (#10202)
## Problem

Safekeeper may currently send a batch to the pageserver even if it
hasn't decoded a new record.
I think this is quite unlikely in the field, but worth adressing.

## Summary of changes

Don't send anything if we haven't decoded a full record. Once this
merges and releases, the `InterpretedWalRecords` struct can be updated
to remove the Option wrapper for `next_record_lsn`.
2024-12-19 14:04:46 +00:00
Vlad Lazar
502d512fe2 safekeeper: lift benchmarking utils into safekeeper crate (#10200)
## Problem

The benchmarking utilities are also useful for testing. We want to write
tests in the safekeeper crate.

## Summary of changes

This commit lifts the utils to the safekeeper crate. They are compiled
if the benchmarking features is enabled or if in test mode.
2024-12-19 14:04:42 +00:00
John Spray
afda6d4700 storage_scrubber: don't report half-created timelines as corruption (#10198)
## Problem

test_timeline_archival_chaos does timeline creation with failure
injection, and thereby sometimes leaves timelines in a part created
state. This was being reported as corruption by the scrubber on test
teardown, because it considered a layer without an index to be an
invalid state. This was incorrect: the scrubber should accept this
state, it occurs legitimately during timeline creation.

Closes: https://github.com/neondatabase/neon/issues/9988

## Summary of changes

- Report a timeline with layers but no index as Relic rather than
MissingIndexPart.
- We retain the MissingIndexPart variant for the case where an index
_was_ found in the listing, but was not found by a subsequent GET, i.e.
racing with deletion.
2024-12-19 12:55:05 +00:00
John Spray
65042cbadd tests: use high IO concurrency in test_pgdata_import_smoke, use effective_io_concurrency=2 in tests by default (#10114)
## Problem

`test_pgdata_import_smoke` writes two gigabytes of pages and then reads
them back serially. This is CPU bottlenecked and results in a long
runtime, and sensitivity to CPU load from other tests on the same
machine.

Closes: https://github.com/neondatabase/neon/issues/10071

## Summary of changes

- Use effective_io_concurrency=32 when doing sequential scans through
2GiB of pages in test_pgdata_import_smoke. This is a ~10x runtime
decrease in the parts of the test that do sequential scans.
- Also set `effective_io_concurrency=2` for tests, as I noticed while
debugging that we were doing all getpage requests serially, which is bad
for checking the stability of the batching code.
2024-12-19 10:58:49 +00:00
Folke Behrens
b135194090 proxy: Delay SASL complete message until auth is done (#10189)
The final SASL complete message can be bundled with the remainder of the
auth flow messages until ReadyForQuery.

neondatabase/cloud#19184
2024-12-19 10:37:08 +00:00
Peter Bendel
43dc03459d Run pgbench on 10 GB scale factor on database with n relations (e.g. 10k) (#10172)
## Problem

We want to verify how much / if pgbench throughput and latency on Neon
suffers if the database contains many other relations, too.

## Summary of changes

Modify the benchmarking.yml pgbench-compare job to
- create an addiitional project at scale factor 10 GiB
- before running pgbench add n tables (initially 10k) to the database
- then compare the pgbench throughput and latency to the existing
pgbench-compare at 10 Gib scale factor

We use a realistic template for the n relations that is a partitioned
table with some realistic data types, indexes and constraints - similar
to a table that we use internally.

Example run:
https://github.com/neondatabase/neon/actions/runs/12377565956/job/34547386959
2024-12-19 10:25:44 +00:00
Christian Schwarz
a1b0558493 fast import: importer: use aws s3 cli (#10162)
## Problem

s5cmd doesn't pick up the pod service account

```
2024/12/16 16:26:01 Ignoring, HTTP credential provider invalid endpoint host, "169.254.170.23", only loopback hosts are allowed. <nil>
ERROR "ls s3://neon-dev-bulk-import-us-east-2/import-pgdata/fast-import/v1/br-wandering-hall-w2xobawv": NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
```

## Summary of changes

Switch to offical CLI.


## Testing

Tested the pre-merge image in staging, using `job_image` override in
project settings.


https://neondb.slack.com/archives/C033RQ5SPDH/p1734554944391949?thread_ts=1734368383.258759&cid=C033RQ5SPDH

## Future Work

Switch back to s5cmd once https://github.com/peak/s5cmd/pull/769 gets
merged.

## Refs

- fixes https://github.com/neondatabase/cloud/issues/21876

---------

Co-authored-by: Gleb Novikov <NanoBjorn@users.noreply.github.com>
2024-12-19 10:04:17 +00:00
Alex Chi Z.
cc138b56f9 fix(pageserver): run psql in thread to avoid blocking (#10177)
## Problem

ref https://github.com/neondatabase/neon/issues/10170
ref https://github.com/neondatabase/neon/issues/9994

The psql command will block the main thread, causing other async tasks
to timeout (i.e., HTTP connect). Therefore, we need to move it to an I/O
executor thread.

## Summary of changes

* run psql connection in a thread

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: John Spray <john@neon.tech>
2024-12-19 09:45:06 +00:00
Konstantin Knizhnik
61fcf64c22 Fix flukyness of test_physical_and_logical_replicaiton.py (#10176)
## Problem

See https://github.com/neondatabase/neon/issues/10037
test_physical_and_logical_replication.py sometimes failed.

## Summary of changes

Add `wait_replica_caughtup` to wait for replica sync

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-18 19:15:38 +00:00
Alex Chi Z.
6d3e8096fc refactor(test): tighten up test_gc_feedback (#10126)
## Problem

In https://github.com/neondatabase/neon/pull/8103 we changed the test
case to have more test coverage of gc_compaction. Now that we have
`test_gc_compaction_smoke`, we can revert this test case to serve its
original purpose and revert the parameter changes.

part of https://github.com/neondatabase/neon/issues/9114

## Summary of changes

* Revert pitr_interval from 60s to 10s.
* Assert the physical/logical size ratio in the benchmark.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-12-18 18:10:05 +00:00
Alex Chi Z.
3d1c3a80ae feat(pageserver): add compact queue http endpoint (#10173)
## Problem

We cannot get the size of the compaction queue and access the info.

Part of #9114 

## Summary of changes

* Add an API endpoint to get the compaction queue.
* gc_compaction test case now waits until the compaction finishes.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-18 18:09:02 +00:00
John Spray
835287ba3a neon_local: add a flock to protect against concurrent execution (#10185)
## Problem

`neon_local` has always been unsafe to run concurrently with itself: it
uses simple text files for persistent state, and concurrent runs will
step on each other.

In some test environments we intentionally handle this with mutexes in
python land, but it's fragile to try and always remember to do that.

## Summary of changes

- Add a `flock` based mutex around the `main` function of neon_local,
using the repo directory as the file to lock
- Clean up an Option<> around control_plane_api, this is a drive-by
change because it was one of the fields that had a weird effect when
previous concurrent stuff stamped on it.
2024-12-18 16:29:47 +00:00
Conrad Ludgate
d63602cc78 chore(proxy): fully remove allow-self-signed-compute flag (#10168)
When https://github.com/neondatabase/cloud/pull/21856 is merged, this
flag is no longer necessary.
2024-12-18 16:03:14 +00:00
Erik Grinaker
1668d39b7c safekeeper: fix typo in allowlist for /profile/heap (#10186) 2024-12-18 15:51:53 +00:00
Alex Chi Z.
1d12efc428 fix(pageserver): allow repartition errors during gc-compaction smoke tests (#10164)
## Problem

part of https://github.com/neondatabase/neon/issues/9114

In https://github.com/neondatabase/neon/pull/10127 we fixed the race,
but we didn't add the errors to the allowlist.

## Summary of changes

* Allow repartition errors in the gc-compaction smoke test.

I think it might be worth to refactor the code to allow multiple threads
getting a copy of repartition status (i.e., using Rcu) in the future.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-18 15:37:26 +00:00
Arpad Müller
85696297c5 Add safekeepers command to storcon_cli for listing (#10151)
Add a `safekeepers` subcommand to `storcon_cli` that allows listing the
safekeepers.

```
$ curl -X POST --url http://localhost:1234/control/v1/safekeeper/42 --data \
  '{"active":true, "id":42, "created_at":"2023-10-25T09:11:25Z", "updated_at":"2024-08-28T11:32:43Z","region_id":"neon_local","host":"localhost","port":5454,"http_port":0,"version":123,"availability_zone_id":"us-east-2b"}'
$ cargo run --bin storcon_cli  -- --api http://localhost:1234 safekeepers
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.38s
     Running `target/debug/storcon_cli --api 'http://localhost:1234' safekeepers`
+----+---------+-----------+------+-----------+------------+
| Id | Version | Host      | Port | Http Port | AZ Id      |
+==========================================================+
| 42 | 123     | localhost | 5454 | 0         | us-east-2b |
+----+---------+-----------+------+-----------+------------+
```

Also:

* Don't return the raw `SafekeeperPersistence` struct that contains the
raw database presentation, but instead a new
`SafekeeperDescribeResponse` struct.
* The `SafekeeperPersistence` struct leaves out the `active` field on
purpose because we want to deprecate it and replace it with a
`scheduling_policy` one.

Part of https://github.com/neondatabase/neon/issues/9981
2024-12-18 12:47:56 +00:00
Konstantin Knizhnik
aaf980f70d Online checkpoint replication state (#9976)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1733180965970089

Replication state is checkpointed only by shutdown checkpoint.
It means that replication snapshots are not removed till compute
shutdown.

## Summary of changes

Checkpoint replication state during online checkpoint

Related Postgres PR:
https://github.com/neondatabase/postgres/pull/546

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-18 09:34:38 +00:00
Conrad Ludgate
a354071dd0 Merge pull request #10180 from neondatabase/rc/release-proxy/2024-12-17
Proxy release 2024-12-17
2024-12-18 06:31:05 +00:00
github-actions[bot]
758680d4f8 Proxy release 2024-12-17 2024-12-17 22:06:42 +00:00
a-masterov
c52514ab02 Fix allure report creation on periodic pg_regress testing (#10171)
## Problem
The allure report finishes with the error `HttpError: Resource not
accessible by integration` while running the `pg_regress` test against a
cloud staging project due to a lack of permissions.
## Summary of changes
The permissions are added.
2024-12-17 20:47:44 +00:00
Conrad Ludgate
2ee6bc5ec4 chore(proxy): update vendored postgres libs to edition 2021 (#10139)
I ran `cargo fix --edition` in each project prior, and it found nothing
that needed fixing.
2024-12-17 20:06:18 +00:00
John Spray
fd230227f2 storcon: include preferred AZ in compute notifications (#9953)
## Problem

It is unreliable for the control plane to infer the AZ for computes from
where the tenant is currently attached, because if a tenant happens to
be in a degraded state or a release is ongoing while a compute starts,
then the tenant's attached AZ can be a different one to where it will
run long-term, and the control plane doesn't check back later to restart
the compute.

This can land in parallel with
https://github.com/neondatabase/neon/pull/9947

## Summary of changes

- Thread through the preferred AZ into the compute hook code via the
reconciler
- Include the preferred AZ in the body of compute hook notifications
2024-12-17 20:04:09 +00:00
Ivan Efremov
93e958341f [proxy]: Use TLS for cancellation queries (#10152)
## Problem
pg_sni_router assumes that all the streams are upgradable to TLS.
Cancellation requests were declined because of using NoTls config.

## Summary of changes
Provide TLS client config for cancellation requests.

Fixes
[#21789](https://github.com/orgs/neondatabase/projects/65/views/1?pane=issue&itemId=90911361&issue=neondatabase%7Ccloud%7C21789)
2024-12-17 19:26:54 +00:00
Tristan Partin
7dddbb9570 Add pg_repack extension (#10100)
Our solutions engineers and some customers would like to have this
extension available.

Link: https://github.com/neondatabase/cloud/issues/18890

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-17 18:36:55 +00:00
Erik Grinaker
a55853f67f utils: symbolize heap profiles (#10153)
## Problem

Jemalloc heap profiles aren't symbolized. This is inconvenient, and
doesn't work with Grafana Cloud Profiles.

Resolves #9964.

## Summary of changes

Symbolize the heap profiles in-process, and strip unnecessary cruft.

This uses about 100 MB additional memory to cache the DWARF information,
but I believe this is already the case with CPU profiles, which use the
same library for symbolization. With cached DWARF information, the
symbolization CPU overhead is negligible.

Example profiles:

*
[pageserver.pb.gz](https://github.com/user-attachments/files/18141395/pageserver.pb.gz)
*
[safekeeper.pb.gz](https://github.com/user-attachments/files/18141396/safekeeper.pb.gz)
2024-12-17 16:51:58 +00:00
Mikhail Kot
007b13b79a Don't build tests in compute image, use ninja (#10149)
Don't build tests in h3 and rdkit: ~15 min speedup.
Use Ninja as cmake generator where possible: ~10 min speedup.
Clean apt cache for smaller images: around 250mb size loss for
intermediate layers
2024-12-17 16:43:54 +00:00
Alexey Kondratov
2dfd3cab8c fix(compute): Report compute_backpressure_throttling_seconds as counter (#10125)
## Problem

It was reported as `gauge`, but it's actually a `counter`.

Also add `_total` suffix as that's the convention for counters.

The corresponding flux-fleet PR:
https://github.com/neondatabase/flux-fleet/pull/386
2024-12-17 16:14:07 +00:00
John Spray
b5833ef259 remote_storage: configurable connection pooling for ABS (#10169)
## Problem

The ABS SDK's default behavior is to do no connection pooling, i.e. open
and close a fresh connection for each request. Under high request rates,
this can result in an accumulation of TCP connections in TIME_WAIT or
CLOSE_WAIT state, and in extreme cases exhaustion of client ports.

Related: https://github.com/neondatabase/cloud/issues/20971

## Summary of changes

- Add a configurable `conn_pool_size` parameter for Azure storage,
defaulting to zero (current behavior)
- Construct a custom reqwest client using this connection pool size.
2024-12-17 12:24:51 +00:00
Erik Grinaker
b0e43c2f88 postgres_ffi: add WalStreamDecoder::complete_record() benchmark (#10158)
Touches #10097.
2024-12-17 10:35:00 +00:00
a-masterov
e226d7a3d1 Fix docker compose with PG17 (#10165)
## Problem
It's impossible to run docker compose with compute v17 due to `pg_anon`
extension which is not supported under PG17.
## Summary of changes
The auto-loading of `pg_anon` is disabled by default
2024-12-17 08:16:54 +00:00
Folke Behrens
aa7ab9b3ac proxy: Allow dumping TLS session keys for debugging (#10163)
## Problem

To debug issues with TLS connections there's no easy way to decrypt
packets unless a client has special support for logging the keys.

## Summary of changes

Add TLS session keys logging to proxy via `SSLKEYLOGFILE` env var gated
by flag.
2024-12-16 18:56:24 +00:00
Erik Grinaker
28ccda0a63 test_runner: ignore error in test_timeline_archival_chaos (#10161)
Resolves #10159.
2024-12-16 17:10:55 +00:00
Conrad Ludgate
59b7ff8988 chore(proxy): disallow unwrap and unimplemented (#10142)
As the title says, I updated the lint rules to no longer allow unwrap or
unimplemented.

Three special cases:
* Tests are allowed to use them
* std::sync::Mutex lock().unwrap() is common because it's usually
correct to continue panicking on poison
* `tokio::spawn_blocking(...).await.unwrap()` is common because it will
only error if the blocking fn panics, so continuing the panic is also
correct

I've introduced two extension traits to help with these last two, that
are a bit more explicit so they don't need an expect message every time.
2024-12-16 16:37:15 +00:00
Conrad Ludgate
2e4c9c5704 chore(proxy): remove allow_self_signed from regular proxy (#10157)
I noticed that the only place we use this flag is for testing console
redirect proxy. Makes sense to me to make this assumption more explicit.
2024-12-16 16:11:39 +00:00
Erik Grinaker
3d30a7a934 pageserver: make RemoteTimelineClient::schedule_index_upload infallible (#10155)
Remove an unnecessary `Result` and address a `FIXME`.
2024-12-16 15:54:47 +00:00
Conrad Ludgate
6565fd4056 chore: fix clippy lints 2024-12-06 (#10138) 2024-12-16 15:33:21 +00:00
Arseny Sher
c5e3314c6e Add test restarting compute at WAL page boundary (#10111)
## Problem

We've had similar test in test_logical_replication, but then removed it
because it wasn't needed to trigger LR related bug. Restarting at WAL
page boundary is still a useful test, so add it separately back.

## Summary of changes

Add the test.
2024-12-16 14:53:04 +00:00
Arseny Sher
1ed0e52bc8 Extract safekeeper http client to separate crate. (#10140)
## Problem

We want to use safekeeper http client in storage controller and
neon_local.

## Summary of changes

Extract it to separate crate. No functional changes.
2024-12-16 12:07:24 +00:00
Conrad Ludgate
24d6587914 chore(proxy): refactor self-signed config (#10154)
## Problem

While reviewing #10152 I found it tricky to actually determine whether
the connection used `allow_self_signed_compute` or not.

I've tried to remove this setting in the past:
* https://github.com/neondatabase/neon/pull/7884
* https://github.com/neondatabase/neon/pull/7437
* https://github.com/neondatabase/cloud/pull/13702

But each time it seems it is used by e2e tests

## Summary of changes

The `node_info.allow_self_signed_computes` is always initialised to
false, and then sometimes inherits the proxy config value. There's no
need this needs to be in the node_info, so removing it and propagating
it via `TcpMechansim` is simpler.
2024-12-16 11:15:25 +00:00
John Spray
ebcbc1a482 pageserver: tighten up code around SLRU dir key handling (#10082)
## Problem

Changes in #9786 were functionally complete but missed some edges that
made testing less robust than it should have been:
- `is_key_disposable` didn't consider SLRU dir keys disposable
- Timeline `init_empty` was always creating SLRU dir keys on all shards

The result was that when we had a bug
(https://github.com/neondatabase/neon/pull/10080), it wasn't apparent in
tests, because one would only encounter the issue if running on a
long-lived timeline with enough compaction to drop the initially created
empty SLRU dir keys, _and_ some CLog truncation going on.

Closes: https://github.com/neondatabase/cloud/issues/21516

## Summary of changes

- Update is_key_global and init_empty to handle SLRU dir keys properly
-- the only functional impact is that we avoid writing some spurious
keys in shards >0, but this makes testing much more robust.
- Make `test_clog_truncate` explicitly use a sharded tenant

The net result is that if one reverts #10080, then tests fail (i.e. this
PR is a reproducer for the issue)
2024-12-16 10:06:08 +00:00
Konstantin Knizhnik
117c1b5dde Do not perform prefetch for temp relations (#10146)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1734002916827019

With recent prefetch fixes for pg17 and `effective_io_concurrency=100` 
pg_regress test stats.sql is failed when set temp_buffers to 100.
Stream API will try to lock all this 100 buffers for prefetch.

## Summary of changes

Disable such behaviour for temp relations.
Postgres PR: https://github.com/neondatabase/postgres/pull/548

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-16 06:03:53 +00:00
Erik Grinaker
f3ecd5d76a pageserver: revert flush backpressure (#8550) (#10135)
## Problem

In #8550, we made the flush loop wait for uploads after every layer.
This was to avoid unbounded buildup of uploads, and to reduce compaction
debt. However, the approach has several problems:

* It prevents upload parallelism.
* It prevents flush and upload pipelining.
* It slows down ingestion even when there is no need to backpressure.
* It does not directly backpressure WAL ingestion (only via
`disk_consistent_lsn`), and will build up in-memory layers.
* It does not directly backpressure based on compaction debt and read
amplification.

An alternative solution to these problems is proposed in #8390.

In the meanwhile, we revert the change to reduce the impact on ingest
throughput. This does reintroduce some risk of unbounded
upload/compaction buildup. Until
https://github.com/neondatabase/neon/issues/8390, this can be addressed
in other ways:

* Use `max_replication_apply_lag` (aka `remote_consistent_lsn`), which
will more directly limit upload debt.
* Shard the tenant, which will spread the flush/upload work across more
Pageservers and move the bottleneck to Safekeeper.

Touches #10095.

## Summary of changes

Remove waiting on the upload queue in the flush loop.
2024-12-15 09:45:12 +00:00
Mikhail Kot
cf161e1556 fix(adapter): password not set in role drop (#10130)
## Problem

When entry was dropped and password wasn't set, new entry
had uninitialized memory in controlplane adapter

Resolves: https://github.com/neondatabase/cloud/issues/14914

## Summary of changes

Initialize password in all cases, add tests.
Minor formatting for less indentation
2024-12-14 17:37:13 +00:00
Konstantin Knizhnik
2521eba674 Check for invalid down link while prefetching B-Tree leave pages for index-only scan (#9867)
## Problem

See #9866

Index-only scan prefetch implementation doesn't take in account that
down link may be invalid

## Summary of changes

Check that downlink is valid block number


Correspondent Postgres PRs:
https://github.com/neondatabase/postgres/pull/534
https://github.com/neondatabase/postgres/pull/535
https://github.com/neondatabase/postgres/pull/536
https://github.com/neondatabase/postgres/pull/537

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-13 20:46:41 +00:00
Alexander Bayandin
d56fea680e CI: always require aws-oicd-role-arn input to be set (#10145)
## Problem
`benchmarking` job fails because `aws-oicd-role-arn` input is not set

## Summary of changes:
- Set `aws-oicd-role-arn` for `benchmarking job
- Always require `aws-oicd-role-arn` to be set
- Rename `aws_oicd_role_arn` to `aws-oicd-role-arn` for consistency
2024-12-13 19:56:32 +00:00
Alex Chi Z.
7ee5dca752 fix(pageserver): race between gc-compaction and repartition (#10127)
## Problem

close https://github.com/neondatabase/neon/issues/10124

gc-compaction split_gc_jobs is holding the repartition lock for too long
time.

## Summary of changes

* Ensure split_gc_compaction_jobs drops the repartition lock once it
finishes cloning the structures.
* Update comments.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-13 18:22:25 +00:00
Tristan Partin
07d1db54b3 Improve comments and log messages in the logical replication monitor (#9974)
Improved comments will help others when they read the code, and the log
messages will help others understand why the logical replication monitor
works the way it does.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-13 18:10:42 +00:00
Konstantin Knizhnik
eeabecd89f Correctly update LFC used_pages in case of LFC resize (#10128)
## Problem

LFC used_pages statistic is not updated in case of LFC resize (shrinking
`neon.file_cache_size_limit`)

## Summary of changes

Update `lfc_ctl->used_pages` in `lfc_change_limit_hook`

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-13 17:40:26 +00:00
Christian Schwarz
fcff752851 fix(test_timeline_archival_chaos): flakiness caused by orphan layers (#10083)
The test was failing with the scary but generic message `Remote storage
metadata corrupted`.

The underlying scrubber error is `Orphan layer detected: ...`.

The test kills pageserver at random points, hence it's expected that we
leak layers if we're killed in the window after layer upload but before
it's referenced from index part.

Refer to generation numbers RFC for details.

Refs:
- fixes https://github.com/neondatabase/neon/issues/9988
- root-cause analysis
https://github.com/neondatabase/neon/issues/9988#issuecomment-2520673167
2024-12-13 16:28:21 +00:00
Alexander Bayandin
2c91062828 test_prefetch: reduce timeout to default 5m from 10m (#10105)
## Problem

`test_prefetch` is flaky
(https://github.com/neondatabase/neon/issues/9961), but if it passes,
the run time is less than 30 seconds — we don't need an extended timeout
for it.

## Summary of changes
- Remove extended test timeout for `test_prefetch`
2024-12-13 14:52:54 +00:00
Arseny Sher
ce8eb089f3 Extract public sk types to safekeeper_api (#10137)
## Problem

We want to extract safekeeper http client to separate crate for use in
storage controller and neon_local. However, many types used in the API
are internal to safekeeper.

## Summary of changes

Move them to safekeeper_api crate. No functional changes.

ref https://github.com/neondatabase/neon/issues/9011
2024-12-13 14:06:27 +00:00
a-masterov
7dc382601c Fix pg_regress tests on a cloud staging instance (#10134)
## Problem
pg_regress tests start failing due to unique ids added to Neon error
messages
## Summary of changes
Patches updated
2024-12-13 13:59:04 +00:00
Rahul Patil
2451969d5c fix(ci): Allow github-action-script to post reports (#10136)
Allow github-action-script to post reports.

Failed CI:
https://github.com/neondatabase/neon/actions/runs/12304655364/job/34342554049#step:13:514
2024-12-13 12:22:15 +00:00
JC Grünhage
59ef701925 CI(deploy): fix git tag/release creation (#10119)
## Problem

When moving the comment on proxy-releases from the yaml doc into a
javascript code block, I missed converting the comment marker from `#`
to `//`.

## Summary of changes

Correctly convert comment marker.
2024-12-12 23:38:20 +00:00
Alexander Bayandin
ac04bad457 CI: don't run debug builds with LFC (#10123)
## Problem

I've noticed that debug builds with LFC fail more frequently and for
some reason ,their failure do block merging (but it should not)

## Summary of changes
- Do not run Debug builds with LFC
2024-12-12 22:55:38 +00:00
Peter Bendel
2f3f98a319 use OIDC role instead of AWS access keys for managing test runner (#10117)
in periodic pagebench workflow

## Problem

for background see https://github.com/neondatabase/cloud/issues/21545

## Summary of changes

use OIDC role to manage runners instead of AWS access key which needs to
be periodically rotated

## logs

seems to work in
https://github.com/neondatabase/neon/actions/runs/12298575888/job/34322306127#step:6:1
2024-12-12 20:25:39 +00:00
Alex Chi Z.
5ff4b991c7 feat(pageserver): gc-compaction split over LSN (#9900)
## Problem

part of https://github.com/neondatabase/neon/issues/9114, stacked PR
over https://github.com/neondatabase/neon/pull/9897, partially
refactored to help with
https://github.com/neondatabase/neon/issues/10031

## Summary of changes

* gc-compaction takes `above_lsn` parameter. We only compact the layers
above this LSN, and all data below the LSN are treated as if they are on
the ancestor branch.
* refactored gc-compaction to take `GcCompactJob` that describes the
rectangular range to be compacted.
* Added unit test for this case.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-12-12 20:23:24 +00:00
John Spray
a93e3d31cc storcon: refine logic for choosing AZ on tenant creation (#10054)
## Problem

When we update our scheduler/optimization code to respect AZs properly
(https://github.com/neondatabase/neon/pull/9916), the choice of AZ
becomes a much higher-stakes decision. We will pretty much always run a
tenant in its preferred AZ, and that AZ is fixed for the lifetime of the
tenant (unless a human intervenes)

Eventually, when we do auto-balancing based on utilization, I anticipate
that part of that will be to automatically change the AZ of tenants if
our original scheduling decisions have caused imbalance, but as an
interim measure, we can at least avoid making this scheduling decision
based purely on which AZ contains the emptiest node.

This is a precursor to https://github.com/neondatabase/neon/pull/9947

## Summary of changes

- When creating a tenant, instead of scheduling a shard and then reading
its preferred AZ back, make the AZ decision first.
- Instead of choosing AZ based on which node is emptiest, use the median
utilization of nodes in each AZ to pick the AZ to use. This avoids bad
AZ decisions during periods when some node has very low utilization
(such as after replacing a dead node)

I considered also making the selection a weighted pseudo-random choice
based on utilization, but wanted to avoid destabilising tests with that
for now.
2024-12-12 19:35:38 +00:00
Rahul Patil
6d5687521b fix(ci): Allow github-script to post test reports (#10120)
Allow github-script to post test reports
2024-12-12 18:53:35 +00:00
Heikki Linnakangas
53721266f1 Disable connection logging in pgbouncer by default (#10118)
It can produce a lot of logs, making pgbouncer itself consume all CPU in
extreme cases. We saw that happen in stress testing.
2024-12-12 17:05:58 +00:00
a-masterov
2f3433876f Change the channel for notification. (#10112)
## Problem
Now notifications about failures in `pg_regress` tests run on the
staging cloud instance, reach the channel `on-call-staging-stream`,
while they should reach `on-call-qa-staging-stream`
## Summary of changes
The channel changed.
2024-12-12 16:34:07 +00:00
Rahul Patil
58d45c6e86 ci(fix): Use OIDC auth to login on ECR (#10055)
## Problem

CI currently uses static credentials in some places. These are less
secure and hard to maintain, so we are going to deprecate them and use
OIDC auth.

## Summary of changes
- ci(fix): Use OIDC auth to upload artifact on s3
- ci(fix): Use OIDC auth to login on ECR
2024-12-12 15:13:08 +00:00
Conrad Ludgate
e502e880b5 chore(proxy): remove code for old API (#10109)
## Problem

Now that https://github.com/neondatabase/cloud/issues/15245 is done, we
can remove the old code.

## Summary of changes

Removes support for the ManagementV2 API, in favour of the ProxyV1 API.
2024-12-12 13:42:50 +00:00
Arseny Sher
c9a773af37 Fix test_subscriber_synchronous_commit flakiness. (#10057)
6f7aeaa configured LFC for USE_LFC case, but omitted setting
shared_buffers for non USE_LFC, causing flakiness.

ref https://github.com/neondatabase/neon/issues/9989
2024-12-12 11:57:00 +00:00
Vlad Lazar
ec0ce06c16 tests: default interpreted proto in tests (#10079)
## Problem

We aren't using the sharded interpreted wal receiver protocol in all
tests.

## Summary of changes

Default to the interpreted protocol.
2024-12-12 10:53:10 +00:00
Conrad Ludgate
1738fd0a96 Merge pull request #10107 from neondatabase/rc/release-proxy/2024-12-12
Proxy release 2024-12-12
2024-12-12 10:21:30 +00:00
Conrad Ludgate
87b7edfc72 Merge branch 'release-proxy' into rc/release-proxy/2024-12-12 2024-12-12 09:58:31 +00:00
Alexander Bayandin
0bd8eca9ca Storage: create release PRs On Fridays (#10017)
## Problem

To give Storage more time on preprod — create a release branch on Friday

## Summary of changes
- Automatically create Storage release PR on Friday instead of Monday
2024-12-12 09:18:50 +00:00
Misha Sakhnov
739f627b96 Bump vm-builder v0.35.0 -> v0.37.1 (#10015)
Bump version to pick up changes introduced in the neonvm-daemon to
support sys fs based CPU scaling
(https://github.com/neondatabase/autoscaling/issues/1082).

Previous update: https://github.com/neondatabase/neon/pull/9208
2024-12-12 08:45:52 +00:00
github-actions[bot]
def05700d5 Proxy release 2024-12-12 2024-12-12 06:02:08 +00:00
Arpad Müller
342cbea255 storcon: add safekeeper list API (#10089)
This adds an API to the storage controller to list safekeepers
registered to it.

This PR does a `diesel print-schema > storage_controller/src/schema.rs`
because of an inconsistency between up.sql and schema.rs, introduced by
[this](2c142f14f7)
commit, so there is some updates of `schema.rs` due to that. As a
followup to this, we should maybe think about running `diesel
print-schema` in CI.

Part of #9981
2024-12-12 01:09:24 +00:00
Tristan Partin
b391b29bdc Improve typing in test_runner/fixtures/httpserver.py (#10103)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-11 22:21:42 +00:00
Erik Grinaker
5126ebbfed test_runner: bump test_check_visibility_map timeout (#10091)
## Problem

`test_check_visibility_map` has been seen to time out in debug tests.

## Summary of changes

Bump the timeout to 10 minutes (test reports indicate 7 minutes is
sufficient).

We don't want to disable the test entirely in debug builds, to exercise
this with debug assertions enabled.

Resolves #10069.
2024-12-11 21:37:25 +00:00
Arpad Müller
7fa986bc92 Do tenant manifest validation with index-part (#10007)
This adds some validation of invariants that we want to uphold wrt the
tenant manifest and `index_part.json`:

* the data the manifest has about a timeline must match with the data in
`index_part.json`. It might actually change, e.g. when we do reparenting
during detach ancestor, but that requires the timeline to be
unoffloaded, i.e. removed from the manifest.
* any timeline mentioned in index part, must, if present, be archived.
If we unarchive, we first update the tenant manifest to unoffload, and
only then update index part. And one needs to archive before offloading.
* it is legal for timelines to be mentioned in the manifest but have no
`index_part`: this is a temporary state visible during deletion of the
timeline. if the pageserver crashed, an attach of the tenant will clean
the state up.
* it is also legal for offloaded timelines to have an
`ancestor_retain_lsn` of None while having an `ancestor_timeline_id`.
This is for the to-be-added flattening functionality: the plan is to set
former to None if we have flattened a timeline.

follow-up of #9942
part of #8088
2024-12-11 20:10:22 +00:00
Vlad Lazar
e8395807a5 storcon: allow for more concurrency in drain/fill operations (#10093)
## Problem

We saw the drain/fill operations not drain fast enough in ap-southeast.

## Summary of changes

These are some quick changes to speed it up:
* double reconcile concurrency - this is now half of the available
reconcile bandwidth
* reduce the waiter polling timeout - this way we can spawn new
reconciliations faster
2024-12-11 19:43:40 +00:00
Vlad Lazar
a3e80448e8 pageserver/storcon: add patch endpoints for tenant config metrics (#10020)
## Problem

Cplane and storage controller tenant config changes are not additive.
Any change overrides all existing tenant configs. This would be fine if
both did client side patching, but that's not the case.

Once this merges, we must update cplane to use the PATCH endpoint.

## Summary of changes

### High Level

Allow for patching of tenant configuration with a `PATCH
/v1/tenant/config` endpoint.
It takes the same data as it's PUT counterpart. For example the payload
below will update `gc_period` and unset `compaction_period`. All other
fields are left in their original state.
```
{
  "tenant_id": "1234",
  "gc_period": "10s",
  "compaction_period": null
}
```

### Low Level
* PS and storcon gain `PATCH /v1/tenant/config` endpoints. PS endpoint
is only used for cplane managed instances.
* `storcon_cli` is updated to have separate commands for
`set-tenant-config` and `patch-tenant-config`

Related https://github.com/neondatabase/cloud/issues/21043
2024-12-11 19:16:33 +00:00
Anastasia Lubennikova
ef233e91ef Update compute_installed_extensions metric: (#9891)
add owned_by_superuser field to filter out system extensions.

While on it, also correct related code:
- fix the metric setting: use set() instead of inc() in a loop.
inc() is not idempotent and can lead to incorrect results
if the function called multiple times. Currently it is only called at
compute start, but this will change soon.
- fix the return type of the installed_extensions endpoint
to match the metric. Currently it is only used in the test.
2024-12-11 16:43:26 +00:00
Mikhail Kot
dee2041cd3 walproposer: fix link error on debian 12 / ubuntu 22 (#10090)
## Problem

Linking walproposer library (e.g. `cargo t`) produces linker errors:
/home/myrrc/neon/pgxn/neon/walproposer_compat.c:169: undefined reference
to `pg_snprintf'

The library with these symbols (libpgcommon.a) is present

## Summary of changes

Changed order of libraries resolution for linker
2024-12-11 16:23:59 +00:00
Arseny Sher
e4bb1ca7d8 Increase neon_local http client to compute timeout in reconfigure. (#10088)
Seems like 30s sometimes not enough when CI runners are overloaded,
causing pull_timeline flakiness.

ref
https://github.com/neondatabase/neon/issues/9731#issuecomment-2535946443
2024-12-11 15:46:50 +00:00
a-masterov
b987648e71 Enable LFC for all the PG versions. (#10068)
## Problem
We added support for LFC for tests but are still using it only for the
PG17 release.

## Summary of changes
LFC is enabled for all PG versions. Errors in tests with LFC enabled now
block merging as usual. We keep tests with disabled LFC for PG17
release. Tests on debug builds with LFC enabled still don't affect
permission to merge.
2024-12-11 15:28:10 +00:00
Mikhail Kot
c79c1dd8e9 compute_ctl: don't panic if control plane can't be reached (#10078)
## Problem

If the control plane cannot be reached for some reason, compute_ctl
panics

## Summary of changes

panic is removed in favour of returning an error.
Code is reformatted a bit for more flat control flow

Resolves: #5391
2024-12-11 15:03:11 +00:00
Vlad Lazar
a53db73851 pageserver: don't drop multixact slrus on non zero shards (#10086)
## Problem

We get slru truncation commands on non-zero shards.
Compaction will drop the slru dir keys and ingest will fail when
receiving such records.
https://github.com/neondatabase/neon/pull/10080 fixed it for clog, but
not for multixact.

## Summary of changes

Only truncate multixact slrus on shard zero. I audited the rest of the
ingest code and it looks
fine from this pov.
2024-12-11 14:28:18 +00:00
Christian Schwarz
9ae980bf4f page_service: don't count time spent in Batcher towards smgr latency metrics (#10075)
## Problem

With pipelining enabled, the time a request spends in the batcher stage
counts towards the smgr op latency.

If pipelining is disabled, that time is not accounted for.

In practice, this results in a jump in smgr getpage latencies in various
dashboards and degrades the internal SLO.

## Solution

In a similar vein to #10042 and with a similar rationale, this PR stops
counting the time spent in batcher stage towards smgr op latency.

The smgr op latency metric is reduced to the actual execution time.

Time spent in batcher stage is tracked in a separate histogram.
I expect to remove that histogram after batching rollout is complete,
but it will be helpful in the meantime to reason about the rollout.
2024-12-11 13:37:08 +00:00
Vlad Lazar
665369c439 wal_decoder: fix compact key protobuf encoding (#10074)
## Problem

Protobuf doesn't support 128 bit integers, so we encode the keys as two
64 bit integers. Issue is that when we split the 128 bit compact key we
use signed 64 bit integers to represent the two halves. This may result
in a negative lower half when relnode is larger than `0x00800000`. When
we convert the lower half to an i128 we get a negative `CompactKey`.

## Summary of Changes

Use unsigned integers when encoding into Protobuf.

## Deployment

* Prod: We disabled the interpreted proto, so no compat concerns.
* Staging: Disable the interpreted proto, do one release, and then
release the fixed version.
We do this because a negative int32 will convert to a large uint32 value
and could give
a key in the actual pageserver space. In production we would around this
by adding new
fields to the proto and deprecating the old ones, but we can make our
lives easy here.
* Pre-prod: Same as staging
2024-12-11 12:35:02 +00:00
JC Grünhage
d7aeca2f34 CI(deploy): create git tags/releases before triggering deploy workflows (#10022)
## Problem

When dev deployments are disabled (or fail), the tags for releases
aren't created. It makes more sense to have tag and release creation
before the deployment to prevent situations like
[this](https://github.com/neondatabase/neon/pull/9959).

It is not enough to move the tag creation before the deployment. If the
deployment fails, re-running the job isn't possible because the API call
to create the tag will fail.

## Summary of changes

- Tag/Release creation now happens before the deployment
- The two steps for tag and release have been merged into a bigger one
- There's new checks to ensure the that if the tags/releases already
exist as expected, things will continue just fine.
2024-12-11 09:41:34 +00:00
John Spray
38415a9816 pageserver: fix ingest handling of CLog truncate (#10080)
## Problem

In #9786 we stop storing SLRUs on non-zero shards.

However, there was one code path during ingest that still tries to
enumerate SLRU relations on all shards. This fails if it sees a tenant
who has never seen any write to an SLRU, or who has done such thorough
compaction+GC that it has dropped its SLRU directory key.

## Summary of changes

- Avoid trying to list SLRU relations on nonzero shards
2024-12-11 09:16:11 +00:00
Matthias van de Meent
597125e124 Disable readstream's reliance on seqscan readahead (#9860)
Neon doesn't have seqscan detection of its own, so stop read_stream from
trying to utilize that readahead, and instead make it issue readahead of
its own.

## Problem

@knizhnik noticed that we didn't issue smgrprefetch[v] calls for
seqscans in PG17 due to the move to the read_stream API, which assumes
that the underlying IO facilities do seqscan detection for readahead.
That is a wrong assumption when Neon is involved, so let's remove the
code that applies that assumption.

## Summary of changes
Remove the cases where seqscans are detected and prefetch is disabled as
a consequence, and instead don't do that detection.

PG PR: https://github.com/neondatabase/postgres/pull/532
2024-12-11 00:51:05 +00:00
Matthias van de Meent
e71d20d392 Emit nbtree vacuum cycle id in nbtree xlog through forced FPIs (#9932)
This fixes neondatabase/neon#9929.

## Postgres repo PRS:
- PG17: https://github.com/neondatabase/postgres/pull/538
- PG16: https://github.com/neondatabase/postgres/pull/539
- PG15: https://github.com/neondatabase/postgres/pull/540
- PG14: https://github.com/neondatabase/postgres/pull/541

## Problem
see #9929 

## Summary of changes

We update the split code to force the code to emit an FPI whenever the
cycle ID might be interesting for concurrent btree vacuum.
2024-12-10 19:42:52 +00:00
Alex Chi Z.
aa0554fd1e feat(test_runner): allowed_errors in storage scrubber (#10062)
## Problem

resolve
https://github.com/neondatabase/neon/issues/9988#issuecomment-2528239437

## Summary of changes

* New verbose mode for storage scrubber scan metadata (pageserver) that
contains the error messages.
* Filter allowed_error list from the JSON output to determine the
healthy flag status.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-10 17:00:47 +00:00
Heikki Linnakangas
b853f78136 Print a log message if GetPage response takes too long (#10046)
We have metrics for GetPage request latencies, but this is an extra
measure to capture requests that take way too long in the logs. The log
message is printed every 10 s, until the response is received:

```
PG:2024-12-09 16:02:07.715 GMT [1782845] LOG:  [NEON_SMGR] [shard 0] no response received from pageserver for 10.000 s, still waiting (sent 10613 requests, received 10612 responses)
PG:2024-12-09 16:02:17.723 GMT [1782845] LOG:  [NEON_SMGR] [shard 0] no response received from pageserver for 20.008 s, still waiting (sent 10613 requests, received 10612 responses)
PG:2024-12-09 16:02:19.719 GMT [1782845] LOG:  [NEON_SMGR] [shard 0] received response from pageserver after 22.006 s
```
2024-12-10 16:26:56 +00:00
Alex Chi Z.
6ad99826c1 fix(pageserver): refresh_gc_info should always increase cutoff (#9862)
## Problem

close https://github.com/neondatabase/cloud/issues/19671

```
Timeline -----------------------------
         ^ last GC happened LSN
              ^ original retention period setting = 24hr
> refresh-gc-info updates the gc_info
              ^ planned cutoff (gc_info)
         ^ customer set retention to 48hr, and it's still within the last GC LSN
         ^1   ^2 we have two choices: (1) update the planned cutoff to
                 move backwards, or (2) keep the current one
```

In this patch, we decided to keep the current cutoff instead of moving
back the gc_info to avoid races. In the future, we could allow the
planned gc cutoff to go back once cplane sends a retention_history
tenant config update, but this requires a careful revisit of the code.

## Summary of changes

Ensure that GC cutoffs never go back if retention settings get changed.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-10 15:23:26 +00:00
Konstantin Knizhnik
311ee793b9 Fix handling in-flight requersts in prefetch buffer resize (#9968)
## Problem

See https://github.com/neondatabase/neon/issues/9961
Current implementation of prefetch buffer resize doesn't correctly
handle in-flight requests

## Summary of changes

1. Fix index of entry we should wait for if new prefetch buffer size is
smaller than number of in-flight requests.
2. Correctly set flush position

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-10 15:01:40 +00:00
Erik Grinaker
ad472bd4a1 test_runner: add visibility map test (#9940)
Verifies that visibility map pages are correctly maintained across
shards.

Touches #9914.
2024-12-10 12:07:00 +00:00
Arpad Müller
c51db1db61 Replace MAX_KEYS_PER_DELETE constant with function (#10061)
Azure has a different per-request limit of 256 items for bulk deletion
compared to the number of 1000 on AWS. Therefore, we need to support
multiple values. Due to `GenericRemoteStorage`, we can't add an
associated constant, but it has to be a function.

The PR replaces the `MAX_KEYS_PER_DELETE` constant with a function of
the same name, implemented on both the `RemoteStorage` trait as well as
on `GenericRemoteStorage`.

The value serves as hint of how many objects to pass to the
`delete_objects` function.

Reading:

* https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch
* https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html

Part of #7931
2024-12-10 11:29:38 +00:00
Ivan Efremov
34c1295594 [proxy] impr: Additional logging for cancellation queries (#10039)
## Problem
Since cancellation tasks spawned in the background sometimes logs
missing context.

https://neondb.slack.com/archives/C060N3SEF9D/p1733427801527419?thread_ts=1733419882.560159&cid=C060N3SEF9D

## Summary of changes
Add `session_id` and change loglevel for cancellation queries
2024-12-10 10:14:28 +00:00
Evan Fleming
b593e51eae safekeeper: use arc for global timelines and config (#10051)
Hello! I was interested in potentially making some contributions to Neon
and looking through the issue backlog I found
[8200](https://github.com/neondatabase/neon/issues/8200) which seemed
like a good first issue to attempt to tackle. I see it was assigned a
while ago so apologies if I'm stepping on any toes with this PR. I also
apologize for the size of this PR. I'm not sure if there is a simple way
to reduce it given the footprint of the components being changed.

## Problem
This PR is attempting to address part of the problem outlined in issue
[8200](https://github.com/neondatabase/neon/issues/8200). Namely to
remove global static usage of timeline state in favour of
`Arc<GlobalTimelines>` and to replace wasteful clones of
`SafeKeeperConf` with `Arc<SafeKeeperConf>`. I did not opt to tackle
`RemoteStorage` in this PR to minimize the amount of changes as this PR
is already quite large. I also did not opt to introduce an
`SafekeeperApp` wrapper struct to similarly minimize changes but I can
tackle either or both of these omissions in this PR if folks would like.

## Summary of changes
- Remove static usage of `GlobalTimelines` in favour of
`Arc<GlobalTimelines>`
- Wrap `SafeKeeperConf` in `Arc` to avoid wasteful clones of the
underlying struct

## Some additional thoughts
- We seem to currently store `SafeKeeperConf` in `GlobalTimelines` and
then expose it through a public`get_global_config` function which
requires locking. This seems needlessly wasteful and based on observed
usage we could remove this public accessor and force consumers to
acquire `SafeKeeperConf` through the new Arc reference.
2024-12-09 21:09:20 +00:00
Alex Chi Z.
4c4cb80186 fix(pageserver): fix gc-compaction racing with legacy gc (#10052)
## Problem

close https://github.com/neondatabase/neon/issues/10049, close
https://github.com/neondatabase/neon/issues/10030, close
https://github.com/neondatabase/neon/issues/8861

part of https://github.com/neondatabase/neon/issues/9114

The legacy gc process calls `get_latest_gc_cutoff`, which uses a Rcu
different than the gc_info struct. In the gc_compaction_smoke test case,
the "latest" cutoff could be lower than the gc_info struct, causing
gc-compaction to collect data that could be accessed by
`latest_gc_cutoff`. Technically speaking, there's nothing wrong with
gc-compaction using gc_info without considering latest_gc_cutoff,
because gc_info is the source of truth. But anyways, let's fix it.

## Summary of changes

* gc-compaction uses `latest_gc_cutoff` instead of gc_info to determine
the gc horizon.
* if a gc-compaction is scheduled via tenant compaction iteration, it
will take the gc_block lock to avoid racing with functionalities like
detach ancestor (if it's triggered via manual compaction API without
scheduling, then it won't take the lock)

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-12-09 20:06:06 +00:00
a-masterov
92273b6d5e Enable the pg_regress tests on staging for PG17 (#9978)
## Problem
Currently, we run the `pg_regress` tests only for PG16
However, PG17 is a part of Neon and should be tested as well 
## Summary of changes
Modified the workflow and added a patch for PG17 enabling the
`pg_regress` tests.
The problem with leftovers was solved by using branches.
2024-12-09 19:30:39 +00:00
Arpad Müller
e74e7aac93 Use updated patched azure SDK crates (#10036)
For a while already, we've been unable to update the Azure SDK crates
due to Azure adopting use of a non-tokio async runtime, see #7545.

The effort to upstream the fix got stalled, and I think it's better to
switch to a patched version of the SDK that is up to date.

Now we have a fork of the SDK under the neondatabase github org, to
which I have applied Conrad's rebased patches to:
https://github.com/neondatabase/azure-sdk-for-rust/tree/neon .

The existence of a fork will also help with shipping bulk delete support
before it's upstreamed (#7931).

Also, in related news, the Azure SDK has gotten a rift in development,
where the main branch pertains to a future, to-be-officially-blessed
release of the SDK, and the older versions, which we are currently
using, are on the `legacy` branch. Upstream doesn't really want patches
for the `legacy` branch any more, they want to focus on the `main`
efforts. However, even then, the `legacy` branch is still newer than
what we are having right now, so let's switch to `legacy` for now.

Depending on how long it takes, we can switch to the official version of
the SDK once it's released or switch to the upstream `main` branch if
there is changes we want before that.

As a nice side effect of this PR, we now use reqwest 0.12 everywhere,
dropping the dependency on version 0.11.

Fixes #7545
2024-12-09 15:50:06 +00:00
Vlad Lazar
4cca5cdb12 deps: update url to 2.5.4 for RUSTSEC-2024-0421 (#10059)
## Problem

See https://rustsec.org/advisories/RUSTSEC-2024-0421

## Summary of changes

Update url crate to 2.5.4.
2024-12-09 14:57:42 +00:00
Arpad Müller
9d425b54f7 Update AWS SDK crates (#10056)
Result of running:

cargo update -p aws-types -p aws-sigv4 -p aws-credential-types -p
aws-smithy-types -p aws-smithy-async -p aws-sdk-kms -p aws-sdk-iam -p
aws-sdk-s3 -p aws-config

We want to keep the AWS SDK up to date as that way we benefit from new
developments and improvements.
2024-12-09 12:46:59 +00:00
John Spray
ec790870d5 storcon: automatically clear Pause/Stop scheduling policies to enable detaches (#10011)
## Problem

We saw a tenant get stuck when it had been put into Pause scheduling
mode to pin it to a pageserver, then it was left idle for a while and
the control plane tried to detach it.

Close: https://github.com/neondatabase/neon/issues/9957

## Summary of changes

- When changing policy to Detached or Secondary, set the scheduling
policy to Active.
- Add a test that exercises this
- When persisting tenant shards, set their `generation_pageserver` to
null if the placement policy is not Attached (this enables consistency
checks to work, and avoids leaving state in the DB that could be
confusing/misleading in future)
2024-12-07 13:05:09 +00:00
Christian Schwarz
4d7111f240 page_service: don't count time spent flushing towards smgr latency metrics (#10042)
## Problem

In #9962 I changed the smgr metrics to include time spent on flush.

It isn't under our (=storage team's) control how long that flush takes
because the client can stop reading requests.

## Summary of changes

Stop the timer as soon as we've buffered up the response in the
`pgb_writer`.

Track flush time in a separate metric.

---------

Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com>
2024-12-07 08:57:55 +00:00
Alex Chi Z.
b1fd086c0c test(pageserver): disable gc_compaction smoke test for now (#10045)
## Problem

The test is flaky.

## Summary of changes

Disable the test.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-06 22:30:04 +00:00
Heikki Linnakangas
b6eea65597 Fix error message if PS connection is lost while receiving prefetch (#9923)
If the pageserver connection is lost while receiving the prefetch
request, the prefetch queue is cleared. The error message prints the
values from the prefetch slot, but because the slot was already cleared,
they're all zeros:

LOG: [NEON_SMGR] [shard 0] No response from reading prefetch entry 0:
0/0/0.0 block 0. This can be caused by a concurrent disconnect

To fix, make local copies of the values.

In the passing, also add a sanity check that if the receive() call
succeeds, the prefetch slot is still intact.
2024-12-06 20:56:57 +00:00
Alex Chi Z.
c42c28b339 feat(pageserver): gc-compaction split job and partial scheduler (#9897)
## Problem

part of https://github.com/neondatabase/neon/issues/9114, stacked PR
over #9809

The compaction scheduler now schedules partial compaction jobs.

## Summary of changes

* Add the compaction job splitter based on size.
* Schedule subcompactions using the compaction scheduler.
* Test subcompaction scheduler in the smoke regress test.
* Temporarily disable layer map checks

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-12-06 18:44:26 +00:00
Tristan Partin
e4837b0a5a Bump sql_exporter to 0.16.0 (#10041)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-06 17:43:55 +00:00
Erik Grinaker
14c4fae64a test_runner/performance: add improved bulk insert benchmark (#9812)
Adds an improved bulk insert benchmark, including S3 uploads.

Touches #9789.
2024-12-06 15:17:15 +00:00
Vlad Lazar
cc70fc802d pageserver: add metric for number of wal records received by each shard (#10035)
## Problem

With the current metrics we can't identify which shards are ingesting
data at any given time.

## Summary of changes

Add a metric for the number of wal records received for processing by
each shard. This is per (tenant, timeline, shard).
2024-12-06 12:51:41 +00:00
Alexey Kondratov
fa07097f2f chore: Reorganize and refresh CODEOWNERS (#10008)
## Problem

We didn't have a codeowner for `/compute`, so nobody was auto-assigned
for PRs like #9973

## Summary of changes

While on it:
1. Group codeowners into sections.
2. Remove control plane from the `/compute_tools` because it's primarily
the internal `compute_ctl` code.
3. Add control plane (and compute) to `/libs/compute_api` because that's
the shared public interface of the compute.
2024-12-06 11:44:50 +00:00
Erik Grinaker
7838659197 pageserver: assert that keys belong to shard (#9943)
We've seen cases where stray keys end up on the wrong shard. This
shouldn't happen. Add debug assertions to prevent this. In release
builds, we should be lenient in order to handle changing key ownership
policies.

Touches #9914.
2024-12-06 10:24:13 +00:00
Vlad Lazar
3f1c542957 pageserver: add disk consistent and remote lsn metrics (#10005)
## Problem

There's no metrics for disk consistent LSN and remote LSN. This stuff is
useful when looking at ingest performance.

## Summary of changes

Two per timeline metrics are added: `pageserver_disk_consistent_lsn` and
`pageserver_projected_remote_consistent_lsn`. I went for the projected
remote lsn instead of the visible one
because that more closely matches remote storage write tput. Ideally we
would have both, but these metrics are expensive.
2024-12-06 10:21:52 +00:00
Erik Grinaker
ec4072f845 pageserver: add wait_until_flushed parameter for timeline checkpoint (#10013)
## Problem

I'm writing an ingest benchmark in #9812. To time S3 uploads, I need to
schedule a flush of the Pageserver's in-memory layer, but don't actually
want to wait around for it to complete (which will take a minute).

## Summary of changes

Add a parameter `wait_until_flush` (default `true`) for
`timeline/checkpoint` to control whether to wait for the flush to
complete.
2024-12-06 10:12:39 +00:00
Erik Grinaker
56f867bde5 pageserver: only zero truncated FSM page on owning shard (#10032)
## Problem

FSM pages are managed like regular relation pages, and owned by a single
shard. However, when truncating the FSM relation the last FSM page was
zeroed out on all shards. This is unnecessary and potentially confusing.

The superfluous keys will be removed during compactions, as they do not
belong on these shards.

Resolves #10027.

## Summary of changes

Only zero out the truncated FSM page on the owning shard.
2024-12-06 07:22:22 +00:00
Arpad Müller
d1ab7471e2 Fix desc_str for Azure container (#10021)
Small logs fix I've noticed while working on
https://github.com/neondatabase/cloud/issues/19963 .
2024-12-05 20:51:57 +00:00
Tristan Partin
6ff4175fd7 Send Content-Type header on reconfigure request from neon_local (#10029)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-05 20:30:35 +00:00
Tristan Partin
6331cb2161 Bump anyhow to 1.0.94 (#10028)
We were over a year out of date.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-05 19:42:52 +00:00
Alex Chi Z.
71f38d1354 feat(pageserver): support schedule gc-compaction (#9809)
## Problem

part of https://github.com/neondatabase/neon/issues/9114

gc-compaction can take a long time. This patch adds support for
scheduling a gc-compaction job. The compaction loop will first handle
L0->L1 compaction, and then gc compaction. The scheduled jobs are stored
in a non-persistent queue within the tenant structure.

This will be the building block for the partial compaction trigger -- if
the system determines that we need to do a gc compaction, it will
partition the keyspace and schedule several jobs. Each of these jobs
will run for a short amount of time (i.e, 1 min). L0 compaction will be
prioritized over gc compaction.

## Summary of changes
 
* Add compaction scheduler in tenant.
* Run scheduled compaction in integration tests.
* Change the manual compaction API to allow schedule a compaction
instead of immediately doing it.
* Add LSN upper bound as gc-compaction parameter. If we schedule partial
compactions, gc_cutoff might move across different runs. Therefore, we
need to pass a pre-determined gc_cutoff beforehand. (TODO: support LSN
lower bound so that we can compact arbitrary "rectangle" in the layer
map)
* Refactor the gc_compaction internal interface.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-12-05 19:37:17 +00:00
Tristan Partin
c0ba416967 Add compute_logical_snapshots_bytes metric (#9887)
This metric exposes the size of all non-temporary logical snapshot
files.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-05 19:04:33 +00:00
Alexey Kondratov
13e8105740 feat(compute): Allow specifying the reconfiguration concurrency (#10006)
## Problem

We need a higher concurrency during reconfiguration in case of many DBs,
but the instance is already running and used by the client. We can
easily get out of `max_connections` limit, and the current code won't
handle that.

## Summary of changes

Default to 1, but also allow control plane to override this value for
specific projects. It's also recommended to bump
`superuser_reserved_connections` += `reconfigure_concurrency` for such
projects to ensure that we always have enough spare connections for
reconfiguration process to succeed.

Quick workaround for neondatabase/cloud#17846
2024-12-05 17:57:25 +00:00
Erik Grinaker
db79304416 storage_controller: increase shard scan timeout (#10000)
## Problem

The node shard scan timeout of 1 second is a bit too aggressive, and
we've seen this cause test failures. The scans are performed in parallel
across nodes, and the entire operation has a 15 second timeout.

Resolves #9801.

## Summary of changes

Increase the timeout to 5 seconds. This is still enough to time out on a
network failure and retry successfully within 15 seconds.
2024-12-05 17:29:21 +00:00
Ivan Efremov
b547681e08 Merge pull request #10024 from neondatabase/rc/release-proxy/2024-12-05
Proxy release 2024-12-05
2024-12-05 15:35:35 +02:00
Ivan Efremov
0fd211537b proxy: Present new auth backend cplane_proxy_v1 (#10012)
Implement a new auth backend based on the current Neon backend to switch
to the new Proxy V1 cplane API.

Implements [#21048](https://github.com/neondatabase/cloud/issues/21048)
2024-12-05 13:00:40 +02:00
Yuchen Liang
a83bd4e81c pageserver: fix buffered-writer on macos build (#10019)
## Problem

In https://github.com/neondatabase/neon/pull/9693, we forgot to check
macos build. The [CI
run](https://github.com/neondatabase/neon/actions/runs/12164541897/job/33926455468)
on main showed that macos build failed with unused variables and dead
code.

## Summary of changes

- add `allow(dead_code)` and `allow(unused_variables)` to the relevant
code that is not used on macos.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-12-05 13:00:40 +02:00
Conrad Ludgate
ecdad5e6d5 chore: update rust-postgres (#10002)
Like #9931 but without rebasing upstream just yet, to try and minimise
the differences.

Removes all proxy-specific commits from the rust-postgres fork, now that
proxy no longer depends on them. Merging upstream changes to come later.
2024-12-05 13:00:40 +02:00
Conrad Ludgate
d028929945 chore: update clap (#10009)
This updates clap to use a new version of anstream
2024-12-05 13:00:40 +02:00
Yuchen Liang
7b0e3db868 pageserver: make BufferedWriter do double-buffering (#9693)
Closes #9387.

## Problem

`BufferedWriter` cannot proceed while the owned buffer is flushing to
disk. We want to implement double buffering so that the flush can happen
in the background. See #9387.

## Summary of changes

- Maintain two owned buffers in `BufferedWriter`.
- The writer is in charge of copying the data into owned, aligned
buffer, once full, submit it to the flush task.
- The flush background task is in charge of flushing the owned buffer to
disk, and returned the buffer to the writer for reuse.
- The writer and the flush background task communicate through a
bi-directional channel.

For in-memory layer, we also need to be able to read from the buffered
writer in `get_values_reconstruct_data`. To handle this case, we did the
following
- Use replace `VirtualFile::write_all` with `VirtualFile::write_all_at`,
and use `Arc` to share it between writer and background task.
- leverage `IoBufferMut::freeze` to get a cheaply clonable `IoBuffer`,
one clone will be submitted to the channel, the other clone will be
saved within the writer to serve reads. When we want to reuse the
buffer, we can invoke `IoBuffer::into_mut`, which gives us back the
mutable aligned buffer.
- InMemoryLayer reads is now aware of the maybe_flushed part of the
buffer.

**Caveat**

- We removed the owned version of write, because this interface does not
work well with buffer alignment. The result is that without direct IO
enabled,
[`download_object`](a439d57050/pageserver/src/tenant/remote_timeline_client/download.rs (L243))
does one more memcpy than before this PR due to the switch to use
`_borrowed` version of the write.
- "Bypass aligned part of write" could be implemented later to avoid
large amount of memcpy.

**Testing**
- use an oneshot channel based control mechanism to make flush behavior
deterministic in test.
- test reading from `EphemeralFile` when the last submitted buffer is
not flushed, in-progress, and done flushing to disk.


## Performance


We see performance improvement for small values, and regression on big
values, likely due to being CPU bound + disk write latency.


[Results](https://www.notion.so/neondatabase/Benchmarking-New-BufferedWriter-11-20-2024-143f189e0047805ba99acda89f984d51?pvs=4)


## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-12-05 13:00:40 +02:00
John Spray
088eb72dd7 tests: make storcon scale test AZ-aware (#9952)
## Problem

We have a scale test for the storage controller which also acts as a
good stress test for scheduling stability. However, it created nodes
with no AZs set.

## Summary of changes

- Bump node count to 6 and set AZs on them.

This is a precursor to other AZ-related PRs, to make sure any new code
that's landed is getting scale tested in an AZ-aware environment.
2024-12-05 13:00:40 +02:00
a-masterov
d550e3f626 Create a branch for compute release (#9637)
## Problem
We practice a manual release flow for the compute module. This will
allow automation of the compute release process.

## Summary of changes
The workflow was modified to make a compute release automatically on the
branch release-compute.
## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2024-12-05 13:00:40 +02:00
Erik Grinaker
8c6b41daf5 Display reqwest error source (#10004)
## Problem

Reqwest errors don't include details about the inner source error. This
means that we get opaque errors like:

```
receive body: error sending request for url (http://localhost:9898/v1/location_config)
```

Instead of the more helpful:

```
receive body: error sending request for url (http://localhost:9898/v1/location_config): operation timed out
```

Touches #9801.

## Summary of changes

Include the source error for `reqwest::Error` wherever it's displayed.
2024-12-05 13:00:40 +02:00
Alexey Kondratov
bbb050459b feat(compute): Set default application_name for pgbouncer connections (#9973)
## Problem

When client specifies `application_name`, pgbouncer propagates it to the
Postgres. Yet, if client doesn't do it, we have hard time figuring out
who opens a lot of Postgres connections (including the `cloud_admin`
ones).

See this investigation as an example:
https://neondb.slack.com/archives/C0836R0RZ0D

## Summary of changes

I haven't found this documented, but it looks like pgbouncer accepts
standard Postgres connstring parameters in the connstring in the
`[databases]` section, so put the default `application_name=pgbouncer`
there. That way, we will always see who opens Postgres connections. I
did tests, and if client specifies a `application_name`, pgbouncer
overrides this default, so it only works if it's not specified or set to
blank `&application_name=` in the connection string.

This is the last place we could potentially open some Postgres
connections without `application_name`. Everything else should be either
of two:
1. Direct client connections without `application_name`, but these
should be strictly non-`cloud_admin` ones
2. Some ad-hoc internal connections, so if we see spikes of unidentified
`cloud_admin` connections, we will need to investigate it again.

Fixes neondatabase/cloud#20948
2024-12-05 13:00:40 +02:00
Conrad Ludgate
cab498c787 feat(proxy): add option to forward startup params (#9979)
(stacked on #9990 and #9995)

Partially fixes #1287 with a custom option field to enable the fixed
behaviour. This allows us to gradually roll out the fix without silently
changing the observed behaviour for our customers.

related to https://github.com/neondatabase/cloud/issues/15284
2024-12-05 13:00:40 +02:00
Folke Behrens
6359342ffb Assign /libs/proxy/ to proxy team (#10003) 2024-12-05 13:00:40 +02:00
Erik Grinaker
13285c2a5e pageserver: return proper status code for heatmap_upload errors (#9991)
## Problem

During deploys, we see a lot of 500 errors due to heapmap uploads for
inactive tenants. These should be 503s instead.

Resolves #9574.

## Summary of changes

Make the secondary tenant scheduler use `ApiError` rather than
`anyhow::Error`, to propagate the tenant error and convert it to an
appropriate status code.
2024-12-05 13:00:40 +02:00
Peter Bendel
33790d14a3 fix parsing human time output like "50m37s" (#10001)
## Problem

In ingest_benchmark.yml workflow we use pgcopydb tool to migrate
project.
pgcopydb logs human time.

Our parsing of the human time doesn't work for times like "50m37s".

[Example
workflow](https://github.com/neondatabase/neon/actions/runs/12145539948/job/33867418065#step:10:479)

contains "57m45s"

but we
[reported](https://github.com/neondatabase/neon/actions/runs/12145539948/job/33867418065#step:10:500)
only the seconds part: 
45.000 s


## Summary of changes

add a regex pattern for Minute/Second combination
2024-12-05 13:00:40 +02:00
Peter Bendel
709b8cd371 optimize parms for ingest bench (#9999)
## Problem

we tried different parallelism settings for ingest bench 

## Summary of changes

the following settings seem optimal after merging
- SK side Wal filtering
- batched getpages

Settings:
- effective_io_concurrency 100
- concurrency limit 200 (different from Prod!)
- jobs 4, maintenance workers 7
- 10 GB chunk size
2024-12-05 13:00:40 +02:00
Vlad Lazar
1c9bbf1a92 storcon: return an error for drain attempts while paused (#9997)
## Problem

We currently allow drain operations to proceed while the node policy is
paused.

## Summary of changes

Return a precondition failed error in such cases. The orchestrator is
updated in https://github.com/neondatabase/infra/pull/2544 to skip drain
and fills if the pageserver is paused.

Closes: https://github.com/neondatabase/neon/issues/9907
2024-12-05 13:00:40 +02:00
Christian Schwarz
16163fb850 page_service: enable batching in Rust & Python Tests + Python benchmarks (#9993)
This is the first step towards batching rollout.

Refs

- rollout plan: https://github.com/neondatabase/cloud/issues/20620
- task https://github.com/neondatabase/neon/issues/9377
- uber-epic: https://github.com/neondatabase/neon/issues/9376
2024-12-05 13:00:40 +02:00
Alexander Bayandin
73ccc2b08c test_page_service_batching: fix non-numeric metrics (#9998)
## Problem

```
2024-12-03T15:42:46.5978335Z + poetry run python /__w/neon/neon/scripts/ingest_perf_test_result.py --ingest /__w/neon/neon/test_runner/perf-report-local
2024-12-03T15:42:49.5325077Z Traceback (most recent call last):
2024-12-03T15:42:49.5325603Z   File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 165, in <module>
2024-12-03T15:42:49.5326029Z     main()
2024-12-03T15:42:49.5326316Z   File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 155, in main
2024-12-03T15:42:49.5326739Z     ingested = ingest_perf_test_result(cur, item, recorded_at_timestamp)
2024-12-03T15:42:49.5327488Z                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-12-03T15:42:49.5327914Z   File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 99, in ingest_perf_test_result
2024-12-03T15:42:49.5328321Z     psycopg2.extras.execute_values(
2024-12-03T15:42:49.5328940Z   File "/github/home/.cache/pypoetry/virtualenvs/non-package-mode-_pxWMzVK-py3.11/lib/python3.11/site-packages/psycopg2/extras.py", line 1299, in execute_values
2024-12-03T15:42:49.5335618Z     cur.execute(b''.join(parts))
2024-12-03T15:42:49.5335967Z psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type numeric: "concurrent-futures"
2024-12-03T15:42:49.5336287Z LINE 57:             'concurrent-futures',
2024-12-03T15:42:49.5336462Z                      ^
```

## Summary of changes
- `test_page_service_batching`: save non-numeric params as `labels`
- Add a runtime check that `metric_value` is NUMERIC
2024-12-05 13:00:40 +02:00
Christian Schwarz
c719be6474 tests & benchmarks: unify the way we customize the default tenant config (#9992)
Before this PR, some override callbacks used `.default()`, others
used `.setdefault()`.

As of this PR, all callbacks use `.setdefault()` which I think is least
prone to failure.

Aligning on a single way will set the right example for future tests
that need such customization.

The `test_pageserver_getpage_throttle.py` technically is a change in
behavior: before, it replaced the `tenant_config` field, now it just
configures the throttle. This is what I believe is intended anyway.
2024-12-05 13:00:40 +02:00
Arpad Müller
718645e56c Support tenant manifests in the scrubber (#9942)
Support tenant manifests in the storage scrubber:

* list the manifests, order them by generation
* delete all manifests except for the two most recent generations
* for the latest manifest: try parsing it.

I've tested this patch by running the against a staging bucket and it
successfully deleted stuff (and avoided deleting the latest two
generations).

In follow-up work, we might want to also check some invariants of the
manifest, as mentioned in #8088.

Part of #9386
Part of #8088

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-12-05 13:00:40 +02:00
Conrad Ludgate
fbc8c36983 chore(proxy): enforce single host+port (#9995)
proxy doesn't ever provide multiple hosts/ports, so this code adds a lot
of complexity of error handling for no good reason.

(stacked on #9990)
2024-12-05 13:00:40 +02:00
Alexey Immoreev
5519e42612 Improvement: add console redirect timeout warning (#9985)
## Problem

There is no information on session being cancelled in 2 minutes at the
moment

## Summary of changes

The timeout being logged for the user
2024-12-05 13:00:40 +02:00
Erik Grinaker
4157eaf4c5 pageserver: respond to multiple shutdown signals (#9982)
## Problem

The Pageserver signal handler would only respond to a single signal and
initiate shutdown. Subsequent signals were ignored. This meant that a
`SIGQUIT` sent after a `SIGTERM` had no effect (e.g. in the case of a
slow or stalled shutdown). The `test_runner` uses this to force shutdown
if graceful shutdown is slow.

Touches #9740.

## Summary of changes

Keep responding to signals after the initial shutdown signal has been
received.

Arguably, the `test_runner` should also use `SIGKILL` rather than
`SIGQUIT` in this case, but it seems reasonable to respond to `SIGQUIT`
regardless.
2024-12-05 13:00:40 +02:00
Conrad Ludgate
60241127e2 chore(proxy): remove postgres config parser and md5 support (#9990)
Keeping the `mock` postgres cplane adaptor using "stock" tokio-postgres
allows us to remove a lot of dead weight from our actual postgres
connection logic.
2024-12-05 13:00:40 +02:00
John Spray
f7d5322e8b pageserver: more detailed logs when calling re-attach (#9996)
## Problem

We saw a peculiar case where a pageserver apparently got a 0-tenant
response to `/re-attach` but we couldn't see the request landing on a
storage controller. It was hard to confirm retrospectively that the
pageserver was configured properly at the moment it sent the request.

## Summary of changes

- Log the URL to which we are sending the request
- Log the NodeId and metadata that we sent
2024-12-05 13:00:40 +02:00
John Spray
41bb9c5280 pageserver: only store SLRUs & aux files on shard zero (#9786)
## Problem

Since https://github.com/neondatabase/neon/pull/9423 the non-zero shards
no longer need SLRU content in order to do GC. This data is now
redundant on shards >0.

One release cycle after merging that PR, we may merge this one, which
also stops writing those pages to shards > 0, reaping the efficiency
benefit.

Closes: https://github.com/neondatabase/neon/issues/7512
Closes: https://github.com/neondatabase/neon/issues/9641

## Summary of changes

- Avoid storing SLRUs on non-zero shards
- Bonus: avoid storing aux files on non-zero shards
2024-12-05 13:00:40 +02:00
John Spray
69c0d61c5c storcon: in shard splits, inherit parent's AZ (#9946)
## Problem

Sharded tenants should be run in a single AZ for best performance, so
that computes have AZ-local latency to all the shards.

Part of https://github.com/neondatabase/neon/issues/8264

## Summary of changes

- When we split a tenant, instead of updating each shard's preferred AZ
to wherever it is scheduled, propagate the preferred AZ from the parent.
- Drop the check in `test_shard_preferred_azs` that asserts shards end
up in their preferred AZ: this will not be true again until the
optimize_attachment logic is updated to make this so. The existing check
wasn't testing anything about scheduling, it was just asserting that we
set preferred AZ in a way that matches the way things happen to be
scheduled at time of split.
2024-12-05 13:00:40 +02:00
Christian Schwarz
63cb8ce975 pageserver: only throttle pagestream requests & bring back throttling deduction for smgr latency metrics (#9962)
## Problem

In the batching PR 
- https://github.com/neondatabase/neon/pull/9870

I stopped deducting the time-spent-in-throttle fro latency metrics,
i.e.,
- smgr latency metrics (`SmgrOpTimer`)
- basebackup latency (+scan latency, which I think is part of
basebackup).

The reason for stopping the deduction was that with the introduction of
batching, the trick with tracking time-spent-in-throttle inside
RequestContext and swap-replacing it from the `impl Drop for
SmgrOpTimer` no longer worked with >1 requests in a batch.

However, deducting time-spent-in-throttle is desirable because our
internal latency SLO definition does not account for throttling.

## Summary of changes

- Redefine throttling to be a page_service pagestream request throttle
instead of a throttle for repository `Key` reads through `Timeline::get`
/ `Timeline::get_vectored`.
- This means reads done by `basebackup` are no longer subject to any
throttle.
- The throttle applies after batching, before handling of the request.
- Drive-by fix: make throttle sensitive to cancellation.
- Rename metric label `kind` from `timeline_get` to `pagestream` to
reflect the new scope of throttling.

To avoid config format breakage, we leave the config field named
`timeline_get_throttle` and ignore the `task_kinds` field.
This will be cleaned up in a future PR.

## Trade-Offs

Ideally, we would apply the throttle before reading a request off the
connection, so that we queue the minimal amount of work inside the
process.
However, that's not possible because we need to do shard routing.

The redefinition of the throttle to limit pagestream request rate
instead of repository `Key` rate comes with several downsides:
- We're no longer able to use the throttle mechanism for other other
tasks, e.g. image layer creation.
  However, in practice, we never used that capability anyways.
- We no longer throttle basebackup.
2024-12-05 13:00:40 +02:00
Erik Grinaker
907e4aa3c4 test_runner: use immediate shutdown in test_sharded_ingest (#9984)
## Problem

`test_sharded_ingest` ingests a lot of data, which can cause shutdown to
be slow e.g. due to local "S3 uploads" or compactions. This can cause
test flakes during teardown.

Resolves #9740.

## Summary of changes

Perform an immediate shutdown of the cluster.
2024-12-05 13:00:40 +02:00
Erik Grinaker
0a2a84b766 safekeeper,pageserver: add heap profiling (#9778)
## Problem

We don't have good observability for memory usage. This would be useful
e.g. to debug OOM incidents or optimize performance or resource usage.

We would also like to use continuous profiling with e.g. [Grafana Cloud
Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/)
(see https://github.com/neondatabase/cloud/issues/14888).

This PR is intended as a proof of concept, to try it out in staging and
drive further discussions about profiling more broadly.

Touches https://github.com/neondatabase/neon/issues/9534.
Touches https://github.com/neondatabase/cloud/issues/14888.
Depends on #9779.
Depends on #9780.

## Summary of changes

Adds a HTTP route `/profile/heap` that takes a heap profile and returns
it. Query parameters:

* `format`: output format (`jemalloc` or `pprof`; default `pprof`).

Unlike CPU profiles (see #9764), heap profiles are not symbolized and
require the original binary to translate addresses to function names. To
make this work with Grafana, we'll probably have to symbolize the
process server-side -- this is left as future work, as is other output
formats like SVG.

Heap profiles don't work on macOS due to limitations in jemalloc.
2024-12-05 13:00:40 +02:00
a-masterov
85b12ddd52 Add support for the extensions test for Postgres v17 (#9748)
## Problem
The extensions for Postgres v17 are ready but we do not test the
extensions shipped with v17
## Summary of changes
Build the test image based on Postgres v17. Run the tests for v17.

---------

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2024-12-05 13:00:40 +02:00
Christian Schwarz
dd76f1eeee page_service: batching observability & include throttled time in smgr metrics (#9870)
This PR 

- fixes smgr metrics https://github.com/neondatabase/neon/issues/9925 
- adds an additional startup log line logging the current batching
config
- adds a histogram of batch sizes global and per-tenant
- adds a metric exposing the current batching config

The issue described #9925 is that before this PR, request latency was
only observed *after* batching.
This means that smgr latency metrics (most importantly getpage latency)
don't account for
- `wait_lsn` time 
- time spent waiting for batch to fill up / the executor stage to pick
up the batch.

The fix is to use a per-request batching timer, like we did before the
initial batching PR.
We funnel those timers through the entire request lifecycle.

I noticed that even before the initial batching changes, we weren't
accounting for the time spent writing & flushing the response to the
wire.
This PR drive-by fixes that deficiency by dropping the timers at the
very end of processing the batch, i.e., after the `pgb.flush()` call.

I was **unable to maintain the behavior that we deduct
time-spent-in-throttle from various latency metrics.
The reason is that we're using a *single* counter in `RequestContext` to
track micros spent in throttle.
But there are *N* metrics timers in the batch, one per request.
As a consequence, the practice of consuming the counter in the drop
handler of each timer no longer works because all but the first timer
will encounter error `close() called on closed state`.
A failed attempt to maintain the current behavior can be found in
https://github.com/neondatabase/neon/pull/9951.

So, this PR remvoes the deduction behavior from all metrics.
I started a discussion on Slack about it the implications this has for
our internal SLO calculation:
https://neondb.slack.com/archives/C033RQ5SPDH/p1732910861704029

# Refs

- fixes https://github.com/neondatabase/neon/issues/9925
- sub-issue https://github.com/neondatabase/neon/issues/9377
- epic: https://github.com/neondatabase/neon/issues/9376
2024-12-05 13:00:40 +02:00
Christian Schwarz
8963ac85f9 storcon_cli tenant-describe: include tenant-wide information in output (#9899)
Before this PR, the storcon_cli didn't have a way to show the
tenant-wide information of the TenantDescribeResponse.

Sadly, the `Serialize` impl for the tenant config doesn't skip on
`None`, so, the output becomes a bit bloated.
Maybe we can use `skip_serializing_if(Option::is_none)` in the future.
=> https://github.com/neondatabase/neon/issues/9983
2024-12-05 13:00:40 +02:00
John Spray
4a488b3e24 storcon: use proper schedule context during node delete (#9958)
## Problem

I was touching `test_storage_controller_node_deletion` because for AZ
scheduling work I was adding a change to the storage controller (kick
secondaries during optimisation) that made a FIXME in this test defunct.
While looking at it I also realized that we can easily fix the way node
deletion currently doesn't use a proper ScheduleContext, using the
iterator type recently added for that purpose.

## Summary of changes

- A testing-only behavior in storage controller where if a secondary
location isn't yet ready during optimisation, it will be actively
polled.
- Remove workaround in `test_storage_controller_node_deletion` that
previously was needed because optimisation would get stuck on cold
secondaries.
- Update node deletion code to use a `TenantShardContextIterator` and
thereby a proper ScheduleContext
2024-12-05 13:00:40 +02:00
Alexey Kondratov
c4987b0b13 fix(testing): Use 1 MB shared_buffers even with LFC (#9969)
## Problem

After enabling LFC in tests and lowering `shared_buffers` we started
having more problems with `test_pg_regress`.

## Summary of changes

Set `shared_buffers` to 1MB to both exercise getPage requests/LFC, and
still have enough room for Postgres to operate. Everything smaller might
be not enough for Postgres under load, and can cause errors like 'no
unpinned buffers available'.

See Konstantin's comment [1] as well.

Fixes #9956

[1]:
https://github.com/neondatabase/neon/issues/9956#issuecomment-2511608097
2024-12-05 13:00:40 +02:00
Tristan Partin
84b4821118 Stop changing the value of neon.extension_server_port at runtime (#9972)
On reconfigure, we no longer passed a port for the extension server
which caused us to not write out the neon.extension_server_port line.
Thus, Postgres thought we were setting the port to the default value of
0. PGC_POSTMASTER GUCs cannot be set at runtime, which causes the
following log messages:

> LOG: parameter "neon.extension_server_port" cannot be changed without
restarting the server
> LOG: configuration file
"/var/db/postgres/compute/pgdata/postgresql.conf" contains errors;
unaffected changes were applied

Fixes: https://github.com/neondatabase/neon/issues/9945

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-05 13:00:40 +02:00
Conrad Ludgate
32ba9811f9 feat(proxy): emit JWT auth method and JWT issuer in parquet logs (#9971)
Fix the HTTP AuthMethod to accomodate the JWT authorization method.
Introduces the JWT issuer as an additional field in the parquet logs
2024-12-05 13:00:40 +02:00
Folke Behrens
a0cd64c4d3 Bump OTel, tracing, reqwest crates (#9970) 2024-12-05 13:00:40 +02:00
Arseny Sher
84687b743d Update consensus protocol spec (#9607)
The spec was written for the buggy protocol which we had before the one
more similar to Raft was implemented. Update the spec with what we
currently have.

ref https://github.com/neondatabase/neon/issues/8699
2024-12-05 13:00:40 +02:00
Folke Behrens
b6f93dcec9 proxy: Create Elasticache credentials provider lazily (#9967)
## Problem

The credentials providers tries to connect to AWS STS even when we use
plain Redis connections.

## Summary of changes

* Construct the CredentialsProvider only when needed ("irsa").
2024-12-05 13:00:40 +02:00
Alexander Bayandin
4f6c594973 CI(replication-tests): fix notifications about replication-tests failures (#9950)
## Problem

`if: ${{ github.event.schedule }}` gets skipped if a previous step has
failed, but we want to run the step for both `success` and `failure`

## Summary of changes
- Add `!cancelled()` to notification step if-condition, to skip only
cancelled jobs
2024-12-05 13:00:40 +02:00
Conrad Ludgate
a750c14735 fix(proxy): forward notifications from authentication (#9948)
Fixes https://github.com/neondatabase/cloud/issues/20973. 

This refactors `connect_raw` in order to return direct access to the
delayed notices.

I cannot find a way to test this with psycopg2 unfortunately, although
testing it with psql does return the expected results.
2024-12-05 13:00:40 +02:00
John Spray
9ce0dd4e55 storcon: add metric for AZ scheduling violations (#9949)
## Problem

We can't easily tell how far the state of shards is from their AZ
preferences. This can be a cause of performance issues, so it's
important for diagnosability that we can tell easily if there are
significant numbers of shards that aren't running in their preferred AZ.

Related: https://github.com/neondatabase/cloud/issues/15413

## Summary of changes

- In reconcile_all, count shards that are scheduled into the wrong AZ
(if they have a preference), and publish it as a prometheus gauge.
- Also calculate a statistic for how many shards wanted to reconcile but
couldn't.

This is clearly a lazy calculation: reconcile all only runs
periodically. But that's okay: shards in the wrong AZ is something that
only matters if it stays that way for some period of time.
2024-12-05 13:00:40 +02:00
Erik Grinaker
0e1a336607 test_runner: improve wait_until (#9936)
Improves `wait_until` by:

* Use `timeout` instead of `iterations`. This allows changing the
timeout/interval parameters independently.
* Make `timeout` and `interval` optional (default 20s and 0.5s). Most
callers don't care.
* Only output status every 1s by default, and add optional
`status_interval` parameter.
* Remove `show_intermediate_error`, this was always emitted anyway.

Most callers have been updated to use the defaults, except where they
had good reason otherwise.
2024-12-05 13:00:40 +02:00
Anastasia Lubennikova
7fc2912d06 Update pgvector to 0.8.0 (#9733) 2024-12-05 13:00:40 +02:00
John Spray
fdf231c237 storcon: don't take any Service locks in /status and /ready (#9944)
## Problem

We saw unexpected container terminations when running in k8s with with
small CPU resource requests.

The /status and /ready handlers called `maybe_forward`, which always
takes the lock on Service::inner.

If there is a lot of writer lock contention, and the container is
starved of CPU, this increases the likelihood that we will get killed by
the kubelet.

It isn't certain that this was a cause of issues, but it is a potential
source that we can eliminate.

## Summary of changes

- Revise logic to return immediately if the URL is in the non-forwarded
list, rather than calling maybe_forward
2024-12-05 13:00:40 +02:00
Konstantin Knizhnik
1e08b5dccc Fix issues with prefetch ring buffer resize (#9847)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1732110190129479


We observe the following error in the logs 
```
[XX000] ERROR: [NEON_SMGR] [shard 3] Incorrect prefetch read: status=1 response=0x7fafef335138 my=128 receive=128
```
most likely caused by changing `neon.readahead_buffer_size`

## Summary of changes

1. Copy shard state
2. Do not use prefetch_set_unused in readahead_buffer_resize
3. Change prefetch buffer overflow criteria

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-05 13:00:40 +02:00
Alexander Bayandin
030810ed3e Compute image: prepare Postgres v14-v16 for Debian 12 (#9954)
## Problem

Current compute images for Postgres 14-16 don't build on Debian 12
because of issues with extensions.
This PR fixes that, but for the current setup, it is mostly a no-op
change.

## Summary of changes
- Use `/bin/bash -euo pipefail` as SHELL to fail earlier
- Fix `plv8` build: backport a trivial patch for v8
- Fix `postgis` build: depend `sfgal` version on Debian version instead
of Postgres version


Tested in: https://github.com/neondatabase/neon/pull/9849
2024-12-05 13:00:40 +02:00
Konstantin Knizhnik
62b74bdc2c Add GUC controlling whether to pause recovery if some critical GUCs at replica have smaller value than on primary (#9057)
## Problem

See https://github.com/neondatabase/neon/issues/9023

## Summary of changes

Ass GUC `recovery_pause_on_misconfig` allowing not to pause in case of
replica and primary configuration mismatch

See https://github.com/neondatabase/postgres/pull/501
See https://github.com/neondatabase/postgres/pull/502
See https://github.com/neondatabase/postgres/pull/503
See https://github.com/neondatabase/postgres/pull/504


## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2024-12-05 13:00:40 +02:00
Folke Behrens
8b7e9ed820 Merge the consumption metric pushes (#9939)
#8564

## Problem

The main and backup consumption metric pushes are completely
independent,
resulting in different event time windows and different idempotency
keys.

## Summary of changes

* Merge the push tasks, but keep chunks the same size.
2024-12-05 13:00:40 +02:00
Christian Schwarz
5dad89acd4 page_service: rewrite batching to work without a timeout (#9851)
# Problem

The timeout-based batching adds latency to unbatchable workloads.

We can choose a short batching timeout (e.g. 10us) but that requires
high-resolution timers, which tokio doesn't have.
I thoroughly explored options to use OS timers (see
[this](https://github.com/neondatabase/neon/pull/9822) abandoned PR).
In short, it's not an attractive option because any timer implementation
adds non-trivial overheads.

# Solution

The insight is that, in the steady state of a batchable workload, the
time we spend in `get_vectored` will be hundreds of microseconds anyway.

If we prepare the next batch concurrently to `get_vectored`, we will
have a sizeable batch ready once `get_vectored` of the current batch is
done and do not need an explicit timeout.

This can be reasonably described as **pipelining of the protocol
handler**.

# Implementation

We model the sub-protocol handler for pagestream requests
(`handle_pagrequests`) as two futures that form a pipeline:

2. Batching: read requests from the connection and fill the current
batch
3. Execution: `take` the current batch, execute it using `get_vectored`,
and send the response.

The Reading and Batching stage are connected through a new type of
channel called `spsc_fold`.

See the long comment in the `handle_pagerequests_pipelined` for details.

# Changes

- Refactor `handle_pagerequests`
    - separate functions for
- reading one protocol message; produces a `BatchedFeMessage` with just
one page request in it
- batching; tried to merge an incoming `BatchedFeMessage` into an
existing `BatchedFeMessage`; returns `None` on success and returns back
the incoming message in case merging isn't possible
        - execution of a batched message
- unify the timeline handle acquisition & request span construction; it
now happen in the function that reads the protocol message
- Implement serial and pipelined model
    - serial: what we had before any of the batching changes
      - read one protocol message
      - execute protocol messages
    - pipelined: the design described above
- optionality for execution of the pipeline: either via concurrent
futures vs tokio tasks
- Pageserver config
  - remove batching timeout field
  - add ability to configure pipelining mode
- add ability to limit max batch size for pipelined configurations
(required for the rollout, cf
https://github.com/neondatabase/cloud/issues/20620 )
  - ability to configure execution mode
- Tests
  - remove `batch_timeout` parametrization
  - rename `test_getpage_merge_smoke` to `test_throughput`
- add parametrization to test different max batch sizes and execution
moes
  - rename `test_timer_precision` to `test_latency`
  - rename the test case file to `test_page_service_batching.py`
  - better descriptions of what the tests actually do

## On the holding The `TimelineHandle` in the pending batch

While batching, we hold the `TimelineHandle` in the pending batch.
Therefore, the timeline will not finish shutting down while we're
batching.

This is not a problem in practice because the concurrently ongoing
`get_vectored` call will fail quickly with an error indicating that the
timeline is shutting down.
This results in the Execution stage returning a `QueryError::Shutdown`,
which causes the pipeline / entire page service connection to shut down.
This drops all references to the
`Arc<Mutex<Option<Box<BatchedFeMessage>>>>` object, thereby dropping the
contained `TimelineHandle`s.

- => fixes https://github.com/neondatabase/neon/issues/9850

# Performance

Local run of the benchmarks, results in [this empty
commit](1cf5b1463f)
in the PR branch.

Key take-aways:
* `concurrent-futures` and `tasks` deliver identical `batching_factor`
* tail latency impact unknown, cf
https://github.com/neondatabase/neon/issues/9837
* `concurrent-futures` has higher throughput than `tasks` in all
workloads (=lower `time` metric)
* In unbatchable workloads, `concurrent-futures` has 5% higher
`CPU-per-throughput` than that of `tasks`, and 15% higher than that of
`serial`.
* In batchable-32 workload, `concurrent-futures` has 8% lower
`CPU-per-throughput` than that of `tasks` (comparison to tput of
`serial` is irrelevant)
* in unbatchable workloads, mean and tail latencies of
`concurrent-futures` is practically identical to `serial`, whereas
`tasks` adds 20-30us of overhead

Overall, `concurrent-futures` seems like a slightly more attractive
choice.

# Rollout

This change is disabled-by-default.

Rollout plan:
- https://github.com/neondatabase/cloud/issues/20620

# Refs

- epic: https://github.com/neondatabase/neon/issues/9376
- this sub-task: https://github.com/neondatabase/neon/issues/9377
- the abandoned attempt to improve batching timeout resolution:
https://github.com/neondatabase/neon/pull/9820
- closes https://github.com/neondatabase/neon/issues/9850
- fixes https://github.com/neondatabase/neon/issues/9835
2024-12-05 13:00:40 +02:00
Matthias van de Meent
547b2d2827 Fix timeout value used in XLogWaitForReplayOf (#9937)
The previous value assumed usec precision, while the timeout used is in
milliseconds, causing replica backends to wait for (potentially) many
hours for WAL replay without the expected progress reports in logs.

This fixes the issue.

Reported-By: Alexander Lakhin <exclusion@gmail.com>

## Problem


https://github.com/neondatabase/postgres/pull/279#issuecomment-2507671817

The timeout value was configured with the assumption the indicated value
would be microseconds, where it's actually milliseconds. That causes the
backend to wait for much longer (2h46m40s) before it emits the "I'm
waiting for recovery" message. While we do have wait events configured
on this, it's not great to have stuck backends without clear logs, so
this fixes the timeout value in all our PostgreSQL branches.

## PG PRs

* PG14: https://github.com/neondatabase/postgres/pull/542
* PG15: https://github.com/neondatabase/postgres/pull/543
* PG16: https://github.com/neondatabase/postgres/pull/544
* PG17: https://github.com/neondatabase/postgres/pull/545
2024-12-05 13:00:40 +02:00
Gleb Novikov
93f29a0065 Fixed fast_import pgbin in calling get_pg_version (#9933)
Was working on https://github.com/neondatabase/cloud/pull/20795 and
discovered that fast_import is not working normally.
2024-12-05 13:00:40 +02:00
John Spray
4f36494615 pageserver: download small objects using a smaller timeout (#9938)
## Problem

It appears that the Azure storage API tends to hang TCP connections more
than S3 does.

Currently we use a 2 minute timeout for all downloads. This is large
because sometimes the objects we download are large. However, waiting 2
minutes when doing something like downloading a manifest on tenant
attach is problematic, because when someone is doing a "create tenant,
create timeline" workflow, that 2 minutes is long enough for them
reasonably to give up creating that timeline.

Rather than propagate oversized timeouts further up the stack, we should
use a different timeout for objects that we expect to be small.

Closes: https://github.com/neondatabase/neon/issues/9836

## Summary of changes

- Add a `small_timeout` configuration attribute to remote storage,
defaulting to 30 seconds (still a very generous period to do something
like download an index)
- Add a DownloadKind parameter to DownloadOpts, so that callers can
indicate whether they expect the object to be small or large.
- In the azure client, use small timeout for HEAD requests, and for GET
requests if DownloadKind::Small is used.
- Use DownloadKind::Small for manifests, indices, and heatmap downloads.

This PR intentionally does not make the equivalent change to the S3
client, to reduce blast radius in case this has unexpected consequences
(we could accomplish the same thing by editing lots of configs, but just
skipping the code is simpler for right now)
2024-12-05 13:00:40 +02:00
Alexey Kondratov
0a550f3e7d feat(compute_ctl): Always set application_name (#9934)
## Problem

It was not always possible to judge what exactly some `cloud_admin`
connections were doing because we didn't consistently set
`application_name` everywhere.

## Summary of changes

Unify the way we connect to Postgres:
1. Switch to building configs everywhere
2. Always set `application_name` and make naming consistent

Follow-up for #9919
Part of neondatabase/cloud#20948
2024-12-05 13:00:40 +02:00
Erik Grinaker
4bb9554e4a safekeeper: use jemalloc (#9780)
## Problem

To add Safekeeper heap profiling in #9778, we need to switch to an
allocator that supports it. Pageserver and proxy already use jemalloc.

Touches #9534.

## Summary of changes

Use jemalloc in Safekeeper.
2024-12-05 13:00:40 +02:00
John Spray
008616cfe6 storage controller: use proper ScheduleContext when evacuating a node (#9908)
## Problem

When picking locations for a shard, we should use a ScheduleContext that
includes all the other shards in the tenant, so that we apply proper
anti-affinity between shards. If we don't do this, then it can lead to
unstable scheduling, where we place a shard somewhere that the optimizer
will then immediately move it away from.

We didn't always do this, because it was a bit awkward to accumulate the
context for a tenant rather than just walking tenants.

This was a TODO in `handle_node_availability_transition`:
```
                        // TODO: populate a ScheduleContext including all shards in the same tenant_id (only matters
                        // for tenants without secondary locations: if they have a secondary location, then this
                        // schedule() call is just promoting an existing secondary)
```

This is a precursor to https://github.com/neondatabase/neon/issues/8264,
where the current imperfect scheduling during node evacuation hampers
testing.

## Summary of changes

- Add an iterator type that yields each shard along with a
schedulecontext that includes all the other shards from the same tenant
- Use the iterator to replace hand-crafted logic in optimize_all_plan
(functionally identical)
- Use the iterator in `handle_node_availability_transition` to apply
proper anti-affinity during node evacuation.
2024-12-05 13:00:40 +02:00
Conrad Ludgate
e61ec94fbc chore(proxy): vendor a subset of rust-postgres (#9930)
Our rust-postgres fork is getting messy. Mostly because proxy wants more
control over the raw protocol than tokio-postgres provides. As such,
it's diverging more and more. Storage and compute also make use of
rust-postgres, but in more normal usage, thus they don't need our crazy
changes.

Idea: 
* proxy maintains their subset
* other teams use a minimal patch set against upstream rust-postgres

Reviewing this code will be difficult. To implement it, I
1. Copied tokio-postgres, postgres-protocol and postgres-types from
00940fcdb5
2. Updated their package names with the `2` suffix to make them compile
in the workspace.
3. Updated proxy to use those packages
4. Copied in the code from tokio-postgres-rustls 0.13 (with some patches
applied https://github.com/jbg/tokio-postgres-rustls/pull/32
https://github.com/jbg/tokio-postgres-rustls/pull/33)
5. Removed as much dead code as I could find in the vendored libraries
6. Updated the tokio-postgres-rustls code to use our existing channel
binding implementation
2024-12-05 13:00:40 +02:00
Erik Grinaker
e5152551ad test_runner/performance: add logical message ingest benchmark (#9749)
Adds a benchmark for logical message WAL ingestion throughput
end-to-end. Logical messages are essentially noops, and thus ignored by
the Pageserver.

Example results from my MacBook, with fsync enabled:

```
postgres_ingest: 14.445 s
safekeeper_ingest: 29.948 s
pageserver_ingest: 30.013 s
pageserver_recover_ingest: 8.633 s
wal_written: 10,340 MB
message_count: 1310720 messages
postgres_throughput: 715 MB/s
safekeeper_throughput: 345 MB/s
pageserver_throughput: 344 MB/s
pageserver_recover_throughput: 1197 MB/s
```

See
https://github.com/neondatabase/neon/issues/9642#issuecomment-2475995205
for running analysis.

Touches #9642.
2024-12-05 13:00:40 +02:00
Alexey Kondratov
b0822a5499 fix(compute_ctl): Allow usage of DB names with whitespaces (#9919)
## Problem

We used `set_path()` to replace the database name in the connection
string. It automatically does url-safe encoding if the path is not
already encoded, but it does it as per the URL standard, which assumes
that tabs can be safely removed from the path without changing the
meaning of the URL. See, e.g.,
https://url.spec.whatwg.org/#concept-basic-url-parser. It also breaks
for DBs with properly %-encoded names, like with `%20`, as they are kept
intact, but actually should be escaped.

Yet, this is not true for Postgres, where it's completely valid to have
trailing tabs in the database name.

I think this is the PR that caused this regression
https://github.com/neondatabase/neon/pull/9717, as it switched from
`postgres::config::Config` back to `set_path()`.

This was fixed a while ago already [1], btw, I just haven't added a test
to catch this regression back then :(

## Summary of changes

This commit changes the code back to use
`postgres/tokio_postgres::Config` everywhere.

While on it, also do some changes around, as I had to touch this code:
1. Bump some logging from `debug` to `info` in the spec apply path. We
do not use `debug` in prod, and it was tricky to understand what was
going on with this bug in prod.
2. Refactor configuration concurrency calculation code so it was
reusable. Yet, still keep `1` in the case of reconfiguration. The
database can be actively used at this moment, so we cannot guarantee
that there will be enough spare connection slots, and the underlying
code won't handle connection errors properly.
3. Simplify the installed extensions code. It was spawning a blocking
task inside async function, which doesn't make much sense. Instead, just
have a main sync function and call it with `spawn_blocking` in the API
code -- the only place we need it to be async.
4. Add regression python test to cover this and related problems in the
future. Also, add more extensive testing of schema dump and DBs and
roles listing API.

[1]:
4d1e48f3b9
[2]:
https://www.postgresql.org/message-id/flat/20151023003445.931.91267%40wrigleys.postgresql.org

Resolves neondatabase/cloud#20869
2024-12-05 13:00:40 +02:00
Alexander Bayandin
1fb6ab59e8 test_runner: rerun all failed tests (#9917)
## Problem

Currently, we rerun only known flaky tests. This approach was chosen to
reduce the number of tests that go unnoticed (by forcing people to take
a look at failed tests and rerun the job manually), but it has some
drawbacks:
- In PRs, people tend to push new changes without checking failed tests
(that's ok)
- In the main, tests are just restarted without checking
(understandable)
- Parametrised tests become flaky one by one, i.e. if `test[1]` is flaky
`, test[2]` is not marked as flaky automatically (which may or may not
be the case).

I suggest rerunning all failed tests to increase the stability of GitHub
jobs and using the Grafana Dashboard with flaky tests for deeper
analysis.

## Summary of changes
- Rerun all failed tests twice at max
2024-12-05 13:00:40 +02:00
Vlad Lazar
e16439400d pageserver: return correct LSN for interpreted proto keep alive responses (#9928)
## Problem

For the interpreted proto the pageserver is not returning the correct
LSN
in replies to keep alive requests. This is because the interpreted
protocol arm
was not updating `last_rec_lsn`.

## Summary of changes

* Return correct LSN in keep-alive responses
* Fix shard field in wal sender traces
2024-12-05 13:00:40 +02:00
Arpad Müller
e401f66698 Update rust to 1.83.0, also update cargo adjacent tools (#9926)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

[Release notes](https://releases.rs/docs/1.83.0/).

Also update `cargo-hakari`, `cargo-deny`, `cargo-hack` and
`cargo-nextest` to their latest versions.

Prior update was in #9445.
2024-12-05 13:00:40 +02:00
Erik Grinaker
2fa461b668 Makefile: build pg_visibility (#9922)
Build the `pg_visibility` extension for use with `neon_local`. This is
useful to inspect the visibility map for debugging.

Touches #9914.
2024-12-05 13:00:40 +02:00
Vlad Lazar
03d90bc0b3 remote_storage/abs: count 404 and 304 for get as ok for metrics (#9912)
## Problem

We currently see elevated levels of errors for GetBlob requests. This is
because 404 and 304 are counted as errors for metric reporting.

## Summary of Changes

Bring the implementation in line with the S3 client and treat 404 and
304 responses as ok for metric purposes.

Related: https://github.com/neondatabase/cloud/issues/20666
2024-12-05 13:00:40 +02:00
Ivan Efremov
268bc890ea proxy: spawn cancellation checks in the background (#9918)
## Problem
For cancellation, a connection is open during all the cancel checks.
## Summary of changes
Spawn cancellation checks in the background, and close connection
immediately.
Use task_tracker for cancellation checks.
2024-12-05 13:00:40 +02:00
Ivan Efremov
ffc9c33eb2 proxy: Present new auth backend cplane_proxy_v1 (#10012)
Implement a new auth backend based on the current Neon backend to switch
to the new Proxy V1 cplane API.

Implements [#21048](https://github.com/neondatabase/cloud/issues/21048)
2024-12-05 05:30:38 +00:00
Yuchen Liang
ed2d892113 pageserver: fix buffered-writer on macos build (#10019)
## Problem

In https://github.com/neondatabase/neon/pull/9693, we forgot to check
macos build. The [CI
run](https://github.com/neondatabase/neon/actions/runs/12164541897/job/33926455468)
on main showed that macos build failed with unused variables and dead
code.

## Summary of changes

- add `allow(dead_code)` and `allow(unused_variables)` to the relevant
code that is not used on macos.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-12-05 02:16:09 +00:00
Conrad Ludgate
131585eb6b chore: update rust-postgres (#10002)
Like #9931 but without rebasing upstream just yet, to try and minimise
the differences.

Removes all proxy-specific commits from the rust-postgres fork, now that
proxy no longer depends on them. Merging upstream changes to come later.
2024-12-04 21:07:44 +00:00
Conrad Ludgate
0bab7e3086 chore: update clap (#10009)
This updates clap to use a new version of anstream
2024-12-04 17:42:17 +00:00
Yuchen Liang
e6cd5050fc pageserver: make BufferedWriter do double-buffering (#9693)
Closes #9387.

## Problem

`BufferedWriter` cannot proceed while the owned buffer is flushing to
disk. We want to implement double buffering so that the flush can happen
in the background. See #9387.

## Summary of changes

- Maintain two owned buffers in `BufferedWriter`.
- The writer is in charge of copying the data into owned, aligned
buffer, once full, submit it to the flush task.
- The flush background task is in charge of flushing the owned buffer to
disk, and returned the buffer to the writer for reuse.
- The writer and the flush background task communicate through a
bi-directional channel.

For in-memory layer, we also need to be able to read from the buffered
writer in `get_values_reconstruct_data`. To handle this case, we did the
following
- Use replace `VirtualFile::write_all` with `VirtualFile::write_all_at`,
and use `Arc` to share it between writer and background task.
- leverage `IoBufferMut::freeze` to get a cheaply clonable `IoBuffer`,
one clone will be submitted to the channel, the other clone will be
saved within the writer to serve reads. When we want to reuse the
buffer, we can invoke `IoBuffer::into_mut`, which gives us back the
mutable aligned buffer.
- InMemoryLayer reads is now aware of the maybe_flushed part of the
buffer.

**Caveat**

- We removed the owned version of write, because this interface does not
work well with buffer alignment. The result is that without direct IO
enabled,
[`download_object`](a439d57050/pageserver/src/tenant/remote_timeline_client/download.rs (L243))
does one more memcpy than before this PR due to the switch to use
`_borrowed` version of the write.
- "Bypass aligned part of write" could be implemented later to avoid
large amount of memcpy.

**Testing**
- use an oneshot channel based control mechanism to make flush behavior
deterministic in test.
- test reading from `EphemeralFile` when the last submitted buffer is
not flushed, in-progress, and done flushing to disk.


## Performance


We see performance improvement for small values, and regression on big
values, likely due to being CPU bound + disk write latency.


[Results](https://www.notion.so/neondatabase/Benchmarking-New-BufferedWriter-11-20-2024-143f189e0047805ba99acda89f984d51?pvs=4)


## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-12-04 16:54:56 +00:00
John Spray
60c0d19f57 tests: make storcon scale test AZ-aware (#9952)
## Problem

We have a scale test for the storage controller which also acts as a
good stress test for scheduling stability. However, it created nodes
with no AZs set.

## Summary of changes

- Bump node count to 6 and set AZs on them.

This is a precursor to other AZ-related PRs, to make sure any new code
that's landed is getting scale tested in an AZ-aware environment.
2024-12-04 15:04:04 +00:00
a-masterov
dec2e2fb29 Create a branch for compute release (#9637)
## Problem
We practice a manual release flow for the compute module. This will
allow automation of the compute release process.

## Summary of changes
The workflow was modified to make a compute release automatically on the
branch release-compute.
## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2024-12-04 13:10:00 +00:00
Erik Grinaker
699a213c5d Display reqwest error source (#10004)
## Problem

Reqwest errors don't include details about the inner source error. This
means that we get opaque errors like:

```
receive body: error sending request for url (http://localhost:9898/v1/location_config)
```

Instead of the more helpful:

```
receive body: error sending request for url (http://localhost:9898/v1/location_config): operation timed out
```

Touches #9801.

## Summary of changes

Include the source error for `reqwest::Error` wherever it's displayed.
2024-12-04 13:05:53 +00:00
Alexey Kondratov
9a4157dadb feat(compute): Set default application_name for pgbouncer connections (#9973)
## Problem

When client specifies `application_name`, pgbouncer propagates it to the
Postgres. Yet, if client doesn't do it, we have hard time figuring out
who opens a lot of Postgres connections (including the `cloud_admin`
ones).

See this investigation as an example:
https://neondb.slack.com/archives/C0836R0RZ0D

## Summary of changes

I haven't found this documented, but it looks like pgbouncer accepts
standard Postgres connstring parameters in the connstring in the
`[databases]` section, so put the default `application_name=pgbouncer`
there. That way, we will always see who opens Postgres connections. I
did tests, and if client specifies a `application_name`, pgbouncer
overrides this default, so it only works if it's not specified or set to
blank `&application_name=` in the connection string.

This is the last place we could potentially open some Postgres
connections without `application_name`. Everything else should be either
of two:
1. Direct client connections without `application_name`, but these
should be strictly non-`cloud_admin` ones
2. Some ad-hoc internal connections, so if we see spikes of unidentified
`cloud_admin` connections, we will need to investigate it again.

Fixes neondatabase/cloud#20948
2024-12-04 13:05:31 +00:00
Conrad Ludgate
bd52822e14 feat(proxy): add option to forward startup params (#9979)
(stacked on #9990 and #9995)

Partially fixes #1287 with a custom option field to enable the fixed
behaviour. This allows us to gradually roll out the fix without silently
changing the observed behaviour for our customers.

related to https://github.com/neondatabase/cloud/issues/15284
2024-12-04 12:58:35 +00:00
Folke Behrens
dcd016bbfc Assign /libs/proxy/ to proxy team (#10003) 2024-12-04 12:58:31 +00:00
Erik Grinaker
7b18e33997 pageserver: return proper status code for heatmap_upload errors (#9991)
## Problem

During deploys, we see a lot of 500 errors due to heapmap uploads for
inactive tenants. These should be 503s instead.

Resolves #9574.

## Summary of changes

Make the secondary tenant scheduler use `ApiError` rather than
`anyhow::Error`, to propagate the tenant error and convert it to an
appropriate status code.
2024-12-04 12:53:52 +00:00
Peter Bendel
9d75218ba7 fix parsing human time output like "50m37s" (#10001)
## Problem

In ingest_benchmark.yml workflow we use pgcopydb tool to migrate
project.
pgcopydb logs human time.

Our parsing of the human time doesn't work for times like "50m37s".

[Example
workflow](https://github.com/neondatabase/neon/actions/runs/12145539948/job/33867418065#step:10:479)

contains "57m45s"

but we
[reported](https://github.com/neondatabase/neon/actions/runs/12145539948/job/33867418065#step:10:500)
only the seconds part: 
45.000 s


## Summary of changes

add a regex pattern for Minute/Second combination
2024-12-04 11:37:24 +00:00
Peter Bendel
1b3558df7a optimize parms for ingest bench (#9999)
## Problem

we tried different parallelism settings for ingest bench 

## Summary of changes

the following settings seem optimal after merging
- SK side Wal filtering
- batched getpages

Settings:
- effective_io_concurrency 100
- concurrency limit 200 (different from Prod!)
- jobs 4, maintenance workers 7
- 10 GB chunk size
2024-12-04 11:07:22 +00:00
Vlad Lazar
68205c48ed storcon: return an error for drain attempts while paused (#9997)
## Problem

We currently allow drain operations to proceed while the node policy is
paused.

## Summary of changes

Return a precondition failed error in such cases. The orchestrator is
updated in https://github.com/neondatabase/infra/pull/2544 to skip drain
and fills if the pageserver is paused.

Closes: https://github.com/neondatabase/neon/issues/9907
2024-12-04 09:25:29 +00:00
Christian Schwarz
8d93d02c2f page_service: enable batching in Rust & Python Tests + Python benchmarks (#9993)
This is the first step towards batching rollout.

Refs

- rollout plan: https://github.com/neondatabase/cloud/issues/20620
- task https://github.com/neondatabase/neon/issues/9377
- uber-epic: https://github.com/neondatabase/neon/issues/9376
2024-12-04 00:07:49 +00:00
Alexander Bayandin
023821a80c test_page_service_batching: fix non-numeric metrics (#9998)
## Problem

```
2024-12-03T15:42:46.5978335Z + poetry run python /__w/neon/neon/scripts/ingest_perf_test_result.py --ingest /__w/neon/neon/test_runner/perf-report-local
2024-12-03T15:42:49.5325077Z Traceback (most recent call last):
2024-12-03T15:42:49.5325603Z   File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 165, in <module>
2024-12-03T15:42:49.5326029Z     main()
2024-12-03T15:42:49.5326316Z   File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 155, in main
2024-12-03T15:42:49.5326739Z     ingested = ingest_perf_test_result(cur, item, recorded_at_timestamp)
2024-12-03T15:42:49.5327488Z                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-12-03T15:42:49.5327914Z   File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 99, in ingest_perf_test_result
2024-12-03T15:42:49.5328321Z     psycopg2.extras.execute_values(
2024-12-03T15:42:49.5328940Z   File "/github/home/.cache/pypoetry/virtualenvs/non-package-mode-_pxWMzVK-py3.11/lib/python3.11/site-packages/psycopg2/extras.py", line 1299, in execute_values
2024-12-03T15:42:49.5335618Z     cur.execute(b''.join(parts))
2024-12-03T15:42:49.5335967Z psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type numeric: "concurrent-futures"
2024-12-03T15:42:49.5336287Z LINE 57:             'concurrent-futures',
2024-12-03T15:42:49.5336462Z                      ^
```

## Summary of changes
- `test_page_service_batching`: save non-numeric params as `labels`
- Add a runtime check that `metric_value` is NUMERIC
2024-12-03 22:46:18 +00:00
Christian Schwarz
944c1adc4c tests & benchmarks: unify the way we customize the default tenant config (#9992)
Before this PR, some override callbacks used `.default()`, others
used `.setdefault()`.

As of this PR, all callbacks use `.setdefault()` which I think is least
prone to failure.

Aligning on a single way will set the right example for future tests
that need such customization.

The `test_pageserver_getpage_throttle.py` technically is a change in
behavior: before, it replaced the `tenant_config` field, now it just
configures the throttle. This is what I believe is intended anyway.
2024-12-03 22:07:03 +00:00
Arpad Müller
ca85f364ba Support tenant manifests in the scrubber (#9942)
Support tenant manifests in the storage scrubber:

* list the manifests, order them by generation
* delete all manifests except for the two most recent generations
* for the latest manifest: try parsing it.

I've tested this patch by running the against a staging bucket and it
successfully deleted stuff (and avoided deleting the latest two
generations).

In follow-up work, we might want to also check some invariants of the
manifest, as mentioned in #8088.

Part of #9386
Part of #8088

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-12-03 20:39:10 +00:00
Conrad Ludgate
9ef0662a42 chore(proxy): enforce single host+port (#9995)
proxy doesn't ever provide multiple hosts/ports, so this code adds a lot
of complexity of error handling for no good reason.

(stacked on #9990)
2024-12-03 20:00:14 +00:00
Alexey Immoreev
3baef0bca3 Improvement: add console redirect timeout warning (#9985)
## Problem

There is no information on session being cancelled in 2 minutes at the
moment

## Summary of changes

The timeout being logged for the user
2024-12-03 18:59:44 +00:00
Erik Grinaker
f312c6571f pageserver: respond to multiple shutdown signals (#9982)
## Problem

The Pageserver signal handler would only respond to a single signal and
initiate shutdown. Subsequent signals were ignored. This meant that a
`SIGQUIT` sent after a `SIGTERM` had no effect (e.g. in the case of a
slow or stalled shutdown). The `test_runner` uses this to force shutdown
if graceful shutdown is slow.

Touches #9740.

## Summary of changes

Keep responding to signals after the initial shutdown signal has been
received.

Arguably, the `test_runner` should also use `SIGKILL` rather than
`SIGQUIT` in this case, but it seems reasonable to respond to `SIGQUIT`
regardless.
2024-12-03 18:47:17 +00:00
Conrad Ludgate
27a42d0f96 chore(proxy): remove postgres config parser and md5 support (#9990)
Keeping the `mock` postgres cplane adaptor using "stock" tokio-postgres
allows us to remove a lot of dead weight from our actual postgres
connection logic.
2024-12-03 18:39:23 +00:00
John Spray
b04ab468ee pageserver: more detailed logs when calling re-attach (#9996)
## Problem

We saw a peculiar case where a pageserver apparently got a 0-tenant
response to `/re-attach` but we couldn't see the request landing on a
storage controller. It was hard to confirm retrospectively that the
pageserver was configured properly at the moment it sent the request.

## Summary of changes

- Log the URL to which we are sending the request
- Log the NodeId and metadata that we sent
2024-12-03 18:36:37 +00:00
John Spray
dcb629532b pageserver: only store SLRUs & aux files on shard zero (#9786)
## Problem

Since https://github.com/neondatabase/neon/pull/9423 the non-zero shards
no longer need SLRU content in order to do GC. This data is now
redundant on shards >0.

One release cycle after merging that PR, we may merge this one, which
also stops writing those pages to shards > 0, reaping the efficiency
benefit.

Closes: https://github.com/neondatabase/neon/issues/7512
Closes: https://github.com/neondatabase/neon/issues/9641

## Summary of changes

- Avoid storing SLRUs on non-zero shards
- Bonus: avoid storing aux files on non-zero shards
2024-12-03 17:22:49 +00:00
John Spray
71d004289c storcon: in shard splits, inherit parent's AZ (#9946)
## Problem

Sharded tenants should be run in a single AZ for best performance, so
that computes have AZ-local latency to all the shards.

Part of https://github.com/neondatabase/neon/issues/8264

## Summary of changes

- When we split a tenant, instead of updating each shard's preferred AZ
to wherever it is scheduled, propagate the preferred AZ from the parent.
- Drop the check in `test_shard_preferred_azs` that asserts shards end
up in their preferred AZ: this will not be true again until the
optimize_attachment logic is updated to make this so. The existing check
wasn't testing anything about scheduling, it was just asserting that we
set preferred AZ in a way that matches the way things happen to be
scheduled at time of split.
2024-12-03 16:55:00 +00:00
Christian Schwarz
4d422b937c pageserver: only throttle pagestream requests & bring back throttling deduction for smgr latency metrics (#9962)
## Problem

In the batching PR 
- https://github.com/neondatabase/neon/pull/9870

I stopped deducting the time-spent-in-throttle fro latency metrics,
i.e.,
- smgr latency metrics (`SmgrOpTimer`)
- basebackup latency (+scan latency, which I think is part of
basebackup).

The reason for stopping the deduction was that with the introduction of
batching, the trick with tracking time-spent-in-throttle inside
RequestContext and swap-replacing it from the `impl Drop for
SmgrOpTimer` no longer worked with >1 requests in a batch.

However, deducting time-spent-in-throttle is desirable because our
internal latency SLO definition does not account for throttling.

## Summary of changes

- Redefine throttling to be a page_service pagestream request throttle
instead of a throttle for repository `Key` reads through `Timeline::get`
/ `Timeline::get_vectored`.
- This means reads done by `basebackup` are no longer subject to any
throttle.
- The throttle applies after batching, before handling of the request.
- Drive-by fix: make throttle sensitive to cancellation.
- Rename metric label `kind` from `timeline_get` to `pagestream` to
reflect the new scope of throttling.

To avoid config format breakage, we leave the config field named
`timeline_get_throttle` and ignore the `task_kinds` field.
This will be cleaned up in a future PR.

## Trade-Offs

Ideally, we would apply the throttle before reading a request off the
connection, so that we queue the minimal amount of work inside the
process.
However, that's not possible because we need to do shard routing.

The redefinition of the throttle to limit pagestream request rate
instead of repository `Key` rate comes with several downsides:
- We're no longer able to use the throttle mechanism for other other
tasks, e.g. image layer creation.
  However, in practice, we never used that capability anyways.
- We no longer throttle basebackup.
2024-12-03 15:25:58 +00:00
Erik Grinaker
bbe4dfa991 test_runner: use immediate shutdown in test_sharded_ingest (#9984)
## Problem

`test_sharded_ingest` ingests a lot of data, which can cause shutdown to
be slow e.g. due to local "S3 uploads" or compactions. This can cause
test flakes during teardown.

Resolves #9740.

## Summary of changes

Perform an immediate shutdown of the cluster.
2024-12-03 14:33:31 +00:00
Erik Grinaker
dcb24ce170 safekeeper,pageserver: add heap profiling (#9778)
## Problem

We don't have good observability for memory usage. This would be useful
e.g. to debug OOM incidents or optimize performance or resource usage.

We would also like to use continuous profiling with e.g. [Grafana Cloud
Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/)
(see https://github.com/neondatabase/cloud/issues/14888).

This PR is intended as a proof of concept, to try it out in staging and
drive further discussions about profiling more broadly.

Touches https://github.com/neondatabase/neon/issues/9534.
Touches https://github.com/neondatabase/cloud/issues/14888.
Depends on #9779.
Depends on #9780.

## Summary of changes

Adds a HTTP route `/profile/heap` that takes a heap profile and returns
it. Query parameters:

* `format`: output format (`jemalloc` or `pprof`; default `pprof`).

Unlike CPU profiles (see #9764), heap profiles are not symbolized and
require the original binary to translate addresses to function names. To
make this work with Grafana, we'll probably have to symbolize the
process server-side -- this is left as future work, as is other output
formats like SVG.

Heap profiles don't work on macOS due to limitations in jemalloc.
2024-12-03 11:35:59 +00:00
a-masterov
a2a942f93c Add support for the extensions test for Postgres v17 (#9748)
## Problem
The extensions for Postgres v17 are ready but we do not test the
extensions shipped with v17
## Summary of changes
Build the test image based on Postgres v17. Run the tests for v17.

---------

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2024-12-03 11:25:29 +00:00
Christian Schwarz
cb10be710d page_service: batching observability & include throttled time in smgr metrics (#9870)
This PR 

- fixes smgr metrics https://github.com/neondatabase/neon/issues/9925 
- adds an additional startup log line logging the current batching
config
- adds a histogram of batch sizes global and per-tenant
- adds a metric exposing the current batching config

The issue described #9925 is that before this PR, request latency was
only observed *after* batching.
This means that smgr latency metrics (most importantly getpage latency)
don't account for
- `wait_lsn` time 
- time spent waiting for batch to fill up / the executor stage to pick
up the batch.

The fix is to use a per-request batching timer, like we did before the
initial batching PR.
We funnel those timers through the entire request lifecycle.

I noticed that even before the initial batching changes, we weren't
accounting for the time spent writing & flushing the response to the
wire.
This PR drive-by fixes that deficiency by dropping the timers at the
very end of processing the batch, i.e., after the `pgb.flush()` call.

I was **unable to maintain the behavior that we deduct
time-spent-in-throttle from various latency metrics.
The reason is that we're using a *single* counter in `RequestContext` to
track micros spent in throttle.
But there are *N* metrics timers in the batch, one per request.
As a consequence, the practice of consuming the counter in the drop
handler of each timer no longer works because all but the first timer
will encounter error `close() called on closed state`.
A failed attempt to maintain the current behavior can be found in
https://github.com/neondatabase/neon/pull/9951.

So, this PR remvoes the deduction behavior from all metrics.
I started a discussion on Slack about it the implications this has for
our internal SLO calculation:
https://neondb.slack.com/archives/C033RQ5SPDH/p1732910861704029

# Refs

- fixes https://github.com/neondatabase/neon/issues/9925
- sub-issue https://github.com/neondatabase/neon/issues/9377
- epic: https://github.com/neondatabase/neon/issues/9376
2024-12-03 11:03:23 +00:00
Christian Schwarz
15d01b257a storcon_cli tenant-describe: include tenant-wide information in output (#9899)
Before this PR, the storcon_cli didn't have a way to show the
tenant-wide information of the TenantDescribeResponse.

Sadly, the `Serialize` impl for the tenant config doesn't skip on
`None`, so, the output becomes a bit bloated.
Maybe we can use `skip_serializing_if(Option::is_none)` in the future.
=> https://github.com/neondatabase/neon/issues/9983
2024-12-03 10:55:13 +00:00
John Spray
aaee713e53 storcon: use proper schedule context during node delete (#9958)
## Problem

I was touching `test_storage_controller_node_deletion` because for AZ
scheduling work I was adding a change to the storage controller (kick
secondaries during optimisation) that made a FIXME in this test defunct.
While looking at it I also realized that we can easily fix the way node
deletion currently doesn't use a proper ScheduleContext, using the
iterator type recently added for that purpose.

## Summary of changes

- A testing-only behavior in storage controller where if a secondary
location isn't yet ready during optimisation, it will be actively
polled.
- Remove workaround in `test_storage_controller_node_deletion` that
previously was needed because optimisation would get stuck on cold
secondaries.
- Update node deletion code to use a `TenantShardContextIterator` and
thereby a proper ScheduleContext
2024-12-03 08:59:38 +00:00
Alexey Kondratov
2e9207fdf3 fix(testing): Use 1 MB shared_buffers even with LFC (#9969)
## Problem

After enabling LFC in tests and lowering `shared_buffers` we started
having more problems with `test_pg_regress`.

## Summary of changes

Set `shared_buffers` to 1MB to both exercise getPage requests/LFC, and
still have enough room for Postgres to operate. Everything smaller might
be not enough for Postgres under load, and can cause errors like 'no
unpinned buffers available'.

See Konstantin's comment [1] as well.

Fixes #9956

[1]:
https://github.com/neondatabase/neon/issues/9956#issuecomment-2511608097
2024-12-02 18:46:06 +00:00
Tristan Partin
d8ebd33fe6 Stop changing the value of neon.extension_server_port at runtime (#9972)
On reconfigure, we no longer passed a port for the extension server
which caused us to not write out the neon.extension_server_port line.
Thus, Postgres thought we were setting the port to the default value of
0. PGC_POSTMASTER GUCs cannot be set at runtime, which causes the
following log messages:

> LOG: parameter "neon.extension_server_port" cannot be changed without
restarting the server
> LOG: configuration file
"/var/db/postgres/compute/pgdata/postgresql.conf" contains errors;
unaffected changes were applied

Fixes: https://github.com/neondatabase/neon/issues/9945

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-12-02 18:06:19 +00:00
Conrad Ludgate
2dc238e5b3 feat(proxy): emit JWT auth method and JWT issuer in parquet logs (#9971)
Fix the HTTP AuthMethod to accomodate the JWT authorization method.
Introduces the JWT issuer as an additional field in the parquet logs
2024-12-02 17:54:32 +00:00
Folke Behrens
243bca1c49 Bump OTel, tracing, reqwest crates (#9970) 2024-12-02 17:24:48 +00:00
Arseny Sher
fa909c27fc Update consensus protocol spec (#9607)
The spec was written for the buggy protocol which we had before the one
more similar to Raft was implemented. Update the spec with what we
currently have.

ref https://github.com/neondatabase/neon/issues/8699
2024-12-02 16:10:44 +00:00
Folke Behrens
1b60571636 proxy: Create Elasticache credentials provider lazily (#9967)
## Problem

The credentials providers tries to connect to AWS STS even when we use
plain Redis connections.

## Summary of changes

* Construct the CredentialsProvider only when needed ("irsa").
2024-12-02 15:38:12 +00:00
Alexander Bayandin
c18716bb3f CI(replication-tests): fix notifications about replication-tests failures (#9950)
## Problem

`if: ${{ github.event.schedule }}` gets skipped if a previous step has
failed, but we want to run the step for both `success` and `failure`

## Summary of changes
- Add `!cancelled()` to notification step if-condition, to skip only
cancelled jobs
2024-12-02 12:46:07 +00:00
Conrad Ludgate
cd1d2d1996 fix(proxy): forward notifications from authentication (#9948)
Fixes https://github.com/neondatabase/cloud/issues/20973. 

This refactors `connect_raw` in order to return direct access to the
delayed notices.

I cannot find a way to test this with psycopg2 unfortunately, although
testing it with psql does return the expected results.
2024-12-02 12:29:57 +00:00
John Spray
bd09369198 storcon: add metric for AZ scheduling violations (#9949)
## Problem

We can't easily tell how far the state of shards is from their AZ
preferences. This can be a cause of performance issues, so it's
important for diagnosability that we can tell easily if there are
significant numbers of shards that aren't running in their preferred AZ.

Related: https://github.com/neondatabase/cloud/issues/15413

## Summary of changes

- In reconcile_all, count shards that are scheduled into the wrong AZ
(if they have a preference), and publish it as a prometheus gauge.
- Also calculate a statistic for how many shards wanted to reconcile but
couldn't.

This is clearly a lazy calculation: reconcile all only runs
periodically. But that's okay: shards in the wrong AZ is something that
only matters if it stays that way for some period of time.
2024-12-02 11:50:22 +00:00
Erik Grinaker
5330122049 test_runner: improve wait_until (#9936)
Improves `wait_until` by:

* Use `timeout` instead of `iterations`. This allows changing the
timeout/interval parameters independently.
* Make `timeout` and `interval` optional (default 20s and 0.5s). Most
callers don't care.
* Only output status every 1s by default, and add optional
`status_interval` parameter.
* Remove `show_intermediate_error`, this was always emitted anyway.

Most callers have been updated to use the defaults, except where they
had good reason otherwise.
2024-12-02 10:26:15 +00:00
Anastasia Lubennikova
45658ccccb Update pgvector to 0.8.0 (#9733) 2024-12-02 10:10:51 +00:00
John Spray
14853a3284 storcon: don't take any Service locks in /status and /ready (#9944)
## Problem

We saw unexpected container terminations when running in k8s with with
small CPU resource requests.

The /status and /ready handlers called `maybe_forward`, which always
takes the lock on Service::inner.

If there is a lot of writer lock contention, and the container is
starved of CPU, this increases the likelihood that we will get killed by
the kubelet.

It isn't certain that this was a cause of issues, but it is a potential
source that we can eliminate.

## Summary of changes

- Revise logic to return immediately if the URL is in the non-forwarded
list, rather than calling maybe_forward
2024-12-01 18:09:58 +00:00
Konstantin Knizhnik
aad809b048 Fix issues with prefetch ring buffer resize (#9847)
## Problem

See https://neondb.slack.com/archives/C04DGM6SMTM/p1732110190129479


We observe the following error in the logs 
```
[XX000] ERROR: [NEON_SMGR] [shard 3] Incorrect prefetch read: status=1 response=0x7fafef335138 my=128 receive=128
```
most likely caused by changing `neon.readahead_buffer_size`

## Summary of changes

1. Copy shard state
2. Do not use prefetch_set_unused in readahead_buffer_resize
3. Change prefetch buffer overflow criteria

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-12-01 15:47:28 +00:00
Alexander Bayandin
fae8e7ba76 Compute image: prepare Postgres v14-v16 for Debian 12 (#9954)
## Problem

Current compute images for Postgres 14-16 don't build on Debian 12
because of issues with extensions.
This PR fixes that, but for the current setup, it is mostly a no-op
change.

## Summary of changes
- Use `/bin/bash -euo pipefail` as SHELL to fail earlier
- Fix `plv8` build: backport a trivial patch for v8
- Fix `postgis` build: depend `sfgal` version on Debian version instead
of Postgres version


Tested in: https://github.com/neondatabase/neon/pull/9849
2024-12-01 13:04:37 +00:00
Konstantin Knizhnik
97a9abd181 Add GUC controlling whether to pause recovery if some critical GUCs at replica have smaller value than on primary (#9057)
## Problem

See https://github.com/neondatabase/neon/issues/9023

## Summary of changes

Ass GUC `recovery_pause_on_misconfig` allowing not to pause in case of
replica and primary configuration mismatch

See https://github.com/neondatabase/postgres/pull/501
See https://github.com/neondatabase/postgres/pull/502
See https://github.com/neondatabase/postgres/pull/503
See https://github.com/neondatabase/postgres/pull/504


## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2024-12-01 12:23:10 +00:00
Folke Behrens
4abc8e5282 Merge the consumption metric pushes (#9939)
#8564

## Problem

The main and backup consumption metric pushes are completely
independent,
resulting in different event time windows and different idempotency
keys.

## Summary of changes

* Merge the push tasks, but keep chunks the same size.
2024-11-30 10:11:37 +00:00
Christian Schwarz
aa4ec11af9 page_service: rewrite batching to work without a timeout (#9851)
# Problem

The timeout-based batching adds latency to unbatchable workloads.

We can choose a short batching timeout (e.g. 10us) but that requires
high-resolution timers, which tokio doesn't have.
I thoroughly explored options to use OS timers (see
[this](https://github.com/neondatabase/neon/pull/9822) abandoned PR).
In short, it's not an attractive option because any timer implementation
adds non-trivial overheads.

# Solution

The insight is that, in the steady state of a batchable workload, the
time we spend in `get_vectored` will be hundreds of microseconds anyway.

If we prepare the next batch concurrently to `get_vectored`, we will
have a sizeable batch ready once `get_vectored` of the current batch is
done and do not need an explicit timeout.

This can be reasonably described as **pipelining of the protocol
handler**.

# Implementation

We model the sub-protocol handler for pagestream requests
(`handle_pagrequests`) as two futures that form a pipeline:

2. Batching: read requests from the connection and fill the current
batch
3. Execution: `take` the current batch, execute it using `get_vectored`,
and send the response.

The Reading and Batching stage are connected through a new type of
channel called `spsc_fold`.

See the long comment in the `handle_pagerequests_pipelined` for details.

# Changes

- Refactor `handle_pagerequests`
    - separate functions for
- reading one protocol message; produces a `BatchedFeMessage` with just
one page request in it
- batching; tried to merge an incoming `BatchedFeMessage` into an
existing `BatchedFeMessage`; returns `None` on success and returns back
the incoming message in case merging isn't possible
        - execution of a batched message
- unify the timeline handle acquisition & request span construction; it
now happen in the function that reads the protocol message
- Implement serial and pipelined model
    - serial: what we had before any of the batching changes
      - read one protocol message
      - execute protocol messages
    - pipelined: the design described above
- optionality for execution of the pipeline: either via concurrent
futures vs tokio tasks
- Pageserver config
  - remove batching timeout field
  - add ability to configure pipelining mode
- add ability to limit max batch size for pipelined configurations
(required for the rollout, cf
https://github.com/neondatabase/cloud/issues/20620 )
  - ability to configure execution mode
- Tests
  - remove `batch_timeout` parametrization
  - rename `test_getpage_merge_smoke` to `test_throughput`
- add parametrization to test different max batch sizes and execution
moes
  - rename `test_timer_precision` to `test_latency`
  - rename the test case file to `test_page_service_batching.py`
  - better descriptions of what the tests actually do

## On the holding The `TimelineHandle` in the pending batch

While batching, we hold the `TimelineHandle` in the pending batch.
Therefore, the timeline will not finish shutting down while we're
batching.

This is not a problem in practice because the concurrently ongoing
`get_vectored` call will fail quickly with an error indicating that the
timeline is shutting down.
This results in the Execution stage returning a `QueryError::Shutdown`,
which causes the pipeline / entire page service connection to shut down.
This drops all references to the
`Arc<Mutex<Option<Box<BatchedFeMessage>>>>` object, thereby dropping the
contained `TimelineHandle`s.

- => fixes https://github.com/neondatabase/neon/issues/9850

# Performance

Local run of the benchmarks, results in [this empty
commit](1cf5b1463f)
in the PR branch.

Key take-aways:
* `concurrent-futures` and `tasks` deliver identical `batching_factor`
* tail latency impact unknown, cf
https://github.com/neondatabase/neon/issues/9837
* `concurrent-futures` has higher throughput than `tasks` in all
workloads (=lower `time` metric)
* In unbatchable workloads, `concurrent-futures` has 5% higher
`CPU-per-throughput` than that of `tasks`, and 15% higher than that of
`serial`.
* In batchable-32 workload, `concurrent-futures` has 8% lower
`CPU-per-throughput` than that of `tasks` (comparison to tput of
`serial` is irrelevant)
* in unbatchable workloads, mean and tail latencies of
`concurrent-futures` is practically identical to `serial`, whereas
`tasks` adds 20-30us of overhead

Overall, `concurrent-futures` seems like a slightly more attractive
choice.

# Rollout

This change is disabled-by-default.

Rollout plan:
- https://github.com/neondatabase/cloud/issues/20620

# Refs

- epic: https://github.com/neondatabase/neon/issues/9376
- this sub-task: https://github.com/neondatabase/neon/issues/9377
- the abandoned attempt to improve batching timeout resolution:
https://github.com/neondatabase/neon/pull/9820
- closes https://github.com/neondatabase/neon/issues/9850
- fixes https://github.com/neondatabase/neon/issues/9835
2024-11-30 00:16:24 +00:00
Matthias van de Meent
973a8d2680 Fix timeout value used in XLogWaitForReplayOf (#9937)
The previous value assumed usec precision, while the timeout used is in
milliseconds, causing replica backends to wait for (potentially) many
hours for WAL replay without the expected progress reports in logs.

This fixes the issue.

Reported-By: Alexander Lakhin <exclusion@gmail.com>

## Problem


https://github.com/neondatabase/postgres/pull/279#issuecomment-2507671817

The timeout value was configured with the assumption the indicated value
would be microseconds, where it's actually milliseconds. That causes the
backend to wait for much longer (2h46m40s) before it emits the "I'm
waiting for recovery" message. While we do have wait events configured
on this, it's not great to have stuck backends without clear logs, so
this fixes the timeout value in all our PostgreSQL branches.

## PG PRs

* PG14: https://github.com/neondatabase/postgres/pull/542
* PG15: https://github.com/neondatabase/postgres/pull/543
* PG16: https://github.com/neondatabase/postgres/pull/544
* PG17: https://github.com/neondatabase/postgres/pull/545
2024-11-29 19:10:26 +00:00
Gleb Novikov
c848f25ec2 Fixed fast_import pgbin in calling get_pg_version (#9933)
Was working on https://github.com/neondatabase/cloud/pull/20795 and
discovered that fast_import is not working normally.
2024-11-29 17:58:36 +00:00
John Spray
d5624cc505 pageserver: download small objects using a smaller timeout (#9938)
## Problem

It appears that the Azure storage API tends to hang TCP connections more
than S3 does.

Currently we use a 2 minute timeout for all downloads. This is large
because sometimes the objects we download are large. However, waiting 2
minutes when doing something like downloading a manifest on tenant
attach is problematic, because when someone is doing a "create tenant,
create timeline" workflow, that 2 minutes is long enough for them
reasonably to give up creating that timeline.

Rather than propagate oversized timeouts further up the stack, we should
use a different timeout for objects that we expect to be small.

Closes: https://github.com/neondatabase/neon/issues/9836

## Summary of changes

- Add a `small_timeout` configuration attribute to remote storage,
defaulting to 30 seconds (still a very generous period to do something
like download an index)
- Add a DownloadKind parameter to DownloadOpts, so that callers can
indicate whether they expect the object to be small or large.
- In the azure client, use small timeout for HEAD requests, and for GET
requests if DownloadKind::Small is used.
- Use DownloadKind::Small for manifests, indices, and heatmap downloads.

This PR intentionally does not make the equivalent change to the S3
client, to reduce blast radius in case this has unexpected consequences
(we could accomplish the same thing by editing lots of configs, but just
skipping the code is simpler for right now)
2024-11-29 15:11:44 +00:00
Alexey Kondratov
538e2312a6 feat(compute_ctl): Always set application_name (#9934)
## Problem

It was not always possible to judge what exactly some `cloud_admin`
connections were doing because we didn't consistently set
`application_name` everywhere.

## Summary of changes

Unify the way we connect to Postgres:
1. Switch to building configs everywhere
2. Always set `application_name` and make naming consistent

Follow-up for #9919
Part of neondatabase/cloud#20948
2024-11-29 13:55:56 +00:00
Erik Grinaker
a6073b5013 safekeeper: use jemalloc (#9780)
## Problem

To add Safekeeper heap profiling in #9778, we need to switch to an
allocator that supports it. Pageserver and proxy already use jemalloc.

Touches #9534.

## Summary of changes

Use jemalloc in Safekeeper.
2024-11-29 13:38:04 +00:00
John Spray
ea3798e3b3 storage controller: use proper ScheduleContext when evacuating a node (#9908)
## Problem

When picking locations for a shard, we should use a ScheduleContext that
includes all the other shards in the tenant, so that we apply proper
anti-affinity between shards. If we don't do this, then it can lead to
unstable scheduling, where we place a shard somewhere that the optimizer
will then immediately move it away from.

We didn't always do this, because it was a bit awkward to accumulate the
context for a tenant rather than just walking tenants.

This was a TODO in `handle_node_availability_transition`:
```
                        // TODO: populate a ScheduleContext including all shards in the same tenant_id (only matters
                        // for tenants without secondary locations: if they have a secondary location, then this
                        // schedule() call is just promoting an existing secondary)
```

This is a precursor to https://github.com/neondatabase/neon/issues/8264,
where the current imperfect scheduling during node evacuation hampers
testing.

## Summary of changes

- Add an iterator type that yields each shard along with a
schedulecontext that includes all the other shards from the same tenant
- Use the iterator to replace hand-crafted logic in optimize_all_plan
(functionally identical)
- Use the iterator in `handle_node_availability_transition` to apply
proper anti-affinity during node evacuation.
2024-11-29 13:27:49 +00:00
Conrad Ludgate
1d642d6a57 chore(proxy): vendor a subset of rust-postgres (#9930)
Our rust-postgres fork is getting messy. Mostly because proxy wants more
control over the raw protocol than tokio-postgres provides. As such,
it's diverging more and more. Storage and compute also make use of
rust-postgres, but in more normal usage, thus they don't need our crazy
changes.

Idea: 
* proxy maintains their subset
* other teams use a minimal patch set against upstream rust-postgres

Reviewing this code will be difficult. To implement it, I
1. Copied tokio-postgres, postgres-protocol and postgres-types from
00940fcdb5
2. Updated their package names with the `2` suffix to make them compile
in the workspace.
3. Updated proxy to use those packages
4. Copied in the code from tokio-postgres-rustls 0.13 (with some patches
applied https://github.com/jbg/tokio-postgres-rustls/pull/32
https://github.com/jbg/tokio-postgres-rustls/pull/33)
5. Removed as much dead code as I could find in the vendored libraries
6. Updated the tokio-postgres-rustls code to use our existing channel
binding implementation
2024-11-29 11:08:01 +00:00
Erik Grinaker
3ffe6de0b9 test_runner/performance: add logical message ingest benchmark (#9749)
Adds a benchmark for logical message WAL ingestion throughput
end-to-end. Logical messages are essentially noops, and thus ignored by
the Pageserver.

Example results from my MacBook, with fsync enabled:

```
postgres_ingest: 14.445 s
safekeeper_ingest: 29.948 s
pageserver_ingest: 30.013 s
pageserver_recover_ingest: 8.633 s
wal_written: 10,340 MB
message_count: 1310720 messages
postgres_throughput: 715 MB/s
safekeeper_throughput: 345 MB/s
pageserver_throughput: 344 MB/s
pageserver_recover_throughput: 1197 MB/s
```

See
https://github.com/neondatabase/neon/issues/9642#issuecomment-2475995205
for running analysis.

Touches #9642.
2024-11-29 09:40:08 +00:00
Alexey Kondratov
42fb3c4d30 fix(compute_ctl): Allow usage of DB names with whitespaces (#9919)
## Problem

We used `set_path()` to replace the database name in the connection
string. It automatically does url-safe encoding if the path is not
already encoded, but it does it as per the URL standard, which assumes
that tabs can be safely removed from the path without changing the
meaning of the URL. See, e.g.,
https://url.spec.whatwg.org/#concept-basic-url-parser. It also breaks
for DBs with properly %-encoded names, like with `%20`, as they are kept
intact, but actually should be escaped.

Yet, this is not true for Postgres, where it's completely valid to have
trailing tabs in the database name.

I think this is the PR that caused this regression
https://github.com/neondatabase/neon/pull/9717, as it switched from
`postgres::config::Config` back to `set_path()`.

This was fixed a while ago already [1], btw, I just haven't added a test
to catch this regression back then :(

## Summary of changes

This commit changes the code back to use
`postgres/tokio_postgres::Config` everywhere.

While on it, also do some changes around, as I had to touch this code:
1. Bump some logging from `debug` to `info` in the spec apply path. We
do not use `debug` in prod, and it was tricky to understand what was
going on with this bug in prod.
2. Refactor configuration concurrency calculation code so it was
reusable. Yet, still keep `1` in the case of reconfiguration. The
database can be actively used at this moment, so we cannot guarantee
that there will be enough spare connection slots, and the underlying
code won't handle connection errors properly.
3. Simplify the installed extensions code. It was spawning a blocking
task inside async function, which doesn't make much sense. Instead, just
have a main sync function and call it with `spawn_blocking` in the API
code -- the only place we need it to be async.
4. Add regression python test to cover this and related problems in the
future. Also, add more extensive testing of schema dump and DBs and
roles listing API.

[1]:
4d1e48f3b9
[2]:
https://www.postgresql.org/message-id/flat/20151023003445.931.91267%40wrigleys.postgresql.org

Resolves neondatabase/cloud#20869
2024-11-28 21:38:30 +00:00
Alexander Bayandin
e04dd3be0b test_runner: rerun all failed tests (#9917)
## Problem

Currently, we rerun only known flaky tests. This approach was chosen to
reduce the number of tests that go unnoticed (by forcing people to take
a look at failed tests and rerun the job manually), but it has some
drawbacks:
- In PRs, people tend to push new changes without checking failed tests
(that's ok)
- In the main, tests are just restarted without checking
(understandable)
- Parametrised tests become flaky one by one, i.e. if `test[1]` is flaky
`, test[2]` is not marked as flaky automatically (which may or may not
be the case).

I suggest rerunning all failed tests to increase the stability of GitHub
jobs and using the Grafana Dashboard with flaky tests for deeper
analysis.

## Summary of changes
- Rerun all failed tests twice at max
2024-11-28 19:02:57 +00:00
Vlad Lazar
eb520a14ce pageserver: return correct LSN for interpreted proto keep alive responses (#9928)
## Problem

For the interpreted proto the pageserver is not returning the correct
LSN
in replies to keep alive requests. This is because the interpreted
protocol arm
was not updating `last_rec_lsn`.

## Summary of changes

* Return correct LSN in keep-alive responses
* Fix shard field in wal sender traces
2024-11-28 17:38:47 +00:00
Arpad Müller
eb5d832e6f Update rust to 1.83.0, also update cargo adjacent tools (#9926)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

[Release notes](https://releases.rs/docs/1.83.0/).

Also update `cargo-hakari`, `cargo-deny`, `cargo-hack` and
`cargo-nextest` to their latest versions.

Prior update was in #9445.
2024-11-28 15:49:30 +00:00
Erik Grinaker
70780e310c Makefile: build pg_visibility (#9922)
Build the `pg_visibility` extension for use with `neon_local`. This is
useful to inspect the visibility map for debugging.

Touches #9914.
2024-11-28 15:48:18 +00:00
Vlad Lazar
e82f7f0dfc remote_storage/abs: count 404 and 304 for get as ok for metrics (#9912)
## Problem

We currently see elevated levels of errors for GetBlob requests. This is
because 404 and 304 are counted as errors for metric reporting.

## Summary of Changes

Bring the implementation in line with the S3 client and treat 404 and
304 responses as ok for metric purposes.

Related: https://github.com/neondatabase/cloud/issues/20666
2024-11-28 10:11:08 +00:00
Folke Behrens
8a6ee79f6f Merge pull request #9921 from neondatabase/rc/release-proxy/2024-11-28
Proxy release 2024-11-28
2024-11-28 11:09:06 +01:00
Ivan Efremov
8173dc600a proxy: spawn cancellation checks in the background (#9918)
## Problem
For cancellation, a connection is open during all the cancel checks.
## Summary of changes
Spawn cancellation checks in the background, and close connection
immediately.
Use task_tracker for cancellation checks.
2024-11-28 06:32:22 +00:00
github-actions[bot]
9052c32b46 Proxy release 2024-11-28 2024-11-28 06:02:15 +00:00
Erik Grinaker
da1daa2426 pageserver: only apply ClearVmBits on relevant shards (#9895)
# Problem

VM (visibility map) pages are stored and managed as any regular relation
page, in the VM fork of the main relation. They are also sharded like
other pages. Regular WAL writes to the VM pages (typically performed by
vacuum) are routed to the correct shard as usual. However, VM pages are
also updated via `ClearVmBits` metadata records emitted when main
relation pages are updated. These metadata records were sent to all
shards, like other metadata records. This had the following effects:

* On shards responsible for VM pages, the `ClearVmBits` applies as
expected.

* On shard 0, which knows about the VM relation and its size but doesn't
necessarily have any VM pages, the `ClearVmBits` writes may have been
applied without also having applied the explicit WAL writes to VM pages.

* If VM pages are spread across multiple shards (unlikely with 256MB
stripe size), all shards may have applied `ClearVmBits` if the pages
fall within their local view of the relation size, even for pages they
do not own.

* On other shards, this caused a relation size cache miss and a DbDir
and RelDir lookup before dropping the `ClearVmBits`. With many
relations, this could cause significant CPU overhead.

This is not believed to be a correctness problem, but this will be
verified in #9914.

Resolves #9855.

# Changes

Route `ClearVmBits` metadata records only to the shards responsible for
the VM pages.

Verification of the current VM handling and cleanup of incomplete VM
pages on shard 0 (and potentially elsewhere) is left as follow-up work.
2024-11-27 19:44:24 +00:00
Alex Chi Z.
9e3cb75bc7 fix(pageserver): flush deletion queue in reload shutdown mode (#9884)
## Problem

close https://github.com/neondatabase/neon/issues/9859

## Summary of changes

Ensure that the deletion queue gets fully flushed (i.e., the deletion
lists get applied) during a graceful shutdown.

It is still possible that an incomplete shutdown would leave deletion
list behind and cause race upon the next startup, but we assume this
will unlikely happen, and even if it happened, the pageserver should
already be at a tainted state and the tenant should be moved to a new
tenant with a new generation number.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-27 18:30:54 +00:00
Folke Behrens
5c41707bee proxy: promote two logs to error, fix multiline log (#9913)
* Promote two logs from mpsc send errors to error level. The channels
are unbounded and there shouldn't be errors.
* Fix one multiline log from anyhow::Error. Use Debug instead of
Display.
2024-11-27 18:05:46 +00:00
Erik Grinaker
cc37fa0f33 pageserver: add metrics for unknown ClearVmBits pages (#9911)
## Problem

When ingesting implicit `ClearVmBits` operations, we silently drop the
writes if the relation or page is unknown. There are implicit
assumptions around VM pages wrt. explicit/implicit updates, sharding,
and relation sizes, which can possibly drop writes incorrectly. Adding a
few metrics will allow us to investigate further and tighten up the
logic.

Touches #9855.

## Summary of changes

Add a `pageserver_wal_ingest_clear_vm_bits_unknown` metric to record
dropped `ClearVmBits` writes.

Also add comments clarifying the behavior of relation sizes on non-zero
shards.
2024-11-27 17:16:41 +00:00
Alex Chi Z.
23f5a27146 fix(storage-scrubber): valid layermap error degrades to warning (#9902)
Valid layer assumption is a necessary condition for a layer map to be
valid. It's a stronger check imposed by gc-compaction than the actual
valid layermap definition. Actually, the system can work as long as
there are no overlapping layer maps. Therefore, we degrade that into a
warning.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-27 16:07:39 +00:00
Erik Grinaker
e4f437a354 pageserver: add relsize cache metrics (#9890)
## Problem

We don't have any observability for the relation size cache. We have
seen cache misses cause significant performance impact with high
relation counts.

Touches #9855.

## Summary of changes

Adds the following metrics:

* `pageserver_relsize_cache_entries`
* `pageserver_relsize_cache_hits`
* `pageserver_relsize_cache_misses`
* `pageserver_relsize_cache_misses_old`
2024-11-27 13:54:14 +00:00
Vlad Lazar
8fdf786217 pageserver: add tenant config override for wal receiver proto (#9888)
## Problem

Can't change protocol at tenant granularity.

## Summary of changes

Add tenant config level override for wal receiver protocol.

## Links

Related: https://github.com/neondatabase/neon/issues/9336
Epic: https://github.com/neondatabase/neon/issues/9329
2024-11-27 13:46:23 +00:00
Vlad Lazar
9e0148de11 safekeeper: use protobuf for sending compressed records to pageserver (#9821)
## Problem

https://github.com/neondatabase/neon/pull/9746 lifted decoding and
interpretation of WAL to the safekeeper.
This reduced the ingested amount on the pageservers by around 10x for a
tenant with 8 shards, but doubled
the ingested amount for single sharded tenants.

Also, https://github.com/neondatabase/neon/pull/9746 uses bincode which
doesn't support schema evolution.
Technically the schema can be evolved, but it's very cumbersome.

## Summary of changes

This patch set addresses both problems by adding protobuf support for
the interpreted wal records and adding compression support. Compressed
protobuf reduced the ingested amount by 100x on the 32 shards
`test_sharded_ingest` case (compared to non-interpreted proto). For the
1 shard case the reduction is 5x.

Sister change to `rust-postgres` is
[here](https://github.com/neondatabase/rust-postgres/pull/33).

## Links

Related: https://github.com/neondatabase/neon/issues/9336
Epic: https://github.com/neondatabase/neon/issues/9329
2024-11-27 12:12:21 +00:00
Alexander Bayandin
7b41ee872e CI(pre-merge-checks): build only one build-tools-image (#9718)
## Problem

The `pre-merge-checks` workflow relies on the build-tools image. 
If changes to the `build-tools` image have been merged into the main
branch since the last CI run for a PR (with other changes to the
`build-tools`), the image will be rebuilt during the merge queue run.
Otherwise, cached images are used.
Rebuilding the image adds approximately 10 minutes on x86-64 and 20
minutes on arm64 to the process.

## Summary of changes
- parametrise `build-build-tools-image` job with arch and Debian version
- Run `pre-merge-checks` only on Debian 12 x86-64 image
2024-11-27 10:42:26 +00:00
Peter Bendel
277c33ba3f ingest benchmark: after effective_io_concurrency = 100 we can increase compute side parallelism (#9904)
## Problem

ingest benchmark tests project migration to Neon involving steps
- COPY relation data
- create indexes
- create constraints

Previously we used only 4 copy jobs, 4 create index jobs and 7
maintenance workers. After increasing effective_io_concurrency on
compute we see that we can sustain more parallelism in the ingest bench

## Summary of changes

Increase copy jobs to 8, create index jobs to 8 and maintenance workers
to 16
2024-11-27 10:09:01 +00:00
Tristan Partin
2b788cb53f Bump neon.logical_replication_max_snap_files default to 10000 (#9896)
This bump comes from a recommendation from Chi.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-26 17:49:37 +00:00
Peter Bendel
13feda0669 track how much time the flush loop is stalled waiting for uploads (#9885)
## Problem

We don't know how much time PS is losing during ingest when waiting for
remote storage uploads in the flush frozen layer loop.
Also we don't know how many remote storage requests get an permit
without waiting (not throttled by remote_storage concurrency_limit).

## Summary of changes

- Add a metric that accumulates the time waited per shard/PS
- in [remote storage semaphore wait
seconds](https://neonprod.grafana.net/d/febd9732-9bcf-4992-a821-49b1f6b02724/remote-storage?orgId=1&var-datasource=HUNg6jvVk&var-instance=pageserver-26.us-east-2.aws.neon.build&var-instance=pageserver-27.us-east-2.aws.neon.build&var-instance=pageserver-28.us-east-2.aws.neon.build&var-instance=pageserver-29.us-east-2.aws.neon.build&var-instance=pageserver-30.us-east-2.aws.neon.build&var-instance=pageserver-31.us-east-2.aws.neon.build&var-instance=pageserver-36.us-east-2.aws.neon.build&var-instance=pageserver-37.us-east-2.aws.neon.build&var-instance=pageserver-38.us-east-2.aws.neon.build&var-instance=pageserver-39.us-east-2.aws.neon.build&var-instance=pageserver-40.us-east-2.aws.neon.build&var-instance=pageserver-41.us-east-2.aws.neon.build&var-request_type=put_object&from=1731961336340&to=1731964762933&viewPanel=3)
add a first bucket with 100 microseconds to count requests that do not
need to wait on semaphore

Update: created a new version that uses a Gauge (one increasing value
per PS/shard) instead of histogram as suggested by review
2024-11-26 11:46:58 +00:00
Conrad Ludgate
96a1b71c84 chore(proxy): discard request context span during passthrough (#9882)
## Problem

The RequestContext::span shouldn't live for the entire postgres
connection, only the handshake.

## Summary of changes

* Slight refactor to the RequestContext to discard the span upon
handshake completion.
* Make sure the temporary future for the handshake is dropped (not bound
to a variable)
* Runs our nightly fmt script
2024-11-25 21:32:53 +00:00
Arpad Müller
a74ab9338d fast_import: remove hardcoding of pg_version (#9878)
Before, we hardcoded the pg_version to 140000, while the code expected
version numbers like 14. Now we use an enum, and code from
`extension_server.rs` to auto-detect the correct version. The enum helps
when we add support for a version: enums ensure that compilation fails
if one forgets to put the version to one of the `match` locations.

cc https://github.com/neondatabase/neon/pull/9218
2024-11-25 20:23:42 +00:00
Folke Behrens
7404887b81 proxy: Demote errors from cplane request routines to debug (#9886)
## Problem

Any errors from these async blocks are unconditionally logged at error
level
even though we already handle such errors based on context.

## Summary of changes

* Log raw errors from creating and executing cplane requests at debug
level.
* Inline macro calls to retain the correct callsite.
2024-11-25 19:35:32 +00:00
Folke Behrens
87e4dd23a1 proxy: Demote all cplane error replies to info log level (#9880)
## Problem

The vast majority of the error/warn logs from cplane are about time or
data transfer quotas exceeded or endpoint-not-found errors and not
operational errors in proxy or cplane.

## Summary of changes

* Demote cplane error replies to info level.
* Raise other errors from warn back to error.
2024-11-25 17:53:26 +00:00
Vlad Lazar
7a2f0ed8d4 safekeeper: lift decoding and interpretation of WAL to the safekeeper (#9746)
## Problem

For any given tenant shard, pageservers receive all of the tenant's WAL
from the safekeeper.
This soft-blocks us from using larger shard counts due to bandwidth
concerns and CPU overhead of filtering
out the records.

## Summary of changes

This PR lifts the decoding and interpretation of WAL from the pageserver
into the safekeeper.

A customised PG replication protocol is used where instead of sending
raw WAL, the safekeeper sends
filtered, interpreted records. The receiver drives the protocol
selection, so, on the pageserver side, usage
of the new protocol is gated by a new pageserver config:
`wal_receiver_protocol`.

 More granularly the changes are:
1. Optionally inject the protocol and shard identity into the arguments
used for starting replication
2. On the safekeeper side, implement a new wal sending primitive which
decodes and interprets records
 before sending them over
3. On the pageserver side, implement the ingestion of this new
replication message type. It's very similar
 to what we already have for raw wal (minus decoding and interpreting).
 
 ## Notes
 
* This PR currently uses my [branch of
rust-postgres](https://github.com/neondatabase/rust-postgres/tree/vlad/interpreted-wal-record-replication-support)
which includes the deserialization logic for the new replication message
type. PR for that is open
[here](https://github.com/neondatabase/rust-postgres/pull/32).
* This PR contains changes for both pageservers and safekeepers. It's
safe to merge because the new protocol is disabled by default on the
pageserver side. We can gradually start enabling it in subsequent
releases.
* CI tests are running on https://github.com/neondatabase/neon/pull/9747
 
 ## Links
 
 Related: https://github.com/neondatabase/neon/issues/9336
 Epic: https://github.com/neondatabase/neon/issues/9329
2024-11-25 17:29:28 +00:00
Christian Schwarz
5c2356988e page_service: add benchmark for batching (#9820)
This PR adds two benchmark to demonstrate the effect of server-side
getpage request batching added in
https://github.com/neondatabase/neon/pull/9321.

For the CPU usage, I found the the `prometheus` crate's built-in CPU
usage accounts the seconds at integer granularity. That's not enough you
reduce the target benchmark runtime for local iteration. So, add a new
`libmetrics` metric and report that.

The benchmarks are disabled because [on our benchmark nodes, timer
resolution isn't high
enough](https://neondb.slack.com/archives/C059ZC138NR/p1732264223207449).
They work (no statement about quality) on my bare-metal devbox.

They will be refined and enabled once we find a fix. Candidates at time
of writing are:
- https://github.com/neondatabase/neon/pull/9822
- https://github.com/neondatabase/neon/pull/9851


Refs:

- Epic: https://github.com/neondatabase/neon/issues/9376
- Extracted from https://github.com/neondatabase/neon/pull/9792
2024-11-25 15:52:39 +00:00
Konstantin Knizhnik
441612c1ce Prefetch on macos (#9875)
## Problem

Prefetch is disabled at MacODS because `posix_fadvise` is not available.
But Neon prefetch is not using this function and for testing at MacOS is
it very convenient that prefetch is available.

## Summary of changes

Define `USE_PREFETCH` in Makefile.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-11-25 15:21:52 +00:00
Arpad Müller
77630e5408 Address beta clippy lint needless_lifetimes (#9877)
The 1.82.0 version of Rust will be stable soon, let's get the clippy
lint fixes in before the compiler version upgrade.
2024-11-25 14:59:12 +00:00
Alexander Bayandin
3d380acbd1 Bump default Debian version to Bookworm everywhere (#9863)
## Problem

We have a couple of CI workflows that still run on Debian Bullseye, and
the default Debian version in images is Bullseye as well (we explicitly
set building on Bookworm)

## Summary of changes
- Run `pgbench-pgvector` on Bookworm (fix a couple of packages)
- Run `trigger_bench_on_ec2_machine_in_eu_central_1` on Bookworm
- Change default `DEBIAN_VERSION` in Dockerfiles to Bookworm
- Make `pinned` docker tag an alias to `pinned-bookworm`
2024-11-25 14:43:32 +00:00
Alex Chi Z.
4630b70962 fix(pageserver): ensure all layers are flushed before measuring RSS (#9861)
## Problem

close https://github.com/neondatabase/neon/issues/9761

The test assumed that no new L0 layers are flushed throughout the
process, which is not true.

## Summary of changes

Fix the test case `test_compaction_l0_memory` by flushing in-memory
layers before compaction.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-25 14:25:18 +00:00
Conrad Ludgate
6f6749c4a9 chore: update rustls (#9871) 2024-11-25 12:01:30 +00:00
Folke Behrens
0d1e82f0a7 Bump futures-* crates, drop unused license, hide duplicate crate warnings (#9858)
* The futures-util crate we use was yanked. Bump it and its siblings to
new patch release.
https://github.com/rust-lang/futures-rs/releases/tag/0.3.31
* cargo-deny: Drop an unused license.
* cargo-deny: Don't warn about duplicate crate. Duplicate crates are
unavoidable and the noise just hides real warnings.
2024-11-25 10:59:49 +00:00
Alexander Bayandin
6f7aeaa1c5 test_runner: use LFC by default (#8613)
## Problem
LFC is not enabled by default in tests, but it is enabled in production.
This increases the risk of errors in the production environment, which
were not found during the routine workflow.
However, enabling LFC for all the tests may overload the disk on our
servers and increase the number of failures.
So, we try enabling  LFC in one case to evaluate the possible risk.

## Summary of changes
A new environment variable, USE_LFC is introduced. If it is set to true,
LFC is enabled by default in all the tests.
In our workflow, we enable LFC for PG17, release, x86-64, and disabled
for all other combinations.

---------

Co-authored-by: Alexey Masterov <alexeymasterov@neon.tech>
Co-authored-by: a-masterov <72613290+a-masterov@users.noreply.github.com>
2024-11-25 09:01:05 +00:00
Christian Schwarz
450be26bbb fast imports: initial Importer and Storage changes (#9218)
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Stas Kelvic <stas@neon.tech>

# Context

This PR contains PoC-level changes for a product feature that allows
onboarding large databases into Neon without going through the regular
data path.

# Changes

This internal RFC provides all the context
* https://github.com/neondatabase/cloud/pull/19799

In the language of the RFC, this PR covers

* the Importer code (`fast_import`) 
* all the Pageserver changes (mgmt API changes, flow implementation,
etc)
* a basic test for the Pageserver changes

# Reviewing

As acknowledged in the RFC, the code added in this PR is not ready for
general availability.
Also, the **architecture is not to be discussed in this PR**, but in the
RFC and associated Slack channel instead.

Reviewers of this PR should take that into consideration.
The quality bar to apply during review depends on what area of the code
is being reviewed:

* Importer code (`fast_import`): practically anything goes
* Core flow (`flow.rs`):
* Malicious input data must be expected and the existing threat models
apply.
* The code must not be safe to execute on *dedicated* Pageserver
instances:
* This means in particular that tenants *on other* Pageserver instances
must not be affected negatively wrt data confidentiality, integrity or
availability.
* Other code: the usual quality bar
* Pay special attention to correct use of gate guards, timeline
cancellation in all places during shutdown & migration, etc.
* Consider the broader system impact; if you find potentially
problematic interactions with Storage features that were not covered in
the RFC, bring that up during the review.

I recommend submitting three separate reviews, for the three high-level
areas with different quality bars.


# References

(Internal-only)

* refs https://github.com/neondatabase/cloud/issues/17507
* refs https://github.com/neondatabase/company_projects/issues/293
* refs https://github.com/neondatabase/company_projects/issues/309
* refs https://github.com/neondatabase/cloud/issues/20646

---------

Co-authored-by: Stas Kelvich <stas.kelvich@gmail.com>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: John Spray <john@neon.tech>
2024-11-22 22:47:06 +00:00
Anastasia Lubennikova
3245f7b88d Rename 'installed_extensions' metric to 'compute_installed_extensions' (#9759)
to keep it consistent with existing compute metrics.

flux-fleet change is not needed, because it doesn't have any filter by
metric name for compute metrics.
2024-11-22 19:27:04 +00:00
Alex Chi Z.
c1937d073f fix(pageserver): ensure upload happens after delete (#9844)
## Problem

Follow up of https://github.com/neondatabase/neon/pull/9682, that patch
didn't fully address the problem: what if shutdown fails due to whatever
reason and then we reattach the tenant? Then we will still remove the
future layer. The underlying problem is that the fix for #5878 gets
voided because of the generation optimizations.

Of course, we also need to ensure that delete happens after uploads, but
note that we only schedule deletes when there are no ongoing upload
tasks, so that's fine.

## Summary of changes

* Add a test case to reproduce the behavior (by changing the original
test case to attach the same generation).
* If layer upload happens after the deletion, drain the deletion queue
before uploading.
* If blocked_deletion is enabled, directly remove it from the
blocked_deletion queue.
* Local fs backend fix to avoid race between deletion and preload.
* test_emergency_mode does not need to wait for uploads (and it's
generally not possible to wait for uploads).
* ~~Optimize deletion executor to skip validation if there are no files
to delete.~~ this doesn't work

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-22 18:30:53 +00:00
Alex Chi Z.
6f8b1eb5a6 test(pageserver): add detach ancestor smoke test (#9842)
## Problem

Follow up to https://github.com/neondatabase/neon/pull/9682, hopefully
we can detect some issues or assure ourselves that this is ready for
production.

## Summary of changes

* Add a compaction-detach-ancestor smoke test.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-22 18:21:51 +00:00
Erik Grinaker
e939d36dd4 safekeeper,pageserver: fix CPU profiling allowlists (#9856)
## Problem

The HTTP router allowlists matched both on the path and the query
string. This meant that only `/profile/cpu` would be allowed without
auth, while `/profile/cpu?format=svg` would require auth.

Follows #9764.

## Summary of changes

* Match allowlists on URI path, rather than the entire URI.
* Fix the allowlist for Safekeeper to use `/profile/cpu` rather than the
old `/pprof/profile`.
* Just use a constant slice for the allowlist; it's only a handful of
items, and these handlers are not on hot paths.
2024-11-22 17:50:33 +00:00
Alex Chi Z.
211e4174d2 fix(pageserver): preempt and retry azure list operation (#9840)
## Problem

close https://github.com/neondatabase/neon/issues/9836

Looking at Azure SDK, the only related issue I can find is
https://github.com/azure/azure-sdk-for-rust/issues/1549. Azure uses
reqwest as the backend, so I assume there's some underlying magic
unknown to us that might have caused the stuck in #9836.

The observation is:
* We didn't get an explicit out of resource HTTP error from Azure.
* The connection simply gets stuck and times out.
* But when we retry after we reach the timeout, it succeeds.

This issue is hard to identify -- maybe something went wrong at the ABS
side, or something wrong with our side. But we know that a retry will
usually succeed if we give up the stuck connection.

Therefore, I propose the fix that we preempt stuck HTTP operation and
actively retry. This would mitigate the problem, while in the long run,
we need to keep an eye on ABS usage and see if we can fully resolve this
problem.

The reasoning of such timeout mechanism: we use a much smaller timeout
than before to preempt, while it is possible that a normal listing
operation would take a longer time than the initial timeout if it
contains a lot of keys. Therefore, after we terminate the connection, we
should double the timeout, so that such requests would eventually
succeed.

## Summary of changes

* Use exponential growth for ABS list timeout.
* Rather than using a fixed timeout, use a timeout that starts small and
grows
* Rather than exposing timeouts to the list_streaming caller as soon as
we see them, only do so after we have retried a few times

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-22 17:50:00 +00:00
Ivan Efremov
3b1ac8b14a proxy: Implement cancellation rate limiting (#9739)
Implement cancellation rate limiting and ip allowlist checks. Add
ip_allowlist to the cancel closure

Fixes [#16456](https://github.com/neondatabase/cloud/issues/16456)
2024-11-22 16:46:38 +00:00
Alexander Bayandin
b3b579b45e test_bulk_insert: fix typing for PgVersion (#9854)
## Problem

Along with the migration to Python 3.11, I switched `C(str, Enum)` with
`C(StrEnum)`; one such example is the `PgVersion` enum.
It required more changes in `PgVersion` itself (before, it accepted both
`str` and `int`, and after it, it supports only `str`), which caused the
`test_bulk_insert` test to fail.

## Summary of changes
- `test_bulk_insert`: explicitly cast pg_version from `timeline_detail`
to str
2024-11-22 16:13:53 +00:00
Conrad Ludgate
8ab96cc71f chore(proxy/jwks): reduce the rightward drift of jwks renewal (#9853)
I found the rightward drift of the `renew_jwks` function hard to review.

This PR splits out some major logic and uses early returns to make the
happy path more linear.
2024-11-22 14:51:32 +00:00
Alexander Bayandin
51d26a261b build(deps): bump mypy from 1.3.0 to 1.13.0 (#9670)
## Problem
We use a pretty old version of `mypy` 1.3 (released 1.5 years ago), it
produces false positives for `typing.Self`.

## Summary of changes
- Bump `mypy` from 1.3 to 1.13
- Fix new warnings and errors
- Use `typing.Self` whenever we `return self`
2024-11-22 14:31:36 +00:00
Tristan Partin
c10b7f7de9 Write a newline after adding dynamic_shared_memory_type to PG conf (#9843)
Without adding a newline, we can end up with a conf line that looks like
the following:

dynamic_shared_memory_type = mmap# Managed by compute_ctl: begin

This leads to Postgres logging:

LOG: configuration file
"/var/db/postgres/compute/pgdata/postgresql.conf" contains errors;
unaffected changes were applied

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-22 13:37:06 +00:00
Heikki Linnakangas
7372312a73 Avoid unnecessary send_replace calls in seqwait (#9852)
The notifications need to be sent whenever the waiters heap changes, per
the comment in `update_status`. But if 'advance' is called when there
are no waiters, or the new LSN is lower than the waiters so that no one
needs to be woken up, there's no need to send notifications. This saves
some CPU cycles in the common case that there are no waiters.
2024-11-22 13:29:49 +00:00
John Spray
d9de65ee8f pageserver: permit reads behind GC cutoff during LSN grace period (#9833)
## Problem

In https://github.com/neondatabase/neon/issues/9754 and the flakiness of
`test_readonly_node_gc`, we saw that although our logic for controlling
GC was sound, the validation of getpage requests was not, because it
could not consider LSN leases when requests arrived shortly after
restart.

Closes https://github.com/neondatabase/neon/issues/9754

## Summary of changes

This is the "Option 3" discussed verbally -- rather than holding back gc
cutoff, we waive the usual validation of request LSN if we are still
waiting for leases to be sent after startup

- When validating LSN in `wait_or_get_last_lsn`, skip the validation
relative to GC cutoff if the timeline is still in its LSN lease grace
period
- Re-enable test_readonly_node_gc
2024-11-22 09:24:23 +00:00
Fedor Dikarev
83b73fc24e Batch scrape workflows up to last 30 days and stop ad-hoc (#9846)
Comparing Batch and Ad-hoc collectors there is no big difference, just
we need scrape for longer duration to catch retries.
Dashboard with comparison:

https://neonprod.grafana.net/d/be3pjm7c9ne2oe/compare-ad-hoc-and-batch?orgId=1&from=1731345095814&to=1731946295814

I should anyway raise support case with Github relating to that,
meanwhile that should be working solution and should save us some cost,
so it worths to switch to Batch now.

Ref: https://github.com/neondatabase/cloud/issues/17503
2024-11-22 09:06:00 +00:00
Peter Bendel
1e05e3a6e2 minor PostgreSQL update in benchmarking (#9845)
## Problem

in benchmarking.yml job pgvector we install postgres from deb packages.
After the minor postgres update the referenced packages no longer exist

[Failing job:
](https://github.com/neondatabase/neon/actions/runs/11965785323/job/33360391115#step:4:41)

## Summary of changes

Reference and install the updated packages.

[Successful job after this
fix](https://github.com/neondatabase/neon/actions/runs/11967959920/job/33366011934#step:4:45)
2024-11-22 08:31:54 +00:00
Tristan Partin
37962e729e Fix panic in compute_ctl metrics collection (#9831)
Calling unwrap on the encoder is a little overzealous. One of the errors
that can be returned by the encode function in particular is the
non-existence of metrics for a metric family, so we should prematurely
filter instances like that out. I believe that the cause of this panic
was caused by a race condition between the prometheus collector and the
compute collecting the installed extensions metric for the first time.
The HTTP server is spawned on a separate thread before we even start
bringing up Postgres.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-21 20:19:02 +00:00
Erik Grinaker
190e8cebac safekeeper,pageserver: add CPU profiling (#9764)
## Problem

We don't have a convenient way to gather CPU profiles from a running
binary, e.g. during production incidents or end-to-end benchmarks, nor
during microbenchmarks (particularly on macOS).

We would also like to have continuous profiling in production, likely
using [Grafana Cloud
Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/).
We may choose to use either eBPF profiles or pprof profiles for this
(pending testing and discussion with SREs), but pprof profiles appear
useful regardless for the reasons listed above. See
https://github.com/neondatabase/cloud/issues/14888.

This PR is intended as a proof of concept, to try it out in staging and
drive further discussions about profiling more broadly.

Touches #9534.
Touches https://github.com/neondatabase/cloud/issues/14888.

## Summary of changes

Adds a HTTP route `/profile/cpu` that takes a CPU profile and returns
it. Defaults to a 5-second pprof Protobuf profile for use with e.g.
`pprof` or Grafana Alloy, but can also emit an SVG flamegraph. Query
parameters:

* `format`: output format (`pprof` or `svg`)
* `frequency`: sampling frequency in microseconds (default 100)
* `seconds`: number of seconds to profile (default 5)

Also integrates pprof profiles into Criterion benchmarks, such that
flamegraph reports can be taken with `cargo bench ... --profile-duration
<seconds>`. Output under `target/criterion/*/profile/flamegraph.svg`.

Example profiles:

* pprof profile (use [`pprof`](https://github.com/google/pprof)):
[profile.pb.gz](https://github.com/user-attachments/files/17756788/profile.pb.gz)
  * Web interface: `pprof -http :6060 profile.pb.gz`
* Interactive flamegraph:
[profile.svg.gz](https://github.com/user-attachments/files/17756782/profile.svg.gz)
2024-11-21 18:59:46 +00:00
Conrad Ludgate
725a5ff003 fix(proxy): CancelKeyData display log masking (#9838)
Fixes the masking for the CancelKeyData display format. Due to negative
i32 cast to u64, the top-bits all had `0xffffffff` prefix. On the
bitwise-or that followed, these took priority.

This PR also compresses 3 logs during sql-over-http into 1 log with
durations as label fields, as prior discussed.
2024-11-21 16:46:30 +00:00
Alexander Bayandin
8d1c44039e Python 3.11 (#9515)
## Problem

On Debian 12 (Bookworm), Python 3.11 is the latest available version.

## Summary of changes
- Update Python to 3.11 in build-tools
- Fix ruff check / format
- Fix mypy
- Use `StrEnum` instead of pair `str`, `Enum`
- Update docs
2024-11-21 16:25:31 +00:00
Konstantin Knizhnik
0713ff3176 Bump Postgres version (#9808)
## Problem

I have made a mistake in merging Postgre PRs

## Summary of changes

Restore consistency of submodule referenced.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-11-21 14:56:56 +00:00
John Spray
42bda5d632 pageserver: revise metrics lifetime for SecondaryTenant (#9818)
## Problem

We saw a scale test failure when one shard went
secondary->attached->secondary in a short period of time -- the metrics
for the shard failed a validation assertion that is meant to ensure the
size metric matches the sum of layer sizes in the SecondaryDetail
struct.

This appears to be due to two SecondaryTenants being alive at the same
time -- the first one was shut down but still had its contributions to
the metrics.

Closes: https://github.com/neondatabase/neon/issues/9628

## Summary of changes

- Refactor code for validating metrics and call it in shutdown as well
as during downloads
- Move code for dropping per-tenant secondary metrics from drop() into
shutdown(), so that once shutdown() completes it is definitely safe to
instantiate another SecondaryTenant for the same tenant.
2024-11-21 08:31:24 +00:00
Ivan Efremov
995e729ebe Merge pull request #9832 from neondatabase/rc/release-proxy/2024-11-21
Proxy release 2024-11-21
2024-11-21 09:41:31 +02:00
github-actions[bot]
76077e1ddf Proxy release 2024-11-21 2024-11-21 06:02:11 +00:00
Arpad Müller
59c2c3f8ad compute_ctl: print OpenTelemetry errors via tracing, not stdout (#9830)
Before, `OpenTelemetry` errors were printed to stdout/stderr directly,
causing one of the few log lines without a timestamp, like:

```
OpenTelemetry trace error occurred. error sending request for url (http://localhost:4318/v1/traces)
```

Now, we print:

```
2024-11-21T02:24:20.511160Z  INFO OpenTelemetry error: error sending request for url (http://localhost:4318/v1/traces)
```

I found this while investigating #9731.
2024-11-21 04:46:01 +00:00
Ivan Efremov
2d6bf176a0 proxy: Refactor http conn pool (#9785)
- Use the same ConnPoolEntry for http connection pool.
- Rename EndpointConnPool to the HttpConnPool.
- Narrow clone bound for client

Fixes #9284
2024-11-20 19:36:29 +00:00
Vadim Kharitonov
313ebfdb88 [proxy] chore: allow bypassing empty params to /sql endpoint (#9827)
## Problem

```
curl -H "Neon-Connection-String: postgresql://neondb_owner:PASSWORD@ep-autumn-rain-a58lubg0.us-east-2.aws.neon.tech/neondb?sslmode=require" https://ep-autumn-rain-a58lubg0.us-east-2.aws.neon.tech/sql -d '{"query":"SELECT 1","params":[]}'
```

For such a query, I also need to send `params`. Do I really need it?

## Summary of changes
I've marked `params` as optional
2024-11-20 19:36:23 +00:00
Arpad Müller
811fab136f scrubber: allow restricting find_garbage to a partial tenant id prefix (#9814)
Adds support to the `find_garbage` command to restrict itself to a
partial tenant ID prefix, say `a`, and then it only traverses tenants
with IDs starting with `a`. One can now pass the `--tenant-id-prefix`
parameter.

That way, one can shard the `find_garbage` command and make it run in
parallel.

The PR also does a change of how `remote_storage` first removes trailing
`/`s, only to then add them in the listing function. It turns out that
this isn't neccessary and it prevents the prefix functionality from
working. S3 doesn't do this either.
2024-11-20 19:31:02 +00:00
Vlad Lazar
ee26f09e45 pageserver: remove shard split hard link assertion (#9829)
## Problem

We were hitting this assertion in debug mode tests sometimes.

This case was being hit when the parent shard has no resident layers.
For instance, this is the case on split retry where the previous attempt
shut-down the parent and deleted local state for it. If the logical size
calculation does not download some layers before we get to the
hardlinking, then the assertion is hit.

## Summary of Changes

Remove the assertion. It's fine for the ancestor to not have any
resident layers at the time of the split.

Closes https://github.com/neondatabase/neon/issues/9412
2024-11-20 18:33:05 +00:00
Conrad Ludgate
f36f0068b8 chore(proxy): demote more logs during successful connection attempts (#9828)
Follow up to #9803 

See https://github.com/neondatabase/cloud/issues/14378

In collaboration with @cloneable and @awarus, we sifted through logs and
simply demoted some logs to debug. This is not at all finished and there
are more logs to review, but we ran out of time in the session we
organised. In any slightly more nuanced cases, we didn't touch the log,
instead leaving a TODO comment.

I've also slightly refactored the sql-over-http body read/length reject
code. I can split that into a separate PR. It just felt natural after I
switched to `read_body_with_limit` as we discussed during the meet.
2024-11-20 17:50:39 +00:00
John Spray
5ff2f1ee7d pageserver: enable compaction to proceed while live-migrating (#5397)
## Problem

Long ago, in #5299 the tenant states for migration are added, but
respected only in a coarse-grained way: when hinted not to do deletions,
tenants will just avoid doing all GC or compaction.

Skipping compaction is not necessary for AttachedMulti, as we will soon
become the primary attached location, and it is not a waste of resources
to proceed with compaction. Instead, per the RFC
https://github.com/neondatabase/neon/pull/5029/files), deletions should
be queued up in this state, and executed later when we switch to
AttachedSingle.

Avoiding compaction in AttachedMulti can have an operational impact if a
tenant is under significant write load, as a long-running migration can
result in a large accumulation of delta layers with commensurate impact
on read latency.

Closes: https://github.com/neondatabase/neon/issues/5396

## Summary of changes

- Add a 'config' part to RemoteTimelineClient so that it can be aware of
the mode of the tenant it belongs to, and wire this through for
construction + updates
- Add a special buffer for delayed deletions, and when in AttachedMulti
route deletions here instead of into the main remote client queue. This
is drained when transitioning to AttachedSingle. If the tenant is
detached or our process dies before then, then these objects are leaked.
- As a quality of life improvement, also use the remote timeline
client's knowledge of the tenant state to avoid submitting remote
consistent LSN updates for validation when in AttachedStale (as we know
these will fail)

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2024-11-20 17:31:55 +00:00
John Spray
67f5f83edc pageserver: avoid reading SLRU blocks for GC on shards >0 (#9423)
## Problem

SLRU blocks, which can add up to several gigabytes, are currently
ingested by all shards, multiplying their capacity cost by the shard
count and slowing down ingest. We do this because all shards need the
SLRU pages to do timestamp->LSN lookup for GC.

Related: https://github.com/neondatabase/neon/issues/7512

## Summary of changes

- On non-zero shards, learn the GC offset from shard 0's index instead
of calculating it.
- Add a test `test_sharding_gc` that exercises this
- Do GC in test_pg_regress as a general smoke test that GC functions run
(e.g. this would fail if we were using SLRUs we didn't have)

In this PR we are still ingesting SLRUs everywhere, but not using them
any more. Part 2 PR (https://github.com/neondatabase/neon/pull/9786)
makes the change to not store them at all.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2024-11-20 15:56:14 +00:00
John Spray
593e35027a tests: use fewer pageservers in test_sharding_split_smoke (#9804)
## Problem

This test uses a gratuitous number of pageservers (16). This works fine
when there are plenty of system resources, but causes issues on test
runners that have limited resources and run many tests concurrently.

Related: https://github.com/neondatabase/neon/issues/9802

## Summary of changes

- Split from 2 shards to 4, instead of 4 to 8
- Don't give every shard a separate pageserver, let two locations share
each pageserver.

Net result is 4 pageservers instead of 16
2024-11-20 14:57:59 +00:00
Folke Behrens
bf7d859a8b proxy: Rename RequestMonitoring to RequestContext (#9805)
## Problem

It is called context/ctx everywhere and the Monitoring suffix needlessly
confuses with proper monitoring code.

## Summary of changes

* Rename RequestMonitoring to RequestContext
* Rename RequestMonitoringInner to RequestContextInner
2024-11-20 12:50:36 +00:00
Alexander Bayandin
899933e159 scan_log_for_errors: check that regex is correct (#9815)
## Problem

I've noticed that we have 2 flaky tests which failed with error:
```
re.error: missing ), unterminated subpattern at position 21
```

- `test_timeline_archival_chaos` — has been already fixed 
- `test_sharded_tad_interleaved_after_partial_success` — I didn't manage
to find the incorrect regex

[Internal link](https://neonprod.grafana.net/goto/yfmVHV7NR?orgId=1) 

## Summary of changes
- Wrap `re.match` in `try..except` block and print incorrect regex
2024-11-20 12:48:21 +00:00
Alexander Bayandin
46beecacce CI(benchmarking): route test failures to on-call-qa-staging-stream (#9813)
## Problem

We want to keep `#on-call-staging-stream` channel close to the prod one
and redirect notifications from failing benchmarks to another channel
for investigation.

## Summary of changes
- Send notifications regarding failures in `benchmarking` job to
`#on-call-staging-stream`
- Send notifications regarding failures in `periodic_pagebench` job to
`#on-call-staging-stream`
2024-11-20 12:23:41 +00:00
Fedor Dikarev
94e4a0e2a0 update macos version for runner (#9817)
Closes: https://github.com/neondatabase/neon/issues/9816

Run MacOs builds on `macos-15`.
As `pkg-config` is bundled in runner image, don't install it with `brew`
2024-11-20 13:04:14 +01:00
John Spray
33dce25af8 safekeeper: block deletion on protocol handler shutdown (#9364)
## Problem

Two recently observed log errors indicate safekeeper tasks for a
timeline running after that timeline's deletion has started.
- https://github.com/neondatabase/neon/issues/8972
- https://github.com/neondatabase/neon/issues/8974

These code paths do not have a mechanism that coordinates task shutdown
with the overall shutdown of the timeline.

## Summary of changes

- Add a `Gate` to `Timeline`
- Take the gate as part of resident timeline guard: any code that holds
a guard over a timeline staying resident should also hold a guard over
the timeline's total lifetime.
- Take the gate from the wal removal task
- Respect Timeline::cancel in WAL send/recv code, so that we do not
block shutdown indefinitely.
- Add a test that deletes timelines with open pageserver+compute
connections, to check these get torn down as expected.

There is some risk to introducing gates: if there is code holding a gate
which does not properly respect a cancellation token, it can cause
shutdown hangs. The risk of this for safekeepers is lower in practice
than it is for other services, because in a healthy timeline deletion,
the compute is shutdown first, then the timeline is deleted on the
pageserver, and finally it is deleted on the safekeepers -- that makes
it much less likely that some protocol handler will still be running.

Closes: #8972
Closes: #8974
2024-11-20 11:07:45 +00:00
Conrad Ludgate
3ae0b2149e chore(proxy): demote a ton of logs for successful connection attempts (#9803)
See https://github.com/neondatabase/cloud/issues/14378

In collaboration with @cloneable and @awarus, we sifted through logs and
simply demoted some logs to debug. This is not at all finished and there
are more logs to review, but we ran out of time in the session we
organised. In any slightly more nuanced cases, we didn't touch the log,
instead leaving a TODO comment.
2024-11-20 10:14:28 +00:00
Arpad Müller
0a499a3176 Don't preload offloaded timelines (#9646)
In timeline preloading, we also do a preload for offloaded timelines.
This includes the download of `index-part.json`. Ultimately, such a
download is wasteful, therefore avoid it. Same goes for the remote
client, we just discard it immediately thereafter.

Part of #8088

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-11-20 05:44:23 +00:00
Matthias van de Meent
ea1858e3b6 compute_ctl: Streamline and Pipeline startup SQL (#9717)
Before, compute_ctl didn't have a good registry for what command would
run when, depending exclusively on sync code to apply changes. When
users have many databases/roles to manage, this step can take a
substantial amount of time, breaking assumptions about low (re)start
times in other systems.

This commit reduces the time compute_ctl takes to restart when changes
must be applied, by making all commands more or less blind writes, and
applying these commands in an asynchronous context, only waiting for
completion once we know the commands have all been sent.

Additionally, this reduces time spent by batching per-database
operations where previously we would create a new SQL connection for
every user-database operation we planned to execute.
2024-11-20 02:14:58 +01:00
Alexander Bayandin
2281a02c49 CODEOWNERS: add developer-productivity team (#9810)
Notify @neondatabase/developer-productivity team about changes in CI
(i.e. in `.github/` directory)
2024-11-20 00:30:24 +00:00
Alexander Bayandin
725e0a1ac9 CI(release): create reusable workflow for releases (#9806)
## Problem

We have a bunch of duplicated code for automated releases. There will be
even more, once we have `release-compute` branch
(https://github.com/neondatabase/neon/pull/9637).

Another issue with the current `release` workflow is that it creates a
PR from the main as is. If we create 2 different releases from the
same commit, GitHub could mix up results from different PRs.

## Summary of changes
- Create a reusable workflow for releases
- Create an empty commit to differentiate releases
2024-11-19 23:03:15 +00:00
Konstantin Knizhnik
770ac34ae6 Register custom xlog reader callbacks for on-demand WAL download in StartupDecodingContext (#9007)
## Problem

See https://github.com/neondatabase/neon/issues/8931
On-demand WAL download are not set in all cases where WAL is accessed by
logical replication

## Summary of changes

Set customer xlog reader handles in StartupDecodingContext

Related changes in Postgres modules:

https://github.com/neondatabase/postgres/pull/495
https://github.com/neondatabase/postgres/pull/496
https://github.com/neondatabase/postgres/pull/497
https://github.com/neondatabase/postgres/pull/498

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-11-19 22:29:57 +02:00
Alex Chi Z.
b22a84a7bf feat(pageserver): support key range for manual compaction trigger (#9723)
part of https://github.com/neondatabase/neon/issues/9114, we want to be
able to run partial gc-compaction in tests. In the future, we can also
expand this functionality to legacy compaction, so that we can trigger
compaction for a specific key range.

## Summary of changes

* Support passing compaction key range through pageserver routes.
* Refactor input parameters of compact related function to take the new
`CompactOptions`.
* Add tests for partial compaction. Note that the test may or may not
trigger compaction based on GC horizon. We need to improve the test case
to ensure things always get below the gc_horizon and the gc-compaction
can be triggered.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-19 19:38:41 +00:00
Arpad Müller
b092126c94 scrubber: fix parsing issue with Azure (#9797)
Apparently Azure returns timelines ending with `/` which confuses the
parsing. So remove all trailing `/`s before attempting to parse.

Part of https://github.com/neondatabase/cloud/issues/19963
2024-11-19 20:10:53 +01:00
Alex Chi Z.
5e3fbef721 fix(pageserver): queue stopped error should be ignored during create timeline (#9767)
close https://github.com/neondatabase/neon/issues/9730

The test case tests if anything goes wrong during pageserver restart +
*during timeline creation not complete*. Therefore, queue is stopped
error is normal in this case, except that it should be categorized as a
shutdown error instead of a real error.

## Summary of changes

* More comments for the test case.
* Queue stopped error will now be forwarded as
CreateTimelineError::ShuttingDown.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-19 14:10:09 -05:00
dependabot[bot]
15468cd23c build(deps): bump aiohttp from 3.10.2 to 3.10.11 (#9794) 2024-11-19 19:08:00 +00:00
Peter Bendel
a8ac895b83 re-acquire S3 OIDC token after long running tests for report upload to S3 (#9799)
## Problem

If a benchmark or test-case runs longer than the AWS OIDC token lifetime
successive upload of test reports to S3 fail - example:


https://github.com/neondatabase/neon/actions/runs/11905529176/job/33176168174#step:9:243

## Summary of changes

In actions that require access to S3 and which are invoked after a long
running python testcase we re-acquire the OIDC token explicitly.
Note that we need to pass down the aws_oicd_role_arn from the workflow
to the action because actions have no access to GitHub vars for security
reasons.

Sample run
https://github.com/neondatabase/neon/actions/runs/11912328276/job/33195676867
2024-11-19 18:22:51 +01:00
Heikki Linnakangas
ada84400b7 PostgreSQL minor version updates (17.2, 16.6, 15.10, 14.15) (#9795)
The community decided to make a new off-schedule release due to ABI
breakage in last week's release. We're not affected by the ABI
breakage because we rebuild all extensions in our docker images, but
let's stay up-to-date. There were a few other fixes in the release
too.
2024-11-19 17:01:05 +02:00
Conrad Ludgate
191f745c81 fix(proxy/auth_broker): ignore -pooler suffix (#9800)
Fixes https://github.com/neondatabase/cloud/issues/20400

We cannot mix local_proxy and pgbouncer, so we are filtering out the
`-pooler` suffix prior to calling wake_compute.
2024-11-19 13:58:26 +00:00
Conrad Ludgate
37b97b3a68 chore(local_proxy): reduce some startup logging (#9798)
Currently, local_proxy will write an error log if it doesn't find the
config file. This is expected for startup, so it's just noise. It is an
error if we do receive an explicit SIGHUP though.

I've also demoted the build info logs to be debug level. We don't need
them in the compute image since we have other ways to determine what
code is running.

Lastly, I've demoted SIGHUP signal handling from warn to info, since
it's not really a warning event.

See https://github.com/neondatabase/cloud/issues/10880 for more details
2024-11-19 13:58:11 +00:00
Konstantin Knizhnik
c9acd214ae Do not create DSM segment for wal_redo_postgres (#9793)
## Problem

See  https://github.com/neondatabase/neon/issues/9738

## Summary of changes

Do not create DSM segment for wal_redo Postgres

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-11-19 11:56:40 +02:00
Peter Bendel
982cb1c15d Move logic for ingest benchmark from GitHub workflow into python testcase (#9762)
## Problem

The first version of the ingest benchmark had some parsing and reporting
logic in shell script inside GitHub workflow.
it is better to move that logic into a python testcase so that we can
also run it locally.

## Summary of changes

- Create new python testcase
- invoke pgcopydb inside python test case
- move the following logic into python testcase
  - determine backpressure
  - invoke pgcopydb and report its progress
  - parse pgcopydb log and extract metrics
  - insert metrics into perf test database
 
- add additional column to perf test database that can receive endpoint
ID used for pgcopydb run to have it available in grafana dashboard when
retrieving other metrics for an endpoint

## Example run


https://github.com/neondatabase/neon/actions/runs/11860622170/job/33056264386
2024-11-19 09:46:46 +00:00
Arpad Müller
9b6af2bcad Add the ability to configure GenericRemoteStorage for the scrubber (#9652)
Earlier work (#7547) has made the scrubber internally generic, but one
could only configure it to use S3 storage.

This is the final piece to make (most of, snapshotting still requires
S3) the scrubber be able to be configured via GenericRemoteStorage.

I.e. you can now set an env var like:

```
REMOTE_STORAGE_CONFIG='remote_storage = { bucket_name = "neon-dev-safekeeper-us-east-2d", bucket_region = "us-east-2" }
```

and the scrubber will read it instead.
2024-11-18 21:01:48 +00:00
Arpad Müller
4fc3af15dd Remove at most one retain_lsn entry from (possibly offloaded) timelne's parent (#9791)
There is a potential data corruption issue, not one I've encountered,
but it's still not hard to hit with some correct looking code given our
current architecture. It has to do with the timeline's memory object storage
via reference counted `Arc`s, and the removal of `retain_lsn` entries at
the drop of the last `Arc` reference.

The corruption steps are as follows:

1. timeline gets offloaded. timeline object A doesn't get dropped
though, because some long-running task accesses it
2. the same timeline gets unoffloaded again. timeline object B gets
created for it, timeline object A still referenced. both point to the
same timeline.
3. the task keeping the reference to timeline object A exits. destructor
for object A runs, removing `retain_lsn` in the timeline's parent.
4. the timeline's parent runs gc without the `retain_lsn` of the still
exant timleine's child, leading to data corruption.

In general we are susceptible each time when we recreate a `Timeline`
object in the same process, which happens both during a timeline
offload/unoffload cycle, as well as during an ancestor detach operation.

The solution this PR implements is to make the destructor for a timeline
as well as an offloaded timeline remove at most one `retain_lsn`.

PR #9760 has added a log line to print the refcounts at timeline
offload, but this only detects one of the places where we do such a
recycle operation. Plus it doesn't prevent the actual issue.

I doubt that this occurs in practice. It is more a defense in depth measure.
Usually I'd assume that the timeline gets dropped immediately in step 1,
as there is no background tasks referencing it after its shutdown.
But one never knows, and reducing the stakes of step 1 actually occurring
is a really good idea, from potential data corruption to waste of CPU time.

Part of #8088
2024-11-18 21:42:19 +01:00
Vlad Lazar
d7662fdc7b feat(page_service): timeout-based batching of requests (#9321)
## Problem

We don't take advantage of queue depth generated by the compute
on the pageserver. We can process getpage requests more efficiently
by batching them. 

## Summary of changes

Batch up incoming getpage requests that arrive within a configurable
time window (`server_side_batch_timeout`).
Then process the entire batch via one `get_vectored` timeline operation.
By default, no merging takes place.

## Testing

* **Functional**: https://github.com/neondatabase/neon/pull/9792
* **Performance**: will be done in staging/pre-prod

# Refs

* https://github.com/neondatabase/neon/issues/9377
* https://github.com/neondatabase/neon/issues/9376

Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-11-18 20:24:03 +00:00
Alex Chi Z.
e5c89f3da3 feat(pageserver): drop disposable keys during gc-compaction (#9765)
close https://github.com/neondatabase/neon/issues/9552, close
https://github.com/neondatabase/neon/issues/8920, part of
https://github.com/neondatabase/neon/issues/9114

## Summary of changes

* Drop keys not belonging to this shard during gc-compaction to avoid
constructing history that might have been truncated during shard
compaction.
* Run gc-compaction at the end of shard compaction test.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-18 19:27:52 +00:00
Alexey Kondratov
5f0e9c9a94 feat(compute/tests): Report successful replication test runs as well (#9787)
It should increase the visibility of whether they run and pass.
2024-11-18 16:05:09 +00:00
Alexander Bayandin
44f33b2bd6 Bump default Postgres version for tests to v17 (#9777)
## Problem
Tests that are marked with `run_only_on_default_postgres` do not run on
debug builds on CI because we run debug builds only for the latest
Postgres version (which is 17)

## Summary of changes
- Bump `PgVersion.DEFAULT` to `v17`
- Skip `test_timeline_archival_chaos` in debug builds
2024-11-18 15:06:24 +00:00
Alexander Bayandin
913b5b7027 CI: remove separate check-build-tools-image workflow (#9708)
## Problem

We call `check-build-tools-image` twice for each workflow whenever we
use it, along with `build-build-tools-image`, once as a workflow itself,
and the second time from `build-build-tools-image`. This is not
necessary.

## Summary of changes
- Inline `check-build-tools-image` into `build-build-tools-image`
- Remove separate `check-build-tools-image` workflow
2024-11-18 13:14:28 +00:00
John Spray
3f401a328f tests: mitigate bug to stabilize test_storage_controller_many_tenants (#9771)
## Problem

Due to #9471 , the scale test occasionally gets 404s while trying to
modify the config of a timeline that belongs to a tenant being migrated.
We rarely see this narrow race in the field, but the test is quite good
at reproducing it.

## Summary of changes

- Ignore 404 errors in this test.
2024-11-18 11:33:27 +00:00
Peter Bendel
c3eecf6763 adapt pgvector bench to minor version upgrades of PostgreSql (#9784)
## Problem

pgvector benchmark is failing because after PostgreSQL minor version
upgrade previous version packages are no longer available in deb
repository

[example
failure](https://github.com/neondatabase/neon/actions/runs/11875503070/job/33092787149#step:4:40)

## Summary of changes

Update postgres minor version of packages to current version

[Example run after this
change](https://github.com/neondatabase/neon/actions/runs/11888978279/job/33124614605)
2024-11-18 10:47:43 +00:00
Konstantin Knizhnik
6fa9b0cd8c Use DATA_DIR instead of current workign directory in restore_from_wal script (#9729)
## Problem

See https://github.com/neondatabase/neon/issues/7750

test_wal_restore.sh is copying file to current working directory which
can cause interfere of test_wa_restore.py tests spawned of different
configurations.

## Summary of changes

Copy file to $DATA_DIR

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-11-18 11:55:38 +02:00
a-masterov
10bc1903e1 Fix the regression test running against the staging instance (#9773)
## Problem
The Postgres version was updated. The patch has to be updated
accordingly.
## Summary of changes
The patch of the regression test was updated.
2024-11-18 10:30:50 +01:00
John Spray
261d065e6f pageserver: respect no_sync in VirtualFile (#9772)
## Problem

`no_sync` initially just skipped syncfs on startup (#9677). I'm also
interested in flaky tests that time out during pageserver shutdown while
flushing l0s, so to eliminate disk throughput as a source of issues
there,

## Summary of changes

- Drive-by change for test timeouts: add a couple more ::info logs
during pageserver startup so it's obvious which part got stuck.
- Add a SyncMode enum to configure VirtualFile and respect it in
sync_all and sync_data functions
- During pageserver startup, set SyncMode according to `no_sync`
2024-11-18 08:59:05 +00:00
Christian Schwarz
b6154b03f4 build(deps): bump smallvec to 1.13.2 to get UB fix (#9781)
Smallvec 1.13.2 contains [an UB
fix](https://github.com/servo/rust-smallvec/pull/345).

Upstream opened [a
request](https://github.com/rustsec/advisory-db/issues/1960)
for this in the advisory-db but it never got acted upon.

Found while working on https://github.com/neondatabase/neon/pull/9321.
2024-11-17 21:25:16 +01:00
Erik Grinaker
8880134171 Cargo.toml: upgrade tikv-jemallocator to 0.6.0 (#9779) 2024-11-17 19:52:05 +01:00
Erik Grinaker
de7e4a34ca safekeeper: send AppendResponse on segment flush (#9692)
## Problem

When processing pipelined `AppendRequest`s, we explicitly flush the WAL
every second and return an `AppendResponse`. However, the WAL is also
implicitly flushed on segment bounds, but this does not result in an
`AppendResponse`. Because of this, concurrent transactions may take up
to 1 second to commit and writes may take up to 1 second before sending
to the pageserver.

## Summary of changes

Advance `flush_lsn` when a WAL segment is closed and flushed, and emit
an `AppendResponse`. To accommodate this, track the `flush_lsn` in
addition to the `flush_record_lsn`.
2024-11-17 18:19:14 +01:00
Vlad Lazar
ac689ab014 wal_decoder: rename end_lsn to next_record_lsn (#9776)
## Problem

It turns out that `WalStreamDecoder::poll_decode` returns the start LSN
of the next record and not the end LSN of the current record. They are
not always equal. For example, they're not equal when the record in
question is an XLOG SWITCH record.

## Summary of changes

Rename things to reflect that.
2024-11-15 21:53:11 +00:00
Tristan Partin
23eabb9919 Fix PG_MAJORVERSION_NUM typo
In ea32f1d0a3, Matthias added a feature to
our extension to expose more granular wait events. However, due to the
typo, those wait events were never registered, so we used the more
generic wait events instead.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-15 15:17:23 -06:00
Vlad Lazar
2af791ba83 wal_decoder: make InterpretedWalRecord serde (#9775)
## Problem

We want to serialize interpreted records to send them over the wire from
safekeeper to pageserver.

## Summary of changes

Make `InterpretedWalRecord` ser/de. This is a temporary change to get
the bulk of the lift merged in
https://github.com/neondatabase/neon/pull/9746. For going to prod, we
don't want to use bincode since we can't evolve the schema.
Questions on serialization will be tackled separately.
2024-11-15 20:34:48 +00:00
Mikhail Kot
e12628fe93 Collect max_connections metric (#9770)
This will further allow us to expose this metric to users
2024-11-15 17:42:41 +00:00
Arpad Müller
7880c246f1 Correct mistakes in offloaded timeline retain_lsn management (#9760)
PR #9308 has modified tenant activation code to take offloaded child
timelines into account for populating the list of `retain_lsn` values.
However, there is more places than just tenant activation where one
needs to update the `retain_lsn`s.

This PR fixes some bugs of the current code that could lead to
corruption in the worst case:

1. Deleting of an offloaded timeline would not get its `retain_lsn`
purged from its parent. With the patch we now do it, but as the parent
can be offloaded as well, the situatoin is a bit trickier than for
non-offloaded timelines which can just keep a pointer to their parent.
Here we can't keep a pointer because the parent might get offloaded,
then unoffloaded again, creating a dangling pointer situation. Keeping a
pointer to the *tenant* is not good either, because we might drop the
offloaded timeline in a context where a `offloaded_timelines` lock is
already held: so we don't want to acquire a lock in the drop code of
OffloadedTimeline.
2. Unoffloading a timeline would not get its `retain_lsn` values
populated, leading to it maybe garbage collecting values that its
children might need. We now call `initialize_gc_info` on the parent.
3. Offloading of a timeline would not get its `retain_lsn` values
registered as offloaded at the parent. So if we drop the `Timeline`
object, and its registration is removed, the parent would not have any
of the child's `retain_lsn`s around. Also, before, the `Timeline` object
would delete anything related to its timeline ID, now it only deletes
`retain_lsn`s that have `MaybeOffloaded::No` set.

Incorporates Chi's reproducer from #9753. cc
https://github.com/neondatabase/cloud/issues/20199

The `test_timeline_retain_lsn` test is extended:

1. it gains a new dimension, duplicating each mode, to either have the
"main" branch be the direct parent of the timeline we archive, or the
"test_archived_parent" branch intermediary, creating a three timeline
structure. This doesn't test anything fixed by this PR in particular,
just explores the vast space of possible configurations a little bit
more.
2. it gains two new modes, `offload-parent`, which tests the second
point, and `offload-no-restart` which tests the third point.

It's easy to verify the test actually is "sharp" by removing one of the
respective `self.initialize_gc_info()`, `gc_info.insert_child()` or
`ancestor_children.push()`.

Part of #8088

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Alex Chi Z <chi@neon.tech>
2024-11-15 14:22:29 +01:00
John Spray
04938d9d55 tests: tolerate pageserver 500s in test_timeline_archival_chaos (#9769)
## Problem

Test exposes cases where pageserver gives 500 responses, causing
failures like
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9766/11844529470/index.html#suites/d1acc79950edeb0563fc86236c620898/3546be2ffed99ba6

## Summary of changes

- Tolerate such messages, and link an issue for cleaning up the
pageserver not to return such 500s.
2024-11-15 13:22:05 +00:00
Erik Grinaker
19f7d40c1d deny.toml: allow CDDL-1.0 license (#9766)
#9764, which adds profiling support to Safekeeper, pulls in the
dependency [`inferno`](https://crates.io/crates/inferno) via
[`pprof-rs`](https://crates.io/crates/pprof). This is licenced under the
[Common Development and Distribution License
1.0](https://spdx.org/licenses/CDDL-1.0.html), which is not allowed by
`cargo-deny`.

This patch allows the CDDL-1.0 license. It is a derivative of the
Mozilla Public License, which we already allow, but avoids some issues
around European copyright law that the MPL had. As such, I don't expect
this to be problematic.
2024-11-15 10:41:43 +00:00
John Spray
38563de7dd storcon: exclude non-Active tenants from shard autosplitting (#9743)
## Problem

We didn't have a neat way to prevent auto-splitting of tenants. This
could be useful during incidents or for testing.

Closes https://github.com/neondatabase/neon/issues/9332

## Summary of changes

- Filter splitting candidates by scheduling policy
2024-11-14 19:41:10 +00:00
John Spray
93939f123f tests: add test_timeline_archival_chaos (#9609)
## Problem

- We lack test coverage of cases where multiple timelines fight for
updates to the same manifest
(https://github.com/neondatabase/neon/pull/9557), and in timeline
archival changes while dual-attached
(https://github.com/neondatabase/neon/pull/9555)

## Summary of changes

- Add a chaos test for timeline creation->archival->offload->deletion
2024-11-14 17:31:35 +00:00
Tristan Partin
49b599c113 Remove the replication slot in test_snap_files at the end of the test
Analysis of the LR benchmarking tests indicates that in the duration of
test_subscriber_lag, a leftover 'slotter' replication slot can lead to
retained WAL growing on the publisher. This replication slot is not used
by any subscriber. The only purpose of the slot is to generate snapshot
files for the puspose of test_snap_files.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-14 10:59:15 -06:00
Yuchen Liang
8cde37bc0b test: disable test_readonly_node_gc until proper fix (#9755)
## Problem

After investigation, we think to make `test_readonly_node_gc` less
flaky, we need to make a proper fix (likely involving persisting part of
the lease state). See https://github.com/neondatabase/neon/issues/9754
for details.

## Summary of changes

- skip the test until proper fix.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-11-14 15:26:58 +00:00
Konstantin Knizhnik
f70611c8df Correctly truncate VM (#9342)
## Problem

https://github.com/neondatabase/neon/issues/9240

## Summary of changes

Correctly truncate VM page instead just replacing it with zero page.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2024-11-14 17:19:13 +02:00
Vlad Lazar
21282aa113 cargo: use neon branch of rust-postgres (#9757)
## Problem

We are pining our fork of rust-postgres to a commit hash and that
prevents us from making
further changes to it. The latest commit in rust-postgres requires
https://github.com/neondatabase/neon/pull/8747,
but that seems to have gone stale. I reverted rust-postgres `neon`
branch to the pinned commit in
https://github.com/neondatabase/rust-postgres/pull/31.

## Summary of changes

Switch back to using the `neon` branch of the rust-postgres fork.
2024-11-14 15:16:43 +00:00
Arseny Sher
d06bf4b0fe safekeeper: fix atomicity of WAL truncation (#9685)
If WAL truncation fails in the middle it might leave some data on disk
above the write/flush LSN. In theory, concatenated with previous records
it might form bogus WAL (though very unlikely in practice because CRC
would protect from that). To protect from that, set
pending_wal_truncation flag: means before any WAL writes truncation must
be retried until it succeeds. We already did that in case of safekeeper
restart, now extend this mechanism for failures without restart. Also,
importantly, reset LSNs in the beginning of the operation, not in the
end, because once on disk deletion starts previous pointers are wrong.

All this most likely haven't created any problems in practice because
CRC protects from the consequences.

Tests for this are hard; simulation infrastructure might be useful here
in the future, but not yet.
2024-11-14 13:06:42 +03:00
Ivan Efremov
0467d88f06 Merge pull request #9756 from neondatabase/rc/proxy/2024-11-14
Proxy release 2024-11-14
2024-11-14 09:46:52 +02:00
Tristan Partin
1280b708f1 Improve error handling for NeonAPI fixture
Move error handling to the common request function and add a debug log.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-13 20:35:48 -06:00
John Spray
b4e00b8b22 pageserver: refuse to load tenants with suspiciously old indices in old generations (#9719)
## Problem

Historically, if a control component passed a pageserver "generation: 1"
this could be a quick way to corrupt a tenant by loading a historic
index.

Follows https://github.com/neondatabase/neon/pull/9383
Closes #6951 

## Summary of changes

- Introduce a Fatal variant to DownloadError, to enable index downloads
to signal when they have encountered a scary enough situation that we
shouldn't proceed to load the tenant.
- Handle this variant by putting the tenant into a broken state (no
matter which timeline within the tenant reported it)
- Add a test for this case

In the event that this behavior fires when we don't want it to, we have
ways to intervene:
- "Touch" an affected index to update its mtime (download+upload S3
object)
- If this behavior is triggered, it indicates we're attaching in some
old generation, so we should be able to fix that by manually bumping
generation numbers in the storage controller database (this should never
happen, but it's an option if it does)
2024-11-13 18:07:39 +00:00
Heikki Linnakangas
10aaa3677d PostgreSQL minor version updates (17.1, 16.5, 15.9, 14.14) (#9727)
This includes a patch to temporarily disable one test in the pg_anon
test suite. It is an upstream issue, the test started failing with the
new PostgreSQL minor versions because of a change in the default
timezone used in tests. We don't want to block the release for this,
so just disable the test for now. See
199f0a392b (note_2148017485)

Corresponding postgres repository PRs:
https://github.com/neondatabase/postgres/pull/524
https://github.com/neondatabase/postgres/pull/525
https://github.com/neondatabase/postgres/pull/526
https://github.com/neondatabase/postgres/pull/527
2024-11-13 15:08:58 +02:00
Heikki Linnakangas
d5435b1a81 tests: Increase timeout in test_create_churn_during_restart (#9736)
This test was seen to be flaky, e.g. at:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9457/11804246485/index.html#suites/ec4311502db344eee91f1354e9dc839b/982bd121ea698414/.
If I _reduce_ the timeout from 10s to 8s on my laptop, it reliably hits
that timeout and fails. That suggests that the test is pretty close to
the edge even when it passes. Let's bump up the timeout to 30 s to make
it more robust.

See also https://github.com/neondatabase/neon/issues/9730, although the
error message is different there.
2024-11-13 12:20:32 +02:00
Anastasia Lubennikova
080d585b22 Add installed_extensions prometheus metric (#9608)
and add /metrics endpoint to compute_ctl to expose such metrics

metric format example for extension pg_rag
with versions 1.2.3 and 1.4.2
installed in 3 and 1 databases respectively:

neon_extensions_installed{extension="pg_rag", version="1.2.3"} = 3
neon_extensions_installed{extension="pg_rag", version="1.4.2"} = 1

------
infra part: https://github.com/neondatabase/flux-fleet/pull/251
---------

Co-authored-by: Tristan Partin <tristan@neon.tech>
2024-11-13 09:36:48 +00:00
John Spray
7595d3afe6 pageserver: add no_sync for use in regression tests (2/2) (#9678)
## Problem

Followup to https://github.com/neondatabase/neon/pull/9677 which enables
`no_sync` in tests. This can be merged once the next release has
happened.

## Summary of changes

- Always run pageserver with `no_sync = true` in tests.
2024-11-13 09:17:26 +00:00
Konstantin Knizhnik
1ff5333a1b Do not wallog AUX files at replica (#9457)
## Problem

Attempt to persist LR stuff at replica cause cannot make new WAL entries
during recovery` error.
See https://neondb.slack.com/archives/C07S7RBFVRA/p1729280401283389

## Summary of changes

Do not wallog AUX files at replica.
Related Postgres PRs:

https://github.com/neondatabase/postgres/pull/517
https://github.com/neondatabase/postgres/pull/516
https://github.com/neondatabase/postgres/pull/515
https://github.com/neondatabase/postgres/pull/514


## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2024-11-13 08:50:01 +02:00
Tristan Partin
d8f5d43549 Fix autocommit footguns in performance tests
psycopg2 has the following warning related to autocommit:

> By default, any query execution, including a simple SELECT will start
> a transaction: for long-running programs, if no further action is
> taken, the session will remain “idle in transaction”, an undesirable
> condition for several reasons (locks are held by the session, tables
> bloat…). For long lived scripts, either ensure to terminate a
> transaction as soon as possible or use an autocommit connection.

In the 2.9 release notes, psycopg2 also made the following change:

> `with connection` starts a transaction on autocommit transactions too

Some of these connections are indeed long-lived, so we were retaining
tons of WAL on the endpoints because we had a transaction pinned in the
past.

Link: https://www.psycopg.org/docs/news.html#what-s-new-in-psycopg-2-9
Link: https://github.com/psycopg/psycopg2/issues/941
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-12 15:48:19 -06:00
Erik Grinaker
2256a5727a safekeeper: use WAL_SEGMENT_SIZE for empty timeline state (#9734)
## Problem

`TimelinePersistentState::empty()`, used for tests and benchmarks, had a
hardcoded 16 MB WAL segment size. This caused confusion when attempting
to change the global segment size.

## Summary of changes

Inherit from `WAL_SEGMENT_SIZE` in `TimelinePersistentState::empty()`.
2024-11-12 20:35:44 +00:00
Tristan Partin
3f80af8b1d Add neon.logical_replication_max_logicalsnapdir_size
This GUC will drop replication slots if the size of the
pg_logical/snapshots directory (not including temp snapshot files)
becomes larger than the specified size. Keeping the size of this
directory smaller will help with basebackup size from the pageserver.

Part-of: https://github.com/neondatabase/neon/issues/8619
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-12 13:13:28 -06:00
Tristan Partin
a61d81bbc7 Calculate compute_backpressure_throttling_seconds correctly
The original value that we get is measured in microseconds. It comes
from a calculation using Postgres' GetCurrentTimestamp(), whihc is
implemented in terms of gettimeofday(2).

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-12 13:12:08 -06:00
Erik Grinaker
05381a48f0 utils: remove unnecessary fsync in durable_rename() (#9686)
## Problem

WAL segment fsyncs significantly affect WAL ingestion throughput.
`durable_rename()` is used when initializing every 16 MB segment, and
issues 3 fsyncs of which 1 was unnecessary.

## Summary of changes

Remove an fsync in `durable_rename` which is unnecessary with Linux and
ext4 (which we currently use). This improves WAL ingestion throughput by
up to 23% with large appends on my MacBook.
2024-11-12 18:57:31 +01:00
Alex Chi Z.
cef165818c test(pageserver): add gc-compaction tests with delta will_init (#9724)
I had an impression that gc-compaction didn't test the case where the
first record of the key history is will_init because of there are some
code path that will panic in this case. Luckily it got fixed in
https://github.com/neondatabase/neon/pull/9026 so we can now implement
such tests.

Part of https://github.com/neondatabase/neon/issues/9114

## Summary of changes

* Randomly changed some images into will_init neon wal record
* Split `test_simple_bottom_most_compaction_deltas` into two test cases,
one of them has the bottom layer as delta layer with will_init flags,
while the other is the original one with image layers.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-12 10:37:31 -05:00
Erik Grinaker
6b19867410 safekeeper: don't flush control file on WAL ingest path (#9698)
## Problem

The control file is flushed on the WAL ingest path when the commit LSN
advances by one segment, to bound the amount of recovery work in case of
a crash. This involves 3 additional fsyncs, which can have a significant
impact on WAL ingest throughput. This is to some extent mitigated by
`AppendResponse` not being emitted on segment bound flushes, since this
will prevent commit LSN advancement, which will be addressed separately.

## Summary of changes

Don't flush the control file on the WAL ingest path at all. Instead,
leave that responsibility to the timeline manager, but ask it to flush
eagerly if the control file lags the in-memory commit LSN by more than
one segment. This should not cause more than `REFRESH_INTERVAL` (300 ms)
additional latency before flushing the control file, which is
negligible.
2024-11-12 15:17:03 +00:00
Tristan Partin
cc8029c4c8 Update pg_cron to 1.6.4
This comes with PG 17 support.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-11 20:10:53 -06:00
Tristan Partin
5be6b07cf1 Improve typing related to regress/test_logical_replication.py (#9725)
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-11 17:36:45 -06:00
Arpad Müller
b018bc7da8 Add a retain_lsn test (#9599)
Add a test that ensures the `retain_lsn` functionality works. Right now,
there is not a single test that is broken if offloaded or non-offloaded
timelines don't get registered at their parents, preventing gc from
discarding the ancestor_lsns of the children. This PR fills that gap.

The test has four modes:

* `offloaded`: offload the child timeline, run compaction on the parent
timeline, unarchive the child timeline, then try reading from it.
hopefully the data is still there.
* `offloaded-corrupted`: offload the child timeline, corrupts the
manifest in a way that the pageserver believes the timeline was
flattened. This is the closest we can get to pretend the `retain_lsn`
mechanism doesn't exist for offloaded timelines, so we can avoid adding
endpoints to the pageserver that do this manually for tests. The test
then checks that indeed data is corrupted and the endpoint can't be
started. That way we know that the test is actually working, and
actually tests the `retain_lsn` mechanism, instead of say the lsn lease
mechanism, or one of the many other mechanisms that impede gc.
* `archived`: the child timeline gets archived but doesn't get
offloaded. this currently matches the `None` case but we might have
refactors in the future that make archived timelines sufficiently
different from non-archived ones.
* `None`: the child timeline doesn't even get archived. this tests that
normal timelines participate in `retain_lsn`. I've made them locally not
participate in `retain_lsn` (via commenting out the respective
`ancestor_children.push` statement in tenant.rs) and ran the testsuite,
and not a single test failed. So this test is first of its kind.

Part of #8088.
2024-11-11 22:29:21 +00:00
Tristan Partin
4b075db7ea Add a postgres_exporter config file
This exporter logs an ERROR if a file called `postgres_exporter.yml` is
not located in its current working directory. We can silence it by
adding an empty config file and pointing the exporter at it.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-11 14:49:37 -06:00
Fedor Dikarev
fde16f8614 use batch gh-workflow-stats-action with separate table (#9722)
We found that exporting GH Workflow Runs in batch is more efficient due
to
- better utilisation of Github API
- and gh runners usage is rounded to minutes, so even when ad-hoc export
is done in 5-10 seconds, we billed for one minute usage

So now we introduce batch exporting, with version v0.2.x of github
workflow stats exporter.
How it's expected to work now:
- every 15 minutes we query for the workflow runs, created in last 2
hours
- to avoid missing workflows that ran for more than 2 hours, every night
(00:25) we will query workflows created in past 24 hours and export them
as well
- should we have query for even longer periods?
- lets see how it works with current schedule
- for longer periods like for days or weeks, it may require to adjust
logic and concurrency of querying data, so lets for now use simpler
version
2024-11-11 20:33:29 +00:00
Alex Chi Z.
5a138d08a3 feat(pageserver): support partial gc-compaction for delta layers (#9611)
The final patch for partial compaction, part of
https://github.com/neondatabase/neon/issues/9114, close
https://github.com/neondatabase/neon/issues/8921 (note that we didn't
implement parallel compaction or compaction scheduler for partial
compaction -- currently this needs to be scheduled by using a Python
script to split the keyspace, and in the future, automatically split
based on the key partitioning when the pageserver wants to trigger a
gc-compaction)

## Summary of changes

* Update the layer selection algorithm to use the same selection as full
compaction (everything intersect/below gc horizon)
* Update the layer selection algorithm to also generate a list of delta
layers that need to be rewritten
* Add the logic to rewrite delta layers and add them back to the layer
map
* Update test case to do partial compaction on deltas

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-11 20:30:32 +00:00
Tristan Partin
2d9652c434 Clean up C.UTF-8 locale changes
Removes some unnecessary initdb arguments, and fixes Neon for MacOS
since it doesn't seem to ship a C.UTF-8 locale.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-11 13:53:12 -06:00
Alexander Bayandin
e9dcfa2eb2 test_runner: skip more tests using decorator instead of pytest.skip (#9704)
## Problem

Running `pytest.skip(...)` in a test body instead of marking the test
with `@pytest.mark.skipif(...)` makes all fixtures to be initialised,
which is not necessary if the test is going to be skipped anyway.

Also, some tests are unnecessarily skipped (e.g. `test_layer_bloating`
on Postgres 17, or `test_idle_reconnections` at all) or run (e.g.
`test_parse_project_git_version_output_positive` more than on once
configuration) according to comments.

## Summary of changes
- Move `skip_on_postgres` / `xfail_on_postgres` /
`run_only_on_default_postgres` decorators to `fixture.utils`
- Add new `skip_in_debug_build` and `skip_on_ci` decorators
- Replace `pytest.skip(...)` calls with decorators where possible
2024-11-11 18:07:01 +00:00
Peter Bendel
8db84d9964 new ingest benchmark (#9711)
## Problem

We have no specific benchmark testing project migration of postgresql
project with existing data into Neon.
Typical steps of such a project migration are
- schema creation in the neon project
- initial COPY of relations
- creation of indexes and constraints
- vacuum analyze

## Summary of changes

Add a periodic benchmark running 9 AM UTC every day.
In each run:
- copy a 200 GiB project that has realistic schema, data, tables,
indexes and constraints from another project into
  - a new Neon project (7 CU fixed)
- an existing tenant, (but new branch and new database) that already has
4 TiB of data
- use pgcopydb tool to automate all steps and parallelize COPY and index
creation
- parse pgcopydb output and report performance metrics in Neon
performance test database

## Logs

This benchmark has been tested first manually and then as part of
benchmarking.yml workflow, example run see

https://github.com/neondatabase/neon/actions/runs/11757679870
2024-11-11 17:51:15 +00:00
Alexander Bayandin
1aab34715a Remove checklist from the PR template (#9702)
## Problem
Once we enable the merge queue for the `main` branch, it won't be
possible to adjust the commit message right after pressing the "Squash
and merge" button and the PR title + description will be used as is.

To avoid extra noise in the commits in the `main` with the checklist
leftovers, I propose removing the checklist from the PR template and
keeping only the Problem / Summary of changes.

## Summary of changes
- Remove the checklist from the PR template
2024-11-11 17:01:02 +00:00
Erik Grinaker
f63de5f527 safekeeper: add initialize_segment variant of safekeeper_wal_storage_operation_seconds (#9691)
## Problem

We don't have a metric capturing the latency of segment initialization.
This can be significant due to fsyncs.

## Summary of changes

Add an `initialize_segment` variant of
`safekeeper_wal_storage_operation_seconds`.
2024-11-11 17:55:50 +01:00
Alex Chi Z.
54a1676680 rfc: update aux file rfc to reflect latest optimizations (#9681)
Reflects https://github.com/neondatabase/neon/pull/9631 in the RFC.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-11 09:19:03 -05:00
Alex Chi Z.
48c06d9f7b fix(pageserver): increase frozen layer warning threshold; ignore in tests (#9705)
Perf benchmarks produce a lot of layers.

## Summary of changes

Bumping the threshold and ignore the warning.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-11 09:13:46 -05:00
Alexander Bayandin
f510647c7e CI: retry actions/github-script for 5XX errors (#9703)
## Problem

GitHub API can return error 500, and it fails jobs that use
`actions/github-script` action.

## Summary of changes
- Add `retry: 500` to all `actions/github-script` usage
2024-11-11 12:42:32 +00:00
Vlad Lazar
ceaa80ffeb storcon: add peer token for peer to peer communication (#9695)
## Problem

We wish to stop using admin tokens in the infra repo, but step down
requests use the admin token.

## Summary of Changes

Introduce a new "ControllerPeer" scope and use it for step-down requests.
2024-11-11 09:58:41 +00:00
Alexander Bayandin
2fcac0e66b CI(pre-merge-checks): add required checks (#9700)
## Problem
The Merge queue doesn't work because it expects certain jobs, which we
don't have in the `pre-merge-checks` workflow.
But it turns out we can just create jobs/checks with the same names in
any workflow that we run.

## Summary of changes
- Add `conclusion` jobs
- Create `neon-cloud-e2e` status check
- Add a bunch of `if`s to handle cases with no relevant changes found
and prepare the workflow to run rust checks in the future
- List the workflow in `report-workflow-stats` to collect stats about it
2024-11-09 01:02:54 +00:00
Tristan Partin
ecde8d7632 Improve type safety according to pyright
Pyright found many issues that mypy doesn't seem to want to catch or
mypy isn't configured to catch.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-08 14:43:15 -06:00
Alex Chi Z.
af8238ae52 fix(pageserver): drain upload queue before offloading timeline (#9682)
It is possible at the point we shutdown the timeline, there are
still layer files we did not upload.

## Summary of changes

* If the queue is not empty, avoid offloading.
* Shutdown the timeline gracefully using the flush mode to
ensure all local files are uploaded before deleting the timeline
directory.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-08 14:28:55 -05:00
Erik Grinaker
ab47804d00 safekeeper: remove unused WriteGuardSharedState::skip_update (#9699) 2024-11-08 19:25:31 +00:00
Alex Chi Z.
ecca62a45d feat(pageserver): more log lines around frozen layers (#9697)
We saw pageserver OOMs
https://github.com/neondatabase/cloud/issues/19715 for tenants doing
large writes. Add log lines around in-memory layers to hopefully collect
some info during my on-call shift next week.

## Summary of changes

* Estimate in-memory size of an in-mem layer.
* Print frozen layer number if there are too many layers accumulated in
memory.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-08 18:44:00 +00:00
Tristan Partin
34a4eb6f2a Switch compute-related locales to C.UTF-8 by default
Right now, our environments create databases with the C locale, which is
really unfortunate for users who have data stored in other languages
that they want to analyze. For instance, show_trgm on Hebrew text
currently doesn't work in staging or production.

I don't envision this being the final solution. I think this is just a
way to set a known value so the pageserver doesn't use its parent
environment. The final solution to me is exposing initdb parameters to
users in the console. Then they could use a different locale or encoding
if they so chose.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-08 12:19:18 -06:00
Alexander Bayandin
b6bc954c5d CI: move check codestyle python to reusable workflow and run on a merge_group (#9683)
## Problem

To prevent breaking main after Python 3.11 PR get merged 
we need to enable merge queue and run `check-codestyle-python`
job on it

## Summary of changes
- Move `check-codestyle-python` to a reusable workflow
- Run this workflow on `merge_group` event
2024-11-08 17:32:56 +00:00
Vlad Lazar
30680d1f32 tests: use tigther storcon scopes (#9696)
## Problem

https://github.com/neondatabase/neon/pull/9596 did not update tests
because that would've broken the compat tests.

## Summary of Changes

Use infra scope where possible.
2024-11-08 17:00:31 +00:00
Alex Chi Z.
f561cbe1c7 fix(pageserver): drain upload queue before detaching ancestor (#9651)
In INC-317
https://neondb.slack.com/archives/C033RQ5SPDH/p1730815677932209, we saw
an interesting series of operations that would remove valid layer files
existing in the layer map.

* Timeline A starts compaction and generates an image layer Z but not
uploading it yet.
* Timeline B/C starts ancestor detaching (which should not affect
timeline A)
* The tenant gets restarted as part of the ancestor detaching process,
without increasing the generation number.
* Timeline A reloads, discovering the layer Z is a future layer, and
schedules a **deletion into the deletion queue**. This means that the
file will be deleted any time in the future.
* Timeline A starts compaction and generates layer Z again, adding it to
the layer map. Note that because we don't bump generation number during
ancestor detach, it has the same filename + generation number as the
original Z.
* Timeline A deletes layer Z from s3 + disk, and now we have a dangling
reference in the layer map, blocking all
compaction/logical_size_calculation process.

## Summary of changes

* We wait until all layers to be uploaded before shutting down the
tenants in `Flush` mode.
* Ancestor detach restarts now use this mode.
* Ancestor detach also waits for remote queue completion before starting
the detaching process.
* The patch ensures that we don't have any future image layer (or
something similar) after restart, but not fixing the underlying problem
around generation numbers.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-08 10:35:27 -05:00
Tristan Partin
3525d2e381 Update TimescaleDB to 2.17.1 for PG 17
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-08 09:15:38 -06:00
Konstantin Knizhnik
17c002b660 Do not copy logical replicaiton slots to replica (#9458)
## Problem

Replication slots are now persisted using AUX files mechanism and
included in basebackup when replica is launched.
This slots are not somehow used at replica but hold WAL, which may cause
local disk space exhaustion.

## Summary of changes

Add `--replica` parameter to basebackup request and do not include
replication slot state files in basebackup for replica.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-11-08 14:54:58 +02:00
John Spray
aa9112efce pageserver: add no_sync for use in regression tests (1/2) (#9677)
## Problem

In test environments, the `syncfs` that the pageserver does on startup
can take a long time, as other tests running concurrently might have
many gigabytes of dirty pages.

## Summary of changes

- Add a `no_sync` option to the pageserver's config.
- Skip syncfs on startup if this is set
- A subsequent PR (https://github.com/neondatabase/neon/pull/9678) will
enable this by default in tests. We need to wait until after the next
release to avoid breaking compat tests, which would fail if we set
no_sync & use an old pageserver binary.

Q: Why is this a different mechanism than safekeeper, which as a
--no-sync CLI?
A: Because the way we manage pageservers in neon_local depends on the
pageserver.toml containing the full configuration, whereas safekeepers
have a config file which is neon-local-specific and can drive a CLI
flag.

Q: Why is the option no_sync rather than sync?
A: For boolean configs with a dangerous value, it's preferable to make
"false" the safe option, so that any downstream future config tooling
that might have a "booleans are false by default" behavior (e.g. golang
structs) is safe by default.

Q: Why only skip the syncfs, and not all fsyncs?
A: Skipping all fsyncs would require more code changes, and the most
acute problem isn't fsyncs themselves (these just slow down a running
test), it's the syncfs (which makes a pageserver startup slow as a
result of _other_ tests)
2024-11-08 10:16:04 +00:00
JC Grünhage
027889b06c ci: use set-docker-config-dir from dev-actions (#9638)
set-docker-config-dir was replicated over multiple repositories.

The replica of this action was removed from this repository and it's
using the version from github.com/neondatabase/dev-actions instead
2024-11-08 10:44:59 +01:00
Heikki Linnakangas
79929bb1b6 Disable rust_2024_compatibility lint option (#9615)
Compiling with nightly rust compiler, I'm getting a lot of errors like
this:

    error: `if let` assigns a shorter lifetime since Edition 2024
       --> proxy/src/auth/backend/jwt.rs:226:16
        |
    226 |             if let Some(permit) = self.try_acquire_permit() {
        |                ^^^^^^^^^^^^^^^^^^^-------------------------
        |                                   |
| this value has a significant drop implementation which may observe a
major change in drop order and requires your discretion
        |
        = warning: this changes meaning in Rust 2024
= note: for more information, see issue #124085
<https://github.com/rust-lang/rust/issues/124085>
    help: the value is now dropped here in Edition 2024
       --> proxy/src/auth/backend/jwt.rs:241:13
        |
    241 |             } else {
        |             ^
    note: the lint level is defined here
       --> proxy/src/lib.rs:8:5
        |
    8   |     rust_2024_compatibility
        |     ^^^^^^^^^^^^^^^^^^^^^^^
= note: `#[deny(if_let_rescope)]` implied by
`#[deny(rust_2024_compatibility)]`

and this:

error: these values and local bindings have significant drop
implementation that will have a different drop order from that of
Edition 2021
       --> proxy/src/auth/backend/jwt.rs:376:18
        |
    369 |         let client = Client::builder()
| ------ these values have significant drop implementation and will
observe changes in drop order under Edition 2024
    ...
    376 |             map: DashMap::default(),
        |                  ^^^^^^^^^^^^^^^^^^
        |
        = warning: this changes meaning in Rust 2024
= note: for more information, see issue #123739
<https://github.com/rust-lang/rust/issues/123739>
= note: `#[deny(tail_expr_drop_order)]` implied by
`#[deny(rust_2024_compatibility)]`

They are caused by the `rust_2024_compatibility` lint option.

When we actually switch to the 2024 edition, it makes sense to go
through all these and check that the drop order changes don't break
anything, but in the meanwhile, there's no easy way to avoid these
errors. Disable it, to allow compiling with nightly again.

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-11-08 08:35:03 +00:00
Peter Bendel
9132d80aa3 add pgcopydb tool to build tools image (#9658)
## Problem

build-tools image does not provide superuser, so additional packages can
not be installed during GitHub benchmarking workflows but need to be
added to the image

## Summary of changes

install pgcopydb version 0.17-1 or higher into build-tools bookworm
image

```bash
docker run -it neondatabase/build-tools:<tag>-bookworm-arm64 /bin/bash
...
nonroot@c23c6f4901ce:~$ LD_LIBRARY_PATH=/pgcopydb/lib /pgcopydb/bin/pgcopydb --version;
13:58:19.768 8 INFO   Running pgcopydb version 0.17 from "/pgcopydb/bin/pgcopydb"
pgcopydb version 0.17
compiled with PostgreSQL 16.4 (Debian 16.4-1.pgdg120+2) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
compatible with Postgres 11, 12, 13, 14, 15, and 16
```

Example usage of that image in a workflow

https://github.com/neondatabase/neon/actions/runs/11725718371/job/32662681172#step:7:14
2024-11-07 19:00:25 +01:00
Conrad Ludgate
82e3f0ecba [proxy/authorize]: improve JWKS reliability (#9676)
While setting up some tests, I noticed that we didn't support keycloak.
They make use of encryption JWKs as well as signature ones. Our current
jwks crate does not support parsing encryption keys which caused the
entire jwk set to fail to parse. Switching to lazy parsing fixes this.

Also while setting up tests, I couldn't use localhost jwks server as we
require HTTPS and we were using webpki so it was impossible to add a
custom CA. Enabling native roots addresses this possibility.

I saw some of our current e2e tests against our custom JWKS in s3 were
taking a while to fetch. I've added a timeout + retries to address this.
2024-11-07 16:24:38 +00:00
Arpad Müller
75aa19aa2d Don't attach is_archived to debug output (#9679)
We are in branches where we know its value already.
2024-11-07 16:13:50 +00:00
Alex Chi Z.
a8d9939ea9 fix(pageserver): reduce aux compaction threshold (#9647)
ref https://github.com/neondatabase/neon/issues/9441

The metrics from LR publisher testing project: ~300KB aux key deltas per
256MB files. Therefore, I think we can do compaction more aggressively
as these deltas are small and compaction can reduce layer download
latency. We also have a read path perf fix
https://github.com/neondatabase/neon/pull/9631 but I'd still combine the
read path fix with the reduce of the compaction threshold.

## Summary of changes

* reduce metadata compaction threshold
* use num of L1 delta layers as an indicator for metadata compaction
* dump more logs

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-07 10:38:15 -05:00
Erik Grinaker
f18aa04b90 safekeeper: use set_len() to zero out segments (#9665)
## Problem

When we create a new segment, we zero it out in order to avoid changing
the length and fsyncing metadata on every write. However, we zeroed it
out by writing 8 KB zero-pages, and Tokio file writes have non-trivial
overhead.

## Summary of changes

Zero out the segment using
[`File::set_len()`](https://docs.rs/tokio/latest/i686-unknown-linux-gnu/tokio/fs/struct.File.html#method.set_len)
instead. This will typically (depending on the filesystem) just write a
sparse file and omit the 16 MB of data entirely. This improves WAL
append throughput for large messages by over 400% with fsync disabled,
and 100% with fsync enabled.
2024-11-07 15:09:57 +00:00
Erik Grinaker
01265b7bc6 safekeeper: add basic WAL ingestion benchmarks (#9531)
## Problem

We don't have any benchmarks for Safekeeper WAL ingestion.

## Summary of changes

Add some basic benchmarks for WAL ingestion, specifically for
`SafeKeeper::process_msg()` (single append) and `WalAcceptor` (pipelined
batch ingestion). Also add some baseline file write benchmarks.
2024-11-07 13:24:03 +00:00
Arseny Sher
f54f0e8e2d Fix direct reading from WAL buffers. (#9639)
Fix direct reading from WAL buffers.
Pointer wasn't advanced which resulted in sending corrupted WAL if part
of read used WAL buffers and part read from the file. Also move it to
neon_walreader so that e.g. replication could also make use of it.

ref https://github.com/neondatabase/cloud/issues/19567
2024-11-07 11:29:52 +00:00
Erik Grinaker
d6aa26a533 postgres_ffi: make WalGenerator generic over record generator (#9614)
## Problem

Benchmarks need more control over the WAL generated by `WalGenerator`.
In particular, they need to vary the size of logical messages.

## Summary of changes

* Make `WalGenerator` generic over `RecordGenerator`, which constructs
WAL records.
* Add `LogicalMessageGenerator` which emits logical messages, with a
configurable payload.
* Minor tweaks and code reorganization.

There are no changes to the core logic or emitted WAL.
2024-11-07 10:38:39 +00:00
Ivan Efremov
f5eec194e7 Merge pull request #9674 from neondatabase/rc/proxy/2024-11-07
Proxy release 2024-11-07
2024-11-07 12:07:12 +02:00
Cheng Chen
e1d0b73824 chore(compute): Bump pg_mooncake to the latest version 2024-11-06 22:41:18 -06:00
Arpad Müller
011c0a175f Support copying layers in detach_ancestor from before shard splits (#9669)
We need to use the shard associated with the layer file, not the shard
associated with our current tenant shard ID.

Due to shard splits, the shard IDs can refer to older files.

close https://github.com/neondatabase/neon/issues/9667
2024-11-07 01:53:58 +01:00
Alex Chi Z.
2a95a51a0d refactor(pageserver): better pageservice command parsing (#9597)
close https://github.com/neondatabase/neon/issues/9460

## Summary of changes

A full rewrite of pagestream cmdline parsing to make it more robust and
readable.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-11-06 20:41:01 +00:00
Yuchen Liang
11fc1a4c12 fix(test): use layer map dump in test_readonly_node_gc to validate layers protected by leases (#9551)
Fixes #9518.

## Problem

After removing the assertion `layers_removed == 0` in #9506, we could
miss breakage if we solely rely on the successful execution of the
`SELECT` query to check if lease is properly protecting layers. Details
listed in #9518.

Also, in integration tests, we sometimes run into the race condition
where getpage request comes before the lease get renewed (item 2 of
#8817), even if compute_ctl sends a lease renewal as soon as it sees a
`/configure` API calls that updates the `pageserver_connstr`. In this
case, we would observe a getpage request error stating that we `tried to
request a page version that was garbage collected` (as we seen in
[Allure
Report](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8613/11550393107/index.html#suites/3ccffb1d100105b98aed3dc19b717917/d1a1ba47bc180493)).

## Summary of changes

- Use layer map dump to verify if the lease protects what it claimed:
Record all historical layers that has `start_lsn <= lease_lsn` before
and after running timeline gc. This is the same check as
ad79f42460/pageserver/src/tenant/timeline.rs (L5025-L5027)
The set recorded after GC should contain every layer in the set recorded
before GC.
- Wait until log contains another successful lease request before
running the `SELECT` query after GC. We argued in #8817 that the bad
request can only exist within a short period after migration/restart,
and our test shows that as long as a lease renewal is done before the
first getpage request sent after reconfiguration, we will not have bad
request.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-11-06 20:18:21 +00:00
Tristan Partin
93123f2623 Rename compute_backpressure_throttling_ms to compute_backpressure_throttling_seconds
This is in line with the Prometheus guidance[0]. We also haven't started
using this metric, so renaming is essentially free.

Link: https://prometheus.io/docs/practices/naming/ [0]
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-06 13:28:23 -06:00
Alex Chi Z.
1d3559d4bc feat(pageserver): add fast path for sparse keyspace read (#9631)
In https://github.com/neondatabase/neon/issues/9441, the tenant has a
lot of aux keys spread in multiple aux files. The perf tool shows that a
significant amount of time is spent on remove_overlapping_keys. For
sparse keyspaces, we don't need to report missing key errors anyways,
and it's very likely that we will need to read all layers intersecting
with the key range. Therefore, this patch adds a new fast path for
sparse keyspace reads that we do not track `unmapped_keyspace` in a
fine-grained way. We only modify it when we find an image layer.

In debug mode, it was ~5min to read the aux files for a dump of the
tenant, and now it's only 8s, that's a 60x speedup.

## Summary of changes

* Do not add sparse keys into `keys_done` so that remove_overlapping
does nothing.
* Allow `ValueReconstructSituation::Complete` to be updated again in
`ValuesReconstructState::update_key` for sparse keyspaces.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-11-06 18:17:02 +00:00
Conrad Ludgate
73bdc9a2d0 [proxy]: minor changes to endpoint-cache handling (#9666)
I think I meant to make these changes over 6 months ago. alas, better
late than never.

1. should_reject doesn't eagerly intern the endpoint string
2. Rate limiter uses a std Mutex instead of a tokio Mutex.
3. Recently I introduced a `-local-proxy` endpoint suffix. I forgot to
add this to normalize.
4. Random but a small cleanup making the ControlPlaneEvent deser
directly to the interned strings.
2024-11-06 17:40:40 +00:00
John Spray
d182ff294c storcon: respect tenant scheduling policy in drain/fill (#9657)
## Problem

Pinning a tenant by setting Pause scheduling policy doesn't work because
drain/fill code moves the tenant around during deploys.

Closes: https://github.com/neondatabase/neon/issues/9612

## Summary of changes

- In drain, only move a tenant if it is in Active or Essential mode
- In fill, only move a tenant if it is in Active mode.

The asymmetry is a bit annoying, but it faithfully respects the purposes
of the modes: Essential is meant to endeavor to keep the tenant
available, which means it needs to be drained but doesn't need to be
migrated during fills.
2024-11-06 15:14:43 +00:00
Vlad Lazar
4dfa0c221b pageserver: ingest pre-serialized batches of values (#9579)
## Problem

https://github.com/neondatabase/neon/pull/9524 split the decoding and
interpretation step from ingestion.
The output of the first phase is a `wal_decoder::models::InterpretedWalRecord`. 
Before this patch set that struct contained a list of `Value` instances.

We wish to lift the decoding and interpretation step to the safekeeper,
but it would be nice if the safekeeper gave us a batch containing the raw data instead of actual values.

## Summary of changes

Main goal here is to make `InterpretedWalRecord` hold a raw buffer which
contains pre-serialized Values.
For this we do:
1. Add a `SerializedValueBatch` type. This is `inmemory_layer::SerializedBatch` with some 
extra functionality for extension, observing values for shard 0 and tests.
2. Replace `inmemory_layer::SerializedBatch` with `SerializedValueBatch`
3. Make `DatadirModification` maintain a `SerializedValueBatch`.


### `DatadirModification` changes

`DatadirModification` now maintains a `SerializedValueBatch` and extends
it as new WAL records come in (to avoid flushing to disk on every
record).
In turn, this cascaded into a number of modifications to
`DatadirModification`:
1. Replace `pending_data_pages` and `pending_zero_data_pages` with `pending_data_batch`.
2. Removal of `pending_zero_data_pages` and its cousin `on_wal_record_end`
3. Rename `pending_bytes` to `pending_metadata_bytes` since this is what it tracks now.
4. Adapting of various utility methods like `len`, `approx_pending_bytes` and `has_dirty_data_pages`.

Removal of `pending_zero_data_pages` and the optimisation associated
with it ((1) and (2)) deserves more detail.

Previously all zero data pages went through `pending_zero_data_pages`.
We wrote zero data pages when filling gaps caused by relation extension
(case A) and when handling special wal records (case B). If it happened
that the same WAL record contained a non zero write for an entry in
`pending_zero_data_pages` we skipped the zero write.

Case A: We handle this differently now. When ingesting the
`SerialiezdValueBatch` associated with one PG WAL record, we identify the gaps and fill the
them in one go. Essentially, we move from a per key process (gaps were filled after each
new key), and replace it with a per record process. Hence, the optimisation is not
required anymore.

Case B: When the handling of a special record needs to zero out a key,
it just adds that to the current batch. I inspected the code, and I
don't think the optimisation kicked in here.
2024-11-06 14:10:32 +00:00
Folke Behrens
bdd492b1d8 proxy: Replace "web(auth)" with "console redirect" everywhere (#9655) 2024-11-06 11:03:38 +00:00
Folke Behrens
5d8284c7fe proxy: Read cplane JWT with clap arg (#9654) 2024-11-06 10:27:55 +00:00
Folke Behrens
ebc43efebc proxy: Refactor cplane types (#9643)
The overall idea of the PR is to rename a few types to make their
purpose more clear, reduce abstraction where not needed, and move types
to to more better suited modules.
2024-11-05 23:03:53 +01:00
Folke Behrens
754d2950a3 proxy: Revert ControlPlaneEvent back to struct (#9649)
Due to neondatabase/cloud#19815 we need to be more tolerant when reading
events.
2024-11-05 21:32:33 +00:00
Conrad Ludgate
fcde40d600 [proxy] use the proxy protocol v2 command to silence some logs (#9620)
The PROXY Protocol V2 offers a "command" concept. It can be of two
different values. "Local" and "Proxy". The spec suggests that "Local" be
used for health-checks. We can thus use this to silence logging for such
health checks such as those from NLB.

This additionally refactors the flow to be a bit more type-safe, self
documenting and using zerocopy deser.
2024-11-05 17:23:00 +00:00
Erik Grinaker
babfeb70ba safekeeper: don't allocate send buffers on stack (#9644)
## Problem

While experimenting with `MAX_SEND_SIZE` for benchmarking, I saw stack
overflows when increasing it to 1 MB. Turns out a few buffers of this
size are stack-allocated rather than heap-allocated. Even at the default
128 KB size, that's a bit large to allocate on the stack.

## Summary of changes

Heap-allocate buffers of size `MAX_SEND_SIZE`.
2024-11-05 17:05:30 +00:00
Ivan Efremov
2f1a56c8f9 proxy: Unify local and remote conn pool client structures (#9604)
Unify client, EndpointConnPool and DbUserConnPool for remote and local
conn.
- Use new ClientDataEnum for additional client data.
- Add ClientInnerCommon client structure.
- Remove Client and EndpointConnPool code from local_conn_pool.rs
2024-11-05 17:33:41 +02:00
John Spray
e30f5fb922 scrubber: remove AWS region assumption, tolerate negative max_project_size (#9636)
## Problem

First issues noticed when trying to run scrubber find-garbage on Azure:
- Azure staging contains projects with -1 set for max_project_size:
apparently the control plane treats this as a signed field.
- Scrubber code assumed that listing projects should filter to
aws-$REGION. This is no longer needed (per comment in the code) because
we know hit region-local APIs.

This PR doesn't make it work all the way (`init_remote` still assumes
S3), but these are necessary precursors.

## Summary of changes

- Change max-project_size from unsigned to signed
- Remove region filtering in favor of simply using the right region's
API (which we already do)
2024-11-05 13:32:50 +00:00
Arpad Müller
70ae8c16da Construct models::TenantConfig only once (#9630)
Since 5f83c9290b482dc90006c400dfc68e85a17af785/#1504 we've had
duplication in construction of models::TenantConfig, where both
constructs contained the same code. This PR removes one of the two
locations to avoid the duplication.
2024-11-05 13:02:49 +00:00
Erik Grinaker
8840f3858c pageserver: return 503 during tenant shutdown (#9635)
## Problem

Tenant operations may return `409 Conflict` if the tenant is shutting
down. This status code is not retried by the control plane, causing
user-facing errors during pageserver restarts. Operations should instead
return `503 Service Unavailable`, which may be retried for idempotent
operations.

## Summary of changes

Convert
`GetActiveTenantError::WillNotBecomeActive(TenantState::Stopping)` to
`ApiError::ShuttingDown` rather than `ApiError::Conflict`. This error is
returned by `Tenant::wait_to_become_active` in most (all?)
tenant/timeline-related HTTP routes.
2024-11-05 13:16:55 +01:00
Tristan Partin
1e16221f82 Update psycopg2 to latest version for complete PG 17 support
Update the types to match. Changes the cursor import to match the
C bindings[0].

Link: https://github.com/python/typeshed/issues/12578 [0]
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-04 18:21:59 -06:00
Tristan Partin
34812a6aab Improve some typing related to performance testing for LR
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-11-04 15:52:01 -06:00
Arpad Müller
ee68bbf6f5 Add tenant config option to allow timeline_offloading (#9598)
Allow us to enable timeline offloading for single tenants without having
to enable it for the entire pageserver.

Part of #8088.
2024-11-04 21:01:18 +01:00
Folke Behrens
1085fe57d3 proxy: Rewrite ControlPlaneEvent as enum (#9627) 2024-11-04 20:19:26 +01:00
Folke Behrens
59879985b4 proxy: Wrap JWT errors in separate AuthError variant (#9625)
* Also rename `AuthFailed` variant to `PasswordFailed`.
* Before this all JWT errors end up in `AuthError::AuthFailed()`,
  expects a username and also causes cache invalidation.
2024-11-04 19:56:40 +01:00
Conrad Ludgate
81d1bb1941 quieten aws_config logs (#9626)
logs during aws authentication are soooo noisy in staging 🙃
2024-11-04 17:28:10 +00:00
Christian Schwarz
06113e94e6 fix(test_regress): always use storcon virtual pageserver API to set tenant config (#9622)
Problem
-------

Tests that directly call the Pageserver Management API to set tenant
config are flaky if the Pageserver is managed by Storcon because Storcon
is the source of truth and may (theoretically) reconcile a tenant at any
time.

Solution
--------

Switch all users of
`set_tenant_config`/`patch_tenant_config_client_side`
to use the `env.storage_controller.pageserver_api()`

Future Work
-----------

Prevent regressions from creeping in.

And generally clean up up tenant configuration.
Maybe we can avoid the Pageserver having a default tenant config at all
and put the default into Storcon instead?

* => https://github.com/neondatabase/neon/issues/9621

Refs
----

fixes https://github.com/neondatabase/neon/issues/9522
2024-11-04 17:42:08 +01:00
Erik Grinaker
0d5a512825 safekeeper: add walreceiver metrics (#9450)
## Problem

We don't have any observability for Safekeeper WAL receiver queues.

## Summary of changes

Adds a few WAL receiver metrics:

* `safekeeper_wal_receivers`: gauge of currently connected WAL
receivers.
* `safekeeper_wal_receiver_queue_depth`: histogram of queue depths per
receiver, sampled every 5 seconds.
* `safekeeper_wal_receiver_queue_depth_total`: gauge of total queued
messages across all receivers.
* `safekeeper_wal_receiver_queue_size_total`: gauge of total queued
message sizes across all receivers.

There are already metrics for ingested WAL volume: `written_wal_bytes`
counter per timeline, and `safekeeper_write_wal_bytes` per-request
histogram.
2024-11-04 15:22:46 +00:00
Conrad Ludgate
8ad1dbce72 [proxy]: parse proxy protocol TLVs with aws/azure support (#9610)
AWS/azure private link shares extra information in the "TLV" values of
the proxy protocol v2 header. This code doesn't action on it, but it
parses it as appropriate.
2024-11-04 14:04:56 +00:00
Conrad Ludgate
3dcdbcc34d remove aws-lc-rs dep and fix storage_broker tls (#9613)
It seems the ecosystem is not so keen on moving to aws-lc-rs as it's
build setup is more complicated than ring (requiring cmake).

Eventually I expect the ecosystem should pivot to
https://github.com/ctz/graviola/tree/main/rustls-graviola as it
stabilises (it has a very simply build step and license), but for now
let's try not have a headache of juggling two crypto libs.

I also noticed that tonic will just fail with tls without a default
provider, so I added some defensive code for that.
2024-11-04 13:29:13 +00:00
Matthias van de Meent
d5de63c6b8 Fix a time zone issue in a PG17 test case (#9618)
The commit was cherry-picked and thus shouldn't cause issues once we
merge the release tag for PostgreSQL 17.1
2024-11-04 12:10:32 +00:00
John Spray
4534f5cdc6 pageserver: make local timeline deletion infallible (#9594)
## Problem

In https://github.com/neondatabase/neon/pull/9589, timeline offload code
is modified to return an explicit error type rather than propagating
anyhow::Error. One of the 'Other' cases there is I/O errors from local
timeline deletion, which shouldn't need to exist, because our policy is
not to try and continue running if the local disk gives us errors.

## Summary of changes

- Make `delete_local_timeline_directory` and use `.fatal_err(` on I/O
errors

---------

Co-authored-by: Erik Grinaker <erik@neon.tech>
2024-11-04 09:11:52 +00:00
Erik Grinaker
0058eb09df test_runner/performance: add sharded ingest benchmark (#9591)
Adds a Python benchmark for sharded ingestion. This ingests 7 GB of WAL
(100M rows) into a Safekeeper and fans out to 10 shards running on 10
different pageservers. The ingest volume and duration is recorded.
2024-11-02 16:42:10 +00:00
Konstantin Knizhnik
8ac523d2ee Do not assign page LSN to new (uninitialized) page in ClearVisibilityMapFlags redo handler (#9287)
## Problem

https://neondb.slack.com/archives/C04DGM6SMTM/p1727872045252899

See https://github.com/neondatabase/neon/issues/9240

## Summary of changes

Add `!page_is_new` check before assigning page lsn.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-11-01 20:31:29 +02:00
John Spray
3c16bd6e0b storcon: skip non-active projects in chaos injection (#9606)
## Problem

We may sometimes use scheduling modes like `Pause` to pin a tenant in
its current location for operational reasons. It is undesirable for the
chaos task to make any changes to such projects.

## Summary of changes

- Add a check for scheduling mode
- Add a log line when we do choose to do a chaos action for a tenant:
this will help us understand which operations originate from the chaos
task.
2024-11-01 16:47:20 +00:00
Erik Grinaker
123816e99a safekeeper: log slow WalAcceptor sends (#9564)
## Problem

We don't have any observability into full WalAcceptor queues per
timeline.

## Summary of changes

Logs a message when a WalAcceptor send has blocked for 5 seconds, and
another message when the send completes. This implies that the log
frequency is at most once every 5 seconds per timeline, so we don't need
further throttling.
2024-11-01 13:47:03 +01:00
Peter Bendel
8b3bcf71ee revert higher token expiration (#9605)
## Problem

The IAM role associated with our github action runner supports a max
token expiration which is lower than the value we tried.

## Summary of changes

Since we believe to have understood the performance regression we (by
ensuring availability zone affinity of compute and pageserver) the job
should again run in lower than 5 hours and we revert this change instead
of increasing the max session token expiration in the IAM role which
would reduce our security.
2024-11-01 12:46:02 +01:00
Erik Grinaker
4c2c8d6708 test_runner: fix tenant_get_shards with one pageserver (#9603)
## Problem

`tenant_get_shards()` does not work with a sharded tenant on 1
pageserver, as it assumes an unsharded tenant in this case. This special
case appears to have been added to handle e.g. `test_emergency_mode`,
where the storage controller is stopped. This breaks e.g. the sharded
ingest benchmark in #9591 when run with a single shard.

## Summary of changes

Correctly look up shards even with a single pageserver, but add a
special case that assumes an unsharded tenant if the storage controller
is stopped and the caller provides an explicit pageserver, in order to
accomodate `test_emergency_mode`.
2024-11-01 11:25:04 +00:00
Conrad Ludgate
2d1366c8ee fix pre-commit hook with python stubs (#9602)
fix #9601
2024-11-01 11:22:38 +00:00
Vlad Lazar
e589c2e5ec storage_controller: allow deployment infra to use infra token (#9596)
## Problem

We wish for the deployment orchestrator to use infra scoped tokens,
but storcon endpoints it's using require admin scoped tokens.

## Summary of Changes

Switch over all endpoints that are used by the deployment orchestrator
to use an infra scoped token. This causes no breakage during mixed
version scenarios because admin scoped tokens allow access to all
endpoints. The deployment orchestrator can cut over to the infra token
after this commit touches down in prod.

Once this commit is released we should also update the tests code to use
infra scoped tokens where appropriate. Currently it would fail on the
[compat tests](9761b6a64e/test_runner/regress/test_storage_controller.py (L69-L71)).
2024-10-31 18:29:16 +00:00
Conrad Ludgate
9761b6a64e update pg_session_jwt to use pgrx 0.12 for pg17 (#9595)
Updates the extension to use pgrx 0.12. No changes to the extensions
have been made, the only difference is the pgrx version.
2024-10-31 15:50:41 +00:00
Conrad Ludgate
897cffb9d8 auth_broker: fix local_proxy conn count (#9593)
our current metrics for http pool opened connections is always negative
:D oops
2024-10-31 14:57:55 +00:00
John Spray
552088ac16 pageserver: fix spurious error logs in timeline lifecycle (#9589)
## Problem

The final part of https://github.com/neondatabase/neon/issues/9543 will
be a chaos test that creates/deletes/archives/offloads timelines while
restarting pageservers and migrating tenants. Developing that test
showed up a few places where we log errors during normal shutdown.

## Summary of changes

- UninitializedTimeline's drop should log at info severity: this is a
normal code path when some part of timeline creation encounters a
cancellation `?` path.
- When offloading and finding a `RemoteTimelineClient` in a
non-initialized state, this is not an error and should not be logged as
such.
- The `offload_timeline` function returned an anyhow error, so callers
couldn't gracefully pick out cancellation errors from real errors:
update this to have a structured error type and use it throughout.
2024-10-31 14:44:59 +00:00
Peter Bendel
51fda118f6 increase lifetime of AWS session token to 12 hours (#9590)
## Problem

clickbench regression causes clickbench to run >9 hours and the AWS
session token is expired before the run completes

## Summary of changes

extend lifetime of session token for this job to 12 hours
2024-10-31 13:34:50 +00:00
Anastasia Lubennikova
e96398a552 Add support of extensions for v17 (part 4) (#9568)
- pg_jsonschema 0.3.3
- pg_graphql 1.5.9
- rum 65e0a752
- pg_tiktoken a5bc447e

update support of extensions for v14-v16:
- pg_jsonschema 0.3.1 -> 0.3.3
- pg_graphql 1.5.7 -> 1.5.9
- rum 6ab37053 -> 65e0a752
- pg_tiktoken e64e55aa -> a5bc447e
2024-10-31 15:05:24 +02:00
Erik Grinaker
f9d8256d55 pageserver: don't return option from DeletionQueue::new (#9588)
`DeletionQueue::new()` always returns deletion workers, so the returned
`Option` is redundant.
2024-10-31 10:51:58 +00:00
Vlad Lazar
411c3aa0d6 pageserver: lift decoding and interpreting of wal into wal_decoder (#9524)
## Problem

Decoding and ingestion are still coupled in `pageserver::WalIngest`.

## Summary of changes

A new type is added to `wal_decoder::models`, InterpretedWalRecord. This
type contains everything that the pageserver requires in order to ingest
a WAL record. The highlights are the `metadata_record` which is an
optional special record type to be handled and `blocks` which stores
key, value pairs to be persisted to storage.

This type is produced by
`wal_decoder::models::InterpretedWalRecord::from_bytes` from a raw PG
wal record.

The rest of this commit separates decoding and interpretation of the PG
WAL record from its application in `WalIngest::ingest_record`.

Related: https://github.com/neondatabase/neon/issues/9335
Epic: https://github.com/neondatabase/neon/issues/9329
2024-10-31 10:47:43 +00:00
Arpad Müller
65b69392ea Disallow offloaded children during timeline deletion (#9582)
If we delete a timeline that has childen, those children will have their
data corrupted. Therefore, extend the already existing safety check to
offloaded timelines as well.

Part of #8088
2024-10-30 19:37:09 +01:00
Alex Chi Z.
8d70f88b37 refactor(pageserver): use JSON field encoding for consumption metrics cache (#9470)
In https://github.com/neondatabase/neon/issues/9032, I would like to
eventually add a `generation` field to the consumption metrics cache.
The current encoding is not backward compatible and it is hard to add
another field into the cache. Therefore, this patch refactors the format
to store "field -> value", and it's easier to maintain backward/forward
compatibility with this new format.

## Summary of changes

* Add `NewRawMetric` as the new format.
* Add upgrade path. When opening the disk cache, the codepath first
inspects the `version` field, and decide how to decode.
* Refactor metrics generation code and tests.
* Add tests on upgrade / compatibility with the old format.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-30 18:13:11 +00:00
Arpad Müller
bcfe013094 Don't keep around the timeline's remote_client (#9583)
Constructing a remote client is no big deal. Yes, it means an extra
download from S3 but it's not that expensive. This simplifies code paths
and scenarios to test. This unifies timelines that have been recently
offloaded with timelines that have been offloaded in an earlier
invocation of the process.

Part of #8088
2024-10-30 18:44:29 +01:00
Arpad Müller
d0a02f3649 Disallow archived timelines to be detached or reparented (#9578)
Disallow a request for timeline ancestor detach if either the to be
detached timeline, or any of the to be reparented timelines are
offloaded or archived.

In theory we could support timelines that are archived but not
offloaded, but archived timelines are at the risk of being offloaded, so
we treat them like offloaded timelines. As for offloaded timelines, any
code to "support" them would amount to unoffloading them, at which point
we can just demand to have the timelines be unarchived.

Part of #8088
2024-10-30 17:04:57 +01:00
Tristan Partin
8af9412eb2 Collect compute backpressure throttling time
This will tell us how much time the compute has spent throttled if
pageserver/safekeeper cannot keep up with WAL generation.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-30 09:58:29 -05:00
Erik Grinaker
96e35e11a6 postgres_ffi: add WAL generator for tests/benchmarks (#9503)
## Problem

We don't have a convenient way to generate WAL records for benchmarks
and tests.

## Summary of changes

Adds a WAL generator, exposed as an iterator. It currently only
generates logical messages (noops), but will be extended to write actual
table rows later.

Some existing code for WAL generation has been replaced with this
generator, to reduce duplication.
2024-10-30 14:46:39 +03:00
Alexey Kondratov
745061ddf8 chore(compute): Bump pg_mooncake to the latest version (#9576)
## Problem

There were some critical breaking changes made in the upstream since Oct
29th morning.

## Summary of changes

Point it to the topmost commit in the `neon` branch at the time of
writing this
https://github.com/Mooncake-Labs/pg_mooncake/commits/neon/
c495cd17d6
2024-10-30 11:07:02 +01:00
Tristan Partin
0c828c57e2 Remove non-gzipped basebackup code path
In July of 2023, Bojan and Chi authored
92aee7e07f. Our in production pageservers
are most definitely at a version where they all support gzipped
basebackups.
2024-10-29 23:03:45 -05:00
John Spray
8e2e9f0fed pageserver: generation-aware storage for TenantManifest (#9555)
## Problem

When tenant manifest objects are written without a generation suffix,
concurrently attached pageservers may stamp on each others writes of the
manifest and cause undefined behavior.

Closes: #9543 

## Summary of changes

- Use download_generation_object helper when reading manifests, to
search for the most recent generation
- Use Tenant::generation as the generation suffix when writing
manifests.
2024-10-29 23:24:04 +01:00
Tristan Partin
b77b9bdc9f Add tests for sql-exporter metrics
Should help us keep non-working metrics from hitting staging or
production.

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Fixes: https://github.com/neondatabase/neon/issues/8569
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-29 15:13:06 -05:00
Alex Chi Z.
81f9aba005 fix(pagectl): layer parsing and image layer dump (#9571)
This patch contains various improvements for the pagectl tool.

## Summary of changes

* Rewrite layer name parsing: LayerName now supports all variants we use
now.
* Drop pagectl's own layer parsing function, use LayerName in the
pageserver crate.
* Support image layer dumping in the layer dump command using
ImageLayer::dump, drop the original implementation.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-29 15:16:23 -04:00
Alex Chi Z.
88ff8a7803 feat(pageserver): support partial gc-compaction for lowest retain lsn (#9134)
part of https://github.com/neondatabase/neon/issues/8921,
https://github.com/neondatabase/neon/issues/9114

## Summary of changes

We start the partial compaction implementation with the image layer
partial generation. The partial compaction API now takes a key range. We
will only generate images for that key range for now, and remove layers
fully included in the key range after compaction.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-10-29 18:25:32 +00:00
Konstantin Knizhnik
0c075fab3a Add --replica parameter to basebackup (#9553)
## Problem

See https://github.com/neondatabase/neon/pull/9458
This PR separates PS related changes in #9458 from compute_ctl changes
to enforce that PS is deployed before compute.

## Summary of changes

This PR adds handlings of `--replica` parameters of backebackup to page
server.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-10-29 18:40:10 +02:00
Anastasia Lubennikova
80e1630042 Use pg_mooncake from our fork. (#9565)
Switch to main repo once
https://github.com/Mooncake-Labs/pg_mooncake/pull/3 is merged
2024-10-29 15:57:52 +00:00
Jakub Kołodziejczak
57499640c5 proxy: more granular http status codes for sql-over-http errors (#9549)
closes #9532
2024-10-29 15:44:45 +00:00
Anastasia Lubennikova
793ad50b7d fix allow_unstable_extensions GUC - make it USERSET (#9563)
fix message wording
2024-10-29 14:25:23 +00:00
John Spray
7a1331eee5 pageserver: make concurrent offloaded timeline operations safe wrt manifest uploads (#9557)
## Problem

Uploads of the tenant manifest could race between different tasks,
resulting in unexpected results in remote storage.

Closes: https://github.com/neondatabase/neon/issues/9556

## Summary of changes

- Create a central function for uploads that takes a tokio::sync::Mutex
- Store the latest upload in that Mutex, so that when there is lots of
concurrency (e.g. archive 20 timelines at once) we can coalesce their
manifest writes somewhat.
2024-10-29 13:54:48 +00:00
John Spray
4ef74215e1 pageserver: refactor generation-aware loading code into generic (#9545)
## Problem

Indices used to be the only kind of object where we had to search across
generations to find the most recent one. As of
https://github.com/neondatabase/neon/issues/9543, manifests will need
the same treatment.

## Summary of changes

- Refactor download_index_part to a generic download_generation_object
function, which will be usable for downloading manifest objects as well.
2024-10-29 13:00:03 +00:00
Conrad Ludgate
7e00be391d Merge pull request #9558 from neondatabase/rc/proxy/2024-10-29
Auth broker release 2024-10-29
2024-10-29 12:10:50 +00:00
Conrad Ludgate
d4cbc8cfeb [auth_broker]: regress test (#9541)
python based regression test setup for auth_broker. This uses a http
mock for cplane as well as the JWKs url.

complications:
1. We cannot just use local_proxy binary, as that requires the
pg_session_jwt extension which we don't have available in the current
test suite
2. We cannot use just any old http mock for local_proxy, as auth_broker
requires http2 to local_proxy

as such, I used the h2 library to implement an echo server - copied from
the examples in the h2 docs.
2024-10-29 11:39:09 +00:00
Conrad Ludgate
47c35f67c3 [proxy]: fix JWT handling for AWS cognito. (#9536)
In the base64 payload of an aws cognito jwt, I saw the following:

```
"iss":"https:\/\/cognito-idp.us-west-2.amazonaws.com\/us-west-2_redacted"
```

issuers are supposed to be URLs, and URLs are always valid un-escaped
JSON. However, `\/` is a valid escape character so what AWS is doing is
technically correct... sigh...

This PR refactors the test suite and adds a new regression test for
cognito.
2024-10-29 11:01:09 +00:00
Peter Bendel
45b558f480 temporarily increase timeout for clickbench benchmark until regression is resolved (#9554)
## Problem

click bench job in benchmarking workflow has a performance regression
causing it to run in timeout of max job run.

Suspected root cause:
Project has been migrated from single pageserver to storage controller
managed project on Oct 14th.
Since then the regression shows.

## Summary of changes

Increase timeout of pytest to 12 hours.
Increase job timeout to 12 hours
2024-10-29 10:53:28 +00:00
Arpad Müller
a73402e646 Offloaded timeline deletion (#9519)
As pointed out in
https://github.com/neondatabase/neon/pull/9489#discussion_r1814699683 ,
we currently didn't support deletion for offloaded timelines after the
timeline has been loaded from the manifest instead of having been
offloaded.

This was because the upload queue hasn't been initialized yet. This PR
thus initializes the timeline and shuts it down immediately.

Part of #8088
2024-10-29 10:41:53 +00:00
Vlad Lazar
07b974480c pageserver: move things around to prepare for decoding logic (#9504)
## Problem

We wish to have high level WAL decoding logic in `wal_decoder::decoder`
module.

## Summary of Changes

For this we need the `Value` and `NeonWalRecord` types accessible there, so:
1. Move `Value` and `NeonWalRecord` to `pageserver::value` and
`pageserver::record` respectively.
2. Get rid of `pageserver::repository` (follow up from (1))
3. Move PG specific WAL record types to `postgres_ffi::walrecord`. In
theory they could live in `wal_decoder`, but it would create a circular
dependency between `wal_decoder` and `postgres_ffi`. Long term it makes
sense for those types to be PG version specific, so that will work out nicely.
4. Move higher level WAL record types (to be ingested by pageserver)
into `wal_decoder::models`

Related: https://github.com/neondatabase/neon/issues/9335
Epic: https://github.com/neondatabase/neon/issues/9329
2024-10-29 10:00:34 +00:00
Arpad Müller
62f5d484d9 Assert the tenant to be active in unoffload_timeline (#9539)
Currently, all callers of `unoffload_timeline` ensure that the tenant
the unoffload operation is called on is active. We rely on it being
active as we activate the timeline below and don't want to race with the
activation code of the tenant (in the worst case, activating a timeline
twice).

Therefore, add this assertion.

Part of #8088
2024-10-29 00:36:05 +00:00
Tristan Partin
4df3987054 Get role name when not a C string
We will only have a C string if the specified role is a string.
Otherwise, we need to resolve references to public, current_role,
current_user, and session_user.

Fixes: https://github.com/neondatabase/cloud/issues/19323
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-28 18:21:45 -05:00
Konstantin Knizhnik
0624565617 Create the notion of unstable extensions
As a DBaaS provider, Neon needs to provide a stable platform for
customers to build applications upon. At the same time however, we also
need to enable customers to use the latest and greatest technology, so
they can prototype their work, and we can solicit feedback. If all
extensions are treated the same in terms of stability, it is hard to
meet that goal.

There are now two new GUCs created by the Neon extension:

neon.allow_unstable_extensions: This is a session GUC which allows
a session to install and load unstable extensions.

neon.unstable_extensions: This is a comma-separated list of extension
names. We can check if a CREATE EXTENSION statement is attempting to
install an unstable extension, and if so, deny the request if
neon.allow_unstable_extensions is not set to true.

Signed-off-by: Tristan Partin <tristan@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-10-28 17:47:15 -05:00
George MacKerron
7d5f6b6a52 Build pgrag extensions x3 (#8486)
Build the pgrag extensions (rag, rag_bge_small_en_v15, and
rag_jina_reranker_v1_tiny_en) as part of the compute node Dockerfile.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2024-10-28 20:06:36 +00:00
Alex Chi Z.
f7c61e856f fix(pageserver): bump tokio-epoll-uring (#9546)
Includes https://github.com/neondatabase/tokio-epoll-uring/pull/58 that
fixes the clippy error.

## Summary of changes

Update the version of tokio-epoll-uring

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-28 20:03:02 +00:00
Alex Chi Z.
57c21aff9f refactor(pageserver): remove aux v1 configs (#9494)
## Problem

Part of https://github.com/neondatabase/neon/issues/8623

## Summary of changes

Removed all aux-v1 config processing code. Note that we persisted it
into the index part file, so we cannot really remove the field from
index part. I also kept the config item within the tenant config, but we
will not read it any more.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-28 19:51:14 +00:00
Erik Grinaker
248558dee8 safekeeper: refactor WalAcceptor to be event-driven (#9462)
## Problem

The `WalAcceptor` main loop currently uses two nested loops to consume
inbound messages. This makes it hard to slot in periodic events like
metrics collection. It also duplicates the event processing code, and assumes
all messages in steady state are AppendRequests (other messages types may
be dropped if following an AppendRequest).

## Summary of changes

Refactor the `WalAcceptor` loop to be event driven.
2024-10-28 17:18:37 +00:00
Sergey Melnikov
3bad52543f We don't have legacy proxies anymore (#9544)
We don't have legacy scram proxies anymore:
cc: https://github.com/neondatabase/cloud/issues/9745
2024-10-28 16:42:35 +00:00
Tristan Partin
3d64a7ddcd Add pg_mooncake to compute-node.Dockerfile
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-28 11:23:30 -05:00
Conrad Ludgate
25f1e5cfeb [proxy] demote warnings and remove dead-argument (#9512)
fixes https://github.com/neondatabase/cloud/issues/19000
2024-10-28 15:02:20 +00:00
Rahul Patil
8dd555d396 ci(proxy): Update GH action flag on proxy deployment (#9535)
## Problem

Based on a recent proxy deployment issue, we deployed another proxy
version (proxy-scram), which was not needed when deploying a specific
proxy type. we have
[PR](https://github.com/neondatabase/infra/pull/2142) to update on the
infra branch and need to update CI in this repo which triggers proxy
deployment.

## Summary of changes

- Update proxy deployment flag 

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2024-10-28 13:17:09 +01:00
Arthur Petukhovsky
01b6843e12 Route pgbouncer logs to virtio-serial (#9488)
virtio-serial is much more performant than /dev/console emulation,
therefore, is much more suitable for the verbose logs inside vm. This
commit changes routing for pgbouncer logs, since we've recently noticed
it can emit large volumes of logs.

Manually tested on staging by pinning a compute image to my test
project.

Should help with https://github.com/neondatabase/cloud/issues/19072
2024-10-28 12:09:47 +00:00
John Spray
93987b5a4a tests: add test_storage_controller_onboard_detached (#9431)
## Problem

We haven't historically taken this API route where we would onboard a
tenant to the controller in detached state. It worked, but we didn't
have test coverage.

## Summary of changes

- Add a test that onboards a tenant to the storage controller in
Detached mode, and checks that deleting it without attaching it works as
expected.
2024-10-28 11:11:12 +00:00
John Spray
33baca07b6 storcon: add an API to cancel ongoing reconciler (#9520)
## Problem

If something goes wrong with a live migration, we currently only have
awkward ways to interrupt that:
- Restart the storage controller
- Ask it to do some other modification/migration on the shard, which we
don't really want.

## Summary of changes

- Add a new `/cancel` control API, and storcon_cli wrapper for it, which
fires the Reconciler's cancellation token. This is just for on-call use
and we do not expect it to be used by any other services.
2024-10-28 09:26:01 +00:00
John Spray
923974d4da safekeeper: don't un-evict timelines during snapshot API handler (#9428)
## Problem

When we use pull_timeline API on an evicted timeline, it gets downloaded
to serve the snapshot API request. That means that to evacuate all the
timelines from a node, the node needs enough disk space to download
partial segments from all timelines, which may not be physically the
case.

Closes: #8833 

## Summary of changes

- Add a "try" variant of acquiring a residence guard, that returns None
if the timeline is offloaded
- During snapshot API handler, take a different code path if the
timeline isn't resident, where we just read the checkpoint and don't try
to read any segments.
2024-10-28 08:47:12 +00:00
Arpad Müller
e7277885b3 Don't consider archived timelines for synthetic size calculation (#9497)
Archived timelines should not count towards synthetic size.

Closes #9384.

Part of #8088.
2024-10-26 13:27:57 +00:00
dependabot[bot]
80262e724f build(deps): bump werkzeug from 3.0.3 to 3.0.6 (#9527) 2024-10-26 08:24:15 +01:00
Yuchen Liang
85b954f449 pageserver: add tokio-epoll-uring slots waiters queue depth metrics (#9482)
In complement to
https://github.com/neondatabase/tokio-epoll-uring/pull/56.

## Problem

We want to make tokio-epoll-uring slots waiters queue depth observable
via Prometheus.

## Summary of changes

- Add `pageserver_tokio_epoll_uring_slots_submission_queue_depth`
metrics as a `Histogram`.
- Each thread-local tokio-epoll-uring system is given a `LocalHistogram`
to observe the metrics.
- Keep a list of `Arc<ThreadLocalMetrics>` used on-demand to flush data
to the shared histogram.
- Extend `Collector::collect` to report
`pageserver_tokio_epoll_uring_slots_submission_queue_depth`.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-10-25 21:30:57 +01:00
Arpad Müller
76328ada05 Fix unoffload_timeline races with creation (#9525)
This PR does two things:

1. Obtain a `TimelineCreateGuard` object in `unoffload_timeline`. This
prevents two unoffload tasks from racing with each other. While they
already obtain locks for `timelines` and `offloaded_timelines`, they
aren't sufficient, as we have already constructed an entire timeline at
that point. We shouldn't ever have two `Timeline` objects in the same
process at the same time.
2. don't allow timeline creations for timelines that have been
offloaded. Obviously they already exist, so we should not allow
creation. the previous logic only looked at the timelines list.

Part of #8088
2024-10-25 20:06:27 +00:00
Erik Grinaker
b54b632c6a safekeeper: don't pass conf into storage constructors (#9523)
## Problem

The storage components take an entire `SafekeeperConf` during
construction, but only actually use the `no_sync` field. This makes it
hard to understand the storage inputs (which fields do they actually
care about?), and is also inconvenient for tests and benchmarks that
need to set up a lot of unnecessary boilerplate.

## Summary of changes

* Don't take the entire config, but pass in the `no_sync` field
explicitly.
* Take the timeline dir instead of `ttid` as an input, since it's the
only thing it cares about.
* Fix a couple of tests to not leak tempdirs.
* Various minor tweaks.
2024-10-25 18:19:52 +01:00
Erik Grinaker
9909551f47 safekeeper: fix version in TimelinePersistentState::empty() (#9521)
## Problem

The Postgres version in `TimelinePersistentState::empty()` is incorrect:
the major version should be multiplied by 10000.

## Summary of changes

Multiply the version by 10000.
2024-10-25 16:22:35 +01:00
Arseny Sher
700b102b0f safekeeper: retry eviction. (#9485)
Without this manager may sleep forever after eviction failure without
retries.
2024-10-25 17:48:29 +03:00
Conrad Ludgate
dbadb0f9bb proxy: propagate session IDs (#9509)
fixes #9367 by sending session IDs to local_proxy, and also returns
session IDs to the client for easier debugging.
2024-10-25 14:34:19 +00:00
John Spray
8297f7a181 pageserver: fix N^2 I/O when processing relation drops in transaction abort (#9507)
## Problem

We have some known N^2 behaviors when it comes to large relation counts,
due to the monolithic encoding and full rewrites of of RelDirectory each
time a relation is added. Ordinarily our backpressure mechanisms give
"slow but steady" performance when creating/dropping/truncating
relations. However, in the case of a transaction abort, it is possible
for a single WAL record to drop an unbounded number of relations. The
results in an unavailable compute, as when it sends one of these
records, it can stall the pageserver's ingest for many minutes, even
though the compute only sent a small amount of WAL.

Closes https://github.com/neondatabase/neon/issues/9505

## Summary of changes

- Rewrite relation-dropping code to do one read/modify/write cycle of
RelDirectory, instead of doing it separately for each relation in a
loop.
- Add a test for the bug scenario encountered:
`test_tx_abort_with_many_relations`

The test has ~40s runtime on my workstation. About 1 second of that is
the part where we wait for ingest to catch up after a rollback, the rest
is the slowness of creating and truncating a large number of relations.


---------

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2024-10-25 15:09:02 +01:00
Christian Schwarz
2090e928d1 refactor(timeline creation): idempotency checking (#9501)
# Context

In the PGDATA import code
(https://github.com/neondatabase/neon/pull/9218) I add a third way to
create timelines, namely, by importing from a copy of a vanilla PGDATA
directory in object storage.

For idempotency, I'm using the PGDATA object storage location
specification, which is stored in the IndexPart for the entire lifespan
of the timeline. When loading the timeline from remote storage, that
value gets stored inside `struct Timeline` and timeline creation
compares the creation argument with that value to determine idempotency
of the request.

# Changes

This PR refactors the existing idempotency handling of Timeline
bootstrap and branching such that we simply compare the
`CreateTimelineIdempotency` struct, using the derive-generated
`PartialEq` implementation.

Also, by spelling idempotency out in the type names, I find it adds a
lot of clarity.

The pathway to idempotency via requester-provided idempotency key also
becomes very straight-forward, if we ever want to do this in the future.

# Refs
* platform context: https://github.com/neondatabase/neon/pull/9218
* product context: https://github.com/neondatabase/cloud/issues/17507
* stacks on top of https://github.com/neondatabase/neon/pull/9366
2024-10-25 14:44:20 +01:00
Tristan Partin
05eff3a67e Move logical replication slot monitor
neon.c is getting crowded and the logical replication slot monitor is
a good candidate for reorganization. It is very self-contained, and
being in a separate file will make it that much easier to find.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-25 08:41:44 -05:00
Arseny Sher
c6cf5e7c0f Make test_pageserver_lsn_wait_error_safekeeper_stop less aggressive. (#9517)
Previously it inserted ~150MiB of WAL while expecting page fetching to
work in 1s (wait_lsn_timeout=1s). It failed in CI in debug builds.
Instead, just directly wait for the wanted condition, i.e. needed
safekeepers are reported in pageserver timed out waiting for WAL error
message. Also set NEON_COMPUTE_TESTING_BASEBACKUP_RETRIES to 1 in this
test and neighbour one, it reduces execution time from 2.5m to ~10s.
2024-10-25 14:13:46 +01:00
Christian Schwarz
e0c7f1ce15 remote_storage(local_fs): return correct file sizes (#9511)
## Problem

`local_fs` doesn't return file sizes, which I need in PGDATA import
(#9218)

## Solution

Include file sizes in the result.

I would have liked to add a unit test, and started doing that in 

* https://github.com/neondatabase/neon/pull/9510

by extending the common object storage tests
(`libs/remote_storage/tests/common/tests.rs`) to check for sizes as
well.

But it turns out that localfs is not even covered by the common object
storage tests and upon closer inspection, it seems that this area needs
more attention.
=> punt the effort into https://github.com/neondatabase/neon/pull/9510
2024-10-25 12:20:53 +00:00
Christian Schwarz
6f5c262684 pageserver: add testing API to scan layers for disposable keys (#9393)
This PR adds a pageserver mgmt API to scan a layer file for disposable
keys.

It hooks it up to the sharding compaction test, demonstrating that we're
not filtering out all disposable keys.

This is extracted from PGDATA import
(https://github.com/neondatabase/neon/pull/9218)
where I do the filtering of layer files based on `is_key_disposable`.
2024-10-25 14:16:45 +02:00
Jakub Kołodziejczak
9768f09f6b proxy: don't follow redirects for user provided JWKS urls + set custom user agent (#9514)
partially fixes https://github.com/neondatabase/cloud/issues/19249

ref https://docs.rs/reqwest/latest/reqwest/redirect/index.html
> By default, a Client will automatically handle HTTP redirects, having
a maximum redirect chain of 10 hops. To customize this behavior, a
redirect::Policy can be used with a ClientBuilder.
2024-10-25 14:04:41 +02:00
Yuchen Liang
db900ae9d0 fix(test): remove too strict layers_removed==0 check in test_readonly_node_gc (#9506)
Fixes #9098 

## Problem

`test_readonly_node_gc` is flaky. As shown in [Allure
Report](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9469/11444519440/index.html#suites/3ccffb1d100105b98aed3dc19b717917/2c02073738fa2b39),
we would get a `AssertionError: No layers should be removed, old layers
are guarded by leases.` after the test restarts pageservers or after
reconfigure pageservers.

During the investigation, we found that the layers has LSN (`0/1563088`)
greater than the LSN (`0x1562000`) protected by the lease. For instance,


**Layers removed**
<pre>

000000067F00000005000034540100000000-000000067F00000005000040050100000000__000000000<b><i>1563088</i></b>-00000001
(shard 0002)

000000068000000000000017E20000000001-010000000100000001000000000000000001__000000000<b><i>1563088</i></b>-00000001
(shard 0002)
</pre>

**Lsn Lease Granted**
<pre>
handle_make_lsn_lease{lsn=<b><i>0/1562000</i></b> shard_id=0002
shard_id=0002}: lease created, valid until 2024-10-21
</pre>

This means that these layers are not guarded by the leases: they are in
"future", not visible to the static endpoint.

## Summary of changes

- Remove the assertion layers_removed == 0 after trigger timeline GC
while holding the lease. Instead rely on the successful execution of
the`SELECT` query to test lease validity.
- Improve test logging


Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-10-25 12:50:47 +01:00
Arpad Müller
4d9036bf1f Support offloaded timelines during shard split (#9489)
Before, we didn't copy over the `index-part.json` of offloaded timelines
to the new shard's location, resulting in the new shard not knowing the
timeline even exists.

In #9444, we copy over the manifest, but we also need to do this for
`index-part.json`.

As the operations to do are mostly the same between offloaded and
non-offloaded timelines, we can iterate over all of them in the same
loop, after the introduction of a `TimelineOrOffloadedArcRef` type to
generalize over the two cases. This is analogous to the deletion code
added in #8907.

The added test also ensures that the sharded archival config endpoint
works, something that has not yet been ensured by tests.

Part of #8088
2024-10-25 12:32:46 +02:00
Vlad Lazar
b3bedda6fd pageserver/walingest: log on gappy rel extend (#9502)
## Problem

https://github.com/neondatabase/neon/pull/9492 added a metric to track
the total count of block gaps filled on rel extend. More context is
needed to understand when this happens. The current theory is that it
may only happen on pg 14 and pg 15 since they do not WAL log relation extends.

## Summary of Changes

A rate limited log is added.
2024-10-25 11:15:53 +01:00
Christian Schwarz
b782b11b33 refactor(timeline creation): represent bootstrap vs branch using enum (#9366)
# Problem

Timeline creation can either be bootstrap or branch.
The distinction is made based on whether the `ancestor_*` fields are
present or not.

In the PGDATA import code
(https://github.com/neondatabase/neon/pull/9218), I add a third variant
to timeline creation.

# Solution

The above pushed me to refactor the code in Pageserver to distinguish
the different creation requests through enum variants.

There is no externally observable effect from this change.

On the implementation level, a notable change is that the acquisition of
the `TimelineCreationGuard` happens later than before. This is necessary
so that we have everything in place to construct the
`CreateTimelineIdempotency`. Notably, this moves the acquisition of the
creation guard _after_ the acquisition of the `gc_cs` lock in the case
of branching. This might appear as if we're at risk of holding `gc_cs`
longer than before this PR, but, even before this PR, we were holding
`gc_cs` until after the `wait_completion()` that makes the timeline
creation durable in S3 returns. I don't see any deadlock risk with
reversing the lock acquisition order.

As a drive-by change, I found that the `create_timeline()` function in
`neon_local` is unused, so I removed it.

# Refs

* platform context: https://github.com/neondatabase/neon/pull/9218
* product context: https://github.com/neondatabase/cloud/issues/17507
* next PR stacked atop this one:
https://github.com/neondatabase/neon/pull/9501
2024-10-25 10:04:27 +00:00
Vlad Lazar
5069123b6d pageserver: refactor ingest inplace to decouple decoding and handling (#9472)
## Problem

WAL ingest couples decoding of special records with their handling
(updates to the storage engine mostly).
This is a roadblock for our plan to move WAL filtering (and implicitly
decoding) to safekeepers since they cannot
do writes to the storage engine. 

## Summary of changes

This PR decouples the decoding of the special WAL records from their
application. The changes are done in place
and I've done my best to refrain from refactorings and attempted to
preserve the original code as much as possible.

Related: https://github.com/neondatabase/neon/issues/9335
Epic: https://github.com/neondatabase/neon/issues/9329
2024-10-24 17:12:47 +01:00
Alex Chi Z.
fb0406e9d2 refactor(pageserver): refactor split writers using batch layer writer (#9493)
part of https://github.com/neondatabase/neon/issues/9114,
https://github.com/neondatabase/neon/issues/8836,
https://github.com/neondatabase/neon/issues/8362

The split layer writer code can be used in a more general way: the
caller puts unfinished writers into the batch layer writer and let batch
layer writer to ensure the atomicity of the layer produces.

## Summary of changes

* Add batch layer writer, which atomically finishes the layers.
`BatchLayerWriter::finish` is simply a copy-paste from previous split
layer writers.
* Refactor split writers to use the batch layer writer.
* The current split writer tests cover all code path of batch layer
writer.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-24 10:49:54 -04:00
Alexander Bayandin
b8a311131e CI: remove git config --add safe.directory hack (#9391)
## Problem

We have `git config --global --add safe.directory ...` leftovers from the
past, but `actions/checkout` does it by default (since v3.0.2, we use v4)

## Summary of changes
- Remove `git config --global --add safe.directory ...` hack
2024-10-24 15:49:26 +01:00
John Spray
d589498c6f storcon: respect Reconciler::cancel during await_lsn (#9486)
## Problem

When a pageserver is misbehaving (e.g. we hit an ingest bug or something
is pathologically slow), the storage controller could get stuck in the
part of live migration that waits for LSNs to catch up. This is a
problem, because it can prevent us migrating the troublesome tenant to
another pageserver.

Closes: https://github.com/neondatabase/cloud/issues/19169

## Summary of changes

- Respect Reconciler::cancel during await_lsn.
2024-10-24 15:23:09 +01:00
Folke Behrens
d56599df2a Merge pull request #9499 from neondatabase/rc/proxy/2024-10-24
Proxy release 2024-10-24
2024-10-24 10:34:56 +02:00
Christian Schwarz
6f34f97573 refactor(pageserver(load_remote_timeline)) remove dead code handling absence of IndexPart (#9408)
The code is dead at runtime since we're nowadays always running with
remote storage and treat it as the source of truth during attach.

Clean it up as a preliminary to
https://github.com/neondatabase/neon/pull/9218.

Related: https://github.com/neondatabase/neon/pull/9366
2024-10-24 09:00:22 +01:00
Tristan Partin
b86432c29e Fix buggy sizeof
A sizeof on a pointer on a 64 bit machine is 8 bytes whereas
Entry::old_name is a 64 byte array of characters. There was most likely
no fallout since the string would start with NUL bytes, but best to fix
nonetheless.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-23 21:52:22 -06:00
Vlad Lazar
ac1205c14c pageserver: add metric for number of zeroed pages on rel extend (#9492)
## Problem

Filling the gap in with zeroes is annoying for sharded ingest. We are
not sure it even happens in reality.

## Summary of Changes

Add one global counter which tracks how many such gap blocks we filled
on relation extends. We can add more metrics once we understand the
scope.
2024-10-23 19:58:28 +01:00
John Spray
e3ff87ce3b tests: avoid using background_process when invoking pg_ctl (#9469)
## Problem

Occasionally, we get failures to start the storage controller's db with
errors like:
```
aborting due to panic at /__w/neon/neon/control_plane/src/background_process.rs:349:67:
claim pid file: lock file

Caused by:
    file is already locked
```
e.g.
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9428/11380574562/index.html#/testresult/1c68d413ea9ecd4a

This is happening in a stop,start cycle during a test. Presumably the
pidfile from the startup background process is still held at the point
we stop, because we let pg_ctl keep running in the background.

## Summary of changes

- Refactor pg_ctl invocations into a helper
- In the controller's `start` function, use pg_ctl & a wait loop for
pg_isready, instead of using background_process

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-10-23 16:29:55 +00:00
Tristan Partin
0595320c87 Protect call to pg_current_wal_lsn() in retained_wal query
We can't call pg_current_wal_lsn() if we are a standby instance (read
replica). Any attempt to call this function while in recovery results
in:

ERROR:  recovery is in progress

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-23 09:55:00 -06:00
Folke Behrens
92d5e0e87a proxy: clear lib.rs of code items (#9479)
We keep lib.rs for crate configs, lint configs and re-exports for the binaries.
2024-10-23 08:21:28 +02:00
Arpad Müller
3a3bd34a28 Rename IndexPart::{from_s3_bytes,to_s3_bytes} (#9481)
We support multiple storage backends now, so remove the `_s3_` from the
name.

Analogous to the names adopted for tenant manifests added in #9444.
2024-10-23 00:34:24 +02:00
Alex Chi Z.
64949a37a9 fix(pageserver): make delta split layer writer finish atomic (#9048)
similar to https://github.com/neondatabase/neon/pull/8841, we make the
delta layer writer atomic when finishing the layers.

## Summary of changes

* `put_value` not taking discard fn anymore
* `finish` decides what layers to keep

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-22 22:06:21 +00:00
Arpad Müller
6f8fcdf9ea Timeline offloading persistence (#9444)
Persist timeline offloaded state to S3.

Right now, as of #8907, at each restart of the pageserver, all offloaded
state is lost, so we load the full timeline again. As it starts with an
empty local directory, we might potentially download some files again,
leading to downloads that are ultimately wasteful.

This patch adds support for persisting the offloaded state, allowing us
to never load offloaded timelines in the first place. The persistence
feature is facilitated via a new file in S3 that is tenant-global, which
contains a list of all offloaded timelines. It is updated each time we
offload or unoffload a timeline, and otherwise never touched.

This choice means that tenants where no offloading is happening will not
immediately get a manifest, keeping the change very minimal at the
start.

We leave generation support for future work. It is important to support
generations, as in the worst case, the manifest might be overwritten by
an older generation after a timeline has been unoffloaded (and
unarchived), so the next pageserver process instantiation might wrongly
believe that some timeline is still offloaded even though it should be
active.

Part of #9386, #8088
2024-10-22 20:52:30 +00:00
Tristan Partin
fcb55a2aa2 Fix copy-paste error in checkpoints_timed metric
Importing the wrong metric. Sigh...

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-22 14:34:26 -06:00
a-masterov
f36cf3f885 Fix local errors for the tests with the versions mix (#9477)
## Problem
If the environment variables `COMPATIBILITY_NEON_BIN` or
`COMPATIBILITY_POSTGRES_DISTRIB_DIR` are not set (this is usual during a
local run), the tests with the versions mix cannot run.
## Summary of changes
If these variables are not set turn off the version mix.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2024-10-22 21:58:55 +02:00
John Spray
8dca188974 storage controller: add metrics for tenant shard, node count (#9475)
## Problem

Previously, figuring out how many tenant shards were managed by a
storage controller was typically done by peeking at the database or
calling into the API. A metric makes it easier to monitor, as
unexpectedly increasing shard counts can be indicative of problems
elsewhere in the system.

## Summary of changes

- Add metrics `storage_controller_pageserver_nodes` (updated on node
CRUD operations from Service) and `storage_controller_tenant_shards`
(updated RAII-style from TenantShard)
2024-10-22 19:43:02 +01:00
Tristan Partin
b7fa93f6b7 Use make's builtin RM variable
At least as far as removing individual files goes, this is the best
pattern for removing. I can't say the same for removing directories, but
I went ahead and changed those to `$(RM) -r` anyway.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-22 09:14:29 -06:00
Arseny Sher
1e8e04bb2c safekeeper: refactor timeline initialization (#9362)
Always do timeline init through atomic rename of temp directory. Add
GlobalTimelines::load_temp_timeline which does this, and use it from
both pull_timeline and basic timeline creation. Fixes a collection
of issues:
- previously timeline creation didn't really flushed cfile to disk
  due to 'nothing to do if state didn't change' check;
- even if it did, without tmp dir it is possible to lose the cfile
  but leave timeline dir in place, making it look corrupted;
- tenant directory creation fsync was missing in timeline creation;
- pull_timeline is now protected from concurrent both itself and
  timeline creation;
- now global timelines map entry got special CreationInProgress
  entry type which prevents from anyone getting access to timeline
  while it is being created (previously one could get access to it,
  but it was locked during creation, which is valid but confusing if
  creation failed).

fixes #8927
2024-10-22 07:11:36 +01:00
David Gomes
94369af782 chore(compute): bumps pg_session_jwt to latest version (#9474) 2024-10-21 23:39:30 +00:00
Arpad Müller
34b6bd416a offloaded timeline list API (#9461)
Add a way to list the offloaded timelines.

Before, one had to look at logs to figure out if a timeline has been
offloaded or not, or use the non-presence of a certain timeline in the
list of normal timelines. Now, one can list them directly.
 
Part of #8088
2024-10-21 16:33:05 +01:00
Yuchen Liang
49d5e56c08 pageserver: use direct IO for delta and image layer reads (#9326)
Part of #8130 

## Problem

Pageserver previously goes through the kernel page cache for all the
IOs. The kernel page cache makes light-loaded pageserver have deceptive
fast performance. Using direct IO would offer predictable latencies of
our virtual file IO operations.

In particular for reads, the data pages also have an extremely low
temporal locality because the most frequently accessed pages are cached
on the compute side.

## Summary of changes

This PR enables pageserver to use direct IO for delta layer and image
layer reads. We can ship them separately because these layers are
write-once, read-many, so we will not be mixing buffered IO with direct
IO.

- implement `IoBufferMut`, an buffer type with aligned allocation
(currently set to 512).
- use `IoBufferMut` at all places we are doing reads on image + delta
layers.
- leverage Rust type system and use `IoBufAlignedMut` marker trait to
guarantee that the input buffers for the IO operations are aligned.
- page cache allocation is also made aligned.

_* in-memory layer reads and the write path will be shipped separately._

## Testing

Integration test suite run with O_DIRECT enabled:
https://github.com/neondatabase/neon/pull/9350

## Performance

We evaluated performance based on the `get-page-at-latest-lsn`
benchmark. The results demonstrate a decrease in the number of IOps, no
sigificant change in the latency mean, and an slight improvement on the
p99.9 and p99.99 latencies.


[Benchmark](https://www.notion.so/neondatabase/Benchmark-O_DIRECT-for-image-and-delta-layers-2024-10-01-112f189e00478092a195ea5a0137e706?pvs=4)

## Rollout

We will add `virtual_file_io_mode=direct` region by region to enable
direct IO on image + delta layers.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-10-21 11:01:25 -04:00
Alex Chi Z.
aca81f5fa4 fix(pageserver): make image split layer writer finish atomic (#8841)
Part of https://github.com/neondatabase/neon/issues/8836

## Summary of changes

This pull request makes the image layer split writer atomic when
finishing the layers. All the produced layers either finish at the same
time, or discard at the same time. Note that this does not prevent
atomicity when crash, but anyways, it will be cleaned up on pageserver
restart.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-10-21 15:59:48 +01:00
Ivan Efremov
2dcac94194 proxy: Use common error interface for error handling with cplane (#9454)
- Remove obsolete error handles.
- Use one source of truth for cplane errors.
#18468
2024-10-21 17:20:09 +03:00
Ivan Efremov
ababa50cce Use '-f' for make clean in Makefile compute (#9464)
Use '-f' instead of '--force' because it is impossible to clean the
targets on MacOS
2024-10-21 16:20:39 +03:00
Alexander Bayandin
163beaf9ad CI: use build-tools on Debian 12 whenever we use Neon artifact (#9463)
## Problem

```
+ /tmp/neon/pg_install/v16/bin/psql '***' -c 'SELECT version()'
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /tmp/neon/pg_install/v16/bin/psql)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/neon/pg_install/v16/bin/psql)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5)
```

## Summary of changes
- Use `build-tools:pinned-bookworm` whenever we download Neon artefact
2024-10-21 12:14:19 +01:00
Alexander Bayandin
5b37485c99 Rename dockerfiles from Dockerfile.<something> to <something>.Dockerfile (#9446)
## Problem

Our dockerfiles, for some historical reason, have unconventional names
`Dockerfile.<something>`, and some tools (like GitHub UI) fail to highlight
the syntax in them.

> Some projects may need distinct Dockerfiles for specific purposes. A
common convention is to name these `<something>.Dockerfile`

From: https://docs.docker.com/build/concepts/dockerfile/#filename

## Summary of changes
- Rename `Dockerfile.build-tools` -> `build-tools.Dockerfile`
- Rename `compute/Dockerfile.compute-node` ->
`compute/compute-node.Dockerfile`
2024-10-21 09:51:12 +01:00
Folke Behrens
ed958da38a proxy: Make tests fail fast when test proxy exited early (#9432)
This currently happens when proxy is not compiled with feature
`testing`.
Also fix an adjacent function.
2024-10-21 08:29:23 +00:00
Conrad Ludgate
cc25ef7342 bump pg-session-jwt version (#9455)
forgot to bump this before
2024-10-20 14:42:50 +02:00
Arpad Müller
71d09c78d4 Accept basebackup <tenant> <timeline> --gzip requests (#9456)
In #9453, we want to remove the non-gzipped basebackup code in the
computes, and always request gzipped basebackups.

However, right now the pageserver's page service only accepts basebackup
requests in the following formats:

* `basebackup <tenant_id> <timeline_id>`, lsn is determined by the
pageserver as the most recent one (`timeline.get_last_record_rlsn()`)
* `basebackup <tenant_id> <timeline_id> <lsn>`
* `basebackup <tenant_id> <timeline_id> <lsn> --gzip`

We add a fourth case, `basebackup <tenant_id> <timeline_id> --gzip` to
allow gzipping the request for the latest lsn as well.
2024-10-19 00:23:49 +02:00
Tristan Partin
62a334871f Take the collector name as argument when generating sql_exporter configs
In neon_collector_autoscaling.jsonnet, the collector name is hardcoded
to neon_collector_autoscaling. This issue manifests itself such that
sql_exporter would not find the collector configuration.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-18 09:36:29 -05:00
Vlad Lazar
e162ab8b53 storcon: handle ongoing deletions gracefully (#9449)
## Problem

Pageserver returns 409 (Conflict) if any of the shards are already
deleting the timeline. This resulted in an error being propagated out of
the HTTP handler and to the client. It's an expected scenario so we
should handle it nicely.

This caused failures in `test_storage_controller_smoke`
[here](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9435/11390431900/index.html#suites/8fc5d1648d2225380766afde7c428d81/86eee4b002d6572d).

## Summary of Changes

Instead of returning an error on 409s, we now bubble the status code up
and let the HTTP handler code retry until it gets a 404 or times out.
2024-10-18 15:33:04 +01:00
Conrad Ludgate
5cbdec9c79 [local_proxy]: install pg_session_jwt extension on demand (#9370)
Follow up on #9344. We want to install the extension automatically. We
didn't want to couple the extension into compute_ctl so instead
local_proxy is the one to issue requests specific to the extension.

depends on #9344 and #9395
2024-10-18 14:41:21 +01:00
Vlad Lazar
ec6d3422a5 pageserver: disconnect when asking client to reconnect (#9390)
## Problem

Consider the following sequence of events:
1. Shard location gets downgraded to secondary while there's a libpq
connection in pagestream mode from the compute
2. There's no active tenant, so we return `QueryError::Reconnect` from
`PageServerHandler::handle_get_page_at_lsn_request`.
3. Error bubbles up to `PostgresBackendIO::process_message`, bailing us
out of pagestream mode.
4. We instruct the client to reconnnect, but continue serving the libpq
connection. The client isn't yet aware of the request to reconnect and
believes it is still in pagestream mode. Pageserver fails to deserialize
get page requests wrapped in `CopyData` since it's not in pagestream
mode.

## Summary of Changes

When we wish to instruct the client to reconnect, also disconnect from
the server side after flushing the error.

Closes https://github.com/neondatabase/cloud/issues/17336
2024-10-18 13:38:59 +01:00
Arseny Sher
fecff15f18 walproposer: immediately exit if sync-safekeepers collected 0/0. (#9442)
Otherwise term history starting with 0/0 is streamed to safekeepers.

ref https://github.com/neondatabase/neon/issues/9434
2024-10-18 15:31:50 +03:00
Jere Vaara
3532ae76ef compute_ctl: Add endpoint that allows extensions to be installed (#9344)
Adds endpoint to install extensions:

**POST** `/extensions`
```
{"extension":"pg_sessions_jwt","database":"neondb","version":"1.0.0"}
```

Will be used by `local-proxy`.
Example, for the JWT authentication to work the database needs to have
the pg_session_jwt extension and also to enable JWT to work in RLS
policies.

---------

Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
2024-10-18 15:07:36 +03:00
Folke Behrens
15fecffe6b Update ruff to much newer version (#9433)
Includes a multidict patch release to fix build with newer cpython.
2024-10-18 12:42:41 +02:00
Arseny Sher
98fee7a97d Increase shared_buffers in test_subscriber_synchronous_commit. (#9427)
Might make the test less flaky.
2024-10-18 13:31:14 +03:00
John Spray
b7173b1ef0 storcon: fix case where we might fail to send compute notifications after two opposite migrations (#9435)
## Problem

If we migrate A->B, then B->A, and the notification of A->B fails, then
we might have retained state that makes us think "A" is the last state
we sent to the compute hook, whereas when we migrate B->A we should
really be sending a fresh notification in case our earlier failed
notification has actually mutated the remote compute config.

Closes: #9417 

## Summary of changes

- Add a reproducer for the bug
(`test_storage_controller_compute_hook_revert`)
- Refactor compute hook code to represent remote state with
`ComputeRemoteState` which stores a boolean for whether the compute has
fully applied the change as well as the request that the compute
accepted.
- The actual bug fix: after sending a compute notification, if we got a
423 response then update our ComputeRemoteState to reflect that we have
mutated the remote state. This way, when we later try and notify for our
historic location, we will properly see that as a change and send the
notification.

Co-authored-by: Vlad Lazar <vlad@neon.tech>
2024-10-18 11:29:23 +01:00
Jere Vaara
24654b8eee compute_ctl: Add endpoint that allows setting role grants (#9395)
This PR introduces a `/grants` endpoint which allows setting specific
`privileges` to certain `role` for a certain `schema`.

Related to #9344 

Together these endpoints will be used to configure JWT extension and set
correct usage to its schema to specific roles that will need them.

---------

Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
2024-10-18 11:25:45 +01:00
Conrad Ludgate
b8304f90d6 2024 oct new clippy lints (#9448)
Fixes new lints from `cargo +nightly clippy` (`clippy 0.1.83 (798fb83f
2024-10-16)`)
2024-10-18 10:27:50 +01:00
Conrad Ludgate
d762ad0883 update rustls (#9396)
The forever ongoing effort of juggling multiple versions of rustls :3

now with new crypto library aws-lc.

Because of dependencies, it is currently impossible to not have both
ring and aws-lc in the dep tree, therefore our only options are not
updating rustls or having both crypto backends enabled...

According to benchmarks run by the rustls maintainer, aws-lc is faster
than ring in some cases too <https://jbp.io/graviola/>, so it's not
without its upsides,
2024-10-17 20:45:37 +01:00
Arpad Müller
928d98b6dc Update Rust to 1.82.0 and mold to 2.34.0 (#9445)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

[Release notes](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1820-2024-10-17).

Also update mold. [release notes for
2.34.0](https://github.com/rui314/mold/releases/tag/v2.34.0), [release
notes for 2.34.1](https://github.com/rui314/mold/releases/tag/v2.34.1).

Prior update was in #8939.
2024-10-17 21:25:51 +02:00
John Spray
24398bf060 pageserver: detect & warn on loading an old index which is probably the result of a bad generation (#9383)
## Problem

The pageserver generally trusts the storage controller/control plane to
give it valid generations. However, sometimes it should be obvious that
a generation is bad, and for defense in depth we should detect that on
the pageserver.

This PR is part 1 of 2:
1. in this PR we detect and warn on such situations, but do not block
starting up the tenant. Once we have confidence that the check is not
firing unexpectedly in the field
2. part 2 of 2 will introduce a condition that refuses to start a tenant
in this situtation, and a test for that (maybe, if we can figure out how
to spoof an ancient mtime)

Related: #6951

## Summary of changes

- When loading an index older than 2 weeks, log an INFO message noting
that we will check for other indices
- When loading an index older than 2 weeks _and_ a newer-generation
index exists, log a warning.
2024-10-17 19:02:24 +01:00
Alex Chi Z.
63b3491c1b refactor(pageserver): remove aux v1 code path (#9424)
Part of the aux v1 retirement
https://github.com/neondatabase/neon/issues/8623

## Summary of changes

Remove write/read path for aux v1, but keeping the config item and the
index part field for now.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-17 17:22:44 +01:00
Anastasia Lubennikova
858867c627 Add logging of installed_extensions (#9438)
Simple PR to log installed_extensions statistics.

in the following format:
```
2024-10-17T13:53:02.860595Z  INFO [NEON_EXT_STAT] {"extensions":[{"extname":"plpgsql","versions":["1.0"],"n_databases":2},{"extname":"neon","versions":["1.5"],"n_databases":1}]}
```
2024-10-17 16:35:19 +01:00
Erik Grinaker
299cde899b safekeeper: flush WAL on compute disconnect (#9436)
## Problem

In #9259, we found that the `check_safekeepers_synced` fast path could
result in a lower basebackup LSN than the `flush_lsn` reported by
Safekeepers in `VoteResponse`, causing the compute to panic once on
startup.

This would happen if the Safekeeper had unflushed WAL records due to a
compute disconnect. The `TIMELINE_STATUS` query would report a
`flush_lsn` below these unflushed records, while `VoteResponse` would
flush the WAL and report the advanced `flush_lsn`. See
https://github.com/neondatabase/neon/issues/9259#issuecomment-2410849032.

## Summary of changes

Flush the WAL if the compute disconnects during WAL processing.
2024-10-17 17:19:18 +02:00
Erik Grinaker
4c9835f4a3 storage_controller: delete stale shards when deleting tenant (#9333)
## Problem

Tenant deletion only removes the current shards from remote storage. Any
stale parent shards (before splits) will be left behind. These shards
are kept since child shards may reference data from the parent until new
image layers are generated.

## Summary of changes

* Document a special case for pageserver tenant deletion that deletes
all shards in remote storage when given an unsharded tenant ID, as well
as any unsharded tenant data.
* Pass an unsharded tenant ID to delete all remote storage under the
tenant ID prefix.
* Split out `RemoteStorage::delete_prefix()` to delete a bucket prefix,
with additional test coverage.
* Add a `delimiter` argument to `asset_prefix_empty()` to support
partial prefix matches (i.e. all shards starting with a given tenant
ID).
2024-10-17 14:34:51 +00:00
Alex Chi Z.
f3a3eefd26 feat(pageserver): do space check before gc-compaction (#9250)
part of https://github.com/neondatabase/neon/issues/9114

## Summary of changes

gc-compaction may take a lot of disk space, and if it does, the caller
should do a partial gc-compaction. This patch adds space check for the
compaction job.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-17 10:29:53 -04:00
Ivan Efremov
a7c05686cc test_runner: Update the README.md to build neon with 'testing' (#9437)
Without having the '--features testing' in the cargo build the proxy
won't start causing tests to fail.
2024-10-17 17:20:42 +03:00
Anastasia Lubennikova
8b47938140 Add support of extensions for v17 (part 3) (#9430)
- pgvector 7.4

update support of extensions for v14-v16:
- pgvector 7.2 -> 7.4
2024-10-17 13:37:21 +01:00
Arpad Müller
35e7d91bc9 Add config variable for timeline offloading (#9421)
Adds a configuration variable for timeline offloading support. The added
pageserver-global config option controls whether the pageserver
automatically offloads timelines during compaction.

Therefore, already offloaded timelines are not affected by this, nor is
the manual testing endpoint.

This allows the rollout of timeline offloading to be driven by the
storage team.

Part of #8088
2024-10-17 12:07:58 +00:00
Ivan Efremov
22d8834474 proxy: move the connection pools to separate file (#9398)
First PR for #9284
Start unification of the client and connection pool interfaces:
- Exclude the 'global_connections_count' out from the get_conn_entry()
- Move remote connection pools to the conn_pool_lib as a reference
- Unify clients among all the conn pools
2024-10-17 13:38:24 +03:00
Folke Behrens
9d9aab3680 Merge pull request #9426 from neondatabase/rc/proxy/2024-10-17
Proxy release 2024-10-17
2024-10-17 12:18:51 +02:00
John Spray
db68e82235 storage_scrubber: fixes to garbage commands (#9409)
## Problem

While running `find-garbage` and `purge-garbage`, I encountered two
things that needed updating:
- Console API may omit `user_id` since org accounts were added
- When we cut over to using GenericRemoteStorage, the object listings we
do during purge did not get proper retry handling, so could easily fail
on usual S3 errors, and make the whole process drop out.

...and one bug:
- We had a `.unwrap` which expects that after finding an object in a
tenant path, a listing in that path will always return objects. This is
not true, because a pageserver might be deleting the path at the same
time as we scan it.

## Summary of changes

- When listing objects during purge, use backoff::retry
- Make `user_id` an `Option`
- Handle the case where a tenant's objects go away during find-garbage.
2024-10-17 10:06:02 +01:00
Konstantin Knizhnik
934dbb61f5 Check access_count in lfc_evict (#9407)
## Problem

See
https://neondb.slack.com/archives/C033A2WE6BZ/p1729007738526309?thread_ts=1722942856.987979&cid=C033A2WE6BZ

When replica receives WAL record which target page is not present in
shared buffer, we evict this page from LFC.
If all pages from the LFC chunk are evicted, then chunk is moved to the
beginning of LRU least to force it reuse.
Unfortunately access_count is not checked and if the entry is access at
this moment then this operation can cause LRU list corruption.

## Summary of changes

Check `access_count` in `lfc_evict`

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-10-17 08:04:57 +03:00
Christian Schwarz
67d5d98b19 readme: fix build instructions for debian 12 (#9371)
We need libprotobuf-dev for some of the
`/usr/include/google/protobuf/...*.proto`
referenced by our protobuf decls.
2024-10-16 21:47:53 +02:00
Tristan Partin
e0fa6bcf1a Fix some sql_exporter metrics for PG 17
Checkpointer related statistics moved from pg_stat_bgwriter to
pg_stat_checkpointer, so we need to adjust our queries accordingly.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-16 14:46:33 -05:00
Tristan Partin
409a286eaa Fix typo in sql_exporter generator
Bad copy-paste seemingly. This manifested itself as a failure to start
for the sql_exporter, and was just dying on loop in staging. A future PR
will have E2E testing of sql_exporter.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-16 13:08:40 -05:00
Arpad Müller
0551cfb6a7 Fix beta clippy warnings (#9419)
```
warning: first doc comment paragraph is too long
  --> compute_tools/src/installed_extensions.rs:35:1
   |
35 | / /// Connect to every database (see list_dbs above) and get the list of installed extensions.
36 | | /// Same extension can be installed in multiple databases with different versions,
37 | | /// we only keep the highest and lowest version across all databases.
   | |_
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#too_long_first_doc_paragraph
   = note: `#[warn(clippy::too_long_first_doc_paragraph)]` on by default
help: add an empty line
   |
35 ~ /// Connect to every database (see list_dbs above) and get the list of installed extensions.
36 + ///
   |
```
2024-10-16 19:04:56 +01:00
Folke Behrens
ed694732e7 proxy: merge AuthError and AuthErrorImpl (#9418)
Since GetAuthInfoError now boxes the ControlPlaneError message the
variant is not big anymore and AuthError is 32 bytes.
2024-10-16 19:10:49 +02:00
Alex Chi Z.
8a114e3aed refactor(pageserver): upgrade remote_storage to use hyper1 (#9405)
part of https://github.com/neondatabase/neon/issues/9255

## Summary of changes

Upgrade remote_storage crate to use hyper1. Hyper0 is used when
providing the streaming HTTP body to the s3 SDK, and it is refactored to
use hyper1.


Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-16 16:19:45 +01:00
Arpad Müller
55b246085e Activate timelines during unoffload (#9399)
The current code has forgotten to activate timelines during unoffload,
leading to inability to receive the basebackup, due to the timeline
still being in loading state.

```
  stderr:
    command failed: compute startup failed: failed to get basebackup@0/0 from pageserver postgresql://no_user@localhost:15014

    Caused by:
        0: db error: ERROR: Not found: Timeline 508546c79b2b16a84ab609fdf966e0d3/bfc18c24c4b837ecae5dbb5216c80fce is not active, state: Loading
        1: ERROR: Not found: Timeline 508546c79b2b16a84ab609fdf966e0d3/bfc18c24c4b837ecae5dbb5216c80fce is not active, state: Loading
```

Therefore, also activate the timeline during unoffloading.

Part of #8088
2024-10-16 16:47:17 +02:00
Anastasia Lubennikova
9668601f46 Add support of extensions for v17 (part 2) (#9389)
- plv8 3.2.3
    - HypoPG 1.4.1
    - pgtap 1.3.3
    - timescaledb 2.17.0
    - pg_hint_plan 17_1_7_0
    - rdkit Release_2024_09_1
    - pg_uuidv7 1.6.0
    - wal2json 2.6
    - pg_ivm 1.9
    - pg_partman 5.1.0

    update support of extensions for v14-v16:
    - HypoPG 1.4.0 -> 1.4.1
    - pgtap 1.2.0 -> 1.3.3
    - plpgsql_check 2.5.3 -> 2.7.11
    - pg_uuidv7 1.0.1 -> 1.6.0
    - wal2json 2.5 -> 2.6
    - pg_ivm 1.7 -> 1.9
    - pg_partman 5.0.1 -> 5.1.0
2024-10-16 15:29:23 +01:00
Arpad Müller
3140c14d60 Remove allow(clippy::unknown_lints) (#9416)
the lint stabilized in 1.80.
2024-10-16 16:28:55 +02:00
John Spray
d6281cbe65 tests: stabilize test_timelines_parallel_endpoints (#9413)
## Problem

This test would get failures like `command failed: Found no timeline id
for branch name 'branch_8'`

It's because neon_local is being invoked concurrently for branch
creation, which is unsafe (they'll step on each others' JSON writes)

Example failure:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9410/11363051979/index.html#testresult/5ddc56c640f5422b/retries

## Summary of changes

- Don't do branch creation concurrently with endpoint creation via neon_local
2024-10-16 15:27:46 +01:00
Vlad Lazar
d490ad23e0 storcon: use the same trace fields for reconciler and results (#9410)
## Problem

The reconciler use `seq`, but processing of results uses `sequence`.
Order is different too. It makes it annoying to read logs.

## Summary of Changes

Use the same tracing fields in both
2024-10-16 14:04:17 +01:00
Folke Behrens
f14e45f0ce proxy: format imports with nightly rustfmt (#9414)
```shell
cargo +nightly fmt -p proxy -- -l --config imports_granularity=Module,group_imports=StdExternalCrate,reorder_imports=true
```

These rust-analyzer settings for VSCode should help retain this style:
```json
  "rust-analyzer.imports.group.enable": true,
  "rust-analyzer.imports.prefix": "crate",
  "rust-analyzer.imports.merge.glob": false,
  "rust-analyzer.imports.granularity.group": "module",
  "rust-analyzer.imports.granularity.enforce": true,
```
2024-10-16 15:01:56 +02:00
John Spray
89a65a9e5a pageserver: improve handling of archival_config calls during Timeline shutdown (#9415)
## Problem

In test `test_timeline_offloading`, we see failures like:
```
PageserverApiException: queue is in state Stopped
```

Example failure:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/11356917668/index.html#testresult/ff0e348a78a974ee/retries

## Summary of changes

- Amend code paths that handle errors from RemoteTimelineClient to check
for cancellation and emit the Cancelled error variant in these cases
(will give clients a 503 to retry)
- Remove the implicit `#[from]` for the Other error case, to make it
harder to add code that accidentally squashes errors into this
(500-equivalent) error variant.

This would be neater if we made RemoteTimelineClient return a structured
error instead of anyhow::Error, but that's a bigger refactor.

I'm not sure if the test really intends to hit this path, but the error
handling fix makes sense either way.
2024-10-16 13:39:58 +01:00
Cihan Demirci
bc6b8cee01 don't trigger workflows in two repos (#9340)
https://github.com/neondatabase/cloud/issues/16723
2024-10-16 10:43:48 +01:00
Tristan Partin
061ea0de7a Add jsonnetfmt targets
This should make it a little bit easier for people wanting to check if
their files are formated correctly. Has the added bonus of making the CI
check simpler as well.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-15 20:01:13 -05:00
Tristan Partin
be5d6a69dc Fix jsonnet_files wildcard
Just a typo in a path.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-15 16:30:31 -05:00
Matthias van de Meent
18f4e5f10c Add newly added metrics from neondatabase/neon#9116 to exports (#9402)
They weren't added in that PR, but should be available immediately on
rollout as the neon extension already defaults to 1.5.
2024-10-15 23:13:31 +02:00
Alex Chi Z.
f1eb703256 fix(pageserver): use a buffer for basebackup; add aux basebackup metrics log (#9401)
Our replication bench project is stuck because it is too slow to
generate basebackup and it caused compute to disconnect.

https://neondb.slack.com/archives/C03438W3FLZ/p1728330685012419

The compute timeout for waiting for basebackup is 10m (is it true?).
Generating basebackup directly on pageserver takes ~3min. Therefore, I
suspect it's because there are too many wasted round-trip time for
writing the 10000+ snapshot aux files. Also, it is possible that the
basebackup process takes too long time retrieving all aux files that it
did not write anything over the wire protocol, causing a read timeout.

Basebackup size is 800KB gzipped for that project and was 55MB tar
before compression.

## Summary of changes

* Potentially fix the issue by placing a write buffer for basebackup.
* Log how many aux files did we read + the time spent on it.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-15 16:35:21 -04:00
Tristan Partin
cf7a596a15 Generate sql_exporter config files with Jsonnet
There are quite a few benefits to this approach:

- Reduce config duplication
  - The two sql_exporter configs were super similar with just a few
    differences
- Pull SQL queries into standalone files
  - That means we could run a SQL formatter on the file in the future
  - It also means access to syntax highlighting
- In the future, run different queries for different PG versions
  - This is relevant because right now, we have queries that are failing
    on PG 17 due to catalog updates

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-15 11:18:38 -05:00
Konstantin Knizhnik
614c3aef72 Remove redundant code (#9373)
## Problem

There is double update of resize cache in `put_rel_truncation`
Also `page_server_request` contains check that fork is MAIN_FORKNUM
which
1. is incorrect (because Vm/FSM pages are shreded in the same way as
MAIN fork pages and
2. is redundant because `page_server_request` is never called for `get
page` request so first part to OR condition is always true.

## Summary of changes

Remove redundant code

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-10-15 17:18:52 +03:00
Folke Behrens
fb74c21e8c proxy: Migrate jwt module away from anyhow (#9361) 2024-10-15 15:24:56 +02:00
Conrad Ludgate
d92d36a315 [local_proxy] update api for pg_session_jwt (#9359)
pg_session_jwt now:
1. Sets the JWK in a PGU_BACKEND session guc, no longer in the init()
function.
2. JWK no longer needs the kid.
2024-10-15 12:13:57 +00:00
Arpad Müller
ec4cc30de9 Shut down timelines during offload and add offload tests (#9289)
Add a test for timeline offloading, and subsequent unoffloading.

Also adds a manual endpoint, and issues a proper timeline shutdown
during offloading which prevents a pageserver hang at shutdown.

Part of #8088.
2024-10-15 09:46:51 +00:00
John Spray
73c6626b38 pageserver: stabilize & refine controller scale test (#8971)
## Problem

We were seeing timeouts on migrations in this test.

The test unfortunately tends to saturate local storage, which is shared
between the pageservers and the control plane database, which makes the
test kind of unrealistic. We will also want to increase the scale of
this test, so it's worth fixing that.

## Summary of changes

- Instead of randomly creating timelines at the same time as the other
background operations, explicitly identify a subset of tenant which will
have timelines, and create them at the start. This avoids pageservers
putting a lot of load on the test node during the main body of the test.
- Adjust the tenants created to create some number of 8 shard tenants
and the rest 1 shard tenants, instead of just creating a lot of 2 shard
tenants.
- Use archival_config to exercise tenant-mutating operations, instead of
using timeline creation for this.
- Adjust reconcile_until_idle calls to avoid waiting 5 seconds between
calls, which causes timelines with large shard count tenants.
- Fix a pageserver bug where calls to archival_config during activation
get 404
2024-10-15 09:31:18 +01:00
Alexander Bayandin
0fc4ada3ca Switch CI, Storage and Proxy to Debian 12 (Bookworm) (#9170)
## Problem

This PR switches CI and Storage to Debain 12 (Bookworm) based images.

## Summary of changes
- Add Debian codename (`bookworm`/`bullseye`) to most of docker tags,
create un-codenamed images to be used by default
- `vm-compute-node-image`: create a separate spec for `bookworm` (we
don't need to build cgroups in the future)
- `neon-image`: Switch to `bookworm`-based `build-tools` image
  - Storage components and Proxy use it
- CI: run lints and tests on `bookworm`-based `build-tools` image
2024-10-14 21:12:43 +01:00
Matthias van de Meent
dab96a6eb1 Add more timing histogram and gauge metrics to the Neon extension (#9116)
We now also track:

- Number of PS IOs in-flight
- Number of pages cached by smgr prefetch implementation
- IO timing histograms for LFC reads and writes, per IO issued

## Problem

There's little insight into the timing metrics of LFC, and what the
prefetch state of each backend is.

This changes that, by measuring (and subsequently exposing) these data
points.

## Summary of changes

- Extract IOHistogram as separate type, rather than a collection of
fields on NeonMetrics
- others, see items above.

Part of https://github.com/neondatabase/neon/issues/8926
2024-10-14 20:30:21 +02:00
Arpad Müller
f54e3e9147 Also consider offloaded timelines for obtaining retain_lsn (#9308)
Also consider offloaded timelines for obtaining `retain_lsn`. This is
required for correctness for all timelines that have not been flattened
yet: otherwise we GC data that might still be required for reading.

This somewhat counteracts the original purpose of timeline offloading of
not having to iterate over offloaded timelines, but sadly it's required.
In the future, we can improve the way the offloaded timelines are
stored.

We also make the `retain_lsn` optional so that in the future, when we
implement flattening, we can make it None. This also applies to full
timeline objects by the way, where it would probably make most sense to
add a bool flag whether the timeline is successfully flattened, and if
it is, one can exclude it from `retain_lsn` as well.

Also, track whether a timeline was offloaded or not in `retain_lsn` so
that the `retain_lsn` can be excluded from visibility and size
calculation.

Part of #8088
2024-10-14 17:54:03 +02:00
Vlad Lazar
f4f7ea247c tests: make size comparisons more lenient (#9388)
The empirically determined threshold doesn't hold for PG 17.
Bump the limit to stabilise ci.
2024-10-14 16:50:12 +01:00
Arpad Müller
d92ff578c4 Add test for fixed storage broker issue (#9311)
Adds a test for the (now fixed) storage broker limit issue, see #9268
for the description and #9299 for the fix.

Also fix a race condition with endpoint creation/starts running in parallel,
leading to file not found errors.
2024-10-14 14:34:57 +02:00
Alexander Bayandin
31b7703fa8 CI(build-build-tools): fix unexpected cancellations (#9357)
## Problem
When `Dockerfile.build-tools` gets changed, several PRs catch up with
it and some might get unexpectedly cancelled workflows because of
GitHub's concurrency model for workflows.
See the comment in the code for more details.

It should be possible to revert it after
https://github.com/orgs/community/discussions/41518 (I don't expect it
anytime soon, but I subscribed)

## Summary of changes
- Do not queue `build-build-tools-image` workflows in the concurrency
group
2024-10-14 11:51:01 +01:00
Konstantin Knizhnik
d056ae9be5 Ignore pg_dynshmem fiel when comparing directories (#9374)
## Problem

At MacOS `pg_dynshmem` file is create in PGDATADIR which cause mismatch
in directories comparison

## Summary of changes

Add this files to the ignore list.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-10-14 13:45:20 +03:00
Conrad Ludgate
cb9ab7463c proxy: split out the console-redirect backend flow (#9270)
removes the ConsoleRedirect backend from the main auth::Backends enum,
copy-paste the existing crate::proxy::task_main structure to use the
ConsoleRedirectBackend exclusively.

This makes the logic a bit simpler at the cost of some fairly trivial
code duplication.
2024-10-14 12:25:55 +02:00
Conrad Ludgate
ab5bbb445b proxy: refactor auth backends (#9271)
preliminary for #9270 

The auth::Backend didn't need to be in the mega ProxyConfig object, so I
split it off and passed it manually in the few places it was necessary.

I've also refined some of the uses of config I saw while doing this
small refactor.

I've also followed the trend and make the console redirect backend it's
own struct, same as LocalBackend and ControlPlaneBackend.
2024-10-11 20:14:52 +01:00
Alexander Bayandin
5ef805e12c CI(run-python-test-set): allow to skip missing compatibility snapshot (#9365)
## Problem
Action `run-python-test-set` fails if it is not used for `regress_tests`
on release PR, because it expects
`test_compatibility.py::test_create_snapshot` to generate a snapshot,
and the test exists only in `regress_tests` suite.
For example, in https://github.com/neondatabase/neon/pull/9291
[`test-postgres-client-libs`](https://github.com/neondatabase/neon/actions/runs/11209615321/job/31155111544)
job failed.

## Summary of changes
- Add `skip-if-does-not-exist` input to `.github/actions/upload` action
(the same way we do for `.github/actions/download`)
- Set `skip-if-does-not-exist=true` for "Upload compatibility snapshot"
step in `run-python-test-set` action
2024-10-11 16:58:41 +01:00
a-masterov
091a175a3e Test versions mismatch (#9167)
## Problem
We faced the problem of incompatibility of the different components of
different versions.
This should be detected automatically to prevent production bugs.
## Summary of changes
The test for this situation was implemented

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2024-10-11 15:29:54 +02:00
Fedor Dikarev
326cd80f0d ci: gh-workflow-stats-action v0.1.4: remove debug output and proper pagination (#9356)
## Problem
In previous version pagination didn't work so we collect information
only for first 30 jobs in WorkflowRun
2024-10-11 14:46:45 +02:00
Folke Behrens
6baf1aae33 proxy: Demote some errors to warnings in logs (#9354) 2024-10-11 11:29:08 +02:00
John Spray
184935619e tests: stabilize test_storage_controller_heartbeats (#9347)
## Problem

This could fail with `reconciliation in progress` if running on a slow
test node such that background reconciliation happens at the same time
as we call consistency_check.

Example:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/11258171952/index.html#/testresult/54889c9469afb232

## Summary of changes

- Call reconcile_until_idle before calling consistency check once,
rather than calling consistency check until it passes
2024-10-11 09:41:08 +01:00
Ivan Efremov
b2ecbf3e80 Introduce "quota" ErrorKind (#9300)
## Problem
Fixes #8340
## Summary of changes
Introduced ErrorKind::quota to handle quota-related errors
## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist
2024-10-11 10:45:55 +03:00
Tristan Partin
53147b51f9 Use valid type hints for Python 3.9
I have no idea how this made it past the linters.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-10 13:00:25 -05:00
Tristan Partin
006d9dfb6b Add compute_config_dir fixture
Allows easy access to various compute config files.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-10 12:43:40 -05:00
Tristan Partin
1f7904c917 Enable cargo caching in check-codestyle-rust
This job takes an extraordinary amount of time for what I understand it
to do. The obvious win is caching dependencies.

Rory disabled caching in cd5732d9d8.
I assume this was to get gen3 runners up and running.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-10 12:40:30 -05:00
John Spray
07c714343f tests: allow a log warning in test_cli_start_stop_multi (#9320)
## Problem

This test restarts services in an undefined order (whatever neon_local
does), which means we should be tolerant of warnings that come from
restarting the storage controller while a pageserver is running.

We can see failures with warnings from dropped requests, e.g.
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9307/11229000712/index.html#/testresult/d33d5cb206331e28
```
 WARN request{method=GET path=/v1/location_config request_id=b7dbda15-6efb-4610-8b19-a3772b65455f}: request was dropped before completing\n')
```

## Summary of changes

- allow-list the `request was dropped before completing` message on
pageservers before restarting services
2024-10-10 17:06:42 +01:00
Tristan Partin
264c34dfb7 Move path-related fixtures into their own module (#9304)
neon_fixtures.py has grown into quite a beast.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-10 10:26:23 -05:00
Erik Grinaker
9dd80b9b4c storage_scrubber: fix faulty assertion when no timelines (#9345)
When there are no timelines in remote storage, the storage scrubber
would incorrectly trip an assertion with "Must be set if results are
present", referring to the last processed tenant ID. When there are no
timelines we don't expect there to be a tenant ID either.

The assertion was introduced in 37aa6fd.

Only apply the assertion when any timelines are present.
2024-10-10 09:09:53 -04:00
Erik Grinaker
c2623ffef4 CODEOWNERS: assign storage_scrubber to storage (#9346) 2024-10-10 12:40:35 +01:00
John Spray
426b1c5f08 storage controller: use 'infra' JWT scope for node registration (#9343)
## Problem

Storage controller `/control` API mostly requires admin tokens, for
interactive use by engineers. But for endpoints used by scripts, we
should not require admin tokens.

Discussion at
https://neondb.slack.com/archives/C033RQ5SPDH/p1728550081788989?thread_ts=1728548232.265019&cid=C033RQ5SPDH

## Summary of changes

- Introduce the 'infra' JWT scope, which was not previously used in the
neon repo
- For pageserver & safekeeper node registrations, require infra scope
instead of admin

Note that admin will still work, as the controller auth checks permit
admin tokens for all endpoints irrespective of what scope they require.
2024-10-10 12:26:43 +01:00
Folke Behrens
a202b1b5cc Merge pull request #9341 from neondatabase/rc/proxy/2024-10-10
Proxy release 2024-10-10
2024-10-10 09:17:11 +02:00
Conrad Ludgate
306094a87d add local-proxy suffix to wake-compute requests, respect the returned port (#9298)
https://github.com/neondatabase/cloud/issues/18349

Use the `-local-proxy` suffix to make sure we get the 10432 local_proxy
port back from cplane.
2024-10-09 22:43:35 +01:00
Tristan Partin
d3464584a6 Improve some typing in test_runner
Fixes some types, adds some types, and adds some override annotations.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-09 15:42:22 -05:00
Tristan Partin
878135fe9c Move PgBenchInitResult.EXTRACTORS to a private module constant
This seems to paper over a behavioral difference in Python 3.9 and
Python 3.12 with how dataclasses work with mutable variables. On Python
3.12, I get the following error:

ValueError: mutable default <class 'dict'> for field EXTRACTORS is not allowed: use default_factory

This obviously doesn't occur in our testing environment. When I do what
the error tells me, EXTRACTORS doesn't seem to exist as an attribute on
the class in at least Python 3.9.

The solution provided in this commit seems like the least amount of
friction to keep the wheels turning.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-09 14:02:09 -05:00
Conrad Ludgate
75434060a5 local_proxy: integrate with pg_session_jwt extension (#9086) 2024-10-09 18:24:10 +01:00
Anastasia Lubennikova
721803a0e7 Add partial support of extensions for v17: (#9322)
- PostGIS 3.5.0
- pgrouting 3.6.2
- h3 4.1.3
- unit 7.9
- pgjwt version (f3d82fd)
- pg_hashids 1.2.1
- ip4r 2.4.2
- prefix 1.2.10
- postgresql-hll 2.18
- pg_roaringbitmap 0.5.4
- pg-semver 0.40.0

update support of extensions for v14-v16:
- unit 7.7 -> 7.9
- pgjwt 9742dab -> f3d82fd

---------

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2024-10-09 17:07:59 +01:00
Fedor Dikarev
108a211917 added workflow Report Workflow Stats (#9330)
## Summary of changes
CI: Collect stats for Github Workflows Runs
2024-10-09 17:27:41 +02:00
Heikki Linnakangas
72ef0e0fa1 tests: Remove redundant log lines when stopping storage nodes (#9317)
The neon_cli functions print the command that gets executed, which
contains the same information.

Before:

    2024-10-07 22:32:28.884 INFO [neon_fixtures.py:3927] Stopping safekeeper 1
    2024-10-07 22:32:28.884 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 1"
    2024-10-07 22:32:28.989 INFO [neon_fixtures.py:3927] Stopping safekeeper 2
    2024-10-07 22:32:28.989 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 2"
    2024-10-07 22:32:29.93 INFO [neon_fixtures.py:3927] Stopping safekeeper 3
    2024-10-07 22:32:29.94 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 3"
    2024-10-07 22:32:29.251 INFO [neon_cli.py:450] Stopping pageserver with ['pageserver', 'stop', '--id=1']
    2024-10-07 22:32:29.251 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local pageserver stop --id=1"

After:

    2024-10-07 22:32:28.884 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 1"
    2024-10-07 22:32:28.989 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 2"
    2024-10-07 22:32:29.94 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 3"
    2024-10-07 22:32:29.251 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local pageserver stop --id=1"
2024-10-09 15:51:34 +03:00
Heikki Linnakangas
eb23d355a9 tests: Use ThreadedMotoServer python class to launch mock S3 server (#9313)
This is simpler than using subprocess.

One difference is in how moto's log output is now collected. Previously,
moto's logs went to stderr, and were collected and printed at the end of
the test by pytest, like this:

    2024-10-07T22:45:12.3705222Z ----------------------------- Captured stderr call -----------------------------
    2024-10-07T22:45:12.3705577Z 127.0.0.1 - - [07/Oct/2024 22:35:14] "PUT /pageserver-test-deletion-queue-2e6efa8245ec92a37a07004569c29eb7 HTTP/1.1" 200 -
    2024-10-07T22:45:12.3706181Z 127.0.0.1 - - [07/Oct/2024 22:35:15] "GET /pageserver-test-deletion-queue-2e6efa8245ec92a37a07004569c29eb7/?list-type=2&delimiter=/&prefix=/tenants/43da25eac0f41412696dd31b94dbb83c/timelines/ HTTP/1.1" 200 -
    2024-10-07T22:45:12.3706894Z 127.0.0.1 - - [07/Oct/2024 22:35:16] "PUT /pageserver-test-deletion-queue-2e6efa8245ec92a37a07004569c29eb7//tenants/43da25eac0f41412696dd31b94dbb83c/timelines/eabba5f0c1c72c8656d3ef1d85b98c1d/initdb.tar.zst?x-id=PutObject HTTP/1.1" 200 -

Note the timestamps: the timestamp at the beginning of the line is the
time that the stderr was dumped, i.e. the end of the test, which makes
those timestamps rather useless. The timestamp in the middle of the line
is when the operation actually happened, but it has only 1 s
granularity.

With this change, moto's log lines are printed in the "live log call"
section, as they happen, which makes the timestamps more useful:

    2024-10-08 12:12:31.129 INFO [_internal.py:97] 127.0.0.1 - - [08/Oct/2024 12:12:31] "GET /pageserver-test-deletion-queue-e24e7525d437e1874d8a52030dcabb4f/?list-type=2&delimiter=/&prefix=/tenants/7b6a16b1460eda5204083fba78bc360f/timelines/ HTTP/1.1" 200 -
    2024-10-08 12:12:32.612 INFO [_internal.py:97] 127.0.0.1 - - [08/Oct/2024 12:12:32] "PUT /pageserver-test-deletion-queue-e24e7525d437e1874d8a52030dcabb4f//tenants/7b6a16b1460eda5204083fba78bc360f/timelines/7ab4c2b67fa8c712cada207675139877/initdb.tar.zst?x-id=PutObject HTTP/1.1" 200 -
2024-10-09 15:34:51 +03:00
Yuchen Liang
bee04b8a69 pageserver: add direct io config to virtual file (#9214)
## Problem
We need a way to incrementally switch to direct IO. During the rollout
we might want to switch to O_DIRECT on image and delta layer read path
first before others.

## Summary of changes
- Revisited and simplified direct io config in `PageserverConf`. 
- We could add a fallback mode for open, but for read there isn't a
reasonable alternative (without creating another buffered virtual file).
- Added a wrapper around `VirtualFile`, current implementation become
`VirtualFileInner`
- Use `open_v2`, `create_v2`, `open_with_options_v2` when we want to use
the IO mode specified in PS config.
- Once we onboard all IO through VirtualFile using this new API, we will
delete the old code path.
- Make io mode live configurable for benchmarking.
- Only guaranteed for files opened after the config change, so do it
before the experiment.

As an example, we are using `open_v2` with
`virtual_file::IoMode::Direct` in
https://github.com/neondatabase/neon/pull/9169

We also remove `io_buffer_alignment` config in
a04cfd754b and use it as a compile time
constant. This way we don't have to carry the alignment around or make
frequent call to retrieve this information from the static variable.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-10-09 08:33:07 -04:00
Anastasia Lubennikova
63e7fab990 Add /installed_extensions endpoint to collect statistics about extension usage. (#8917)
Add /installed_extensions endpoint to collect
statistics about extension usage.
It returns a list of installed extensions in the format:

```json
{
  "extensions": [
    {
      "extname": "extension_name",
      "versions": ["1.0", "1.1"],
      "n_databases": 5,
    }
  ]
}
```

---------

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2024-10-09 13:32:13 +01:00
Arseny Sher
a181392738 safekeeper: add evicted_timelines gauge. (#9318)
showing total number of evicted timelines.
2024-10-09 14:40:30 +03:00
Alexander Bayandin
fc7397122c test_runner: fix path to tpc-h queries (#9327)
## Problem

The path to TPC-H queries was incorrectly changed in #9306.
This path is used for `test_tpch` parameterization, so all perf tests
started to fail:

```
==================================== ERRORS ====================================
__________ ERROR collecting test_runner/performance/test_perf_olap.py __________
test_runner/performance/test_perf_olap.py:205: in <module>
    @pytest.mark.parametrize("query", tpch_queuies())
test_runner/performance/test_perf_olap.py:196: in tpch_queuies
    assert queries_dir.exists(), f"TPC-H queries dir not found: {queries_dir}"
E   AssertionError: TPC-H queries dir not found: /__w/neon/neon/test_runner/performance/performance/tpc-h/queries
E   assert False
E    +  where False = <bound method Path.exists of PosixPath('/__w/neon/neon/test_runner/performance/performance/tpc-h/queries')>()
E    +    where <bound method Path.exists of PosixPath('/__w/neon/neon/test_runner/performance/performance/tpc-h/queries')> = PosixPath('/__w/neon/neon/test_runner/performance/performance/tpc-h/queries').exists
```

## Summary of changes
- Fix the path to tpc-h queries
2024-10-09 12:11:06 +01:00
Vlad Lazar
cc599e23c1 storcon: make observed state updates more granular (#9276)
## Problem

Previously, observed state updates from the reconciler may have
clobbered inline changes made to the observed state by other code paths.

## Summary of changes

Model observed state changes from reconcilers as deltas. This means that
we only update what has changed. Handling for node going off-line concurrently
during the reconcile is also added: set observed state to None in such cases to
respect the convention.

Closes https://github.com/neondatabase/neon/issues/9124
2024-10-09 11:53:29 +01:00
Folke Behrens
54d1185789 proxy: Unalias hyper1 and replace one use of hyper0 in test (#9324)
Leaves one final use of hyper0 in proxy for the health service,
which requires some coordinated effort with other services.
2024-10-09 12:44:17 +02:00
Heikki Linnakangas
8a138db8b7 tests: Reduce noise from logging renamed files (#9315)
Instead of printing the full absolute path for every file, print just
the filenames.

Before:

    2024-10-08 13:19:39.98 INFO [test_pageserver_generations.py:669] Found file /home/heikki/git-sandbox/neon/test_output/test_upgrade_generationless_local_file_paths[debug-pg16]/repo/pageserver_1/tenants/0c04a8df7691a367ad0bb1cc1373ba4d/timelines/f41022551e5f96ce8dbefb9b5d35ab45/000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0-v1-00000001
    2024-10-08 13:19:39.99 INFO [test_pageserver_generations.py:673] Renamed /home/heikki/git-sandbox/neon/test_output/test_upgrade_generationless_local_file_paths[debug-pg16]/repo/pageserver_1/tenants/0c04a8df7691a367ad0bb1cc1373ba4d/timelines/f41022551e5f96ce8dbefb9b5d35ab45/000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0-v1-00000001 -> /home/heikki/git-sandbox/neon/test_output/test_upgrade_generationless_local_file_paths[debug-pg16]/repo/pageserver_1/tenants/0c04a8df7691a367ad0bb1cc1373ba4d/timelines/f41022551e5f96ce8dbefb9b5d35ab45/000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0

After:

    2024-10-08 13:24:39.726 INFO [test_pageserver_generations.py:667] Renaming files in /home/heikki/git-sandbox/neon/test_output/test_upgrade_generationless_local_file_paths[debug-pg16]/repo/pageserver_1/tenants/3439538816c520adecc541cc8b1de21c/timelines/6a7be8ee707b355de48dd91b326d6ae1
    2024-10-08 13:24:39.728 INFO [test_pageserver_generations.py:673] Renamed
000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0-v1-00000001 -> 000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0
2024-10-09 10:55:56 +01:00
Erik Grinaker
211970f0e0 remote_storage: add DownloadOpts::byte_(start|end) (#9293)
`download_byte_range()` is basically a copy of `download()` with an
additional option passed to the backend SDKs. This can cause these code
paths to diverge, and prevents combining various options.

This patch adds `DownloadOpts::byte_(start|end)` and move byte range
handling into `download()`.
2024-10-09 10:29:06 +01:00
Heikki Linnakangas
f87f5a383e tests: Remove redundant log lines when starting an endpoint (#9316)
The "Starting postgres endpoint <name>" message is not needed, because
the neon_cli.py prints the neon_local command line used to start the
endpoint. That contains the same information. The "Postgres startup took
XX seconds" message is not very useful because no one pays attention to
those in the python test logs when things are going smoothly, and if you
do wonder about the startup speed, the same information and more can be
found in the compute log.

Before:

    2024-10-07 22:32:27.794 INFO [neon_fixtures.py:3492] Starting postgres endpoint ep-1
    2024-10-07 22:32:27.794 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local endpoint start --safekeepers 1 ep-1"
    2024-10-07 22:32:27.901 INFO [neon_fixtures.py:3690] Postgres startup took 0.11398935317993164 seconds

After:

    2024-10-07 22:32:27.794 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local endpoint start --safekeepers 1 ep-1"
2024-10-09 09:58:50 +01:00
Arpad Müller
e8ae37652b Add timeline offload mechanism (#8907)
Implements an initial mechanism for offloading of archived timelines.

Offloading is implemented as specified in the RFC.

For now, there is no persistence, so a restart of the pageserver will
retrigger downloads until the timeline is offloaded again.

We trigger offloading in the compaction loop because we need the signal
for whether compaction is done and everything has been uploaded or not.

Part of #8088
2024-10-09 01:33:39 +02:00
Tristan Partin
5bd8e2363a Enable all pyupgrade checks in ruff
This will help to keep us from using deprecated Python features going
forward.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-08 14:32:26 -05:00
Vlad Lazar
618680c299 storcon: apply all node status changes before handling transitions (#9281)
## Problem

When a node goes offline, we trigger reconciles to migrate shards away
from it. If multiple nodes go offline at the same time, we handled them in
sequence. Hence, we might migrate shards from the first offline node to the second
offline node and increase the unavailability period.

## Summary of changes

Refactor heartbeat delta handling to:
1. Update in memory state for all nodes first
2. Handle availability transitions one by one (we have full picture for each node after (1))

Closes https://github.com/neondatabase/neon/issues/9126
2024-10-08 17:55:25 +01:00
Alexander Bayandin
baf27ba6a3 Fix compiler warnings on macOS (#9319)
## Problem

On macOS:
```
/Users/runner/work/neon/neon//pgxn/neon/file_cache.c:623:19: error: variable 'has_remaining_pages' is used uninitialized whenever 'for' loop exits because its condition is false [-Werror,-Wsometimes-uninitialized]
```

## Summary of changes
- Initialise `has_remaining_pages` with `false`
2024-10-08 17:34:35 +01:00
Tristan Partin
16417d919d Remove get_self_dir()
It didn't serve much value, and was only used twice.
Path(__file__).parent is a pretty easy invocation to use.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-08 08:57:11 -05:00
Heikki Linnakangas
18b97150b2 Remove non-existent entries from .dockerignore (#9209) 2024-10-08 14:55:24 +03:00
Heikki Linnakangas
17c59ed786 Don't override CFLAGS when building neon extension
If you override CFLAGS, you also override any flags that PostgreSQL
configure script had picked. That includes many options that enable
extra compiler warnings, like '-Wall', '-Wmissing-prototypes', and so
forth. The override was added in commit 171385ac14, but the intention
of that was to be *more* strict, by enabling '-Werror', not less
strict. The proper way of setting '-Werror', as documented in the docs
and mentioned in PR #2405, is to set COPT='-Werror', but leave CFLAGS
alone.

All the compiler warnings with the standard PostgreSQL flags have now
been fixed, so we can do this without adding noise.

Part of the cleanup issue #9217.
2024-10-07 23:49:33 +03:00
Heikki Linnakangas
d7b960c9b5 Silence compiler warning about using variable uninitialized
It's not a bug, the variable is initialized when it's used, but the
compiler isn't smart enough to see that through all the conditions.

Part of the cleanup issue #9217.
2024-10-07 23:49:31 +03:00
Heikki Linnakangas
2ff6d2b6b5 Silence compiler warning about variable only used in assertions
Part of the cleanup issue #9217.
2024-10-07 23:49:29 +03:00
Heikki Linnakangas
30f7fbc88d Add pg_attribute_printf to WalProposerLibLog, per gcc's suggestion
/pgxn/neon/walproposer_compat.c:192:9: warning: function ‘WalProposerLibLog’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
      192 |         vsnprintf(buf, sizeof(buf), fmt, args);
          |         ^~~~~~~~~
2024-10-07 23:49:27 +03:00
Heikki Linnakangas
09f2000f91 Silence warnings about shadowed local variables
Part of the cleanup issue #9217.
2024-10-07 23:49:24 +03:00
Heikki Linnakangas
e553ca9e4f Silence warnings about mixed declarations and code
The warning:

    warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]

It's PostgreSQL project style to stick to the old C90 style.
(Alternatively, we could disable it for our extension.)

Part of the cleanup issue #9217.
2024-10-07 23:49:22 +03:00
Heikki Linnakangas
0a80dbce83 neon_write() function is not used on v17
ifdef it out on v17, to silence compiler warning.

Part of the cleanup issue #9217.
2024-10-07 23:49:20 +03:00
Heikki Linnakangas
e763256448 Fix warnings about missing function prototypes
Prototypes for neon_writev(), neon_readv(), and neon_regisersync()
were missing. But instead of adding the missing prototypes, mark all
the smgr functions 'static'.

Part of the cleanup issue #9217.
2024-10-07 23:49:18 +03:00
Heikki Linnakangas
129d4480bb Move "/* fallthrough */" comments so that GCC recognizes them
This silences warnings about implicit fallthroughs.

Part of the cleanup issue #9217.
2024-10-07 23:49:16 +03:00
Heikki Linnakangas
776df963ba Fix function prototypes
Silences these compiler warnings:

    /pgxn/neon_walredo/walredoproc.c:452:1: warning: ‘CreateFakeSharedMemoryAndSemaphores’ was used with no prototype before its definition [-Wmissing-prototypes]
      452 | CreateFakeSharedMemoryAndSemaphores()
          | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /pgxn/neon/walproposer_pg.c:541:1: warning: no previous prototype for ‘GetWalpropShmemState’ [-Wmissing-prototypes]
      541 | GetWalpropShmemState()
          | ^~~~~~~~~~~~~~~~~~~~

Part of the cleanup issue #9217.
2024-10-07 23:49:13 +03:00
Heikki Linnakangas
11dc5feb36 Remove unused static function
In v16 merge, we copied much of heap RMGR, to distinguish vanilla
Postgres heap records from records generated with neon patches, with
the additional CID fields. This function is only used by the
HEAP_TRUNCATE records, however, which we didn't need to copy.

Part of the cleanup issue #9217.
2024-10-07 23:49:11 +03:00
Heikki Linnakangas
dbbe57a837 Remove unused local vars and a prototype for non-existent function
Per compiler warnings. Part of the cleanup issue #9217.
2024-10-07 23:49:09 +03:00
Em Sharnoff
cc29def544 vm-monitor: Ignore LFC in postgres cgroup memory threshold (#8668)
In short: Currently we reserve 75% of memory to the LFC, meaning that if
we scale up to keep postgres using less than 25% of the compute's
memory.

This means that for certain memory-heavy workloads, we end up scaling
much higher than is actually needed — in the worst case, up to 4x,
although in practice it tends not to be quite so bad.

Part of neondatabase/autoscaling#1030.
2024-10-07 21:25:34 +01:00
Arpad Müller
912d47ec02 storage_broker: update hyper and tonic again (#9299)
Update hyper and tonic again in the storage broker, this time with a fix
for the issue that made us revert the update last time.

The first commit is a revert of #9268, the second a fix for the issue.

fixes #9231.
2024-10-07 21:12:13 +02:00
Tristan Partin
6eba29c732 Improve logging on changes in a compute's status
I'm trying to debug a situation with the LR benchmark publisher not
being in the correct state. This should aid in debugging, while just
being generally useful.

PR: https://github.com/neondatabase/neon/pull/9265
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-07 13:19:48 -04:00
Heikki Linnakangas
99d4c1877b Replace BUFFERTAGS_EQUAL compatibility macro with new-style function (#9294)
In PostgreSQL v16, BUFFERTAGS_EQUAL was replaced with a static inline
macro, BufferTagsEqual. Let's use the new name going forward, and have
backwards-compatibility glue to allow using the new name on v14 and v15,
rather than the other way round. This also makes BufferTagsEquals
consistent with InitBufferTag, for which we were already using the new
name.
2024-10-07 19:49:27 +03:00
Jere Vaara
2272dc8a48 feat(compute_tools): Create JWKS Postgres roles without attributes (#9031)
Requires https://github.com/neondatabase/neon/pull/9086 first to have
`local_proxy_config`. This logic can still be reviewed implementation
wise.

Create JWT Auth functionality related roles without attributes and
`neon_superuser` group.

Read the JWT related roles from `local_proxy_config` `JWKS` settings and
handle them differently than other console created roles.
2024-10-07 19:37:32 +03:00
Heikki Linnakangas
323bd018cd Make sure BufferTag padding bytes are cleared in hash keys (#9292)
The prefetch-queue hash table uses a BufferTag struct as the hash key,
and it's hashed using hash_bytes(). It's important that all the padding
bytes in the key are cleared, because hash_bytes() will include them.

I was getting compiler warnings like this on v14 and v15, when compiling
with -Warray-bounds:

    In function ‘prfh_lookup_hash_internal’,
inlined from ‘prfh_lookup’ at
pg_install/v14/include/postgresql/server/lib/simplehash.h:821:9,
inlined from ‘neon_read_at_lsnv’ at pgxn/neon/pagestore_smgr.c:2789:11,
inlined from ‘neon_read_at_lsn’ at pgxn/neon/pagestore_smgr.c:2904:2:
pg_install/v14/include/postgresql/server/storage/relfilenode.h:90:43:
warning: array subscript ‘PrefetchRequest[0]’ is partly outside array
bounds of ‘BufferTag[1]’ {aka ‘struct buftag[1]’} [-Warray-bounds]
       89 |         ((node1).relNode == (node2).relNode && \
          |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       90 |          (node1).dbNode == (node2).dbNode && \
          |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
       91 |          (node1).spcNode == (node2).spcNode)
          |          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pg_install/v14/include/postgresql/server/storage/buf_internals.h:116:9:
note: in expansion of macro ‘RelFileNodeEquals’
      116 |         RelFileNodeEquals((a).rnode, (b).rnode) && \
          |         ^~~~~~~~~~~~~~~~~
pgxn/neon/neon_pgversioncompat.h:25:31: note: in expansion of macro
‘BUFFERTAGS_EQUAL’
       25 | #define BufferTagsEqual(a, b) BUFFERTAGS_EQUAL(*(a), *(b))
          |                               ^~~~~~~~~~~~~~~~
pgxn/neon/pagestore_smgr.c:220:34: note: in expansion of macro
‘BufferTagsEqual’
220 | #define SH_EQUAL(tb, a, b) (BufferTagsEqual(&(a)->buftag,
&(b)->buftag))
          |                                  ^~~~~~~~~~~~~~~
pg_install/v14/include/postgresql/server/lib/simplehash.h:280:77: note:
in expansion of macro ‘SH_EQUAL’
280 | #define SH_COMPARE_KEYS(tb, ahash, akey, b) (ahash ==
SH_GET_HASH(tb, b) && SH_EQUAL(tb, b->SH_KEY, akey))
| ^~~~~~~~
pg_install/v14/include/postgresql/server/lib/simplehash.h:799:21: note:
in expansion of macro ‘SH_COMPARE_KEYS’
      799 |                 if (SH_COMPARE_KEYS(tb, hash, key, entry))
          |                     ^~~~~~~~~~~~~~~
    pgxn/neon/pagestore_smgr.c: In function ‘neon_read_at_lsn’:
    pgxn/neon/pagestore_smgr.c:2742:25: note: object ‘buftag’ of size 20
     2742 |         BufferTag       buftag = {0};
          |                         ^~~~~~

This commit silences those warnings, although it's not clear to me why
the compiler complained like that in the first place. I found the issue
with padding bytes while looking into those warnings, but that was
coincidental, I don't think the padding bytes explain the warnings as
such.

In v16, the BUFFERTAGS_EQUAL macro was replaced with a static inline
function, and that also silences the compiler warning. Not clear to me
why.
2024-10-07 18:04:04 +03:00
Folke Behrens
ad267d849f proxy: Move module base files into module directory (#9297) 2024-10-07 16:25:34 +02:00
Conrad Ludgate
8cd7b5bf54 proxy: rename console -> control_plane, rename web -> console_redirect (#9266)
rename console -> control_plane
rename web -> console_redirect

I think these names are a little more representative.
2024-10-07 14:09:54 +01:00
Konstantin Knizhnik
47c3c9a413 Fix update of statistic for LFC/prefetch (#9272)
## Problem

See #9199

## Summary of changes

Fix update of hits/misses for LFC and prefetch introduced in
78938d1b59

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-10-07 12:21:16 +03:00
Arseny Sher
eae4470bb6 safekeeper: remove local WAL files ignoring peer_horizon_lsn. (#8900)
If peer safekeeper needs garbage collected segment it will be fetched
now from s3 using on-demand WAL download. Reduces danger of running out of disk space when safekeeper fails.
2024-10-04 19:07:39 +03:00
Ivan Efremov
2d248aea6f proxy: exclude triple logging of connect compute errors (#9277)
Fixes (#9020)
 - Use the compute::COULD_NOT_CONNECT for connection error message;
 - Eliminate logging for one connection attempt;
 - Typo fix.
2024-10-04 18:21:39 +03:00
Conrad Ludgate
6c05f89f7d proxy: add local-proxy to compute image (#8823)
1. Adds local-proxy to compute image and vm spec
2. Updates local-proxy config processing, writing PID to a file eagerly
3. Updates compute-ctl to understand local proxy compute spec and to
send SIGHUP to local-proxy over that pid.

closes https://github.com/neondatabase/cloud/issues/16867
2024-10-04 14:52:01 +00:00
Arseny Sher
db53f98725 neon walsender_hooks: take basebackup LSN directly. (#9263)
NeonWALReader needs to know LSN before which WAL is not available
locally, that is, basebackup LSN. Previously it was taken from
WalpropShmemState, but that's racy, as walproposer sets its there only
after successfull election. Get it directly with GetRedoStartLsn.

Should fix flakiness of
test_ondemand_wal_download_in_replication_slot_funcs etc.

ref #9201
2024-10-04 14:56:15 +01:00
Erik Grinaker
04a6222418 remote_storage: add head_object integration test (#9274) 2024-10-04 12:40:41 +01:00
Vlad Lazar
dcf7af5a16 storcon: do timeline creation on all attached location (#9237)
## Problem

Creation of a timelines during a reconciliation can lead to
unavailability if the user attempts to
start a compute before the storage controller has notified cplane of the
cut-over.

## Summary of changes

Create timelines on all currently attached locations. For the latest
location, we still look
at the database (this is a previously). With this change we also look
into the observed state
to find *other* attached locations.

Related https://github.com/neondatabase/neon/issues/9144
2024-10-04 11:56:43 +01:00
Erik Grinaker
37158d0424 pageserver: use conditional GET for secondary tenant heatmaps (#9236)
## Problem

Secondary tenant heatmaps were always downloaded, even when they hadn't
changed. This can be avoided by using a conditional GET request passing
the `ETag` of the previous heatmap.

## Summary of changes

The `ETag` was already plumbed down into the heatmap downloader, and
just needed further plumbing into the remote storage backends.

* Add a `DownloadOpts` struct and pass it to
`RemoteStorage::download()`.
* Add an optional `DownloadOpts::etag` field, which uses a conditional
GET and returns `DownloadError::Unmodified` on match.
2024-10-04 12:29:48 +02:00
Erik Grinaker
60fb840e1f Cargo.toml: enable sso for aws-config (#9261)
## Problem

The S3 tests couldn't use SSO authentication for local tests against S3.

## Summary of changes

Enable the `sso` feature of `aws-config`. Also run `cargo hakari
generate` which made some updates to `workspace_hack`.
2024-10-04 11:27:06 +01:00
Heikki Linnakangas
52232dd85c tests: Add a comment explaining the rules of NeonLocalCli wrappers (#9195) 2024-10-03 22:03:29 +03:00
Heikki Linnakangas
8ef0c38b23 tests: Rename NeonLocalCli functions to match the 'neon_local' commands (#9195)
This makes it more clear that the functions in NeonLocalCli are just
typed wrappers around the corresponding 'neon_local' commands.
2024-10-03 22:03:27 +03:00
Heikki Linnakangas
56bb1ac458 tests: Move NeonCli and friends to separate file (#9195)
In the passing, rename it to NeonLocalCli, to reflect that the binary
is called 'neon_local'.

Add wrapper for the 'timeline_import' command, eliminating the last
raw call to the raw_cli() function from tests, except for a few in
test_neon_cli.py which are about testing the 'neon_local' iteself. All
the other calls are now made through the strongly-typed wrapper
functions
2024-10-03 22:03:25 +03:00
Heikki Linnakangas
19db9e9aad tests: Replace direct calls to neon_cli with wrappers in NeonEnv (#9195)
Add wrappers for a few commands that didn't have them before. Move the
logic to generate tenant and timeline IDs from NeonCli to the callers,
so that NeonCli is more purely just a type-safe wrapper around
'neon_local'.
2024-10-03 22:03:22 +03:00
David Gomes
4e9b32c442 chore: makes some onboarding document improvements (#9216)
* I had to install `m4` in order to be able to run locally
* The docs/docker.md was missing a pointer to where the compute node
code is

(Was originally on #8888 but I am pulling this out)
2024-10-03 20:58:30 +02:00
David Gomes
2fac0b7fac chore: remove unnecessary comments in compute/Dockerfile.compute-node (#9253)
See [this
comment](https://github.com/neondatabase/neon/pull/8888#discussion_r1783130082).
2024-10-03 18:26:41 +00:00
Arpad Müller
e3d6ecaeee Revert hyper and tonic updates (#9268) 2024-10-03 19:21:22 +01:00
Arseny Sher
d785fcb5ff safekeeper: fix panic in debug_dump. (#9097)
Panic was triggered only when dump selected no timelines.

sentry report:
https://neondatabase.sentry.io/issues/5832368589/
2024-10-03 19:22:22 +03:00
Vlad Lazar
552fa2b972 pageserver: tweak oversized key read path warning (#9221)
## Problem

`Oversized vectored read [...]` logs are spewing in prod because we have
a few keys that
are unexpectedly large:
* reldir/relblock - these are unbounded, so it's known technical debt
* slru block - they can be a bit bigger than 128KiB due to storage
format overhead

## Summary of changes

* Bump threshold to 130KiB
* Don't warn on oversized reldir and dbdir keys 

Closes https://github.com/neondatabase/neon/issues/8967
2024-10-03 16:40:35 +01:00
Arpad Müller
9d93dd4807 Rename hyper 1.0 to hyper and hyper 0.14 to hyper0 (#9254)
Follow-up of #9234 to give hyper 1.0 the version-free name, and the
legacy version of hyper the one with the version number inside. As we
move away from hyper 0.14, we can remove the `hyper0` name piece by
piece.

Part of #9255
2024-10-03 16:33:43 +02:00
Heikki Linnakangas
53b6e1a01c vm-monitor: Upgrade axum from 0.6 to 0.7 (#9257)
Because:
- it's nice to be up-to-date,
- we already had axum 0.7 in our dependency tree, so this avoids having
to compile two versions, and
- removes one of the remaining dpendencies to hyper version 0

Also bumps the 'tokio-tungstenite' dependency, to avoid having two
versions in the dependency tree.
2024-10-03 16:49:39 +03:00
Folke Behrens
90f731f3b1 Merge pull request #9256 from neondatabase/rc/proxy/2024-10-03
Proxy release 2024-10-03
2024-10-03 11:01:41 +02:00
Joonas Koivunen
dbef1b064c chore: smaller layer changes (#9247)
Address minor technical debt in Layer inspired by #9224:

- layer usage as arg same as in spans
- avoid one Weak::upgrade
2024-10-03 09:38:45 +01:00
Heikki Linnakangas
6a9e2d657c Remove unnecessary dependencies from postgis-build image (#9211)
The apt install stage before this commit:

    0 upgraded, 391 newly installed, 0 to remove and 9 not upgraded.
    Need to get 261 MB of archives.

after:

    0 upgraded, 367 newly installed, 0 to remove and 9 not upgraded.
    Need to get 220 MB of archives.
2024-10-03 10:05:23 +03:00
Arpad Müller
2d8f6d7906 Suppress wal lag timeout warnings right after tenant attachment (#9232)
As seen in https://github.com/neondatabase/cloud/issues/17335, during
releases we can have ingest lags that are above the limits for warnings.
However, such lags are part of normal pageserver startup.

Therefore, calculate a certain cooldown timestamp until which we accept
lags up to a certain size. The heuristic is chosen to grow the later we get
to fully load the tenant, and we also add 60 seconds as a grace period
after that term.
2024-10-03 02:33:09 +01:00
Arpad Müller
1b176fe74a Use hyper 1.0 and tonic 0.12 in storage broker (#9234)
Fixes #9231 .

Upgrade hyper to 1.4.0 and use hyper 1.4 instead of 0.14 in the storage
broker, together with tonic 0.12. The two upgrades go hand in hand.

Thanks to the broker being independent from other components, we can
upgrade its hyper version without touching the other components, which
makes things easier.
2024-10-03 00:48:12 +02:00
Heikki Linnakangas
1dec93f129 Add compute_tools/ to the list of paths that trigger an E2E run on a PR (#9251)
compute_ctl is an important part of the interfaces between the control
plane and the compute, so it seems important to E2E test any changes
there.
2024-10-03 00:31:19 +03:00
Alexander Bayandin
16002f5e45 test_runner: bump requests and psycopg2-binary (#9248)
## Problem

```
Warning: The file chosen for install of requests 2.32.0 (requests-2.32.0-py3-none-any.whl) is yanked. Reason for being yanked: Yanked due to conflicts with CVE-2024-35195 mitigation
```

## Summary of changes
- Update `requests` to fix the warning
- Update `psycopg2-binary`
2024-10-02 21:26:45 +01:00
dotdister
09d4bad1be Change parentheses to clarify conditions in walproposer (#9180)
Some parentheses in conditional expressions are redundant or necessary
for clarity conditional expressions in walproposer.

## Summary of changes

Change some parentheses to clarify conditions in walproposer.

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2024-10-02 14:49:52 -04:00
Heikki Linnakangas
d20448986c Fix metric name of the 'getpage_wait_seconds_bucket' metric (#9242)
Per convention, histogram buckets have the '_bucket' suffix. I got that
wrong in commit 0d500bbd5b.

Fixes https://github.com/neondatabase/neon/issues/9241
2024-10-02 20:05:14 +03:00
John Spray
d54624153d tests: sync_after_each_test -> sync_between_tests (#9239)
## Problem

We are seeing frequent pageserver startup timelines while it calls
syncfs(). There is an existing fixture that syncs _after_ tests, but not
before the first one. We hypothesize that some failures are happening on
the first test in a job.

## Summary of changes

- extend the existing sync_after_each_test to be a sync between all
tests, including sync'ing before running the first test. That should
remove any ambiguity about whether the sync is happening on the correct
node.

This is an alternative to https://github.com/neondatabase/neon/pull/8957
-- I didn't realize until I saw Alexander's comment on that PR that we
have an existing hook that syncs filesystems and can be extended.
2024-10-02 17:44:25 +01:00
Alex Chi Z.
700885471f fix(test): only test num of L1 layers in compaction smoke test (#9186)
close https://github.com/neondatabase/neon/issues/9160

For whatever reason, pg17's WAL pattern seems different from others,
which triggers some flaky behavior within the compaction smoke test.

## Summary of changes

* Run L0 compaction before proceeding with the read benchmark.
* So that we can ensure the num of L0 layers is 0 and test the
compaction behavior only with L1 layers.

We have a threshold for triggering L0 compaction. In some cases, the
test case did not produce enough L0 layers to do a L0 compaction,
therefore leaving the layer map with 3+ L0 layers above the L1 layers.
This increases the average read depth for the timeline.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-02 17:42:35 +01:00
Vlad Lazar
38a8dcab9f storcon: add metric for long running reconciles (#9207)
## Problem

We don't have an alert for long running reconciles. Stuck reconciles are
problematic
as we've seen in a recent incident.

## Summary of changes

Add a new metric `storage_controller_reconcile_long_running_total` with
labels: `{tenant_id, shard_number, seq}`.
The metric is removed after the long running reconcile finishes. These
events should be rare, so we won't break
the bank on cardinality.

Related https://github.com/neondatabase/neon/issues/9150
2024-10-02 17:25:11 +01:00
Vlad Lazar
8dbfda98d4 storcon: ignore deleted timelines on new location catch-up (#9244)
## Problem

If a timeline was deleted right before waiting for LSNs to catch up
before the cut-over,
then we would wait forever. 

## Summary of changes

Fix the issue and add a test for timeline deletions mid migration. 

Related https://github.com/neondatabase/neon/issues/9144
2024-10-02 17:23:26 +01:00
John Spray
f875e107aa pageserver: tweak logging of "became visible" for layers (#9224)
## Problem

Recent change to avoid the "became visible" log messages from certain
tasks missed a task: the logical size calculation that happens as a
child of synthetic size calculation.

Related: https://github.com/neondatabase/neon/issues/9058

## Summary of changes

- Add OnDemandLogicalSize to the list of permitted tasks for reads
making a covered layer visible
- Tweak the log message to use layer name instead of key: this is more
terse, and easier to use when debugging, as one can search for it
elsewhere to see when the layer was written/downloaded etc.
2024-10-02 13:21:04 +01:00
Folke Behrens
1e90e792d6 proxy: Add timeout to webauth confirmation wait (#9227)
```shell
$ cargo run -p proxy --bin proxy -- --auth-backend=web --webauth-confirmation-timeout=5s
```

```
$ psql -h localhost -p 4432
NOTICE:  Welcome to Neon!
Authenticate by visiting within 5s:
    http://localhost:3000/psql_session/e946900c8a9bc6e9


psql: error: connection to server at "localhost" (::1), port 4432 failed: Connection refused
	Is the server running on that host and accepting TCP/IP connections?
connection to server at "localhost" (127.0.0.1), port 4432 failed: ERROR:  Disconnected due to inactivity after 5s.
```
2024-10-02 12:10:56 +02:00
Matthias van de Meent
ea32f1d0a3 Expose more granular wait event data to the user (#9163)
In PG17, there is this newfangled custom wait events system. This commit
adds that feature to Neon, so that users can see what their backends may
be waiting for when a PostgreSQL backend is playing the waiting game in
Neon code.
2024-10-02 11:12:50 +02:00
Heikki Linnakangas
2e3b7862d0 Fix compute metrics collector config (#9235) 2024-10-02 09:44:00 +01:00
Arpad Müller
387e569259 Update aws SDK crates (#9233)
This updates the aws SDK crates to their newest released versions.
2024-10-02 08:00:08 +02:00
Alex Chi Z.
31f12f6426 fix: ignore tonic to resolve advisories (#9230)
check-rust-style fails because tonic version too old, this does not seem
to be an easy fix, so ignore it from the deny list.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-01 19:26:54 -04:00
Heikki Linnakangas
8861e8a323 Fix the size of the perf counters shared memory array (#9226)
MaxBackends doesn't include auxiliary processes. Whenever an aux process
made IO operations that updated the counters, they would scribble over
shared memory beoynd the end of the array. The relsize cache hash table
comes after the array, so the symptom was an error about hash table
corruption in the relsize cache hash.
2024-10-01 20:07:51 +01:00
Arseny Sher
62e22dfd85 Backpressure: reset ps display after it is done. (#8980)
Previously we set the 'backpressure throttling' status, but overwrote
current one and never reset it back.
2024-10-01 20:55:05 +03:00
Arseny Sher
17672c88ff tests: wait walreceiver on sks to be gone on 'immediate' ep restart. (#9099)
When endpoint is stopped in immediate mode and started again there is a
chance of old connection delivering some WAL to safekeepers after second
start checked need for sync-safekeepers and thus grabbed basebackup LSN.
It makes basebackup unusable, so compute panics. Avoid flakiness by
waiting for walreceivers on safekeepers to be gone in such cases. A
better way would be to bump term on safekeepers if sync-safekeepers is
skipped, but it needs more infrastructure.

ref https://github.com/neondatabase/neon/issues/9079
2024-10-01 20:54:00 +03:00
Matthias van de Meent
6efdb1d0f3 Fix small memory accounting bug in libpagestore (#9223)
Found while searching for other issues in shared memory.

The bug should be benign, in that it over-allocates memory for this
struct, but doesn't allow for out-of-bounds writes.
2024-10-01 17:37:59 +01:00
Erik Grinaker
325de52e73 pageserver: remove TenantConfOpt::TryFrom<toml_edit::Item> (#9219)
Following #7656, `TenantConfOpt::TryFrom<toml_edit::Item>` appears to be
dead code. This patch removes `TenantConfOpt::TryFrom<toml_edit::Item>`.

The code does appear to be dead, since the TOML config is deserialized
into `TenantConfig` (via `LocationConfig`) and then converted into
`TenantConfOpt`.

This was verified by adding a panic to `try_from()` and running the
pageserver unit tests as well as a local end-to-end cluster (including
creating a new tenant and restarting the pageserver). This did not fail,
so this is not used on the common happy path at least. No explicit
`try_from` or `try_into` calls were found either.

Resolves #8918.
2024-10-01 16:35:18 +01:00
Anastasia Lubennikova
ce73db9316 Fix post_apply_config() (#9220)
Bring back post_apply_config() step 
that was accidentally removed in 78938d1
2024-10-01 16:28:58 +01:00
Shinya Kato
b675997f48 safekeeper: Fix a log message of HTTP worker (#9213)
## Problem
There is a wrong log message.

## Summary of changes
Fixed the log message.
2024-10-01 17:16:53 +02:00
Alex Chi Z.
49f99eb729 docs: add aux file v2 RFC (#9115)
aux v2 migration is near the end and I rewrote the RFC based on what I
proposed (several months before...) and what I actually implemented.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-01 15:56:54 +01:00
Heikki Linnakangas
0d500bbd5b Add new compute metrics to sql exporter (#9190)
These are the perf counters added in commit 263dfba6ee.

Note: This relies on 'neon' extension version 1.5. The default was
bumped to 1.5 in commit d696c41807.

---------

Co-authored-by: Matthias van de Meent <matthias@neon.tech>
2024-10-01 17:38:19 +03:00
Heikki Linnakangas
1b8b50755c Use debian packages for cmake again (#9212)
On bookworm, 'cmake' is new enough that we can just use it. On bullseye,
we can get a new-enough package from backports. By including 'cmake' in
the build-deps stage, we don't need to install it separately in all the
later build stages that need it.

See https://github.com/neondatabase/neon/pull/2699, where we switched to
downloading and building a specific version.
2024-10-01 15:09:09 +03:00
Conrad Ludgate
4391b25d01 proxy: ignore typ and use jwt.alg rather than jwk.alg (#9215)
Microsoft exposes JWKs without the alg header. It's only included on the
tokens. Not a problem.

Also noticed that wrt the `typ` header:
> It will typically not be used by applications when it is already known
that the object is a JWT. This parameter is ignored by JWT
implementations; any processing of this parameter is performed by the
JWT application.

Since we know we are expecting JWTs only, I've followed the guidance and
removed the validation.
2024-10-01 10:36:49 +01:00
John Spray
40b10b878a storage_scrubber: retry on index deletion failures (#9204)
## Problem

In automated tests running on AWS S3, we frequently see scrubber
failures when it can't delete an index.

`location_conf_churn`:

https://neon-github-public-dev.s3.amazonaws.com/reports/main/11076221056/index.html#/testresult/f89b1916b6a693e2

`scrubber_physical_gc`:

https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9178/11074269153/index.html#/testresult/9885ed5aa0fe38b6

## Summary of changes

Wrap index deletion in a backoff::retry

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-10-01 10:34:39 +01:00
David Gomes
d6c6b0a509 feat(compute): adds pg_session_jwt extension to compute image (#8888)
## Problem

We need the
[pg_session_jwt](https://github.com/neondatabase/pg_session_jwt/)
extension in the compute image. This PR adds it.

## Summary of changes

I added the `pg_session_jwt` extension in a very similar way to how the
pggraphql and pgtiktoken extensions were added (since they're all
written with pgrx). Then I tested this.

```
$ cd docker-compose/
$ PG_VERSION=16 TAG=10667533475 docker-compose up --build -d
$ psql postgresql://cloud_admin:cloud_admin@localhost:55433/postgres

cloud_admin@postgres=# create extension pg_session_jwt;
CREATE EXTENSION
Time: 43.048 ms

cloud_admin@postgres=# \df auth.*;
                              List of functions
┌────────┬──────────────────┬──────────────────┬─────────────────────┬──────┐
│ Schema │       Name       │ Result data type │ Argument data types │ Type │
├────────┼──────────────────┼──────────────────┼─────────────────────┼──────┤
│ auth   │ get              │ jsonb            │ s text              │ func │
│ auth   │ init             │ void             │ kid bigint, s jsonb │ func │
│ auth   │ jwt_session_init │ void             │ s text              │ func │
│ auth   │ user_id          │ text             │                     │ func │
└────────┴──────────────────┴──────────────────┴─────────────────────┴──────┘
(4 rows)

cloud_admin@postgres=# select auth.init(cast('1' as bigint), to_jsonb(TEXT '{ "kty": "EC", "kid": "571683be-33cf-4e67-bccc-8905c0ebb862", "crv": "P-521", "alg": "ES512", "x": "AM_GsnQvKML2yXdn_OsN8PdgO1Sf9XMXih5vQMKLmJkp-Iz_FFWJUt6uyR_qp4brr8Ji2kjGJgN4cQJpg2kskH7V", "y": "AZg-salw24lCmsBP-BCBa5jT6INkTwLtCOC7o0BIxDVvmIEH1-PQAJVYVJPTFvPMi_PLa0QlOm-ufJYkynwa2Mau" }'));
ERROR:  called `Result::unwrap()` on an `Err` value: Error("invalid type: string \"{ \\\"kty\\\": \\\"EC\\\", \\\"kid\\\": \\\"571683be-33cf-4e67-bccc-8905c0ebb862\\\", \\\"crv\\\": \\\"P-521\\\", \\\"alg\\\": \\\"ES512\\\", \\\"x\\\": \\\"AM_GsnQvKML2yXdn_OsN8PdgO1Sf9XMXih5vQMKLmJkp-Iz_FFWJUt6uyR_qp4brr8Ji2kjGJgN4cQJpg2kskH7V\\\", \\\"y\\\": \\\"AZg-salw24lCmsBP-BCBa5jT6INkTwLtCOC7o0BIxDVvmIEH1-PQAJVYVJPTFvPMi_PLa0QlOm-ufJYkynwa2Mau\\\" }\", expected struct JwkEcKey", line: 0, column: 0)
Time: 6.991 ms
```

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Move the download location to a proper URL
2024-10-01 10:29:56 +01:00
John Spray
d515727e94 tests: make test_multi_attach more stable (#9202)
## Problem

`test_multi_attach` is sometimes failing with `invalid compute status
for configuration request: Configuration`. This is likely a result of
the test attempting to reconfigure the compute at the same time as the
storage controller is doing so.

This test was originally written before the storage controller existed,
and is not expecting anything else to be reconfiguring computes at the
same time.

## Summary of changes

- Configure the tenant into scheduling policy `Stop` in the storage
controller at the start of the test, so that it won't try to do anything
to the tenant while the test is running.
2024-10-01 10:15:18 +01:00
Folke Behrens
2e508b1ff9 Upgrade OpenTelemetry and other tracing crates (#9200)
* tracing-utils now returns a `Layer` impl. Removes the need for crates
to
  import OTel crates.
* Drop the /v1/traces URI check. Verified that the code does the right
thing.
* Leave a TODO to hook in an error handler for OTel to log errors to
when it
  assumes the regular pipeline cannot be used/is broken.
2024-10-01 11:02:54 +02:00
John Spray
651ae44569 storage controller: drop out of blocking compute notification loop if migration origin becomes unavailable (#9147)
## Problem

The live migration code waits forever for the compute notification hook,
on the basis that until it succeeds, the compute is probably using the
old location and we shouldn't detach it.

However, if a pageserver stops or restarts in the background, then this
original location might no longer be available, so there is no point
waiting. Waiting is also actively harmful, because it prevents other
reconciliations happening for the tenant shard, such as during an
upgrade where a stuck "drain" migration might prevent the later "fill"
migration from moving the shard back to its original location.

## Summary of changes

- Refactor the notification wait loop into a function
- Add a checks during the loop, for the origin node's cancellation token
and an explicit HTTP request to the origin node to confirm the shard is
still attached there.

Closes: https://github.com/neondatabase/neon/issues/8901
2024-10-01 07:57:22 +00:00
Heikki Linnakangas
65bda19051 Remove unnecessary dev package from compute image (#9210)
libcurl4-openssl-dev is needed to build pgxn/, but libcurl4 is enough at
runtime.
2024-10-01 01:07:43 +03:00
Conrad Ludgate
94a5ca2817 proxy: auth broker (#8855)
Opens http2 connection to local-proxy and forwards requests over with
all headers and body

closes https://github.com/neondatabase/cloud/issues/16039
2024-09-30 20:43:45 +01:00
Arthur Petukhovsky
c07cea80bd Bump vm-builder v0.29.3 -> v0.35.0 (#9208)
We haven't updated it for a while. Now I need the update to add quotas
support to compute images
(https://github.com/neondatabase/cloud/issues/13127).

Previous update: https://github.com/neondatabase/neon/pull/7849
2024-09-30 19:18:42 +01:00
Conrad Ludgate
a2e2362ee9 add proxy-protocol header disable option (#9203)
resolves https://github.com/neondatabase/cloud/issues/18026
2024-09-30 18:11:50 +00:00
Heikki Linnakangas
0a567acdb9 tests: Move comment to more appropriate place
There is no 'pg_bin' in NeonEnv.
2024-09-30 17:56:43 +03:00
Heikki Linnakangas
69ea2776e9 tests: Remove creation of extra timelines in some tests
neon_cli.create_tenant() creates a new tenant *and* a timeline on the
tenant, with name "main". In most tests, there's no need to create
another timeline on the same tenant.

There are some more tests that do that, but in the remaining cases, I
wasn't be 100% if the presence of extra root timelines affect what the
tests test, so I left them alone.
2024-09-30 17:56:40 +03:00
Heikki Linnakangas
4dc9cb7cf9 tests: Remove some spurious list_timelines calls
These calls seem really out of place. We know what the initial tenant
and branch are in these tests, just like in all other tests.
2024-09-30 17:56:37 +03:00
John Spray
7424e7269c tests: longer timeout in test_delete_timeline_client_hangup (#9161)
## Problem

This test waits for a request to finish, and then expects deletion to
complete almost immediately. The request completes, but it's a 202, the
timeline is still deleting in the background: we need to be more
patient.

## Summary of changes

- Adjust iterations from 2 to 10 when waiting for deletion
2024-09-30 15:46:07 +01:00
a-masterov
5dc68e4e6a test_compatibility: fix the regexes detecting the version (#9205)
## Problem
The Neon components, built locally and by the GitHub workflow have
slightly different version prefixes (git: vs git-env:)
This does not allow running tests against local builds correctly.

## Summary of changes
The regular expressions were changed to work with both
prefixes.
2024-09-30 16:37:14 +02:00
John Spray
7cfd116856 pageserver: refactor immediate_gc into TenantManager (#9183)
## Problem

Legacy functions that were called as `mgr::` and relied on the static
TENANTS, see #5796

## Summary of changes

- Move the last stray function (immediate_gc) into TenantManager

Closes: https://github.com/neondatabase/neon/issues/5796
2024-09-30 09:27:28 +01:00
Heikki Linnakangas
d696c41807 Bump default neon extension version to 1.5 (#9188)
Commit 263dfba6ee introduced neon extension version 1.5, which included
some new functions and views for metrics. It didn't bump the default
neon extension number yet, so that we could still safely roll back to
the old binary if necessary. This bumps the default version.
2024-09-30 09:20:52 +03:00
Alexander Bayandin
3c72192065 CI(benchmarking): fix setting LD_LIBRARY_PATH (#9191)
## Problem

`pgbench-pgvector` job from Nightly Benchmarks fails with the error:

```
/__w/_temp/f45bc2eb-4c4c-4f0a-8030-99079303fa65.sh: line 17: LD_LIBRARY_PATH: unbound variable
```

## Summary of changes
- Fix `LD_LIBRARY_PATH: unbound variable` error in benchmarks
2024-09-29 22:27:53 +00:00
Alexander Bayandin
d2d9921761 CI(benchmarking): fix Nightly Benchmarks (#9178)
## Problem

Nightly Benchmarks have been broken for some time due to various
reasons, this PR fixes it

## Summary of changes
- Pull `build-tools` image from dockerhub for `benchmarking` workflow
- Use `aws-actions/configure-aws-credentials` to upload/download
artifacts from S3
- Fix Postgres 16 installation (for pgbench)
2024-09-28 02:44:22 +01:00
Arthur Petukhovsky
ba498a630a Set disk quotas on bind in compute_ctl (#8936)
Part of https://github.com/neondatabase/cloud/issues/13127. Resolves
#9153

What changed in this PR:
1. Adds `ComputeSpec.disk_quota_bytes: Option<u64>`
2. Adds new arg to compute_ctl: `--set-disk-quota-for-fs <mountpoint>`
3. Implements running `/neonvm/bin/set-disk-quota` with the right value
if both cmdline arg AND field in the spec are specified
4. Patches `/etc/sudoers.d` to allow `compute_ctl` to set quota with
sudo

This PR is very similar to the swap support added earlier, you can take
a look at it as prior art: #7434

In theory, it can be implemented outside of compute_ctl when we will
have a separate neonvm daemon, but we are not there yet. Current
implementation is the simplest possible to unblock computes with larger
disks.

All code related to usage of `/neonvm/bin/set-disk-quota` is located in
`disk_quota.rs`. We need to call this script with the following
arguments: `/neonvm/bin/set-disk-quota {size_kb} {mountpoint}`. Quotas
are set on the filesystem level, so we need to provide path to the
directory that filesystem was mounted to.

I tested this change locally with
https://github.com/neondatabase/cloud/pull/17270. It should be safe to
merge, because this feature is gated by both cmdline arg and field in
the spec. If control-plane doesn't set values in both places,
compute_ctl won't be affected by this change.
2024-09-27 20:52:22 +01:00
Heikki Linnakangas
e989a5e4a2 neon_local: Use clap derive macros to parse the CLI args (#9103)
This is easier to work with.
2024-09-27 22:08:46 +03:00
Alex Chi Z.
cde1654d7b fix(pageserver): abort process if fsync fails (#9108)
close https://github.com/neondatabase/neon/issues/8140

The original issue is rather vague on what we should do. After
discussion w/ @problame we decided to narrow down the problems we want
to solve in that issue.

* read path -- do not panic for now.
* write path -- panic only on write errors (i.e., device error, fsync
error), but not on no-space for now.

The guideline is that if the pageserver behavior could lead to violation
of persistent constraints (i.e., return an operation as successful but
not actually persisting things), we should panic. Fsync is the place
where both of us agree that we should panic, because if fsync fails, the
kernel will mark dirty pages as clean, and the next fsync will not
necessarily return false. This would make the storage client assume the
operation is successful.

## Summary of changes

Make fsync panic on fatal errors.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-09-27 19:58:50 +01:00
Heikki Linnakangas
cf6a776fcf tests: Reduce the # of iterations in safekeeper::test_random_schedules (#9182)
To make it faster. On my laptop, it takes about 30 before this commit.
In the arm64 debug variant in CI, it takes about 120 s. Reduce it by
factor of 4.
2024-09-27 16:25:35 +00:00
Matthias van de Meent
5c5871111a WalProposer: Read WAL directly from WAL buffers in PG17 (#9171)
This reduces the overhead of the WalProposer when it is not being
throttled by SK WAL acceptance rate
2024-09-27 17:47:05 +02:00
Yuchen Liang
d56c4e7a38 pageserver: remove AdjacentVectoredReadBuilder and bump minmimum io_buffer_alignment to 512 (#9175)
Part of #8130

## Problem

After deploying https://github.com/neondatabase/infra/pull/1927, we
shipped `io_buffer_alignment=512` to all prod region. The
`AdjacentVectoredReadBuilder` code path is no longer taken and we are
running pageserver unit tests 6 times in the CI. Removing it would
reduce the test duration by 30-60s.

## Summary of changes

- Remove `AdjacentVectoredReadBuilder` code.
- Bump the minimum `io_buffer_alignment` requirement to at least 512
bytes.
- Use default `io_buffer_alignment` for Rust unit tests.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-09-27 16:41:42 +01:00
Conrad Ludgate
43b2445d0b proxy: add jwks endpoint to control plane and mock providers (#9165) 2024-09-27 16:08:43 +01:00
Yuchen Liang
42ef08db47 fix(pageserver): LSN lease edge cases around restarts/migrations (#9055)
Part of #7497, closes #8817.

## Problem

See #8817. 

## Summary of changes

**compute_ctl**

- Renew lsn lease as soon as `/configure` updates pageserver_connstr,
use `state_changed` Condvar for synchronization.

**pageserver**

As mentioned in
https://github.com/neondatabase/neon/issues/8817#issuecomment-2315768076,
we still want some permanent error reported if a lease cannot be
granted. By considering attachment mode and the added
`lsn_lease_deadline` when processing lease requests, we can also bound
the case of bad requests to a very short period after migration/restart.

- Refactor https://github.com/neondatabase/neon/pull/9024 and move
`lsn_lease_deadline` to `AttachedTenantConf` so timeline can easily
access it.
- Have separate HTTP `init_lsn_lease` and  libpq `renew_lsn_lease` API.
  - Always do LSN verification for the initial HTTP lease request.
- LSN verification for the renewal is **still done** when tenants are
not in `AttachedSingle` and we have pass the `lsn_lease_deadline`, which
give plenty of time for compute to renew the lease.
 
**neon_local**

- add and call `timeline_init_lsn_lease` mgmt_api at static endpoint
start. The initial lsn lease http request is sent when we run `cargo
neon endpoint start <static endpoint>`.


## Testing

- Extend `test_readonly_node_gc` to do pageserver restarts and
migration.

## Future Work

- The control plane should make the initial lease request through HTTP
when creating a static endpoint. This is currently only done in
`neon_local`.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-09-27 09:56:52 -04:00
Tristan Partin
fc962c9605 Use long options when calling initdb
Verbosity in this case is good when reading the code. Short options are
better when operating in an interactive shell.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-09-27 08:22:16 -05:00
Heikki Linnakangas
357fa070a3 Add gdb to build-tools (#9125)
So that compute_ctl can use it to print backtrace on core dumps

See issue #2800.
2024-09-27 15:36:24 +03:00
Heikki Linnakangas
02cdd37b56 Dump backtrace if a core dump is called just "core" (#9125)
I hope this lets us capture backtraces in CI. At least it makes it
work on my laptop, which is valuable even if we need to do more for
CI.

See issue #2800.
2024-09-27 15:36:24 +03:00
Vlad Lazar
fa354a65ab libs: improve logging on PG connection errors (#9130)
## Problem
We get some unexpected errors, but don't know who they're happening for.

## Summary of change
Add tenant id and peer address to PG connection error logs.

Related https://github.com/neondatabase/cloud/issues/17336
2024-09-27 12:36:43 +01:00
Arseny Sher
40f7930a7d safekeeper: skip syncfs on start if --no-sync is specified. (#9166)
https://neondb.slack.com/archives/C059ZC138NR/p1727350911890989?thread_ts=1727350211.370869&cid=C059ZC138NR
2024-09-27 09:59:38 +03:00
Conrad Ludgate
ec07a1ecc9 proxy: make local-proxy config by signal with PID, refine JWKS apis with role caching (#9164) 2024-09-26 19:01:48 +01:00
Arseny Sher
c4cdfe66ac Fix flakiness of test_timeline_copy.
Timeline might be not initialized when timeline_start_lsn is
queried. Spotted by CI.
2024-09-26 19:01:45 +03:00
Alex Chi Z.
42e19e952f fix(pageserver): categorize client error in basebackup metrics (#9110)
We separated client error from basebackup error log lines in
https://github.com/neondatabase/neon/pull/7523, but we didn't do
anything for the metrics. In this patch, we fixed it.

ref https://github.com/neondatabase/neon/issues/8970

## Summary of changes

We use the same criteria as in `log_query_error` producing an info line
(instead of error) for the metrics. We added a `client_error` category
for the basebackup query time metrics.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-09-26 11:38:19 -04:00
John Spray
3d255d601b pageserver: rename control plane client & chunk validation requests (#8997)
## Problem

- In https://github.com/neondatabase/neon/pull/8784, the validate
controller API is modified to check generations directly in the
database. It batches tenants into separate queries to avoid generating a
huge statement, but
- While updating this, I realized that "control_plane_client" is a kind
of confusing name for the client code now that it primarily talks to the
storage controller (the case of talking to the control plane will go
away in a few months).

## Summary of changes

- Big rename to "ControllerUpcallClient" -- this reflects the storage
controller's api naming, where the paths used by the pageserver are in
`/upcall/`
- When sending validate requests, break them up into chunks so that we
avoid possible edge cases of generating any HTTP requests that require
database I/O across many thousands of tenants.

This PR mixes a functional change with a refactor, but the commits are
cleanly separated -- only the last commit is a functional change.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-09-26 16:06:34 +01:00
Arthur Petukhovsky
80e974d05b fix(compute_ctl): race condition in configurator (#9162)
There was a tricky race condition in compute_ctl, that sometimes makes
configurator skip updates. It makes a deadlock because:
- control-plane cannot configure compute, because it's in
ConfigurationPending state
- compute_ctl doesn't do any reconfiguration because
`configurator_main_loop` missed notification for it

Full sequence that reproduces the issue:
1. `start_compute` finishes works and changes status
`self.set_status(ComputeStatus::Running);`
2. configurator received update about `Running` state and dropped the
mutex lock in the iteration
3. `/configure` request was triggered at the same time as step 1, and
got the mutex lock
4. same `/configure` request set the spec and updated the state to
`ConfigurationPending`, also sent a notification
5. next iteration in configurator got the mutex lock, but missed the
notification

There are more details in this slack thread:
https://neondb.slack.com/archives/C03438W3FLZ/p1727281028478689?thread_ts=1727261220.483799&cid=C03438W3FLZ

---------

Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>
2024-09-26 15:42:17 +01:00
Alexander Bayandin
7fdf1ab5b6 CI: run compatibility tests on Postgres 17 (#9145)
## Problem

The latest storage release has generated artifacts for Postgres 17,
so we can enable compatibility tests this version

## Summary of changes
- Unskip `test_backward_compatibility` / `test_forward_compatibility` on
Postgres 17
2024-09-26 15:17:01 +01:00
Conrad Ludgate
7736b748d3 Merge pull request #9159 from neondatabase/rc/proxy/2024-09-26
Proxy release 2024-09-26
2024-09-26 09:22:33 +01:00
Arpad Müller
7bae78186b Forbid creation of child timelines of archived timeline (#9122)
We don't want to allow any new child timelines of archived timelines. If
you want any new child timelines, you should first un-archive the
timeline.
 
Part of #8088
2024-09-26 02:05:25 +02:00
Heikki Linnakangas
7e560dd00e chore: Silence clippy warning with nightly (#9157)
The warning:

    warning: first doc comment paragraph is too long
      --> pageserver/src/tenant/checks.rs:7:1
       |
7 | / /// Checks whether a layer map is valid (i.e., is a valid result
of the current compaction algorithm if no...
8 | | /// The function checks if we can split the LSN range of a delta
layer only at the LSNs of the delta layer...
    9  | | ///
    10 | | /// ```plain
       | |_
       |
= help: for further information visit
https://rust-lang.github.io/rust-clippy/master/index.html#too_long_first_doc_paragraph
= note: `#[warn(clippy::too_long_first_doc_paragraph)]` on by default
    help: add an empty line
       |
7 ~ /// Checks whether a layer map is valid (i.e., is a valid result of
the current compaction algorithm if nothing goes wrong).
    8  + ///
       |

Fix by applying the suggestion.
2024-09-25 21:29:16 +00:00
Tristan Partin
684e924211 Fix compute_logical_snapshot_files for v14
The function, pg_ls_logicalsnapdir(), was added in version 15.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-09-25 16:25:17 -05:00
Tristan Partin
8ace9ea25f Format long single DATA line in pgxn/Makefile
This should be a little more readable.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-09-25 16:25:17 -05:00
Alex Chi Z.
6a4f49b08b fix(pageserver): passthrough partition cancel error (#9154)
close https://github.com/neondatabase/neon/issues/9142

## Summary of changes

passthrough CollectKeyspaceError::Cancelled to
CompactionError::ShuttingDown

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-09-25 21:35:33 +01:00
Alexander Bayandin
c6e89445e2 CI(promote-images): fix prod ECR auth (#9146)
A cherry-pick from the previous release (#9131)

## Problem
Login to prod ECR doesn't work anymore:
```
Retrieving registries data through *** SDK...
*** ECR detected with eu-central-1 region
Error: The security token included in the request is invalid.
```

## Summary of changes
- Fix login to prod ECR by using `aws-actions/configure-aws-credentials`
2024-09-25 18:22:39 +01:00
Vlad Lazar
04f32b9526 tests: remove patching up of az id column (#8968)
This was required since the compat tests used a snapshot generated from
a version of neon local which didn't contain the availability_zone_id
column.
2024-09-25 17:22:32 +01:00
Heikki Linnakangas
6f2333f52b CI: Leave out unnecessary build files from binary artifact (#9135)
The pg_install/build directory contains .o files and such intermediate
results from the build, which are not needed in the final tarball.
Except for src/test/regress/regress.so and a few other .so files in that
directory; keep those.

This reduces the size of the neon-Linux-X64-release-artifact.tar.zst
artifact from about 1.5 GB to 700 MB.

(I attempted this a long time ago already, by moving the build/
directory out of pg_install altogether, see PR #2127. But I never got
around to finish that work.)

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2024-09-25 19:07:20 +03:00
Yuchen Liang
d447f49bc3 fix(pageserver): handle lsn lease requests for unnormalized lsns (#9137)
Fixes https://github.com/neondatabase/neon/issues/9098.

## Problem

See
https://github.com/neondatabase/neon/issues/9098#issuecomment-2372484969.

### Related

A similar problem happened with branch creation, which was discussed
[here](https://github.com/neondatabase/neon/pull/2143#issuecomment-1199969052)
and fixed by https://github.com/neondatabase/neon/pull/2529.

## Summary of changes

- Normalize the lsn on pageserver side upon lsn lease request, stores
the normalized LSN.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-09-25 14:57:38 +00:00
Vlad Lazar
c5972389aa storcon: include timeline ID in LSN waiting logs (#9141)
## Problem
Hard to tell which timeline is holding the migration.

## Summary of Changes
Add timeline id to log.
2024-09-25 15:54:41 +01:00
Matthias van de Meent
c4f5736d5a Build images for PG17 using Debian 12 "Bookworm" (#9132)
This increases the support window of the OS used for PG17 by 2 years
compared to the previous usage of Debian 11 "Bullseye".
2024-09-25 17:50:05 +03:00
Alexey Kondratov
518f598e2d docs(rfc): Independent compute release flow (#8881)
Related to https://github.com/neondatabase/cloud/issues/11698
2024-09-25 16:24:09 +02:00
John Spray
4b711caf5e storage controller: make proxying of GETs to pageservers more robust (#9065)
## Problem

These commits are split off from
https://github.com/neondatabase/neon/pull/8971/commits where I was
fixing this to make a better scale test pass -- Vlad also independently
recognized these issues with cloudbench in
https://github.com/neondatabase/neon/issues/9062.

1. The storage controller proxies GET requests to pageservers based on
their intent, not the ground truth of where they're really attached.
2. Proxied requests can race with scheduling to tenants, resulting in
404 responses if the request hits the wrong pageserver.

Closes: https://github.com/neondatabase/neon/issues/9062

## Summary of changes

1. If a shard has a running reconciler, then use the database
generation_pageserver to decide who to proxy the request to
2. If such a request gets a 404 response and its scheduled node has
changed since the request was dispatched.
2024-09-25 13:56:39 +00:00
Vlad Lazar
2cf47b1477 storcon: do az aware scheduling (#9083)
## Problem

Storage controller didn't previously consider AZ locality between
compute and pageservers
when scheduling nodes. Control plane has this feature, and, since we are
migrating tenants
away from it, we need feature parity to avoid perf degradations.

## Summary of changes

The change itself is fairly simple:
1. Thread az info into the scheduler
2. Add an extra member to the scheduling scores

Step (2) deserves some more discussion. Let's break it down by the shard
type being scheduled:

**Attached Shards**

We wish for attached shards of a tenant to end up in the preferred AZ of
the tenant since that
is where the compute is like to be. 

The AZ member for `NodeAttachmentSchedulingScore` has been placed
below the affinity score (so it's got the second biggest weight for
picking the node). The rationale for going
below the affinity score is to avoid having all shards of a single
tenant placed on the same node in 2 node
regions, since that would mean that one tenant can drive the general
workload of an entire pageserver.
I'm not 100% sure this is the right decision, so open to discussing
hoisting the AZ up to first place.

 **Secondary Shards**

We wish for secondary shards of a tenant to be scheduled in a different
AZ from the preferred one
for HA purposes.

The AZ member for `NodeSecondarySchedulingScore` has been placed first,
so nodes in different AZs
from the preferred one will always be considered first. On small
clusters, this can mean that all the secondaries
of a tenant are scheduled to the same pageserver, but secondaries don't
use up as many resources as the
attached location, so IMO the argument made for attached shards doesn't
hold.

Related: https://github.com/neondatabase/neon/issues/8848
2024-09-25 14:31:04 +01:00
Folke Behrens
7dcfcccf7c Re-export git-version from utils and remove as direct dep (#9138) 2024-09-25 14:38:35 +02:00
Vlad Lazar
a26cc29d92 storcon: add tags to scheduler logs (#9127)
We log something at info level each time we schedule a shard to a
non-secondary location.

Might as well have context for it.
2024-09-25 10:16:06 +01:00
Alex Chi Z.
5f2f31e879 fix(test): storage scrubber should only log to stdout with info (#9067)
As @koivunej mentioned in the storage channel, for regress test, we
don't need to create a log file for the scrubber, and we should reduce
noisy logs.

## Summary of changes

* Disable log file creation for storage scrubber
* Only log at info level

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-09-24 22:33:03 +00:00
Damian972
938b163b42 chore(docker-compose): fix typo in readme (#9133)
Typo in the readme inside docker-compose folder

## Summary of changes
- Update the readme
2024-09-24 18:05:23 -04:00
Heikki Linnakangas
5cbf5b45ae Remove TenantState::Loading (#9118)
The last real use was removed in commit de90bf4663. It was still used in
a few unit tests, but they can use Attaching too.
2024-09-24 20:58:54 +00:00
Heikki Linnakangas
af5c54ed14 test: Make test_lfc_resize more robust (#9117)
1. Increase statement_timeout. It defaults to 120 s, which is not quite
enough on slow or busy systems with debug build. On my laptop, the index
creation takes about 100 s. On buildfarm, we've seen failures, e.g:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9084/10997888708/index.html#suites/821f97908a487f1d7d3a2a4dd1571e99/db1834bddfe8c5b9/

2. Keep twiddling the LFC size through the whole test. Before, we would
do it for the first 10 seconds, but that only covers a small part of the
pgbench initialization phase. Change the loop so that the pgbench run
time determines how long the test runs, and we keep changing the LFC for
the whole time.

In the passing, also fix bogus test description, copy-pasted from a
completely unrelated test.
2024-09-24 23:38:16 +03:00
Alexander Bayandin
523cf71721 Fix compiler warnings on macOS (#9128)
## Problem

Compilation of neon extension on macOS produces a warning
```
pgxn/neon/neon_perf_counters.c:50:1: error: non-void function does not return a value [-Werror,-Wreturn-type]
```

## Summary of changes
- Change the return type of `NeonPerfCountersShmemInit` to void
2024-09-24 18:11:31 +00:00
Arpad Müller
c47f355ec1 Catch Cancelled and don't print a warning for it (#9121)
In the `imitate_synthetic_size_calculation_worker` function, we might
obtain the `Cancelled` error variant instead of hitting the cancellation
token based path. Therefore, catch `Cancelled` and handle it analogously
to the cancellation case.
 
Fixes #8886.
2024-09-24 17:28:56 +00:00
Yuchen Liang
4f67b0225b pageserver: handle decompression outside vectored read_blobs (#8942)
Part of #8130.

## Problem

Currently, decompression is performed within the `read_blobs`
implementation and the decompressed blob will be appended to the end of
the `BytesMut` buffer. We will lose this flexibility of extending the
buffer when we switch to using our own dio-aligned buffer (WIP in
https://github.com/neondatabase/neon/pull/8730). To facilitate the
adoption of aligned buffer, we need to refactor the code to perform
decompression outside `read_blobs`.

## Summary of changes

- `VectoredBlobReader::read_blobs` will return `VectoredBlob` without
performing decompression and appending decompressed blob. It becomes the
caller's responsibility to decompress the buffer.
- Added a new `BufView` type that functions as `Cow<Bytes, &[u8]>`.
- Perform decompression within `VectoredBlob::read` so that people don't
have to explicitly thinking about compression when using the reader
interface.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-09-24 16:41:38 +00:00
Heikki Linnakangas
2f7cecaf6a test: Poll pageserver availability more aggressively at test startup
Even with the 100 ms interval, on my laptop the pageserver always
becomes available on second attempt, so this saves about 900 ms at
every test startup.
2024-09-24 17:16:43 +03:00
Heikki Linnakangas
589594c2e1 test: Skip fsync when initdb'ing the storage controller db
After initdb, we configure it with "fsync=off" anyway.
2024-09-24 17:16:43 +03:00
Heikki Linnakangas
70fe007519 test: Make test_hot_standby_feedback more forgiving of slow initialization (#9113)
Don't start waiting for the index to appear in the secondary until it
has been created in the primary. Before, if the "pgbench -i" step took
more than 60 s, we would give up.

There was a flaky test failure along those lines at:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9105/10997477941/index.html#suites/950eff205b552e248417890b8b8f189e/73cf4b5648fa6f74/
Hopefully, this avoids such failures in the future.
2024-09-24 16:41:59 +03:00
a-masterov
b224a5a377 Move the patch to compute (#9120)
## Problem
All the other patches were moved to the compute directory, and only one
was left in the patches subdirectory in the root directory.

## Summary of changes
The patch was moved to the compute directory as others
2024-09-24 15:13:18 +02:00
Christian Schwarz
a65d437930 chore(#9077): cleanups & code dedup (#9082)
Punted from https://github.com/neondatabase/neon/pull/9077
2024-09-24 13:05:07 +00:00
Matthias van de Meent
fc67f8dc60 Update PostgreSQL 17 from 17rc1 to 17.0 (#9119)
The PostgreSQL 17 vendor module is now based on postgres/postgres @
d7ec59a63d745ba74fba0e280bbf85dc6d1caa3e, presumably the final code
change before the V17 tag.
2024-09-24 14:15:52 +02:00
Folke Behrens
2b65a2b53e proxy: check if IP is allowed during webauth flow (#9101)
neondatabase/cloud#12018
2024-09-24 11:52:25 +02:00
Vlad Lazar
9490360df4 storcon: improve initial shard scheduling (#9081)
## Problem

Scheduling on tenant creation uses different heuristics compared to the
scheduling done during
background optimizations. This results in scenarios where shards are
created and then immediately
migrated by the optimizer. 

## Summary of changes

1. Make scheduler aware of the type of the shard it is scheduling
(attached vs secondary).
We wish to have different heuristics.
2. For attached shards, include the attached shard count from the
context in the node score
calculation. This brings initial shard scheduling in line with what the
optimization passes do.
3. Add a test for (2).

This looks like a bigger change than required, but the refactoring
serves as the basis for az-aware
shard scheduling where we also need to make the distinction between
attached and secondary shards.

Closes https://github.com/neondatabase/neon/issues/8969
2024-09-24 09:03:41 +00:00
a-masterov
91d947654e Add regression tests for a cloud-based Neon instance (#8681)
## Problem
We need to be able to run the regression tests against a cloud-based
Neon staging instance to prepare the migration to the arm architecture.

## Summary of changes
Some tests were modified to work on the cloud instance (i.e. added
passwords, server-side copy changed to client-side, etc)

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2024-09-24 09:44:45 +02:00
Yuchen Liang
37aa6fd953 scrubber: retry when missing index key in the listing (#8873)
Part of #8128, fixes #8872.

## Problem

See #8872.

## Summary of changes

- Retry `list_timeline_blobs` another time if 
  - there are layer file keys listed but not index.
  - failed to download index.
- Instrument code with `analyze-tenant` and `analyze-timeline` span.
- Remove `initdb_archive` check, it could have been deleted.
- Return with exit code 1 on fatal error if `--exit-code` parameter is set.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-09-23 21:58:12 +00:00
Heikki Linnakangas
3ad567290c Move metric exporter and pgbouncer config files
Instead of adding them to the VM image late in the build process, when
putting together the final VM image, include them in the earlier
compute image already. That makes it more convenient to edit the
files, and to test them.
2024-09-24 00:35:52 +03:00
Heikki Linnakangas
3a110e45ed Move files related to building compute image into compute/ dir
Seems nice to keep all these together. This also provides a nice place
for a README file to describe the compute image build process. For
now, it briefly describes the contents of the directory, but can be
expanded.
2024-09-24 00:35:52 +03:00
Heikki Linnakangas
e7e6319e20 Fix compiler warnings with nightly rustc about elided lifetimes having names (#9105)
The warnings:

    warning: elided lifetime has a name
        --> pageserver/src/metrics.rs:1386:29
         |
    1382 |     pub(crate) fn start_timer<'c: 'a, 'a>(
| -- lifetime `'a` declared here
    ...
    1386 |     ) -> Option<impl Drop + '_> {
| ^^ this elided lifetime gets resolved as `'a`
         |
         = note: `#[warn(elided_named_lifetimes)]` on by default

    warning: elided lifetime has a name
        --> pageserver/src/metrics.rs:1537:46
         |
    1534 |     pub(crate) fn start_recording<'c: 'a, 'a>(
| -- lifetime `'a` declared here
    ...
    1537 |     ) -> BasebackupQueryTimeOngoingRecording<'_, '_> {
| ^^ this elided lifetime gets resolved as `'a`

    warning: elided lifetime has a name
        --> pageserver/src/metrics.rs:1537:50
         |
    1534 |     pub(crate) fn start_recording<'c: 'a, 'a>(
| -- lifetime `'a` declared here
    ...
    1537 |     ) -> BasebackupQueryTimeOngoingRecording<'_, '_> {
| ^^ this elided lifetime gets resolved as `'a`

    warning: elided lifetime has a name
        --> pageserver/src/tenant.rs:3630:25
         |
    3622 |     async fn prepare_new_timeline<'a>(
| -- lifetime `'a` declared here
    ...
    3630 |     ) -> anyhow::Result<UninitializedTimeline> {
| ^^^^^^^^^^^^^^^^^^^^^ this elided lifetime gets resolved as `'a`
2024-09-23 23:31:32 +02:00
Matthias van de Meent
d865881d59 NOAI (#9084)
We can't FlushOneBuffer when we're in redo-only mode on PageServer, so
make execution of that function conditional on us not running in
pageserver walredo mode.
2024-09-23 21:16:42 +00:00
Konstantin Knizhnik
1c5d6e59a0 Maintain number of used pages for LFC (#9088)
## Problem

LFC cache entry is chunk (right now size of chunk is 1Mb). LFC
statistics shows number of chunks, but not number of used pages. And
autoscaling team wants to know how sparse LFC is:
https://neondb.slack.com/archives/C04DGM6SMTM/p1726782793595969
It is possible to obtain it from the view `select count(*) from
local_cache`.
Nut it is expensive operation, enumerating all entries in LFC under
lock.

## Summary of changes

This PR added "file_cache_used_pages" to `neon_lfc_stats` view:
```
 select * from neon_lfc_stats;
        lfc_key        | lfc_value 
-----------------------+-----------
 file_cache_misses     |   3139029
 file_cache_hits       |   4098394
 file_cache_used       |      1024
 file_cache_writes     |   3173728
 file_cache_size       |      1024
 file_cache_used_pages |     25689
(6 rows)
```

Please notice that this PR doesn't change neon extension API, so no need
to create new version of Neon extension.
 
## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-09-23 22:05:32 +03:00
Heikki Linnakangas
263dfba6ee Add views for metrics about pageserver requests (#9008)
The metrics include a histogram of how long we need to wait for a
GetPage request, number of reconnects, and number of requests among
other things.

The metrics are not yet exported anywhere, but you can query them
manually.

Note: This does *not* bump the default version of the 'neon' extension. We
will do that later, as a separate PR. The reason is that this allows us to roll back
the compute image smoothly, if necessary. Once the image that includes the
new extension .so file with the new functions has been rolled out, and we're
confident that we don't need to roll back the image anymore, we can change
default extension version and actually start using the new functions and views.

This is what the view looks like:

```
postgres=# select * from neon_perf_counters ;
                metric                 | bucket_le |  value   
---------------------------------------+-----------+----------
 getpage_wait_seconds_count            |           |      300
 getpage_wait_seconds_sum              |           | 0.048506
 getpage_wait_seconds_bucket           |     2e-05 |        0
 getpage_wait_seconds_bucket           |     3e-05 |        0
 getpage_wait_seconds_bucket           |     6e-05 |       71
 getpage_wait_seconds_bucket           |    0.0001 |      124
 getpage_wait_seconds_bucket           |    0.0002 |      248
 getpage_wait_seconds_bucket           |    0.0003 |      279
 getpage_wait_seconds_bucket           |    0.0006 |      297
 getpage_wait_seconds_bucket           |     0.001 |      298
 getpage_wait_seconds_bucket           |     0.002 |      298
 getpage_wait_seconds_bucket           |     0.003 |      298
 getpage_wait_seconds_bucket           |     0.006 |      300
 getpage_wait_seconds_bucket           |      0.01 |      300
 getpage_wait_seconds_bucket           |      0.02 |      300
 getpage_wait_seconds_bucket           |      0.03 |      300
 getpage_wait_seconds_bucket           |      0.06 |      300
 getpage_wait_seconds_bucket           |       0.1 |      300
 getpage_wait_seconds_bucket           |       0.2 |      300
 getpage_wait_seconds_bucket           |       0.3 |      300
 getpage_wait_seconds_bucket           |       0.6 |      300
 getpage_wait_seconds_bucket           |         1 |      300
 getpage_wait_seconds_bucket           |         2 |      300
 getpage_wait_seconds_bucket           |         3 |      300
 getpage_wait_seconds_bucket           |         6 |      300
 getpage_wait_seconds_bucket           |        10 |      300
 getpage_wait_seconds_bucket           |        20 |      300
 getpage_wait_seconds_bucket           |        30 |      300
 getpage_wait_seconds_bucket           |        60 |      300
 getpage_wait_seconds_bucket           |       100 |      300
 getpage_wait_seconds_bucket           |  Infinity |      300
 getpage_prefetch_requests_total       |           |       69
 getpage_sync_requests_total           |           |      231
 getpage_prefetch_misses_total         |           |        0
 getpage_prefetch_discards_total       |           |        0
 pageserver_requests_sent_total        |           |      323
 pageserver_requests_disconnects_total |           |        0
 pageserver_send_flushes_total         |           |      323
 file_cache_hits_total                 |           |        0
(39 rows)
```
2024-09-23 21:28:50 +03:00
Heikki Linnakangas
df3996265f test: Downgrade info message on removing empty directories (#9093)
It was pretty noisy. It changed from debug to info level in commit
78938d1b59, but I believe that was not purpose.
2024-09-23 20:10:22 +02:00
Alex Chi Z.
29699529df feat(pageserver): filter keys with gc-compaction (#9004)
Part of https://github.com/neondatabase/neon/issues/8002

Close https://github.com/neondatabase/neon/issues/8920

Legacy compaction (as well as gc-compaction) rely on the GC process to
remove unused layer files, but this relies on many factors (i.e., key
partition) to ensure data in a dropped table can be eventually removed.

In gc-compaction, we consider the keyspace information when doing the
compaction process. If a key is not in the keyspace, we will skip that
key and not include it in the final output.

However, this is not easy to implement because gc-compaction considers
branch points (i.e., retain_lsns) and the retained keyspaces could
change across different LSNs. Therefore, for now, we only remove aux v1
keys in the compaction process.

## Summary of changes

* Add `FilterIterator` to filter out keys.
* Integrate `FilterIterator` with gc-compaction.
* Add `collect_gc_compaction_keyspace` for a spec of keyspaces that can
be retained during the gc-compaction process.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-09-23 16:30:44 +00:00
Nikita Kalyanov
f446e08fb8 change HTTP method to comply with spec (#9100)
There is discrepancy with the spec, it has PUT
2024-09-23 15:53:06 +02:00
Christian Schwarz
4d5add9ca0 compact_level0_phase1: remove final traces of value access mode config (#8935)
refs https://github.com/neondatabase/neon/issues/8184
stacked atop https://github.com/neondatabase/neon/pull/8934

This PR changes from ignoring the config field to rejecting configs that
contain it.

PR https://github.com/neondatabase/infra/pull/1903 removes the field
usage from `pageserver.toml`.

It rolls into prod sooner or in the same release as this PR.
2024-09-23 15:05:22 +02:00
Christian Schwarz
59b4c2eaf9 walredo: add a ping method (#8952)
Not used in production, but in benchmarks, to demonstrate minimal RTT.
(It would be nice to not have to copy the 8KiB of zeroes, but, that
would require larger protocol changes).

Found this useful in investigation
https://github.com/neondatabase/neon/pull/8952.
2024-09-23 10:19:37 +00:00
Vlad Lazar
5432155b0d storcon: update compute hook state on detach (#9045)
## Problem

Previously, the storage controller may send compute notifications
containing stale pageservers (i.e. pageserver serving the shard was
detached). This happened because detaches did not update the compute
hook state.

## Summary of Changes

Update compute hook state on shard detach.

Fixes #8928
2024-09-23 10:05:02 +01:00
Heikki Linnakangas
e16e82749f Remove unused crates from workspace Cargo.toml
These were not referenced in any of the other Cargo.toml files in the
workspace. They were not being built because of that, so there was
little harm in having them listed, but let's be tidy.
2024-09-23 00:37:41 +03:00
Heikki Linnakangas
9f653893b9 Update a few dependencies, removing some indirect dependencies
cargo update ciborium iana-time-zone lazy_static schannel uuid
    cargo update hyper@0.14
    cargo update  --precise 2.9.7 ureq

It might be worthwhile just update all our dependencies at some point,
but this is aimed at pruning the dependency tree, to make the build a
little faster. That's also why I didn't update ureq to the latest
version: that would've added a dependency to yet another version of
rustls.
2024-09-23 00:37:41 +03:00
Heikki Linnakangas
913af44219 Update "memoffset" crate
To eliminate one version of it from our dependency tree.
2024-09-23 00:37:41 +03:00
Heikki Linnakangas
ecd615ab6d Update "hostname" crate
We were already building v0.4.0 as an indirect dependency, so this
avoids having to build two different versions of it.
2024-09-23 00:37:41 +03:00
Heikki Linnakangas
c9b2ec9ff1 Check submodule forward progress (#8949)
We frequently mess up our submodule references. This adds one safeguard:
it checks that the submodule references are only updated "forwards", not
to some older commit, or a commit that's not a descended of the previous 
one.

As next step, I'm thinking that we should automate things so that when
you merge a PR to the 'neon' repository that updates the submodule
references, the REL_*_STABLE_neon branches are automatically updated to
match the submodule references. That way, you never need to manually
merge PRs in the postgres repository, it's all triggered from commits in
the 'neon' repository. But that's not included here.
2024-09-22 21:46:53 +03:00
Arpad Müller
a3800dcb0c Move load_timeline_metadata into separate function (#9080)
Moves the per-timeline code to load timeline metadata into a new
dedicated function called `load_timeline_metadata`. The old
`load_timeline_metadata` becomes `load_timelines_metadata`.

Split out of #8907

Part of #8088
2024-09-21 12:36:41 +00:00
Heikki Linnakangas
9a32aa828d Fix init of WAL page header at startup (#8914)
If the primary is started at an LSN within the first of a 16 MB WAL
segment, the "long XLOG page header" at the beginning of the segment was
not initialized correctly. That has gone unnnoticed, because under
normal circumstances, nothing looks at the page header. The WAL that is
streamed to the safekeepers starts at the new record's LSN, not at the
beginning of the page, so that bogus page header didn't propagate
elsewhere, and a primary server doesn't normally read the WAL its
written. Which is good because the contents of the page would be bogus
anyway, as it wouldn't contain any of the records before the LSN where
the new record is written.

Except that in the following cases a primary does read its own WAL:

1. When there are two-phase transactions in prepared state at
checkpoint. The checkpointer reads the two-phase state from the
XLOG_XACT_PREPARE record, and writes it to a file in pg_twophase/.

2. Logical decoding reads the WAL starting from the replication slot's
restart LSN.

This PR fixes the problem with two-phase transactions. For that, it's
sufficient to initialize the page header correctly. The checkpointer
only needs to read XLOG_XACT_PREPARE records that were generated after
the server startup, so it's still OK that older WAL is missing / bogus.

I have not investigated if we have a problem with logical decoding,
however. Let's deal with that separately.

Special thanks to @Lzjing-1997, who independently found the same bug
and opened a PR to fix it, although I did not use that PR.
2024-09-21 04:00:38 +03:00
Anastasia Lubennikova
f03f7b3868 Bump vendor/postgres to include extension path fix (#9076)
This is a pre requisite for
https://github.com/neondatabase/neon/pull/8681
2024-09-20 20:24:40 +03:00
Christian Schwarz
ec5dce04eb pageserver: throttling: per-tenant metrics + more metrics to help understand throttle queue depth (#9077) 2024-09-20 16:48:26 +00:00
John Spray
6014f15157 pageserver: suppress noisy "layer became visible" logs (#9064)
## Problem

When layer visibility was added, an info log was included for the
situation where actual access to a layer disagrees with the visibility
calculation. This situation is safe, but I was interested in seeing when
it happens.

The log is pretty high volume, so this PR refines it to fire less often.

## Summary of changes

- For cases where accessing non-visible layers is normal, don't log at
all.
- Extend a unit test to increase confidence that the updates to
visibility on access are working as expected
- During compaction, only call the visibility calculation routine if
some image layers were created: previously, frequent calls resulted in
the visibility of layers getting reset every time we passed through
create_image_layers.
2024-09-20 16:07:09 +00:00
Conrad Ludgate
e675a21346 utils: leaky bucket should only report throttled if the notify queue is blocked on sleep (#9072)
## Problem

Seems that PS might be too eager in reporting throttled tasks

## Summary of changes

Introduce a sleep counter. If the sleep counter increases, then the
acquire tasks was throttled.
2024-09-20 16:09:39 +01:00
Alex Chi Z.
6b93230270 fix(pageserver): receive body error now 500 (#9052)
close https://github.com/neondatabase/neon/issues/8903

In https://github.com/neondatabase/neon/issues/8903 we observed JSON
decoding error to have the following error message in the log:

```
Error processing HTTP request: Resource temporarily unavailable: 3956 (pageserver-6.ap-southeast-1.aws.neon.tech) error receiving body: error decoding response body
```

This is hard to understand. In this patch, we make the error message
more reasonable.

## Summary of changes

* receive body error is now an internal server error, passthrough the
`reqwest::Error` (only decoding error) as `anyhow::Error`.
* instead of formatting the error using `to_string`, we use the
alternative `anyhow::Error` formatting, so that it prints out the cause
of the error (i.e., what exactly cannot serde decode).

I would expect seeing something like `error receiving body: error
decoding response body: XXX field not found` after this patch, though I
didn't set up a testing environment to observe the exact behavior.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-09-20 10:37:28 -04:00
Heikki Linnakangas
797aa4ffaa Skip running clippy in --release mode. (#9073)
It's pretty expensive to run, and there is very little difference
between debug and release builds that could lead to different clippy
warnings.

This is extracted from PR #8912. That PR wandered off into various
improvements we could make, but we seem to have consensus on this part
at least.
2024-09-20 17:22:58 +03:00
Christian Schwarz
c45b56e0bb pageserver: add counters for started smgr/getpage requests (#9069)
After this PR

```
curl localhost:9898/metrics | grep smgr_ | grep start
```

```
pageserver_smgr_query_started_count{shard_id="0000",smgr_query_type="get_page_at_lsn",tenant_id="...",timeline_id="..."} 0
pageserver_smgr_query_started_global_count{smgr_query_type="get_db_size"} 0
pageserver_smgr_query_started_global_count{smgr_query_type="get_page_at_lsn"} 0
pageserver_smgr_query_started_global_count{smgr_query_type="get_rel_exists"} 0
pageserver_smgr_query_started_global_count{smgr_query_type="get_rel_size"} 0
pageserver_smgr_query_started_global_count{smgr_query_type="get_slru_segment"} 0
```

We instantiate the per-tenant counter only for `get_page_at_lsn`.
2024-09-20 14:55:50 +01:00
Alexander Bayandin
3104f0f250 Safekeeper: fix OpenAPI spec (#9066)
## Problem

Safekeeper's OpenAPI spec is incorrect:

```
Semantic error at paths./v1/tenant/{tenant_id}/timeline/{timeline_id}.get.responses.404.content.application/json.schema.$ref
$refs must reference a valid location in the document
Jump to line 126
```
Checked on https://editor.swagger.io

## Summary of changes
- Add `NotFoundError` 
- Add `description` and `license` fields to make Cloud OpenAPI spec
linter happy
2024-09-20 12:00:05 +01:00
Arseny Sher
f2c08195f0 Bump vendor/postgres.
Includes PRs:
- ERROR out instead of segfaulting when walsender slots are full.
- logical worker: respond to publisher even under dense stream.
2024-09-20 12:38:42 +03:00
Alex Chi Z.
d0cbfda15c refactor(pageserver): check layer map valid in one place (#9051)
We have 3 places where we implement layer map checks.

## Summary of changes

Now we have a single check function being called in all places.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-09-19 20:29:28 +00:00
Yuchen Liang
1708743e78 pageserver: wait for lsn lease duration after transition into AttachedSingle (#9024)
Part of #7497, closes https://github.com/neondatabase/neon/issues/8890.

## Problem

Since leases are in-memory objects, we need to take special care of them
after pageserver restarts and while doing a live migration. The approach
we took for pageserver restart is to wait for at least lease duration
before doing first GC. We want to do the same for live migration. Since
we do not do any GC when a tenant is in `AttachedStale` or
`AttachedMulti` mode, only the transition from `AttachedMulti` to
`AttachedSingle` requires this treatment.

## Summary of changes

- Added `lsn_lease_deadline` field in `GcBlock::reasons`: the tenant is
temporarily blocked from GC until we reach the deadline. This
information does not persist to S3.
- In `GCBlock::start`, skip the GC iteration if we are blocked by the
lsn lease deadline.
- In `TenantManager::upsert_location`, set the lsn_lease_deadline to
`Instant::now() + lsn_lease_length` so the granted leases have a chance
to be renewed before we run GC for the first time after transitioned
from AttachedMulti to AttachedSingle.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2024-09-19 17:27:10 +01:00
Conrad Ludgate
0a1ca7670c proxy: remove auth info from http conn info & fixup jwt api trait (#9047)
misc changes split out from #8855 

- **allow cloning the request context in a read-only fashion for
background tasks**
- **propagate endpoint and request context through the jwk cache**
- **only allow password based auth for md5 during testing**
- **remove auth info from conn info**
2024-09-19 15:09:30 +00:00
Alex Chi Z.
ff9f065c43 impr(pageserver): log image layer creation (#9050)
https://github.com/neondatabase/neon/pull/9028 changed the image layer
creation log into trace level. However, I personally find logging image
layer creation useful when reading the logs -- it makes it clear that
the image layer creation is happening and gives a clear idea of the
progress. Therefore, I propose to continue logging them for
create_image_layers set of functions.

## Summary of changes

* Add info logging for all image layers created in legacy compaction.
* Add info logging for all layers creation in testing functions.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-09-19 10:43:12 -04:00
Vlad Lazar
21eeafaaa5 pageserver: simple fix for vectored read image layer skip (#9026)
## Problem

Different keyspaces may require different floor LSNs in vectored
delta layer visits. This patch adds support for such cases.

## Summary of changes

Different keyspaces wishing to read the same layer might
require different stop lsns (or lsn floor). The start LSN
of the read (or the lsn ceil) will always be the same.

With this observation, we fix skipping of image layers by
indexing the fringe by layer id plus lsn floor.

This is very simple, but means that we can visit delta layers twice
in certain cases. Still, I think it's very unlikely for any extra
merging to have taken place in this case, so perhaps it makes sense to go
with the simpler patch.

Fixes https://github.com/neondatabase/neon/issues/9012
Alternative to https://github.com/neondatabase/neon/pull/9025
2024-09-19 14:51:00 +01:00
Arseny Sher
32a0e759bd safekeeper: add wal_last_modified to debug_dump.
Adds to debug_dump option to include highest modified time among all WAL
segments. In passing replace some str with OsStr to have less unwraps.
2024-09-19 16:17:25 +03:00
Heikki Linnakangas
7c489092b7 Remove unused duplicate DEFAULT_INGEST_BATCH_SIZE constant
This constant in 'tenant_conf_defaults' was unused, but there's
another constant with the same name in the global 'defaults'. I wish
the setting was configurable per-tenant, but it isn't, so let's remove
the confusing duplicate.
2024-09-19 15:41:35 +03:00
Heikki Linnakangas
06d55a3b12 Clean up concurrent logical size calc semaphore initialization
The DEFAULT_CONCURRENT_TENANT_SIZE_LOGICAL_SIZE_QUERIES constant was
unused, because we had just hardcoded it to 1 where the constant
should've been used.

Remove the ConfigurableSemaphore::Default implementation, since it was
unused.
2024-09-19 15:41:35 +03:00
Heikki Linnakangas
5c68e6a172 Remove unused constant
The code that used it was removed in commit b9d2c7bdd5
2024-09-19 15:41:35 +03:00
Heikki Linnakangas
2753abc0d8 Remove leftover enums for configuring vectored get implementation
The settings were removed in commit corb9d2c7b.
2024-09-19 15:41:35 +03:00
Conrad Ludgate
9c23333cb3 Merge pull request #9056 from neondatabase/rc/proxy/2024-09-19
Proxy release 2024-09-19
2024-09-19 10:41:17 +01:00
Heikki Linnakangas
a523548ed1 Remove unused cleanup_remaining_timeline_fs_traces function
There's some more code that still checks for uninit and delete
markers, see callers of is_delete_mark and is_uninit_mark, and github
issue #5718. But these functions were outright dead.
2024-09-19 11:57:10 +03:00
Heikki Linnakangas
2d4e5af18b Remove unused code for parsing a postgresql.conf file 2024-09-19 11:57:10 +03:00
Heikki Linnakangas
5da2340e74 Remove misc dead code in control_plane/ 2024-09-19 11:57:10 +03:00
Heikki Linnakangas
7b34c2d7af Remove misc dead code in libs/ 2024-09-19 11:57:10 +03:00
Heikki Linnakangas
15ae1fc3df Remove a few postgres constants that were not used
Dead code is generally useless, but with Postgres constants in
particular, I'm also worried that if they're not used anywhere, we
might fail to update them at a Postgres version update, and get very
confused later when they have wrong values.
2024-09-19 11:57:10 +03:00
Heikki Linnakangas
728b79b9dd Remove some unnecessary derives 2024-09-19 11:57:10 +03:00
Alex Chi Z.
9d1c6f23d3 fix(storage-scrubber): log version after initialize the logger (#9049)
When I checked the log in Grafana I couldn't find the scrubber version.
Then I realized that it should be logged after the logger gets
initialized.

## Summary of changes

Log after initializing the logger for the scrubber.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-09-18 14:13:57 -04:00
Christian Schwarz
035a49a6b2 neon_local start: parallel startup to break cyclic dependency (#8950)
(Found this useful during investigation
https://github.com/neondatabase/cloud/issues/16886.)

Problem
-------

Before this PR, `neon_local` sequentially does the following:
1. launch storcon process
2. wait for storcon to signal readiness
[here](75310fe441/control_plane/src/storage_controller.rs (L804-L808))
3. start pageserver
4. wait for pageserver to become ready
[here](c43e664ff5/control_plane/src/pageserver.rs (L343-L346))
5. etc

The problem is that storcon's readiness waits for the
[`startup_reconcile`](cbcd4058ed/storage_controller/src/service.rs (L520-L523))
to complete.

But pageservers aren't started at this point.

So, worst case we wait for `STARTUP_RECONCILE_TIMEOUT/2`, i.e., 15s.

This is more than the 10s default timeout allowed by neon_local.

So, the result is that `neon_local start` fails to start storcon and
stops everything.

Solution
--------

In this PR I choose the the radical solution to start everything in
parallel.

It junks up the output because we do stuff like `print!(".")` to
indicate progress.
We should just abandon that.
And switch to `utils::logging` + `tracing` with separate spans for each
component.
I can do that in this PR or we leave it as a follow-up.

Alternatives Considered
-----------------------

The Pageserver's `/v1/status` or in fact any endpoint of the mgmt API
will not `accept()` on the mgmt API socket until after the `re-attach`
call to storcon returned success.

So, it's insufficient to change the startup order to start Pageservers
first.

We cannot easily change Pageserver startup order because
`init_tenant_mgr` must complete before we start serving the mgmt API.
Otherwise tenant detach calls et al can race with `init_tenant_mgr`.

We'd have to add a "loading" state to tenant mgr and make all API
endpoints except `/v1/status` wait for _that_ to complete.


Related
-------

- https://github.com/neondatabase/neon/pull/6475
2024-09-18 18:17:55 +02:00
Folke Behrens
794bd4b866 proxy: mock cplane usable without allowed-ips table (#9046) 2024-09-18 17:14:53 +02:00
Alexander Bayandin
ac6a1151ae test_postgres_version: reenable version check for prereleased versions 2024-09-18 14:51:59 +01:00
Tristan Partin
2f37f0384c Add v17 to revisions.json
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-09-18 14:51:59 +01:00
Alexander Bayandin
e161a2fa42 CI(deploy): fix deploy to staging and prod (#9030)
## Problem

It turns out the previous approach (with `skip_if` input) doesn't work
(from https://github.com/neondatabase/neon/pull/9017).
Revert it and use more straightforward if-conditions

## Summary of changes
- Revert efbe8db7f1
- Add if-condition to`promote-compatibility-data` job and relevant
comments
2024-09-18 14:26:47 +01:00
Folke Behrens
c5cd8577ff proxy: make sql-over-http max request/response sizes configurable (#9029) 2024-09-18 13:58:51 +02:00
Heikki Linnakangas
3454ef7507 Refactor ImageLayerWriter to avoid passing a Timeline to finish() (#9028)
Commit ca5390a89d made a similar change to DeltaLayerWriter.

We bumped into this with Stas with our hackathon project, to create a
standalong program to create image layers directly from a Postgres data
directory. It needs to create image layers without having a Timeline and
other pageserver machinery.

This downgrades the "created image layer {}" message from INFO to TRACE
level. TRACE is used for the corresponding message on delta layer
creation too. The path logged in the message is now the temporary path,
before the file is renamed to its final name. Again commit ca5390a89d
made the same change for the message on delta layer creation.
2024-09-18 13:16:51 +03:00
Christian Schwarz
135e7e4306 add neon_local subcommand for the broker & use that from regression tests (#8948)
There's currently no way to just start/stop broker from `neon_local`.

This PR
* adds a sub-command
* uses that sub-command from the test suite instead of the pre-existing
Python `subprocess` based approach.

Found this useful during investigation
https://github.com/neondatabase/cloud/issues/16886.
2024-09-18 09:10:27 +02:00
Christian Schwarz
3cd2a3f931 refactor(walredo): process launch & kill-on-error machinery (#8951)
Immediate benefit: easier to spot what's going on.

Later benefit: use the extracted method in PR

- https://github.com/neondatabase/neon/pull/8952

which adds a `ping` command to walredo.

Found this useful during investigation
https://github.com/neondatabase/cloud/issues/16886.
2024-09-17 19:16:33 +00:00
Alexander Bayandin
d78f5ce6da CI: don't fetch the whole git history if it's not required (#9021)
## Problem
We do use `actions/checkout` with `fetch-depth: 0` when it's not
required

## Summary of changes
- Remove unneeded `fetch-depth: 0`
- Add a comment if `fetch-depth: 0` is required
2024-09-17 18:40:05 +01:00
Arpad Müller
a1b71b73fe Rename some S3 usages to "remote storage" in exposed messages (#8999)
In exposed messages like log messages we mentioned "S3", which is not
entirely accurate as we support Azure blob storage now as well.
2024-09-17 19:15:01 +02:00
Tristan Partin
6138eb50e9 Fix test code related to migrations
We added another migration in 5876c441ab,
but didn't bump this value. This had no effect, but best to fix it
anyway.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-09-17 15:56:05 +01:00
Heikki Linnakangas
d211f00f05 Remove unnecessary dependencies (#9000)
Found by "cargo machete"
2024-09-17 17:55:45 +03:00
Alexander Bayandin
cd4276fd65 CI: fix release pipeline (#9017)
## Problem

We've got 2 non-blocking failures on the release pipeline:
- `promote-compatibility-data` job got skipped _presumably_ because one
of the dependencies of `deploy` job (`push-to-acr-dev`) got skipped
(https://github.com/neondatabase/neon/pull/8940)
- `coverage-report` job fails because we don't build debug artifacts in
the release branch (https://github.com/neondatabase/neon/pull/8561)

## Summary of changes
- Always run `push-to-acr-dev` / `push-to-acr-prod` jobs, but add
`skip_if` parameter to the reusable workflow, which can skip the job
internally, without skipping externally
- Do not run `coverage-report` on release branches
2024-09-17 10:17:48 +01:00
Vlad Lazar
b719d58863 storcon: forward requests from stepped down instance to the current leader (#8954)
## Problem
It turns out that we can't rely on external orchestration to promptly
route trafic to the new leader. This is downtime inducing.
Forwarding provides a safe way out.

## Safety
We forward when:
1. Request is not one of ["/control/v1/step_down", "/status", "/ready",
"/metrics"]
2. Current instance is in [`LeadershipStatus::SteppedDown`] state
3. There is a leader in the database to forward to
4. Leader from step (3) is not the current instance

If a storcon instance is persisted in the database, then we know that it
is the current leader.
There's one exception: time between handling step-down request and the
new leader updating the
database.

Let's treat the happy case first. The stepped down node does not produce
any side effects,
since all request handling happens on the leader.

As for the edge case, we are guaranteed to always have a maximum of two
running instances.
Hence, if we are in the edge case scenario the leader persisted in the
database is the
stepped down instance that received the request. Condition (4) above
covers this scenario.

## Summary of changes
* Conversion utilities for reqwest <-> hyper. I'm not happy with these,
but I don't see a better way. Open to suggestions.
* Add request forwarding logic
* Update each request handler. Again, not happy with this. If anyone
knows a nice to wrap the handlers, lmk. Me and Joonas tried :/
* Update each handler to maybe forward
* Tweak tests to showcase new behaviour
2024-09-17 09:25:42 +01:00
Heikki Linnakangas
2db840d8b8 Move a few test functions related to auth tokens to separate file (#9018)
For readability. neon_fixtures.py is huge.
2024-09-17 06:53:18 +03:00
Heikki Linnakangas
4295ff0f07 Mark a couple of test fixtures as session-scoped (#9018)
pg_distrib_dir doesn't include the Postgres version and only depends
on env variables which cannot change during a test run, so it can be
marked as session-scoped. Similarly, the platform cannot change during
a test run.
2024-09-17 06:53:18 +03:00
Heikki Linnakangas
c6f56b8462 Remove redundant get_dir_size() function (#9018)
There was another copy of it in utils.py. The only difference is that
the version in utils.py tolerates files that are concurrently
removed. That seems fine for the few callers in neon_fixtures.py too.
2024-09-17 06:53:18 +03:00
Heikki Linnakangas
fec9321fc0 Use Path type in a few more places in neon_fixtures.py (#9018)
This is in preparation of replacing neon_fixtures.get_dir_size with
neon_fixtures.utils.get_dir_size() in next commit.
2024-09-17 06:53:18 +03:00
Heikki Linnakangas
3a52e356c1 Remove unused function (#9018) 2024-09-17 06:53:18 +03:00
Tristan Partin
5e16c7bb0b Generate pgbench data on the server for most tests
This should generally be faster when running tests, especially those
that run with higher scales.

Ignoring test_lfc_resize since it seems like we are hitting a query
timeout for some reason that I have yet to investigate. A little bit of
improvemnt is better than none.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-09-16 23:37:36 +01:00
Heikki Linnakangas
2bbb4d3e1c Remove misc unused code (#9014) 2024-09-16 18:45:19 +00:00
Matthias van de Meent
c8bedca582 Fix PG17's extension modifications (#9010)
This also reduces the GRANT statements to one per created _reset
function
2024-09-16 17:06:31 +01:00
Tristan Partin
5876c441ab Grant access to pg_show_replication_origin_status for neon_superuser
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-09-16 16:38:55 +01:00
Alexander Bayandin
b2c83db54d CI(gather-rust-build-stats): set PQ_LIB_DIR to Postgres 17 (#9001)
## Problem

`gather-rust-build-stats` extra CI job fails with 
```
"PQ_LIB_DIR" doesn't exist in the configured path: "/__w/neon/neon/pg_install/v16/lib"
```

## Summary of changes
- Use the path to Postgres 17 for the `gather-rust-build-stats` job. 
The job uses Postgres built by `make walproposer-lib`
2024-09-16 12:44:26 +01:00
Matthias van de Meent
0a8c5e1214 Fix broken image for PG17 (#8998)
Most extensions are not required to run Neon-based PostgreSQL, but the
Neon extension is _quite_ critical, so let's make sure we include it.

## Problem

Staging doesn't have working compute images for PG17

## Summary of changes

Disable some PG17 filters so that we get the critical components into the PG17 image
2024-09-13 15:10:52 +01:00
Matthias van de Meent
78938d1b59 [compute/postgres] feature: PostgreSQL 17 (#8573)
This adds preliminary PG17 support to Neon, based on RC1 / 2024-09-04
07b828e9d4

NOTICE: The data produced by the included version of the PostgreSQL fork
may not be compatible with the future full release of PostgreSQL 17 due to
expected or unexpected future changes in magic numbers and internals.
DO NOT EXPECT DATA IN V17-TENANTS TO BE COMPATIBLE WITH THE 17.0
RELEASE!

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2024-09-12 23:18:41 +01:00
Conrad Ludgate
66a99009ba Merge pull request #8799 from neondatabase/rc/proxy/2024-08-22
Proxy release 2024-08-22
2024-08-22 10:04:56 +01:00
Conrad Ludgate
5d4c57491f Merge pull request #8723 from neondatabase/rc/proxy/2024-08-14
Proxy release 2024-08-14
2024-08-14 13:05:51 +01:00
Conrad Ludgate
73935ea3a2 Merge pull request #8647 from neondatabase/rc/proxy/2024-08-08
Proxy release 2024-08-08
2024-08-08 15:37:09 +01:00
Conrad Ludgate
32e595d4dd Merge branch 'release-proxy' into rc/proxy/2024-08-08 2024-08-08 13:53:33 +01:00
Conrad Ludgate
b0d69acb07 Merge pull request #8505 from neondatabase/rc/proxy/2024-07-25
Proxy release 2024-07-25
2024-07-25 11:07:19 +01:00
Conrad Ludgate
98355a419a Merge pull request #8351 from neondatabase/rc/proxy/2024-07-11
Proxy release 2024-07-11
2024-07-11 10:40:17 +01:00
Conrad Ludgate
cfb03d6cf0 Merge pull request #8178 from neondatabase/rc/proxy/2024-06-27
Proxy release 2024-06-27
2024-06-27 11:35:30 +01:00
Conrad Ludgate
d81ef3f962 Revert "proxy: update tokio-postgres to allow arbitrary config params (#8076)"
This reverts commit 78d9059fc7.
2024-06-27 09:46:58 +01:00
Conrad Ludgate
5d62c67e75 Merge pull request #8117 from neondatabase/rc/proxy/2024-06-20
Proxy release 2024-06-20
2024-06-20 11:42:35 +01:00
Anna Khanova
53d53d5b1e Merge pull request #7980 from neondatabase/rc/proxy/2024-06-06
Proxy release 2024-06-06
2024-06-06 13:14:40 +02:00
Anna Khanova
29fe6ea47a Merge pull request #7909 from neondatabase/rc/proxy/2024-05-30
Proxy release 2024-05-30
2024-05-30 14:59:41 +02:00
Alexander Bayandin
640327ccb3 Merge pull request #7880 from neondatabase/rc/proxy/2024-05-24
Proxy release 2024-05-24
2024-05-24 18:00:18 +01:00
Anna Khanova
7cf0f6b37e Merge pull request #7853 from neondatabase/rc/proxy/2024-05-23
Proxy release 2024-05-23
2024-05-23 12:09:13 +02:00
Anna Khanova
03c2c569be [proxy] Do not fail after parquet upload error (#7858)
## Problem

If the parquet upload was unsuccessful, it will panic.

## Summary of changes

Write error in logs instead.
2024-05-23 11:44:47 +02:00
Conrad Ludgate
eff6d4538a Merge pull request #7654 from neondatabase/rc/proxy/2024-05-08
Proxy release 2024-05-08
2024-05-08 11:56:20 +01:00
Conrad Ludgate
5ef7782e9c Merge pull request #7649 from neondatabase/rc/proxy/2024-05-08
Proxy release 2024-05-08
2024-05-08 06:54:03 +01:00
Conrad Ludgate
73101db8c4 Merge branch 'release-proxy' into rc/proxy/2024-05-08 2024-05-08 06:43:57 +01:00
Anna Khanova
bccdfc6d39 Merge pull request #7580 from neondatabase/rc/proxy/2024-05-02
Proxy release 2024-05-02
2024-05-02 12:00:01 +02:00
Anna Khanova
99595813bb proxy: keep track on the number of events from redis by type. (#7582)
## Problem

It's unclear what is the distribution of messages, proxy is consuming
from redis.

## Summary of changes

Add counter.
2024-05-02 11:56:19 +02:00
Anna Khanova
fe07b54758 Merge pull request #7507 from neondatabase/rc/proxy/2024-04-25
Proxy release 2024-04-25
2024-04-25 13:50:05 +02:00
Anna Khanova
a42d173e7b proxy: Fix cancellations (#7510)
## Problem

Cancellations were published to the channel, that was never read.

## Summary of changes

Fallback to global redis publishing.
2024-04-25 13:42:25 +02:00
Anna Khanova
e07f689238 Update connect to compute and wake compute retry configs (#7509)
## Problem

## Summary of changes

Decrease waiting time
2024-04-25 13:20:21 +02:00
Conrad Ludgate
7831eddc88 Merge pull request #7417 from neondatabase/rc/proxy/2024-04-18
Proxy release 2024-04-18
2024-04-18 12:03:07 +01:00
Conrad Ludgate
943b1bc80c Merge pull request #7366 from neondatabase/proxy-hotfix
Release proxy (2024-04-11 hotfix)
2024-04-12 10:15:14 +01:00
Conrad Ludgate
95a184e9b7 proxy: fix overloaded db connection closure (#7364)
## Problem

possible for the database connections to not close in time.

## Summary of changes

force the closing of connections if the client has hung up
2024-04-11 23:38:47 +01:00
Conrad Ludgate
3fa17e9d17 Merge pull request #7357 from neondatabase/rc/proxy/2024-04-11
Proxy release 2024-04-11
2024-04-11 11:49:45 +01:00
Anna Khanova
55e0fd9789 Merge pull request #7304 from neondatabase/rc/proxy/2024-04-04
Proxy release 2024-04-04
2024-04-04 12:40:11 +02:00
Anna Khanova
2a88889f44 Merge pull request #7254 from neondatabase/rc/proxy/2024-03-27
Proxy release 2024-03-27
2024-03-27 11:44:09 +01:00
Conrad Ludgate
5bad8126dc Merge pull request #7173 from neondatabase/rc/proxy/2024-03-19
Proxy release 2024-03-19
2024-03-19 12:11:42 +00:00
Anna Khanova
27bc242085 Merge pull request #7119 from neondatabase/rc/proxy/2024-03-14
Proxy release 2024-03-14
2024-03-14 14:57:05 +05:00
Anna Khanova
192b49cc6d Merge branch 'release-proxy' into rc/proxy/2024-03-14 2024-03-14 14:16:36 +05:00
Conrad Ludgate
e1b60f3693 Merge pull request #7041 from neondatabase/rc/proxy/2024-03-07
Proxy release 2024-03-07
2024-03-08 08:19:16 +00:00
Anna Khanova
2804f5323b Merge pull request #6997 from neondatabase/rc/proxy/2024-03-04
Proxy release 2024-03-04
2024-03-04 17:36:11 +04:00
Anna Khanova
676adc6b32 Merge branch 'release-proxy' into rc/proxy/2024-03-04 2024-03-04 16:41:46 +04:00
Nikita Kalyanov
96a4e8de66 Add /terminate API (#6745) (#6853)
this is to speed up suspends, see
https://github.com/neondatabase/cloud/issues/10284


Cherry-pick to release branch to build new compute images
2024-02-22 11:51:19 +02:00
Arseny Sher
01180666b0 Merge pull request #6803 from neondatabase/releases/2024-02-19
Release 2024-02-19
2024-02-19 16:38:35 +04:00
Conrad Ludgate
6c94269c32 Merge pull request #6758 from neondatabase/release-proxy-2024-02-14
2024-02-14 Proxy Release
2024-02-15 09:45:08 +00:00
Anna Khanova
edc691647d Proxy: remove fail fast logic to connect to compute (#6759)
## Problem

Flaky tests

## Summary of changes

Remove failfast logic
2024-02-15 07:42:12 +00:00
Conrad Ludgate
855d7b4781 hold cancel session (#6750)
## Problem

In a recent refactor, we accidentally dropped the cancel session early

## Summary of changes

Hold the cancel session during proxy passthrough
2024-02-14 14:57:22 +00:00
Anna Khanova
c49c9707ce Proxy: send cancel notifications to all instances (#6719)
## Problem

If cancel request ends up on the wrong proxy instance, it doesn't take
an effect.

## Summary of changes

Send redis notifications to all proxy pods about the cancel request.

Related issue: https://github.com/neondatabase/neon/issues/5839,
https://github.com/neondatabase/cloud/issues/10262
2024-02-14 14:57:22 +00:00
Anna Khanova
2227540a0d Proxy refactor auth+connect (#6708)
## Problem

Not really a problem, just refactoring.

## Summary of changes

Separate authenticate from wake compute.

Do not call wake compute second time if we managed to connect to
postgres or if we got it not from cache.
2024-02-14 14:57:22 +00:00
Conrad Ludgate
f1347f2417 proxy: add more http logging (#6726)
## Problem

hard to see where time is taken during HTTP flow.

## Summary of changes

add a lot more for query state. add a conn_id field to the sql-over-http
span
2024-02-14 14:57:22 +00:00
Conrad Ludgate
30b295b017 proxy: some more parquet data (#6711)
## Summary of changes

add auth_method and database to the parquet logs
2024-02-14 14:57:22 +00:00
Anna Khanova
1cef395266 Proxy: copy bidirectional fork (#6720)
## Problem

`tokio::io::copy_bidirectional` doesn't close the connection once one of
the sides closes it. It's not really suitable for the postgres protocol.

## Summary of changes

Fork `copy_bidirectional` and initiate a shutdown for both connections.

---------

Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
2024-02-14 14:57:22 +00:00
John Spray
78d160f76d Merge pull request #6721 from neondatabase/releases/2024-02-12
Release 2024-02-12
2024-02-12 09:35:30 +00:00
Vlad Lazar
b9238059d6 Merge pull request #6617 from neondatabase/releases/2024-02-05
Release 2024-02-05
2024-02-05 12:50:38 +00:00
Arpad Müller
d0cb4b88c8 Don't preserve temp files on creation errors of delta layers (#6612)
There is currently no cleanup done after a delta layer creation error,
so delta layers can accumulate. The problem gets worse as the operation
gets retried and delta layers accumulate on the disk. Therefore, delete
them from disk (if something has been written to disk).
2024-02-05 09:58:18 +00:00
John Spray
1ec3e39d4e Merge pull request #6504 from neondatabase/releases/2024-01-29
Release 2024-01-29
2024-01-29 10:05:01 +00:00
John Spray
a1a74eef2c Merge pull request #6420 from neondatabase/releases/2024-01-22
Release 2024-01-22
2024-01-22 17:24:11 +00:00
John Spray
90e689adda pageserver: mark tenant broken when cancelling attach (#6430)
## Problem

When a tenant is in Attaching state, and waiting for the
`concurrent_tenant_warmup` semaphore, it also listens for the tenant
cancellation token. When that token fires, Tenant::attach drops out.
Meanwhile, Tenant::set_stopping waits forever for the tenant to exit
Attaching state.

Fixes: https://github.com/neondatabase/neon/issues/6423

## Summary of changes

- In the absence of a valid state for the tenant, it is set to Broken in
this path. A more elegant solution will require more refactoring, beyond
this minimal fix.

(cherry picked from commit 93572a3e99)
2024-01-22 16:20:57 +00:00
Christian Schwarz
f0b2d4b053 fixup(#6037): actually fix the issue, #6388 failed to do so (#6429)
Before this patch, the select! still retured immediately if `futs` was
empty. Must have tested a stale build in my manual testing of #6388.

(cherry picked from commit 15c0df4de7)
2024-01-22 15:23:12 +00:00
Anna Khanova
299d9474c9 Proxy: fix gc (#6426)
## Problem

Gc currently doesn't work properly.

## Summary of changes

Change statement on running gc.
2024-01-22 14:39:09 +01:00
Conrad Ludgate
7234208b36 bump shlex (#6421)
## Problem

https://rustsec.org/advisories/RUSTSEC-2024-0006

## Summary of changes

`cargo update -p shlex`

(cherry picked from commit 5559b16953)
2024-01-22 09:49:33 +00:00
Christian Schwarz
93450f11f5 Merge pull request #6354 from neondatabase/releases/2024-01-15
Release 2024-01-15

NB: the previous release PR https://github.com/neondatabase/neon/pull/6286 was accidentally merged by merge-by-squash instead of merge-by-merge-commit.
See https://github.com/neondatabase/neon/pull/6354#issuecomment-1891706321 for more context.
2024-01-15 14:30:25 +01:00
Christian Schwarz
2f0f9edf33 Merge remote-tracking branch 'origin/release' into releases/2024-01-15 2024-01-15 09:36:42 +00:00
Christian Schwarz
d424f2b7c8 empty commit so we can produce a merge commit 2024-01-15 09:36:22 +00:00
Christian Schwarz
21315e80bc Merge branch 'releases/2024-01-08--not-squashed' into releases/2024-01-15 2024-01-15 09:31:07 +00:00
vipvap
483b66d383 Merge branch 'release' into releases/2024-01-08 (not-squashed merge of #6286)
Release PR https://github.com/neondatabase/neon/pull/6286 got
accidentally merged-by-squash intstead of merge-by-merge-commit.

This commit shows how things would look like if 6286 had been
merged-by-squash.

```
git reset --hard 9f1327772
git merge --no-ff 5c0264b591
```

Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-01-15 09:28:08 +00:00
vipvap
aa72a22661 Release 2024-01-08 (#6286)
Release 2024-01-08
2024-01-08 09:26:27 +00:00
Shany Pozin
5c0264b591 Merge branch 'release' into releases/2024-01-08 2024-01-08 09:34:06 +02:00
Arseny Sher
9f13277729 Merge pull request #6242 from neondatabase/releases/2024-01-02
Release 2024-01-02
2024-01-02 12:04:43 +04:00
Arseny Sher
54aa319805 Don't split WAL record across two XLogData's when sending from safekeepers.
As protocol demands. Not following this makes standby complain about corrupted
WAL in various ways.

https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719
closes https://github.com/neondatabase/cloud/issues/9057
2024-01-02 10:54:00 +04:00
Arseny Sher
4a227484bf Add large insertion and slow WAL sending to test_hot_standby.
To exercise MAX_SEND_SIZE sending from safekeeper; we've had a bug with WAL
records torn across several XLogData messages. Add failpoint to safekeeper to
slow down sending. Also check for corrupted WAL complains in standby log.

Make the test a bit simpler in passing, e.g. we don't need explicit commits as
autocommit is enabled by default.

https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719
https://github.com/neondatabase/cloud/issues/9057
2024-01-02 10:54:00 +04:00
Arseny Sher
2f83f85291 Add failpoint support to safekeeper.
Just a copy paste from pageserver.
2024-01-02 10:54:00 +04:00
Arseny Sher
d6cfcb0d93 Move failpoint support code to utils.
To enable them in safekeeper as well.
2024-01-02 10:54:00 +04:00
Arseny Sher
392843ad2a Fix safekeeper START_REPLICATION (term=n).
It was giving WAL only up to commit_lsn instead of flush_lsn, so recovery of
uncommitted WAL since cdb08f03 hanged. Add test for this.
2024-01-02 10:54:00 +04:00
Arseny Sher
bd4dae8f4a compute_ctl: kill postgres and sync-safekeeprs on exit.
Otherwise they are left orphaned when compute_ctl is terminated with a
signal. It was invisible most of the time because normally neon_local or k8s
kills postgres directly and then compute_ctl finishes gracefully. However, in
some tests compute_ctl gets stuck waiting for sync-safekeepers which
intentionally never ends because safekeepers are offline, and we want to stop
compute_ctl without leaving orphanes behind.

This is a quite rough approach which doesn't wait for children termination. A
better way would be to convert compute_ctl to async which would make waiting
easy.
2024-01-02 10:54:00 +04:00
Shany Pozin
b05fe53cfd Merge pull request #6240 from neondatabase/releases/2024-01-01
Release 2024-01-01
2024-01-01 11:07:30 +02:00
Christian Schwarz
c13a2f0df1 Merge pull request #6192 from neondatabase/releases/2023-12-19
Release 2023-12-19

We need to do a config change that requires restarting the pageservers.
Slip in two metrics-related commits that didn't make this week's regularly release.
2023-12-19 14:52:47 +01:00
Christian Schwarz
39be366fc5 higher resolution histograms for getpage@lsn (#6177)
part of https://github.com/neondatabase/cloud/issues/7811
2023-12-19 13:46:59 +00:00
Christian Schwarz
6eda0a3158 [PRE-MERGE] fix metric pageserver_initial_logical_size_start_calculation
(This is a pre-merge cherry-pick of https://github.com/neondatabase/neon/pull/6191)

It wasn't being incremented.

Fixup of

    commit 1c88824ed0
    Author: Christian Schwarz <christian@neon.tech>
    Date:   Fri Dec 1 12:52:59 2023 +0100

        initial logical size calculation: add a bunch of metrics (#5995)
2023-12-19 13:46:55 +00:00
Shany Pozin
306c7a1813 Merge pull request #6173 from neondatabase/sasha_release_bypassrls_replication
Grant BYPASSRLS and REPLICATION explicitly to neon_superuser roles
2023-12-18 22:16:36 +02:00
Sasha Krassovsky
80be423a58 Grant BYPASSRLS and REPLICATION explicitly to neon_superuser roles 2023-12-18 10:22:36 -08:00
Shany Pozin
5dcfef82f2 Merge pull request #6163 from neondatabase/releases/2023-12-18
Release 2023-12-18-2
2023-12-18 15:34:17 +02:00
Christian Schwarz
e67b8f69c0 [PRE-MERGE] pageserver: Reduce tracing overhead in timeline::get #6115
Pre-merge `git merge --squash` of
https://github.com/neondatabase/neon/pull/6115

Lowering the tracing level in get_value_reconstruct_data and
get_or_maybe_download from info to debug reduces the overhead
of span creation in non-debug environments.
2023-12-18 13:39:48 +01:00
Shany Pozin
e546872ab4 Merge pull request #6158 from neondatabase/releases/2023-12-18
Release 2023-12-18
2023-12-18 14:24:34 +02:00
John Spray
322ea1cf7c pageserver: on-demand activation cleanups (#6157)
## Problem

#6112 added some logs and metrics: clean these up a bit:
- Avoid counting startup completions for tenants launched after startup
- exclude no-op cases from timing histograms 
- remove a rogue log messages
2023-12-18 11:14:19 +00:00
Vadim Kharitonov
3633742de9 Merge pull request #6121 from neondatabase/releases/2023-12-13
Release 2023-12-13
2023-12-13 12:39:43 +01:00
Joonas Koivunen
079d3a37ba Merge remote-tracking branch 'origin/release' into releases/2023-12-13
this handles the hotfix introduced conflict.
2023-12-13 10:07:19 +00:00
Vadim Kharitonov
a46e77b476 Merge pull request #6090 from neondatabase/releases/2023-12-11
Release 2023-12-11
2023-12-12 12:10:35 +01:00
Tristan Partin
a92702b01e Add submodule paths as safe directories as a precaution
The check-codestyle-rust-arm job requires this for some reason, so let's
just add them everywhere we do this workaround.
2023-12-11 22:00:35 +00:00
Tristan Partin
8ff3253f20 Fix git ownership issue in check-codestyle-rust-arm
We have this workaround for other jobs. Looks like this one was
forgotten about.
2023-12-11 22:00:35 +00:00
Joonas Koivunen
04b82c92a7 fix: accidential return Ok (#6106)
Error indicating request cancellation OR timeline shutdown was deemed as
a reason to exit the background worker that calculated synthetic size.
Fix it to only be considered for avoiding logging such of such errors.

This conflicted on tenant_shard_id having already replaced tenant_id on
`main`.
2023-12-11 21:41:36 +00:00
Vadim Kharitonov
e5bf423e68 Merge branch 'release' into releases/2023-12-11 2023-12-11 11:55:48 +01:00
Vadim Kharitonov
60af392e45 Merge pull request #6057 from neondatabase/vk/patch_timescale_for_production
Revert timescaledb for pg14 and pg15 (#6056)
2023-12-06 16:21:16 +01:00
Vadim Kharitonov
661fc41e71 Revert timescaledb for pg14 and pg15 (#6056)
```
could not start the compute node: compute is in state "failed": db error: ERROR: could not access file "$libdir/timescaledb-2.10.1": No such file or directory Caused by: ERROR: could not access file "$libdir/timescaledb-2.10.1": No such file or directory
```
2023-12-06 16:14:07 +01:00
Shany Pozin
702c488f32 Merge pull request #6022 from neondatabase/releases/2023-12-04
Release 2023-12-04
2023-12-05 17:03:28 +02:00
Sasha Krassovsky
45c5122754 Remove trusted from wal2json 2023-12-04 12:36:19 -08:00
Shany Pozin
558394f710 fix merge 2023-12-04 11:41:27 +02:00
Shany Pozin
73b0898608 Merge branch 'release' into releases/2023-12-04 2023-12-04 11:36:26 +02:00
Joonas Koivunen
e65be4c2dc Merge pull request #6013 from neondatabase/releases/2023-12-01-hotfix
fix: use create_new instead of create for mutex file
2023-12-01 15:35:56 +02:00
Joonas Koivunen
40087b8164 fix: use create_new instead of create for mutex file 2023-12-01 12:54:49 +00:00
Shany Pozin
c762b59483 Merge pull request #5986 from neondatabase/Release-11-30-hotfix
Notify safekeeper readiness with systemd.
2023-11-30 10:01:05 +02:00
Arseny Sher
5d71601ca9 Notify safekeeper readiness with systemd.
To avoid downtime during deploy, as in busy regions initial load can currently
take ~30s.
2023-11-30 08:23:31 +03:00
Shany Pozin
a113c3e433 Merge pull request #5945 from neondatabase/release-2023-11-28-hotfix
Release 2023 11 28 hotfix
2023-11-28 08:14:59 +02:00
Anastasia Lubennikova
e81fc598f4 Update neon extension relocatable for existing installations (#5943) 2023-11-28 00:12:39 +00:00
Anastasia Lubennikova
48b845fa76 Make neon extension relocatable to allow SET SCHEMA (#5942) 2023-11-28 00:12:32 +00:00
Shany Pozin
27096858dc Merge pull request #5922 from neondatabase/releases/2023-11-27
Release 2023-11-27
2023-11-27 09:58:51 +02:00
Shany Pozin
4430d0ae7d Merge pull request #5876 from neondatabase/releases/2023-11-17
Release 2023-11-17
2023-11-20 09:11:58 +02:00
Joonas Koivunen
6e183aa0de Merge branch 'main' into releases/2023-11-17 2023-11-19 15:25:47 +00:00
Vadim Kharitonov
fd6d0b7635 Merge branch 'release' into releases/2023-11-17 2023-11-17 10:51:45 +01:00
Vadim Kharitonov
3710c32aae Merge pull request #5778 from neondatabase/releases/2023-11-03
Release 2023-11-03
2023-11-03 16:06:58 +01:00
Vadim Kharitonov
be83bee49d Merge branch 'release' into releases/2023-11-03 2023-11-03 11:18:15 +01:00
Alexander Bayandin
cf28e5922a Merge pull request #5685 from neondatabase/releases/2023-10-26
Release 2023-10-26
2023-10-27 10:42:12 +01:00
Em Sharnoff
7d384d6953 Bump vm-builder v0.18.2 -> v0.18.4 (#5666)
Only applicable change was neondatabase/autoscaling#584, setting
pgbouncer auth_dbname=postgres in order to fix superuser connections
from preventing dropping databases.
2023-10-26 20:15:45 +01:00
Em Sharnoff
4b3b37b912 Bump vm-builder v0.18.1 -> v0.18.2 (#5646)
Only applicable change was neondatabase/autoscaling#571, removing the
postgres_exporter flags `--auto-discover-databases` and
`--exclude-databases=...`
2023-10-26 20:15:29 +01:00
Shany Pozin
1d8d200f4d Merge pull request #5668 from neondatabase/sp/aux_files_cherry_pick
Cherry pick: Ignore missed AUX_FILES_KEY when generating image layer (#5660)
2023-10-26 10:08:16 +03:00
Konstantin Knizhnik
0d80d6ce18 Ignore missed AUX_FILES_KEY when generating image layer (#5660)
## Problem

Logical replication requires new AUX_FILES_KEY which is definitely
absent in existed database.
We do not have function to check if key exists in our KV storage.
So I have to handle the error in `list_aux_files` method.
But this key is also included in key space range and accessed y
`create_image_layer` method.

## Summary of changes

Check if AUX_FILES_KEY  exists before including it in keyspace.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Shany Pozin <shany@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2023-10-26 09:30:28 +03:00
Shany Pozin
f653ee039f Merge pull request #5638 from neondatabase/releases/2023-10-24
Release 2023-10-24
2023-10-24 12:10:52 +03:00
Em Sharnoff
e614a95853 Merge pull request #5610 from neondatabase/sharnoff/rc-2023-10-20-vm-monitor-fixes
Release 2023-10-20: vm-monitor memory.high throttling fixes
2023-10-20 00:11:06 -07:00
Em Sharnoff
850db4cc13 vm-monitor: Deny not fail downscale if no memory stats yet (#5606)
Fixes an issue we observed on staging that happens when the
autoscaler-agent attempts to immediately downscale the VM after binding,
which is typical for pooled computes.

The issue was occurring because the autoscaler-agent was requesting
downscaling before the vm-monitor had gathered sufficient cgroup memory
stats to be confident in approving it. When the vm-monitor returned an
internal error instead of denying downscaling, the autoscaler-agent
retried the connection and immediately hit the same issue (in part
because cgroup stats are collected per-connection, rather than
globally).
2023-10-19 21:56:55 -07:00
Em Sharnoff
8a316b1277 vm-monitor: Log full error on message handling failure (#5604)
There's currently an issue with the vm-monitor on staging that's not
really feasible to debug because the current display impl gives no
context to the errors (just says "failed to downscale").

Logging the full error should help.

For communications with the autoscaler-agent, it's ok to only provide
the outermost cause, because we can cross-reference with the VM logs.
At some point in the future, we may want to change that.
2023-10-19 21:56:50 -07:00
Em Sharnoff
4d13bae449 vm-monitor: Switch from memory.high to polling memory.stat (#5524)
tl;dr it's really hard to avoid throttling from memory.high, and it
counts tmpfs & page cache usage, so it's also hard to make sense of.

In the interest of fixing things quickly with something that should be
*good enough*, this PR switches to instead periodically fetch memory
statistics from the cgroup's memory.stat and use that data to determine
if and when we should upscale.

This PR fixes #5444, which has a lot more detail on the difficulties
we've hit with memory.high. This PR also supersedes #5488.
2023-10-19 21:56:36 -07:00
Vadim Kharitonov
49377abd98 Merge pull request #5577 from neondatabase/releases/2023-10-17
Release 2023-10-17
2023-10-17 12:21:20 +02:00
Christian Schwarz
a6b2f4e54e limit imitate accesses concurrency, using same semaphore as compactions (#5578)
Before this PR, when we restarted pageserver, we'd see a rush of
`$number_of_tenants` concurrent eviction tasks starting to do imitate
accesses building up in the period of `[init_order allows activations,
$random_access_delay + EvictionPolicyLayerAccessThreshold::period]`.

We simply cannot handle that degree of concurrent IO.

We already solved the problem for compactions by adding a semaphore.
So, this PR shares that semaphore for use by evictions.

Part of https://github.com/neondatabase/neon/issues/5479

Which is again part of https://github.com/neondatabase/neon/issues/4743

Risks / Changes In System Behavior
==================================

* we don't do evictions as timely as we currently do
* we log a bunch of warnings about eviction taking too long
* imitate accesses and compactions compete for the same concurrency
limit, so, they'll slow each other down through this shares semaphore

Changes
=======

- Move the `CONCURRENT_COMPACTIONS` semaphore into `tasks.rs`
- Rename it to `CONCURRENT_BACKGROUND_TASKS`
- Use it also for the eviction imitate accesses:
    - Imitate acceses are both per-TIMELINE and per-TENANT
    - The per-TENANT is done through coalescing all the per-TIMELINE
      tasks via a tokio mutex `eviction_task_tenant_state`.
    - We acquire the CONCURRENT_BACKGROUND_TASKS permit early, at the
      beginning of the eviction iteration, much before the imitate
      acesses start (and they may not even start at all in the given
      iteration, as they happen only every $threshold).
    - Acquiring early is **sub-optimal** because when the per-timline
      tasks coalesce on the `eviction_task_tenant_state` mutex,
      they are already holding a CONCURRENT_BACKGROUND_TASKS permit.
    - It's also unfair because tenants with many timelines win
      the CONCURRENT_BACKGROUND_TASKS more often.
    - I don't think there's another way though, without refactoring
      more of the imitate accesses logic, e.g, making it all per-tenant.
- Add metrics for queue depth behind the semaphore.
I found these very useful to understand what work is queued in the
system.

    - The metrics are tagged by the new `BackgroundLoopKind`.
    - On a green slate, I would have used `TaskKind`, but we already had
      pre-existing labels whose names didn't map exactly to task kind.
      Also the task kind is kind of a lower-level detail, so, I think
it's fine to have a separate enum to identify background work kinds.

Future Work
===========

I guess I could move the eviction tasks from a ticker to "sleep for
$period".
The benefit would be that the semaphore automatically "smears" the
eviction task scheduling over time, so, we only have the rush on restart
but a smeared-out rush afterward.

The downside is that this perverts the meaning of "$period", as we'd
actually not run the eviction at a fixed period. It also means the the
"took to long" warning & metric becomes meaningless.

Then again, that is already the case for the compaction and gc tasks,
which do sleep for `$period` instead of using a ticker.

(cherry picked from commit 9256788273)
2023-10-17 12:16:26 +02:00
Shany Pozin
face60d50b Merge pull request #5526 from neondatabase/releases/2023-10-11
Release 2023-10-11
2023-10-11 11:16:39 +03:00
Shany Pozin
9768aa27f2 Merge pull request #5516 from neondatabase/releases/2023-10-10
Release 2023-10-10
2023-10-10 14:16:47 +03:00
Shany Pozin
96b2e575e1 Merge pull request #5445 from neondatabase/releases/2023-10-03
Release 2023-10-03
2023-10-04 13:53:37 +03:00
Alexander Bayandin
7222777784 Update checksums for pg_jsonschema & pg_graphql (#5455)
## Problem

Folks have re-taged releases for `pg_jsonschema` and `pg_graphql` (to
increase timeouts on their CI), for us, these are a noop changes, 
but unfortunately, this will cause our builds to fail due to checksums 
mismatch (this might not strike right away because of the build cache).
- 8ba7c7be9d
- aa7509370a

## Summary of changes
- `pg_jsonschema` update checksum
- `pg_graphql` update checksum
2023-10-03 18:44:30 +01:00
Em Sharnoff
5469fdede0 Merge pull request #5422 from neondatabase/sharnoff/rc-2023-09-28-fix-restart-on-postmaster-SIGKILL
Release 2023-09-28: Fix (lack of) restart on neonvm postmaster SIGKILL
2023-09-28 10:48:51 -07:00
MMeent
72aa6b9fdd Fix neon_zeroextend's WAL logging (#5387)
When you log more than a few blocks, you need to reserve the space in
advance. We didn't do that, so we got errors. Now we do that, and
shouldn't get errors.
2023-09-28 09:37:28 -07:00
Em Sharnoff
ae0634b7be Bump vm-builder v0.17.11 -> v0.17.12 (#5407)
Only relevant change is neondatabase/autoscaling#534 - refer there for
more details.
2023-09-28 09:28:04 -07:00
Shany Pozin
70711f32fa Merge pull request #5375 from neondatabase/releases/2023-09-26
Release 2023-09-26
2023-09-26 15:19:45 +03:00
Vadim Kharitonov
52a88af0aa Merge pull request #5336 from neondatabase/releases/2023-09-19
Release 2023-09-19
2023-09-19 11:16:43 +02:00
Alexander Bayandin
b7a43bf817 Merge branch 'release' into releases/2023-09-19 2023-09-19 09:07:20 +01:00
Alexander Bayandin
dce91b33a4 Merge pull request #5318 from neondatabase/releases/2023-09-15-1
Postgres 14/15: Use previous extensions versions
2023-09-15 16:30:44 +01:00
Alexander Bayandin
23ee4f3050 Revert plv8 only 2023-09-15 15:45:23 +01:00
Alexander Bayandin
46857e8282 Postgres 14/15: Use previous extensions versions 2023-09-15 15:27:00 +01:00
Alexander Bayandin
368ab0ce54 Merge pull request #5313 from neondatabase/releases/2023-09-15
Release 2023-09-15
2023-09-15 10:39:56 +01:00
Konstantin Knizhnik
a5987eebfd References to old and new blocks were mixed in xlog_heap_update handler (#5312)
## Problem

See https://neondb.slack.com/archives/C05L7D1JAUS/p1694614585955029

https://www.notion.so/neondatabase/Duplicate-key-issue-651627ce843c45188fbdcb2d30fd2178

## Summary of changes

Swap old/new block references

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-09-15 10:11:41 +01:00
Alexander Bayandin
6686ede30f Update checksum for pg_hint_plan (#5309)
## Problem

The checksum for `pg_hint_plan` doesn't match:
```
sha256sum: WARNING: 1 computed checksum did NOT match
```

Ref
https://github.com/neondatabase/neon/actions/runs/6185715461/job/16793609251?pr=5307

It seems that the release was retagged yesterday:
https://github.com/ossc-db/pg_hint_plan/releases/tag/REL16_1_6_0

I don't see any malicious changes from 15_1.5.1:
https://github.com/ossc-db/pg_hint_plan/compare/REL15_1_5_1...REL16_1_6_0,
so it should be ok to update.

## Summary of changes
- Update checksum for `pg_hint_plan` 16_1.6.0
2023-09-15 09:54:42 +01:00
Em Sharnoff
373c7057cc vm-monitor: Fix cgroup throttling (#5303)
I believe this (not actual IO problems) is the cause of the "disk speed
issue" that we've had for VMs recently. See e.g.:

1. https://neondb.slack.com/archives/C03H1K0PGKH/p1694287808046179?thread_ts=1694271790.580099&cid=C03H1K0PGKH
2. https://neondb.slack.com/archives/C03H1K0PGKH/p1694511932560659

The vm-informant (and now, the vm-monitor, its replacement) is supposed
to gradually increase the `neon-postgres` cgroup's memory.high value,
because otherwise the kernel will throttle all the processes in the
cgroup.

This PR fixes a bug with the vm-monitor's implementation of this
behavior.

---

Other references, for the vm-informant's implementation:

- Original issue: neondatabase/autoscaling#44
- Original PR: neondatabase/autoscaling#223
2023-09-15 09:54:42 +01:00
Shany Pozin
7d6ec16166 Merge pull request #5296 from neondatabase/releases/2023-09-13
Release 2023-09-13
2023-09-13 13:49:14 +03:00
Shany Pozin
0e6fdc8a58 Merge pull request #5283 from neondatabase/releases/2023-09-12
Release 2023-09-12
2023-09-12 14:56:47 +03:00
Christian Schwarz
521438a5c6 fix deadlock around TENANTS (#5285)
The sequence that can lead to a deadlock:

1. DELETE request gets all the way to `tenant.shutdown(progress,
false).await.is_err() ` , while holding TENANTS.read()
2. POST request for tenant creation comes in, calls `tenant_map_insert`,
it does `let mut guard = TENANTS.write().await;`
3. Something that `tenant.shutdown()` needs to wait for needs a
`TENANTS.read().await`.
The only case identified in exhaustive manual scanning of the code base
is this one:
Imitate size access does `get_tenant().await`, which does
`TENANTS.read().await` under the hood.

In the above case (1) waits for (3), (3)'s read-lock request is queued
behind (2)'s write-lock, and (2) waits for (1).
Deadlock.

I made a reproducer/proof-that-above-hypothesis-holds in
https://github.com/neondatabase/neon/pull/5281 , but, it's not ready for
merge yet and we want the fix _now_.

fixes https://github.com/neondatabase/neon/issues/5284
2023-09-12 14:13:13 +03:00
Vadim Kharitonov
07d7874bc8 Merge pull request #5202 from neondatabase/releases/2023-09-05
Release 2023-09-05
2023-09-05 12:16:06 +02:00
Anastasia Lubennikova
1804111a02 Merge pull request #5161 from neondatabase/rc-2023-08-31
Release 2023-08-31
2023-08-31 16:53:17 +03:00
Arthur Petukhovsky
cd0178efed Merge pull request #5150 from neondatabase/release-sk-fix-active-timeline
Release 2023-08-30
2023-08-30 11:43:39 +02:00
Shany Pozin
333574be57 Merge pull request #5133 from neondatabase/releases/2023-08-29
Release 2023-08-29
2023-08-29 14:02:58 +03:00
Alexander Bayandin
79a799a143 Merge branch 'release' into releases/2023-08-29 2023-08-29 11:17:57 +01:00
Conrad Ludgate
9da06af6c9 Merge pull request #5113 from neondatabase/release-http-connection-fix
Release 2023-08-25
2023-08-25 17:21:35 +01:00
Conrad Ludgate
ce1753d036 proxy: dont return connection pending (#5107)
## Problem

We were returning Pending when a connection had a notice/notification
(introduced recently in #5020). When returning pending, the runtime
assumes you will call `cx.waker().wake()` in order to continue
processing.

We weren't doing that, so the connection task would get stuck

## Summary of changes

Don't return pending. Loop instead
2023-08-25 16:42:30 +01:00
Alek Westover
67db8432b4 Fix cargo deny errors (#5068)
## Problem
cargo deny lint broken

Links to the CVEs:

[rustsec.org/advisories/RUSTSEC-2023-0052](https://rustsec.org/advisories/RUSTSEC-2023-0052)

[rustsec.org/advisories/RUSTSEC-2023-0053](https://rustsec.org/advisories/RUSTSEC-2023-0053)
One is fixed, the other one isn't so we allow it (for now), to unbreak
CI. Then later we'll try to get rid of webpki in favour of the rustls
fork.

## Summary of changes
```
+ignore = ["RUSTSEC-2023-0052"]
```
2023-08-25 16:42:30 +01:00
Vadim Kharitonov
4e2e44e524 Enable neon-pool-opt-in (#5062) 2023-08-22 09:06:14 +01:00
Vadim Kharitonov
ed786104f3 Merge pull request #5060 from neondatabase/releases/2023-08-22
Release 2023-08-22
2023-08-22 09:41:02 +02:00
Stas Kelvich
84b74f2bd1 Merge pull request #4997 from neondatabase/sk/proxy-release-23-07-15
Fix lint
2023-08-15 18:54:20 +03:00
Arthur Petukhovsky
fec2ad6283 Fix lint 2023-08-15 18:49:02 +03:00
Stas Kelvich
98eebd4682 Merge pull request #4996 from neondatabase/sk/proxy_release
Disable neon-pool-opt-in
2023-08-15 18:37:50 +03:00
Arthur Petukhovsky
2f74287c9b Disable neon-pool-opt-in 2023-08-15 18:34:17 +03:00
Shany Pozin
aee1bf95e3 Merge pull request #4990 from neondatabase/releases/2023-08-15
Release 2023-08-15
2023-08-15 15:34:38 +03:00
Shany Pozin
b9de9d75ff Merge branch 'release' into releases/2023-08-15 2023-08-15 14:35:00 +03:00
Stas Kelvich
7943b709e6 Merge pull request #4940 from neondatabase/sk/release-23-05-25-proxy-fixup
Release: proxy retry fixup
2023-08-09 13:53:19 +03:00
Conrad Ludgate
d7d066d493 proxy: delay auth on retry (#4929)
## Problem

When an endpoint is shutting down, it can take a few seconds. Currently
when starting a new compute, this causes an "endpoint is in transition"
error. We need to add delays before retrying to ensure that we allow
time for the endpoint to shutdown properly.

## Summary of changes

Adds a delay before retrying in auth. connect_to_compute already has
this delay
2023-08-09 12:54:24 +03:00
Felix Prasanna
e78ac22107 release fix: revert vm builder bump from 0.13.1 -> 0.15.0-alpha1 (#4932)
This reverts commit 682dfb3a31.

hotfix for a CLI arg issue in the monitor
2023-08-08 21:08:46 +03:00
Vadim Kharitonov
76a8f2bb44 Merge pull request #4923 from neondatabase/releases/2023-08-08
Release 2023-08-08
2023-08-08 11:44:38 +02:00
Vadim Kharitonov
8d59a8581f Merge branch 'release' into releases/2023-08-08 2023-08-08 10:54:34 +02:00
Vadim Kharitonov
b1ddd01289 Define NEON_SMGR to make it possible for extensions to use Neon SMG API (#4889)
Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru>
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2023-08-03 16:28:31 +03:00
Alexander Bayandin
6eae4fc9aa Release 2023-08-02: update pg_embedding (#4877)
Cherry-picking ca4d71a954 from `main` into
the `release`

Co-authored-by: Vadim Kharitonov <vadim2404@users.noreply.github.com>
2023-08-03 08:48:09 +02:00
Christian Schwarz
765455bca2 Merge pull request #4861 from neondatabase/releases/2023-08-01--2-fix-pipeline
ci: fix upload-postgres-extensions-to-s3 job
2023-08-01 13:22:07 +02:00
Christian Schwarz
4204960942 ci: fix upload-postgres-extensions-to-s3 job
commit

	commit 5f8fd640bf
	Author: Alek Westover <alek.westover@gmail.com>
	Date:   Wed Jul 26 08:24:03 2023 -0400

	    Upload Test Remote Extensions (#4792)

switched to using the release tag instead of `latest`, but,
the `promote-images` job only uploads `latest` to the prod ECR.

The switch to using release tag was good in principle, but,
reverting that part to make the release pipeine work.

Note that a proper fix should abandon use of `:latest` tag
at all: currently, if a `main` pipeline runs concurrently
with a `release` pipeline, the `release` pipeline may end
up using the `main` pipeline's images.
2023-08-01 12:01:45 +02:00
Christian Schwarz
67345d66ea Merge pull request #4858 from neondatabase/releases/2023-08-01
Release 2023-08-01
2023-08-01 10:44:01 +02:00
Shany Pozin
2266ee5971 Merge pull request #4803 from neondatabase/releases/2023-07-25
Release 2023-07-25
2023-07-25 14:21:07 +03:00
Shany Pozin
b58445d855 Merge pull request #4746 from neondatabase/releases/2023-07-18
Release 2023-07-18
2023-07-18 14:45:39 +03:00
Conrad Ludgate
36050e7f3d Merge branch 'release' into releases/2023-07-18 2023-07-18 12:00:09 +01:00
Alexander Bayandin
33360ed96d Merge pull request #4705 from neondatabase/release-2023-07-12
Release 2023-07-12 (only proxy)
2023-07-12 19:44:36 +01:00
Conrad Ludgate
39a28d1108 proxy wake_compute loop (#4675)
## Problem

If we fail to wake up the compute node, a subsequent connect attempt
will definitely fail. However, kubernetes won't fail the connection
immediately, instead it hangs until we timeout (10s).

## Summary of changes

Refactor the loop to allow fast retries of compute_wake and to skip a
connect attempt.
2023-07-12 18:40:11 +01:00
Conrad Ludgate
efa6aa134f allow repeated IO errors from compute node (#4624)
## Problem

#4598 compute nodes are not accessible some time after wake up due to
kubernetes DNS not being fully propagated.

## Summary of changes

Update connect retry mechanism to support handling IO errors and
sleeping for 100ms

## Checklist before requesting a review

- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-07-12 18:40:06 +01:00
Alexander Bayandin
2c724e56e2 Merge pull request #4646 from neondatabase/releases/2023-07-06-hotfix
Release 2023-07-06 (add pg_embedding extension only)
2023-07-06 12:19:52 +01:00
Alexander Bayandin
feff887c6f Compile pg_embedding extension (#4634)
```
CREATE EXTENSION embedding;
CREATE TABLE t (val real[]);
INSERT INTO t (val) VALUES ('{0,0,0}'), ('{1,2,3}'), ('{1,1,1}'), (NULL);
CREATE INDEX ON t USING hnsw (val) WITH (maxelements = 10, dims=3, m=3);
INSERT INTO t (val) VALUES (array[1,2,4]);

SELECT * FROM t ORDER BY val <-> array[3,3,3];
   val   
---------
 {1,2,3}
 {1,2,4}
 {1,1,1}
 {0,0,0}
 
(5 rows)
```
2023-07-06 09:39:41 +01:00
Vadim Kharitonov
353d915fcf Merge pull request #4633 from neondatabase/releases/2023-07-05
Release 2023-07-05
2023-07-05 15:10:47 +02:00
Vadim Kharitonov
2e38098cbc Merge branch 'release' into releases/2023-07-05 2023-07-05 12:41:48 +02:00
Vadim Kharitonov
a6fe5ea1ac Merge pull request #4571 from neondatabase/releases/2023-06-27
Release 2023-06-27
2023-06-27 12:55:33 +02:00
Vadim Kharitonov
05b0aed0c1 Merge branch 'release' into releases/2023-06-27 2023-06-27 12:22:12 +02:00
Alex Chi Z
cd1705357d Merge pull request #4561 from neondatabase/releases/2023-06-23-hotfix
Release 2023-06-23 (pageserver-only)
2023-06-23 15:38:50 -04:00
Christian Schwarz
6bc7561290 don't use MGMT_REQUEST_RUNTIME for consumption metrics synthetic size worker
The consumption metrics synthetic size worker does logical size calculation.
Logical size calculation currently does synchronous disk IO.
This blocks the MGMT_REQUEST_RUNTIME's executor threads, starving other futures.

While there's work on the way to move the synchronous disk IO into spawn_blocking,
the quickfix here is to use the BACKGROUND_RUNTIME instead of MGMT_REQUEST_RUNTIME.

Actually it's not just a quickfix. We simply shouldn't be blocking MGMT_REQUEST_RUNTIME
executor threads on CPU or sync disk IO.
That work isn't done yet, as many of the mgmt tasks still _do_ disk IO.
But it's not as intensive as the logical size calculations that we're fixing here.

While we're at it, fix disk-usage-based eviction in a similar way.
It wasn't the culprit here, according to prod logs, but it can theoretically be
a little CPU-intensive.

More context, including graphs from Prod:
https://neondb.slack.com/archives/C03F5SM1N02/p1687541681336949

(cherry picked from commit d6e35222ea)
2023-06-23 20:54:07 +02:00
Christian Schwarz
fbd3ac14b5 Merge pull request #4544 from neondatabase/releases/2023-06-21-hotfix
Release 2023-06-21 (fixup for post-merge failed 2023-06-20)
2023-06-21 16:54:34 +03:00
Christian Schwarz
e437787c8f cargo update -p openssl (#4542)
To unblock release
https://github.com/neondatabase/neon/pull/4536#issuecomment-1600678054

Context: https://rustsec.org/advisories/RUSTSEC-2023-0044
2023-06-21 15:52:56 +03:00
Christian Schwarz
3460dbf90b Merge pull request #4536 from neondatabase/releases/2023-06-20
Release 2023-06-20 (actually 2023-06-21)
2023-06-21 14:19:14 +03:00
Vadim Kharitonov
6b89d99677 Merge pull request #4521 from neondatabase/release_2023-06-15
Release 2023 06 15
2023-06-15 17:40:01 +02:00
Vadim Kharitonov
6cc8ea86e4 Merge branch 'main' into release_2023-06-15 2023-06-15 16:50:44 +02:00
Shany Pozin
e62a492d6f Merge pull request #4486 from neondatabase/releases/2023-06-13
Release 2023-06-13
2023-06-13 15:21:35 +03:00
Alexey Kondratov
a475cdf642 [compute_ctl] Fix logging if catalog updates are skipped (#4480)
Otherwise, it wasn't clear from the log when Postgres started up
completely if catalog updates were skipped.

Follow-up for 4936ab6
2023-06-13 13:37:24 +02:00
Stas Kelvich
7002c79a47 Merge pull request #4447 from neondatabase/release_proxy_08-06-2023
Release proxy 08 06 2023
2023-06-08 21:02:54 +03:00
Vadim Kharitonov
ee6cf357b4 Merge pull request #4427 from neondatabase/releases/2023-06-06
Release 2023-06-06
2023-06-06 14:42:21 +02:00
Vadim Kharitonov
e5c2086b5f Merge branch 'release' into releases/2023-06-06 2023-06-06 12:33:56 +02:00
Shany Pozin
5f1208296a Merge pull request #4395 from neondatabase/releases/2023-06-01
Release 2023-06-01
2023-06-01 10:58:00 +03:00
Stas Kelvich
88e8e473cd Merge pull request #4345 from neondatabase/release-23-05-25-proxy
Release 23-05-25, take 3
2023-05-25 19:40:43 +03:00
Stas Kelvich
b0a77844f6 Add SQL-over-HTTP endpoint to Proxy
This commit introduces an SQL-over-HTTP endpoint in the proxy, with a JSON
response structure resembling that of the node-postgres driver. This method,
using HTTP POST, achieves smaller amortized latencies in edge setups due to
fewer round trips and an enhanced open connection reuse by the v8 engine.

This update involves several intricacies:
1. SQL injection protection: We employed the extended query protocol, modifying
   the rust-postgres driver to send queries in one roundtrip using a text
   protocol rather than binary, bypassing potential issues like those identified
   in https://github.com/sfackler/rust-postgres/issues/1030.

2. Postgres type compatibility: As not all postgres types have binary
   representations (e.g., acl's in pg_class), we adjusted rust-postgres to
   respond with text protocol, simplifying serialization and fixing queries with
   text-only types in response.

3. Data type conversion: Considering JSON supports fewer data types than
   Postgres, we perform conversions where possible, passing all other types as
   strings. Key conversions include:
   - postgres int2, int4, float4, float8 -> json number (NaN and Inf remain
     text)
   - postgres bool, null, text -> json bool, null, string
   - postgres array -> json array
   - postgres json and jsonb -> json object

4. Alignment with node-postgres: To facilitate integration with js libraries,
   we've matched the response structure of node-postgres, returning command tags
   and column oids. Command tag capturing was added to the rust-postgres
   functionality as part of this change.
2023-05-25 17:59:17 +03:00
Vadim Kharitonov
1baf464307 Merge pull request #4309 from neondatabase/releases/2023-05-23
Release 2023-05-23
2023-05-24 11:56:54 +02:00
Alexander Bayandin
e9b8e81cea Merge branch 'release' into releases/2023-05-23 2023-05-23 12:54:08 +01:00
Alexander Bayandin
85d6194aa4 Fix regress-tests job for Postgres 15 on release branch (#4254)
## Problem

Compatibility tests don't support Postgres 15 yet, but we're still
trying to upload compatibility snapshot (which we do not collect).

Ref
https://github.com/neondatabase/neon/actions/runs/4991394158/jobs/8940369368#step:4:38129

## Summary of changes

Add `pg_version` parameter to `run-python-test-set` actions and do not
upload compatibility snapshot for Postgres 15
2023-05-16 17:19:12 +01:00
Vadim Kharitonov
333a7a68ef Merge pull request #4245 from neondatabase/releases/2023-05-16
Release 2023-05-16
2023-05-16 13:38:40 +02:00
Vadim Kharitonov
6aa4e41bee Merge branch 'release' into releases/2023-05-16 2023-05-16 12:48:23 +02:00
Joonas Koivunen
840183e51f try: higher page_service timeouts to isolate an issue 2023-05-11 16:24:53 +03:00
Shany Pozin
cbccc94b03 Merge pull request #4184 from neondatabase/releases/2023-05-09
Release 2023-05-09
2023-05-09 15:30:36 +03:00
Stas Kelvich
fce227df22 Merge pull request #4163 from neondatabase/main
Release 23-05-05
2023-05-05 15:56:23 +03:00
Stas Kelvich
bd787e800f Merge pull request #4133 from neondatabase/main
Release 23-04-01
2023-05-01 18:52:46 +03:00
Shany Pozin
4a7704b4a3 Merge pull request #4131 from neondatabase/sp/hotfix_adding_sks_us_west
Hotfix: Adding 4 new pageservers and two sets of safekeepers to us west 2
2023-05-01 15:17:38 +03:00
Shany Pozin
ff1119da66 Add 2 new sets of safekeepers to us-west2 2023-05-01 14:35:31 +03:00
Shany Pozin
4c3ba1627b Add 4 new Pageservers for retool launch 2023-05-01 14:34:38 +03:00
Vadim Kharitonov
1407174fb2 Merge pull request #4110 from neondatabase/vk/release_2023-04-28
Release 2023 04 28
2023-04-28 17:43:16 +02:00
Vadim Kharitonov
ec9dcb1889 Merge branch 'release' into vk/release_2023-04-28 2023-04-28 16:32:26 +02:00
Joonas Koivunen
d11d781afc revert: "Add check for duplicates of generated image layers" (#4104)
This reverts commit 732acc5.

Reverted PR: #3869

As noted in PR #4094, we do in fact try to insert duplicates to the
layer map, if L0->L1 compaction is interrupted. We do not have a proper
fix for that right now, and we are in a hurry to make a release to
production, so revert the changes related to this to the state that we
have in production currently. We know that we have a bug here, but
better to live with the bug that we've had in production for a long
time, than rush a fix to production without testing it in staging first.

Cc: #4094, #4088
2023-04-28 16:31:35 +02:00
Anastasia Lubennikova
4e44565b71 Merge pull request #4000 from neondatabase/releases/2023-04-11
Release 2023-04-11
2023-04-11 17:47:41 +03:00
Stas Kelvich
4ed51ad33b Add more proxy cnames 2023-04-11 15:59:35 +03:00
Arseny Sher
1c1ebe5537 Merge pull request #3946 from neondatabase/releases/2023-04-04
Release 2023-04-04
2023-04-04 14:38:40 +04:00
Christian Schwarz
c19cb7f386 Merge pull request #3935 from neondatabase/releases/2023-04-03
Release 2023-04-03
2023-04-03 16:19:49 +02:00
Vadim Kharitonov
4b97d31b16 Merge pull request #3896 from neondatabase/releases/2023-03-28
Release 2023-03-28
2023-03-28 17:58:06 +04:00
Shany Pozin
923ade3dd7 Merge pull request #3855 from neondatabase/releases/2023-03-21
Release 2023-03-21
2023-03-21 13:12:32 +02:00
Arseny Sher
b04e711975 Merge pull request #3825 from neondatabase/release-2023-03-15
Release 2023.03.15
2023-03-15 15:38:00 +03:00
Arseny Sher
afd0a6b39a Forward framed read buf contents to compute before proxy pass.
Otherwise they get lost. Normally buffer is empty before proxy pass, but this is
not the case with pipeline mode of out npm driver; fixes connection hangup
introduced by b80fe41af3 for it.

fixes https://github.com/neondatabase/neon/issues/3822
2023-03-15 15:36:06 +04:00
Lassi Pölönen
99752286d8 Use RollingUpdate strategy also for legacy proxy (#3814)
## Describe your changes
We have previously changed the neon-proxy to use RollingUpdate. This
should be enabled in legacy proxy too in order to avoid breaking
connections for the clients and allow for example backups to run even
during deployment. (https://github.com/neondatabase/neon/pull/3683)

## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3333
2023-03-15 15:35:51 +04:00
Arseny Sher
15df93363c Merge pull request #3804 from neondatabase/release-2023-03-13
Release 2023.03.13
2023-03-13 20:25:40 +03:00
Vadim Kharitonov
bc0ab741af Merge pull request #3758 from neondatabase/releases/2023-03-07
Release 2023-03-07
2023-03-07 12:38:47 +01:00
Christian Schwarz
51d9dfeaa3 Merge pull request #3743 from neondatabase/releases/2023-03-03
Release 2023-03-03
2023-03-03 19:20:21 +01:00
Shany Pozin
f63cb18155 Merge pull request #3713 from neondatabase/releases/2023-02-28
Release 2023-02-28
2023-02-28 12:52:24 +02:00
Arseny Sher
0de603d88e Merge pull request #3707 from neondatabase/release-2023-02-24
Release 2023-02-24

Hotfix for UNLOGGED tables. Contains #3706
Also contains rebase on 14.7 and 15.2 #3581
2023-02-25 00:32:11 +04:00
Heikki Linnakangas
240913912a Fix UNLOGGED tables.
Instead of trying to create missing files on the way, send init fork contents as
main fork from pageserver during basebackup. Add test for that. Call
put_rel_drop for init forks; previously they weren't removed. Bump
vendor/postgres to revert previous approach on Postgres side.

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>

ref https://github.com/neondatabase/postgres/pull/264
ref https://github.com/neondatabase/postgres/pull/259
ref https://github.com/neondatabase/neon/issues/1222
2023-02-24 23:54:53 +04:00
MMeent
91a4ea0de2 Update vendored PostgreSQL versions to 14.7 and 15.2 (#3581)
## Describe your changes
Rebase vendored PostgreSQL onto 14.7 and 15.2

## Issue ticket number and link

#3579

## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [x] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [x] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
    ```
The version of PostgreSQL that we use is updated to 14.7 for PostgreSQL
14 and 15.2 for PostgreSQL 15.
    ```
2023-02-24 23:54:42 +04:00
Arseny Sher
8608704f49 Merge pull request #3691 from neondatabase/release-2023-02-23
Release 2023-02-23

Hotfix for the unlogged tables with indexes issue.

neondatabase/postgres#259
neondatabase/postgres#262
2023-02-23 13:39:33 +04:00
Arseny Sher
efef68ce99 Bump vendor/postgres to include hotfix for unlogged tables with indexes.
https://github.com/neondatabase/postgres/pull/259
https://github.com/neondatabase/postgres/pull/262
2023-02-23 08:49:43 +04:00
Joonas Koivunen
8daefd24da Merge pull request #3679 from neondatabase/releases/2023-02-22
Releases/2023-02-22
2023-02-22 15:56:55 +02:00
Arthur Petukhovsky
46cc8b7982 Remove safekeeper-1.ap-southeast-1.aws.neon.tech (#3671)
We migrated all timelines to
`safekeeper-3.ap-southeast-1.aws.neon.tech`, now old instance can be
removed.
2023-02-22 15:07:57 +02:00
Sergey Melnikov
38cd90dd0c Add -v to ansible invocations (#3670)
To get more debug output on failures
2023-02-22 15:07:57 +02:00
Joonas Koivunen
a51b269f15 fix: hold permit until GetObject eof (#3663)
previously we applied the ratelimiting only up to receiving the headers
from s3, or somewhere near it. the commit adds an adapter which carries
the permit until the AsyncRead has been disposed.

fixes #3662.
2023-02-22 15:07:57 +02:00
Joonas Koivunen
43bf6d0a0f calculate_logical_size: no longer use spawn_blocking (#3664)
Calculation of logical size is now async because of layer downloads, so
we shouldn't use spawn_blocking for it. Use of `spawn_blocking`
exhausted resources which are needed by `tokio::io::copy` when copying
from a stream to a file which lead to deadlock.

Fixes: #3657
2023-02-22 15:07:57 +02:00
Joonas Koivunen
15273a9b66 chore: ignore all compaction inactive tenant errors (#3665)
these are happening in tests because of #3655 but they sure took some
time to appear.

makes the `Compaction failed, retrying in 2s: Cannot run compaction
iteration on inactive tenant` into a globally allowed error, because it
has been seen failing on different test cases.
2023-02-22 15:07:57 +02:00
Joonas Koivunen
78aca668d0 fix: log download failed error (#3661)
Fixes #3659
2023-02-22 15:07:57 +02:00
Vadim Kharitonov
acbf4148ea Merge pull request #3656 from neondatabase/releases/2023-02-21
Release 2023-02-21
2023-02-21 16:03:48 +01:00
Vadim Kharitonov
6508540561 Merge branch 'release' into releases/2023-02-21 2023-02-21 15:31:16 +01:00
Arthur Petukhovsky
a41b5244a8 Add new safekeeper to ap-southeast-1 prod (#3645) (#3646)
To trigger deployment of #3645 to production.
2023-02-20 15:22:49 +00:00
Shany Pozin
2b3189be95 Merge pull request #3600 from neondatabase/releases/2023-02-14
Release 2023-02-14
2023-02-15 13:31:30 +02:00
Vadim Kharitonov
248563c595 Merge pull request #3553 from neondatabase/releases/2023-02-07
Release 2023-02-07
2023-02-07 14:07:44 +01:00
Vadim Kharitonov
14cd6ca933 Merge branch 'release' into releases/2023-02-07 2023-02-07 12:11:56 +01:00
Vadim Kharitonov
eb36403e71 Release 2023 01 31 (#3497)
Co-authored-by: Kirill Bulatov <kirill@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
Co-authored-by: Christian Schwarz <christian@neon.tech>
Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru>
Co-authored-by: Shany Pozin <shany@neon.tech>
Co-authored-by: Sergey Melnikov <sergey@neon.tech>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Rory de Zoete <33318916+zoete@users.noreply.github.com>
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
Co-authored-by: Lassi Pölönen <lassi.polonen@iki.fi>
2023-01-31 15:06:35 +02:00
Anastasia Lubennikova
3c6f779698 Merge pull request #3411 from neondatabase/release_2023_01_23
Fix Release 2023 01 23
2023-01-23 20:10:03 +02:00
Joonas Koivunen
f67f0c1c11 More tenant size fixes (#3410)
Small changes, but hopefully this will help with the panic detected in
staging, for which we cannot get the debugging information right now
(end-of-branch before branch-point).
2023-01-23 17:46:13 +02:00
Shany Pozin
edb02d3299 Adding pageserver3 to staging (#3403) 2023-01-23 17:46:13 +02:00
Konstantin Knizhnik
664a69e65b Fix slru_segment_key_range function: segno was assigned to incorrect Key field (#3354) 2023-01-23 17:46:13 +02:00
Anastasia Lubennikova
478322ebf9 Fix tenant size orphans (#3377)
Before only the timelines which have passed the `gc_horizon` were
processed which failed with orphans at the tree_sort phase. Example
input in added `test_branched_empty_timeline_size` test case.

The PR changes iteration to happen through all timelines, and in
addition to that, any learned branch points will be calculated as they
would had been in the original implementation if the ancestor branch had
been over the `gc_horizon`.

This also changes how tenants where all timelines are below `gc_horizon`
are handled. Previously tenant_size 0 was returned, but now they will
have approximately `initdb_lsn` worth of tenant_size.

The PR also adds several new tenant size tests that describe various corner
cases of branching structure and `gc_horizon` setting.
They are currently disabled to not consume time during CI.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2023-01-23 17:46:13 +02:00
Joonas Koivunen
802f174072 fix: dont stop pageserver if we fail to calculate synthetic size 2023-01-23 17:46:13 +02:00
Alexey Kondratov
47f9890bae [compute_ctl] Make role deletion spec processing idempotent (#3380)
Previously, we were trying to re-assign owned objects of the already
deleted role. This were causing a crash loop in the case when compute
was restarted with a spec that includes delta operation for role
deletion. To avoid such cases, check that role is still present before
calling `reassign_owned_objects`.

Resolves neondatabase/cloud#3553
2023-01-23 17:46:13 +02:00
Christian Schwarz
262265daad Revert "Use actual temporary dir for pageserver unit tests"
This reverts commit 826e89b9ce.

The problem with that commit was that it deletes the TempDir while
there are still EphemeralFile instances open.

At first I thought this could be fixed by simply adding

  Handle::current().block_on(task_mgr::shutdown(None, Some(tenant_id), None))

to TenantHarness::drop, but it turned out to be insufficient.

So, reverting the commit until we find a proper solution.

refs https://github.com/neondatabase/neon/issues/3385
2023-01-23 17:46:13 +02:00
bojanserafimov
300da5b872 Improve layer map docstrings (#3382) 2023-01-23 17:46:13 +02:00
Heikki Linnakangas
7b22b5c433 Switch to 'tracing' for logging, restructure code to make use of spans.
Refactors Compute::prepare_and_run. It's split into subroutines
differently, to make it easier to attach tracing spans to the
different stages. The high-level logic for waiting for Postgres to
exit is moved to the caller.

Replace 'env_logger' with 'tracing', and add `#instrument` directives
to different stages fo the startup process. This is a fairly
mechanical change, except for the changes in 'spec.rs'. 'spec.rs'
contained some complicated formatting, where parts of log messages
were printed directly to stdout with `print`s. That was a bit messed
up because the log normally goes to stderr, but those lines were
printed to stdout. In our docker images, stderr and stdout both go to
the same place so you wouldn't notice, but I don't think it was
intentional.

This changes the log format to the default
'tracing_subscriber::format' format. It's different from the Postgres
log format, however, and because both compute_tools and Postgres print
to the same log, it's now a mix of two different formats.  I'm not
sure how the Grafana log parsing pipeline can handle that. If it's a
problem, we can build custom formatter to change the compute_tools log
format to be the same as Postgres's, like it was before this commit,
or we can change the Postgres log format to match tracing_formatter's,
or we can start printing compute_tool's log output to a different
destination than Postgres
2023-01-23 17:46:12 +02:00
Kirill Bulatov
ffca97bc1e Enable logs in unit tests 2023-01-23 17:46:12 +02:00
Kirill Bulatov
cb356f3259 Use actual temporary dir for pageserver unit tests 2023-01-23 17:46:12 +02:00
Vadim Kharitonov
c85374295f Change SENTRY_ENVIRONMENT from "development" to "staging" 2023-01-23 17:46:12 +02:00
Anastasia Lubennikova
4992160677 Fix metric_collection_endpoint for prod.
It was incorrectly set to staging url
2023-01-23 17:46:12 +02:00
Heikki Linnakangas
bd535b3371 If an error happens while checking for core dumps, don't panic.
If we panic, we skip the 30s wait in 'main', and don't give the
console a chance to observe the error. Which is not nice.

Spotted by @ololobus at
https://github.com/neondatabase/neon/pull/3352#discussion_r1072806981
2023-01-23 17:46:12 +02:00
Kirill Bulatov
d90c5a03af Add more io::Error context when fail to operate on a path (#3254)
I have a test failure that shows 

```
Caused by:
    0: Failed to reconstruct a page image:
    1: Directory not empty (os error 39)
```

but does not really show where exactly that happens.

https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3227/release/3823785365/index.html#categories/c0057473fc9ec8fb70876fd29a171ce8/7088dab272f2c7b7/?attachment=60fe6ed2add4d82d

The PR aims to add more context in debugging that issue.
2023-01-23 17:46:12 +02:00
Anastasia Lubennikova
2d02cc9079 Merge pull request #3365 from neondatabase/main
Release 2023-01-17
2023-01-17 16:41:34 +02:00
Christian Schwarz
49ad94b99f Merge pull request #3301 from neondatabase/release-2023-01-10
Release 2023-01-10
2023-01-10 16:42:26 +01:00
Christian Schwarz
948a217398 Merge commit '95bf19b85a06b27a7fc3118dee03d48648efab15' into release-2023-01-10
Conflicts:
        .github/helm-values/neon-stress.proxy-scram.yaml
        .github/helm-values/neon-stress.proxy.yaml
        .github/helm-values/staging.proxy-scram.yaml
        .github/helm-values/staging.proxy.yaml
        All of the above were deleted in `main` after we hotfixed them
        in `release. Deleting them here
        storage_broker/src/bin/storage_broker.rs
        Hotfix toned down logging, but `main` has sinced implemented
        a proper fix. Taken `main`'s side, see
        https://neondb.slack.com/archives/C033RQ5SPDH/p1673354385387479?thread_ts=1673354306.474729&cid=C033RQ5SPDH

closes https://github.com/neondatabase/neon/issues/3287
2023-01-10 15:40:14 +01:00
Dmitry Rodionov
125381eae7 Merge pull request #3236 from neondatabase/dkr/retrofit-sk4-sk4-change
Move zenith-1-sk-3 to zenith-1-sk-4 (#3164)
2022-12-30 14:13:50 +03:00
Arthur Petukhovsky
cd01bbc715 Move zenith-1-sk-3 to zenith-1-sk-4 (#3164) 2022-12-30 12:32:52 +02:00
Dmitry Rodionov
d8b5e3b88d Merge pull request #3229 from neondatabase/dkr/add-pageserver-for-release
add pageserver to new region see https://github.com/neondatabase/aws/pull/116

decrease log volume for pageserver
2022-12-30 12:34:04 +03:00
Dmitry Rodionov
06d25f2186 switch to debug from info to produce less noise 2022-12-29 17:48:47 +02:00
Dmitry Rodionov
f759b561f3 add pageserver to new region see https://github.com/neondatabase/aws/pull/116 2022-12-29 17:17:35 +02:00
Sergey Melnikov
ece0555600 Push proxy metrics to Victoria Metrics (#3106) 2022-12-16 14:44:49 +02:00
Joonas Koivunen
73ea0a0b01 fix(remote_storage): use cached credentials (#3128)
IMDSv2 has limits, and if we query it on every s3 interaction we are
going to go over those limits. Changes the s3_bucket client
configuration to use:
- ChainCredentialsProvider to handle env variables or imds usage
- LazyCachingCredentialsProvider to actually cache any credentials

Related: https://github.com/awslabs/aws-sdk-rust/issues/629
Possibly related: https://github.com/neondatabase/neon/issues/3118
2022-12-16 14:44:49 +02:00
Arseny Sher
d8f6d6fd6f Merge pull request #3126 from neondatabase/broker-lb-release
Deploy broker with L4 LB in new env.
2022-12-16 01:25:28 +03:00
Arseny Sher
d24de169a7 Deploy broker with L4 LB in new env.
Seems to be fixing issue with missing keepalives.
2022-12-16 01:45:32 +04:00
Arseny Sher
0816168296 Hotfix: terminate subscription if channel is full.
Might help as a hotfix, but need to understand root better.
2022-12-15 12:23:56 +03:00
Dmitry Rodionov
277b44d57a Merge pull request #3102 from neondatabase/main
Hotfix. See commits for details
2022-12-14 19:38:43 +03:00
MMeent
68c2c3880e Merge pull request #3038 from neondatabase/main
Release 22-12-14
2022-12-14 14:35:47 +01:00
Arthur Petukhovsky
49da498f65 Merge pull request #2833 from neondatabase/main
Release 2022-11-16
2022-11-17 08:44:10 +01:00
Stas Kelvich
2c76ba3dd7 Merge pull request #2718 from neondatabase/main-rc-22-10-28
Release 22-10-28
2022-10-28 20:33:56 +03:00
Arseny Sher
dbe3dc69ad Merge branch 'main' into main-rc-22-10-28
Release 22-10-28.
2022-10-28 19:10:11 +04:00
Arseny Sher
8e5bb3ed49 Enable etcd compaction in neon_local. 2022-10-27 12:53:20 +03:00
Stas Kelvich
ab0be7b8da Avoid debian-testing packages in compute Dockerfiles
plv8 can only be built with a fairly new gold linker version. We used to install
it via binutils packages from testing, but it also updates libc and that causes
troubles in the resulting image as different extensions were built against
different libc versions. We could either use libc from debian-testing everywhere
or restrain from using testing packages and install necessary programs manually.
This patch uses the latter approach: gold for plv8 and cmake for h3 are
installed manually.

In a passing declare h3_postgis as a safe extension (previous omission).
2022-10-27 12:53:20 +03:00
bojanserafimov
b4c55f5d24 Move pagestream api to libs/pageserver_api (#2698) 2022-10-27 12:53:20 +03:00
mikecaat
ede70d833c Add a docker-compose example file (#1943) (#2666)
Co-authored-by: Masahiro Ikeda <masahiro.ikeda.us@hco.ntt.co.jp>
2022-10-27 12:53:20 +03:00
Sergey Melnikov
70c3d18bb0 Do not release to new staging proxies on release (#2685) 2022-10-27 12:53:20 +03:00
bojanserafimov
7a491f52c4 Add draw_timeline binary (#2688) 2022-10-27 12:53:20 +03:00
Alexander Bayandin
323c4ecb4f Add data format backward compatibility tests (#2626) 2022-10-27 12:53:20 +03:00
Anastasia Lubennikova
3d2466607e Merge pull request #2692 from neondatabase/main-rc
Release 2022-10-25
2022-10-25 18:18:58 +03:00
Anastasia Lubennikova
ed478b39f4 Merge branch 'release' into main-rc 2022-10-25 17:06:33 +03:00
Stas Kelvich
91585a558d Merge pull request #2678 from neondatabase/stas/hotfix_schema
Hotfix to disable grant create on public schema
2022-10-22 02:54:31 +03:00
Stas Kelvich
93467eae1f Hotfix to disable grant create on public schema
`GRANT CREATE ON SCHEMA public` fails if there is no schema `public`.
Disable it in release for now and make a better fix later (it is
needed for v15 support).
2022-10-22 02:26:28 +03:00
Stas Kelvich
f3aac81d19 Merge pull request #2668 from neondatabase/main
Release 2022-10-21
2022-10-21 15:21:42 +03:00
Stas Kelvich
979ad60c19 Merge pull request #2581 from neondatabase/main
Release 2022-10-07
2022-10-07 16:50:55 +03:00
Stas Kelvich
9316cb1b1f Merge pull request #2573 from neondatabase/main
Release 2022-10-06
2022-10-07 11:07:06 +03:00
Anastasia Lubennikova
e7939a527a Merge pull request #2377 from neondatabase/main
Release 2022-09-01
2022-09-01 20:20:44 +03:00
Arthur Petukhovsky
36d26665e1 Merge pull request #2299 from neondatabase/main
* Check for entire range during sasl validation (#2281)

* Gen2 GH runner (#2128)

* Re-add rustup override

* Try s3 bucket

* Set git version

* Use v4 cache key to prevent problems

* Switch to v5 for key

* Add second rustup fix

* Rebase

* Add kaniko steps

* Fix typo and set compress level

* Disable global run default

* Specify shell for step

* Change approach with kaniko

* Try less verbose shell spec

* Add submodule pull

* Add promote step

* Adjust dependency chain

* Try default swap again

* Use env

* Don't override aws key

* Make kaniko build conditional

* Specify runs on

* Try without dependency link

* Try soft fail

* Use image with git

* Try passing to next step

* Fix duplicate

* Try other approach

* Try other approach

* Fix typo

* Try other syntax

* Set env

* Adjust setup

* Try step 1

* Add link

* Try global env

* Fix mistake

* Debug

* Try other syntax

* Try other approach

* Change order

* Move output one step down

* Put output up one level

* Try other syntax

* Skip build

* Try output

* Re-enable build

* Try other syntax

* Skip middle step

* Update check

* Try first step of dockerhub push

* Update needs dependency

* Try explicit dir

* Add missing package

* Try other approach

* Try other approach

* Specify region

* Use with

* Try other approach

* Add debug

* Try other approach

* Set region

* Follow AWS example

* Try github approach

* Skip Qemu

* Try stdin

* Missing steps

* Add missing close

* Add echo debug

* Try v2 endpoint

* Use v1 endpoint

* Try without quotes

* Revert

* Try crane

* Add debug

* Split steps

* Fix duplicate

* Add shell step

* Conform to options

* Add verbose flag

* Try single step

* Try workaround

* First request fails hunch

* Try bullseye image

* Try other approach

* Adjust verbose level

* Try previous step

* Add more debug

* Remove debug step

* Remove rogue indent

* Try with larger image

* Add build tag step

* Update workflow for testing

* Add tag step for test

* Remove unused

* Update dependency chain

* Add ownership fix

* Use matrix for promote

* Force update

* Force build

* Remove unused

* Add new image

* Add missing argument

* Update dockerfile copy

* Update Dockerfile

* Update clone

* Update dockerfile

* Go to correct folder

* Use correct format

* Update dockerfile

* Remove cd

* Debug find where we are

* Add debug on first step

* Changedir to postgres

* Set workdir

* Use v1 approach

* Use other dependency

* Try other approach

* Try other approach

* Update dockerfile

* Update approach

* Update dockerfile

* Update approach

* Update dockerfile

* Update dockerfile

* Add workspace hack

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Change last step

* Cleanup pull in prep for review

* Force build images

* Add condition for latest tagging

* Use pinned version

* Try without name value

* Remove more names

* Shorten names

* Add kaniko comments

* Pin kaniko

* Pin crane and ecr helper

* Up one level

* Switch to pinned tag for rust image

* Force update for test

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@b04468bf-cdf4-41eb-9c94-aff4ca55e4bf.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@4795e9ee-4f32-401f-85f3-f316263b62b8.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@2f8bc4e5-4ec2-4ea2-adb1-65d863c4a558.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@27565b2b-72d5-4742-9898-a26c9033e6f9.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@ecc96c26-c6c4-4664-be6e-34f7c3f89a3c.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@7caff3a5-bf03-4202-bd0e-f1a93c86bdae.fritz.box>

* Add missing step output, revert one deploy step (#2285)

* Add missing step output, revert one deploy step

* Conform to syntax

* Update approach

* Add missing value

* Add missing needs

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>

* Error for fatal not git repo (#2286)

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>

* Use main, not branch for ref check (#2288)

* Use main, not branch for ref check

* Add more debug

* Count main, not head

* Try new approach

* Conform to syntax

* Update approach

* Get full history

* Skip checkout

* Cleanup debug

* Remove more debug

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>

* Fix docker zombie process issue (#2289)

* Fix docker zombie process issue

* Init everywhere

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>

* Fix 1.63 clippy lints (#2282)

* split out timeline metrics, track layer map loading and size calculation

* reset rust cache for clippy run to avoid an ICE

additionally remove trailing whitespaces

* Rename pg_control_ffi.h to bindgen_deps.h, for clarity.

The pg_control_ffi.h name implies that it only includes stuff related to
pg_control.h. That's mostly true currently, but really the point of the
file is to include everything that we need to generate Rust definitions
from.

* Make local mypy behave like CI mypy (#2291)

* Fix flaky pageserver restarts in tests (#2261)

* Remove extra type aliases (#2280)

* Update cachepot endpoint (#2290)

* Update cachepot endpoint

* Update dockerfile & remove env

* Update image building process

* Cannot use metadata endpoint for this

* Update workflow

* Conform to kaniko syntax

* Update syntax

* Update approach

* Update dockerfiles

* Force update

* Update dockerfiles

* Update dockerfile

* Cleanup dockerfiles

* Update s3 test location

* Revert s3 experiment

* Add more debug

* Specify aws region

* Remove debug, add prefix

* Remove one more debug

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>

* workflows/benchmarking: increase timeout (#2294)

* Rework `init` in pageserver CLI  (#2272)

* Do not create initial tenant and timeline (adjust Python tests for that)
* Rework config handling during init, add --update-config to manage local config updates

* Fix: Always build images (#2296)

* Always build images

* Remove unused

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>

* Move auto-generated 'bindings' to a separate inner module.

Re-export only things that are used by other modules.

In the future, I'm imagining that we run bindgen twice, for Postgres
v14 and v15. The two sets of bindings would go into separate
'bindings_v14' and 'bindings_v15' modules.

Rearrange postgres_ffi modules.

Move function, to avoid Postgres version dependency in timelines.rs
Move function to generate a logical-message WAL record to postgres_ffi.

* fix cargo test

* Fix walreceiver and safekeeper bugs (#2295)

- There was an issue with zero commit_lsn `reason: LaggingWal { current_commit_lsn: 0/0, new_commit_lsn: 1/6FD90D38, threshold: 10485760 } }`. The problem was in `send_wal.rs`, where we initialized `end_pos = Lsn(0)` and in some cases sent it to the pageserver.
- IDENTIFY_SYSTEM previously returned `flush_lsn` as a physical end of WAL. Now it returns `flush_lsn` (as it was) to walproposer and `commit_lsn` to everyone else including pageserver.
- There was an issue with backoff where connection was cancelled right after initialization: `connected!` -> `safekeeper_handle_db: Connection cancelled` -> `Backoff: waiting 3 seconds`. The problem was in sleeping before establishing the connection. This is fixed by reworking retry logic.
- There was an issue with getting `NoKeepAlives` reason in a loop. The issue is probably the same as the previous.
- There was an issue with filtering safekeepers based on retry attempts, which could filter some safekeepers indefinetely. This is fixed by using retry cooldown duration instead of retry attempts.
- Some `send_wal.rs` connections failed with errors without context. This is fixed by adding a timeline to safekeepers errors.

New retry logic works like this:
- Every candidate has a `next_retry_at` timestamp and is not considered for connection until that moment
- When walreceiver connection is closed, we update `next_retry_at` using exponential backoff, increasing the cooldown on every disconnect.
- When `last_record_lsn` was advanced using the WAL from the safekeeper, we reset the retry cooldown and exponential backoff, allowing walreceiver to reconnect to the same safekeeper instantly.

* on safekeeper registration pass availability zone param (#2292)

Co-authored-by: Kirill Bulatov <kirill@neon.tech>
Co-authored-by: Rory de Zoete <33318916+zoete@users.noreply.github.com>
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@b04468bf-cdf4-41eb-9c94-aff4ca55e4bf.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@4795e9ee-4f32-401f-85f3-f316263b62b8.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@2f8bc4e5-4ec2-4ea2-adb1-65d863c4a558.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@27565b2b-72d5-4742-9898-a26c9033e6f9.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@ecc96c26-c6c4-4664-be6e-34f7c3f89a3c.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@7caff3a5-bf03-4202-bd0e-f1a93c86bdae.fritz.box>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Anton Galitsyn <agalitsyn@users.noreply.github.com>
2022-08-18 15:32:33 +03:00
Arthur Petukhovsky
873347f977 Merge pull request #2275 from neondatabase/main
* github/workflows: Fix git dubious ownership (#2223)

* Move relation size cache from WalIngest to DatadirTimeline (#2094)

* Move relation sie cache to layered timeline

* Fix obtaining current LSN for relation size cache

* Resolve merge conflicts

* Resolve merge conflicts

* Reestore 'lsn' field in DatadirModification

* adjust DatadirModification lsn in ingest_record

* Fix formatting

* Pass lsn to get_relsize

* Fix merge conflict

* Update pageserver/src/pgdatadir_mapping.rs

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* Update pageserver/src/pgdatadir_mapping.rs

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* refactor: replace lazy-static with once-cell (#2195)

- Replacing all the occurrences of lazy-static with `once-cell::sync::Lazy`
- fixes #1147

Signed-off-by: Ankur Srivastava <best.ankur@gmail.com>

* Add more buckets to pageserver latency metrics (#2225)

* ignore record property warning to fix benchmarks

* increase statement timeout

* use event so it fires only if workload thread successfully finished

* remove debug log

* increase timeout to pass test with real s3

* avoid duplicate parameter, increase timeout

* Major migration script (#2073)

This script can be used to migrate a tenant across breaking storage versions, or (in the future) upgrading postgres versions. See the comment at the top for an overview.

Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>

* Fix etcd typos

* Fix links to safekeeper protocol docs. (#2188)

safekeeper/README_PROTO.md was moved to docs/safekeeper-protocol.md in
commit 0b14fdb078, as part of reorganizing the docs into 'mdbook' format.

Fixes issue #1475. Thanks to @banks for spotting the outdated references.

In addition to fixing the above issue, this patch also fixes other broken links as a result of 0b14fdb078. See https://github.com/neondatabase/neon/pull/2188#pullrequestreview-1055918480.

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Thang Pham <thang@neon.tech>

* Update CONTRIBUTING.md

* Update CONTRIBUTING.md

* support node id and remote storage params in docker_entrypoint.sh

* Safe truncate (#2218)

* Move relation sie cache to layered timeline

* Fix obtaining current LSN for relation size cache

* Resolve merge conflicts

* Resolve merge conflicts

* Reestore 'lsn' field in DatadirModification

* adjust DatadirModification lsn in ingest_record

* Fix formatting

* Pass lsn to get_relsize

* Fix merge conflict

* Update pageserver/src/pgdatadir_mapping.rs

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* Update pageserver/src/pgdatadir_mapping.rs

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* Check if relation exists before trying to truncat it

refer #1932

* Add test reporducing FSM truncate problem

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* Fix exponential backoff values

* Update back `vendor/postgres` back; it was changed accidentally. (#2251)

Commit 4227cfc96e accidentally reverted vendor/postgres to an older
version. Update it back.

* Add pageserver checkpoint_timeout option.

To flush inmemory layer eventually when no new data arrives, which helps
safekeepers to suspend activity (stop pushing to the broker). Default 10m should
be ok.

* Share exponential backoff code and fix logic for delete task failure (#2252)

* Fix bug when import large (>1GB) relations (#2172)

Resolves #2097 

- use timeline modification's `lsn` and timeline's `last_record_lsn` to determine the corresponding LSN to query data in `DatadirModification::get`
- update `test_import_from_pageserver`. Split the test into 2 variants: `small` and `multisegment`. 
  + `small` is the old test
  + `multisegment` is to simulate #2097 by using a larger number of inserted rows to create multiple segment files of a relation. `multisegment` is configured to only run with a `release` build

* Fix timeline physical size flaky tests (#2244)

Resolves #2212.

- use `wait_for_last_flush_lsn` in `test_timeline_physical_size_*` tests

## Context
Need to wait for the pageserver to catch up with the compute's last flush LSN because during the timeline physical size API call, it's possible that there are running `LayerFlushThread` threads. These threads flush new layers into disk and hence update the physical size. This results in a mismatch between the physical size reported by the API and the actual physical size on disk.

### Note
The `LayerFlushThread` threads are processed **concurrently**, so it's possible that the above error still persists even with this patch. However, making the tests wait to finish processing all the WALs (not flushing) before calculating the physical size should help reduce the "flakiness" significantly

* postgres_ffi/waldecoder: validate more header fields

* postgres_ffi/waldecoder: remove unused startlsn

* postgres_ffi/waldecoder: introduce explicit `enum State`

Previously it was emulated with a combination of nullable fields.
This change should make the logic more readable.

* disable `test_import_from_pageserver_multisegment` (#2258)

This test failed consistently on `main` now. It's better to temporarily disable it to avoid blocking others' PRs while investigating the root cause for the test failure.

See: #2255, #2256

* get_binaries uses DOCKER_TAG taken from docker image build step (#2260)

* [proxy] Rework wire format of the password hack and some errors (#2236)

The new format has a few benefits: it's shorter, simpler and
human-readable as well. We don't use base64 anymore, since
url encoding got us covered.

We also show a better error in case we couldn't parse the
payload; the users should know it's all about passing the
correct project name.

* test_runner/pg_clients: collect docker logs (#2259)

* get_binaries script fix (#2263)

* get_binaries uses DOCKER_TAG taken from docker image build step

* remove docker tag discovery at all and fix get_binaries for version variable

* Better storage sync logs (#2268)

* Find end of WAL on safekeepers using WalStreamDecoder.

We could make it inside wal_storage.rs, but taking into account that
 - wal_storage.rs reading is async
 - we don't need s3 here
 - error handling is different; error during decoding is normal
I decided to put it separately.

Test
cargo test test_find_end_of_wal_last_crossing_segment
prepared earlier by @yeputons passes now.

Fixes https://github.com/neondatabase/neon/issues/544
      https://github.com/neondatabase/cloud/issues/2004
Supersedes https://github.com/neondatabase/neon/pull/2066

* Improve walreceiver logic (#2253)

This patch makes walreceiver logic more complicated, but it should work better in most cases. Added `test_wal_lagging` to test scenarios where alive safekeepers can lag behind other alive safekeepers.

- There was a bug which looks like `etcd_info.timeline.commit_lsn > Some(self.local_timeline.get_last_record_lsn())` filtered all safekeepers in some strange cases. I removed this filter, it should probably help with #2237
- Now walreceiver_connection reports status, including commit_lsn. This allows keeping safekeeper connection even when etcd is down.
- Safekeeper connection now fails if pageserver doesn't receive safekeeper messages for some time. Usually safekeeper sends messages at least once per second.
- `LaggingWal` check now uses `commit_lsn` directly from safekeeper. This fixes the issue with often reconnects, when compute generates WAL really fast.
- `NoWalTimeout` is rewritten to trigger only when we know about the new WAL and the connected safekeeper doesn't stream any WAL. This allows setting a small `lagging_wal_timeout` because it will trigger only when we observe that the connected safekeeper has stuck.

* increase timeout in wait_for_upload to avoid spurious failures when testing with real s3

* Bump vendor/postgres to include XLP_FIRST_IS_CONTRECORD fix. (#2274)

* Set up a workflow to run pgbench against captest (#2077)

Signed-off-by: Ankur Srivastava <best.ankur@gmail.com>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru>
Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
Co-authored-by: Ankur Srivastava <ansrivas@users.noreply.github.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Kirill Bulatov <kirill@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Thang Pham <thang@neon.tech>
Co-authored-by: Stas Kelvich <stas.kelvich@gmail.com>
Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
Co-authored-by: Egor Suvorov <egor@neon.tech>
Co-authored-by: Andrey Taranik <andrey@cicd.team>
Co-authored-by: Dmitry Ivanov <ivadmi5@gmail.com>
2022-08-15 21:30:45 +03:00
Arthur Petukhovsky
e814ac16f9 Merge pull request #2219 from neondatabase/main
Release 2022-08-04
2022-08-04 20:06:34 +03:00
Heikki Linnakangas
ad3055d386 Merge pull request #2203 from neondatabase/release-uuid-ossp
Deploy new storage and compute version to production

Release 2022-08-02
2022-08-02 15:08:14 +03:00
Heikki Linnakangas
94e03eb452 Merge remote-tracking branch 'origin/main' into 'release'
Release 2022-08-01
2022-08-02 12:43:49 +03:00
Sergey Melnikov
380f26ef79 Merge pull request #2170 from neondatabase/main (Release 2022-07-28)
Release 2022-07-28
2022-07-28 14:16:52 +03:00
Arthur Petukhovsky
3c5b7f59d7 Merge pull request #2119 from neondatabase/main
Release 2022-07-19
2022-07-19 11:58:48 +03:00
Arthur Petukhovsky
fee89f80b5 Merge pull request #2115 from neondatabase/main-2022-07-18
Release 2022-07-18
2022-07-18 19:21:11 +03:00
Arthur Petukhovsky
41cce8eaf1 Merge remote-tracking branch 'origin/release' into main-2022-07-18 2022-07-18 18:21:20 +03:00
Alexey Kondratov
f88fe0218d Merge pull request #1842 from neondatabase/release-deploy-hotfix
[HOTFIX] Release deploy fix

This PR uses this branch neondatabase/postgres#171 and several required commits from the main to use only locally built compute-tools. This should allow us to rollout safekeepers sync issue fix on prod
2022-06-01 11:04:30 +03:00
Alexey Kondratov
cc856eca85 Install missing openssl packages in the Github Actions workflow 2022-05-31 21:31:31 +02:00
Alexey Kondratov
cf350c6002 Use :local compute-tools tag to build compute-node image 2022-05-31 21:31:16 +02:00
Arseny Sher
0ce6b6a0a3 Merge pull request #1836 from neondatabase/release-hotfix-basebackup-lsn-page-boundary
Bump vendor/postgres to hotfix basebackup LSN comparison.
2022-05-31 16:54:03 +04:00
Arseny Sher
73f247d537 Bump vendor/postgres to hotfix basebackup LSN comparison. 2022-05-31 16:00:50 +04:00
Andrey Taranik
960be82183 Merge pull request #1792 from neondatabase/main
Release 2202-05-25 (second)
2022-05-25 16:37:57 +03:00
Andrey Taranik
806e5a6c19 Merge pull request #1787 from neondatabase/main
Release 2022-05-25
2022-05-25 13:34:11 +03:00
Alexey Kondratov
8d5df07cce Merge pull request #1385 from zenithdb/main
Release main 2022-03-22
2022-03-22 05:04:34 -05:00
Andrey Taranik
df7a9d1407 release fix 2022-03-16 (#1375) 2022-03-17 00:43:28 +03:00
1541 changed files with 195475 additions and 56641 deletions

View File

@@ -3,6 +3,16 @@
# by the RUSTDOCFLAGS env var in CI.
rustdocflags = ["-Arustdoc::private_intra_doc_links"]
# Enable frame pointers. This may have a minor performance overhead, but makes it easier and more
# efficient to obtain stack traces (and thus CPU/heap profiles). It may also avoid seg faults that
# we've seen with libunwind-based profiling. See also:
#
# * <https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html>
# * <https://github.com/rust-lang/rust/pull/122646>
#
# NB: the RUSTFLAGS envvar will replace this. Make sure to update e.g. Dockerfile as well.
rustflags = ["-Cforce-frame-pointers=yes"]
[alias]
build_testing = ["build", "--features", "testing"]
neon = ["run", "--bin", "neon_local"]

View File

@@ -46,6 +46,9 @@ workspace-members = [
"utils",
"wal_craft",
"walproposer",
"postgres-protocol2",
"postgres-types2",
"tokio-postgres2",
]
# Write out exact versions rather than a semver range. (Defaults to false.)

View File

@@ -4,27 +4,27 @@
!Cargo.lock
!Cargo.toml
!Makefile
!postgres.mk
!rust-toolchain.toml
!scripts/combine_control_files.py
!scripts/ninstall.sh
!vm-cgconfig.conf
!docker-compose/run-tests.sh
# Directories
!.cargo/
!.config/
!compute/
!compute_tools/
!control_plane/
!docker-compose/ext-src
!libs/
!neon_local/
!pageserver/
!patches/
!pgxn/
!proxy/
!endpoint_storage/
!storage_scrubber/
!safekeeper/
!storage_broker/
!storage_controller/
!trace/
!vendor/postgres-*/
!workspace_hack/
!build_tools/patches

View File

@@ -3,6 +3,7 @@ name: Bug Template
about: Used for describing bugs
title: ''
labels: t/bug
type: Bug
assignees: ''
---

View File

@@ -4,6 +4,7 @@ about: A set of related tasks contributing towards specific outcome, comprising
more than 1 week of work.
title: 'Epic: '
labels: t/Epic
type: Epic
assignees: ''
---

View File

@@ -1,21 +0,0 @@
## Release 202Y-MM-DD
**NB: this PR must be merged only by 'Create a merge commit'!**
### Checklist when preparing for release
- [ ] Read or refresh [the release flow guide](https://www.notion.so/neondatabase/Release-general-flow-61f2e39fd45d4d14a70c7749604bd70b)
- [ ] Ask in the [cloud Slack channel](https://neondb.slack.com/archives/C033A2WE6BZ) that you are going to rollout the release. Any blockers?
- [ ] Does this release contain any db migrations? Destructive ones? What is the rollback plan?
<!-- List everything that should be done **before** release, any issues / setting changes / etc -->
### Checklist after release
- [ ] Make sure instructions from PRs included in this release and labeled `manual_release_instructions` are executed (either by you or by people who wrote them).
- [ ] Based on the merged commits write release notes and open a PR into `website` repo ([example](https://github.com/neondatabase/website/pull/219/files))
- [ ] Check [#dev-production-stream](https://neondb.slack.com/archives/C03F5SM1N02) Slack channel
- [ ] Check [stuck projects page](https://console.neon.tech/admin/projects?sort=last_active&order=desc&stuck=true)
- [ ] Check [recent operation failures](https://console.neon.tech/admin/operations?action=create_timeline%2Cstart_compute%2Cstop_compute%2Csuspend_compute%2Capply_config%2Cdelete_timeline%2Cdelete_tenant%2Ccreate_branch%2Ccheck_availability&sort=updated_at&order=desc&had_retries=some)
- [ ] Check [cloud SLO dashboard](https://neonprod.grafana.net/d/_oWcBMJ7k/cloud-slos?orgId=1)
- [ ] Check [compute startup metrics dashboard](https://neonprod.grafana.net/d/5OkYJEmVz/compute-startup-time)
<!-- List everything that should be done **after** release, any admin UI configuration / Grafana dashboard / alert changes / setting changes / etc -->

View File

@@ -4,9 +4,12 @@ self-hosted-runner:
- large
- large-arm64
- small
- small-metal
- small-arm64
- unit-perf
- us-east-2
config-variables:
- AWS_ECR_REGION
- AZURE_DEV_CLIENT_ID
- AZURE_DEV_REGISTRY_NAME
- AZURE_DEV_SUBSCRIPTION_ID
@@ -14,9 +17,30 @@ config-variables:
- AZURE_PROD_REGISTRY_NAME
- AZURE_PROD_SUBSCRIPTION_ID
- AZURE_TENANT_ID
- BENCHMARK_INGEST_TARGET_PROJECTID
- BENCHMARK_LARGE_OLTP_PROJECTID
- BENCHMARK_PROJECT_ID_PUB
- BENCHMARK_PROJECT_ID_SUB
- DEV_AWS_OIDC_ROLE_ARN
- DEV_AWS_OIDC_ROLE_MANAGE_BENCHMARK_EC2_VMS_ARN
- HETZNER_CACHE_BUCKET
- HETZNER_CACHE_ENDPOINT
- HETZNER_CACHE_REGION
- NEON_DEV_AWS_ACCOUNT_ID
- NEON_PROD_AWS_ACCOUNT_ID
- PGREGRESS_PG16_PROJECT_ID
- PGREGRESS_PG17_PROJECT_ID
- REMOTE_STORAGE_AZURE_CONTAINER
- REMOTE_STORAGE_AZURE_REGION
- SLACK_CICD_CHANNEL_ID
- SLACK_COMPUTE_CHANNEL_ID
- SLACK_ON_CALL_DEVPROD_STREAM
- SLACK_ON_CALL_QA_STAGING_STREAM
- SLACK_ON_CALL_STORAGE_STAGING_STREAM
- SLACK_ONCALL_COMPUTE_GROUP
- SLACK_ONCALL_PROXY_GROUP
- SLACK_ONCALL_STORAGE_GROUP
- SLACK_PROXY_CHANNEL_ID
- SLACK_RUST_CHANNEL_ID
- SLACK_STORAGE_CHANNEL_ID
- SLACK_UPCOMING_RELEASE_CHANNEL_ID
- DEV_AWS_OIDC_ROLE_ARN

View File

@@ -7,6 +7,9 @@ inputs:
type: boolean
required: false
default: false
aws-oidc-role-arn:
description: 'OIDC role arn to interract with S3'
required: true
outputs:
base-url:
@@ -35,11 +38,14 @@ runs:
#
- name: Set variables
shell: bash -euxo pipefail {0}
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
BUCKET: neon-github-public-dev
run: |
PR_NUMBER=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)
if [ "${PR_NUMBER}" != "null" ]; then
if [ -n "${PR_NUMBER}" ]; then
BRANCH_OR_PR=pr-${PR_NUMBER}
elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || [ "${GITHUB_REF_NAME}" = "release-proxy" ]; then
elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || \
[ "${GITHUB_REF_NAME}" = "release-proxy" ] || [ "${GITHUB_REF_NAME}" = "release-compute" ]; then
# Shortcut for special branches
BRANCH_OR_PR=${GITHUB_REF_NAME}
else
@@ -55,8 +61,6 @@ runs:
echo "LOCK_FILE=${LOCK_FILE}" >> $GITHUB_ENV
echo "WORKDIR=${WORKDIR}" >> $GITHUB_ENV
echo "BUCKET=${BUCKET}" >> $GITHUB_ENV
env:
BUCKET: neon-github-public-dev
# TODO: We can replace with a special docker image with Java and Allure pre-installed
- uses: actions/setup-java@v4
@@ -66,6 +70,7 @@ runs:
- name: Install Allure
shell: bash -euxo pipefail {0}
working-directory: /tmp
run: |
if ! which allure; then
ALLURE_ZIP=allure-${ALLURE_VERSION}.zip
@@ -76,8 +81,15 @@ runs:
rm -f ${ALLURE_ZIP}
fi
env:
ALLURE_VERSION: 2.27.0
ALLURE_ZIP_SHA256: b071858fb2fa542c65d8f152c5c40d26267b2dfb74df1f1608a589ecca38e777
ALLURE_VERSION: 2.32.2
ALLURE_ZIP_SHA256: 3f28885e2118f6317c92f667eaddcc6491400af1fb9773c1f3797a5fa5174953
- uses: aws-actions/configure-aws-credentials@v4
if: ${{ !cancelled() }}
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws-oidc-role-arn }}
role-duration-seconds: 3600 # 1 hour should be more than enough to upload report
# Potentially we could have several running build for the same key (for example, for the main branch), so we use improvised lock for this
- name: Acquire lock
@@ -183,7 +195,7 @@ runs:
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry/virtualenvs
key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-${{ hashFiles('poetry.lock') }}
key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-bookworm-${{ hashFiles('poetry.lock') }}
- name: Store Allure test stat in the DB (new)
if: ${{ !cancelled() && inputs.store-test-results-into-db == 'true' }}
@@ -221,6 +233,8 @@ runs:
REPORT_URL: ${{ steps.generate-report.outputs.report-url }}
COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
with:
# Retry script for 5XX server errors: https://github.com/actions/github-script#retries
retries: 5
script: |
const { REPORT_URL, COMMIT_SHA } = process.env

View File

@@ -8,6 +8,9 @@ inputs:
unique-key:
description: 'string to distinguish different results in the same run'
required: true
aws-oidc-role-arn:
description: 'OIDC role arn to interract with S3'
required: true
runs:
using: "composite"
@@ -15,11 +18,14 @@ runs:
steps:
- name: Set variables
shell: bash -euxo pipefail {0}
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
REPORT_DIR: ${{ inputs.report-dir }}
run: |
PR_NUMBER=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)
if [ "${PR_NUMBER}" != "null" ]; then
if [ -n "${PR_NUMBER}" ]; then
BRANCH_OR_PR=pr-${PR_NUMBER}
elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || [ "${GITHUB_REF_NAME}" = "release-proxy" ]; then
elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || \
[ "${GITHUB_REF_NAME}" = "release-proxy" ] || [ "${GITHUB_REF_NAME}" = "release-compute" ]; then
# Shortcut for special branches
BRANCH_OR_PR=${GITHUB_REF_NAME}
else
@@ -28,8 +34,13 @@ runs:
echo "BRANCH_OR_PR=${BRANCH_OR_PR}" >> $GITHUB_ENV
echo "REPORT_DIR=${REPORT_DIR}" >> $GITHUB_ENV
env:
REPORT_DIR: ${{ inputs.report-dir }}
- uses: aws-actions/configure-aws-credentials@v4
if: ${{ !cancelled() }}
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws-oidc-role-arn }}
role-duration-seconds: 3600 # 1 hour should be more than enough to upload report
- name: Upload test results
shell: bash -euxo pipefail {0}

View File

@@ -15,10 +15,19 @@ inputs:
prefix:
description: "S3 prefix. Default is '${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}'"
required: false
aws-oidc-role-arn:
description: 'OIDC role arn to interract with S3'
required: true
runs:
using: "composite"
steps:
- uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws-oidc-role-arn }}
role-duration-seconds: 3600
- name: Download artifact
id: download-artifact
shell: bash -euxo pipefail {0}

View File

@@ -84,7 +84,13 @@ runs:
--header "Authorization: Bearer ${API_KEY}"
)
role_name=$(echo $roles | jq --raw-output '.roles[] | select(.protected == false) | .name')
role_name=$(echo "$roles" | jq --raw-output '
(.roles | map(select(.protected == false))) as $roles |
if any($roles[]; .name == "neondb_owner")
then "neondb_owner"
else $roles[0].name
end
')
echo "role_name=${role_name}" >> $GITHUB_OUTPUT
env:
API_HOST: ${{ inputs.api_host }}
@@ -107,13 +113,13 @@ runs:
)
if [ -z "${reset_password}" ]; then
sleep 1
sleep $i
continue
fi
password=$(echo $reset_password | jq --raw-output '.role.password')
if [ "${password}" == "null" ]; then
sleep 1
sleep $i # increasing backoff
continue
fi

View File

@@ -17,6 +17,38 @@ inputs:
compute_units:
description: '[Min, Max] compute units'
default: '[1, 1]'
# settings below only needed if you want the project to be sharded from the beginning
shard_split_project:
description: 'by default new projects are not shard-split initiailly, but only when shard-split threshold is reached, specify true to explicitly shard-split initially'
required: false
default: 'false'
disable_sharding:
description: 'by default new projects use storage controller default policy to shard-split when shard-split threshold is reached, specify true to explicitly disable sharding'
required: false
default: 'false'
admin_api_key:
description: 'Admin API Key needed for shard-splitting. Must be specified if shard_split_project is true'
required: false
shard_count:
description: 'Number of shards to split the project into, only applies if shard_split_project is true'
required: false
default: '8'
stripe_size:
description: 'Stripe size, optional, in 8kiB pages. e.g. set 2048 for 16MB stripes. Default is 128 MiB, only applies if shard_split_project is true'
required: false
default: '32768'
psql_path:
description: 'Path to psql binary - it is caller responsibility to provision the psql binary'
required: false
default: '/tmp/neon/pg_install/v16/bin/psql'
libpq_lib_path:
description: 'Path to directory containing libpq library - it is caller responsibility to provision the libpq library'
required: false
default: '/tmp/neon/pg_install/v16/lib'
project_settings:
description: 'A JSON object with project settings'
required: false
default: '{}'
outputs:
dsn:
@@ -34,9 +66,9 @@ runs:
# A shell without `set -x` to not to expose password/dsn in logs
shell: bash -euo pipefail {0}
run: |
project=$(curl \
res=$(curl \
"https://${API_HOST}/api/v2/projects" \
--fail \
-w "%{http_code}" \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${API_KEY}" \
@@ -48,9 +80,18 @@ runs:
\"provisioner\": \"k8s-neonvm\",
\"autoscaling_limit_min_cu\": ${MIN_CU},
\"autoscaling_limit_max_cu\": ${MAX_CU},
\"settings\": { }
\"settings\": ${PROJECT_SETTINGS}
}
}")
code=${res: -3}
if [[ ${code} -ge 400 ]]; then
echo Request failed with error code ${code}
echo ${res::-3}
exit 1
else
project=${res::-3}
fi
# Mask password
echo "::add-mask::$(echo $project | jq --raw-output '.roles[] | select(.name != "web_access") | .password')"
@@ -63,6 +104,39 @@ runs:
echo "project_id=${project_id}" >> $GITHUB_OUTPUT
echo "Project ${project_id} has been created"
if [ "${SHARD_SPLIT_PROJECT}" = "true" ]; then
# determine tenant ID
TENANT_ID=`${PSQL} ${dsn} -t -A -c "SHOW neon.tenant_id"`
echo "Splitting project ${project_id} with tenant_id ${TENANT_ID} into $((SHARD_COUNT)) shards with stripe size $((STRIPE_SIZE))"
echo "Sending PUT request to https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/storage/proxy/control/v1/tenant/${TENANT_ID}/shard_split"
echo "with body {\"new_shard_count\": $((SHARD_COUNT)), \"new_stripe_size\": $((STRIPE_SIZE))}"
# we need an ADMIN API KEY to invoke storage controller API for shard splitting (bash -u above checks that the variable is set)
curl -X PUT \
"https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/storage/proxy/control/v1/tenant/${TENANT_ID}/shard_split" \
-H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer ${ADMIN_API_KEY}" \
-d "{\"new_shard_count\": $SHARD_COUNT, \"new_stripe_size\": $STRIPE_SIZE}"
fi
if [ "${DISABLE_SHARDING}" = "true" ]; then
# determine tenant ID
TENANT_ID=`${PSQL} ${dsn} -t -A -c "SHOW neon.tenant_id"`
echo "Explicitly disabling shard-splitting for project ${project_id} with tenant_id ${TENANT_ID}"
echo "Sending PUT request to https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/storage/proxy/control/v1/tenant/${TENANT_ID}/policy"
echo "with body {\"scheduling\": \"Essential\"}"
# we need an ADMIN API KEY to invoke storage controller API for shard splitting (bash -u above checks that the variable is set)
curl -X PUT \
"https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/storage/proxy/control/v1/tenant/${TENANT_ID}/policy" \
-H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer ${ADMIN_API_KEY}" \
-d "{\"scheduling\": \"Essential\"}"
fi
env:
API_HOST: ${{ inputs.api_host }}
API_KEY: ${{ inputs.api_key }}
@@ -70,3 +144,11 @@ runs:
POSTGRES_VERSION: ${{ inputs.postgres_version }}
MIN_CU: ${{ fromJSON(inputs.compute_units)[0] }}
MAX_CU: ${{ fromJSON(inputs.compute_units)[1] }}
SHARD_SPLIT_PROJECT: ${{ inputs.shard_split_project }}
DISABLE_SHARDING: ${{ inputs.disable_sharding }}
ADMIN_API_KEY: ${{ inputs.admin_api_key }}
SHARD_COUNT: ${{ inputs.shard_count }}
STRIPE_SIZE: ${{ inputs.stripe_size }}
PSQL: ${{ inputs.psql_path }}
LD_LIBRARY_PATH: ${{ inputs.libpq_lib_path }}
PROJECT_SETTINGS: ${{ inputs.project_settings }}

View File

@@ -36,18 +36,26 @@ inputs:
description: 'Region name for real s3 tests'
required: false
default: ''
rerun_flaky:
description: 'Whether to rerun flaky tests'
rerun_failed:
description: 'Whether to rerun failed tests'
required: false
default: 'false'
pg_version:
description: 'Postgres version to use for tests'
required: false
default: 'v16'
sanitizers:
description: 'enabled or disabled'
required: false
default: 'disabled'
type: string
benchmark_durations:
description: 'benchmark durations JSON'
required: false
default: '{}'
aws-oidc-role-arn:
description: 'OIDC role arn to interract with S3'
required: true
runs:
using: "composite"
@@ -56,8 +64,9 @@ runs:
if: inputs.build_type != 'remote'
uses: ./.github/actions/download
with:
name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}-artifact
name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}${{ inputs.sanitizers == 'enabled' && '-sanitized' || '' }}-artifact
path: /tmp/neon
aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}
- name: Download Neon binaries for the previous release
if: inputs.build_type != 'remote'
@@ -66,6 +75,7 @@ runs:
name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}-artifact
path: /tmp/neon-previous
prefix: latest
aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}
- name: Download compatibility snapshot
if: inputs.build_type != 'remote'
@@ -77,6 +87,7 @@ runs:
# The lack of compatibility snapshot (for example, for the new Postgres version)
# shouldn't fail the whole job. Only relevant test should fail.
skip-if-does-not-exist: true
aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}
- name: Checkout
if: inputs.needs_postgres_source == 'true'
@@ -88,7 +99,7 @@ runs:
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry/virtualenvs
key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-${{ hashFiles('poetry.lock') }}
key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-bookworm-${{ hashFiles('poetry.lock') }}
- name: Install Python deps
shell: bash -euxo pipefail {0}
@@ -102,10 +113,9 @@ runs:
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: ${{ inputs.build_type }}
COMPATIBILITY_SNAPSHOT_DIR: /tmp/compatibility_snapshot_pg${{ inputs.pg_version }}
ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE: contains(github.event.pull_request.labels.*.name, 'backward compatibility breakage')
ALLOW_FORWARD_COMPATIBILITY_BREAKAGE: contains(github.event.pull_request.labels.*.name, 'forward compatibility breakage')
RERUN_FLAKY: ${{ inputs.rerun_flaky }}
RERUN_FAILED: ${{ inputs.rerun_failed }}
PG_VERSION: ${{ inputs.pg_version }}
SANITIZERS: ${{ inputs.sanitizers }}
shell: bash -euxo pipefail {0}
run: |
# PLATFORM will be embedded in the perf test report
@@ -115,12 +125,15 @@ runs:
export DEFAULT_PG_VERSION=${PG_VERSION#v}
export LD_LIBRARY_PATH=${POSTGRES_DISTRIB_DIR}/v${DEFAULT_PG_VERSION}/lib
export BENCHMARK_CONNSTR=${BENCHMARK_CONNSTR:-}
export ASAN_OPTIONS=detect_leaks=0:detect_stack_use_after_return=0:abort_on_error=1:strict_string_checks=1:check_initialization_order=1:strict_init_order=1
export UBSAN_OPTIONS=abort_on_error=1:print_stacktrace=1
if [ "${BUILD_TYPE}" = "remote" ]; then
export REMOTE_ENV=1
fi
PERF_REPORT_DIR="$(realpath test_runner/perf-report-local)"
echo "PERF_REPORT_DIR=${PERF_REPORT_DIR}" >> ${GITHUB_ENV}
rm -rf $PERF_REPORT_DIR
TEST_SELECTION="test_runner/${{ inputs.test_selection }}"
@@ -150,15 +163,8 @@ runs:
EXTRA_PARAMS="--out-dir $PERF_REPORT_DIR $EXTRA_PARAMS"
fi
if [ "${RERUN_FLAKY}" == "true" ]; then
mkdir -p $TEST_OUTPUT
poetry run ./scripts/flaky_tests.py "${TEST_RESULT_CONNSTR}" \
--days 7 \
--output "$TEST_OUTPUT/flaky.json" \
--pg-version "${DEFAULT_PG_VERSION}" \
--build-type "${BUILD_TYPE}"
EXTRA_PARAMS="--flaky-tests-json $TEST_OUTPUT/flaky.json $EXTRA_PARAMS"
if [ "${RERUN_FAILED}" == "true" ]; then
EXTRA_PARAMS="--reruns 2 $EXTRA_PARAMS"
fi
# We use pytest-split plugin to run benchmarks in parallel on different CI runners
@@ -204,11 +210,12 @@ runs:
--verbose \
-rA $TEST_SELECTION $EXTRA_PARAMS
if [[ "${{ inputs.save_perf_report }}" == "true" ]]; then
export REPORT_FROM="$PERF_REPORT_DIR"
export REPORT_TO="$PLATFORM"
scripts/generate_and_push_perf_report.sh
fi
- name: Upload performance report
if: ${{ !cancelled() && inputs.save_perf_report == 'true' }}
shell: bash -euxo pipefail {0}
run: |
export REPORT_FROM="${PERF_REPORT_DIR}"
scripts/generate_and_push_perf_report.sh
- name: Upload compatibility snapshot
# Note, that we use `github.base_ref` which is a target branch for a PR
@@ -218,10 +225,22 @@ runs:
name: compatibility-snapshot-${{ runner.arch }}-${{ inputs.build_type }}-pg${{ inputs.pg_version }}
# Directory is created by test_compatibility.py::test_create_snapshot, keep the path in sync with the test
path: /tmp/test_output/compatibility_snapshot_pg${{ inputs.pg_version }}/
# The lack of compatibility snapshot shouldn't fail the job
# (for example if we didn't run the test for non build-and-test workflow)
skip-if-does-not-exist: true
aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}
- uses: aws-actions/configure-aws-credentials@v4
if: ${{ !cancelled() }}
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws-oidc-role-arn }}
role-duration-seconds: 3600 # 1 hour should be more than enough to upload report
- name: Upload test results
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-store
with:
report-dir: /tmp/test_output/allure/results
unique-key: ${{ inputs.build_type }}-${{ inputs.pg_version }}
unique-key: ${{ inputs.build_type }}-${{ inputs.pg_version }}-${{ runner.arch }}
aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}

View File

@@ -14,9 +14,11 @@ runs:
name: coverage-data-artifact
path: /tmp/coverage
skip-if-does-not-exist: true # skip if there's no previous coverage to download
aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}
- name: Upload coverage data
uses: ./.github/actions/upload
with:
name: coverage-data-artifact
path: /tmp/coverage
aws-oidc-role-arn: ${{ inputs.aws-oidc-role-arn }}

View File

@@ -1,36 +0,0 @@
name: "Set custom docker config directory"
description: "Create a directory for docker config and set DOCKER_CONFIG"
# Use custom DOCKER_CONFIG directory to avoid conflicts with default settings
runs:
using: "composite"
steps:
- name: Show warning on GitHub-hosted runners
if: runner.environment == 'github-hosted'
shell: bash -euo pipefail {0}
run: |
# Using the following environment variables to find a path to the workflow file
# ${GITHUB_WORKFLOW_REF} - octocat/hello-world/.github/workflows/my-workflow.yml@refs/heads/my_branch
# ${GITHUB_REPOSITORY} - octocat/hello-world
# ${GITHUB_REF} - refs/heads/my_branch
# From https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/variables
filename_with_ref=${GITHUB_WORKFLOW_REF#"$GITHUB_REPOSITORY/"}
filename=${filename_with_ref%"@$GITHUB_REF"}
# https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#setting-a-warning-message
title='Unnecessary usage of `.github/actions/set-docker-config-dir`'
message='No need to use `.github/actions/set-docker-config-dir` action on GitHub-hosted runners'
echo "::warning file=${filename},title=${title}::${message}"
- uses: pyTooling/Actions/with-post-step@74afc5a42a17a046c90c68cb5cfa627e5c6c5b6b # v1.0.7
env:
DOCKER_CONFIG: .docker-custom-${{ github.run_id }}-${{ github.run_attempt }}
with:
main: |
mkdir -p "${DOCKER_CONFIG}"
echo DOCKER_CONFIG=${DOCKER_CONFIG} | tee -a $GITHUB_ENV
post: |
if [ -d "${DOCKER_CONFIG}" ]; then
rm -r "${DOCKER_CONFIG}"
fi

View File

@@ -7,18 +7,28 @@ inputs:
path:
description: "A directory or file to upload"
required: true
skip-if-does-not-exist:
description: "Allow to skip if path doesn't exist, fail otherwise"
default: false
required: false
prefix:
description: "S3 prefix. Default is '${GITHUB_SHA}/${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}'"
required: false
aws-oidc-role-arn:
description: "the OIDC role arn for aws auth"
required: false
default: ""
runs:
using: "composite"
steps:
- name: Prepare artifact
id: prepare-artifact
shell: bash -euxo pipefail {0}
env:
SOURCE: ${{ inputs.path }}
ARCHIVE: /tmp/uploads/${{ inputs.name }}.tar.zst
SKIP_IF_DOES_NOT_EXIST: ${{ inputs.skip-if-does-not-exist }}
run: |
mkdir -p $(dirname $ARCHIVE)
@@ -33,14 +43,29 @@ runs:
elif [ -f ${SOURCE} ]; then
time tar -cf ${ARCHIVE} --zstd ${SOURCE}
elif ! ls ${SOURCE} > /dev/null 2>&1; then
echo >&2 "${SOURCE} does not exist"
exit 2
if [ "${SKIP_IF_DOES_NOT_EXIST}" = "true" ]; then
echo 'SKIPPED=true' >> $GITHUB_OUTPUT
exit 0
else
echo >&2 "${SOURCE} does not exist"
exit 2
fi
else
echo >&2 "${SOURCE} is neither a directory nor a file, do not know how to handle it"
exit 3
fi
echo 'SKIPPED=false' >> $GITHUB_OUTPUT
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-region: eu-central-1
role-to-assume: ${{ inputs.aws-oidc-role-arn }}
role-duration-seconds: 3600
- name: Upload artifact
if: ${{ steps.prepare-artifact.outputs.SKIPPED == 'false' }}
shell: bash -euxo pipefail {0}
env:
SOURCE: ${{ inputs.path }}

13
.github/file-filters.yaml vendored Normal file
View File

@@ -0,0 +1,13 @@
rust_code: ['**/*.rs', '**/Cargo.toml', '**/Cargo.lock']
rust_dependencies: ['**/Cargo.lock']
v14: ['vendor/postgres-v14/**', 'Makefile', 'pgxn/**']
v15: ['vendor/postgres-v15/**', 'Makefile', 'pgxn/**']
v16: ['vendor/postgres-v16/**', 'Makefile', 'pgxn/**']
v17: ['vendor/postgres-v17/**', 'Makefile', 'pgxn/**']
rebuild_neon_extra:
- .github/workflows/neon_extra_builds.yml
rebuild_macos:
- .github/workflows/build-macos.yml

View File

@@ -1,14 +1,3 @@
## Problem
## Summary of changes
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above checklist

69
.github/scripts/generate_image_maps.py vendored Normal file
View File

@@ -0,0 +1,69 @@
import itertools
import json
import os
import sys
source_tag = os.getenv("SOURCE_TAG")
target_tag = os.getenv("TARGET_TAG")
branch = os.getenv("BRANCH")
dev_acr = os.getenv("DEV_ACR")
prod_acr = os.getenv("PROD_ACR")
dev_aws = os.getenv("DEV_AWS")
prod_aws = os.getenv("PROD_AWS")
aws_region = os.getenv("AWS_REGION")
components = {
"neon": ["neon"],
"compute": [
"compute-node-v14",
"compute-node-v15",
"compute-node-v16",
"compute-node-v17",
"vm-compute-node-v14",
"vm-compute-node-v15",
"vm-compute-node-v16",
"vm-compute-node-v17",
],
}
registries = {
"dev": [
"docker.io/neondatabase",
"ghcr.io/neondatabase",
f"{dev_aws}.dkr.ecr.{aws_region}.amazonaws.com",
f"{dev_acr}.azurecr.io/neondatabase",
],
"prod": [
f"{prod_aws}.dkr.ecr.{aws_region}.amazonaws.com",
f"{prod_acr}.azurecr.io/neondatabase",
],
}
release_branches = ["release", "release-proxy", "release-compute"]
outputs: dict[str, dict[str, list[str]]] = {}
target_tags = (
[target_tag, "latest"]
if branch == "main"
else [target_tag, "released"]
if branch in release_branches
else [target_tag]
)
target_stages = ["dev", "prod"] if branch in release_branches else ["dev"]
for component_name, component_images in components.items():
for stage in target_stages:
outputs[f"{component_name}-{stage}"] = {
f"ghcr.io/neondatabase/{component_image}:{source_tag}": [
f"{registry}/{component_image}:{tag}"
for registry, tag in itertools.product(registries[stage], target_tags)
if not (registry == "ghcr.io/neondatabase" and tag == source_tag)
]
for component_image in component_images
}
with open(os.getenv("GITHUB_OUTPUT", "/dev/null"), "a") as f:
for key, value in outputs.items():
f.write(f"{key}={json.dumps(value)}\n")
print(f"Image map for {key}:\n{json.dumps(value, indent=2)}\n\n", file=sys.stderr)

110
.github/scripts/lint-release-pr.sh vendored Executable file
View File

@@ -0,0 +1,110 @@
#!/usr/bin/env bash
set -euo pipefail
DOCS_URL="https://docs.neon.build/overview/repositories/neon.html"
message() {
if [[ -n "${GITHUB_PR_NUMBER:-}" ]]; then
gh pr comment --repo "${GITHUB_REPOSITORY}" "${GITHUB_PR_NUMBER}" --edit-last --body "$1" \
|| gh pr comment --repo "${GITHUB_REPOSITORY}" "${GITHUB_PR_NUMBER}" --body "$1"
fi
echo "$1"
}
report_error() {
message "$1
For more details, see the documentation: ${DOCS_URL}"
exit 1
}
case "$RELEASE_BRANCH" in
"release") COMPONENT="Storage" ;;
"release-proxy") COMPONENT="Proxy" ;;
"release-compute") COMPONENT="Compute" ;;
*)
report_error "Unknown release branch: ${RELEASE_BRANCH}"
;;
esac
# Identify main and release branches
MAIN_BRANCH="origin/main"
REMOTE_RELEASE_BRANCH="origin/${RELEASE_BRANCH}"
# Find merge base
MERGE_BASE=$(git merge-base "${MAIN_BRANCH}" "${REMOTE_RELEASE_BRANCH}")
echo "Merge base of ${MAIN_BRANCH} and ${RELEASE_BRANCH}: ${MERGE_BASE}"
# Get the HEAD commit (last commit in PR, expected to be the merge commit)
LAST_COMMIT=$(git rev-parse HEAD)
MERGE_COMMIT_MESSAGE=$(git log -1 --format=%s "${LAST_COMMIT}")
EXPECTED_MESSAGE_REGEX="^$COMPONENT release [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2} UTC$"
if ! [[ "${MERGE_COMMIT_MESSAGE}" =~ ${EXPECTED_MESSAGE_REGEX} ]]; then
report_error "Merge commit message does not match expected pattern: '<component> release YYYY-MM-DD'
Expected component: ${COMPONENT}
Found: '${MERGE_COMMIT_MESSAGE}'"
fi
echo "✅ Merge commit message is correctly formatted: '${MERGE_COMMIT_MESSAGE}'"
LAST_COMMIT_PARENTS=$(git cat-file -p "${LAST_COMMIT}" | jq -sR '[capture("parent (?<parent>[0-9a-f]{40})"; "g") | .parent]')
if [[ "$(echo "${LAST_COMMIT_PARENTS}" | jq 'length')" -ne 2 ]]; then
report_error "Last commit must be a merge commit with exactly two parents"
fi
EXPECTED_RELEASE_HEAD=$(git rev-parse "${REMOTE_RELEASE_BRANCH}")
if echo "${LAST_COMMIT_PARENTS}" | jq -e --arg rel "${EXPECTED_RELEASE_HEAD}" 'index($rel) != null' > /dev/null; then
LINEAR_HEAD=$(echo "${LAST_COMMIT_PARENTS}" | jq -r '[.[] | select(. != $rel)][0]' --arg rel "${EXPECTED_RELEASE_HEAD}")
else
report_error "Last commit must merge the release branch (${RELEASE_BRANCH})"
fi
echo "✅ Last commit correctly merges the previous commit and the release branch"
echo "Top commit of linear history: ${LINEAR_HEAD}"
MERGE_COMMIT_TREE=$(git rev-parse "${LAST_COMMIT}^{tree}")
LINEAR_HEAD_TREE=$(git rev-parse "${LINEAR_HEAD}^{tree}")
if [[ "${MERGE_COMMIT_TREE}" != "${LINEAR_HEAD_TREE}" ]]; then
report_error "Tree of merge commit (${MERGE_COMMIT_TREE}) does not match tree of linear history head (${LINEAR_HEAD_TREE})
This indicates that the merge of ${RELEASE_BRANCH} into this branch was not performed using the merge strategy 'ours'"
fi
echo "✅ Merge commit tree matches the linear history head"
EXPECTED_PREVIOUS_COMMIT="${LINEAR_HEAD}"
# Now traverse down the history, ensuring each commit has exactly one parent
CURRENT_COMMIT="${EXPECTED_PREVIOUS_COMMIT}"
while [[ "${CURRENT_COMMIT}" != "${MERGE_BASE}" && "${CURRENT_COMMIT}" != "${EXPECTED_RELEASE_HEAD}" ]]; do
CURRENT_COMMIT_PARENTS=$(git cat-file -p "${CURRENT_COMMIT}" | jq -sR '[capture("parent (?<parent>[0-9a-f]{40})"; "g") | .parent]')
if [[ "$(echo "${CURRENT_COMMIT_PARENTS}" | jq 'length')" -ne 1 ]]; then
report_error "Commit ${CURRENT_COMMIT} must have exactly one parent"
fi
NEXT_COMMIT=$(echo "${CURRENT_COMMIT_PARENTS}" | jq -r '.[0]')
if [[ "${NEXT_COMMIT}" == "${MERGE_BASE}" ]]; then
echo "✅ Reached merge base (${MERGE_BASE})"
PR_BASE="${MERGE_BASE}"
elif [[ "${NEXT_COMMIT}" == "${EXPECTED_RELEASE_HEAD}" ]]; then
echo "✅ Reached release branch (${EXPECTED_RELEASE_HEAD})"
PR_BASE="${EXPECTED_RELEASE_HEAD}"
elif [[ -z "${NEXT_COMMIT}" ]]; then
report_error "Unexpected end of commit history before reaching merge base"
fi
# Move to the next commit in the chain
CURRENT_COMMIT="${NEXT_COMMIT}"
done
echo "✅ All commits are properly ordered and linear"
echo "✅ Release PR structure is valid"
echo
message "Commits that are part of this release:
$(git log --oneline "${PR_BASE}..${LINEAR_HEAD}")"

31
.github/scripts/previous-releases.jq vendored Normal file
View File

@@ -0,0 +1,31 @@
# Expects response from https://docs.github.com/en/rest/releases/releases?apiVersion=2022-11-28#list-releases as input,
# with tag names `release` for storage, `release-compute` for compute and `release-proxy` for proxy releases.
# Extract only the `tag_name` field from each release object
[ .[].tag_name ]
# Transform each tag name into a structured object using regex capture
| reduce map(
capture("^(?<full>release(-(?<component>proxy|compute))?-(?<version>\\d+))$")
| {
component: (.component // "storage"), # Default to "storage" if no component is specified
version: (.version | tonumber), # Convert the version number to an integer
full: .full # Store the full tag name for final output
}
)[] as $entry # Loop over the transformed list
# Accumulate the latest (highest-numbered) version for each component
({};
.[$entry.component] |= (if . == null or $entry.version > .version then $entry else . end))
# Ensure that each component exists, or fail
| (["storage", "compute", "proxy"] - (keys)) as $missing
| if ($missing | length) > 0 then
"Error: Found no release for \($missing | join(", "))!\n" | halt_error(1)
else . end
# Convert the resulting object into an array of formatted strings
| to_entries
| map("\(.key)=\(.value.full)")
# Output each string separately
| .[]

45
.github/scripts/push_with_image_map.py vendored Normal file
View File

@@ -0,0 +1,45 @@
import json
import os
import subprocess
RED = "\033[91m"
RESET = "\033[0m"
image_map = os.getenv("IMAGE_MAP")
if not image_map:
raise ValueError("IMAGE_MAP environment variable is not set")
try:
parsed_image_map: dict[str, list[str]] = json.loads(image_map)
except json.JSONDecodeError as e:
raise ValueError("Failed to parse IMAGE_MAP as JSON") from e
failures = []
pending = [(source, target) for source, targets in parsed_image_map.items() for target in targets]
while len(pending) > 0:
if len(failures) > 10:
print("Error: more than 10 failures!")
for failure in failures:
print(f'"{failure[0]}" failed with the following output:')
print(failure[1])
raise RuntimeError("Retry limit reached.")
source, target = pending.pop(0)
cmd = ["docker", "buildx", "imagetools", "create", "-t", target, source]
print(f"Running: {' '.join(cmd)}")
result = subprocess.run(cmd, text=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
if result.returncode != 0:
failures.append((" ".join(cmd), result.stdout, target))
pending.append((source, target))
print(
f"{RED}[RETRY]{RESET} Push failed for {target}. Retrying... (failure count: {len(failures)})"
)
print(result.stdout)
if len(failures) > 0 and (github_output := os.getenv("GITHUB_OUTPUT")):
failed_targets = [target for _, _, target in failures]
with open(github_output, "a") as f:
f.write(f"push_failures={json.dumps(failed_targets)}\n")

View File

@@ -3,19 +3,26 @@ name: Prepare benchmarking databases by restoring dumps
on:
workflow_call:
# no inputs needed
defaults:
run:
shell: bash -euxo pipefail {0}
permissions:
contents: read
jobs:
setup-databases:
permissions:
contents: write
statuses: write
id-token: write # aws-actions/configure-aws-credentials
strategy:
fail-fast: false
matrix:
platform: [ aws-rds-postgres, aws-aurora-serverless-v2-postgres, neon ]
platform: [ aws-rds-postgres, aws-aurora-serverless-v2-postgres, neon, neon_pg17 ]
database: [ clickbench, tpch, userexample ]
env:
LD_LIBRARY_PATH: /tmp/neon/pg_install/v16/lib
PLATFORM: ${{ matrix.platform }}
@@ -23,22 +30,33 @@ jobs:
runs-on: [ self-hosted, us-east-2, x64 ]
container:
image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned
image: ghcr.io/neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Set up Connection String
id: set-up-prep-connstr
run: |
case "${PLATFORM}" in
neon)
CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CONNSTR }}
CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CONNSTR }}
;;
neon_pg17)
CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CONNSTR_PG17 }}
;;
aws-rds-postgres)
CONNSTR=${{ secrets.BENCHMARK_RDS_POSTGRES_CONNSTR }}
CONNSTR=${{ secrets.BENCHMARK_RDS_POSTGRES_CONNSTR }}
;;
aws-aurora-serverless-v2-postgres)
CONNSTR=${{ secrets.BENCHMARK_RDS_AURORA_CONNSTR }}
CONNSTR=${{ secrets.BENCHMARK_RDS_AURORA_CONNSTR }}
;;
*)
echo >&2 "Unknown PLATFORM=${PLATFORM}"
@@ -46,9 +64,16 @@ jobs:
;;
esac
echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT
echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT
- uses: actions/checkout@v4
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 18000 # 5 hours
- name: Download Neon artifact
uses: ./.github/actions/download
@@ -56,24 +81,25 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
# we create a table that has one row for each database that we want to restore with the status whether the restore is done
# we create a table that has one row for each database that we want to restore with the status whether the restore is done
- name: Create benchmark_restore_status table if it does not exist
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-prep-connstr.outputs.connstr }}
DATABASE_NAME: ${{ matrix.database }}
# to avoid a race condition of multiple jobs trying to create the table at the same time,
# to avoid a race condition of multiple jobs trying to create the table at the same time,
# we use an advisory lock
run: |
${PG_BINARIES}/psql "${{ env.BENCHMARK_CONNSTR }}" -c "
SELECT pg_advisory_lock(4711);
SELECT pg_advisory_lock(4711);
CREATE TABLE IF NOT EXISTS benchmark_restore_status (
databasename text primary key,
restore_done boolean
);
SELECT pg_advisory_unlock(4711);
"
- name: Check if restore is already done
id: check-restore-done
env:
@@ -107,7 +133,7 @@ jobs:
DATABASE_NAME: ${{ matrix.database }}
run: |
mkdir -p /tmp/dumps
aws s3 cp s3://neon-github-dev/performance/pgdumps/$DATABASE_NAME/$DATABASE_NAME.pg_dump /tmp/dumps/
aws s3 cp s3://neon-github-dev/performance/pgdumps/$DATABASE_NAME/$DATABASE_NAME.pg_dump /tmp/dumps/
- name: Replace database name in connection string
if: steps.check-restore-done.outputs.skip != 'true'
@@ -126,17 +152,17 @@ jobs:
else
new_connstr="${base_connstr}/${DATABASE_NAME}"
fi
echo "database_connstr=${new_connstr}" >> $GITHUB_OUTPUT
echo "database_connstr=${new_connstr}" >> $GITHUB_OUTPUT
- name: Restore dump
if: steps.check-restore-done.outputs.skip != 'true'
env:
DATABASE_NAME: ${{ matrix.database }}
DATABASE_CONNSTR: ${{ steps.replace-dbname.outputs.database_connstr }}
# the following works only with larger computes:
# the following works only with larger computes:
# PGOPTIONS: "-c maintenance_work_mem=8388608 -c max_parallel_maintenance_workers=7"
# we add the || true because:
# the dumps were created with Neon and contain neon extensions that are not
# the dumps were created with Neon and contain neon extensions that are not
# available in RDS, so we will always report an error, but we can ignore it
run: |
${PG_BINARIES}/pg_restore --clean --if-exists --no-owner --jobs=4 \

View File

@@ -19,10 +19,30 @@ on:
description: 'debug or release'
required: true
type: string
pg-versions:
description: 'a json array of postgres versions to run regression tests on'
test-cfg:
description: 'a json object of postgres versions and lfc states to run regression tests on'
required: true
type: string
sanitizers:
description: 'enabled or disabled'
required: false
default: 'disabled'
type: string
test-selection:
description: 'specification of selected test(s) to run'
required: false
default: ''
type: string
test-run-count:
description: 'number of runs to perform for selected tests'
required: false
default: 1
type: number
rerun-failed:
description: 'rerun failed tests to ignore flaky tests'
required: false
default: true
type: boolean
defaults:
run:
@@ -31,17 +51,21 @@ defaults:
env:
RUST_BACKTRACE: 1
COPT: '-Werror'
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
permissions:
contents: read
jobs:
build-neon:
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', inputs.arch == 'arm64' && 'large-arm64' || 'large')) }}
runs-on: ${{ fromJSON(format('["self-hosted", "{0}"]', inputs.arch == 'arm64' && 'large-arm64' || 'large')) }}
permissions:
id-token: write # aws-actions/configure-aws-credentials
contents: read
container:
image: ${{ inputs.build-tools-image }}
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
# Raise locked memory limit for tokio-epoll-uring.
# On 5.10 LTS kernels < 5.10.162 (and generally mainline kernels < 5.12),
# io_uring will account the memory of the CQ and SQ as locked.
@@ -53,21 +77,12 @@ jobs:
BUILD_TAG: ${{ inputs.build-tag }}
steps:
- name: Fix git ownership
run: |
# Workaround for `fatal: detected dubious ownership in repository at ...`
#
# Use both ${{ github.workspace }} and ${GITHUB_WORKSPACE} because they're different on host and in containers
# Ref https://github.com/actions/checkout/issues/785
#
git config --global --add safe.directory ${{ github.workspace }}
git config --global --add safe.directory ${GITHUB_WORKSPACE}
for r in 14 15 16; do
git config --global --add safe.directory "${{ github.workspace }}/vendor/postgres-v$r"
git config --global --add safe.directory "${GITHUB_WORKSPACE}/vendor/postgres-v$r"
done
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@v4
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
submodules: true
@@ -83,93 +98,120 @@ jobs:
id: pg_v16_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v16) >> $GITHUB_OUTPUT
- name: Set pg 17 revision for caching
id: pg_v17_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v17) >> $GITHUB_OUTPUT
# Set some environment variables used by all the steps.
#
# CARGO_FLAGS is extra options to pass to "cargo build", "cargo test" etc.
# It also includes --features, if any
# CARGO_FLAGS is extra options to pass to all "cargo" subcommands.
#
# CARGO_FEATURES is passed to "cargo metadata". It is separate from CARGO_FLAGS,
# because "cargo metadata" doesn't accept --release or --debug options
# CARGO_PROFILE is passed to "cargo build", "cargo test" etc, but not to
# "cargo metadata", because it doesn't accept --release or --debug options.
#
# We run tests with addtional features, that are turned off by default (e.g. in release builds), see
# corresponding Cargo.toml files for their descriptions.
- name: Set env variables
env:
ARCH: ${{ inputs.arch }}
SANITIZERS: ${{ inputs.sanitizers }}
run: |
CARGO_FEATURES="--features testing"
CARGO_FLAGS="--locked --features testing"
if [[ $BUILD_TYPE == "debug" && $ARCH == 'x64' ]]; then
cov_prefix="scripts/coverage --profraw-prefix=$GITHUB_JOB --dir=/tmp/coverage run"
CARGO_FLAGS="--locked"
CARGO_PROFILE=""
elif [[ $BUILD_TYPE == "debug" ]]; then
cov_prefix=""
CARGO_FLAGS="--locked"
CARGO_PROFILE=""
elif [[ $BUILD_TYPE == "release" ]]; then
cov_prefix=""
CARGO_FLAGS="--locked --release"
CARGO_PROFILE="--release"
fi
if [[ $SANITIZERS == 'enabled' ]]; then
make_vars="WITH_SANITIZERS=yes"
else
make_vars=""
fi
{
echo "cov_prefix=${cov_prefix}"
echo "CARGO_FEATURES=${CARGO_FEATURES}"
echo "make_vars=${make_vars}"
echo "CARGO_FLAGS=${CARGO_FLAGS}"
echo "CARGO_PROFILE=${CARGO_PROFILE}"
echo "CARGO_HOME=${GITHUB_WORKSPACE}/.cargo"
} >> $GITHUB_ENV
- name: Cache postgres v14 build
id: cache_pg_14
uses: actions/cache@v4
uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1 # v1.8.0
with:
endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}
bucket: ${{ vars.HETZNER_CACHE_BUCKET }}
accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}
secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}
use-fallback: false
path: pg_install/v14
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile', 'Dockerfile.build-tools') }}
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-bookworm-${{ hashFiles('Makefile', 'build-tools.Dockerfile') }}
- name: Cache postgres v15 build
id: cache_pg_15
uses: actions/cache@v4
uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1 # v1.8.0
with:
endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}
bucket: ${{ vars.HETZNER_CACHE_BUCKET }}
accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}
secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}
use-fallback: false
path: pg_install/v15
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile', 'Dockerfile.build-tools') }}
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-bookworm-${{ hashFiles('Makefile', 'build-tools.Dockerfile') }}
- name: Cache postgres v16 build
id: cache_pg_16
uses: actions/cache@v4
uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1 # v1.8.0
with:
endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}
bucket: ${{ vars.HETZNER_CACHE_BUCKET }}
accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}
secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}
use-fallback: false
path: pg_install/v16
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-${{ hashFiles('Makefile', 'Dockerfile.build-tools') }}
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-bookworm-${{ hashFiles('Makefile', 'build-tools.Dockerfile') }}
- name: Build postgres v14
if: steps.cache_pg_14.outputs.cache-hit != 'true'
run: mold -run make postgres-v14 -j$(nproc)
- name: Cache postgres v17 build
id: cache_pg_17
uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1 # v1.8.0
with:
endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}
bucket: ${{ vars.HETZNER_CACHE_BUCKET }}
accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}
secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}
use-fallback: false
path: pg_install/v17
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v17_rev.outputs.pg_rev }}-bookworm-${{ hashFiles('Makefile', 'build-tools.Dockerfile') }}
- name: Build postgres v15
if: steps.cache_pg_15.outputs.cache-hit != 'true'
run: mold -run make postgres-v15 -j$(nproc)
- name: Build postgres v16
if: steps.cache_pg_16.outputs.cache-hit != 'true'
run: mold -run make postgres-v16 -j$(nproc)
- name: Build neon extensions
run: mold -run make neon-pg-ext -j$(nproc)
- name: Build all
# Note: the Makefile picks up BUILD_TYPE and CARGO_PROFILE from the env variables
run: mold -run make ${make_vars} all -j$(nproc) CARGO_BUILD_FLAGS="$CARGO_FLAGS"
- name: Build walproposer-lib
run: mold -run make walproposer-lib -j$(nproc)
run: mold -run make ${make_vars} walproposer-lib -j$(nproc)
- name: Run cargo build
- name: Build unit tests
if: inputs.sanitizers != 'enabled'
run: |
PQ_LIB_DIR=$(pwd)/pg_install/v16/lib
export PQ_LIB_DIR
${cov_prefix} mold -run cargo build $CARGO_FLAGS $CARGO_FEATURES --bins --tests
export ASAN_OPTIONS=detect_leaks=0
${cov_prefix} mold -run cargo build $CARGO_FLAGS $CARGO_PROFILE --tests
# Do install *before* running rust tests because they might recompile the
# binaries with different features/flags.
- name: Install rust binaries
env:
ARCH: ${{ inputs.arch }}
SANITIZERS: ${{ inputs.sanitizers }}
run: |
# Install target binaries
mkdir -p /tmp/neon/bin/
binaries=$(
${cov_prefix} cargo metadata $CARGO_FEATURES --format-version=1 --no-deps |
${cov_prefix} cargo metadata $CARGO_FLAGS --format-version=1 --no-deps |
jq -r '.packages[].targets[] | select(.kind | index("bin")) | .name'
)
for bin in $binaries; do
@@ -179,14 +221,14 @@ jobs:
done
# Install test executables and write list of all binaries (for code coverage)
if [[ $BUILD_TYPE == "debug" && $ARCH == 'x64' ]]; then
if [[ $BUILD_TYPE == "debug" && $ARCH == 'x64' && $SANITIZERS != 'enabled' ]]; then
# Keep bloated coverage data files away from the rest of the artifact
mkdir -p /tmp/coverage/
mkdir -p /tmp/neon/test_bin/
test_exe_paths=$(
${cov_prefix} cargo test $CARGO_FLAGS $CARGO_FEATURES --message-format=json --no-run |
${cov_prefix} cargo test $CARGO_FLAGS $CARGO_PROFILE --message-format=json --no-run |
jq -r '.executable | select(. != null)'
)
for bin in $test_exe_paths; do
@@ -204,33 +246,41 @@ jobs:
done
fi
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 18000 # 5 hours
- name: Run rust tests
if: ${{ inputs.sanitizers != 'enabled' }}
env:
NEXTEST_RETRIES: 3
run: |
PQ_LIB_DIR=$(pwd)/pg_install/v16/lib
export PQ_LIB_DIR
LD_LIBRARY_PATH=$(pwd)/pg_install/v16/lib
LD_LIBRARY_PATH=$(pwd)/pg_install/v17/lib
export LD_LIBRARY_PATH
#nextest does not yet support running doctests
${cov_prefix} cargo test --doc $CARGO_FLAGS $CARGO_FEATURES
${cov_prefix} cargo test --doc $CARGO_FLAGS $CARGO_PROFILE
# run all non-pageserver tests
${cov_prefix} cargo nextest run $CARGO_FLAGS $CARGO_FEATURES -E '!package(pageserver)'
${cov_prefix} cargo nextest run $CARGO_FLAGS $CARGO_PROFILE -E '!package(pageserver)'
# run pageserver tests with different settings
for io_engine in std-fs tokio-epoll-uring ; do
for io_buffer_alignment in 0 1 512 ; do
NEON_PAGESERVER_UNIT_TEST_VIRTUAL_FILE_IOENGINE=$io_engine NEON_PAGESERVER_UNIT_TEST_IO_BUFFER_ALIGNMENT=$io_buffer_alignment ${cov_prefix} cargo nextest run $CARGO_FLAGS $CARGO_FEATURES -E 'package(pageserver)'
done
done
# run pageserver tests
# (When developing new pageserver features gated by config fields, we commonly make the rust
# unit tests sensitive to an environment variable NEON_PAGESERVER_UNIT_TEST_FEATURENAME.
# Then run the nextest invocation below for all relevant combinations. Singling out the
# pageserver tests from non-pageserver tests cuts down the time it takes for this CI step.)
NEON_PAGESERVER_UNIT_TEST_VIRTUAL_FILE_IOENGINE=tokio-epoll-uring \
${cov_prefix} \
cargo nextest run $CARGO_FLAGS $CARGO_PROFILE -E 'package(pageserver)'
# Run separate tests for real S3
export ENABLE_REAL_S3_REMOTE_STORAGE=nonempty
export REMOTE_STORAGE_S3_BUCKET=neon-github-ci-tests
export REMOTE_STORAGE_S3_REGION=eu-central-1
${cov_prefix} cargo nextest run $CARGO_FLAGS $CARGO_FEATURES -E 'package(remote_storage)' -E 'test(test_real_s3)'
${cov_prefix} cargo nextest run $CARGO_FLAGS $CARGO_PROFILE -E 'package(remote_storage)' -E 'test(test_real_s3)'
# Run separate tests for real Azure Blob Storage
# XXX: replace region with `eu-central-1`-like region
@@ -239,16 +289,46 @@ jobs:
export AZURE_STORAGE_ACCESS_KEY="${{ secrets.AZURE_STORAGE_ACCESS_KEY_DEV }}"
export REMOTE_STORAGE_AZURE_CONTAINER="${{ vars.REMOTE_STORAGE_AZURE_CONTAINER }}"
export REMOTE_STORAGE_AZURE_REGION="${{ vars.REMOTE_STORAGE_AZURE_REGION }}"
${cov_prefix} cargo nextest run $CARGO_FLAGS $CARGO_FEATURES -E 'package(remote_storage)' -E 'test(test_real_azure)'
${cov_prefix} cargo nextest run $CARGO_FLAGS $CARGO_PROFILE -E 'package(remote_storage)' -E 'test(test_real_azure)'
- name: Install postgres binaries
run: cp -a pg_install /tmp/neon/pg_install
run: |
# Use tar to copy files matching the pattern, preserving the paths in the destionation
tar c \
pg_install/v* \
build/*/src/test/regress/*.so \
build/*/src/test/regress/pg_regress \
build/*/src/test/isolation/isolationtester \
build/*/src/test/isolation/pg_isolation_regress \
| tar x -C /tmp/neon
- name: Upload Neon artifact
uses: ./.github/actions/upload
with:
name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-artifact
name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}${{ inputs.sanitizers == 'enabled' && '-sanitized' || '' }}-artifact
path: /tmp/neon
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Check diesel schema
if: inputs.build-type == 'release' && inputs.arch == 'x64'
env:
DATABASE_URL: postgresql://localhost:1235/storage_controller
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
run: |
export ASAN_OPTIONS=detect_leaks=0
/tmp/neon/bin/neon_local init
/tmp/neon/bin/neon_local storage_controller start
diesel print-schema > storage_controller/src/schema.rs
if [ -n "$(git diff storage_controller/src/schema.rs)" ]; then
echo >&2 "Uncommitted changes in diesel schema"
git diff .
exit 1
fi
/tmp/neon/bin/neon_local storage_controller stop
# XXX: keep this after the binaries.list is formed, so the coverage can properly work later
- name: Merge and upload coverage data
@@ -258,27 +338,36 @@ jobs:
regress-tests:
# Don't run regression tests on debug arm64 builds
if: inputs.build-type != 'debug' || inputs.arch != 'arm64'
permissions:
id-token: write # aws-actions/configure-aws-credentials
contents: read
statuses: write
needs: [ build-neon ]
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', inputs.arch == 'arm64' && 'large-arm64' || 'large')) }}
runs-on: ${{ fromJSON(format('["self-hosted", "{0}"]', inputs.arch == 'arm64' && 'large-arm64' || 'large-metal')) }}
container:
image: ${{ inputs.build-tools-image }}
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
# for changed limits, see comments on `options:` earlier in this file
options: --init --shm-size=512mb --ulimit memlock=67108864:67108864
strategy:
fail-fast: false
matrix:
pg_version: ${{ fromJson(inputs.pg-versions) }}
matrix: ${{ fromJSON(format('{{"include":{0}}}', inputs.test-cfg)) }}
steps:
- uses: actions/checkout@v4
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
submodules: true
- name: Pytest regression tests
continue-on-error: ${{ matrix.lfc_state == 'with-lfc' && inputs.build-type == 'debug' }}
uses: ./.github/actions/run-python-test-set
timeout-minutes: 60
timeout-minutes: ${{ (inputs.build-type == 'release' && inputs.sanitizers != 'enabled') && 75 || 180 }}
with:
build_type: ${{ inputs.build-type }}
test_selection: regress
@@ -286,13 +375,21 @@ jobs:
run_with_real_s3: true
real_s3_bucket: neon-github-ci-tests
real_s3_region: eu-central-1
rerun_flaky: true
rerun_failed: ${{ inputs.rerun-failed }}
pg_version: ${{ matrix.pg_version }}
sanitizers: ${{ inputs.sanitizers }}
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
# `--session-timeout` is equal to (timeout-minutes - 10 minutes) * 60 seconds.
# Attempt to stop tests gracefully to generate test reports
# until they are forcibly stopped by the stricter `timeout-minutes` limit.
extra_params: --session-timeout=${{ (inputs.build-type == 'release' && inputs.sanitizers != 'enabled') && 3000 || 10200 }} --count=${{ inputs.test-run-count }}
${{ inputs.test-selection != '' && format('-k "{0}"', inputs.test-selection) || '' }}
env:
TEST_RESULT_CONNSTR: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
CHECK_ONDISK_DATA_COMPATIBILITY: nonempty
BUILD_TAG: ${{ inputs.build-tag }}
PAGESERVER_VIRTUAL_FILE_IO_ENGINE: tokio-epoll-uring
USE_LFC: ${{ matrix.lfc_state == 'with-lfc' && 'true' || 'false' }}
# Temporary disable this step until we figure out why it's so flaky
# Ref https://github.com/neondatabase/neon/issues/4540

View File

@@ -0,0 +1,55 @@
name: Check Codestyle Python
on:
workflow_call:
inputs:
build-tools-image:
description: 'build-tools image'
required: true
type: string
defaults:
run:
shell: bash -euxo pipefail {0}
permissions:
contents: read
jobs:
check-codestyle-python:
runs-on: [ self-hosted, small ]
permissions:
packages: read
container:
image: ${{ inputs.build-tools-image }}
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Cache poetry deps
uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1 # v1.8.0
with:
endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}
bucket: ${{ vars.HETZNER_CACHE_BUCKET }}
accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}
secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}
use-fallback: false
path: ~/.cache/pypoetry/virtualenvs
key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-bookworm-${{ hashFiles('poetry.lock') }}
- run: ./scripts/pysync
- run: poetry run ruff check .
- run: poetry run ruff format --check .
- run: poetry run mypy .

View File

@@ -0,0 +1,102 @@
name: Check Codestyle Rust
on:
workflow_call:
inputs:
build-tools-image:
description: "build-tools image"
required: true
type: string
archs:
description: "Json array of architectures to run on"
type: string
defaults:
run:
shell: bash -euxo pipefail {0}
# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.
permissions: {}
jobs:
check-codestyle-rust:
strategy:
matrix:
arch: ${{ fromJSON(inputs.archs) }}
runs-on: ${{ fromJSON(format('["self-hosted", "{0}"]', matrix.arch == 'arm64' && 'small-arm64' || 'small')) }}
permissions:
packages: read
container:
image: ${{ inputs.build-tools-image }}
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Checkout
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
submodules: true
- name: Cache cargo deps
uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1 # v1.8.0
with:
endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}
bucket: ${{ vars.HETZNER_CACHE_BUCKET }}
accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}
secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}
use-fallback: false
path: |
~/.cargo/registry
!~/.cargo/registry/src
~/.cargo/git
target
key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust
# Some of our rust modules use FFI and need those to be checked
- name: Get postgres headers
run: make postgres-headers -j$(nproc)
# cargo hack runs the given cargo subcommand (clippy in this case) for all feature combinations.
# This will catch compiler & clippy warnings in all feature combinations.
# TODO: use cargo hack for build and test as well, but, that's quite expensive.
# NB: keep clippy args in sync with ./run_clippy.sh
#
# The only difference between "clippy --debug" and "clippy --release" is that in --release mode,
# #[cfg(debug_assertions)] blocks are not built. It's not worth building everything for second
# time just for that, so skip "clippy --release".
- run: |
CLIPPY_COMMON_ARGS="$( source .neon_clippy_args; echo "$CLIPPY_COMMON_ARGS")"
if [ "$CLIPPY_COMMON_ARGS" = "" ]; then
echo "No clippy args found in .neon_clippy_args"
exit 1
fi
echo "CLIPPY_COMMON_ARGS=${CLIPPY_COMMON_ARGS}" >> $GITHUB_ENV
- name: Run cargo clippy (debug)
run: cargo hack --features default --ignore-unknown-features --feature-powerset clippy $CLIPPY_COMMON_ARGS
- name: Check documentation generation
run: cargo doc --workspace --no-deps --document-private-items
env:
RUSTDOCFLAGS: "-Dwarnings -Arustdoc::private_intra_doc_links"
# Use `${{ !cancelled() }}` to run quck tests after the longer clippy run
- name: Check formatting
if: ${{ !cancelled() }}
run: cargo fmt --all -- --check
# https://github.com/facebookincubator/cargo-guppy/tree/bec4e0eb29dcd1faac70b1b5360267fc02bf830e/tools/cargo-hakari#2-keep-the-workspace-hack-up-to-date-in-ci
- name: Check rust dependencies
if: ${{ !cancelled() }}
run: |
cargo hakari generate --diff # workspace-hack Cargo.toml is up-to-date
cargo hakari manage-deps --dry-run # all workspace crates depend on workspace-hack

169
.github/workflows/_meta.yml vendored Normal file
View File

@@ -0,0 +1,169 @@
name: Generate run metadata
on:
workflow_call:
inputs:
github-event-name:
type: string
required: true
github-event-json:
type: string
required: true
outputs:
build-tag:
description: "Tag for the current workflow run"
value: ${{ jobs.tags.outputs.build-tag }}
release-tag:
description: "Tag for the release if this is an RC PR run"
value: ${{ jobs.tags.outputs.release-tag }}
previous-storage-release:
description: "Tag of the last storage release"
value: ${{ jobs.tags.outputs.storage }}
previous-proxy-release:
description: "Tag of the last proxy release"
value: ${{ jobs.tags.outputs.proxy }}
previous-compute-release:
description: "Tag of the last compute release"
value: ${{ jobs.tags.outputs.compute }}
run-kind:
description: "The kind of run we're currently in. Will be one of `push-main`, `storage-release`, `compute-release`, `proxy-release`, `storage-rc-pr`, `compute-rc-pr`, `proxy-rc-pr`, `pr`, or `workflow-dispatch`"
value: ${{ jobs.tags.outputs.run-kind }}
release-pr-run-id:
description: "Only available if `run-kind in [storage-release, proxy-release, compute-release]`. Contains the run ID of the `Build and Test` workflow, assuming one with the current commit can be found."
value: ${{ jobs.tags.outputs.release-pr-run-id }}
sha:
description: "github.event.pull_request.head.sha on release PRs, github.sha otherwise"
value: ${{ jobs.tags.outputs.sha }}
permissions: {}
defaults:
run:
shell: bash -euo pipefail {0}
jobs:
tags:
runs-on: ubuntu-22.04
outputs:
build-tag: ${{ steps.build-tag.outputs.build-tag }}
release-tag: ${{ steps.build-tag.outputs.release-tag }}
compute: ${{ steps.previous-releases.outputs.compute }}
proxy: ${{ steps.previous-releases.outputs.proxy }}
storage: ${{ steps.previous-releases.outputs.storage }}
run-kind: ${{ steps.run-kind.outputs.run-kind }}
release-pr-run-id: ${{ steps.release-pr-run-id.outputs.release-pr-run-id }}
sha: ${{ steps.sha.outputs.sha }}
permissions:
contents: read
steps:
# Need `fetch-depth: 0` to count the number of commits in the branch
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Get run kind
id: run-kind
env:
RUN_KIND: >-
${{
false
|| (inputs.github-event-name == 'push' && github.ref_name == 'main') && 'push-main'
|| (inputs.github-event-name == 'push' && github.ref_name == 'release') && 'storage-release'
|| (inputs.github-event-name == 'push' && github.ref_name == 'release-compute') && 'compute-release'
|| (inputs.github-event-name == 'push' && github.ref_name == 'release-proxy') && 'proxy-release'
|| (inputs.github-event-name == 'pull_request' && github.base_ref == 'release') && 'storage-rc-pr'
|| (inputs.github-event-name == 'pull_request' && github.base_ref == 'release-compute') && 'compute-rc-pr'
|| (inputs.github-event-name == 'pull_request' && github.base_ref == 'release-proxy') && 'proxy-rc-pr'
|| (inputs.github-event-name == 'pull_request') && 'pr'
|| (inputs.github-event-name == 'workflow_dispatch') && 'workflow-dispatch'
|| 'unknown'
}}
run: |
echo "run-kind=$RUN_KIND" | tee -a $GITHUB_OUTPUT
- name: Get the right SHA
id: sha
env:
SHA: >
${{
contains(fromJSON('["storage-rc-pr", "proxy-rc-pr", "compute-rc-pr"]'), steps.run-kind.outputs.run-kind)
&& fromJSON(inputs.github-event-json).pull_request.head.sha
|| github.sha
}}
run: |
echo "sha=$SHA" | tee -a $GITHUB_OUTPUT
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
ref: ${{ steps.sha.outputs.sha }}
- name: Get build tag
id: build-tag
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
CURRENT_BRANCH: ${{ github.head_ref || github.ref_name }}
CURRENT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
RUN_KIND: ${{ steps.run-kind.outputs.run-kind }}
run: |
case $RUN_KIND in
push-main)
echo "build-tag=$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT
;;
storage-release)
echo "build-tag=release-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT
;;
proxy-release)
echo "build-tag=release-proxy-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT
;;
compute-release)
echo "build-tag=release-compute-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT
;;
pr|storage-rc-pr|compute-rc-pr|proxy-rc-pr)
BUILD_AND_TEST_RUN_ID=$(gh api --paginate \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"/repos/${GITHUB_REPOSITORY}/actions/runs?head_sha=${CURRENT_SHA}&branch=${CURRENT_BRANCH}" \
| jq '[.workflow_runs[] | select(.name == "Build and Test")][0].id // ("Error: No matching workflow run found." | halt_error(1))')
echo "build-tag=$BUILD_AND_TEST_RUN_ID" | tee -a $GITHUB_OUTPUT
case $RUN_KIND in
storage-rc-pr)
echo "release-tag=release-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT
;;
proxy-rc-pr)
echo "release-tag=release-proxy-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT
;;
compute-rc-pr)
echo "release-tag=release-compute-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT
;;
esac
;;
workflow-dispatch)
echo "build-tag=$GITHUB_RUN_ID" | tee -a $GITHUB_OUTPUT
;;
*)
echo "Unexpected RUN_KIND ('${RUN_KIND}'), failing to assign build-tag!"
exit 1
esac
- name: Get the previous release-tags
id: previous-releases
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
gh api --paginate \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"/repos/${GITHUB_REPOSITORY}/releases" \
| jq -f .github/scripts/previous-releases.jq -r \
| tee -a "${GITHUB_OUTPUT}"
- name: Get the release PR run ID
id: release-pr-run-id
if: ${{ contains(fromJSON('["storage-release", "compute-release", "proxy-release"]'), steps.run-kind.outputs.run-kind) }}
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
CURRENT_SHA: ${{ github.sha }}
run: |
RELEASE_PR_RUN_ID=$(gh api "/repos/${GITHUB_REPOSITORY}/actions/runs?head_sha=$CURRENT_SHA" | jq '[.workflow_runs[] | select(.name == "Build and Test") | select(.head_branch | test("^rc/release.*$"; "s"))] | first | .id // ("Failed to find Build and Test run from RC PR!" | halt_error(1))')
echo "release-pr-run-id=$RELEASE_PR_RUN_ID" | tee -a $GITHUB_OUTPUT

View File

@@ -1,56 +0,0 @@
name: Push images to ACR
on:
workflow_call:
inputs:
client_id:
description: Client ID of Azure managed identity or Entra app
required: true
type: string
image_tag:
description: Tag for the container image
required: true
type: string
images:
description: Images to push
required: true
type: string
registry_name:
description: Name of the container registry
required: true
type: string
subscription_id:
description: Azure subscription ID
required: true
type: string
tenant_id:
description: Azure tenant ID
required: true
type: string
jobs:
push-to-acr:
runs-on: ubuntu-22.04
permissions:
contents: read # This is required for actions/checkout
id-token: write # This is required for Azure Login to work.
steps:
- name: Azure login
uses: azure/login@6c251865b4e6290e7b78be643ea2d005bc51f69a # @v2.1.1
with:
client-id: ${{ inputs.client_id }}
subscription-id: ${{ inputs.subscription_id }}
tenant-id: ${{ inputs.tenant_id }}
- name: Login to ACR
run: |
az acr login --name=${{ inputs.registry_name }}
- name: Copy docker images to ACR ${{ inputs.registry_name }}
run: |
images='${{ inputs.images }}'
for image in ${images}; do
docker buildx imagetools create \
-t ${{ inputs.registry_name }}.azurecr.io/neondatabase/${image}:${{ inputs.image_tag }} \
neondatabase/${image}:${{ inputs.image_tag }}
done

View File

@@ -0,0 +1,128 @@
name: Push images to Container Registry
on:
workflow_call:
inputs:
# Example: {"docker.io/neondatabase/neon:13196061314":["${{ vars.NEON_DEV_AWS_ACCOUNT_ID }}.dkr.ecr.${{ vars.AWS_ECR_REGION }}.amazonaws.com/neon:13196061314","neoneastus2.azurecr.io/neondatabase/neon:13196061314"]}
image-map:
description: JSON map of images, mapping from a source image to an array of target images that should be pushed.
required: true
type: string
aws-region:
description: AWS region to log in to. Required when pushing to ECR.
required: false
type: string
aws-account-id:
description: AWS account ID to log in to for pushing to ECR. Required when pushing to ECR.
required: false
type: string
aws-role-to-assume:
description: AWS role to assume to for pushing to ECR. Required when pushing to ECR.
required: false
type: string
azure-client-id:
description: Client ID of Azure managed identity or Entra app. Required when pushing to ACR.
required: false
type: string
azure-subscription-id:
description: Azure subscription ID. Required when pushing to ACR.
required: false
type: string
azure-tenant-id:
description: Azure tenant ID. Required when pushing to ACR.
required: false
type: string
acr-registry-name:
description: ACR registry name. Required when pushing to ACR.
required: false
type: string
permissions: {}
defaults:
run:
shell: bash -euo pipefail {0}
jobs:
push-to-container-registry:
runs-on: ubuntu-22.04
permissions:
id-token: write # Required for aws/azure login
packages: write # required for pushing to GHCR
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
sparse-checkout: .github/scripts/push_with_image_map.py
sparse-checkout-cone-mode: false
- name: Print image-map
run: echo '${{ inputs.image-map }}' | jq
- name: Configure AWS credentials
if: contains(inputs.image-map, 'amazonaws.com/')
uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2
with:
aws-region: "${{ inputs.aws-region }}"
role-to-assume: "arn:aws:iam::${{ inputs.aws-account-id }}:role/${{ inputs.aws-role-to-assume }}"
role-duration-seconds: 3600
- name: Login to ECR
if: contains(inputs.image-map, 'amazonaws.com/')
uses: aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076 # v2.0.1
with:
registries: "${{ inputs.aws-account-id }}"
- name: Configure Azure credentials
if: contains(inputs.image-map, 'azurecr.io/')
uses: azure/login@6c251865b4e6290e7b78be643ea2d005bc51f69a # @v2.1.1
with:
client-id: ${{ inputs.azure-client-id }}
subscription-id: ${{ inputs.azure-subscription-id }}
tenant-id: ${{ inputs.azure-tenant-id }}
- name: Login to ACR
if: contains(inputs.image-map, 'azurecr.io/')
run: |
az acr login --name=${{ inputs.acr-registry-name }}
- name: Login to GHCR
if: contains(inputs.image-map, 'ghcr.io/')
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Log in to Docker Hub
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
with:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
- name: Copy docker images to target registries
id: push
run: python3 .github/scripts/push_with_image_map.py
env:
IMAGE_MAP: ${{ inputs.image-map }}
- name: Notify Slack if container image pushing fails
if: steps.push.outputs.push_failures || failure()
uses: slackapi/slack-github-action@485a9d42d3a73031f12ec201c457e2162c45d02d # v2.0.0
with:
method: chat.postMessage
token: ${{ secrets.SLACK_BOT_TOKEN }}
payload: |
channel: ${{ vars.SLACK_ON_CALL_DEVPROD_STREAM }}
text: >
*Container image pushing ${{
steps.push.outcome == 'failure' && 'failed completely' || 'succeeded with some retries'
}}* in
<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>
${{ steps.push.outputs.push_failures && format(
'*Failed targets:*\n• {0}', join(fromJson(steps.push.outputs.push_failures), '\n• ')
) || '' }}

View File

@@ -26,14 +26,19 @@ jobs:
needs: [ check-permissions ]
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
- uses: reviewdog/action-actionlint@v1
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: reviewdog/action-actionlint@a5524e1c19e62881d79c1f1b9b6f09f16356e281 # v1.65.2
env:
# SC2046 - Quote this to prevent word splitting. - https://www.shellcheck.net/wiki/SC2046
# SC2086 - Double quote to prevent globbing and word splitting. - https://www.shellcheck.net/wiki/SC2086
SHELLCHECK_OPTS: --exclude=SC2046,SC2086
with:
fail_on_error: true
fail_level: error
filter_mode: nofilter
level: error

View File

@@ -47,6 +47,11 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- run: gh pr --repo "${GITHUB_REPOSITORY}" edit "${PR_NUMBER}" --remove-label "approved-for-ci-run"
create-or-update-pr-for-ci-run:
@@ -63,13 +68,18 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- run: gh pr --repo "${GITHUB_REPOSITORY}" edit "${PR_NUMBER}" --remove-label "approved-for-ci-run"
- uses: actions/checkout@v4
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: main
ref: ${{ github.event.pull_request.head.sha }}
token: ${{ secrets.CI_ACCESS_TOKEN }}
- name: Look for existing PR
id: get-pr
env:
@@ -77,7 +87,7 @@ jobs:
run: |
ALREADY_CREATED="$(gh pr --repo ${GITHUB_REPOSITORY} list --head ${BRANCH} --base main --json number --jq '.[].number')"
echo "ALREADY_CREATED=${ALREADY_CREATED}" >> ${GITHUB_OUTPUT}
- name: Get changed labels
id: get-labels
if: steps.get-pr.outputs.ALREADY_CREATED != ''
@@ -94,8 +104,6 @@ jobs:
echo "LABELS_TO_ADD=${LABELS_TO_ADD}" >> ${GITHUB_OUTPUT}
echo "LABELS_TO_REMOVE=${LABELS_TO_REMOVE}" >> ${GITHUB_OUTPUT}
- run: gh pr checkout "${PR_NUMBER}"
- run: git checkout -b "${BRANCH}"
- run: git push --force origin "${BRANCH}"
@@ -103,7 +111,7 @@ jobs:
- name: Create a Pull Request for CI run (if required)
if: steps.get-pr.outputs.ALREADY_CREATED == ''
env:
env:
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
run: |
cat << EOF > body.md
@@ -140,7 +148,7 @@ jobs:
- run: git push --force origin "${BRANCH}"
if: steps.get-pr.outputs.ALREADY_CREATED != ''
cleanup:
# Close PRs and delete branchs if the original PR is closed.
@@ -155,6 +163,11 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Close PR and delete `ci-run/pr-${{ env.PR_NUMBER }}` branch
run: |
CLOSED="$(gh pr --repo ${GITHUB_REPOSITORY} list --head ${BRANCH} --json 'closed' --jq '.[].closed')"

File diff suppressed because it is too large Load Diff

View File

@@ -3,32 +3,100 @@ name: Build build-tools image
on:
workflow_call:
inputs:
image-tag:
description: "build-tools image tag"
required: true
archs:
description: "Json array of architectures to build"
# Default values are set in `check-image` job, `set-variables` step
type: string
required: false
debians:
description: "Json array of Debian versions to build"
# Default values are set in `check-image` job, `set-variables` step
type: string
required: false
outputs:
image-tag:
description: "build-tools tag"
value: ${{ inputs.image-tag }}
value: ${{ jobs.check-image.outputs.tag }}
image:
description: "build-tools image"
value: neondatabase/build-tools:${{ inputs.image-tag }}
value: ghcr.io/neondatabase/build-tools:${{ jobs.check-image.outputs.tag }}
defaults:
run:
shell: bash -euo pipefail {0}
concurrency:
group: build-build-tools-image-${{ inputs.image-tag }}
cancel-in-progress: false
# The initial idea was to prevent the waste of resources by not re-building the `build-tools` image
# for the same tag in parallel workflow runs, and queue them to be skipped once we have
# the first image pushed to Docker registry, but GitHub's concurrency mechanism is not working as expected.
# GitHub can't have more than 1 job in a queue and removes the previous one, it causes failures if the dependent jobs.
#
# Ref https://github.com/orgs/community/discussions/41518
#
# concurrency:
# group: build-build-tools-image-${{ inputs.image-tag }}
# cancel-in-progress: false
# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.
permissions: {}
jobs:
check-image:
uses: ./.github/workflows/check-build-tools-image.yml
runs-on: ubuntu-22.04
outputs:
archs: ${{ steps.set-variables.outputs.archs }}
debians: ${{ steps.set-variables.outputs.debians }}
tag: ${{ steps.set-variables.outputs.image-tag }}
everything: ${{ steps.set-more-variables.outputs.everything }}
found: ${{ steps.set-more-variables.outputs.found }}
permissions:
packages: read
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Set variables
id: set-variables
env:
ARCHS: ${{ inputs.archs || '["x64","arm64"]' }}
DEBIANS: ${{ inputs.debians || '["bullseye","bookworm"]' }}
IMAGE_TAG: |
${{ hashFiles('build-tools.Dockerfile',
'.github/workflows/build-build-tools-image.yml') }}
run: |
echo "archs=${ARCHS}" | tee -a ${GITHUB_OUTPUT}
echo "debians=${DEBIANS}" | tee -a ${GITHUB_OUTPUT}
echo "image-tag=${IMAGE_TAG}" | tee -a ${GITHUB_OUTPUT}
- name: Set more variables
id: set-more-variables
env:
IMAGE_TAG: ${{ steps.set-variables.outputs.image-tag }}
EVERYTHING: |
${{ contains(fromJSON(steps.set-variables.outputs.archs), 'x64') &&
contains(fromJSON(steps.set-variables.outputs.archs), 'arm64') &&
contains(fromJSON(steps.set-variables.outputs.debians), 'bullseye') &&
contains(fromJSON(steps.set-variables.outputs.debians), 'bookworm') }}
run: |
if docker manifest inspect ghcr.io/neondatabase/build-tools:${IMAGE_TAG}; then
found=true
else
found=false
fi
echo "everything=${EVERYTHING}" | tee -a ${GITHUB_OUTPUT}
echo "found=${found}" | tee -a ${GITHUB_OUTPUT}
build-image:
needs: [ check-image ]
@@ -36,68 +104,100 @@ jobs:
strategy:
matrix:
arch: [ x64, arm64 ]
arch: ${{ fromJSON(needs.check-image.outputs.archs) }}
debian: ${{ fromJSON(needs.check-image.outputs.debians) }}
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', matrix.arch == 'arm64' && 'large-arm64' || 'large')) }}
permissions:
packages: write
env:
IMAGE_TAG: ${{ inputs.image-tag }}
runs-on: ${{ fromJSON(format('["self-hosted", "{0}"]', matrix.arch == 'arm64' && 'large-arm64' || 'large')) }}
steps:
- name: Check `input.tag` is correct
env:
INPUTS_IMAGE_TAG: ${{ inputs.image-tag }}
CHECK_IMAGE_TAG : ${{ needs.check-image.outputs.image-tag }}
run: |
if [ "${INPUTS_IMAGE_TAG}" != "${CHECK_IMAGE_TAG}" ]; then
echo "'inputs.image-tag' (${INPUTS_IMAGE_TAG}) does not match the tag of the latest build-tools image 'inputs.image-tag' (${CHECK_IMAGE_TAG})"
exit 1
fi
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@v4
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: ./.github/actions/set-docker-config-dir
- uses: docker/setup-buildx-action@v3
- uses: neondatabase/dev-actions/set-docker-config-dir@6094485bf440001c94a94a3f9e221e81ff6b6193
- uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0
with:
cache-binary: false
- uses: docker/login-action@v3
- uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
with:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
- uses: docker/login-action@v3
- uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
with:
registry: cache.neon.build
username: ${{ secrets.NEON_CI_DOCKERCACHE_USERNAME }}
password: ${{ secrets.NEON_CI_DOCKERCACHE_PASSWORD }}
- uses: docker/build-push-action@v6
- uses: docker/build-push-action@471d1dc4e07e5cdedd4c2171150001c434f0b7a4 # v6.15.0
with:
file: build-tools.Dockerfile
context: .
provenance: false
push: true
pull: true
file: Dockerfile.build-tools
cache-from: type=registry,ref=cache.neon.build/build-tools:cache-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/build-tools:cache-{0},mode=max', matrix.arch) || '' }}
tags: neondatabase/build-tools:${{ inputs.image-tag }}-${{ matrix.arch }}
build-args: |
DEBIAN_VERSION=${{ matrix.debian }}
cache-from: type=registry,ref=cache.neon.build/build-tools:cache-${{ matrix.debian }}-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/build-tools:cache-{0}-{1},mode=max', matrix.debian, matrix.arch) || '' }}
tags: |
ghcr.io/neondatabase/build-tools:${{ needs.check-image.outputs.tag }}-${{ matrix.debian }}-${{ matrix.arch }}
merge-images:
needs: [ build-image ]
needs: [ check-image, build-image ]
runs-on: ubuntu-22.04
env:
IMAGE_TAG: ${{ inputs.image-tag }}
permissions:
packages: write
steps:
- uses: docker/login-action@v3
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
with:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
- uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Create multi-arch image
env:
DEFAULT_DEBIAN_VERSION: bookworm
ARCHS: ${{ join(fromJSON(needs.check-image.outputs.archs), ' ') }}
DEBIANS: ${{ join(fromJSON(needs.check-image.outputs.debians), ' ') }}
EVERYTHING: ${{ needs.check-image.outputs.everything }}
IMAGE_TAG: ${{ needs.check-image.outputs.tag }}
run: |
docker buildx imagetools create -t neondatabase/build-tools:${IMAGE_TAG} \
neondatabase/build-tools:${IMAGE_TAG}-x64 \
neondatabase/build-tools:${IMAGE_TAG}-arm64
for debian in ${DEBIANS}; do
tags=("-t" "ghcr.io/neondatabase/build-tools:${IMAGE_TAG}-${debian}")
if [ "${EVERYTHING}" == "true" ] && [ "${debian}" == "${DEFAULT_DEBIAN_VERSION}" ]; then
tags+=("-t" "ghcr.io/neondatabase/build-tools:${IMAGE_TAG}")
fi
for arch in ${ARCHS}; do
tags+=("ghcr.io/neondatabase/build-tools:${IMAGE_TAG}-${debian}-${arch}")
done
docker buildx imagetools create "${tags[@]}"
done

265
.github/workflows/build-macos.yml vendored Normal file
View File

@@ -0,0 +1,265 @@
name: Check neon with MacOS builds
on:
workflow_call:
inputs:
pg_versions:
description: "Array of the pg versions to build for, for example: ['v14', 'v17']"
type: string
default: '[]'
required: false
rebuild_rust_code:
description: "Rebuild Rust code"
type: boolean
default: false
required: false
rebuild_everything:
description: "If true, rebuild for all versions"
type: boolean
default: false
required: false
env:
RUST_BACKTRACE: 1
COPT: '-Werror'
# TODO: move `check-*` and `files-changed` jobs to the "Caller" Workflow
# We should care about that as Github has limitations:
# - You can connect up to four levels of workflows
# - You can call a maximum of 20 unique reusable workflows from a single workflow file.
# https://docs.github.com/en/actions/sharing-automations/reusing-workflows#limitations
permissions:
contents: read
jobs:
build-pgxn:
if: |
inputs.pg_versions != '[]' || inputs.rebuild_everything ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
timeout-minutes: 30
runs-on: macos-15
strategy:
matrix:
postgres-version: ${{ inputs.rebuild_everything && fromJSON('["v14", "v15", "v16", "v17"]') || fromJSON(inputs.pg_versions) }}
env:
# Use release build only, to have less debug info around
# Hence keeping target/ (and general cache size) smaller
BUILD_TYPE: release
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Checkout main repo
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Set pg ${{ matrix.postgres-version }} for caching
id: pg_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-${{ matrix.postgres-version }}) | tee -a "${GITHUB_OUTPUT}"
- name: Cache postgres ${{ matrix.postgres-version }} build
id: cache_pg
uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
with:
path: pg_install/${{ matrix.postgres-version }}
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ matrix.postgres-version }}-${{ steps.pg_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Checkout submodule vendor/postgres-${{ matrix.postgres-version }}
if: steps.cache_pg.outputs.cache-hit != 'true'
run: |
git submodule init vendor/postgres-${{ matrix.postgres-version }}
git submodule update --depth 1 --recursive
- name: Install build dependencies
if: steps.cache_pg.outputs.cache-hit != 'true'
run: |
brew install flex bison openssl protobuf icu4c
- name: Set extra env for macOS
if: steps.cache_pg.outputs.cache-hit != 'true'
run: |
echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV
echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV
- name: Build Postgres ${{ matrix.postgres-version }}
if: steps.cache_pg.outputs.cache-hit != 'true'
run: |
make postgres-${{ matrix.postgres-version }} -j$(sysctl -n hw.ncpu)
- name: Build Neon Pg Ext ${{ matrix.postgres-version }}
if: steps.cache_pg.outputs.cache-hit != 'true'
run: |
make "neon-pg-ext-${{ matrix.postgres-version }}" -j$(sysctl -n hw.ncpu)
- name: Upload "pg_install/${{ matrix.postgres-version }}" artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: pg_install--${{ matrix.postgres-version }}
path: pg_install/${{ matrix.postgres-version }}
# The artifact is supposed to be used by the next job in the same workflow,
# so theres no need to store it for too long.
retention-days: 1
build-walproposer-lib:
if: |
contains(inputs.pg_versions, 'v17') || inputs.rebuild_everything ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
timeout-minutes: 30
runs-on: macos-15
needs: [build-pgxn]
env:
# Use release build only, to have less debug info around
# Hence keeping target/ (and general cache size) smaller
BUILD_TYPE: release
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Checkout main repo
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Set pg v17 for caching
id: pg_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v17) | tee -a "${GITHUB_OUTPUT}"
- name: Download "pg_install/v17" artifact
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: pg_install--v17
path: pg_install/v17
# `actions/download-artifact` doesn't preserve permissions:
# https://github.com/actions/download-artifact?tab=readme-ov-file#permission-loss
- name: Make pg_install/v*/bin/* executable
run: |
chmod +x pg_install/v*/bin/*
- name: Cache walproposer-lib
id: cache_walproposer_lib
uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
with:
path: build/walproposer-lib
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-walproposer_lib-v17-${{ steps.pg_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Checkout submodule vendor/postgres-v17
if: steps.cache_walproposer_lib.outputs.cache-hit != 'true'
run: |
git submodule init vendor/postgres-v17
git submodule update --depth 1 --recursive
- name: Install build dependencies
if: steps.cache_walproposer_lib.outputs.cache-hit != 'true'
run: |
brew install flex bison openssl protobuf icu4c
- name: Set extra env for macOS
if: steps.cache_walproposer_lib.outputs.cache-hit != 'true'
run: |
echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV
echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV
- name: Build walproposer-lib (only for v17)
if: steps.cache_walproposer_lib.outputs.cache-hit != 'true'
run:
make walproposer-lib -j$(sysctl -n hw.ncpu) PG_INSTALL_CACHED=1
- name: Upload "build/walproposer-lib" artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: build--walproposer-lib
path: build/walproposer-lib
# The artifact is supposed to be used by the next job in the same workflow,
# so theres no need to store it for too long.
retention-days: 1
cargo-build:
if: |
inputs.pg_versions != '[]' || inputs.rebuild_rust_code || inputs.rebuild_everything ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
timeout-minutes: 30
runs-on: macos-15
needs: [build-pgxn, build-walproposer-lib]
env:
# Use release build only, to have less debug info around
# Hence keeping target/ (and general cache size) smaller
BUILD_TYPE: release
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Checkout main repo
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
submodules: true
- name: Download "pg_install/v14" artifact
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: pg_install--v14
path: pg_install/v14
- name: Download "pg_install/v15" artifact
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: pg_install--v15
path: pg_install/v15
- name: Download "pg_install/v16" artifact
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: pg_install--v16
path: pg_install/v16
- name: Download "pg_install/v17" artifact
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: pg_install--v17
path: pg_install/v17
- name: Download "build/walproposer-lib" artifact
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: build--walproposer-lib
path: build/walproposer-lib
# `actions/download-artifact` doesn't preserve permissions:
# https://github.com/actions/download-artifact?tab=readme-ov-file#permission-loss
- name: Make pg_install/v*/bin/* executable
run: |
chmod +x pg_install/v*/bin/*
- name: Cache cargo deps
uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
with:
path: |
~/.cargo/registry
!~/.cargo/registry/src
~/.cargo/git
target
key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust
- name: Install build dependencies
run: |
brew install flex bison openssl protobuf icu4c
- name: Set extra env for macOS
run: |
echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV
echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV
- name: Run cargo build
run: cargo build --all --release -j$(sysctl -n hw.ncpu)
- name: Check that no warnings are produced
run: ./run_clippy.sh

View File

@@ -0,0 +1,121 @@
name: Build and Run Selected Test
on:
workflow_dispatch:
inputs:
test-selection:
description: 'Specification of selected test(s), as accepted by pytest -k'
required: true
type: string
run-count:
description: 'Number of test runs to perform'
required: true
type: number
archs:
description: 'Archs to run tests on, e. g.: ["x64", "arm64"]'
default: '["x64"]'
required: true
type: string
build-types:
description: 'Build types to run tests on, e. g.: ["debug", "release"]'
default: '["release"]'
required: true
type: string
pg-versions:
description: 'Postgres versions to use for testing, e.g,: [{"pg_version":"v16"}, {"pg_version":"v17"}])'
default: '[{"pg_version":"v17"}]'
required: true
type: string
defaults:
run:
shell: bash -euxo pipefail {0}
env:
RUST_BACKTRACE: 1
COPT: '-Werror'
jobs:
meta:
uses: ./.github/workflows/_meta.yml
with:
github-event-name: ${{ github.event_name }}
github-event-json: ${{ toJSON(github.event) }}
build-and-test-locally:
needs: [ meta ]
strategy:
fail-fast: false
matrix:
arch: ${{ fromJson(inputs.archs) }}
build-type: ${{ fromJson(inputs.build-types) }}
uses: ./.github/workflows/_build-and-test-locally.yml
with:
arch: ${{ matrix.arch }}
build-tools-image: ghcr.io/neondatabase/build-tools:pinned-bookworm
build-tag: ${{ needs.meta.outputs.build-tag }}
build-type: ${{ matrix.build-type }}
test-cfg: ${{ inputs.pg-versions }}
test-selection: ${{ inputs.test-selection }}
test-run-count: ${{ fromJson(inputs.run-count) }}
rerun-failed: false
secrets: inherit
create-test-report:
needs: [ build-and-test-locally ]
if: ${{ !cancelled() }}
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
outputs:
report-url: ${{ steps.create-allure-report.outputs.report-url }}
runs-on: [ self-hosted, small ]
container:
image: ghcr.io/neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Create Allure report
if: ${{ !cancelled() }}
id: create-allure-report
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_DEV }}
- uses: actions/github-script@v7
if: ${{ !cancelled() }}
with:
# Retry script for 5XX server errors: https://github.com/actions/github-script#retries
retries: 5
script: |
const report = {
reportUrl: "${{ steps.create-allure-report.outputs.report-url }}",
reportJsonUrl: "${{ steps.create-allure-report.outputs.report-json-url }}",
}
const coverage = {}
const script = require("./scripts/comment-test-report.js")
await script({
github,
context,
fetch,
report,
coverage,
})

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,151 @@
name: Build and Test Fully
on:
schedule:
# * is a special character in YAML so you have to quote this string
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
# │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '0 3 * * *' # run once a day, timezone is utc
workflow_dispatch:
defaults:
run:
shell: bash -euxo pipefail {0}
concurrency:
# Allow only one workflow per any non-`main` branch.
group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}
cancel-in-progress: true
env:
RUST_BACKTRACE: 1
COPT: '-Werror'
jobs:
tag:
runs-on: [ self-hosted, small ]
container: ${{ vars.NEON_DEV_AWS_ACCOUNT_ID }}.dkr.ecr.${{ vars.AWS_ECR_REGION }}.amazonaws.com/base:pinned
outputs:
build-tag: ${{steps.build-tag.outputs.tag}}
steps:
# Need `fetch-depth: 0` to count the number of commits in the branch
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Get build tag
run: |
echo run:$GITHUB_RUN_ID
echo ref:$GITHUB_REF_NAME
echo rev:$(git rev-list --count HEAD)
if [[ "$GITHUB_REF_NAME" == "main" ]]; then
echo "tag=$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
elif [[ "$GITHUB_REF_NAME" == "release" ]]; then
echo "tag=release-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
elif [[ "$GITHUB_REF_NAME" == "release-proxy" ]]; then
echo "tag=release-proxy-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
elif [[ "$GITHUB_REF_NAME" == "release-compute" ]]; then
echo "tag=release-compute-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
else
echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main' or 'release', 'release-proxy', 'release-compute'"
echo "tag=$GITHUB_RUN_ID" >> $GITHUB_OUTPUT
fi
shell: bash
id: build-tag
build-build-tools-image:
uses: ./.github/workflows/build-build-tools-image.yml
secrets: inherit
build-and-test-locally:
needs: [ tag, build-build-tools-image ]
strategy:
fail-fast: false
matrix:
arch: [ x64, arm64 ]
build-type: [ debug, release ]
uses: ./.github/workflows/_build-and-test-locally.yml
with:
arch: ${{ matrix.arch }}
build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
build-tag: ${{ needs.tag.outputs.build-tag }}
build-type: ${{ matrix.build-type }}
rerun-failed: false
test-cfg: '[{"pg_version":"v14", "lfc_state": "with-lfc"},
{"pg_version":"v15", "lfc_state": "with-lfc"},
{"pg_version":"v16", "lfc_state": "with-lfc"},
{"pg_version":"v17", "lfc_state": "with-lfc"},
{"pg_version":"v14", "lfc_state": "without-lfc"},
{"pg_version":"v15", "lfc_state": "without-lfc"},
{"pg_version":"v16", "lfc_state": "without-lfc"},
{"pg_version":"v17", "lfc_state": "withouts-lfc"}]'
secrets: inherit
create-test-report:
needs: [ build-and-test-locally, build-build-tools-image ]
if: ${{ !cancelled() }}
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
outputs:
report-url: ${{ steps.create-allure-report.outputs.report-url }}
runs-on: [ self-hosted, small ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Create Allure report
if: ${{ !cancelled() }}
id: create-allure-report
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
- uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1
if: ${{ !cancelled() }}
with:
# Retry script for 5XX server errors: https://github.com/actions/github-script#retries
retries: 5
script: |
const report = {
reportUrl: "${{ steps.create-allure-report.outputs.report-url }}",
reportJsonUrl: "${{ steps.create-allure-report.outputs.report-json-url }}",
}
const coverage = {}
const script = require("./scripts/comment-test-report.js")
await script({
github,
context,
fetch,
report,
coverage,
})

View File

@@ -0,0 +1,145 @@
name: Build and Test with Sanitizers
on:
schedule:
# * is a special character in YAML so you have to quote this string
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
# │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '0 1 * * *' # run once a day, timezone is utc
workflow_dispatch:
defaults:
run:
shell: bash -euxo pipefail {0}
concurrency:
# Allow only one workflow per any non-`main` branch.
group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}
cancel-in-progress: true
env:
RUST_BACKTRACE: 1
COPT: '-Werror'
jobs:
tag:
runs-on: [ self-hosted, small ]
container: ${{ vars.NEON_DEV_AWS_ACCOUNT_ID }}.dkr.ecr.${{ vars.AWS_ECR_REGION }}.amazonaws.com/base:pinned
outputs:
build-tag: ${{steps.build-tag.outputs.tag}}
steps:
# Need `fetch-depth: 0` to count the number of commits in the branch
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Get build tag
run: |
echo run:$GITHUB_RUN_ID
echo ref:$GITHUB_REF_NAME
echo rev:$(git rev-list --count HEAD)
if [[ "$GITHUB_REF_NAME" == "main" ]]; then
echo "tag=$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
elif [[ "$GITHUB_REF_NAME" == "release" ]]; then
echo "tag=release-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
elif [[ "$GITHUB_REF_NAME" == "release-proxy" ]]; then
echo "tag=release-proxy-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
elif [[ "$GITHUB_REF_NAME" == "release-compute" ]]; then
echo "tag=release-compute-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
else
echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main' or 'release', 'release-proxy', 'release-compute'"
echo "tag=$GITHUB_RUN_ID" >> $GITHUB_OUTPUT
fi
shell: bash
id: build-tag
build-build-tools-image:
uses: ./.github/workflows/build-build-tools-image.yml
secrets: inherit
build-and-test-locally:
needs: [ tag, build-build-tools-image ]
strategy:
fail-fast: false
matrix:
arch: [ x64, arm64 ]
build-type: [ release ]
uses: ./.github/workflows/_build-and-test-locally.yml
with:
arch: ${{ matrix.arch }}
build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
build-tag: ${{ needs.tag.outputs.build-tag }}
build-type: ${{ matrix.build-type }}
rerun-failed: false
test-cfg: '[{"pg_version":"v17"}]'
sanitizers: enabled
secrets: inherit
create-test-report:
needs: [ build-and-test-locally, build-build-tools-image ]
if: ${{ !cancelled() }}
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
outputs:
report-url: ${{ steps.create-allure-report.outputs.report-url }}
runs-on: [ self-hosted, small ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Create Allure report
if: ${{ !cancelled() }}
id: create-allure-report
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
- uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1
if: ${{ !cancelled() }}
with:
# Retry script for 5XX server errors: https://github.com/actions/github-script#retries
retries: 5
script: |
const report = {
reportUrl: "${{ steps.create-allure-report.outputs.report-url }}",
reportJsonUrl: "${{ steps.create-allure-report.outputs.report-json-url }}",
}
const coverage = {}
const script = require("./scripts/comment-test-report.js")
await script({
github,
context,
fetch,
report,
coverage,
})

69
.github/workflows/cargo-deny.yml vendored Normal file
View File

@@ -0,0 +1,69 @@
name: cargo deny checks
on:
workflow_call:
inputs:
build-tools-image:
required: false
type: string
schedule:
- cron: '0 10 * * *'
permissions:
contents: read
jobs:
cargo-deny:
strategy:
matrix:
ref: >-
${{
fromJSON(
github.event_name == 'schedule'
&& '["main","release","release-proxy","release-compute"]'
|| format('["{0}"]', github.sha)
)
}}
runs-on: [self-hosted, small]
permissions:
packages: read
container:
image: ${{ inputs.build-tools-image || 'ghcr.io/neondatabase/build-tools:pinned' }}
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Checkout
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ matrix.ref }}
- name: Check rust licenses/bans/advisories/sources
env:
CARGO_DENY_TARGET: >-
${{ github.event_name == 'schedule' && 'advisories' || 'all' }}
run: cargo deny check --hide-inclusion-graph $CARGO_DENY_TARGET
- name: Post to a Slack channel
if: ${{ github.event_name == 'schedule' && failure() }}
uses: slackapi/slack-github-action@485a9d42d3a73031f12ec201c457e2162c45d02d # v2.0.0
with:
method: chat.postMessage
token: ${{ secrets.SLACK_BOT_TOKEN }}
payload: |
channel: ${{ vars.SLACK_ON_CALL_DEVPROD_STREAM }}
text: |
Periodic cargo-deny on ${{ matrix.ref }}: ${{ job.status }}
<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>
Fixing the problem should be fairly straight forward from the logs. If not, <#${{ vars.SLACK_RUST_CHANNEL_ID }}> is there to help.
Pinging <!subteam^S0838JPSH32|@oncall-devprod>.

View File

@@ -1,51 +0,0 @@
name: Check build-tools image
on:
workflow_call:
outputs:
image-tag:
description: "build-tools image tag"
value: ${{ jobs.check-image.outputs.tag }}
found:
description: "Whether the image is found in the registry"
value: ${{ jobs.check-image.outputs.found }}
defaults:
run:
shell: bash -euo pipefail {0}
# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.
permissions: {}
jobs:
check-image:
runs-on: ubuntu-22.04
outputs:
tag: ${{ steps.get-build-tools-tag.outputs.image-tag }}
found: ${{ steps.check-image.outputs.found }}
steps:
- uses: actions/checkout@v4
- name: Get build-tools image tag for the current commit
id: get-build-tools-tag
env:
IMAGE_TAG: |
${{ hashFiles('Dockerfile.build-tools',
'.github/workflows/check-build-tools-image.yml',
'.github/workflows/build-build-tools-image.yml') }}
run: |
echo "image-tag=${IMAGE_TAG}" | tee -a $GITHUB_OUTPUT
- name: Check if such tag found in the registry
id: check-image
env:
IMAGE_TAG: ${{ steps.get-build-tools-tag.outputs.image-tag }}
run: |
if docker manifest inspect neondatabase/build-tools:${IMAGE_TAG}; then
found=true
else
found=false
fi
echo "found=${found}" | tee -a $GITHUB_OUTPUT

View File

@@ -18,6 +18,11 @@ jobs:
check-permissions:
runs-on: ubuntu-22.04
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@0634a2670c59f64b4a01f0f96f84700a4088b9f0 # v2.12.0
with:
egress-policy: audit
- name: Disallow CI runs on PRs from forks
if: |
inputs.github-event-name == 'pull_request' &&

View File

@@ -11,6 +11,11 @@ jobs:
cleanup:
runs-on: ubuntu-22.04
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@0634a2670c59f64b4a01f0f96f84700a4088b9f0 # v2.12.0
with:
egress-policy: audit
- name: Cleanup
run: |
gh extension install actions/gh-actions-cache

99
.github/workflows/cloud-extensions.yml vendored Normal file
View File

@@ -0,0 +1,99 @@
name: Cloud Extensions Test
on:
schedule:
# * is a special character in YAML so you have to quote this string
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
# │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '45 1 * * *' # run once a day, timezone is utc
workflow_dispatch: # adds ability to run this manually
inputs:
region_id:
description: 'Project region id. If not set, the default region will be used'
required: false
default: 'aws-us-east-2'
defaults:
run:
shell: bash -euxo pipefail {0}
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
jobs:
regress:
env:
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
strategy:
fail-fast: false
matrix:
pg-version: [16, 17]
runs-on: us-east-2
container:
# We use the neon-test-extensions image here as it contains the source code for the extensions.
image: ghcr.io/neondatabase/neon-test-extensions-v${{ matrix.pg-version }}:latest
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Evaluate the settings
id: project-settings
run: |
if [[ $((${{ matrix.pg-version }})) -lt 17 ]]; then
ULID=ulid
else
ULID=pgx_ulid
fi
LIBS=timescaledb:rag_bge_small_en_v15,rag_jina_reranker_v1_tiny_en:$ULID
settings=$(jq -c -n --arg libs $LIBS '{preload_libraries:{use_defaults:false,enabled_libraries:($libs| split(":"))}}')
echo settings=$settings >> $GITHUB_OUTPUT
- name: Create Neon Project
id: create-neon-project
uses: ./.github/actions/neon-project-create
with:
region_id: ${{ inputs.region_id || 'aws-us-east-2' }}
postgres_version: ${{ matrix.pg-version }}
project_settings: ${{ steps.project-settings.outputs.settings }}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Run the regression tests
run: /run-tests.sh -r /ext-src
env:
BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}
SKIP: "pg_hint_plan-src,pg_repack-src,pg_cron-src,plpgsql_check-src"
- name: Delete Neon Project
if: ${{ always() }}
uses: ./.github/actions/neon-project-delete
with:
project_id: ${{ steps.create-neon-project.outputs.project_id }}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1
with:
channel-id: ${{ vars.SLACK_ON_CALL_QA_STAGING_STREAM }}
slack-message: |
Periodic extensions test on staging: ${{ job.status }}
<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

138
.github/workflows/cloud-regress.yml vendored Normal file
View File

@@ -0,0 +1,138 @@
name: Cloud Regression Test
on:
schedule:
# * is a special character in YAML so you have to quote this string
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
# │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '45 1 * * *' # run once a day, timezone is utc
workflow_dispatch: # adds ability to run this manually
defaults:
run:
shell: bash -euxo pipefail {0}
concurrency:
# Allow only one workflow
group: ${{ github.workflow }}
cancel-in-progress: true
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
jobs:
regress:
env:
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
strategy:
fail-fast: false
matrix:
pg-version: [16, 17]
runs-on: us-east-2
container:
image: ghcr.io/neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
submodules: true
- name: Patch the test
env:
PG_VERSION: ${{matrix.pg-version}}
run: |
cd "vendor/postgres-v${PG_VERSION}"
patch -p1 < "../../compute/patches/cloud_regress_pg${PG_VERSION}.patch"
- name: Generate a random password
id: pwgen
run: |
set +x
DBPASS=$(dd if=/dev/random bs=48 count=1 2>/dev/null | base64)
echo "::add-mask::${DBPASS//\//}"
echo DBPASS="${DBPASS//\//}" >> "${GITHUB_OUTPUT}"
- name: Change tests according to the generated password
env:
DBPASS: ${{ steps.pwgen.outputs.DBPASS }}
PG_VERSION: ${{matrix.pg-version}}
run: |
cd vendor/postgres-v"${PG_VERSION}"/src/test/regress
for fname in sql/*.sql expected/*.out; do
sed -i.bak s/NEON_PASSWORD_PLACEHOLDER/"'${DBPASS}'"/ "${fname}"
done
for ph in $(grep NEON_MD5_PLACEHOLDER expected/password.out | awk '{print $3;}' | sort | uniq); do
USER=$(echo "${ph}" | cut -c 22-)
MD5=md5$(echo -n "${DBPASS}${USER}" | md5sum | awk '{print $1;}')
sed -i.bak "s/${ph}/${MD5}/" expected/password.out
done
- name: Download Neon artifact
uses: ./.github/actions/download
with:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create a new branch
id: create-branch
uses: ./.github/actions/neon-branch-create
with:
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
project_id: ${{ vars[format('PGREGRESS_PG{0}_PROJECT_ID', matrix.pg-version)] }}
- name: Run the regression tests
uses: ./.github/actions/run-python-test-set
with:
build_type: ${{ env.BUILD_TYPE }}
test_selection: cloud_regress
pg_version: ${{matrix.pg-version}}
extra_params: -m remote_cluster
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{steps.create-branch.outputs.dsn}}
- name: Delete branch
if: always()
uses: ./.github/actions/neon-branch-delete
with:
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
project_id: ${{ vars[format('PGREGRESS_PG{0}_PROJECT_ID', matrix.pg-version)] }}
branch_id: ${{steps.create-branch.outputs.branch_id}}
- name: Create Allure report
id: create-allure-report
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1
with:
channel-id: ${{ vars.SLACK_ON_CALL_QA_STAGING_STREAM }}
slack-message: |
Periodic pg_regress on staging: ${{ job.status }}
<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>
<${{ steps.create-allure-report.outputs.report-url }}|Allure report>
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

43
.github/workflows/fast-forward.yml vendored Normal file
View File

@@ -0,0 +1,43 @@
name: Fast forward merge
on:
pull_request:
types: [labeled]
branches:
- release
- release-proxy
- release-compute
jobs:
fast-forward:
if: ${{ github.event.label.name == 'fast-forward' }}
runs-on: ubuntu-22.04
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@0634a2670c59f64b4a01f0f96f84700a4088b9f0 # v2.12.0
with:
egress-policy: audit
- name: Remove fast-forward label to PR
env:
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
run: |
gh pr edit ${{ github.event.pull_request.number }} --repo "${GITHUB_REPOSITORY}" --remove-label "fast-forward"
- name: Fast forwarding
uses: sequoia-pgp/fast-forward@ea7628bedcb0b0b96e94383ada458d812fca4979
# See https://docs.github.com/en/graphql/reference/enums#mergestatestatus
if: ${{ contains(fromJSON('["clean", "unstable"]'), github.event.pull_request.mergeable_state) }}
with:
merge: true
comment: on-error
github_token: ${{ secrets.CI_ACCESS_TOKEN }}
- name: Comment if mergeable_state is not clean
if: ${{ !contains(fromJSON('["clean", "unstable"]'), github.event.pull_request.mergeable_state) }}
env:
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
run: |
gh pr comment ${{ github.event.pull_request.number }} \
--repo "${GITHUB_REPOSITORY}" \
--body "Not trying to forward pull-request, because \`mergeable_state\` is \`${{ github.event.pull_request.mergeable_state }}\`, not \`clean\` or \`unstable\`."

View File

@@ -0,0 +1,82 @@
name: Force Test Upgrading of Extension
on:
schedule:
# * is a special character in YAML so you have to quote this string
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
# │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '45 2 * * *' # run once a day, timezone is utc
workflow_dispatch: # adds ability to run this manually
defaults:
run:
shell: bash -euxo pipefail {0}
concurrency:
# Allow only one workflow
group: ${{ github.workflow }}
cancel-in-progress: true
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: read
jobs:
regress:
strategy:
fail-fast: false
matrix:
pg-version: [16, 17]
runs-on: small
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
submodules: false
- name: Get the last compute release tag
id: get-last-compute-release-tag
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
tag=$(gh api -q '[.[].tag_name | select(startswith("release-compute"))][0]'\
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"/repos/${GITHUB_REPOSITORY}/releases")
echo tag=${tag} >> ${GITHUB_OUTPUT}
- name: Test extension upgrade
timeout-minutes: 60
env:
NEW_COMPUTE_TAG: latest
OLD_COMPUTE_TAG: ${{ steps.get-last-compute-release-tag.outputs.tag }}
TEST_EXTENSIONS_TAG: ${{ steps.get-last-compute-release-tag.outputs.tag }}
PG_VERSION: ${{ matrix.pg-version }}
FORCE_ALL_UPGRADE_TESTS: true
run: ./docker-compose/test_extensions_upgrade.sh
- name: Print logs and clean up
if: always()
run: |
docker compose --profile test-extensions -f ./docker-compose/docker-compose.yml logs || true
docker compose --profile test-extensions -f ./docker-compose/docker-compose.yml down
- name: Post to the Slack channel
if: ${{ github.event.schedule && failure() }}
uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1
with:
channel-id: ${{ vars.SLACK_ON_CALL_QA_STAGING_STREAM }}
slack-message: |
Test upgrading of extensions: ${{ job.status }}
<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

200
.github/workflows/ingest_benchmark.yml vendored Normal file
View File

@@ -0,0 +1,200 @@
name: benchmarking ingest
on:
# uncomment to run on push for debugging your PR
# push:
# branches: [ your branch ]
schedule:
# * is a special character in YAML so you have to quote this string
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
# │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '0 9 * * *' # run once a day, timezone is utc
workflow_dispatch: # adds ability to run this manually
defaults:
run:
shell: bash -euxo pipefail {0}
concurrency:
# Allow only one workflow globally because we need dedicated resources which only exist once
group: ingest-bench-workflow
cancel-in-progress: true
permissions:
contents: read
jobs:
ingest:
strategy:
fail-fast: false # allow other variants to continue even if one fails
matrix:
include:
- target_project: new_empty_project_stripe_size_2048
stripe_size: 2048 # 16 MiB
postgres_version: 16
disable_sharding: false
- target_project: new_empty_project_stripe_size_32768
stripe_size: 32768 # 256 MiB # note that this is different from null because using null will shard_split the project only if it reaches the threshold
# while here it is sharded from the beginning with a shard size of 256 MiB
disable_sharding: false
postgres_version: 16
- target_project: new_empty_project
stripe_size: null # run with neon defaults which will shard split only when reaching the threshold
disable_sharding: false
postgres_version: 16
- target_project: new_empty_project
stripe_size: null # run with neon defaults which will shard split only when reaching the threshold
disable_sharding: false
postgres_version: 17
- target_project: large_existing_project
stripe_size: null # cannot re-shared or choose different stripe size for existing, already sharded project
disable_sharding: false
postgres_version: 16
- target_project: new_empty_project_unsharded
stripe_size: null # run with neon defaults which will shard split only when reaching the threshold
disable_sharding: true
postgres_version: 16
max-parallel: 1 # we want to run each stripe size sequentially to be able to compare the results
permissions:
contents: write
statuses: write
id-token: write # aws-actions/configure-aws-credentials
env:
PG_CONFIG: /tmp/neon/pg_install/v16/bin/pg_config
PSQL: /tmp/neon/pg_install/v16/bin/psql
PG_16_LIB_PATH: /tmp/neon/pg_install/v16/lib
PGCOPYDB: /pgcopydb/bin/pgcopydb
PGCOPYDB_LIB_PATH: /pgcopydb/lib
runs-on: [ self-hosted, us-east-2, x64 ]
container:
image: ghcr.io/neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
timeout-minutes: 1440
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Configure AWS credentials # necessary to download artefacts
uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 18000 # 5 hours is currently max associated with IAM role
- name: Download Neon artifact
uses: ./.github/actions/download
with:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Project
if: ${{ startsWith(matrix.target_project, 'new_empty_project') }}
id: create-neon-project-ingest-target
uses: ./.github/actions/neon-project-create
with:
region_id: aws-us-east-2
postgres_version: ${{ matrix.postgres_version }}
compute_units: '[7, 7]' # we want to test large compute here to avoid compute-side bottleneck
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
shard_split_project: ${{ matrix.stripe_size != null && 'true' || 'false' }}
admin_api_key: ${{ secrets.NEON_STAGING_ADMIN_API_KEY }}
shard_count: 8
stripe_size: ${{ matrix.stripe_size }}
disable_sharding: ${{ matrix.disable_sharding }}
- name: Initialize Neon project
if: ${{ startsWith(matrix.target_project, 'new_empty_project') }}
env:
BENCHMARK_INGEST_TARGET_CONNSTR: ${{ steps.create-neon-project-ingest-target.outputs.dsn }}
NEW_PROJECT_ID: ${{ steps.create-neon-project-ingest-target.outputs.project_id }}
run: |
echo "Initializing Neon project with project_id: ${NEW_PROJECT_ID}"
export LD_LIBRARY_PATH=${PG_16_LIB_PATH}
${PSQL} "${BENCHMARK_INGEST_TARGET_CONNSTR}" -c "CREATE EXTENSION IF NOT EXISTS neon; CREATE EXTENSION IF NOT EXISTS neon_utils;"
echo "BENCHMARK_INGEST_TARGET_CONNSTR=${BENCHMARK_INGEST_TARGET_CONNSTR}" >> $GITHUB_ENV
- name: Create Neon Branch for large tenant
if: ${{ matrix.target_project == 'large_existing_project' }}
id: create-neon-branch-ingest-target
uses: ./.github/actions/neon-branch-create
with:
project_id: ${{ vars.BENCHMARK_INGEST_TARGET_PROJECTID }}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Initialize Neon project
if: ${{ matrix.target_project == 'large_existing_project' }}
env:
BENCHMARK_INGEST_TARGET_CONNSTR: ${{ steps.create-neon-branch-ingest-target.outputs.dsn }}
NEW_BRANCH_ID: ${{ steps.create-neon-branch-ingest-target.outputs.branch_id }}
run: |
echo "Initializing Neon branch with branch_id: ${NEW_BRANCH_ID}"
export LD_LIBRARY_PATH=${PG_16_LIB_PATH}
# Extract the part before the database name
base_connstr="${BENCHMARK_INGEST_TARGET_CONNSTR%/*}"
# Extract the query parameters (if any) after the database name
query_params="${BENCHMARK_INGEST_TARGET_CONNSTR#*\?}"
# Reconstruct the new connection string
if [ "$query_params" != "$BENCHMARK_INGEST_TARGET_CONNSTR" ]; then
new_connstr="${base_connstr}/neondb?${query_params}"
else
new_connstr="${base_connstr}/neondb"
fi
${PSQL} "${new_connstr}" -c "drop database ludicrous;"
${PSQL} "${new_connstr}" -c "CREATE DATABASE ludicrous;"
if [ "$query_params" != "$BENCHMARK_INGEST_TARGET_CONNSTR" ]; then
BENCHMARK_INGEST_TARGET_CONNSTR="${base_connstr}/ludicrous?${query_params}"
else
BENCHMARK_INGEST_TARGET_CONNSTR="${base_connstr}/ludicrous"
fi
${PSQL} "${BENCHMARK_INGEST_TARGET_CONNSTR}" -c "CREATE EXTENSION IF NOT EXISTS neon; CREATE EXTENSION IF NOT EXISTS neon_utils;"
echo "BENCHMARK_INGEST_TARGET_CONNSTR=${BENCHMARK_INGEST_TARGET_CONNSTR}" >> $GITHUB_ENV
- name: Invoke pgcopydb
uses: ./.github/actions/run-python-test-set
with:
build_type: remote
test_selection: performance/test_perf_ingest_using_pgcopydb.py
run_in_parallel: false
extra_params: -s -m remote_cluster --timeout 86400 -k test_ingest_performance_using_pgcopydb
pg_version: v${{ matrix.postgres_version }}
save_perf_report: true
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_INGEST_SOURCE_CONNSTR: ${{ secrets.BENCHMARK_INGEST_SOURCE_CONNSTR }}
TARGET_PROJECT_TYPE: ${{ matrix.target_project }}
# we report PLATFORM in zenbenchmark NeonBenchmarker perf database and want to distinguish between new project and large tenant
PLATFORM: "${{ matrix.target_project }}-us-east-2-staging"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
- name: show tables sizes after ingest
run: |
export LD_LIBRARY_PATH=${PG_16_LIB_PATH}
${PSQL} "${BENCHMARK_INGEST_TARGET_CONNSTR}" -c "\dt+"
- name: Delete Neon Project
if: ${{ always() && startsWith(matrix.target_project, 'new_empty_project') }}
uses: ./.github/actions/neon-project-delete
with:
project_id: ${{ steps.create-neon-project-ingest-target.outputs.project_id }}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Delete Neon Branch for large tenant
if: ${{ always() && matrix.target_project == 'large_existing_project' }}
uses: ./.github/actions/neon-branch-delete
with:
project_id: ${{ vars.BENCHMARK_INGEST_TARGET_PROJECTID }}
branch_id: ${{ steps.create-neon-branch-ingest-target.outputs.branch_id }}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}

View File

@@ -27,6 +27,11 @@ jobs:
is-member: ${{ steps.check-user.outputs.is-member }}
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@0634a2670c59f64b4a01f0f96f84700a4088b9f0 # v2.12.0
with:
egress-policy: audit
- name: Check whether `${{ github.actor }}` is a member of `${{ github.repository_owner }}`
id: check-user
env:
@@ -69,6 +74,11 @@ jobs:
issues: write # for `gh issue edit`
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@0634a2670c59f64b4a01f0f96f84700a4088b9f0 # v2.12.0
with:
egress-policy: audit
- name: Add `${{ env.LABEL }}` label
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -0,0 +1,203 @@
name: large oltp benchmark
on:
# uncomment to run on push for debugging your PR
#push:
# branches: [ bodobolero/synthetic_oltp_workload ]
schedule:
# * is a special character in YAML so you have to quote this string
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
# │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '0 15 * * 0,2,4' # run on Sunday, Tuesday, Thursday at 3 PM UTC
workflow_dispatch: # adds ability to run this manually
defaults:
run:
shell: bash -euxo pipefail {0}
concurrency:
# Allow only one workflow globally because we need dedicated resources which only exist once
group: large-oltp-bench-workflow
cancel-in-progress: false
permissions:
contents: read
jobs:
oltp:
strategy:
fail-fast: false # allow other variants to continue even if one fails
matrix:
include:
# test only read-only custom scripts in new branch without database maintenance
- target: new_branch
custom_scripts: select_any_webhook_with_skew.sql@300 select_recent_webhook.sql@397 select_prefetch_webhook.sql@3
test_maintenance: false
# test all custom scripts in new branch with database maintenance
- target: new_branch
custom_scripts: insert_webhooks.sql@200 select_any_webhook_with_skew.sql@300 select_recent_webhook.sql@397 select_prefetch_webhook.sql@3 IUD_one_transaction.sql@100
test_maintenance: true
# test all custom scripts in reuse branch with database maintenance
- target: reuse_branch
custom_scripts: insert_webhooks.sql@200 select_any_webhook_with_skew.sql@300 select_recent_webhook.sql@397 select_prefetch_webhook.sql@3 IUD_one_transaction.sql@100
test_maintenance: true
max-parallel: 1 # we want to run each benchmark sequentially to not have noisy neighbors on shared storage (PS, SK)
permissions:
contents: write
statuses: write
id-token: write # aws-actions/configure-aws-credentials
env:
TEST_PG_BENCH_DURATIONS_MATRIX: "1h" # todo update to > 1 h
TEST_PGBENCH_CUSTOM_SCRIPTS: ${{ matrix.custom_scripts }}
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
PG_VERSION: 16 # pre-determined by pre-determined project
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
PLATFORM: ${{ matrix.target }}
runs-on: [ self-hosted, us-east-2, x64 ]
container:
image: ghcr.io/neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
# Increase timeout to 2 days, default timeout is 6h - database maintenance can take a long time
# (normally 1h pgbench, 3h vacuum analyze 3.5h re-index) x 2 = 15h, leave some buffer for regressions
# in one run vacuum didn't finish within 12 hours
timeout-minutes: 2880
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Configure AWS credentials # necessary to download artefacts
uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 18000 # 5 hours is currently max associated with IAM role
- name: Download Neon artifact
uses: ./.github/actions/download
with:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Branch for large tenant
if: ${{ matrix.target == 'new_branch' }}
id: create-neon-branch-oltp-target
uses: ./.github/actions/neon-branch-create
with:
project_id: ${{ vars.BENCHMARK_LARGE_OLTP_PROJECTID }}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Set up Connection String
id: set-up-connstr
run: |
case "${{ matrix.target }}" in
new_branch)
CONNSTR=${{ steps.create-neon-branch-oltp-target.outputs.dsn }}
;;
reuse_branch)
CONNSTR=${{ secrets.BENCHMARK_LARGE_OLTP_REUSE_CONNSTR }}
;;
*)
echo >&2 "Unknown target=${{ matrix.target }}"
exit 1
;;
esac
CONNSTR_WITHOUT_POOLER="${CONNSTR//-pooler/}"
echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT
echo "connstr_without_pooler=${CONNSTR_WITHOUT_POOLER}" >> $GITHUB_OUTPUT
- name: Delete rows from prior runs in reuse branch
if: ${{ matrix.target == 'reuse_branch' }}
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr_without_pooler }}
PG_CONFIG: /tmp/neon/pg_install/v16/bin/pg_config
PSQL: /tmp/neon/pg_install/v16/bin/psql
PG_16_LIB_PATH: /tmp/neon/pg_install/v16/lib
run: |
echo "$(date '+%Y-%m-%d %H:%M:%S') - Deleting rows in table webhook.incoming_webhooks from prior runs"
export LD_LIBRARY_PATH=${PG_16_LIB_PATH}
${PSQL} "${BENCHMARK_CONNSTR}" -c "SET statement_timeout = 0; DELETE FROM webhook.incoming_webhooks WHERE created_at > '2025-02-27 23:59:59+00';"
echo "$(date '+%Y-%m-%d %H:%M:%S') - Finished deleting rows in table webhook.incoming_webhooks from prior runs"
- name: Benchmark pgbench with custom-scripts
uses: ./.github/actions/run-python-test-set
with:
build_type: ${{ env.BUILD_TYPE }}
test_selection: performance
run_in_parallel: false
save_perf_report: true
extra_params: -m remote_cluster --timeout 7200 -k test_perf_oltp_large_tenant_pgbench
pg_version: ${{ env.PG_VERSION }}
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
- name: Benchmark database maintenance
if: ${{ matrix.test_maintenance }}
uses: ./.github/actions/run-python-test-set
with:
build_type: ${{ env.BUILD_TYPE }}
test_selection: performance
run_in_parallel: false
save_perf_report: true
extra_params: -m remote_cluster --timeout 172800 -k test_perf_oltp_large_tenant_maintenance
pg_version: ${{ env.PG_VERSION }}
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr_without_pooler }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
- name: Delete Neon Branch for large tenant
if: ${{ always() && matrix.target == 'new_branch' }}
uses: ./.github/actions/neon-branch-delete
with:
project_id: ${{ vars.BENCHMARK_LARGE_OLTP_PROJECTID }}
branch_id: ${{ steps.create-neon-branch-oltp-target.outputs.branch_id }}
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
- name: Configure AWS credentials # again because prior steps could have exceeded 5 hours
uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 18000 # 5 hours
- name: Create Allure report
id: create-allure-report
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1
with:
channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream
slack-message: |
Periodic large oltp perf testing: ${{ job.status }}
<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>
<${{ steps.create-allure-report.outputs.report-url }}|Allure report>
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

175
.github/workflows/large_oltp_growth.yml vendored Normal file
View File

@@ -0,0 +1,175 @@
name: large oltp growth
# workflow to grow the reuse branch of large oltp benchmark continuously (about 16 GB per run)
on:
# uncomment to run on push for debugging your PR
# push:
# branches: [ bodobolero/increase_large_oltp_workload ]
schedule:
# * is a special character in YAML so you have to quote this string
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
# │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '0 6 * * *' # 06:00 UTC
- cron: '0 8 * * *' # 08:00 UTC
- cron: '0 10 * * *' # 10:00 UTC
- cron: '0 12 * * *' # 12:00 UTC
- cron: '0 14 * * *' # 14:00 UTC
- cron: '0 16 * * *' # 16:00 UTC
workflow_dispatch: # adds ability to run this manually
defaults:
run:
shell: bash -euxo pipefail {0}
concurrency:
# Allow only one workflow globally because we need dedicated resources which only exist once
group: large-oltp-growth
cancel-in-progress: true
permissions:
contents: read
jobs:
oltp:
strategy:
fail-fast: false # allow other variants to continue even if one fails
matrix:
include:
# for now only grow the reuse branch, not the other branches.
- target: reuse_branch
custom_scripts:
- grow_action_blocks.sql
- grow_action_kwargs.sql
- grow_device_fingerprint_event.sql
- grow_edges.sql
- grow_hotel_rate_mapping.sql
- grow_ocr_pipeline_results_version.sql
- grow_priceline_raw_response.sql
- grow_relabled_transactions.sql
- grow_state_values.sql
- grow_values.sql
- grow_vertices.sql
- update_accounting_coding_body_tracking_category_selection.sql
- update_action_blocks.sql
- update_action_kwargs.sql
- update_denormalized_approval_workflow.sql
- update_device_fingerprint_event.sql
- update_edges.sql
- update_heron_transaction_enriched_log.sql
- update_heron_transaction_enrichment_requests.sql
- update_hotel_rate_mapping.sql
- update_incoming_webhooks.sql
- update_manual_transaction.sql
- update_ml_receipt_matching_log.sql
- update_ocr_pipeine_results_version.sql
- update_orc_pipeline_step_results.sql
- update_orc_pipeline_step_results_version.sql
- update_priceline_raw_response.sql
- update_quickbooks_transactions.sql
- update_raw_finicity_transaction.sql
- update_relabeled_transactions.sql
- update_state_values.sql
- update_stripe_authorization_event_log.sql
- update_transaction.sql
- update_values.sql
- update_vertices.sql
max-parallel: 1 # we want to run each growth workload sequentially (for now there is just one)
permissions:
contents: write
statuses: write
id-token: write # aws-actions/configure-aws-credentials
env:
TEST_PG_BENCH_DURATIONS_MATRIX: "1h"
TEST_PGBENCH_CUSTOM_SCRIPTS: ${{ join(matrix.custom_scripts, ' ') }}
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
PG_VERSION: 16 # pre-determined by pre-determined project
TEST_OUTPUT: /tmp/test_output
BUILD_TYPE: remote
PLATFORM: ${{ matrix.target }}
runs-on: [ self-hosted, us-east-2, x64 ]
container:
image: ghcr.io/neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Configure AWS credentials # necessary to download artefacts
uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 18000 # 5 hours is currently max associated with IAM role
- name: Download Neon artifact
uses: ./.github/actions/download
with:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Set up Connection String
id: set-up-connstr
run: |
case "${{ matrix.target }}" in
reuse_branch)
CONNSTR=${{ secrets.BENCHMARK_LARGE_OLTP_REUSE_CONNSTR }}
;;
*)
echo >&2 "Unknown target=${{ matrix.target }}"
exit 1
;;
esac
CONNSTR_WITHOUT_POOLER="${CONNSTR//-pooler/}"
echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT
echo "connstr_without_pooler=${CONNSTR_WITHOUT_POOLER}" >> $GITHUB_OUTPUT
- name: pgbench with custom-scripts
uses: ./.github/actions/run-python-test-set
with:
build_type: ${{ env.BUILD_TYPE }}
test_selection: performance
run_in_parallel: false
save_perf_report: true
extra_params: -m remote_cluster --timeout 7200 -k test_perf_oltp_large_tenant_growth
pg_version: ${{ env.PG_VERSION }}
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}
VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
- name: Create Allure report
id: create-allure-report
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1
with:
channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream
slack-message: |
Periodic large oltp tenant growth increase: ${{ job.status }}
<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>
<${{ steps.create-allure-report.outputs.report-url }}|Allure report>
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

32
.github/workflows/lint-release-pr.yml vendored Normal file
View File

@@ -0,0 +1,32 @@
name: Lint Release PR
on:
pull_request:
branches:
- release
- release-proxy
- release-compute
permissions:
contents: read
jobs:
lint-release-pr:
runs-on: ubuntu-22.04
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Checkout PR branch
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0 # Fetch full history for git operations
ref: ${{ github.event.pull_request.head.ref }}
- name: Run lint script
env:
RELEASE_BRANCH: ${{ github.base_ref }}
run: |
./.github/scripts/lint-release-pr.sh

View File

@@ -26,124 +26,75 @@ jobs:
with:
github-event-name: ${{ github.event_name}}
check-build-tools-image:
needs: [ check-permissions ]
uses: ./.github/workflows/check-build-tools-image.yml
build-build-tools-image:
needs: [ check-build-tools-image ]
needs: [ check-permissions ]
uses: ./.github/workflows/build-build-tools-image.yml
with:
image-tag: ${{ needs.check-build-tools-image.outputs.image-tag }}
secrets: inherit
check-macos-build:
needs: [ check-permissions ]
if: |
contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
timeout-minutes: 90
runs-on: macos-14
env:
# Use release build only, to have less debug info around
# Hence keeping target/ (and general cache size) smaller
BUILD_TYPE: release
files-changed:
name: Detect what files changed
runs-on: ubuntu-22.04
timeout-minutes: 3
outputs:
v17: ${{ steps.files_changed.outputs.v17 }}
postgres_changes: ${{ steps.postgres_changes.outputs.changes }}
rebuild_rust_code: ${{ steps.files_changed.outputs.rust_code }}
rebuild_everything: ${{ steps.files_changed.outputs.rebuild_neon_extra || steps.files_changed.outputs.rebuild_macos }}
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Checkout
uses: actions/checkout@v4
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
submodules: true
- name: Install macOS postgres dependencies
run: brew install flex bison openssl protobuf icu4c pkg-config
- name: Set pg 14 revision for caching
id: pg_v14_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v14) >> $GITHUB_OUTPUT
- name: Set pg 15 revision for caching
id: pg_v15_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v15) >> $GITHUB_OUTPUT
- name: Set pg 16 revision for caching
id: pg_v16_rev
run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v16) >> $GITHUB_OUTPUT
- name: Cache postgres v14 build
id: cache_pg_14
uses: actions/cache@v4
- name: Check for Postgres changes
uses: dorny/paths-filter@1441771bbfdd59dcd748680ee64ebd8faab1a242 #v3
id: files_changed
with:
path: pg_install/v14
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
token: ${{ github.token }}
filters: .github/file-filters.yaml
base: ${{ github.event_name != 'pull_request' && (github.event.merge_group.base_ref || github.ref_name) || '' }}
ref: ${{ github.event_name != 'pull_request' && (github.event.merge_group.head_ref || github.ref) || '' }}
- name: Cache postgres v15 build
id: cache_pg_15
uses: actions/cache@v4
with:
path: pg_install/v15
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Cache postgres v16 build
id: cache_pg_16
uses: actions/cache@v4
with:
path: pg_install/v16
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}
- name: Set extra env for macOS
- name: Filter out only v-string for build matrix
id: postgres_changes
env:
CHANGES: ${{ steps.files_changed.outputs.changes }}
run: |
echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV
echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV
v_strings_only_as_json_array=$(echo ${CHANGES} | jq '.[]|select(test("v\\d+"))' | jq --slurp -c)
echo "changes=${v_strings_only_as_json_array}" | tee -a "${GITHUB_OUTPUT}"
- name: Cache cargo deps
uses: actions/cache@v4
with:
path: |
~/.cargo/registry
!~/.cargo/registry/src
~/.cargo/git
target
key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust
- name: Build postgres v14
if: steps.cache_pg_14.outputs.cache-hit != 'true'
run: make postgres-v14 -j$(sysctl -n hw.ncpu)
- name: Build postgres v15
if: steps.cache_pg_15.outputs.cache-hit != 'true'
run: make postgres-v15 -j$(sysctl -n hw.ncpu)
- name: Build postgres v16
if: steps.cache_pg_16.outputs.cache-hit != 'true'
run: make postgres-v16 -j$(sysctl -n hw.ncpu)
- name: Build neon extensions
run: make neon-pg-ext -j$(sysctl -n hw.ncpu)
- name: Build walproposer-lib
run: make walproposer-lib -j$(sysctl -n hw.ncpu)
- name: Run cargo build
run: PQ_LIB_DIR=$(pwd)/pg_install/v16/lib cargo build --all --release
- name: Check that no warnings are produced
run: ./run_clippy.sh
check-macos-build:
needs: [ check-permissions, files-changed ]
uses: ./.github/workflows/build-macos.yml
with:
pg_versions: ${{ needs.files-changed.outputs.postgres_changes }}
rebuild_rust_code: ${{ fromJSON(needs.files-changed.outputs.rebuild_rust_code) }}
rebuild_everything: ${{ fromJSON(needs.files-changed.outputs.rebuild_everything) }}
gather-rust-build-stats:
needs: [ check-permissions, build-build-tools-image ]
needs: [ check-permissions, build-build-tools-image, files-changed ]
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
if: |
contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
(needs.files-changed.outputs.v17 == 'true' || needs.files-changed.outputs.rebuild_everything == 'true') && (
contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats') ||
contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||
github.ref_name == 'main'
)
runs-on: [ self-hosted, large ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
env:
@@ -153,8 +104,13 @@ jobs:
CARGO_INCREMENTAL: 0
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Checkout
uses: actions/checkout@v4
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
submodules: true
@@ -166,26 +122,33 @@ jobs:
run: make walproposer-lib -j$(nproc)
- name: Produce the build stats
run: PQ_LIB_DIR=$(pwd)/pg_install/v16/lib cargo build --all --release --timings -j$(nproc)
run: cargo build --all --release --timings -j$(nproc)
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 3600
- name: Upload the build stats
id: upload-stats
env:
BUCKET: neon-github-public-dev
SHA: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
run: |
REPORT_URL=https://${BUCKET}.s3.amazonaws.com/build-stats/${SHA}/${GITHUB_RUN_ID}/cargo-timing.html
aws s3 cp --only-show-errors ./target/cargo-timings/cargo-timing.html "s3://${BUCKET}/build-stats/${SHA}/${GITHUB_RUN_ID}/"
echo "report-url=${REPORT_URL}" >> $GITHUB_OUTPUT
- name: Publish build stats report
uses: actions/github-script@v7
uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1
env:
REPORT_URL: ${{ steps.upload-stats.outputs.report-url }}
SHA: ${{ github.event.pull_request.head.sha || github.sha }}
with:
# Retry script for 5XX server errors: https://github.com/actions/github-script#retries
retries: 5
script: |
const { REPORT_URL, SHA } = process.env

View File

@@ -1,14 +1,14 @@
name: Periodic pagebench performance test on dedicated EC2 machine in eu-central-1 region
name: Periodic pagebench performance test on unit-perf hetzner runner
on:
schedule:
# * is a special character in YAML so you have to quote this string
# ┌───────────── minute (0 - 59)
# ┌───────────── hour (0 - 23)
# │ ┌───────────── day of the month (1 - 31)
# │ │ ┌───────────── month (1 - 12 or JAN-DEC)
# │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '0 18 * * *' # Runs at 6 PM UTC every day
# ┌───────────── minute (0 - 59)
# ┌───────────── hour (0 - 23)
# │ ┌───────────── day of the month (1 - 31)
# │ │ ┌───────────── month (1 - 12 or JAN-DEC)
# │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '0 */4 * * *' # Runs every 4 hours
workflow_dispatch: # Allows manual triggering of the workflow
inputs:
commit_hash:
@@ -16,6 +16,11 @@ on:
description: 'The long neon repo commit hash for the system under test (pageserver) to be tested.'
required: false
default: ''
recreate_snapshots:
type: boolean
description: 'Recreate snapshots - !!!WARNING!!! We should only recreate snapshots if the previous ones are no longer compatible. Otherwise benchmarking results are not comparable across runs.'
required: false
default: false
defaults:
run:
@@ -25,131 +30,252 @@ concurrency:
group: ${{ github.workflow }}
cancel-in-progress: false
permissions:
contents: read
jobs:
trigger_bench_on_ec2_machine_in_eu_central_1:
runs-on: [ self-hosted, small ]
run_periodic_pagebench_test:
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
runs-on: [ self-hosted, unit-perf ]
container:
image: neondatabase/build-tools:pinned
image: ghcr.io/neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
timeout-minutes: 360 # Set the timeout to 6 hours
env:
API_KEY: ${{ secrets.PERIODIC_PAGEBENCH_EC2_RUNNER_API_KEY }}
RUN_ID: ${{ github.run_id }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_EC2_US_TEST_RUNNER_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY : ${{ secrets.AWS_EC2_US_TEST_RUNNER_ACCESS_KEY_SECRET }}
AWS_DEFAULT_REGION : "eu-central-1"
AWS_INSTANCE_ID : "i-02a59a3bf86bc7e74"
DEFAULT_PG_VERSION: 16
BUILD_TYPE: release
RUST_BACKTRACE: 1
# NEON_ENV_BUILDER_USE_OVERLAYFS_FOR_SNAPSHOTS: 1 - doesn't work without root in container
S3_BUCKET: neon-github-public-dev
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
steps:
# we don't need the neon source code because we run everything remotely
# however we still need the local github actions to run the allure step below
- uses: actions/checkout@v4
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Show my own (github runner) external IP address - usefull for IP allowlisting
run: curl https://ifconfig.me
- name: Start EC2 instance and wait for the instance to boot up
- name: Set up the environment which depends on $RUNNER_TEMP on nvme drive
id: set-env
shell: bash -euxo pipefail {0}
run: |
aws ec2 start-instances --instance-ids $AWS_INSTANCE_ID
aws ec2 wait instance-running --instance-ids $AWS_INSTANCE_ID
sleep 60 # sleep some time to allow cloudinit and our API server to start up
{
echo "NEON_DIR=${RUNNER_TEMP}/neon"
echo "NEON_BIN=${RUNNER_TEMP}/neon/bin"
echo "POSTGRES_DISTRIB_DIR=${RUNNER_TEMP}/neon/pg_install"
echo "LD_LIBRARY_PATH=${RUNNER_TEMP}/neon/pg_install/v${DEFAULT_PG_VERSION}/lib"
echo "BACKUP_DIR=${RUNNER_TEMP}/instance_store/saved_snapshots"
echo "TEST_OUTPUT=${RUNNER_TEMP}/neon/test_output"
echo "PERF_REPORT_DIR=${RUNNER_TEMP}/neon/test_output/perf-report-local"
echo "ALLURE_DIR=${RUNNER_TEMP}/neon/test_output/allure-results"
echo "ALLURE_RESULTS_DIR=${RUNNER_TEMP}/neon/test_output/allure-results/results"
} >> "$GITHUB_ENV"
- name: Determine public IP of the EC2 instance and set env variable EC2_MACHINE_URL_US
run: |
public_ip=$(aws ec2 describe-instances --instance-ids $AWS_INSTANCE_ID --query 'Reservations[*].Instances[*].PublicIpAddress' --output text)
echo "Public IP of the EC2 instance: $public_ip"
echo "EC2_MACHINE_URL_US=https://${public_ip}:8443" >> $GITHUB_ENV
echo "allure_results_dir=${RUNNER_TEMP}/neon/test_output/allure-results/results" >> "$GITHUB_OUTPUT"
- uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2
with:
aws-region: eu-central-1
role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
role-duration-seconds: 18000 # max 5 hours (needed in case commit hash is still being built)
- name: Determine commit hash
id: commit_hash
shell: bash -euxo pipefail {0}
env:
INPUT_COMMIT_HASH: ${{ github.event.inputs.commit_hash }}
run: |
if [ -z "$INPUT_COMMIT_HASH" ]; then
echo "COMMIT_HASH=$(curl -s https://api.github.com/repos/neondatabase/neon/commits/main | jq -r '.sha')" >> $GITHUB_ENV
if [[ -z "${INPUT_COMMIT_HASH}" ]]; then
COMMIT_HASH=$(curl -s https://api.github.com/repos/neondatabase/neon/commits/main | jq -r '.sha')
echo "COMMIT_HASH=$COMMIT_HASH" >> $GITHUB_ENV
echo "commit_hash=$COMMIT_HASH" >> "$GITHUB_OUTPUT"
echo "COMMIT_HASH_TYPE=latest" >> $GITHUB_ENV
else
echo "COMMIT_HASH=$INPUT_COMMIT_HASH" >> $GITHUB_ENV
COMMIT_HASH="${INPUT_COMMIT_HASH}"
echo "COMMIT_HASH=$COMMIT_HASH" >> $GITHUB_ENV
echo "commit_hash=$COMMIT_HASH" >> "$GITHUB_OUTPUT"
echo "COMMIT_HASH_TYPE=manual" >> $GITHUB_ENV
fi
- name: Checkout the neon repository at given commit hash
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ steps.commit_hash.outputs.commit_hash }}
- name: Start Bench with run_id
# does not reuse ./.github/actions/download because we need to download the artifact for the given commit hash
# example artifact
# s3://neon-github-public-dev/artifacts/48b870bc078bd2c450eb7b468e743b9c118549bf/15036827400/1/neon-Linux-X64-release-artifact.tar.zst /instance_store/artifacts/neon-Linux-release-artifact.tar.zst
- name: Determine artifact S3_KEY for given commit hash and download and extract artifact
id: artifact_prefix
shell: bash -euxo pipefail {0}
env:
ARCHIVE: ${{ runner.temp }}/downloads/neon-${{ runner.os }}-${{ runner.arch }}-release-artifact.tar.zst
COMMIT_HASH: ${{ env.COMMIT_HASH }}
COMMIT_HASH_TYPE: ${{ env.COMMIT_HASH_TYPE }}
run: |
curl -k -X 'POST' \
"${EC2_MACHINE_URL_US}/start_test/${GITHUB_RUN_ID}" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $API_KEY" \
-d "{\"neonRepoCommitHash\": \"${COMMIT_HASH}\"}"
attempt=0
max_attempts=24 # 5 minutes * 24 = 2 hours
- name: Poll Test Status
id: poll_step
run: |
status=""
while [[ "$status" != "failure" && "$status" != "success" ]]; do
response=$(curl -k -X 'GET' \
"${EC2_MACHINE_URL_US}/test_status/${GITHUB_RUN_ID}" \
-H 'accept: application/json' \
-H "Authorization: Bearer $API_KEY")
echo "Response: $response"
set +x
status=$(echo $response | jq -r '.status')
echo "Test status: $status"
if [[ "$status" == "failure" ]]; then
echo "Test failed"
exit 1 # Fail the job step if status is failure
elif [[ "$status" == "success" || "$status" == "null" ]]; then
while [[ $attempt -lt $max_attempts ]]; do
# the following command will fail until the artifacts are available ...
S3_KEY=$(aws s3api list-objects-v2 --bucket "$S3_BUCKET" --prefix "artifacts/$COMMIT_HASH/" \
| jq -r '.Contents[]?.Key' \
| grep "neon-${{ runner.os }}-${{ runner.arch }}-release-artifact.tar.zst" \
| sort --version-sort \
| tail -1) || true # ... thus ignore errors from the command
if [[ -n "${S3_KEY}" ]]; then
echo "Artifact found: $S3_KEY"
echo "S3_KEY=$S3_KEY" >> $GITHUB_ENV
break
elif [[ "$status" == "too_many_runs" ]]; then
echo "Too many runs already running"
echo "too_many_runs=true" >> "$GITHUB_OUTPUT"
exit 1
fi
sleep 60 # Poll every 60 seconds
# Increment attempt counter and sleep for 5 minutes
attempt=$((attempt + 1))
echo "Attempt $attempt of $max_attempts to find artifacts in S3 bucket s3://$S3_BUCKET/artifacts/$COMMIT_HASH failed. Retrying in 5 minutes..."
sleep 300 # Sleep for 5 minutes
done
- name: Retrieve Test Logs
if: always() && steps.poll_step.outputs.too_many_runs != 'true'
if [[ -z "${S3_KEY}" ]]; then
echo "Error: artifact not found in S3 bucket s3://$S3_BUCKET/artifacts/$COMMIT_HASH" after 2 hours
else
mkdir -p $(dirname $ARCHIVE)
time aws s3 cp --only-show-errors s3://$S3_BUCKET/${S3_KEY} ${ARCHIVE}
mkdir -p ${NEON_DIR}
time tar -xf ${ARCHIVE} -C ${NEON_DIR}
rm -f ${ARCHIVE}
fi
- name: Download snapshots from S3
if: ${{ github.event_name != 'workflow_dispatch' || github.event.inputs.recreate_snapshots == 'false' || github.event.inputs.recreate_snapshots == '' }}
id: download_snapshots
shell: bash -euxo pipefail {0}
run: |
curl -k -X 'GET' \
"${EC2_MACHINE_URL_US}/test_log/${GITHUB_RUN_ID}" \
-H 'accept: application/gzip' \
-H "Authorization: Bearer $API_KEY" \
--output "test_log_${GITHUB_RUN_ID}.gz"
- name: Unzip Test Log and Print it into this job's log
if: always() && steps.poll_step.outputs.too_many_runs != 'true'
# Download the snapshots from S3
mkdir -p ${TEST_OUTPUT}
mkdir -p $BACKUP_DIR
cd $BACKUP_DIR
mkdir parts
cd parts
PART=$(aws s3api list-objects-v2 --bucket $S3_BUCKET --prefix performance/pagebench/ \
| jq -r '.Contents[]?.Key' \
| grep -E 'shared-snapshots-[0-9]{4}-[0-9]{2}-[0-9]{2}' \
| sort \
| tail -1)
echo "Latest PART: $PART"
if [[ -z "$PART" ]]; then
echo "ERROR: No matching S3 key found" >&2
exit 1
fi
S3_KEY=$(dirname $PART)
time aws s3 cp --only-show-errors --recursive s3://${S3_BUCKET}/$S3_KEY/ .
cd $TEST_OUTPUT
time cat $BACKUP_DIR/parts/* | zstdcat | tar --extract --preserve-permissions
rm -rf ${BACKUP_DIR}
- name: Cache poetry deps
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry/virtualenvs
key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-bookworm-${{ hashFiles('poetry.lock') }}
- name: Install Python deps
shell: bash -euxo pipefail {0}
run: ./scripts/pysync
# we need high number of open files for pagebench
- name: show ulimits
shell: bash -euxo pipefail {0}
run: |
gzip -d "test_log_${GITHUB_RUN_ID}.gz"
cat "test_log_${GITHUB_RUN_ID}"
ulimit -a
- name: Run pagebench testcase
shell: bash -euxo pipefail {0}
env:
CI: false # need to override this env variable set by github to enforce using snapshots
run: |
export PLATFORM=hetzner-unit-perf-${COMMIT_HASH_TYPE}
# report the commit hash of the neon repository in the revision of the test results
export GITHUB_SHA=${COMMIT_HASH}
rm -rf ${PERF_REPORT_DIR}
rm -rf ${ALLURE_RESULTS_DIR}
mkdir -p ${PERF_REPORT_DIR}
mkdir -p ${ALLURE_RESULTS_DIR}
PARAMS="--alluredir=${ALLURE_RESULTS_DIR} --tb=short --verbose -rA"
EXTRA_PARAMS="--out-dir ${PERF_REPORT_DIR} --durations-path $TEST_OUTPUT/benchmark_durations.json"
# run only two selected tests
# environment set by parent:
# RUST_BACKTRACE=1 DEFAULT_PG_VERSION=16 BUILD_TYPE=release
./scripts/pytest ${PARAMS} test_runner/performance/pageserver/pagebench/test_pageserver_max_throughput_getpage_at_latest_lsn.py::test_pageserver_characterize_throughput_with_n_tenants ${EXTRA_PARAMS}
./scripts/pytest ${PARAMS} test_runner/performance/pageserver/pagebench/test_pageserver_max_throughput_getpage_at_latest_lsn.py::test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant ${EXTRA_PARAMS}
- name: upload the performance metrics to the Neon performance database which is used by grafana dashboards to display the results
shell: bash -euxo pipefail {0}
run: |
export REPORT_FROM="$PERF_REPORT_DIR"
export GITHUB_SHA=${COMMIT_HASH}
time ./scripts/generate_and_push_perf_report.sh
- name: Upload test results
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-store
with:
report-dir: ${{ steps.set-env.outputs.allure_results_dir }}
unique-key: ${{ env.BUILD_TYPE }}-${{ env.DEFAULT_PG_VERSION }}-${{ runner.arch }}
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Allure report
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
id: create-allure-report
if: ${{ !cancelled() }}
uses: ./.github/actions/allure-report-generate
with:
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Upload snapshots
if: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.recreate_snapshots != 'false' && github.event.inputs.recreate_snapshots != '' }}
id: upload_snapshots
shell: bash -euxo pipefail {0}
run: |
mkdir -p $BACKUP_DIR
cd $TEST_OUTPUT
tar --create --preserve-permissions --file - shared-snapshots | zstd -o $BACKUP_DIR/shared_snapshots.tar.zst
cd $BACKUP_DIR
mkdir parts
split -b 1G shared_snapshots.tar.zst ./parts/shared_snapshots.tar.zst.part.
SNAPSHOT_DATE=$(date +%F) # YYYY-MM-DD
cd parts
time aws s3 cp --recursive . s3://${S3_BUCKET}/performance/pagebench/shared-snapshots-${SNAPSHOT_DATE}/
- name: Post to a Slack channel
if: ${{ github.event.schedule && failure() }}
uses: slackapi/slack-github-action@v1
uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1
with:
channel-id: "C033QLM5P7D" # dev-staging-stream
channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream
slack-message: "Periodic pagebench testing on dedicated hardware: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
- name: Cleanup Test Resources
if: always()
if: always()
shell: bash -euxo pipefail {0}
env:
ARCHIVE: ${{ runner.temp }}/downloads/neon-${{ runner.os }}-${{ runner.arch }}-release-artifact.tar.zst
run: |
curl -k -X 'POST' \
"${EC2_MACHINE_URL_US}/cleanup_test/${GITHUB_RUN_ID}" \
-H 'accept: application/json' \
-H "Authorization: Bearer $API_KEY" \
-d ''
# Cleanup the test resources
if [[ -d "${BACKUP_DIR}" ]]; then
rm -rf ${BACKUP_DIR}
fi
if [[ -d "${TEST_OUTPUT}" ]]; then
rm -rf ${TEST_OUTPUT}
fi
if [[ -d "${NEON_DIR}" ]]; then
rm -rf ${NEON_DIR}
fi
rm -rf $(dirname $ARCHIVE)
- name: Stop EC2 instance and wait for the instance to be stopped
if: always() && steps.poll_step.outputs.too_many_runs != 'true'
run: |
aws ec2 stop-instances --instance-ids $AWS_INSTANCE_ID
aws ec2 wait instance-stopped --instance-ids $AWS_INSTANCE_ID

View File

@@ -12,8 +12,8 @@ on:
pull_request:
paths:
- '.github/workflows/pg-clients.yml'
- 'test_runner/pg_clients/**'
- 'test_runner/logical_repl/**'
- 'test_runner/pg_clients/**/*.py'
- 'test_runner/logical_repl/**/*.py'
- 'poetry.lock'
workflow_dispatch:
@@ -25,11 +25,13 @@ defaults:
run:
shell: bash -euxo pipefail {0}
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write # require for posting a status update
env:
DEFAULT_PG_VERSION: 16
DEFAULT_PG_VERSION: 17
PLATFORM: neon-captest-new
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}
AWS_DEFAULT_REGION: eu-central-1
jobs:
@@ -39,15 +41,11 @@ jobs:
with:
github-event-name: ${{ github.event_name }}
check-build-tools-image:
needs: [ check-permissions ]
uses: ./.github/workflows/check-build-tools-image.yml
build-build-tools-image:
needs: [ check-build-tools-image ]
permissions:
packages: write
needs: [ check-permissions ]
uses: ./.github/workflows/build-build-tools-image.yml
with:
image-tag: ${{ needs.check-build-tools-image.outputs.image-tag }}
secrets: inherit
test-logical-replication:
@@ -55,10 +53,10 @@ jobs:
runs-on: ubuntu-22.04
container:
image: ${{ needs.build-build-tools-image.outputs.image }}
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init --user root
services:
clickhouse:
@@ -92,7 +90,12 @@ jobs:
ports:
- 8083:8083
steps:
- uses: actions/checkout@v4
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Download Neon artifact
uses: ./.github/actions/download
@@ -100,6 +103,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Project
id: create-neon-project
@@ -107,6 +111,8 @@ jobs:
with:
api_key: ${{ secrets.NEON_STAGING_API_KEY }}
postgres_version: ${{ env.DEFAULT_PG_VERSION }}
project_settings: >-
{"enable_logical_replication": true}
- name: Run tests
uses: ./.github/actions/run-python-test-set
@@ -116,6 +122,7 @@ jobs:
run_in_parallel: false
extra_params: -m remote_cluster
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}
@@ -132,12 +139,13 @@ jobs:
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
- name: Post to a Slack channel
if: github.event.schedule && failure()
uses: slackapi/slack-github-action@v1
uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1
with:
channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream
slack-message: |
@@ -150,14 +158,19 @@ jobs:
runs-on: ubuntu-22.04
container:
image: ${{ needs.build-build-tools-image.outputs.image }}
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init --user root
steps:
- uses: actions/checkout@v4
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Download Neon artifact
uses: ./.github/actions/download
@@ -165,6 +178,7 @@ jobs:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Create Neon Project
id: create-neon-project
@@ -181,6 +195,7 @@ jobs:
run_in_parallel: false
extra_params: -m remote_cluster
pg_version: ${{ env.DEFAULT_PG_VERSION }}
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}
@@ -197,12 +212,13 @@ jobs:
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}
- name: Post to a Slack channel
if: github.event.schedule && failure()
uses: slackapi/slack-github-action@v1
uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1
with:
channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream
slack-message: |

View File

@@ -33,10 +33,6 @@ concurrency:
# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.
permissions: {}
env:
FROM_TAG: ${{ inputs.from-tag }}
TO_TAG: pinned
jobs:
check-manifests:
runs-on: ubuntu-22.04
@@ -44,13 +40,21 @@ jobs:
skip: ${{ steps.check-manifests.outputs.skip }}
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@0634a2670c59f64b4a01f0f96f84700a4088b9f0 # v2.12.0
with:
egress-policy: audit
- name: Check if we really need to pin the image
id: check-manifests
env:
FROM_TAG: ${{ inputs.from-tag }}
TO_TAG: pinned
run: |
docker manifest inspect neondatabase/build-tools:${FROM_TAG} > ${FROM_TAG}.json
docker manifest inspect neondatabase/build-tools:${TO_TAG} > ${TO_TAG}.json
docker manifest inspect "ghcr.io/neondatabase/build-tools:${FROM_TAG}" > "${FROM_TAG}.json"
docker manifest inspect "ghcr.io/neondatabase/build-tools:${TO_TAG}" > "${TO_TAG}.json"
if diff ${FROM_TAG}.json ${TO_TAG}.json; then
if diff "${FROM_TAG}.json" "${TO_TAG}.json"; then
skip=true
else
skip=false
@@ -64,38 +68,36 @@ jobs:
# use format(..) to catch both inputs.force = true AND inputs.force = 'true'
if: needs.check-manifests.outputs.skip == 'false' || format('{0}', inputs.force) == 'true'
runs-on: ubuntu-22.04
permissions:
id-token: write # for `azure/login`
id-token: write # Required for aws/azure login
packages: write # required for pushing to GHCR
steps:
- uses: docker/login-action@v3
with:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
- uses: docker/login-action@v3
with:
registry: 369495373322.dkr.ecr.eu-central-1.amazonaws.com
username: ${{ secrets.AWS_ACCESS_KEY_DEV }}
password: ${{ secrets.AWS_SECRET_KEY_DEV }}
- name: Azure login
uses: azure/login@6c251865b4e6290e7b78be643ea2d005bc51f69a # @v2.1.1
with:
client-id: ${{ secrets.AZURE_DEV_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_DEV_SUBSCRIPTION_ID }}
- name: Login to ACR
run: |
az acr login --name=neoneastus2
- name: Tag build-tools with `${{ env.TO_TAG }}` in Docker Hub, ECR, and ACR
run: |
docker buildx imagetools create -t 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:${TO_TAG} \
-t neoneastus2.azurecr.io/neondatabase/build-tools:${TO_TAG} \
-t neondatabase/build-tools:${TO_TAG} \
neondatabase/build-tools:${FROM_TAG}
uses: ./.github/workflows/_push-to-container-registry.yml
with:
image-map: |
{
"ghcr.io/neondatabase/build-tools:${{ inputs.from-tag }}-bullseye": [
"docker.io/neondatabase/build-tools:pinned-bullseye",
"ghcr.io/neondatabase/build-tools:pinned-bullseye",
"${{ vars.NEON_DEV_AWS_ACCOUNT_ID }}.dkr.ecr.${{ vars.AWS_ECR_REGION }}.amazonaws.com/build-tools:pinned-bullseye",
"${{ vars.AZURE_DEV_REGISTRY_NAME }}.azurecr.io/neondatabase/build-tools:pinned-bullseye"
],
"ghcr.io/neondatabase/build-tools:${{ inputs.from-tag }}-bookworm": [
"docker.io/neondatabase/build-tools:pinned-bookworm",
"docker.io/neondatabase/build-tools:pinned",
"ghcr.io/neondatabase/build-tools:pinned-bookworm",
"ghcr.io/neondatabase/build-tools:pinned",
"${{ vars.NEON_DEV_AWS_ACCOUNT_ID }}.dkr.ecr.${{ vars.AWS_ECR_REGION }}.amazonaws.com/build-tools:pinned-bookworm",
"${{ vars.NEON_DEV_AWS_ACCOUNT_ID }}.dkr.ecr.${{ vars.AWS_ECR_REGION }}.amazonaws.com/build-tools:pinned",
"${{ vars.AZURE_DEV_REGISTRY_NAME }}.azurecr.io/neondatabase/build-tools:pinned-bookworm",
"${{ vars.AZURE_DEV_REGISTRY_NAME }}.azurecr.io/neondatabase/build-tools:pinned"
]
}
aws-region: ${{ vars.AWS_ECR_REGION }}
aws-account-id: "${{ vars.NEON_DEV_AWS_ACCOUNT_ID }}"
aws-role-to-assume: "gha-oidc-neon-admin"
azure-client-id: ${{ vars.AZURE_DEV_CLIENT_ID }}
azure-subscription-id: ${{ vars.AZURE_DEV_SUBSCRIPTION_ID }}
azure-tenant-id: ${{ vars.AZURE_TENANT_ID }}
acr-registry-name: ${{ vars.AZURE_DEV_REGISTRY_NAME }}
secrets: inherit

177
.github/workflows/pre-merge-checks.yml vendored Normal file
View File

@@ -0,0 +1,177 @@
name: Pre-merge checks
on:
pull_request:
paths:
- .github/workflows/_check-codestyle-python.yml
- .github/workflows/_check-codestyle-rust.yml
- .github/workflows/build-build-tools-image.yml
- .github/workflows/pre-merge-checks.yml
merge_group:
defaults:
run:
shell: bash -euxo pipefail {0}
# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.
permissions: {}
jobs:
meta:
runs-on: ubuntu-22.04
permissions:
contents: read
outputs:
python-changed: ${{ steps.python-src.outputs.any_changed }}
rust-changed: ${{ steps.rust-src.outputs.any_changed }}
branch: ${{ steps.group-metadata.outputs.branch }}
pr-number: ${{ steps.group-metadata.outputs.pr-number }}
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: tj-actions/changed-files@ed68ef82c095e0d48ec87eccea555d944a631a4c # v46.0.5
id: python-src
with:
files: |
.github/workflows/_check-codestyle-python.yml
.github/workflows/build-build-tools-image.yml
.github/workflows/pre-merge-checks.yml
**/**.py
poetry.lock
pyproject.toml
- uses: tj-actions/changed-files@ed68ef82c095e0d48ec87eccea555d944a631a4c # v46.0.5
id: rust-src
with:
files: |
.github/workflows/_check-codestyle-rust.yml
.github/workflows/build-build-tools-image.yml
.github/workflows/pre-merge-checks.yml
**/**.rs
**/Cargo.toml
Cargo.toml
Cargo.lock
- name: PRINT ALL CHANGED FILES FOR DEBUG PURPOSES
env:
PYTHON_CHANGED_FILES: ${{ steps.python-src.outputs.all_changed_files }}
RUST_CHANGED_FILES: ${{ steps.rust-src.outputs.all_changed_files }}
run: |
echo "${PYTHON_CHANGED_FILES}"
echo "${RUST_CHANGED_FILES}"
- name: Merge group metadata
if: ${{ github.event_name == 'merge_group' }}
id: group-metadata
env:
MERGE_QUEUE_REF: ${{ github.event.merge_group.head_ref }}
run: |
echo $MERGE_QUEUE_REF | jq -Rr 'capture("refs/heads/gh-readonly-queue/(?<branch>.*)/pr-(?<pr_number>[0-9]+)-[0-9a-f]{40}") | ["branch=" + .branch, "pr-number=" + .pr_number] | .[]' | tee -a "${GITHUB_OUTPUT}"
build-build-tools-image:
if: |
false
|| needs.meta.outputs.python-changed == 'true'
|| needs.meta.outputs.rust-changed == 'true'
needs: [ meta ]
permissions:
contents: read
packages: write
uses: ./.github/workflows/build-build-tools-image.yml
with:
# Build only one combination to save time
archs: '["x64"]'
debians: '["bookworm"]'
secrets: inherit
check-codestyle-python:
if: needs.meta.outputs.python-changed == 'true'
needs: [ meta, build-build-tools-image ]
permissions:
contents: read
packages: read
uses: ./.github/workflows/_check-codestyle-python.yml
with:
# `-bookworm-x64` suffix should match the combination in `build-build-tools-image`
build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm-x64
secrets: inherit
check-codestyle-rust:
if: needs.meta.outputs.rust-changed == 'true'
needs: [ meta, build-build-tools-image ]
permissions:
contents: read
packages: read
uses: ./.github/workflows/_check-codestyle-rust.yml
with:
# `-bookworm-x64` suffix should match the combination in `build-build-tools-image`
build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm-x64
archs: '["x64"]'
secrets: inherit
# To get items from the merge queue merged into main we need to satisfy "Status checks that are required".
# Currently we require 2 jobs (checks with exact name):
# - conclusion
# - neon-cloud-e2e
conclusion:
# Do not run job on Pull Requests as it interferes with the `conclusion` job from the `build_and_test` workflow
if: always() && github.event_name == 'merge_group'
permissions:
statuses: write # for `github.repos.createCommitStatus(...)`
contents: write
needs:
- meta
- check-codestyle-python
- check-codestyle-rust
runs-on: ubuntu-22.04
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Create fake `neon-cloud-e2e` check
uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1
with:
# Retry script for 5XX server errors: https://github.com/actions/github-script#retries
retries: 5
script: |
const { repo, owner } = context.repo;
const targetUrl = `${context.serverUrl}/${owner}/${repo}/actions/runs/${context.runId}`;
await github.rest.repos.createCommitStatus({
owner: owner,
repo: repo,
sha: context.sha,
context: `neon-cloud-e2e`,
state: `success`,
target_url: targetUrl,
description: `fake check for merge queue`,
});
- name: Fail the job if any of the dependencies do not succeed or skipped
run: exit 1
if: |
false
|| (github.event_name == 'merge_group' && needs.meta.outputs.branch != 'main')
|| (needs.check-codestyle-python.result == 'skipped' && needs.meta.outputs.python-changed == 'true')
|| (needs.check-codestyle-rust.result == 'skipped' && needs.meta.outputs.rust-changed == 'true')
|| contains(needs.*.result, 'failure')
|| contains(needs.*.result, 'cancelled')
- name: Add fast-forward label to PR to trigger fast-forward merge
if: >-
${{
always()
&& github.event_name == 'merge_group'
&& contains(fromJSON('["release", "release-proxy", "release-compute"]'), needs.meta.outputs.branch)
}}
env:
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
run: >-
gh pr edit ${{ needs.meta.outputs.pr-number }} --repo "${GITHUB_REPOSITORY}" --add-label "fast-forward"

84
.github/workflows/proxy-benchmark.yml vendored Normal file
View File

@@ -0,0 +1,84 @@
name: Periodic proxy performance test on unit-perf hetzner runner
on:
push: # TODO: remove after testing
branches:
- test-proxy-bench # Runs on pushes to branches starting with test-proxy-bench
# schedule:
# * is a special character in YAML so you have to quote this string
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
# │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
# - cron: '0 5 * * *' # Runs at 5 UTC once a day
workflow_dispatch: # adds an ability to run this manually
defaults:
run:
shell: bash -euo pipefail {0}
concurrency:
group: ${{ github.workflow }}
cancel-in-progress: false
permissions:
contents: read
jobs:
run_periodic_proxybench_test:
permissions:
id-token: write # aws-actions/configure-aws-credentials
statuses: write
contents: write
pull-requests: write
runs-on: [self-hosted, unit-perf]
timeout-minutes: 60 # 1h timeout
container:
image: ghcr.io/neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
steps:
- name: Checkout proxy-bench Repo
uses: actions/checkout@v4
with:
repository: neondatabase/proxy-bench
path: proxy-bench
- name: Set up the environment which depends on $RUNNER_TEMP on nvme drive
id: set-env
shell: bash -euxo pipefail {0}
run: |
PROXY_BENCH_PATH=$(realpath ./proxy-bench)
{
echo "PROXY_BENCH_PATH=$PROXY_BENCH_PATH"
echo "NEON_DIR=${RUNNER_TEMP}/neon"
echo "TEST_OUTPUT=${PROXY_BENCH_PATH}/test_output"
echo ""
} >> "$GITHUB_ENV"
- name: Run proxy-bench
run: ${PROXY_BENCH_PATH}/run.sh
- name: Ingest Bench Results # neon repo script
if: always()
run: |
mkdir -p $TEST_OUTPUT
python $NEON_DIR/scripts/proxy_bench_results_ingest.py --out $TEST_OUTPUT
- name: Push Metrics to Proxy perf database
if: always()
env:
PERF_TEST_RESULT_CONNSTR: "${{ secrets.PROXY_TEST_RESULT_CONNSTR }}"
REPORT_FROM: $TEST_OUTPUT
run: $NEON_DIR/scripts/generate_and_push_perf_report.sh
- name: Docker cleanup
if: always()
run: docker compose down
- name: Notify Failure
if: failure()
run: echo "Proxy bench job failed" && exit 1

93
.github/workflows/random-ops-test.yml vendored Normal file
View File

@@ -0,0 +1,93 @@
name: Random Operations Test
on:
schedule:
# * is a special character in YAML so you have to quote this string
# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of the month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)
# │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)
- cron: '23 */2 * * *' # runs every 2 hours
workflow_dispatch:
inputs:
random_seed:
type: number
description: 'The random seed'
required: false
default: 0
num_operations:
type: number
description: "The number of operations to test"
default: 250
defaults:
run:
shell: bash -euxo pipefail {0}
permissions: {}
env:
DEFAULT_PG_VERSION: 16
PLATFORM: neon-captest-new
AWS_DEFAULT_REGION: eu-central-1
jobs:
run-random-rests:
env:
POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install
runs-on: small
permissions:
id-token: write
statuses: write
strategy:
fail-fast: false
matrix:
pg-version: [16, 17]
container:
image: ghcr.io/neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
options: --init
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Download Neon artifact
uses: ./.github/actions/download
with:
name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact
path: /tmp/neon/
prefix: latest
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
- name: Run tests
uses: ./.github/actions/run-python-test-set
with:
build_type: remote
test_selection: random_ops
run_in_parallel: false
extra_params: -m remote_cluster
pg_version: ${{ matrix.pg-version }}
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
NEON_API_KEY: ${{ secrets.NEON_STAGING_API_KEY }}
RANDOM_SEED: ${{ inputs.random_seed }}
NUM_OPERATIONS: ${{ inputs.num_operations }}
- name: Create Allure report
if: ${{ !cancelled() }}
id: create-allure-report
uses: ./.github/actions/allure-report-generate
with:
store-test-results-into-db: true
aws-oidc-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
env:
REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}

View File

@@ -0,0 +1,46 @@
name: Regenerate Postgres Settings
on:
pull_request:
types:
- opened
- synchronize
- reopened
paths:
- pgxn/neon/**.c
- vendor/postgres-v*
- vendor/revisions.json
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref }}
cancel-in-progress: true
permissions:
pull-requests: write
jobs:
regenerate-pg-settings:
runs-on: ubuntu-22.04
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Add comment
uses: thollander/actions-comment-pull-request@65f9e5c9a1f2cd378bd74b2e057c9736982a8e74 # v3
with:
comment-tag: ${{ github.job }}
pr-number: ${{ github.event.number }}
message: |
If this PR added a GUC in the Postgres fork or `neon` extension,
please regenerate the Postgres settings in the `cloud` repo:
```
make NEON_WORKDIR=path/to/neon/checkout \
-C goapp/internal/shareddomain/postgres generate
```
If you're an external contributor, a Neon employee will assist in
making sure this step is done.

12
.github/workflows/release-compute.yml vendored Normal file
View File

@@ -0,0 +1,12 @@
name: Create compute release PR
on:
schedule:
- cron: '0 7 * * FRI'
jobs:
create-release-pr:
uses: ./.github/workflows/release.yml
with:
component: compute
secrets: inherit

View File

@@ -22,7 +22,12 @@ jobs:
runs-on: ubuntu-22.04
steps:
- uses: neondatabase/dev-actions/release-pr-notify@main
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- uses: neondatabase/dev-actions/release-pr-notify@483a843f2a8bcfbdc4c69d27630528a3ddc4e14b # main
with:
slack-token: ${{ secrets.SLACK_BOT_TOKEN }}
slack-channel-id: ${{ vars.SLACK_UPCOMING_RELEASE_CHANNEL_ID || 'C05QQ9J1BRC' }} # if not set, then `#test-release-notifications`

12
.github/workflows/release-proxy.yml vendored Normal file
View File

@@ -0,0 +1,12 @@
name: Create proxy release PR
on:
schedule:
- cron: '0 6 * * TUE'
jobs:
create-release-pr:
uses: ./.github/workflows/release.yml
with:
component: proxy
secrets: inherit

12
.github/workflows/release-storage.yml vendored Normal file
View File

@@ -0,0 +1,12 @@
name: Create storage release PR
on:
schedule:
- cron: '0 6 * * FRI'
jobs:
create-release-pr:
uses: ./.github/workflows/release.yml
with:
component: storage
secrets: inherit

View File

@@ -1,20 +1,34 @@
name: Create Release Branch
name: Create release PR
on:
schedule:
# It should be kept in sync with if-condition in jobs
- cron: '0 6 * * MON' # Storage release
- cron: '0 6 * * THU' # Proxy release
workflow_dispatch:
inputs:
create-storage-release-branch:
type: boolean
description: 'Create Storage release PR'
component:
description: "Component to release"
required: true
type: choice
options:
- compute
- proxy
- storage
cherry-pick:
description: "Commits to cherry-pick (space separated, makes this a hotfix based on previous release)"
required: false
create-proxy-release-branch:
type: boolean
description: 'Create Proxy release PR'
type: string
default: ''
workflow_call:
inputs:
component:
description: "Component to release"
required: true
type: string
cherry-pick:
description: "Commits to cherry-pick (space separated, makes this a hotfix based on previous release)"
required: false
type: string
default: ''
# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.
permissions: {}
@@ -24,84 +38,31 @@ defaults:
shell: bash -euo pipefail {0}
jobs:
create-storage-release-branch:
if: ${{ github.event.schedule == '0 6 * * MON' || format('{0}', inputs.create-storage-release-branch) == 'true' }}
create-release-pr:
runs-on: ubuntu-22.04
permissions:
contents: write # for `git push`
contents: write
steps:
- name: Check out code
uses: actions/checkout@v4
with:
ref: main
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Set environment variables
run: |
echo "RELEASE_DATE=$(date +'%Y-%m-%d')" | tee -a $GITHUB_ENV
echo "RELEASE_BRANCH=rc/$(date +'%Y-%m-%d')" | tee -a $GITHUB_ENV
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
- name: Create release branch
run: git checkout -b $RELEASE_BRANCH
- name: Configure git
run: |
git config user.name "github-actions[bot]"
git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
- name: Push new branch
run: git push origin $RELEASE_BRANCH
- name: Create pull request into release
env:
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
run: |
TITLE="Storage & Compute release ${RELEASE_DATE}"
cat << EOF > body.md
## ${TITLE}
**Please merge this Pull Request using 'Create a merge commit' button**
EOF
gh pr create --title "${TITLE}" \
--body-file "body.md" \
--head "${RELEASE_BRANCH}" \
--base "release"
create-proxy-release-branch:
if: ${{ github.event.schedule == '0 6 * * THU' || format('{0}', inputs.create-proxy-release-branch) == 'true' }}
runs-on: ubuntu-22.04
permissions:
contents: write # for `git push`
steps:
- name: Check out code
uses: actions/checkout@v4
with:
ref: main
- name: Set environment variables
run: |
echo "RELEASE_DATE=$(date +'%Y-%m-%d')" | tee -a $GITHUB_ENV
echo "RELEASE_BRANCH=rc/proxy/$(date +'%Y-%m-%d')" | tee -a $GITHUB_ENV
- name: Create release branch
run: git checkout -b $RELEASE_BRANCH
- name: Push new branch
run: git push origin $RELEASE_BRANCH
- name: Create pull request into release
env:
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
run: |
TITLE="Proxy release ${RELEASE_DATE}"
cat << EOF > body.md
## ${TITLE}
**Please merge this Pull Request using 'Create a merge commit' button**
EOF
gh pr create --title "${TITLE}" \
--body-file "body.md" \
--head "${RELEASE_BRANCH}" \
--base "release-proxy"
- name: Create release PR
uses: neondatabase/dev-actions/release-pr@290dec821d86fa8a93f019e8c69720f5865b5677
with:
component: ${{ inputs.component }}
cherry-pick: ${{ inputs.cherry-pick }}
env:
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

View File

@@ -0,0 +1,71 @@
name: Report Workflow Stats Batch
on:
schedule:
- cron: '*/15 * * * *'
- cron: '25 0 * * *'
- cron: '25 1 * * 6'
permissions:
contents: read
jobs:
gh-workflow-stats-batch-2h:
name: GitHub Workflow Stats Batch 2 hours
if: github.event.schedule == '*/15 * * * *'
runs-on: ubuntu-22.04
permissions:
actions: read
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Export Workflow Run for the past 2 hours
uses: neondatabase/gh-workflow-stats-action@701b1f202666d0b82e67b4d387e909af2b920127 # v0.2.2
with:
db_uri: ${{ secrets.GH_REPORT_STATS_DB_RW_CONNSTR }}
db_table: "gh_workflow_stats_neon"
gh_token: ${{ secrets.GITHUB_TOKEN }}
duration: '2h'
gh-workflow-stats-batch-48h:
name: GitHub Workflow Stats Batch 48 hours
if: github.event.schedule == '25 0 * * *'
runs-on: ubuntu-22.04
permissions:
actions: read
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Export Workflow Run for the past 48 hours
uses: neondatabase/gh-workflow-stats-action@701b1f202666d0b82e67b4d387e909af2b920127 # v0.2.2
with:
db_uri: ${{ secrets.GH_REPORT_STATS_DB_RW_CONNSTR }}
db_table: "gh_workflow_stats_neon"
gh_token: ${{ secrets.GITHUB_TOKEN }}
duration: '48h'
gh-workflow-stats-batch-30d:
name: GitHub Workflow Stats Batch 30 days
if: github.event.schedule == '25 1 * * 6'
runs-on: ubuntu-22.04
permissions:
actions: read
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0
with:
egress-policy: audit
- name: Export Workflow Run for the past 30 days
uses: neondatabase/gh-workflow-stats-action@701b1f202666d0b82e67b4d387e909af2b920127 # v0.2.2
with:
db_uri: ${{ secrets.GH_REPORT_STATS_DB_RW_CONNSTR }}
db_table: "gh_workflow_stats_neon"
gh_token: ${{ secrets.GITHUB_TOKEN }}
duration: '720h'

View File

@@ -5,6 +5,13 @@ on:
types:
- ready_for_review
workflow_call:
inputs:
github-event-name:
type: string
required: true
github-event-json:
type: string
required: true
defaults:
run:
@@ -15,11 +22,23 @@ env:
E2E_CONCURRENCY_GROUP: ${{ github.repository }}-e2e-tests-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}
jobs:
check-permissions:
if: ${{ !contains(github.event.pull_request.labels.*.name, 'run-no-ci') }}
uses: ./.github/workflows/check-permissions.yml
with:
github-event-name: ${{ inputs.github-event-name || github.event_name }}
cancel-previous-e2e-tests:
needs: [ check-permissions ]
if: github.event_name == 'pull_request'
runs-on: ubuntu-22.04
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@0634a2670c59f64b4a01f0f96f84700a4088b9f0 # v2.12.0
with:
egress-policy: audit
- name: Cancel previous e2e-tests runs for this PR
env:
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
@@ -28,45 +47,37 @@ jobs:
run cancel-previous-in-concurrency-group.yml \
--field concurrency_group="${{ env.E2E_CONCURRENCY_GROUP }}"
tag:
runs-on: ubuntu-22.04
outputs:
build-tag: ${{ steps.build-tag.outputs.tag }}
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Get build tag
env:
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
CURRENT_BRANCH: ${{ github.head_ref || github.ref_name }}
CURRENT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
run: |
if [[ "$GITHUB_REF_NAME" == "main" ]]; then
echo "tag=$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT
elif [[ "$GITHUB_REF_NAME" == "release" ]]; then
echo "tag=release-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT
elif [[ "$GITHUB_REF_NAME" == "release-proxy" ]]; then
echo "tag=release-proxy-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT
else
echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main' or 'release'"
BUILD_AND_TEST_RUN_ID=$(gh run list -b $CURRENT_BRANCH -c $CURRENT_SHA -w 'Build and Test' -L 1 --json databaseId --jq '.[].databaseId')
echo "tag=$BUILD_AND_TEST_RUN_ID" | tee -a $GITHUB_OUTPUT
fi
id: build-tag
meta:
uses: ./.github/workflows/_meta.yml
with:
github-event-name: ${{ inputs.github-event-name || github.event_name }}
github-event-json: ${{ inputs.github-event-json || toJSON(github.event) }}
trigger-e2e-tests:
needs: [ tag ]
needs: [ meta ]
runs-on: ubuntu-22.04
env:
EVENT_ACTION: ${{ github.event.action }}
GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}
TAG: ${{ needs.tag.outputs.build-tag }}
TAG: >-
${{
contains(fromJSON('["compute-release", "compute-rc-pr"]'), needs.meta.outputs.run-kind)
&& needs.meta.outputs.previous-storage-release
|| needs.meta.outputs.build-tag
}}
COMPUTE_TAG: >-
${{
contains(fromJSON('["storage-release", "storage-rc-pr", "proxy-release", "proxy-rc-pr"]'), needs.meta.outputs.run-kind)
&& needs.meta.outputs.previous-compute-release
|| needs.meta.outputs.build-tag
}}
steps:
- name: Wait for `promote-images` job to finish
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@0634a2670c59f64b4a01f0f96f84700a4088b9f0 # v2.12.0
with:
egress-policy: audit
- name: Wait for `push-{neon,compute}-image-dev` job to finish
# It's important to have a timeout here, the script in the step can run infinitely
timeout-minutes: 60
run: |
@@ -77,20 +88,20 @@ jobs:
# For PRs we use the run id as the tag
BUILD_AND_TEST_RUN_ID=${TAG}
while true; do
conclusion=$(gh run --repo ${GITHUB_REPOSITORY} view ${BUILD_AND_TEST_RUN_ID} --json jobs --jq '.jobs[] | select(.name == "promote-images") | .conclusion')
case "$conclusion" in
success)
break
;;
failure | cancelled | skipped)
echo "The 'promote-images' job didn't succeed: '${conclusion}'. Exiting..."
exit 1
;;
*)
echo "The 'promote-images' hasn't succeed yet. Waiting..."
sleep 60
;;
esac
gh run --repo ${GITHUB_REPOSITORY} view ${BUILD_AND_TEST_RUN_ID} --json jobs --jq '[.jobs[] | select((.name | startswith("push-neon-image-dev")) or (.name | startswith("push-compute-image-dev"))) | {"name": .name, "conclusion": .conclusion, "url": .url}]' > jobs.json
if [ $(jq '[.[] | select(.conclusion == "success")] | length' jobs.json) -eq 2 ]; then
break
fi
jq -c '.[]' jobs.json | while read -r job; do
case $(echo $job | jq .conclusion) in
failure | cancelled | skipped)
echo "The '$(echo $job | jq .name)' job didn't succeed: '$(echo $job | jq .conclusion)'. See log in '$(echo $job | jq .url)' Exiting..."
exit 1
;;
esac
done
echo "The 'push-{neon,compute}-image-dev' jobs haven't succeeded yet. Waiting..."
sleep 60
done
- name: Set e2e-platforms
@@ -102,12 +113,17 @@ jobs:
# Default set of platforms to run e2e tests on
platforms='["docker", "k8s"]'
# If the PR changes vendor/, pgxn/ or libs/vm_monitor/ directories, or Dockerfile.compute-node, add k8s-neonvm to the list of platforms.
# If a PR changes anything that affects computes, add k8s-neonvm to the list of platforms.
# If the workflow run is not a pull request, add k8s-neonvm to the list.
if [ "$GITHUB_EVENT_NAME" == "pull_request" ]; then
for f in $(gh api "/repos/${GITHUB_REPOSITORY}/pulls/${PR_NUMBER}/files" --paginate --jq '.[].filename'); do
case "$f" in
vendor/*|pgxn/*|libs/vm_monitor/*|Dockerfile.compute-node)
# List of directories that contain code which affect compute images.
#
# This isn't exhaustive, just the paths that are most directly compute-related.
# For example, compute_ctl also depends on libs/utils, but we don't trigger
# an e2e run on that.
vendor/*|pgxn/*|compute_tools/*|libs/vm_monitor/*|compute/compute-node.Dockerfile)
platforms=$(echo "${platforms}" | jq --compact-output '. += ["k8s-neonvm"] | unique')
;;
*)
@@ -142,6 +158,6 @@ jobs:
--raw-field "commit_hash=$COMMIT_SHA" \
--raw-field "remote_repo=${GITHUB_REPOSITORY}" \
--raw-field "storage_image_tag=${TAG}" \
--raw-field "compute_image_tag=${TAG}" \
--raw-field "compute_image_tag=${COMPUTE_TAG}" \
--raw-field "concurrency_group=${E2E_CONCURRENCY_GROUP}" \
--raw-field "e2e-platforms=${E2E_PLATFORMS}"

4
.gitignore vendored
View File

@@ -1,3 +1,5 @@
/artifact_cache
/build
/pg_install
/target
/tmp_check
@@ -6,6 +8,8 @@ __pycache__/
test_output/
.vscode
.idea
*.swp
tags
neon.iml
/.neon
/integration_tests/.neon

4
.gitmodules vendored
View File

@@ -10,3 +10,7 @@
path = vendor/postgres-v16
url = https://github.com/neondatabase/postgres.git
branch = REL_16_STABLE_neon
[submodule "vendor/postgres-v17"]
path = vendor/postgres-v17
url = https://github.com/neondatabase/postgres.git
branch = REL_17_STABLE_neon

View File

@@ -1,13 +1,29 @@
/compute_tools/ @neondatabase/control-plane @neondatabase/compute
# Autoscaling
/libs/vm_monitor/ @neondatabase/autoscaling
# DevProd & PerfCorr
/.github/ @neondatabase/developer-productivity @neondatabase/performance-correctness
# Compute
/pgxn/ @neondatabase/compute
/vendor/ @neondatabase/compute
/compute/ @neondatabase/compute
/compute_tools/ @neondatabase/compute
# Proxy
/libs/proxy/ @neondatabase/proxy
/proxy/ @neondatabase/proxy
# Storage
/pageserver/ @neondatabase/storage
/safekeeper/ @neondatabase/storage
/storage_controller @neondatabase/storage
/storage_scrubber @neondatabase/storage
/libs/pageserver_api/ @neondatabase/storage
/libs/postgres_ffi/ @neondatabase/compute @neondatabase/storage
/libs/remote_storage/ @neondatabase/storage
/libs/safekeeper_api/ @neondatabase/storage
/libs/vm_monitor/ @neondatabase/autoscaling
/pageserver/ @neondatabase/storage
/pgxn/ @neondatabase/compute
# Shared
/pgxn/neon/ @neondatabase/compute @neondatabase/storage
/proxy/ @neondatabase/proxy
/safekeeper/ @neondatabase/storage
/vendor/ @neondatabase/compute
/libs/compute_api/ @neondatabase/compute @neondatabase/control-plane
/libs/postgres_ffi/ @neondatabase/compute @neondatabase/storage

3718
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -9,21 +9,28 @@ members = [
"pageserver/ctl",
"pageserver/client",
"pageserver/pagebench",
"pageserver/page_api",
"proxy",
"safekeeper",
"safekeeper/client",
"storage_broker",
"storage_controller",
"storage_controller/client",
"storage_scrubber",
"workspace_hack",
"libs/compute_api",
"libs/http-utils",
"libs/pageserver_api",
"libs/postgres_ffi",
"libs/postgres_ffi_types",
"libs/postgres_versioninfo",
"libs/safekeeper_api",
"libs/desim",
"libs/neon-shmem",
"libs/utils",
"libs/consumption_metrics",
"libs/postgres_backend",
"libs/posthog_client_lite",
"libs/pq_proto",
"libs/tenant_size_model",
"libs/metrics",
@@ -33,52 +40,57 @@ members = [
"libs/postgres_ffi/wal_craft",
"libs/vm_monitor",
"libs/walproposer",
"libs/wal_decoder",
"libs/postgres_initdb",
"libs/proxy/postgres-protocol2",
"libs/proxy/postgres-types2",
"libs/proxy/tokio-postgres2",
"endpoint_storage",
]
[workspace.package]
edition = "2021"
edition = "2024"
license = "Apache-2.0"
## All dependency versions, used in the project
[workspace.dependencies]
ahash = "0.8"
anyhow = { version = "1.0", features = ["backtrace"] }
arc-swap = "1.6"
arc-swap = "1.7"
async-compression = { version = "0.4.0", features = ["tokio", "gzip", "zstd"] }
atomic-take = "1.1.0"
azure_core = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls", "hmac_rust"] }
azure_identity = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }
azure_storage = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }
azure_storage_blobs = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }
flate2 = "1.0.26"
assert-json-diff = "2"
async-stream = "0.3"
async-trait = "0.1"
aws-config = { version = "1.3", default-features = false, features=["rustls"] }
aws-sdk-s3 = "1.26"
aws-sdk-iam = "1.15.0"
aws-config = { version = "1.5", default-features = false, features=["rustls", "sso"] }
aws-sdk-s3 = "1.52"
aws-sdk-iam = "1.46.0"
aws-sdk-kms = "1.47.0"
aws-smithy-async = { version = "1.2.1", default-features = false, features=["rt-tokio"] }
aws-smithy-types = "1.1.9"
aws-smithy-types = "1.2"
aws-credential-types = "1.2.0"
aws-sigv4 = { version = "1.2.1", features = ["sign-http"] }
aws-types = "1.2.0"
axum = { version = "0.6.20", features = ["ws"] }
base64 = "0.13.0"
aws-sigv4 = { version = "1.2", features = ["sign-http"] }
aws-types = "1.3"
axum = { version = "0.8.1", features = ["ws"] }
axum-extra = { version = "0.10.0", features = ["typed-header", "query"] }
base64 = "0.22"
bincode = "1.3"
bindgen = "0.70"
bindgen = "0.71"
bit_field = "0.10.2"
bstr = "1.0"
byteorder = "1.4"
bytes = "1.0"
bytes = "1.9"
camino = "1.1.6"
cfg-if = "1.0.0"
cron = "0.15"
chrono = { version = "0.4", default-features = false, features = ["clock"] }
clap = { version = "4.0", features = ["derive"] }
clap = { version = "4.0", features = ["derive", "env"] }
clashmap = { version = "1.0", features = ["raw-api"] }
comfy-table = "7.1"
const_format = "0.2"
crc32c = "0.6"
crossbeam-deque = "0.8.5"
crossbeam-utils = "0.8.5"
dashmap = { version = "5.5.0", features = ["raw-api"] }
diatomic-waker = { version = "0.2.3" }
either = "1.8"
enum-map = "2.4.2"
enumset = "1.0.12"
@@ -89,154 +101,183 @@ futures = "0.3"
futures-core = "0.3"
futures-util = "0.3"
git-version = "0.3"
governor = "0.8"
hashbrown = "0.14"
hashlink = "0.9.1"
hdrhistogram = "7.5.2"
hex = "0.4"
hex-literal = "0.4"
hmac = "0.12.1"
hostname = "0.3.1"
hostname = "0.4"
http = {version = "1.1.0", features = ["std"]}
http-types = { version = "2", default-features = false }
humantime = "2.1"
http-body-util = "0.1.2"
humantime = "2.2"
humantime-serde = "1.1.1"
hyper = "0.14"
tokio-tungstenite = "0.20.0"
indexmap = "2"
hyper0 = { package = "hyper", version = "0.14" }
hyper = "1.4"
hyper-util = "0.1"
tokio-tungstenite = "0.21.0"
indexmap = { version = "2", features = ["serde"] }
indoc = "2"
inotify = "0.10.2"
ipnet = "2.9.0"
ipnet = "2.10.0"
itertools = "0.10"
itoa = "1.0.11"
jemalloc_pprof = { version = "0.7", features = ["symbolize", "flamegraph"] }
jsonwebtoken = "9"
lasso = "0.7"
libc = "0.2"
md5 = "0.7.0"
measured = { version = "0.0.22", features=["lasso"] }
measured-process = { version = "0.0.22" }
memoffset = "0.8"
nix = { version = "0.27", features = ["dir", "fs", "process", "socket", "signal", "poll"] }
memoffset = "0.9"
nix = { version = "0.30.1", features = ["dir", "fs", "mman", "process", "socket", "signal", "poll"] }
# Do not update to >= 7.0.0, at least. The update will have a significant impact
# on compute startup metrics (start_postgres_ms), >= 25% degradation.
notify = "6.0.0"
num_cpus = "1.15"
num-traits = "0.2.15"
num-traits = "0.2.19"
once_cell = "1.13"
opentelemetry = "0.20.0"
opentelemetry-otlp = { version = "0.13.0", default-features=false, features = ["http-proto", "trace", "http", "reqwest-client"] }
opentelemetry-semantic-conventions = "0.12.0"
opentelemetry = "0.27"
opentelemetry_sdk = "0.27"
opentelemetry-otlp = { version = "0.27", default-features = false, features = ["http-proto", "trace", "http", "reqwest-client"] }
opentelemetry-semantic-conventions = "0.27"
parking_lot = "0.12"
parquet = { version = "53", default-features = false, features = ["zstd"] }
parquet_derive = "53"
pbkdf2 = { version = "0.12.1", features = ["simple", "std"] }
pem = "3.0.3"
pin-project-lite = "0.2"
pprof = { version = "0.14", features = ["criterion", "flamegraph", "frame-pointer", "prost-codec"] }
procfs = "0.16"
prometheus = {version = "0.13", default-features=false, features = ["process"]} # removes protobuf dependency
prost = "0.11"
prost = "0.13.5"
prost-types = "0.13.5"
rand = "0.8"
redis = { version = "0.25.2", features = ["tokio-rustls-comp", "keep-alive"] }
redis = { version = "0.29.2", features = ["tokio-rustls-comp", "keep-alive"] }
regex = "1.10.2"
reqwest = { version = "0.12", default-features = false, features = ["rustls-tls"] }
reqwest-tracing = { version = "0.5", features = ["opentelemetry_0_20"] }
reqwest-middleware = "0.3.0"
reqwest-retry = "0.5"
reqwest-tracing = { version = "0.5", features = ["opentelemetry_0_27"] }
reqwest-middleware = "0.4"
reqwest-retry = "0.7"
routerify = "3"
rpds = "0.13"
rustc-hash = "1.1.0"
rustls = "0.22"
rustls = { version = "0.23.16", default-features = false }
rustls-pemfile = "2"
rustls-split = "0.3"
rustls-pki-types = "1.11"
scopeguard = "1.1"
sysinfo = "0.29.2"
sd-notify = "0.4.1"
send-future = "0.1.0"
sentry = { version = "0.32", default-features = false, features = ["backtrace", "contexts", "panic", "rustls", "reqwest" ] }
sentry = { version = "0.37", default-features = false, features = ["backtrace", "contexts", "panic", "rustls", "reqwest" ] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1"
serde_path_to_error = "0.1"
serde_with = "2.0"
serde_with = { version = "3", features = [ "base64" ] }
serde_assert = "0.5.0"
serde_repr = "0.1.20"
sha2 = "0.10.2"
signal-hook = "0.3"
smallvec = "1.11"
smol_str = { version = "0.2.0", features = ["serde"] }
socket2 = "0.5"
spki = "0.7.3"
strum = "0.26"
strum_macros = "0.26"
"subtle" = "2.5.0"
svg_fmt = "0.4.3"
sync_wrapper = "0.1.2"
tar = "0.4"
task-local-extensions = "0.1.4"
test-context = "0.3"
thiserror = "1.0"
tikv-jemallocator = "0.5"
tikv-jemalloc-ctl = "0.5"
tokio = { version = "1.17", features = ["macros"] }
tikv-jemallocator = { version = "0.6", features = ["profiling", "stats", "unprefixed_malloc_on_supported_platforms"] }
tikv-jemalloc-ctl = { version = "0.6", features = ["stats"] }
tokio = { version = "1.43.1", features = ["macros"] }
tokio-epoll-uring = { git = "https://github.com/neondatabase/tokio-epoll-uring.git" , branch = "main" }
tokio-io-timeout = "1.2.0"
tokio-postgres-rustls = "0.11.0"
tokio-rustls = "0.25"
tokio-postgres-rustls = "0.12.0"
tokio-rustls = { version = "0.26.0", default-features = false, features = ["tls12", "ring"]}
tokio-stream = "0.1"
tokio-tar = "0.3"
tokio-util = { version = "0.7.10", features = ["io", "rt"] }
tokio-util = { version = "0.7.10", features = ["io", "io-util", "rt"] }
toml = "0.8"
toml_edit = "0.22"
tonic = {version = "0.9", features = ["tls", "tls-roots"]}
tower-service = "0.3.2"
tonic = { version = "0.13.1", default-features = false, features = ["channel", "codegen", "gzip", "prost", "router", "server", "tls-ring", "tls-native-roots", "zstd"] }
tonic-reflection = { version = "0.13.1", features = ["server"] }
tower = { version = "0.5.2", default-features = false }
tower-http = { version = "0.6.2", features = ["auth", "request-id", "trace"] }
# This revision uses opentelemetry 0.27. There's no tag for it.
tower-otel = { git = "https://github.com/mattiapenati/tower-otel", rev = "56a7321053bcb72443888257b622ba0d43a11fcd" }
tower-service = "0.3.3"
tracing = "0.1"
tracing-error = "0.2.0"
tracing-opentelemetry = "0.21.0"
tracing-error = "0.2"
tracing-log = "0.2"
tracing-opentelemetry = "0.28"
tracing-serde = "0.2.0"
tracing-subscriber = { version = "0.3", default-features = false, features = ["smallvec", "fmt", "tracing-log", "std", "env-filter", "json"] }
try-lock = "0.2.5"
test-log = { version = "0.2.17", default-features = false, features = ["log"] }
twox-hash = { version = "1.6.3", default-features = false }
typed-json = "0.1"
url = "2.2"
urlencoding = "2.1"
uuid = { version = "1.6.1", features = ["v4", "v7", "serde"] }
walkdir = "2.3.2"
rustls-native-certs = "0.7"
x509-parser = "0.15"
rustls-native-certs = "0.8"
whoami = "1.5.1"
zerocopy = { version = "0.8", features = ["derive", "simd"] }
json-structural-diff = { version = "0.2.0" }
x509-cert = { version = "0.2.5" }
## TODO replace this with tracing
env_logger = "0.10"
env_logger = "0.11"
log = "0.4"
## Libraries from neondatabase/ git forks, ideally with changes to be upstreamed
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }
postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }
postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }
# We want to use the 'neon' branch for these, but there's currently one
# incompatible change on the branch. See:
#
# - PR #8076 which contained changes that depended on the new changes in
# the rust-postgres crate, and
# - PR #8654 which reverted those changes and made the code in proxy incompatible
# with the tip of the 'neon' branch again.
#
# When those proxy changes are re-applied (see PR #8747), we can switch using
# the tip of the 'neon' branch again.
postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev = "20031d7a9ee1addeae6e0968e3899ae6bf01cee2" }
postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", rev = "20031d7a9ee1addeae6e0968e3899ae6bf01cee2" }
postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", rev = "20031d7a9ee1addeae6e0968e3899ae6bf01cee2" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev = "20031d7a9ee1addeae6e0968e3899ae6bf01cee2" }
## Azure SDK crates
azure_core = { git = "https://github.com/neondatabase/azure-sdk-for-rust.git", branch = "neon", default-features = false, features = ["enable_reqwest_rustls", "hmac_rust"] }
azure_identity = { git = "https://github.com/neondatabase/azure-sdk-for-rust.git", branch = "neon", default-features = false, features = ["enable_reqwest_rustls"] }
azure_storage = { git = "https://github.com/neondatabase/azure-sdk-for-rust.git", branch = "neon", default-features = false, features = ["enable_reqwest_rustls"] }
azure_storage_blobs = { git = "https://github.com/neondatabase/azure-sdk-for-rust.git", branch = "neon", default-features = false, features = ["enable_reqwest_rustls"] }
## Local libraries
compute_api = { version = "0.1", path = "./libs/compute_api/" }
consumption_metrics = { version = "0.1", path = "./libs/consumption_metrics/" }
desim = { version = "0.1", path = "./libs/desim" }
endpoint_storage = { version = "0.0.1", path = "./endpoint_storage/" }
http-utils = { version = "0.1", path = "./libs/http-utils/" }
metrics = { version = "0.1", path = "./libs/metrics/" }
pageserver = { path = "./pageserver" }
pageserver_api = { version = "0.1", path = "./libs/pageserver_api/" }
pageserver_client = { path = "./pageserver/client" }
pageserver_compaction = { version = "0.1", path = "./pageserver/compaction/" }
pageserver_page_api = { path = "./pageserver/page_api" }
postgres_backend = { version = "0.1", path = "./libs/postgres_backend/" }
postgres_connection = { version = "0.1", path = "./libs/postgres_connection/" }
postgres_ffi = { version = "0.1", path = "./libs/postgres_ffi/" }
postgres_ffi_types = { version = "0.1", path = "./libs/postgres_ffi_types/" }
postgres_versioninfo = { version = "0.1", path = "./libs/postgres_versioninfo/" }
postgres_initdb = { path = "./libs/postgres_initdb" }
posthog_client_lite = { version = "0.1", path = "./libs/posthog_client_lite" }
pq_proto = { version = "0.1", path = "./libs/pq_proto/" }
remote_storage = { version = "0.1", path = "./libs/remote_storage/" }
safekeeper_api = { version = "0.1", path = "./libs/safekeeper_api" }
desim = { version = "0.1", path = "./libs/desim" }
safekeeper_client = { path = "./safekeeper/client" }
storage_broker = { version = "0.1", path = "./storage_broker/" } # Note: main broker code is inside the binary crate, so linking with the library shouldn't be heavy.
storage_controller_client = { path = "./storage_controller/client" }
tenant_size_model = { version = "0.1", path = "./libs/tenant_size_model/" }
tracing-utils = { version = "0.1", path = "./libs/tracing-utils/" }
utils = { version = "0.1", path = "./libs/utils/" }
vm_monitor = { version = "0.1", path = "./libs/vm_monitor/" }
wal_decoder = { version = "0.1", path = "./libs/wal_decoder" }
walproposer = { version = "0.1", path = "./libs/walproposer/" }
## Common library dependency
@@ -244,21 +285,23 @@ workspace_hack = { version = "0.1", path = "./workspace_hack/" }
## Build dependencies
criterion = "0.5.1"
rcgen = "0.12"
rcgen = "0.13"
rstest = "0.18"
camino-tempfile = "1.0.2"
tonic-build = "0.9"
tonic-build = "0.13.1"
[patch.crates-io]
# Needed to get `tokio-postgres-rustls` to depend on our fork.
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev = "20031d7a9ee1addeae6e0968e3899ae6bf01cee2" }
tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }
################# Binary contents sections
[profile.release]
# This is useful for profiling and, to some extent, debug.
# Besides, debug info should not affect the performance.
#
# NB: we also enable frame pointers for improved profiling, see .cargo/config.toml.
debug = true
# disable debug symbols for all packages except this one to decrease binaries size

View File

@@ -2,9 +2,33 @@
### The image itself is mainly used as a container for the binaries and for starting e2e tests with custom parameters.
### By default, the binaries inside the image have some mock parameters and can start, but are not intended to be used
### inside this image in the real deployments.
ARG REPOSITORY=neondatabase
ARG REPOSITORY=ghcr.io/neondatabase
ARG IMAGE=build-tools
ARG TAG=pinned
ARG DEBIAN_VERSION=bookworm
ARG DEBIAN_FLAVOR=${DEBIAN_VERSION}-slim
# Here are the INDEX DIGESTS for the images we use.
# You can get them following next steps for now:
# 1. Get an authentication token from DockerHub:
# TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/debian:pull" | jq -r .token)
# 2. Using that token, query index for the given tag:
# curl -s -H "Authorization: Bearer $TOKEN" \
# -H "Accept: application/vnd.docker.distribution.manifest.list.v2+json" \
# "https://registry.hub.docker.com/v2/library/debian/manifests/bullseye-slim" \
# -I | grep -i docker-content-digest
# 3. As a next step, TODO(fedordikarev): create script and schedule workflow to run these checks
# and updates on regular bases and in automated way.
ARG BOOKWORM_SLIM_SHA=sha256:40b107342c492725bc7aacbe93a49945445191ae364184a6d24fedb28172f6f7
ARG BULLSEYE_SLIM_SHA=sha256:e831d9a884d63734fe3dd9c491ed9a5a3d4c6a6d32c5b14f2067357c49b0b7e1
# Here we use ${var/search/replace} syntax, to check
# if base image is one of the images, we pin image index for.
# If var will match one the known images, we will replace it with the known sha.
# If no match, than value will be unaffected, and will process with no-pinned image.
ARG BASE_IMAGE_SHA=debian:${DEBIAN_FLAVOR}
ARG BASE_IMAGE_SHA=${BASE_IMAGE_SHA/debian:bookworm-slim/debian@$BOOKWORM_SLIM_SHA}
ARG BASE_IMAGE_SHA=${BASE_IMAGE_SHA/debian:bullseye-slim/debian@$BULLSEYE_SLIM_SHA}
# Build Postgres
FROM $REPOSITORY/$IMAGE:$TAG AS pg-build
@@ -13,16 +37,25 @@ WORKDIR /home/nonroot
COPY --chown=nonroot vendor/postgres-v14 vendor/postgres-v14
COPY --chown=nonroot vendor/postgres-v15 vendor/postgres-v15
COPY --chown=nonroot vendor/postgres-v16 vendor/postgres-v16
COPY --chown=nonroot vendor/postgres-v17 vendor/postgres-v17
COPY --chown=nonroot pgxn pgxn
COPY --chown=nonroot Makefile Makefile
COPY --chown=nonroot postgres.mk postgres.mk
COPY --chown=nonroot scripts/ninstall.sh scripts/ninstall.sh
ENV BUILD_TYPE=release
RUN set -e \
&& mold -run make -j $(nproc) -s neon-pg-ext \
&& rm -rf pg_install/build \
&& tar -C pg_install -czf /home/nonroot/postgres_install.tar.gz .
# Prepare cargo-chef recipe
FROM $REPOSITORY/$IMAGE:$TAG AS plan
WORKDIR /home/nonroot
COPY --chown=nonroot . .
RUN cargo chef prepare --recipe-path recipe.json
# Build neon binaries
FROM $REPOSITORY/$IMAGE:$TAG AS build
WORKDIR /home/nonroot
@@ -32,12 +65,18 @@ ARG BUILD_TAG
COPY --from=pg-build /home/nonroot/pg_install/v14/include/postgresql/server pg_install/v14/include/postgresql/server
COPY --from=pg-build /home/nonroot/pg_install/v15/include/postgresql/server pg_install/v15/include/postgresql/server
COPY --from=pg-build /home/nonroot/pg_install/v16/include/postgresql/server pg_install/v16/include/postgresql/server
COPY --from=pg-build /home/nonroot/pg_install/v16/lib pg_install/v16/lib
COPY --from=pg-build /home/nonroot/pg_install/v17/include/postgresql/server pg_install/v17/include/postgresql/server
COPY --from=plan /home/nonroot/recipe.json recipe.json
ARG ADDITIONAL_RUSTFLAGS=""
RUN set -e \
&& RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=mold -Clink-arg=-Wl,--no-rosegment -Cforce-frame-pointers=yes ${ADDITIONAL_RUSTFLAGS}" cargo chef cook --locked --release --recipe-path recipe.json
COPY --chown=nonroot . .
ARG ADDITIONAL_RUSTFLAGS
RUN set -e \
&& PQ_LIB_DIR=$(pwd)/pg_install/v16/lib RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=mold -Clink-arg=-Wl,--no-rosegment ${ADDITIONAL_RUSTFLAGS}" cargo build \
&& RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=mold -Clink-arg=-Wl,--no-rosegment -Cforce-frame-pointers=yes ${ADDITIONAL_RUSTFLAGS}" cargo build \
--bin pg_sni_router \
--bin pageserver \
--bin pagectl \
@@ -45,21 +84,38 @@ RUN set -e \
--bin storage_broker \
--bin storage_controller \
--bin proxy \
--bin endpoint_storage \
--bin neon_local \
--bin storage_scrubber \
--locked --release
# Build final image
#
FROM debian:bullseye-slim
FROM $BASE_IMAGE_SHA
WORKDIR /data
RUN set -e \
&& echo 'Acquire::Retries "5";' > /etc/apt/apt.conf.d/80-retries \
&& apt update \
&& apt install -y \
libreadline-dev \
libseccomp-dev \
ca-certificates \
openssl \
unzip \
curl \
&& ARCH=$(uname -m) \
&& if [ "$ARCH" = "x86_64" ]; then \
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"; \
elif [ "$ARCH" = "aarch64" ]; then \
curl "https://awscli.amazonaws.com/awscli-exe-linux-aarch64.zip" -o "awscliv2.zip"; \
else \
echo "Unsupported architecture: $ARCH" && exit 1; \
fi \
&& unzip awscliv2.zip \
&& ./aws/install \
&& rm -rf aws awscliv2.zip \
&& rm -f /etc/apt/apt.conf.d/80-retries \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* \
&& useradd -d /data neon \
&& chown -R neon:neon /data
@@ -71,12 +127,14 @@ COPY --from=build --chown=neon:neon /home/nonroot/target/release/safekeeper
COPY --from=build --chown=neon:neon /home/nonroot/target/release/storage_broker /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/storage_controller /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/proxy /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/endpoint_storage /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/neon_local /usr/local/bin
COPY --from=build --chown=neon:neon /home/nonroot/target/release/storage_scrubber /usr/local/bin
COPY --from=pg-build /home/nonroot/pg_install/v14 /usr/local/v14/
COPY --from=pg-build /home/nonroot/pg_install/v15 /usr/local/v15/
COPY --from=pg-build /home/nonroot/pg_install/v16 /usr/local/v16/
COPY --from=pg-build /home/nonroot/pg_install/v17 /usr/local/v17/
COPY --from=pg-build /home/nonroot/postgres_install.tar.gz /data/
# By default, pageserver uses `.neon/` working directory in WORKDIR, so create one and fill it with the dummy config.
@@ -91,15 +149,9 @@ RUN mkdir -p /data/.neon/ && \
> /data/.neon/pageserver.toml && \
chown -R neon:neon /data/.neon
# When running a binary that links with libpq, default to using our most recent postgres version. Binaries
# that want a particular postgres version will select it explicitly: this is just a default.
ENV LD_LIBRARY_PATH=/usr/local/v16/lib
VOLUME ["/data"]
USER neon
EXPOSE 6400
EXPOSE 9898
CMD ["/usr/local/bin/pageserver", "-D", "/data/.neon"]

View File

@@ -1,229 +0,0 @@
FROM debian:bullseye-slim
# Use ARG as a build-time environment variable here to allow.
# It's not supposed to be set outside.
# Alternatively it can be obtained using the following command
# ```
# . /etc/os-release && echo "${VERSION_CODENAME}"
# ```
ARG DEBIAN_VERSION_CODENAME=bullseye
# Add nonroot user
RUN useradd -ms /bin/bash nonroot -b /home
SHELL ["/bin/bash", "-c"]
# System deps
RUN set -e \
&& apt update \
&& apt install -y \
autoconf \
automake \
bison \
build-essential \
ca-certificates \
cmake \
curl \
flex \
git \
gnupg \
gzip \
jq \
libcurl4-openssl-dev \
libbz2-dev \
libffi-dev \
liblzma-dev \
libncurses5-dev \
libncursesw5-dev \
libreadline-dev \
libseccomp-dev \
libsqlite3-dev \
libssl-dev \
libstdc++-10-dev \
libtool \
libxml2-dev \
libxmlsec1-dev \
libxxhash-dev \
lsof \
make \
netcat \
net-tools \
openssh-client \
parallel \
pkg-config \
unzip \
wget \
xz-utils \
zlib1g-dev \
zstd \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# protobuf-compiler (protoc)
ENV PROTOC_VERSION=25.1
RUN curl -fsSL "https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOC_VERSION}/protoc-${PROTOC_VERSION}-linux-$(uname -m | sed 's/aarch64/aarch_64/g').zip" -o "protoc.zip" \
&& unzip -q protoc.zip -d protoc \
&& mv protoc/bin/protoc /usr/local/bin/protoc \
&& mv protoc/include/google /usr/local/include/google \
&& rm -rf protoc.zip protoc
# s5cmd
ENV S5CMD_VERSION=2.2.2
RUN curl -sL "https://github.com/peak/s5cmd/releases/download/v${S5CMD_VERSION}/s5cmd_${S5CMD_VERSION}_Linux-$(uname -m | sed 's/x86_64/64bit/g' | sed 's/aarch64/arm64/g').tar.gz" | tar zxvf - s5cmd \
&& chmod +x s5cmd \
&& mv s5cmd /usr/local/bin/s5cmd
# LLVM
ENV LLVM_VERSION=18
RUN curl -fsSL 'https://apt.llvm.org/llvm-snapshot.gpg.key' | apt-key add - \
&& echo "deb http://apt.llvm.org/${DEBIAN_VERSION_CODENAME}/ llvm-toolchain-${DEBIAN_VERSION_CODENAME}-${LLVM_VERSION} main" > /etc/apt/sources.list.d/llvm.stable.list \
&& apt update \
&& apt install -y clang-${LLVM_VERSION} llvm-${LLVM_VERSION} \
&& bash -c 'for f in /usr/bin/clang*-${LLVM_VERSION} /usr/bin/llvm*-${LLVM_VERSION}; do ln -s "${f}" "${f%-${LLVM_VERSION}}"; done' \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# Install docker
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian ${DEBIAN_VERSION_CODENAME} stable" > /etc/apt/sources.list.d/docker.list \
&& apt update \
&& apt install -y docker-ce docker-ce-cli \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# Configure sudo & docker
RUN usermod -aG sudo nonroot && \
echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers && \
usermod -aG docker nonroot
# AWS CLI
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-$(uname -m).zip" -o "awscliv2.zip" \
&& unzip -q awscliv2.zip \
&& ./aws/install \
&& rm awscliv2.zip
# Mold: A Modern Linker
ENV MOLD_VERSION=v2.33.0
RUN set -e \
&& git clone https://github.com/rui314/mold.git \
&& mkdir mold/build \
&& cd mold/build \
&& git checkout ${MOLD_VERSION} \
&& cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=clang++ .. \
&& cmake --build . -j $(nproc) \
&& cmake --install . \
&& cd .. \
&& rm -rf mold
# LCOV
# Build lcov from a fork:
# It includes several bug fixes on top on v2.0 release (https://github.com/linux-test-project/lcov/compare/v2.0...master)
# And patches from us:
# - Generates json file with code coverage summary (https://github.com/neondatabase/lcov/commit/426e7e7a22f669da54278e9b55e6d8caabd00af0.tar.gz)
RUN for package in Capture::Tiny DateTime Devel::Cover Digest::MD5 File::Spec JSON::XS Memory::Process Time::HiRes JSON; do yes | perl -MCPAN -e "CPAN::Shell->notest('install', '$package')"; done \
&& wget https://github.com/neondatabase/lcov/archive/426e7e7a22f669da54278e9b55e6d8caabd00af0.tar.gz -O lcov.tar.gz \
&& echo "61a22a62e20908b8b9e27d890bd0ea31f567a7b9668065589266371dcbca0992 lcov.tar.gz" | sha256sum --check \
&& mkdir -p lcov && tar -xzf lcov.tar.gz -C lcov --strip-components=1 \
&& cd lcov \
&& make install \
&& rm -rf ../lcov.tar.gz
# Compile and install the static OpenSSL library
ENV OPENSSL_VERSION=1.1.1w
ENV OPENSSL_PREFIX=/usr/local/openssl
RUN wget -O /tmp/openssl-${OPENSSL_VERSION}.tar.gz https://www.openssl.org/source/openssl-${OPENSSL_VERSION}.tar.gz && \
echo "cf3098950cb4d853ad95c0841f1f9c6d3dc102dccfcacd521d93925208b76ac8 /tmp/openssl-${OPENSSL_VERSION}.tar.gz" | sha256sum --check && \
cd /tmp && \
tar xzvf /tmp/openssl-${OPENSSL_VERSION}.tar.gz && \
rm /tmp/openssl-${OPENSSL_VERSION}.tar.gz && \
cd /tmp/openssl-${OPENSSL_VERSION} && \
./config --prefix=${OPENSSL_PREFIX} -static --static no-shared -fPIC && \
make -j "$(nproc)" && \
make install && \
cd /tmp && \
rm -rf /tmp/openssl-${OPENSSL_VERSION}
# Use the same version of libicu as the compute nodes so that
# clusters created using inidb on pageserver can be used by computes.
#
# TODO: at this time, Dockerfile.compute-node uses the debian bullseye libicu
# package, which is 67.1. We're duplicating that knowledge here, and also, technically,
# Debian has a few patches on top of 67.1 that we're not adding here.
ENV ICU_VERSION=67.1
ENV ICU_PREFIX=/usr/local/icu
# Download and build static ICU
RUN wget -O /tmp/libicu-${ICU_VERSION}.tgz https://github.com/unicode-org/icu/releases/download/release-${ICU_VERSION//./-}/icu4c-${ICU_VERSION//./_}-src.tgz && \
echo "94a80cd6f251a53bd2a997f6f1b5ac6653fe791dfab66e1eb0227740fb86d5dc /tmp/libicu-${ICU_VERSION}.tgz" | sha256sum --check && \
mkdir /tmp/icu && \
pushd /tmp/icu && \
tar -xzf /tmp/libicu-${ICU_VERSION}.tgz && \
pushd icu/source && \
./configure --prefix=${ICU_PREFIX} --enable-static --enable-shared=no CXXFLAGS="-fPIC" CFLAGS="-fPIC" && \
make -j "$(nproc)" && \
make install && \
popd && \
rm -rf icu && \
rm -f /tmp/libicu-${ICU_VERSION}.tgz && \
popd
# Switch to nonroot user
USER nonroot:nonroot
WORKDIR /home/nonroot
# Python
ENV PYTHON_VERSION=3.9.19 \
PYENV_ROOT=/home/nonroot/.pyenv \
PATH=/home/nonroot/.pyenv/shims:/home/nonroot/.pyenv/bin:/home/nonroot/.poetry/bin:$PATH
RUN set -e \
&& cd $HOME \
&& curl -sSO https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer \
&& chmod +x pyenv-installer \
&& ./pyenv-installer \
&& export PYENV_ROOT=/home/nonroot/.pyenv \
&& export PATH="$PYENV_ROOT/bin:$PATH" \
&& export PATH="$PYENV_ROOT/shims:$PATH" \
&& pyenv install ${PYTHON_VERSION} \
&& pyenv global ${PYTHON_VERSION} \
&& python --version \
&& pip install --upgrade pip \
&& pip --version \
&& pip install pipenv wheel poetry
# Switch to nonroot user (again)
USER nonroot:nonroot
WORKDIR /home/nonroot
# Rust
# Please keep the version of llvm (installed above) in sync with rust llvm (`rustc --version --verbose | grep LLVM`)
ENV RUSTC_VERSION=1.81.0
ENV RUSTUP_HOME="/home/nonroot/.rustup"
ENV PATH="/home/nonroot/.cargo/bin:${PATH}"
ARG RUSTFILT_VERSION=0.2.1
ARG CARGO_HAKARI_VERSION=0.9.30
ARG CARGO_DENY_VERSION=0.16.1
ARG CARGO_HACK_VERSION=0.6.31
ARG CARGO_NEXTEST_VERSION=0.9.72
RUN curl -sSO https://static.rust-lang.org/rustup/dist/$(uname -m)-unknown-linux-gnu/rustup-init && whoami && \
chmod +x rustup-init && \
./rustup-init -y --default-toolchain ${RUSTC_VERSION} && \
rm rustup-init && \
export PATH="$HOME/.cargo/bin:$PATH" && \
. "$HOME/.cargo/env" && \
cargo --version && rustup --version && \
rustup component add llvm-tools rustfmt clippy && \
cargo install rustfilt --version ${RUSTFILT_VERSION} && \
cargo install cargo-hakari --version ${CARGO_HAKARI_VERSION} && \
cargo install cargo-deny --locked --version ${CARGO_DENY_VERSION} && \
cargo install cargo-hack --version ${CARGO_HACK_VERSION} && \
cargo install cargo-nextest --version ${CARGO_NEXTEST_VERSION} && \
rm -rf /home/nonroot/.cargo/registry && \
rm -rf /home/nonroot/.cargo/git
# Show versions
RUN whoami \
&& python --version \
&& pip --version \
&& cargo --version --verbose \
&& rustup --version --verbose \
&& rustc --version --verbose \
&& clang --version
# Set following flag to check in Makefile if its running in Docker
RUN touch /home/nonroot/.docker_build

File diff suppressed because it is too large Load Diff

285
Makefile
View File

@@ -1,9 +1,21 @@
ROOT_PROJECT_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
# Where to install Postgres, default is ./pg_install, maybe useful for package managers
# Where to install Postgres, default is ./pg_install, maybe useful for package
# managers.
POSTGRES_INSTALL_DIR ?= $(ROOT_PROJECT_DIR)/pg_install/
OPENSSL_PREFIX_DIR := /usr/local/openssl
# Supported PostgreSQL versions
POSTGRES_VERSIONS = v17 v16 v15 v14
# CARGO_BUILD_FLAGS: Extra flags to pass to `cargo build`. `--locked`
# and `--features testing` are popular examples.
#
# CARGO_PROFILE: Set to override the cargo profile to use. By default,
# it is derived from BUILD_TYPE.
# All intermediate build artifacts are stored here.
BUILD_DIR := build
ICU_PREFIX_DIR := /usr/local/icu
#
@@ -11,33 +23,46 @@ ICU_PREFIX_DIR := /usr/local/icu
# environment variable.
#
BUILD_TYPE ?= debug
WITH_SANITIZERS ?= no
PG_CFLAGS = -fsigned-char
ifeq ($(BUILD_TYPE),release)
PG_CONFIGURE_OPTS = --enable-debug --with-openssl
PG_CFLAGS = -O2 -g3 $(CFLAGS)
# Unfortunately, `--profile=...` is a nightly feature
CARGO_BUILD_FLAGS += --release
PG_CFLAGS += -O2 -g3 $(CFLAGS)
PG_LDFLAGS = $(LDFLAGS)
CARGO_PROFILE ?= --profile=release
else ifeq ($(BUILD_TYPE),debug)
PG_CONFIGURE_OPTS = --enable-debug --with-openssl --enable-cassert --enable-depend
PG_CFLAGS = -O0 -g3 $(CFLAGS)
PG_CFLAGS += -O0 -g3 $(CFLAGS)
PG_LDFLAGS = $(LDFLAGS)
CARGO_PROFILE ?= --profile=dev
else
$(error Bad build type '$(BUILD_TYPE)', see Makefile for options)
endif
ifeq ($(WITH_SANITIZERS),yes)
PG_CFLAGS += -fsanitize=address -fsanitize=undefined -fno-sanitize-recover
COPT += -Wno-error # to avoid failing on warnings induced by sanitizers
PG_LDFLAGS = -fsanitize=address -fsanitize=undefined -static-libasan -static-libubsan $(LDFLAGS)
export CC := gcc
export ASAN_OPTIONS := detect_leaks=0
endif
ifeq ($(shell test -e /home/nonroot/.docker_build && echo -n yes),yes)
# Exclude static build openssl, icu for local build (MacOS, Linux)
# Only keep for build type release and debug
PG_CFLAGS += -I$(OPENSSL_PREFIX_DIR)/include
PG_CONFIGURE_OPTS += --with-icu
PG_CONFIGURE_OPTS += ICU_CFLAGS='-I/$(ICU_PREFIX_DIR)/include -DU_STATIC_IMPLEMENTATION'
PG_CONFIGURE_OPTS += ICU_LIBS='-L$(ICU_PREFIX_DIR)/lib -L$(ICU_PREFIX_DIR)/lib64 -licui18n -licuuc -licudata -lstdc++ -Wl,-Bdynamic -lm'
PG_CONFIGURE_OPTS += LDFLAGS='-L$(OPENSSL_PREFIX_DIR)/lib -L$(OPENSSL_PREFIX_DIR)/lib64 -L$(ICU_PREFIX_DIR)/lib -L$(ICU_PREFIX_DIR)/lib64 -Wl,-Bstatic -lssl -lcrypto -Wl,-Bdynamic -lrt -lm -ldl -lpthread'
endif
UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Linux)
# Seccomp BPF is only available for Linux
PG_CONFIGURE_OPTS += --with-libseccomp
ifneq ($(WITH_SANITIZERS),yes)
PG_CONFIGURE_OPTS += --with-libseccomp
endif
else ifeq ($(UNAME_S),Darwin)
PG_CFLAGS += -DUSE_PREFETCH
ifndef DISABLE_HOMEBREW
# macOS with brew-installed openssl requires explicit paths
# It can be configured with OPENSSL_PREFIX variable
@@ -66,8 +91,6 @@ CARGO_BUILD_FLAGS += $(filter -j1,$(MAKEFLAGS))
CARGO_CMD_PREFIX += $(if $(filter n,$(MAKEFLAGS)),,+)
# Force cargo not to print progress bar
CARGO_CMD_PREFIX += CARGO_TERM_PROGRESS_WHEN=never CI=1
# Set PQ_LIB_DIR to make sure `storage_controller` get linked with bundled libpq (through diesel)
CARGO_CMD_PREFIX += PQ_LIB_DIR=$(POSTGRES_INSTALL_DIR)/v16/lib
CACHEDIR_TAG_CONTENTS := "Signature: 8a477f597d28d172789f06886806bc55"
@@ -75,135 +98,29 @@ CACHEDIR_TAG_CONTENTS := "Signature: 8a477f597d28d172789f06886806bc55"
# Top level Makefile to build Neon and PostgreSQL
#
.PHONY: all
all: neon postgres neon-pg-ext
all: neon postgres-install neon-pg-ext
### Neon Rust bits
#
# The 'postgres_ffi' depends on the Postgres headers.
.PHONY: neon
neon: postgres-headers walproposer-lib cargo-target-dir
neon: postgres-headers-install walproposer-lib cargo-target-dir
+@echo "Compiling Neon"
$(CARGO_CMD_PREFIX) cargo build $(CARGO_BUILD_FLAGS)
$(CARGO_CMD_PREFIX) cargo build $(CARGO_BUILD_FLAGS) $(CARGO_PROFILE)
.PHONY: cargo-target-dir
cargo-target-dir:
# https://github.com/rust-lang/cargo/issues/14281
mkdir -p target
test -e target/CACHEDIR.TAG || echo "$(CACHEDIR_TAG_CONTENTS)" > target/CACHEDIR.TAG
### PostgreSQL parts
# Some rules are duplicated for Postgres v14 and 15. We may want to refactor
# to avoid the duplication in the future, but it's tolerable for now.
#
$(POSTGRES_INSTALL_DIR)/build/%/config.status:
mkdir -p $(POSTGRES_INSTALL_DIR)
test -e $(POSTGRES_INSTALL_DIR)/CACHEDIR.TAG || echo "$(CACHEDIR_TAG_CONTENTS)" > $(POSTGRES_INSTALL_DIR)/CACHEDIR.TAG
+@echo "Configuring Postgres $* build"
@test -s $(ROOT_PROJECT_DIR)/vendor/postgres-$*/configure || { \
echo "\nPostgres submodule not found in $(ROOT_PROJECT_DIR)/vendor/postgres-$*/, execute "; \
echo "'git submodule update --init --recursive --depth 2 --progress .' in project root.\n"; \
exit 1; }
mkdir -p $(POSTGRES_INSTALL_DIR)/build/$*
VERSION=$*; \
EXTRA_VERSION=$$(cd $(ROOT_PROJECT_DIR)/vendor/postgres-$$VERSION && git rev-parse HEAD); \
(cd $(POSTGRES_INSTALL_DIR)/build/$$VERSION && \
env PATH="$(EXTRA_PATH_OVERRIDES):$$PATH" $(ROOT_PROJECT_DIR)/vendor/postgres-$$VERSION/configure \
CFLAGS='$(PG_CFLAGS)' \
$(PG_CONFIGURE_OPTS) --with-extra-version=" ($$EXTRA_VERSION)" \
--prefix=$(abspath $(POSTGRES_INSTALL_DIR))/$$VERSION > configure.log)
# nicer alias to run 'configure'
# Note: I've been unable to use templates for this part of our configuration.
# I'm not sure why it wouldn't work, but this is the only place (apart from
# the "build-all-versions" entry points) where direct mention of PostgreSQL
# versions is used.
.PHONY: postgres-configure-v16
postgres-configure-v16: $(POSTGRES_INSTALL_DIR)/build/v16/config.status
.PHONY: postgres-configure-v15
postgres-configure-v15: $(POSTGRES_INSTALL_DIR)/build/v15/config.status
.PHONY: postgres-configure-v14
postgres-configure-v14: $(POSTGRES_INSTALL_DIR)/build/v14/config.status
# Install the PostgreSQL header files into $(POSTGRES_INSTALL_DIR)/<version>/include
.PHONY: postgres-headers-%
postgres-headers-%: postgres-configure-%
+@echo "Installing PostgreSQL $* headers"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/src/include MAKELEVEL=0 install
# Compile and install PostgreSQL
.PHONY: postgres-%
postgres-%: postgres-configure-% \
postgres-headers-% # to prevent `make install` conflicts with neon's `postgres-headers`
+@echo "Compiling PostgreSQL $*"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$* MAKELEVEL=0 install
+@echo "Compiling libpq $*"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/src/interfaces/libpq install
+@echo "Compiling pg_prewarm $*"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pg_prewarm install
+@echo "Compiling pg_buffercache $*"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pg_buffercache install
+@echo "Compiling pageinspect $*"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pageinspect install
+@echo "Compiling amcheck $*"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/amcheck install
+@echo "Compiling test_decoding $*"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/test_decoding install
.PHONY: postgres-clean-%
postgres-clean-%:
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$* MAKELEVEL=0 clean
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pg_buffercache clean
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pageinspect clean
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/src/interfaces/libpq clean
.PHONY: postgres-check-%
postgres-check-%: postgres-%
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$* MAKELEVEL=0 check
.PHONY: neon-pg-ext-%
neon-pg-ext-%: postgres-%
+@echo "Compiling neon $*"
mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-$*
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/neon-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile install
+@echo "Compiling neon_walredo $*"
mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-walredo-$*
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/neon-walredo-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_walredo/Makefile install
+@echo "Compiling neon_rmgr $*"
mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-rmgr-$*
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/neon-rmgr-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_rmgr/Makefile install
+@echo "Compiling neon_test_utils $*"
mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$*
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_test_utils/Makefile install
+@echo "Compiling neon_utils $*"
mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-utils-$*
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/neon-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_utils/Makefile install
.PHONY: neon-pg-clean-ext-%
neon-pg-clean-ext-%:
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile clean
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-walredo-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_walredo/Makefile clean
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_test_utils/Makefile clean
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/neon-utils-$* \
-f $(ROOT_PROJECT_DIR)/pgxn/neon_utils/Makefile clean
neon-pg-ext-%: postgres-install-%
+@echo "Compiling neon-specific Postgres extensions for $*"
mkdir -p $(BUILD_DIR)/pgxn-$*
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config COPT='$(COPT)' \
-C $(BUILD_DIR)/pgxn-$*\
-f $(ROOT_PROJECT_DIR)/pgxn/Makefile install
# Build walproposer as a static library. walproposer source code is located
# in the pgxn/neon directory.
@@ -215,78 +132,36 @@ neon-pg-clean-ext-%:
# they depend on openssl and other libraries that are not included in our
# Rust build.
.PHONY: walproposer-lib
walproposer-lib: neon-pg-ext-v16
walproposer-lib: neon-pg-ext-v17
+@echo "Compiling walproposer-lib"
mkdir -p $(POSTGRES_INSTALL_DIR)/build/walproposer-lib
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v16/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
-C $(POSTGRES_INSTALL_DIR)/build/walproposer-lib \
mkdir -p $(BUILD_DIR)/walproposer-lib
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v17/bin/pg_config COPT='$(COPT)' \
-C $(BUILD_DIR)/walproposer-lib \
-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile walproposer-lib
cp $(POSTGRES_INSTALL_DIR)/v16/lib/libpgport.a $(POSTGRES_INSTALL_DIR)/build/walproposer-lib
cp $(POSTGRES_INSTALL_DIR)/v16/lib/libpgcommon.a $(POSTGRES_INSTALL_DIR)/build/walproposer-lib
ifeq ($(UNAME_S),Linux)
$(AR) d $(POSTGRES_INSTALL_DIR)/build/walproposer-lib/libpgport.a \
cp $(POSTGRES_INSTALL_DIR)/v17/lib/libpgport.a $(BUILD_DIR)/walproposer-lib
cp $(POSTGRES_INSTALL_DIR)/v17/lib/libpgcommon.a $(BUILD_DIR)/walproposer-lib
$(AR) d $(BUILD_DIR)/walproposer-lib/libpgport.a \
pg_strong_random.o
$(AR) d $(POSTGRES_INSTALL_DIR)/build/walproposer-lib/libpgcommon.a \
pg_crc32c.o \
hmac_openssl.o \
$(AR) d $(BUILD_DIR)/walproposer-lib/libpgcommon.a \
checksum_helper.o \
cryptohash_openssl.o \
scram-common.o \
hmac_openssl.o \
md5_common.o \
checksum_helper.o
parse_manifest.o \
scram-common.o
ifeq ($(UNAME_S),Linux)
$(AR) d $(BUILD_DIR)/walproposer-lib/libpgcommon.a \
pg_crc32c.o
endif
.PHONY: walproposer-lib-clean
walproposer-lib-clean:
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v16/bin/pg_config \
-C $(POSTGRES_INSTALL_DIR)/build/walproposer-lib \
-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile clean
# Shorthand to call neon-pg-ext-% target for all Postgres versions
.PHONY: neon-pg-ext
neon-pg-ext: \
neon-pg-ext-v14 \
neon-pg-ext-v15 \
neon-pg-ext-v16
.PHONY: neon-pg-clean-ext
neon-pg-clean-ext: \
neon-pg-clean-ext-v14 \
neon-pg-clean-ext-v15 \
neon-pg-clean-ext-v16
# shorthand to build all Postgres versions
.PHONY: postgres
postgres: \
postgres-v14 \
postgres-v15 \
postgres-v16
.PHONY: postgres-headers
postgres-headers: \
postgres-headers-v14 \
postgres-headers-v15 \
postgres-headers-v16
.PHONY: postgres-clean
postgres-clean: \
postgres-clean-v14 \
postgres-clean-v15 \
postgres-clean-v16
.PHONY: postgres-check
postgres-check: \
postgres-check-v14 \
postgres-check-v15 \
postgres-check-v16
# This doesn't remove the effects of 'configure'.
.PHONY: clean
clean: postgres-clean neon-pg-clean-ext
$(CARGO_CMD_PREFIX) cargo clean
neon-pg-ext: $(foreach pg_version,$(POSTGRES_VERSIONS),neon-pg-ext-$(pg_version))
# This removes everything
.PHONY: distclean
distclean:
rm -rf $(POSTGRES_INSTALL_DIR)
$(RM) -r $(POSTGRES_INSTALL_DIR) $(BUILD_DIR)
$(CARGO_CMD_PREFIX) cargo clean
.PHONY: fmt
@@ -295,7 +170,7 @@ fmt:
postgres-%-pg-bsd-indent: postgres-%
+@echo "Compiling pg_bsd_indent"
$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/src/tools/pg_bsd_indent/
$(MAKE) -C $(BUILD_DIR)/$*/src/tools/pg_bsd_indent/
# Create typedef list for the core. Note that generally it should be combined with
# buildfarm one to cover platform specific stuff.
@@ -314,23 +189,39 @@ postgres-%-pgindent: postgres-%-pg-bsd-indent postgres-%-typedefs.list
cat $(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/pgindent/typedefs.list |\
cat - postgres-$*-typedefs.list | sort | uniq > postgres-$*-typedefs-full.list
+@echo note: you might want to run it on selected files/dirs instead.
INDENT=$(POSTGRES_INSTALL_DIR)/build/$*/src/tools/pg_bsd_indent/pg_bsd_indent \
INDENT=$(BUILD_DIR)/$*/src/tools/pg_bsd_indent/pg_bsd_indent \
$(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/pgindent/pgindent --typedefs postgres-$*-typedefs-full.list \
$(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/ \
--excludes $(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/pgindent/exclude_file_patterns
rm -f pg*.BAK
$(RM) pg*.BAK
# Indent pxgn/neon.
.PHONY: pgindent
neon-pgindent: postgres-v16-pg-bsd-indent neon-pg-ext-v16
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v16/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \
FIND_TYPEDEF=$(ROOT_PROJECT_DIR)/vendor/postgres-v16/src/tools/find_typedef \
INDENT=$(POSTGRES_INSTALL_DIR)/build/v16/src/tools/pg_bsd_indent/pg_bsd_indent \
PGINDENT_SCRIPT=$(ROOT_PROJECT_DIR)/vendor/postgres-v16/src/tools/pgindent/pgindent \
-C $(POSTGRES_INSTALL_DIR)/build/neon-v16 \
.PHONY: neon-pgindent
neon-pgindent: postgres-v17-pg-bsd-indent neon-pg-ext-v17
$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v17/bin/pg_config COPT='$(COPT)' \
FIND_TYPEDEF=$(ROOT_PROJECT_DIR)/vendor/postgres-v17/src/tools/find_typedef \
INDENT=$(BUILD_DIR)/v17/src/tools/pg_bsd_indent/pg_bsd_indent \
PGINDENT_SCRIPT=$(ROOT_PROJECT_DIR)/vendor/postgres-v17/src/tools/pgindent/pgindent \
-C $(BUILD_DIR)/neon-v17 \
-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile pgindent
.PHONY: setup-pre-commit-hook
setup-pre-commit-hook:
ln -s -f $(ROOT_PROJECT_DIR)/pre-commit.py .git/hooks/pre-commit
# Targets for building PostgreSQL are defined in postgres.mk.
#
# But if the caller has indicated that PostgreSQL is already
# installed, by setting the PG_INSTALL_CACHED variable, skip it.
ifdef PG_INSTALL_CACHED
postgres-install: skip-install
$(foreach pg_version,$(POSTGRES_VERSIONS),postgres-install-$(pg_version)): skip-install
postgres-headers-install:
+@echo "Skipping installation of PostgreSQL headers because PG_INSTALL_CACHED is set"
skip-install:
+@echo "Skipping PostgreSQL installation because PG_INSTALL_CACHED is set"
else
include postgres.mk
endif

View File

@@ -21,8 +21,10 @@ The Neon storage engine consists of two major components:
See developer documentation in [SUMMARY.md](/docs/SUMMARY.md) for more information.
## Running local installation
## Running a local development environment
Neon can be run on a workstation for small experiments and to test code changes, by
following these instructions.
#### Installing dependencies on Linux
1. Install build dependencies and other applicable packages
@@ -31,7 +33,7 @@ See developer documentation in [SUMMARY.md](/docs/SUMMARY.md) for more informati
```bash
apt install build-essential libtool libreadline-dev zlib1g-dev flex bison libseccomp-dev \
libssl-dev clang pkg-config libpq-dev cmake postgresql-client protobuf-compiler \
libcurl4-openssl-dev openssl python3-poetry lsof libicu-dev
libprotobuf-dev libcurl4-openssl-dev openssl python3-poetry lsof libicu-dev
```
* On Fedora, these packages are needed:
```bash
@@ -58,7 +60,7 @@ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
1. Install XCode and dependencies
```
xcode-select --install
brew install protobuf openssl flex bison icu4c pkg-config
brew install protobuf openssl flex bison icu4c pkg-config m4
# add openssl to PATH, required for ed25519 keys generation in neon_local
echo 'export PATH="$(brew --prefix openssl)/bin:$PATH"' >> ~/.zshrc
@@ -132,7 +134,7 @@ make -j`sysctl -n hw.logicalcpu` -s
To run the `psql` client, install the `postgresql-client` package or modify `PATH` and `LD_LIBRARY_PATH` to include `pg_install/bin` and `pg_install/lib`, respectively.
To run the integration tests or Python scripts (not required to use the code), install
Python (3.9 or higher), and install the python3 packages using `./scripts/pysync` (requires [poetry>=1.8](https://python-poetry.org/)) in the project directory.
Python (3.11 or higher), and install the python3 packages using `./scripts/pysync` (requires [poetry>=1.8](https://python-poetry.org/)) in the project directory.
#### Running neon database
@@ -238,7 +240,7 @@ postgres=# select * from t;
> cargo neon stop
```
More advanced usages can be found at [Control Plane and Neon Local](./control_plane/README.md).
More advanced usages can be found at [Local Development Control Plane (`neon_local`))](./control_plane/README.md).
#### Handling build failures
@@ -268,7 +270,7 @@ By default, this runs both debug and release modes, and all supported postgres v
testing locally, it is convenient to run just one set of permutations, like this:
```sh
DEFAULT_PG_VERSION=16 BUILD_TYPE=release ./scripts/pytest
DEFAULT_PG_VERSION=17 BUILD_TYPE=release ./scripts/pytest
```
## Flamegraphs

341
build-tools.Dockerfile Normal file
View File

@@ -0,0 +1,341 @@
ARG DEBIAN_VERSION=bookworm
ARG DEBIAN_FLAVOR=${DEBIAN_VERSION}-slim
# Here are the INDEX DIGESTS for the images we use.
# You can get them following next steps for now:
# 1. Get an authentication token from DockerHub:
# TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/debian:pull" | jq -r .token)
# 2. Using that token, query index for the given tag:
# curl -s -H "Authorization: Bearer $TOKEN" \
# -H "Accept: application/vnd.docker.distribution.manifest.list.v2+json" \
# "https://registry.hub.docker.com/v2/library/debian/manifests/bullseye-slim" \
# -I | grep -i docker-content-digest
# 3. As a next step, TODO(fedordikarev): create script and schedule workflow to run these checks
# and updates on regular bases and in automated way.
ARG BOOKWORM_SLIM_SHA=sha256:40b107342c492725bc7aacbe93a49945445191ae364184a6d24fedb28172f6f7
ARG BULLSEYE_SLIM_SHA=sha256:e831d9a884d63734fe3dd9c491ed9a5a3d4c6a6d32c5b14f2067357c49b0b7e1
# Here we use ${var/search/replace} syntax, to check
# if base image is one of the images, we pin image index for.
# If var will match one the known images, we will replace it with the known sha.
# If no match, than value will be unaffected, and will process with no-pinned image.
ARG BASE_IMAGE_SHA=debian:${DEBIAN_FLAVOR}
ARG BASE_IMAGE_SHA=${BASE_IMAGE_SHA/debian:bookworm-slim/debian@$BOOKWORM_SLIM_SHA}
ARG BASE_IMAGE_SHA=${BASE_IMAGE_SHA/debian:bullseye-slim/debian@$BULLSEYE_SLIM_SHA}
FROM $BASE_IMAGE_SHA AS pgcopydb_builder
ARG DEBIAN_VERSION
# Use strict mode for bash to catch errors early
SHELL ["/bin/bash", "-euo", "pipefail", "-c"]
# By default, /bin/sh used in debian images will treat '\n' as eol,
# but as we use bash as SHELL, and built-in echo in bash requires '-e' flag for that.
RUN echo 'Acquire::Retries "5";' > /etc/apt/apt.conf.d/80-retries && \
echo -e "retry_connrefused=on\ntimeout=15\ntries=5\nretry-on-host-error=on\n" > /root/.wgetrc && \
echo -e "--retry-connrefused\n--connect-timeout 15\n--retry 5\n--max-time 300\n" > /root/.curlrc
COPY build_tools/patches/pgcopydbv017.patch /pgcopydbv017.patch
RUN if [ "${DEBIAN_VERSION}" = "bookworm" ]; then \
set -e && \
apt update && \
apt install -y --no-install-recommends \
ca-certificates wget gpg && \
wget -qO - https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor -o /usr/share/keyrings/postgresql-keyring.gpg && \
echo "deb [signed-by=/usr/share/keyrings/postgresql-keyring.gpg] http://apt.postgresql.org/pub/repos/apt bookworm-pgdg main" > /etc/apt/sources.list.d/pgdg.list && \
apt-get update && \
apt install -y --no-install-recommends \
build-essential \
autotools-dev \
libedit-dev \
libgc-dev \
libpam0g-dev \
libreadline-dev \
libselinux1-dev \
libxslt1-dev \
libssl-dev \
libkrb5-dev \
zlib1g-dev \
liblz4-dev \
libpq5 \
libpq-dev \
libzstd-dev \
postgresql-16 \
postgresql-server-dev-16 \
postgresql-common \
python3-sphinx && \
wget -O /tmp/pgcopydb.tar.gz https://github.com/dimitri/pgcopydb/archive/refs/tags/v0.17.tar.gz && \
mkdir /tmp/pgcopydb && \
tar -xzf /tmp/pgcopydb.tar.gz -C /tmp/pgcopydb --strip-components=1 && \
cd /tmp/pgcopydb && \
patch -p1 < /pgcopydbv017.patch && \
make -s clean && \
make -s -j12 install && \
libpq_path=$(find /lib /usr/lib -name "libpq.so.5" | head -n 1) && \
mkdir -p /pgcopydb/lib && \
cp "$libpq_path" /pgcopydb/lib/; \
else \
# copy command below will fail if we don't have dummy files, so we create them for other debian versions
mkdir -p /usr/lib/postgresql/16/bin && touch /usr/lib/postgresql/16/bin/pgcopydb && \
mkdir -p mkdir -p /pgcopydb/lib && touch /pgcopydb/lib/libpq.so.5; \
fi
FROM $BASE_IMAGE_SHA AS build_tools
ARG DEBIAN_VERSION
# Add nonroot user
RUN useradd -ms /bin/bash nonroot -b /home
# Use strict mode for bash to catch errors early
SHELL ["/bin/bash", "-euo", "pipefail", "-c"]
RUN mkdir -p /pgcopydb/bin && \
mkdir -p /pgcopydb/lib && \
chmod -R 755 /pgcopydb && \
chown -R nonroot:nonroot /pgcopydb
COPY --from=pgcopydb_builder /usr/lib/postgresql/16/bin/pgcopydb /pgcopydb/bin/pgcopydb
COPY --from=pgcopydb_builder /pgcopydb/lib/libpq.so.5 /pgcopydb/lib/libpq.so.5
RUN echo 'Acquire::Retries "5";' > /etc/apt/apt.conf.d/80-retries && \
echo -e "retry_connrefused=on\ntimeout=15\ntries=5\nretry-on-host-error=on\n" > /root/.wgetrc && \
echo -e "--retry-connrefused\n--connect-timeout 15\n--retry 5\n--max-time 300\n" > /root/.curlrc
# System deps
#
# 'gdb' is included so that we get backtraces of core dumps produced in
# regression tests
RUN set -e \
&& apt update \
&& apt install -y \
autoconf \
automake \
bison \
build-essential \
ca-certificates \
cmake \
curl \
flex \
gdb \
git \
gnupg \
gzip \
jq \
jsonnet \
libcurl4-openssl-dev \
libbz2-dev \
libffi-dev \
liblzma-dev \
libncurses5-dev \
libncursesw5-dev \
libreadline-dev \
libseccomp-dev \
libsqlite3-dev \
libssl-dev \
$([[ "${DEBIAN_VERSION}" = "bullseye" ]] && echo libstdc++-10-dev || echo libstdc++-11-dev) \
libtool \
libxml2-dev \
libxmlsec1-dev \
libxxhash-dev \
lsof \
make \
netcat-openbsd \
net-tools \
openssh-client \
parallel \
pkg-config \
unzip \
wget \
xz-utils \
zlib1g-dev \
zstd \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# sql_exporter
# Keep the version the same as in compute/compute-node.Dockerfile and
# test_runner/regress/test_compute_metrics.py.
ENV SQL_EXPORTER_VERSION=0.17.3
RUN curl -fsSL \
"https://github.com/burningalchemist/sql_exporter/releases/download/${SQL_EXPORTER_VERSION}/sql_exporter-${SQL_EXPORTER_VERSION}.linux-$(case "$(uname -m)" in x86_64) echo amd64;; aarch64) echo arm64;; esac).tar.gz" \
--output sql_exporter.tar.gz \
&& mkdir /tmp/sql_exporter \
&& tar xzvf sql_exporter.tar.gz -C /tmp/sql_exporter --strip-components=1 \
&& mv /tmp/sql_exporter/sql_exporter /usr/local/bin/sql_exporter \
&& rm sql_exporter.tar.gz
# protobuf-compiler (protoc)
# Keep the version the same as in compute/compute-node.Dockerfile
ENV PROTOC_VERSION=25.1
RUN curl -fsSL "https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOC_VERSION}/protoc-${PROTOC_VERSION}-linux-$(uname -m | sed 's/aarch64/aarch_64/g').zip" -o "protoc.zip" \
&& unzip -q protoc.zip -d protoc \
&& mv protoc/bin/protoc /usr/local/bin/protoc \
&& mv protoc/include/google /usr/local/include/google \
&& rm -rf protoc.zip protoc
# s5cmd
ENV S5CMD_VERSION=2.3.0
RUN curl -sL "https://github.com/peak/s5cmd/releases/download/v${S5CMD_VERSION}/s5cmd_${S5CMD_VERSION}_Linux-$(uname -m | sed 's/x86_64/64bit/g' | sed 's/aarch64/arm64/g').tar.gz" | tar zxvf - s5cmd \
&& chmod +x s5cmd \
&& mv s5cmd /usr/local/bin/s5cmd
# LLVM
ENV LLVM_VERSION=20
RUN curl -fsSL 'https://apt.llvm.org/llvm-snapshot.gpg.key' | apt-key add - \
&& echo "deb http://apt.llvm.org/${DEBIAN_VERSION}/ llvm-toolchain-${DEBIAN_VERSION}-${LLVM_VERSION} main" > /etc/apt/sources.list.d/llvm.stable.list \
&& apt update \
&& apt install -y clang-${LLVM_VERSION} llvm-${LLVM_VERSION} \
&& bash -c 'for f in /usr/bin/clang*-${LLVM_VERSION} /usr/bin/llvm*-${LLVM_VERSION}; do ln -s "${f}" "${f%-${LLVM_VERSION}}"; done' \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# Install docker
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian ${DEBIAN_VERSION} stable" > /etc/apt/sources.list.d/docker.list \
&& apt update \
&& apt install -y docker-ce docker-ce-cli \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# Configure sudo & docker
RUN usermod -aG sudo nonroot && \
echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers && \
usermod -aG docker nonroot
# AWS CLI
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-$(uname -m).zip" -o "awscliv2.zip" \
&& unzip -q awscliv2.zip \
&& ./aws/install \
&& rm awscliv2.zip
# Mold: A Modern Linker
ENV MOLD_VERSION=v2.37.1
RUN set -e \
&& git clone https://github.com/rui314/mold.git \
&& mkdir mold/build \
&& cd mold/build \
&& git checkout ${MOLD_VERSION} \
&& cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=clang++ .. \
&& cmake --build . -j $(nproc) \
&& cmake --install . \
&& cd .. \
&& rm -rf mold
# LCOV
# Build lcov from a fork:
# It includes several bug fixes on top on v2.0 release (https://github.com/linux-test-project/lcov/compare/v2.0...master)
# And patches from us:
# - Generates json file with code coverage summary (https://github.com/neondatabase/lcov/commit/426e7e7a22f669da54278e9b55e6d8caabd00af0.tar.gz)
RUN set +o pipefail && \
for package in Capture::Tiny DateTime Devel::Cover Digest::MD5 File::Spec JSON::XS Memory::Process Time::HiRes JSON; do \
yes | perl -MCPAN -e "CPAN::Shell->notest('install', '$package')";\
done && \
set -o pipefail
# Split into separate step to debug flaky failures here
RUN wget https://github.com/neondatabase/lcov/archive/426e7e7a22f669da54278e9b55e6d8caabd00af0.tar.gz -O lcov.tar.gz \
&& ls -laht lcov.tar.gz && sha256sum lcov.tar.gz \
&& echo "61a22a62e20908b8b9e27d890bd0ea31f567a7b9668065589266371dcbca0992 lcov.tar.gz" | sha256sum --check \
&& mkdir -p lcov && tar -xzf lcov.tar.gz -C lcov --strip-components=1 \
&& cd lcov \
&& make install \
&& rm -rf ../lcov.tar.gz
# Use the same version of libicu as the compute nodes so that
# clusters created using inidb on pageserver can be used by computes.
#
# TODO: at this time, compute-node.Dockerfile uses the debian bullseye libicu
# package, which is 67.1. We're duplicating that knowledge here, and also, technically,
# Debian has a few patches on top of 67.1 that we're not adding here.
ENV ICU_VERSION=67.1
ENV ICU_PREFIX=/usr/local/icu
# Download and build static ICU
RUN wget -O /tmp/libicu-${ICU_VERSION}.tgz https://github.com/unicode-org/icu/releases/download/release-${ICU_VERSION//./-}/icu4c-${ICU_VERSION//./_}-src.tgz && \
echo "94a80cd6f251a53bd2a997f6f1b5ac6653fe791dfab66e1eb0227740fb86d5dc /tmp/libicu-${ICU_VERSION}.tgz" | sha256sum --check && \
mkdir /tmp/icu && \
pushd /tmp/icu && \
tar -xzf /tmp/libicu-${ICU_VERSION}.tgz && \
pushd icu/source && \
./configure --prefix=${ICU_PREFIX} --enable-static --enable-shared=no CXXFLAGS="-fPIC" CFLAGS="-fPIC" && \
make -j "$(nproc)" && \
make install && \
popd && \
rm -rf icu && \
rm -f /tmp/libicu-${ICU_VERSION}.tgz && \
popd
# Switch to nonroot user
USER nonroot:nonroot
WORKDIR /home/nonroot
RUN echo -e "--retry-connrefused\n--connect-timeout 15\n--retry 5\n--max-time 300\n" > /home/nonroot/.curlrc
# Python
ENV PYTHON_VERSION=3.11.12 \
PYENV_ROOT=/home/nonroot/.pyenv \
PATH=/home/nonroot/.pyenv/shims:/home/nonroot/.pyenv/bin:/home/nonroot/.poetry/bin:$PATH
RUN set -e \
&& cd $HOME \
&& curl -sSO https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer \
&& chmod +x pyenv-installer \
&& ./pyenv-installer \
&& export PYENV_ROOT=/home/nonroot/.pyenv \
&& export PATH="$PYENV_ROOT/bin:$PATH" \
&& export PATH="$PYENV_ROOT/shims:$PATH" \
&& pyenv install ${PYTHON_VERSION} \
&& pyenv global ${PYTHON_VERSION} \
&& python --version \
&& pip install --upgrade pip \
&& pip --version \
&& pip install pipenv wheel poetry
# Switch to nonroot user (again)
USER nonroot:nonroot
WORKDIR /home/nonroot
# Rust
# Please keep the version of llvm (installed above) in sync with rust llvm (`rustc --version --verbose | grep LLVM`)
ENV RUSTC_VERSION=1.88.0
ENV RUSTUP_HOME="/home/nonroot/.rustup"
ENV PATH="/home/nonroot/.cargo/bin:${PATH}"
ARG RUSTFILT_VERSION=0.2.1
ARG CARGO_HAKARI_VERSION=0.9.36
ARG CARGO_DENY_VERSION=0.18.2
ARG CARGO_HACK_VERSION=0.6.36
ARG CARGO_NEXTEST_VERSION=0.9.94
ARG CARGO_CHEF_VERSION=0.1.71
ARG CARGO_DIESEL_CLI_VERSION=2.2.9
RUN curl -sSO https://static.rust-lang.org/rustup/dist/$(uname -m)-unknown-linux-gnu/rustup-init && whoami && \
chmod +x rustup-init && \
./rustup-init -y --default-toolchain ${RUSTC_VERSION} && \
rm rustup-init && \
export PATH="$HOME/.cargo/bin:$PATH" && \
. "$HOME/.cargo/env" && \
cargo --version && rustup --version && \
rustup component add llvm-tools rustfmt clippy && \
cargo install rustfilt --version ${RUSTFILT_VERSION} --locked && \
cargo install cargo-hakari --version ${CARGO_HAKARI_VERSION} --locked && \
cargo install cargo-deny --version ${CARGO_DENY_VERSION} --locked && \
cargo install cargo-hack --version ${CARGO_HACK_VERSION} --locked && \
cargo install cargo-nextest --version ${CARGO_NEXTEST_VERSION} --locked && \
cargo install cargo-chef --version ${CARGO_CHEF_VERSION} --locked && \
cargo install diesel_cli --version ${CARGO_DIESEL_CLI_VERSION} --locked \
--features postgres-bundled --no-default-features && \
rm -rf /home/nonroot/.cargo/registry && \
rm -rf /home/nonroot/.cargo/git
# Show versions
RUN whoami \
&& python --version \
&& pip --version \
&& cargo --version --verbose \
&& rustup --version --verbose \
&& rustc --version --verbose \
&& clang --version
RUN if [ "${DEBIAN_VERSION}" = "bookworm" ]; then \
LD_LIBRARY_PATH=/pgcopydb/lib /pgcopydb/bin/pgcopydb --version; \
else \
echo "pgcopydb is not available for ${DEBIAN_VERSION}"; \
fi
# Set following flag to check in Makefile if its running in Docker
RUN touch /home/nonroot/.docker_build

View File

@@ -0,0 +1,57 @@
diff --git a/src/bin/pgcopydb/copydb.c b/src/bin/pgcopydb/copydb.c
index d730b03..69a9be9 100644
--- a/src/bin/pgcopydb/copydb.c
+++ b/src/bin/pgcopydb/copydb.c
@@ -44,6 +44,7 @@ GUC dstSettings[] = {
{ "synchronous_commit", "'off'" },
{ "statement_timeout", "0" },
{ "lock_timeout", "0" },
+ { "idle_in_transaction_session_timeout", "0" },
{ NULL, NULL },
};
diff --git a/src/bin/pgcopydb/pgsql.c b/src/bin/pgcopydb/pgsql.c
index 94f2f46..e051ba8 100644
--- a/src/bin/pgcopydb/pgsql.c
+++ b/src/bin/pgcopydb/pgsql.c
@@ -2319,6 +2319,11 @@ pgsql_execute_log_error(PGSQL *pgsql,
LinesBuffer lbuf = { 0 };
+ if (message != NULL){
+ // make sure message is writable by splitLines
+ message = strdup(message);
+ }
+
if (!splitLines(&lbuf, message))
{
/* errors have already been logged */
@@ -2332,6 +2337,7 @@ pgsql_execute_log_error(PGSQL *pgsql,
PQbackendPID(pgsql->connection),
lbuf.lines[lineNumber]);
}
+ free(message); // free copy of message we created above
if (pgsql->logSQL)
{
@@ -3174,11 +3180,18 @@ pgcopy_log_error(PGSQL *pgsql, PGresult *res, const char *context)
/* errors have already been logged */
return;
}
-
if (res != NULL)
{
char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
- strlcpy(pgsql->sqlstate, sqlstate, sizeof(pgsql->sqlstate));
+ if (sqlstate == NULL)
+ {
+ // PQresultErrorField returned NULL!
+ pgsql->sqlstate[0] = '\0'; // Set to an empty string to avoid segfault
+ }
+ else
+ {
+ strlcpy(pgsql->sqlstate, sqlstate, sizeof(pgsql->sqlstate));
+ }
}
char *endpoint =

View File

@@ -12,3 +12,5 @@ disallowed-macros = [
# cannot disallow this, because clippy finds used from tokio macros
#"tokio::pin",
]
allow-unwrap-in-tests = true

8
compute/.gitignore vendored Normal file
View File

@@ -0,0 +1,8 @@
# sql_exporter config files generated from Jsonnet
etc/neon_collector.yml
etc/neon_collector_autoscaling.yml
etc/sql_exporter.yml
etc/sql_exporter_autoscaling.yml
# Node.js dependencies
node_modules/

58
compute/Makefile Normal file
View File

@@ -0,0 +1,58 @@
jsonnet_files = $(wildcard \
etc/*.jsonnet \
etc/sql_exporter/*.libsonnet)
.PHONY: all
all: neon_collector.yml neon_collector_autoscaling.yml sql_exporter.yml sql_exporter_autoscaling.yml
neon_collector.yml: $(jsonnet_files)
JSONNET_PATH=jsonnet:etc jsonnet \
--output-file etc/$@ \
--ext-str pg_version=$(PG_VERSION) \
etc/neon_collector.jsonnet
neon_collector_autoscaling.yml: $(jsonnet_files)
JSONNET_PATH=jsonnet:etc jsonnet \
--output-file etc/$@ \
--ext-str pg_version=$(PG_VERSION) \
etc/neon_collector_autoscaling.jsonnet
sql_exporter.yml: $(jsonnet_files)
JSONNET_PATH=etc jsonnet \
--output-file etc/$@ \
--tla-str collector_name=neon_collector \
--tla-str collector_file=neon_collector.yml \
--tla-str 'connection_string=postgresql://cloud_admin@127.0.0.1:5432/postgres?sslmode=disable&application_name=sql_exporter&pgaudit.log=none' \
etc/sql_exporter.jsonnet
sql_exporter_autoscaling.yml: $(jsonnet_files)
JSONNET_PATH=etc jsonnet \
--output-file etc/$@ \
--tla-str collector_name=neon_collector_autoscaling \
--tla-str collector_file=neon_collector_autoscaling.yml \
--tla-str 'connection_string=postgresql://cloud_admin@127.0.0.1:5432/postgres?sslmode=disable&application_name=sql_exporter_autoscaling&pgaudit.log=none' \
etc/sql_exporter.jsonnet
.PHONY: clean
clean:
$(RM) \
etc/neon_collector.yml \
etc/neon_collector_autoscaling.yml \
etc/sql_exporter.yml \
etc/sql_exporter_autoscaling.yml
.PHONY: jsonnetfmt-test
jsonnetfmt-test:
jsonnetfmt --test $(jsonnet_files)
.PHONY: jsonnetfmt-format
jsonnetfmt-format:
jsonnetfmt --in-place $(jsonnet_files)
.PHONY: manifest-schema-validation
manifest-schema-validation: node_modules
node_modules/.bin/jsonschema validate -d https://json-schema.org/draft/2020-12/schema manifest.schema.json manifest.yaml
node_modules: package.json
npm install
touch node_modules

21
compute/README.md Normal file
View File

@@ -0,0 +1,21 @@
This directory contains files that are needed to build the compute
images, or included in the compute images.
compute-node.Dockerfile
To build the compute image
vm-image-spec.yaml
Instructions for vm-builder, to turn the compute-node image into
corresponding vm-compute-node image.
etc/
Configuration files included in /etc in the compute image
patches/
Some extensions need to be patched to work with Neon. This
directory contains such patches. They are applied to the extension
sources in compute-node.Dockerfile
In addition to these, postgres itself, the neon postgres extension,
and compute_ctl are built and copied into the compute image by
compute-node.Dockerfile.

File diff suppressed because it is too large Load Diff

17
compute/etc/README.md Normal file
View File

@@ -0,0 +1,17 @@
# Compute Configuration
These files are the configuration files for various other pieces of software
that will be running in the compute alongside Postgres.
## `sql_exporter`
### Adding a `sql_exporter` Metric
We use `sql_exporter` to export various metrics from Postgres. In order to add
a metric, you will need to create two files: a `libsonnet` and a `sql` file. You
will then import the `libsonnet` file in one of the collector files, and the
`sql` file will be imported in the `libsonnet` file.
In the event your statistic is an LSN, you may want to cast it to a `float8`
because Prometheus only supports floats. It's probably fine because `float8` can
store integers from `-2^53` to `+2^53` exactly.

View File

@@ -0,0 +1 @@
/usr/local/lib

View File

@@ -0,0 +1,58 @@
{
collector_name: 'neon_collector',
metrics: [
import 'sql_exporter/checkpoints_req.libsonnet',
import 'sql_exporter/checkpoints_timed.libsonnet',
import 'sql_exporter/compute_backpressure_throttling_seconds_total.libsonnet',
import 'sql_exporter/compute_current_lsn.libsonnet',
import 'sql_exporter/compute_logical_snapshot_files.libsonnet',
import 'sql_exporter/compute_logical_snapshots_bytes.libsonnet',
import 'sql_exporter/compute_max_connections.libsonnet',
import 'sql_exporter/compute_receive_lsn.libsonnet',
import 'sql_exporter/compute_subscriptions_count.libsonnet',
import 'sql_exporter/connection_counts.libsonnet',
import 'sql_exporter/db_total_size.libsonnet',
import 'sql_exporter/file_cache_read_wait_seconds_bucket.libsonnet',
import 'sql_exporter/file_cache_read_wait_seconds_count.libsonnet',
import 'sql_exporter/file_cache_read_wait_seconds_sum.libsonnet',
import 'sql_exporter/file_cache_write_wait_seconds_bucket.libsonnet',
import 'sql_exporter/file_cache_write_wait_seconds_count.libsonnet',
import 'sql_exporter/file_cache_write_wait_seconds_sum.libsonnet',
import 'sql_exporter/getpage_prefetch_discards_total.libsonnet',
import 'sql_exporter/getpage_prefetch_misses_total.libsonnet',
import 'sql_exporter/getpage_prefetch_requests_total.libsonnet',
import 'sql_exporter/getpage_prefetches_buffered.libsonnet',
import 'sql_exporter/getpage_sync_requests_total.libsonnet',
import 'sql_exporter/compute_getpage_stuck_requests_total.libsonnet',
import 'sql_exporter/compute_getpage_max_inflight_stuck_time_ms.libsonnet',
import 'sql_exporter/getpage_wait_seconds_bucket.libsonnet',
import 'sql_exporter/getpage_wait_seconds_count.libsonnet',
import 'sql_exporter/getpage_wait_seconds_sum.libsonnet',
import 'sql_exporter/lfc_approximate_working_set_size.libsonnet',
import 'sql_exporter/lfc_approximate_working_set_size_windows.libsonnet',
import 'sql_exporter/lfc_cache_size_limit.libsonnet',
import 'sql_exporter/lfc_chunk_size.libsonnet',
import 'sql_exporter/lfc_hits.libsonnet',
import 'sql_exporter/lfc_misses.libsonnet',
import 'sql_exporter/lfc_used.libsonnet',
import 'sql_exporter/lfc_used_pages.libsonnet',
import 'sql_exporter/lfc_writes.libsonnet',
import 'sql_exporter/logical_slot_restart_lsn.libsonnet',
import 'sql_exporter/max_cluster_size.libsonnet',
import 'sql_exporter/pageserver_disconnects_total.libsonnet',
import 'sql_exporter/pageserver_requests_sent_total.libsonnet',
import 'sql_exporter/pageserver_send_flushes_total.libsonnet',
import 'sql_exporter/pageserver_open_requests.libsonnet',
import 'sql_exporter/pg_stats_userdb.libsonnet',
import 'sql_exporter/replication_delay_bytes.libsonnet',
import 'sql_exporter/replication_delay_seconds.libsonnet',
import 'sql_exporter/retained_wal.libsonnet',
import 'sql_exporter/wal_is_lost.libsonnet',
],
queries: [
{
query_name: 'neon_perf_counters',
query: importstr 'sql_exporter/neon_perf_counters.sql',
},
],
}

View File

@@ -0,0 +1,11 @@
{
collector_name: 'neon_collector_autoscaling',
metrics: [
import 'sql_exporter/lfc_approximate_working_set_size_windows.autoscaling.libsonnet',
import 'sql_exporter/lfc_cache_size_limit.libsonnet',
import 'sql_exporter/lfc_hits.libsonnet',
import 'sql_exporter/lfc_misses.libsonnet',
import 'sql_exporter/lfc_used.libsonnet',
import 'sql_exporter/lfc_writes.libsonnet',
],
}

32
compute/etc/pgbouncer.ini Normal file
View File

@@ -0,0 +1,32 @@
[databases]
;; pgbouncer propagates application_name (if it's specified) to the server, but some
;; clients don't set it. We set default application_name=pgbouncer to make it
;; easier to identify pgbouncer connections in Postgres. If client sets
;; application_name, it will be used instead.
*=host=localhost port=5432 auth_user=cloud_admin application_name=pgbouncer
[pgbouncer]
listen_port=6432
listen_addr=0.0.0.0
auth_type=scram-sha-256
auth_user=cloud_admin
auth_dbname=postgres
client_tls_sslmode=disable
server_tls_sslmode=disable
pool_mode=transaction
max_client_conn=10000
default_pool_size=64
max_prepared_statements=0
admin_users=postgres
unix_socket_dir=/tmp/
unix_socket_mode=0777
; required for pgbouncer_exporter
ignore_startup_parameters=extra_float_digits
; pidfile for graceful termination
pidfile=/tmp/pgbouncer.pid
;; Disable connection logging. It produces a lot of logs that no one looks at,
;; and we can get similar log entries from the proxy too. We had incidents in
;; the past where the logging significantly stressed the log device or pgbouncer
;; itself.
log_connections=0
log_disconnections=0

View File

View File

@@ -0,0 +1,40 @@
function(collector_name, collector_file, connection_string) {
// Configuration for sql_exporter for autoscaling-agent
// Global defaults.
global: {
// If scrape_timeout <= 0, no timeout is set unless Prometheus provides one. The default is 10s.
scrape_timeout: '10s',
// Subtracted from Prometheus' scrape_timeout to give us some headroom and prevent Prometheus from timing out first.
scrape_timeout_offset: '500ms',
// Minimum interval between collector runs: by default (0s) collectors are executed on every scrape.
min_interval: '0s',
// Maximum number of open connections to any one target. Metric queries will run concurrently on multiple connections,
// as will concurrent scrapes.
max_connections: 1,
// Maximum number of idle connections to any one target. Unless you use very long collection intervals, this should
// always be the same as max_connections.
max_idle_connections: 1,
// Maximum number of maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse.
// If 0, connections are not closed due to a connection's age.
max_connection_lifetime: '5m',
},
// The target to monitor and the collectors to execute on it.
target: {
// Data source name always has a URI schema that matches the driver name. In some cases (e.g. MySQL)
// the schema gets dropped or replaced to match the driver expected DSN format.
data_source_name: connection_string,
// Collectors (referenced by name) to execute on the target.
// Glob patterns are supported (see <https://pkg.go.dev/path/filepath#Match> for syntax).
collectors: [
collector_name,
],
},
// Collector files specifies a list of globs. One collector definition is read from each matching file.
// Glob patterns are supported (see <https://pkg.go.dev/path/filepath#Match> for syntax).
collector_files: [
collector_file,
],
}

View File

@@ -0,0 +1 @@
SELECT num_requested AS checkpoints_req FROM pg_stat_checkpointer;

View File

@@ -0,0 +1,15 @@
local neon = import 'neon.libsonnet';
local pg_stat_bgwriter = importstr 'sql_exporter/checkpoints_req.sql';
local pg_stat_checkpointer = importstr 'sql_exporter/checkpoints_req.17.sql';
{
metric_name: 'checkpoints_req',
type: 'gauge',
help: 'Number of requested checkpoints',
key_labels: null,
values: [
'checkpoints_req',
],
query: if neon.PG_MAJORVERSION_NUM < 17 then pg_stat_bgwriter else pg_stat_checkpointer,
}

View File

@@ -0,0 +1 @@
SELECT checkpoints_req FROM pg_stat_bgwriter;

View File

@@ -0,0 +1 @@
SELECT num_timed AS checkpoints_timed FROM pg_stat_checkpointer;

View File

@@ -0,0 +1,15 @@
local neon = import 'neon.libsonnet';
local pg_stat_bgwriter = importstr 'sql_exporter/checkpoints_timed.sql';
local pg_stat_checkpointer = importstr 'sql_exporter/checkpoints_timed.17.sql';
{
metric_name: 'checkpoints_timed',
type: 'gauge',
help: 'Number of scheduled checkpoints',
key_labels: null,
values: [
'checkpoints_timed',
],
query: if neon.PG_MAJORVERSION_NUM < 17 then pg_stat_bgwriter else pg_stat_checkpointer,
}

View File

@@ -0,0 +1 @@
SELECT checkpoints_timed FROM pg_stat_bgwriter;

View File

@@ -0,0 +1,10 @@
{
metric_name: 'compute_backpressure_throttling_seconds_total',
type: 'counter',
help: 'Time compute has spent throttled',
key_labels: null,
values: [
'throttled',
],
query: importstr 'sql_exporter/compute_backpressure_throttling_seconds_total.sql',
}

View File

@@ -0,0 +1 @@
SELECT (neon.backpressure_throttling_time()::float8 / 1000000) AS throttled;

View File

@@ -0,0 +1,10 @@
{
metric_name: 'compute_current_lsn',
type: 'gauge',
help: 'Current LSN of the database',
key_labels: null,
values: [
'lsn',
],
query: importstr 'sql_exporter/compute_current_lsn.sql',
}

View File

@@ -0,0 +1,4 @@
SELECT CASE
WHEN pg_catalog.pg_is_in_recovery() THEN (pg_last_wal_replay_lsn() - '0/0')::FLOAT8
ELSE (pg_current_wal_lsn() - '0/0')::FLOAT8
END AS lsn;

Some files were not shown because too many files have changed in this diff Show More