Comparing fcab61bdcd...7c06a33df0 - neon

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-31 01:40:39 +00:00

Author	SHA1	Message	Date
github-actions[bot]	7c06a33df0	Storage release 2025-04-04	2025-04-04 15:40:58 +00:00
Christian Schwarz	6ee84d985a	impr(perf tracing): ability to correlate with page_service logs (#11398 ) # Problem Current perf tracing fields do not allow answering the question of what a specific Postgres backend was waiting for. # Background For Pageserver logs, we set the backend PID as the libpq `application_name` on the compute side, and funnel that into a tracing field for the spans that emit to the global tracing subscriber. # Solution Funnel `application_name`, and the other fields that we use in the logging spans, into the root span for perf tracing. # Refs - fixes https://github.com/neondatabase/neon/issues/11393 - stacked atop https://github.com/neondatabase/neon/pull/11433 - epic: https://github.com/neondatabase/neon/issues/9873	2025-04-04 15:13:54 +00:00
JC Grünhage	295be03a33	impr(ci): send clearer notifications to slack when retrying container image pushes (#11447 ) ## Problem We've started sending slack notifications for failed container image pushes that are being retried. There are more messages coming in than expected, so clicking through the link to see what image failed is happening more often than we hoped. ## Summary of changes - Make slack notifications clearer, including whether the job succeeded and what retries have happened. - Log failures/retries in step more clearly, so that you can easily see when something fails.	2025-04-04 14:56:41 +00:00
Alexander Bayandin	8e1b5a9727	Fix Postgres build on macOS (#11442 ) ## Problem Postgres build fails with the following error on macOS: ``` /Users/bayandin/work/neon//vendor/postgres-v14/src/port/snprintf.c:424:27: error: 'strchrnul' is only available on macOS 15.4 or newer [-Werror,-Wunguarded-availability-new] 424 \| const char *next_pct = strchrnul(format + 1, '%'); \| ^~~~~~~~~ /Users/bayandin/work/neon//vendor/postgres-v14/src/port/snprintf.c:376:14: note: 'strchrnul' has been marked as being introduced in macOS 15.4 here, but the deployment target is macOS 15.0.0 376 \| extern char *strchrnul(const char *s, int c); \| ^ /Users/bayandin/work/neon//vendor/postgres-v14/src/port/snprintf.c:424:27: note: enclose 'strchrnul' in a __builtin_available check to silence this warning 424 \| const char *next_pct = strchrnul(format + 1, '%'); \| ^~~~~~~~~ 425 \| 426 \| /* Dump literal data we just scanned over */ 427 \| dostr(format, next_pct - format, target); 428 \| if (target->failed) 429 \| break; 430 \| 431 \| if (*next_pct == '\0') 432 \| break; 433 \| format = next_pct; \| 1 error generated. ``` ## Summary of changes - Update Postgres fork to include changes from `6da2ba1d8a` Corresponding Postgres PRs: - https://github.com/neondatabase/postgres/pull/608 - https://github.com/neondatabase/postgres/pull/609 - https://github.com/neondatabase/postgres/pull/610 - https://github.com/neondatabase/postgres/pull/611	2025-04-04 14:09:15 +00:00
Vlad Lazar	1ef4258f29	pageserver: add tenant level performance tracing sampling ratio (#11433 ) ## Problem https://github.com/neondatabase/neon/pull/11140 introduces performance tracing with OTEL and a pageserver config which configures the sampling ratio of get page requests. Enabling a non-zero sampling ratio on a per region basis is too aggressive and comes with perf impact that isn't very well understood yet. ## Summary of changes Add a `sampling_ratio` tenant level config which overrides the pageserver level config. Note that we do not cache the config and load it on every get page request such that changes propagate timely. Note that I've had to remove the `SHARD_SELECTION` span to get this to work. The tracing library doesn't expose a neat way to drop a span if one realises it's not needed at runtime. Closes https://github.com/neondatabase/neon/issues/11392	2025-04-04 13:41:28 +00:00
Vlad Lazar	65e2aae6e4	pageserver/secondary: deregister IO metrics (#11283 ) ## Problem IO metrics for secondary locations do not get deregistered when the timeline is removed. ## Summary of changes Stash the request context to be used for downloads in `SecondaryTimelineDetail`. These objects match the lifetime of the secondary timeline location pretty well. When the timeline is removed, deregister the metrics too. Closes https://github.com/neondatabase/neon/issues/11156	2025-04-04 10:52:59 +00:00
a-masterov	edc874e1b3	Use the same test image version as the compute one (#11448 ) ## Problem Changes in compute can cause errors in tests if another version of the `neon-test-extensions` image is used. ## Summary of changes Use the same version of the `neon-test-extensions` image as the `compute` one for docker-compose based extension tests.	2025-04-04 10:13:00 +00:00
Dmitrii Kovalkov	181af302b5	storcon + safekeeper + scrubber: propagate root CA certs everywhere (#11418 ) ## Problem There are some places in the code where we create `reqwest::Client` without providing SSL CA certs from `ssl_ca_file`. These will break after we enable TLS everywhere. - Part of https://github.com/neondatabase/cloud/issues/22686 ## Summary of changes - Support `ssl_ca_file` in storage scrubber. - Add `use_https_safekeeper_api` option to safekeeper to use https for peer requests. - Propagate SSL CA certs to storage_controller/client, storcon's ComputeHook, PeerClient and maybe_forward.	2025-04-04 06:30:48 +00:00
Tristan Partin	497116b76d	Download extension if it does not exist on the filesystem (#11315 ) Previously we attempted to download all extensions in CREATE EXTENSION statements. Extensions like pg_stat_statements and neon are not remote extensions, but still we were requesting them when skip_pg_catalog_updates was set to false. Fixes: https://github.com/neondatabase/neon/issues/11127 Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-04-04 01:06:22 +00:00
Arpad Müller	a917952b30	Add test_storcon_create_delete_sk_down and make it work (#11400 ) Adds a test `test_storcon_create_delete_sk_down` which tests the reconciler and pending op persistence if faced with a temporary safekeeper downtime during timeline creation or deletion. This is in contrast to `test_explicit_timeline_creation_storcon`, which tests the happy path. We also do some fixes: * timeline and tenant deletion http requests didn't expect a body, but `()` sent one. * we got the tenant deletion http request's return type wrong: it's supposed to be a hash map * we add some logging to improve observability * We fix `list_pending_ops` which had broken code meant to make it possible to restrict oneself to a single pageserver. But diesel doesn't support that sadly, or at least I couldn't figure out a way to make it work. We don't need that functionality, so remove it. * We add an info span to the heartbeater futures with the node id, so that there is no context-free msgs like "Backoff: waiting 1.1 seconds before processing with the task" in the storcon logs. we could also add the full base url of the node but don't do it as most other log lines contain that information already, and if we do duplication it should at least not be verbose. One can always find out the base url from the node id. Successor of #11261 Part of #9011	2025-04-04 00:17:40 +00:00
Tristan Partin	e581b670f4	Improve nightly physical replication benchmark (#11389 ) Log the created project and endpoint IDs and improve typing in the source code to improve readability. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-04-03 23:00:58 +00:00
Alexander Bayandin	8ed79ed773	build(deps): bump h2 to 4.2.0 (#11437 ) ## Problem We switched `h2` from 4.1.0 to a git commit to fix stubgen (in https://github.com/neondatabase/neon/pull/10491). `h2` 4.2.0 was released soon after that, so we can switch back to a pinned version. Expected no changes, as 4.2.0 is the right next commit after the commit we currently use: `dacd614fed^` ## Summary of changes - Bump `h2` to 4.2.0	2025-04-03 21:42:34 +00:00
Alex Chi Z.	381f42519e	fix(pageserver): skip gc-compaction over sparse keyspaces (#11404 ) ## Problem Part of https://github.com/neondatabase/neon/issues/11318 It's not 100% safe for now to run gc-compaction over the sparse keyspace. It might cause a deleted file to reappear if a specific sequence of operations is performed, as described in the issue, which in reality doesn't happen due to how we split delta/image layers based on the key range. A long-term fix would be either having a separate gc-compaction code path for metadata keys (as we have a different code path for metadata image layer generation), or making the compaction process aware that "there's an image layer that doesn't contain a key" so that we can skip the keys. ## Summary of changes * gc-compaction auto trigger only triggers compaction over the normal data range. * do not hold gc_block_guard across the full compaction job, only hold it during each subcompaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-04-03 19:40:44 +00:00
Arpad Müller	375df517a0	storcon: return 503 instead of 500 if there is no new leader yet (#11417 ) The leadership transfer protocol between storage controller instances is as follows. The new pod does these things: 1. new pod comes online. looks in database if there is a leader. if there is, it asks that leader to step down. 2. the new pod does some operations to come online. they should be fairly short timed, but it's not zero. 3. the new pod updates the leader entry in the database. The old pod, once it gets the step down request, changes its internal state to stepped down. It treats all incoming requests specially now: instead of processing, it wants to forward them to the new pod. The forwarding however only works if the new pod is online already, so before forwarding it reads from the db for a leader (also to get the address to forward to in the first place). If the new pod is not online yet, i.e. during step 2 above, the old pod might legitimately land in the branch which this patch is editing: the leader in the database is a stepped down instance. Before, we returned an `ApiError::InternalServerError`, but that would print the full backtrace plus an error log. With this patch, we cut down on the noise, as it's an expected situation to have a short storcon downtime while we are cutting over to the new instance. A `ResourceUnavailable` error is not just more fitting, it also doesn't print a backtrace once encountered, and only prints on the INFO log level (see `api_error_handler` function). Fixes #11320 cc #8954	2025-04-03 18:43:16 +00:00
Vlad Lazar	9db63fea7a	pageserver: optionally export perf traces in OTEL format (#11140 ) Based on https://github.com/neondatabase/neon/pull/11139 ## Problem We want to export performance traces from the pageserver in OTEL format. End goal is to see them in Grafana. ## Summary of changes https://github.com/neondatabase/neon/pull/11139 introduces the infrastructure required to run the otel collector alongside the pageserver. ### Design Requirements: 1. We'd like to avoid implementing our own performance tracing stack and use the `tracing` crate if possible. 2. Ideally, we'd like zero overhead at a sampling rate of zero and to be able to change the tracing config for a tenant on the fly. 3. We should leave the current span hierarchy intact. This includes adding perf traces without modifying existing tracing. To satisfy (3) (and (2) in part) a separate span hierarchy is used. `RequestContext` gains an optional `perf_span` member that's only set when the request was chosen by sampling. All perf span related methods added to `RequestContext` are no-ops for requests that are not sampled. This on its own is not enough for (3), so performance spans use a separate tracing subscriber. The `tracing` crate doesn't have great support for this, so there's a fair amount of boilerplate to override the subscriber at all points of the perf span lifecycle. ### Perf Impact [Periodic pagebench](https://neonprod.grafana.net/d/ddqtbfykfqfi8d/e904990?orgId=1&from=2025-02-08T14:15:59.362Z&to=2025-03-10T14:15:59.362Z&timezone=utc) shows no statistically significant regression with a sample ratio of 0. There's an annotation on the dashboard on 2025-03-06. ### Overview of changes: 1. Clean up the `RequestContext` API a bit. Namely, get rid of the `RequestContext::extend` API and use the builder instead. 2. Add pageserver level configs for tracing: sampling ratio, otel endpoint, etc. 3. Introduce some perf span tracking utilities and expose them via `RequestContext`.
We add a `tracing::Span` wrapper to be used for perf spans and a `tracing::Instrumented` equivalent for it. See doc comments for reason. 4. Set up OTEL tracing infra according to configuration. A separate runtime is used for the collector. 5. Add perf traces to the read path. ## Refs - epic https://github.com/neondatabase/neon/issues/9873 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2025-04-03 17:56:51 +00:00
Alex Chi Z.	bfc767d60d	fix(test): wait for shard split complete for test_lsn_lease_storcon (#11436 ) ## Problem close https://github.com/neondatabase/neon/issues/11397 ref https://github.com/neondatabase/cloud/issues/23667 ## Summary of changes We need to wait until the shard split is complete, otherwise it will print warning like waiting for shard split exclusive lock for 30s. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-04-03 17:49:45 +00:00
Alex Chi Z.	109c54a300	fix(pageserver): avoid gc-compaction triggering circuit breaker (#11403 ) ## Problem There are some cases where traditional gc might collect some layer files, so that gc-compaction cannot read the full history of a key. This needs to be resolved in the long term by improving the compaction process. For now, let's simply avoid such errors triggering the circuit breaker. ## Summary of changes * Move the place where we trigger the circuit breaker. We only trigger it during compactions other than L0 compactions. We added the trigger a year ago due to file cleanup concerns in image layer compaction. * For gc-compaction, only return errors to the upper compaction_iteration if it's a shutdown error. Otherwise, just log it and skip the compaction for a key range. Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2025-04-03 17:18:37 +00:00
Vlad Lazar	74920d8cd8	storcon: notify compute if correct observed state was refreshed (#11342 ) ## Problem Previously, if the observed state was refreshed and matching the intent, we wouldn't send a compute notification. This is unsafe. There's no guarantee that the location landed on the pageserver _and_ a compute notification for it was delivered. See https://github.com/neondatabase/neon/issues/11291#issuecomment-2743205411 for one such example. ## Summary of changes Add a reproducer and notify the compute if the correct observed state required a refresh. Closes https://github.com/neondatabase/neon/issues/11291	2025-04-03 16:35:55 +00:00
Alex Chi Z.	131b32ef48	fix(pageserver): clean up aux files before detaching (#11299 ) ## Problem Related to https://github.com/neondatabase/cloud/issues/26091 and https://github.com/neondatabase/cloud/issues/25840 Close https://github.com/neondatabase/neon/issues/11297 Discussion on Slack: https://neondb.slack.com/archives/C033RQ5SPDH/p1742320666313969 ## Summary of changes * When detaching, scan all aux files within `sparse_non_inherited_keyspace` in the ancestor timeline and create an image layer exactly at the ancestor LSN. All scanned keys will map to an empty value, which is a delete tombstone. - Note that end_lsn for rewritten delta layers = ancestor_lsn + 1, so the image layer will have image_end_lsn=end_lsn. With the current `select_layer` logic, the read path will always first read the image layer. * Add a test case. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2025-04-03 15:55:22 +00:00
Suhas Thalanki	581bb5d7d5	removed pg_anon setup from compute dockerfile (#10960 ) ## Problem Removing the `anon` v1 extension in postgres as described in https://github.com/neondatabase/cloud/issues/22663. This extension is not built for postgres v17 and is out of date when compared to the upstream variant which is v2 (we have v1.4). ## Summary of changes Removed the `anon` v1 extension from the compute docker image Related to https://github.com/neondatabase/cloud/issues/22663	2025-04-03 15:26:35 +00:00
JC Grünhage	3c78133477	feat(ci): add 'released' tag to container images from release runs (#11425 ) ## Problem We had a problem with https://github.com/neondatabase/neon/pull/11413 having e2e tests failing, because an e2e test (`8d271bed47`) depended on an unreleased pageserver fix (`0ee5bfa2fc`). This came up because neon release CI runs against the most recent releases of the other components, but cloud e2e tests run against latest, which is tagged from main. ## Summary of changes Add an additional `released` tag for released versions. ## Alternative to consider We could (and maybe should) instead switch to `latest` being used for released versions and `main` being used where we use `latest` right now. That'd also mean we don't have to adjust the CI in the cloud repo.	2025-04-03 14:57:44 +00:00
Suhas Thalanki	46e046e779	Exporting `file_cache_used` to calculate LFC utilization (#11384 ) ## Problem Exporting `file_cache_used` which specifies the number of used chunks in the LFC. This helps calculate LFC utilization as: `file_cache_used_pages / (file_cache_used * file_cache_chunk_size_pages)` ## Summary of changes Exporting `file_cache_used`. Related Issue: https://github.com/neondatabase/cloud/issues/26688	2025-04-03 14:54:45 +00:00
Arpad Müller	d8cee52637	Update rust to 1.86.0 (#11431 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. [Announcement blog post](https://blog.rust-lang.org/2025/04/03/Rust-1.86.0.html). Prior update was in #10914.	2025-04-03 14:53:28 +00:00
Dmitrii Kovalkov	2e11d129d0	tests: suppress mgm api timeout error in storcon (#11428 ) ## Problem Since `0f367cb665` the timeout in `with_client_retries` is implemented via `tokio::timeout` instead of `reqwest::ClientBuilder::timeout` (because we reuse the client). It changed the error representation if the timeout is exceeded. Such errors were suppressed in `allowed_errors.py`, but old regexps do not match the new error. Discussion: https://neondb.slack.com/archives/C033RQ5SPDH/p1743533184736319 ## Summary of changes - Add new `Timeout` error to `allowed_errors.py`	2025-04-03 14:18:50 +00:00
Luís Tavares	43a7423f72	feat: bump pg_session_jwt extension to 0.3.0 (#11399 ) ## Problem Bumps https://github.com/neondatabase/pg_session_jwt to the latest release [v0.3.0](https://github.com/neondatabase/pg_session_jwt/releases/tag/v0.3.0) that introduces PostgREST fallback mechanisms. ## Summary of changes Updates the extension download tar and the extension version in the proxy constant. ## Subscribers @mrl5	2025-04-03 13:01:18 +00:00
Arpad Müller	374736a4de	Print remote_addr span for Failed to serve HTTP connection error (#11423 ) I've encountered this error in #11422. Ideally we'd have the URL as well to associate it with a tenant, but at this level we only have the remote addr I guess. Better than nothing.	2025-04-03 11:58:12 +00:00
Alexander Bayandin	5e507776bc	Compute: update plv8 patch (#11426 ) ## Problem https://github.com/neondatabase/cloud/issues/26866 ## Summary of changes - Update plv8 patch Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>	2025-04-03 11:29:53 +00:00
Alexander Lakhin	4e8e0951be	Increase timeout for test_pageserver_gc_compaction_smoke (#11410 ) ## Problem The test_pageserver_gc_compaction_smoke fails rather often due to a timeout on slow machines. See https://github.com/neondatabase/neon/issues/11355. ## Summary of changes Increase the timeout for the test.	2025-04-03 11:23:30 +00:00
JC Grünhage	64a8d0c2e6	impr(ci): retry container image pushing and send slack messages for failures (#11416 ) ## Problem We've seen quite a few CI failures related to pushes to docker hub failing with weird error messages that indicate maybe docker hub is just not reliable. ## Summary of changes Retry container image pushing up to 10 times, and send a slack message if we had to retry, regardless of the job succeeding or not.	2025-04-03 08:59:42 +00:00
Tristan Partin	7602e6ffc0	Skip compute_ctl authorization checks in testing builds (#11186 ) We will require authorization in production. We need to skip in testing builds for now because regression tests would fail. See https://github.com/neondatabase/neon/issues/11316 for more information. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-04-03 00:00:28 +00:00
Erik Grinaker	17193d6a33	test_runner: fix pagebench tenant configs (#11420 ) ## Problem Pagebench creates a bunch of tenants by first creating a template tenant and copying its remote storage, then attaching the copies to the Pageserver. These tenants had custom configurations to disable GC and compaction. However, these configs were only picked up by the Pageserver on attach, and not registered with the storage controller. This caused the storage controller to replace the tenant configs with the default tenant config, re-enabling GC and compaction which interferes with benchmark performance. Resolves #11381. ## Summary of changes Register the copied tenants with the storage controller, instead of directly attaching them to the Pageserver.	2025-04-02 20:11:39 +00:00
Erik Grinaker	03ae57236f	docs: add compaction notes (#11415 ) Lifted from https://www.notion.so/neondatabase/Rough-Notes-on-Compaction-1baf189e004780859e65ef63b85cfa81?pvs=4.	2025-04-02 19:55:08 +00:00
Arpad Müller	e3d27b2f68	Start safekeeper node IDs with 1 and forbid 0 from registering (#11419 ) Right now we start safekeeper node ids at 0. However, other code treats 0 as invalid (see #11407). We decided on the latter. Therefore, make the register python tests register safekeepers starting at node id 1 instead of 0, and forbid safekeepers with id 0 from registering. Context: https://github.com/neondatabase/neon/pull/11407#discussion_r2024852328	2025-04-02 18:36:50 +00:00
Alex Chi Z.	dd1299f337	feat(storcon): passthrough mark invisible and add tests (#11401 ) ## Problem close https://github.com/neondatabase/neon/issues/11279 ## Summary of changes * Allow passthrough of other methods in tenant timeline shard0 passthrough of storcon. * Passthrough mark invisible API in storcon. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-04-02 17:11:49 +00:00
John Spray	cb19e4e05d	pageserver: remove legacy TimelineInfo::latest_gc_cutoff field (2/2) (#11136 ) ## Problem This field was retained for backward compat only in #10707. Once https://github.com/neondatabase/cloud/pull/25233 is released, nothing will be reading this field. Related: https://github.com/neondatabase/cloud/issues/24250 ## Summary of changes - Remove TimelineInfo::latest_gc_cutoff_lsn	2025-04-02 15:21:58 +00:00
Erik Grinaker	9df230c837	storcon: improve autosplit defaults (#11332 ) ## Problem In #11122, we changed the autosplit behavior to allow repeated and initial splits. The defaults were set such that they retain the current production settings (8 shards at 64 GB). However, these defaults don't really make sense by themselves. Once we deploy new settings to production, we should change the defaults to something more reasonable. ## Summary of changes Changes the following default settings: * `max_split_shards`: 8 → 16 * `initial_split_threshold`: 64 GB → disabled * `initial_split_shards`: 8 → 2	2025-04-02 15:11:52 +00:00
JC Grünhage	3c2bc5baba	fix(ci): run checks on release PRs (#11375 ) ## Problem Hotfix releases mean that sometimes changes in release PRs haven't been tested and linted yet. Disabling tests and lints is therefore not necessarily safe. In the future we will check whether tests have run on the same git tree already to speed things up, but for now we need to turn tests back on fully. This partially reverts: https://github.com/neondatabase/neon/pull/11272 ## Summary of changes Run checks on `.*-rc-pr` runs.	2025-04-02 14:32:53 +00:00
Alexey Kondratov	6667810800	chore(compute_ctl): Minor code and comment fixes (#11411 ) ## Problem In #11376 I mistakenly reworded one comment and also forgot to commit one of the suggestions. ## Summary of changes Fix it here.	2025-04-02 14:20:52 +00:00
Erik Grinaker	47f5bcf2bc	pageserver: don't periodically flush layers for stale attachments (#11317 ) ## Problem Tenants in attachment state `Stale` can't upload layers, and don't run compaction, but still do periodic L0 layer flushes in the tenant housekeeping loop. If the tenant remains stuck in stale mode, this causes a large buildup of L0 layers, causing logging, metrics increases, and possibly alerts. Resolves #11245. ## Summary of changes Don't perform periodic layer flushes in stale attachment state.	2025-04-02 12:55:15 +00:00
a-masterov	c179d098ef	Increase the timeout for extensions upgrade tests (on schedule). (#11406 ) ## Problem Sometimes the forced extension upgrade test fails (on schedule) due to a timeout. ## Summary of changes The timeout is increased to 60 mins.	2025-04-02 11:18:37 +00:00
Peter Bendel	4bc6dbdd5f	use a prod-like shared_buffers size for some perf unit tests (#11373 ) ## Problem In Neon DBaaS we adjust the shared_buffers to the size of the compute. More precisely, we adjust the max number of connections to the compute size, and we adjust the shared_buffers size to the max number of connections, roughly according to the following sizes: `2 CU: 225mb; 4 CU: 450mb; 8 CU: 900mb` [see](`877e33b428/goapp/controlplane/internal/pkg/compute/computespec/pg_settings.go (L405)`) ## Summary of changes We should run perf unit tests with settings that are realistic for a paying customer, and select 8 CU as the reference for those tests.	2025-04-02 10:43:05 +00:00
John Spray	7dc8370848	storcon: do graceful migrations from chaos injector (#11028 ) ## Problem Followup to https://github.com/neondatabase/neon/pull/10913 Existing chaos injection just does simple cutovers to secondary locations. Let's also exercise code for doing graceful migrations. This should implicitly test how such migrations cope with overlapping with service restarts. ## Summary of changes	2025-04-02 10:08:05 +00:00
Alex Chi Z.	c4fc602115	feat(pageserver): support synthetic size calculation for invisible branches (#11335 ) ## Problem ref https://github.com/neondatabase/neon/issues/11279 Imagine we have a branch with 3 snapshots A, B, and C: ``` base---+---+---+---main \-A \-B \-C base=100G, base-A=1G, A-B=1G, B-C=1G, C-main=1G ``` At this point, the synthetic size should be 100+1+1+1+1=104G. After the deletion, the structure looks like: ``` base---+---+---+ \-A \-B \-C ``` If we simply assume main never exists, the size will be calculated as size(A) + size(B) + size(C)=300GB, which obviously is not what the user would expect. The correct way to do this is to assume part of main still exists, that is to say, set C-main=0G: ``` base---+---+---+main \-A \-B \-C ``` And we will get the correct synthetic size of 100G+1+1+1=103G. ## Summary of changes * Do not generate gc cutoff point for invisible branches. * Use the same LSN as the last branchpoint for branch end. * Remove test_api_handler for mark_invisible. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-04-01 18:50:58 +00:00
Konstantin Knizhnik	02936b82c5	Fix effective_lsn calculation for prefetch (#11219 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1741594233757489 Consider the following scenario: 1. Backend A wants to prefetch some block B 2. Backend A checks that block B is not present in shared buffers 3. Backend A registers a new prefetch request and calls prefetch_do_request 4. prefetch_do_request calls neon_get_request_lsns 5. neon_get_request_lsns obtains LwLSN for block B 6. Backend B downloads B, updates it and WAL-logs it (let's say at Lsn1) 7. Block B is once again evicted from shared buffers, its LwLSN is set to Lsn1 8. Backend A obtains the current flush LSN, let's say that it is Lsn1 9. Backend A stores Lsn1 as effective_lsn in the prefetch slot. 10. Backend A reads page B with LwLSN=Lsn1 11. Backend A finds in the prefetch ring a response for the prefetch request for block B with effective_lsn=Lsn1, so that it satisfies the neon_prefetch_response_usable condition 12. Backend A uses a stale version of the page! ## Summary of changes Use `not_modified_since` as `effective_lsn`. This should not degrade performance, because we store the LwLSN when it was not found in the LwLSN hash, so if the page is not changed before the prefetch response arrives, the LwLSN should not change. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-04-01 15:48:02 +00:00
Arpad Müller	1fad1abb24	storcon: timeline deletion improvements and fixes (#11334 ) This PR contains a bunch of smaller followups and fixes of the original PR #11058. most of these implement suggestions from Arseny: * remove `Queryable, Selectable` from `TimelinePersistence`: they are not needed. * no `Arc` around `CancellationToken`: it itself is an arc wrapper * only schedule deletes instead of scheduling excludes and deletes * persist and delete deletion ops * delete rows in timelines table upon tenant and timeline deletion * set `deleted_at` for timelines we are deleting before we start any reconciles: this flag will help us later to recognize half-executed deletions, or when we crashed before we could remove the timeline row but after we removed the last pending op (handling these situations are left for later). Part of #9011	2025-04-01 15:16:33 +00:00
Arpad Müller	016068b966	API spec: add safekeepers field returned by storcon (#11385 ) Add an optional `safekeepers` field to `TimelineInfo` which is returned by the storcon upon timeline creation if the `--timelines-onto-safekeepers` flag is enabled. It contains the list of safekeepers chosen. Other contexts where we return `TimelineInfo` do not contain the `safekeepers` field. Sadly, I couldn't make this more type safe like done in Rust via `TimelineCreateResponseStorcon`, as there is no way of flattening or inheritance (and I don't think that duplicating the entire type for some minor type safety improvements is worth it). The storcon side has been done in #11058. Part of https://github.com/neondatabase/cloud/issues/16176 cc https://github.com/neondatabase/cloud/issues/16796	2025-04-01 12:39:10 +00:00
Erik Grinaker	80596feeaa	pageserver: invert `CompactFlags::NoYield` as `YieldForL0` (#11382 ) ## Problem `CompactFlags::NoYield` was a bit inconvenient, since every caller except for the background compaction loop should generally set it (e.g. HTTP API calls, tests, etc). It was also inconsistent with `CompactionOutcome::YieldForL0`. ## Summary of changes Invert `CompactFlags::NoYield` as `CompactFlags::YieldForL0`. There should be no behavioral changes.	2025-04-01 11:43:58 +00:00
Erik Grinaker	225cabd84d	pageserver: update upload queue TODOs (#11377 ) Update some upload queue TODOs, particularly to track https://github.com/neondatabase/neon/issues/10283, which I won't get around to.	2025-04-01 11:38:12 +00:00
Alexey Kondratov	557127550c	feat(compute): Add compute_ctl_up metric (#11376 ) ## Problem For computes running inside NeonVM, the actual compute image tag is buried inside the NeonVM spec, and we cannot get it as part of standard k8s container metrics (it's always an image and a tag of the NeonVM runner container). The workaround we currently use is to extract the running computes info from the control plane database with SQL. It has several drawbacks: i) it's complicated, separate DB per region; ii) it's slow; iii) it's still an indirect source of info, i.e. k8s state could be different from what the control plane expects. ## Summary of changes Add a new `compute_ctl_up` gauge metric with `build_tag` and `status` labels. It will help us to both overview what are the tags/versions of all running computes; and to break them down by current status (`empty`, `running`, `failed`, etc.) Later, we could introduce low cardinality (no endpoint or compute ids) streaming aggregates for such metrics, so they will be blazingly fast and usable for monitoring the fleet-wide state.	2025-04-01 08:51:17 +00:00
Konstantin Knizhnik	cfe3e6d4e1	Remove loop from pageserver_try_receive (#11387 ) ## Problem Commit `3da70abfa5` caused a noticeable performance regression (40% in update-with-prefetch in test_bulk_update): https://neondb.slack.com/archives/C04BLQ4LW7K/p1742633167580879 ## Summary of changes Remove the loop from pageserver_try_receive so that it fetches no more than one response. There is still a loop in `pump_prefetch_state` which can fetch as many responses as are available. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-03-31 19:49:32 +00:00
Alex Chi Z.	47d47000df	fix(pageserver): passthrough lsn lease in storcon API (#11386 ) ## Problem part of https://github.com/neondatabase/cloud/issues/23667 ## Summary of changes lsn_lease API can only be used on pageservers. This patch enables storcon passthrough. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-31 19:16:42 +00:00
Matthias van de Meent	e5b95bc9dc	Neon LFC/prefetch: Improve page read handling (#11380 ) Previously we had different meanings for the bitmask of vector IOps. That has now been unified to "bit set = final result, no more scribbling". Furthermore, the LFC read path scribbled on pages that were already read; that's probably not a good thing so that's been fixed too. In passing, the read path of LFC has been updated to read only the requested pages into the provided buffers, thus reducing the IO size of vectorized IOs.	2025-03-31 17:04:00 +00:00
Alex Chi Z.	0ee5bfa2fc	fix(pageserver): allow sibling archived branch for detaching (#11383 ) ## Problem close https://github.com/neondatabase/neon/issues/11379 ## Summary of changes Remove checks around archived branches for detach v2. I also updated the comments `ancestor_retain_lsn`. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-31 16:32:55 +00:00
Fedor Dikarev	00bcafe82e	chore(ci): upgrade stats action with docker images from ghcr.io (#11378 ) ## Problem The current version of the GitHub Workflow Stats action pulls docker images from DockerHub, which could become an issue with the new pull limits on the DockerHub side. ## Summary of changes Switch to version `v0.2.2`, with docker images hosted on `ghcr.io`	2025-03-31 14:21:07 +00:00
Conrad Ludgate	ed117af73e	chore(proxy/tokio-postgres): remove phf from sqlstate and switch to tracing (#11249 ) In sqlstate, we have a manual `phf` construction, which is not explicitly guaranteed to be stable - you're intended to use a build.rs or the macro to make sure it's constructed correctly each time. This was inherited from tokio-postgres upstream, which has the same issue (https://github.com/rust-phf/rust-phf/pull/321#issuecomment-2724521193). We don't need this encoding of sqlstate, so I've switched it to simply parse 5 bytes (https://www.postgresql.org/docs/current/errcodes-appendix.html). While here, I switched out log for tracing.	2025-03-31 12:35:51 +00:00
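The "simply parse 5 bytes" approach described in #11249 can be illustrated with a minimal, std-only sketch. This is not the code from the commit: the `SqlState` type and its `parse`, `class`, and `code` methods are hypothetical names chosen for illustration, and the sketch only assumes what the PostgreSQL error-codes appendix states, namely that a SQLSTATE is five characters made of digits and uppercase ASCII letters, with the first two characters naming the error class.

```rust
use std::convert::TryFrom;

/// Hypothetical illustration type: a SQLSTATE code kept as its five raw ASCII bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SqlState([u8; 5]);

impl SqlState {
    /// Parses a code such as "42P01". Returns `None` unless the input is
    /// exactly five bytes, each an ASCII digit or uppercase ASCII letter.
    pub fn parse(s: &str) -> Option<SqlState> {
        let bytes = <[u8; 5]>::try_from(s.as_bytes()).ok()?;
        if bytes
            .iter()
            .all(|b| b.is_ascii_digit() || b.is_ascii_uppercase())
        {
            Some(SqlState(bytes))
        } else {
            None
        }
    }

    /// The full five-character code.
    pub fn code(&self) -> &str {
        // Safe to unwrap: `parse` only accepts ASCII bytes.
        std::str::from_utf8(&self.0).expect("validated as ASCII")
    }

    /// The two-character class prefix, e.g. "42" for "42P01".
    pub fn class(&self) -> &str {
        &self.code()[..2]
    }
}
```

With this sketch, `SqlState::parse("42P01")` yields a value whose `class()` is `"42"`, while lowercase or wrong-length input is rejected. No lookup table is needed, which is the point of dropping the hand-built `phf` map.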
Konstantin Knizhnik	21a891a06d	Fix IS_LOCAL_REL macro (first class has oid=FirstNormalObjectId) (#11369 ) ## Problem The macro IS_LOCAL_REL, used for DEBUG_COMPARE_LOCAL mode, uses a greater-than rather than a greater-or-equal comparison, while the first table is actually assigned FirstNormalObjectId. ## Summary of changes Replace the strict greater-than with a greater-or-equal comparison. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-03-31 11:16:35 +00:00
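The off-by-one in #11369 can be shown with a small sketch. The real `IS_LOCAL_REL` is a C macro in the Neon extension and may check more than the OID; the two Rust functions below are hypothetical simplifications that isolate only the comparison. The value 16384 is PostgreSQL's `FirstNormalObjectId`.

```rust
/// PostgreSQL's `FirstNormalObjectId`: the first OID handed out to user-created objects.
const FIRST_NORMAL_OBJECT_ID: u32 = 16384;

/// Hypothetical model of the old check: strict greater-than misses the first user table.
pub fn is_local_rel_strict(rel_oid: u32) -> bool {
    rel_oid > FIRST_NORMAL_OBJECT_ID
}

/// Hypothetical model of the fixed check: greater-or-equal includes the first user table.
pub fn is_local_rel_inclusive(rel_oid: u32) -> bool {
    rel_oid >= FIRST_NORMAL_OBJECT_ID
}
```

Only the boundary value differs: for OID 16384 the strict version returns `false` and the inclusive version returns `true`; every other OID is classified identically.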
Alexander Bayandin	30a7dd630c	ruff: enable TC — flake8-type-checking (#11368 ) ## Problem `TYPE_CHECKING` is used inconsistently across Python tests. ## Summary of changes - Update `ruff`: 0.7.0 -> 0.11.2 - Enable TC (flake8-type-checking): https://docs.astral.sh/ruff/rules/#flake8-type-checking-tc - (auto)fix all new issues	2025-03-30 18:58:33 +00:00
Erik Grinaker	db5384e1b0	pageserver: remove L0 flush upload wait (#11196 ) ## Problem Previously, L0 flushes would wait for uploads, as a simple form of backpressure. However, this prevented flush pipelining and upload parallelism. It has since been disabled by default and replaced by L0 compaction backpressure. Touches https://github.com/neondatabase/cloud/issues/24664. ## Summary of changes This patch removes L0 flush upload waits, along with the `l0_flush_wait_upload` setting. This can't be merged until the setting has been removed across the fleet.	2025-03-30 13:14:04 +00:00
JC Grünhage	5cb6a4bc8b	fix(ci): use the right sha in release PRs (#11365 ) ## Problem `github.sha` contains a merge commit of `head` and `base` if we're in a PR. In release PRs, this makes no sense, because we fast-forward the `base` branch to contain the changes from `head`. Even though we correctly use `${{ github.event.pull_request.head.sha \|\| github.sha }}` to reference the git commit when building artifacts, we don't use that when checking out code, because we want to test the merge of head and base usually. In the case of release PRs, we definitely always want to test on the head sha though, because we're going to forward that, and it already has the base sha as a parent, so the merge would end up with the same tree anyway. As a side effect, not checking out `${{ github.event.pull_request.head.sha \|\| github.sha }}` also caused https://github.com/neondatabase/neon/actions/runs/13986389780/job/39173256184#step:6:49 to say `release-tag=release-compute-8187`, while https://github.com/neondatabase/neon/actions/runs/14084613121/job/39445314780#step:6:48 is talking about `build-tag=release-compute-8186` ## Summary of changes Run a few things on `github.event.pull_request.head.sha`, if we're in a release PR.	2025-03-28 11:56:24 +00:00
Folke Behrens	1dbf40ee2c	proxy: Update redis crate (#11372 )	2025-03-28 11:43:52 +00:00
github-actions[bot]	85992000e3	Storage release 2025-03-28	2025-03-28 06:02:32 +00:00
Fedor Dikarev	939354abea	chore(ci): pin python base images to sha (#11367 ) Similar to how we pin base `debian` images, also pin `python` base images, so we better cache them and have reproducible builds.	2025-03-27 17:42:28 +00:00
Fedor Dikarev	1d5d168626	impr(ci): use hetzner buckets for cache (#11364 ) ## Problem Occasionally, getting data from the GH cache is slow, with less than 10MB/s throughput and 5+ minutes to download the cache: ``` Received 20971520 of 2987085791 (0.7%), 9.9 MBs/sec Received 50331648 of 2987085791 (1.7%), 15.9 MBs/sec ... Received 1065353216 of 2987085791 (35.7%), 4.8 MBs/sec Received 1065353216 of 2987085791 (35.7%), 4.7 MBs/sec ... ``` https://github.com/neondatabase/neon/actions/runs/13956437454/job/39068664599#step:7:17 As a result, fetching the cache can take even longer than the build itself. ## Summary of changes Switch to caches that are closer to the runners and provide a stable throughput of about 70-80MB/s	2025-03-27 11:11:45 +00:00
Folke Behrens	b40dd54732	compute-node: Add some debugging tools to image (#11352 ) ## Problem Some useful debugging tools are missing from the compute image and sometimes it's impossible to install them because memory is tightly packed. ## Summary of changes Add the following tools: iproute2, lsof, screen, tcpdump. The other changes come from sorting the packages alphabetically. ```bash $ docker image inspect ghcr.io/neondatabase/vm-compute-node-v16:7555 \| jaq '.[0].Size' 1389759645 $ docker image inspect ghcr.io/neondatabase/vm-compute-node-v16:14083125313 \| jaq '.[0].Size' 1396051101 $ echo $((1396051101 - 1389759645)) 6291456 ```	2025-03-27 11:09:27 +00:00
Folke Behrens	4bb7087d4d	proxy: Fix some clippy warnings coming in next versions (#11359 )	2025-03-26 10:50:16 +00:00
Arpad Müller	5f3551e405	Add "still waiting for task" for slow shutdowns (#11351 ) To help with narrowing down https://github.com/neondatabase/cloud/issues/26362, we make the case where we are waiting for the shutdown of a specific task more noisy (in the case of that issue, the `gc_loop`).	2025-03-24 17:29:44 +00:00
Anastasia Lubennikova	3e5884ff01	Revert "feat(compute_ctl): allow to change audit_log_level for existi… (#11343 ) …ng (#11308)" This reverts commit `e5aef3747c`. The logic of this commit was incorrect: enabling audit requires a restart of the compute, because audit extensions use shared_preload_libraries. So it cannot be done in the configuration phase and requires an endpoint restart instead.	2025-03-21 18:09:34 +00:00
Vlad Lazar	9fc7c22cc9	storcon: add use_local_compute_notifications flag (#11333 ) ## Problem While working on bulk import, I want to use the `control-plane-url` flag for a different request. Currently, the local compute hook is used whenever no control plane is specified in the config. My test requires local compute notifications and a configured `control-plane-url` which isn't supported. ## Summary of changes Add a `use-local-compute-notifications` flag. When this is set, we use the local flow regardless of other config values. It's enabled by default in neon_local and disabled by default in all other envs. I had to turn the flag off in tests that wish to bypass the local flow, but that's expected. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2025-03-21 15:31:06 +00:00
Folke Behrens	23ad228310	pgxn: Increase the pageserver response timeout a bit (#11339 ) Increase the PS response timeout slightly but noticeably, so it does not coincide with the default TCP_RTO_MAX.	2025-03-21 14:21:53 +00:00
Dmitrii Kovalkov	aeb53fea94	storage: support multiple SSL CA certificates (#11341 ) ## Problem - We need to support multiple SSL CA certificates for graceful root CA certificate rotation. - Closes: https://github.com/neondatabase/cloud/issues/25971 ## Summary of changes - Parses `ssl_ca_file` as a pem bundle, which may contain multiple certificates. Single pem cert is a valid pem bundle, so the change is backward compatible.	2025-03-21 13:43:38 +00:00
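The backward-compatibility argument in #11341 (a single PEM certificate is itself a valid PEM bundle) can be sketched with a std-only splitter. This is an illustration, not the parser used in the commit: `split_pem_bundle` is a hypothetical helper that only recognizes `CERTIFICATE` blocks and returns their base64 bodies without decoding them.

```rust
const BEGIN: &str = "-----BEGIN CERTIFICATE-----";
const END: &str = "-----END CERTIFICATE-----";

/// Hypothetical helper: splits a PEM bundle into the base64 bodies of its
/// certificate blocks. Text outside BEGIN/END markers is ignored, and an
/// unterminated block is dropped.
pub fn split_pem_bundle(bundle: &str) -> Vec<String> {
    let mut certs = Vec::new();
    // `Some(body)` while we are inside a certificate block.
    let mut current: Option<String> = None;
    for line in bundle.lines().map(str::trim) {
        if line == BEGIN {
            current = Some(String::new());
        } else if line == END {
            if let Some(body) = current.take() {
                certs.push(body);
            }
        } else if let Some(body) = current.as_mut() {
            body.push_str(line);
        }
    }
    certs
}
```

A file with one certificate yields a one-element vector and a file with several concatenated certificates yields one element per certificate, so existing single-certificate `ssl_ca_file` values keep working under bundle parsing.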
Dmitrii Kovalkov	0f367cb665	storcon: reuse reqwest http client (#11327 ) ## Problem - Part of https://github.com/neondatabase/neon/issues/11113 - Building a new `reqwest::Client` for every request is expensive because it parses CA certs under the hood. It's noticeable in storcon's flamegraph. ## Summary of changes - Reuse one `reqwest::Client` for all API calls to avoid parsing CA certificates every time.	2025-03-21 11:48:22 +00:00
John Spray	76088c16d2	storcon: reproduce shard split issue (#11290 ) ## Problem Issue https://github.com/neondatabase/neon/issues/11254 describes a case where restart during a shard split can result in a bad end state in the database. ## Summary of changes - Add a reproducer for the issue - Tighten an existing safety check around updated row counts in complete_shard_split	2025-03-21 08:48:56 +00:00
John Spray	0d99609870	docs: storage controller retro-RFC (#11218 ) ## Problem Various aspects of the controller's job are already described in RFCs, but the overall service didn't have an RFC that records design tradeoffs and the top level structure. ## Summary of changes - Add a retrospective RFC that should be useful for anyone understanding storage controller functionality	2025-03-21 08:32:11 +00:00
Alex Chi Z.	bae9b9acdc	feat(pageserver): persist timeline invisible flag (#11331 ) ## Problem part of https://github.com/neondatabase/neon/issues/11279 ## Summary of changes The invisible flag is used to exclude a timeline from synthetic size calculation. For the first step, let's persist this flag. Most of the code follows the `is_archived` modification flow. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-20 18:39:08 +00:00
Nikita Kalyanov	53f54ba37a	chore: expose detach_v2 (#11325 ) we need this exposed in the spec to use it in cplane. extracted from https://github.com/neondatabase/cloud/pull/26167	2025-03-20 18:04:17 +00:00
Dmitrii Kovalkov	28fc051dcc	storage: live ssl certificate reload (#11309 ) ## Problem SSL certs are loaded only during start up. It doesn't allow the rotation of short-lived certificates without server restart. - Closes: https://github.com/neondatabase/cloud/issues/25525 ## Summary of changes - Implement `ReloadingCertificateResolver` which reloads certificates from disk periodically.	2025-03-20 16:26:27 +00:00
Folke Behrens	d0102a473a	pgxn: Include local port in no-response log messages (#11321 ) ## Problem Now that stuck connections are quickly terminated it's not easy to quickly find the right port from the pid to correlate the connection with the one seen on pageserver side. ## Summary of changes Call getsockname() and include the local port number in the no-response-from-pageserver log messages.	2025-03-20 16:06:00 +00:00
Alex Chi Z.	78502798ae	feat(compute_ctl): pass compute type to pageserver with pg_options (#11287 ) ## Problem second try of https://github.com/neondatabase/neon/pull/11185, part of https://github.com/neondatabase/cloud/issues/24706 ## Summary of changes Tristan reminded me of the `options` field of the pg wire protocol, which can be used to pass configurations. This patch adds the parsing on the pageserver side, and supplies `neon.endpoint_type` as part of the `options`. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-20 15:48:40 +00:00
Erik Grinaker	65d690b21d	storcon: add repeated auto-splits and initial splits (#11122 ) ## Problem Currently, we only split tenants into 8 shards once, at the 64 GB split threshold. For very large tenants, we need to keep splitting to avoid huge shards. And we also want to eagerly split at a lower threshold to improve throughput during initial ingestion. See https://github.com/neondatabase/cloud/issues/22532#issuecomment-2706215907 for details. Touches https://github.com/neondatabase/cloud/issues/22532. Requires #11157. ## Summary of changes This adds parameters and logic to enable repeated splits when a tenant's largest timeline divided by shard count exceeds `split_threshold`, as well as eager initial splits at a lower threshold to speed up initial ingestion. The default parameters are all set such that they retain the current behavior in production (only split into 8 shards once, at 64 GB). * `split_threshold` now specifies a maximum shard size. When a shard exceeds it, all tenant shards are split by powers of 2 such that all tenant shards fall below `split_threshold`. Disabled by default, like today. * Add `max_split_shards` to specify a max shard count for autosplits. Defaults to 8 to retain current behavior. * Add `initial_split_threshold` and `initial_split_shards` to specify a threshold and target count for eager splits of unsharded tenants. Defaults to 64 GB and 8 shards to retain current production behavior. Because this PR sets `initial_split_threshold` to 64 GB by default, it has the effect of enabling autosplits by default. This was not the case previously, since `split_threshold` defaults to None, but it is already enabled across production and staging. This is temporary until we complete the production rollout. For more details, see code comments. This must wait until #11157 has been deployed to Pageservers. 
Once this has been deployed to production, we plan to change the parameters to: * `split-threshold`: 256 GB * `initial-split-threshold`: 16 GB * `initial-split-shards`: 4 * `max-split-shards`: 16 The final split points will thus be: * Start: 1 shard * 16 GB: 4 shards * 1 TB: 8 shards * 2 TB: 16 shards We will then change the default settings to be disabled by default. --------- Co-authored-by: John Spray <john@neon.tech>	2025-03-20 15:43:57 +00:00
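The planned split points listed in #11122 follow from the parameters by simple arithmetic: 4 shards × 256 GB = 1 TB and 8 shards × 256 GB = 2 TB. The sketch below reproduces that arithmetic. It is a simplified model, not the storage controller's implementation: `SplitPolicy`, `planned`, and `target_shards` are hypothetical names, sizes are whole gigabytes, and treating "exceeds" as a strict `>` at the boundary is an assumption.

```rust
/// Hypothetical model of the autosplit parameters (sizes in GB).
pub struct SplitPolicy {
    pub split_threshold_gb: u64,
    pub initial_split_threshold_gb: u64,
    pub initial_split_shards: u32,
    pub max_split_shards: u32,
}

impl SplitPolicy {
    /// The values the commit message says are planned for production.
    pub fn planned() -> Self {
        SplitPolicy {
            split_threshold_gb: 256,
            initial_split_threshold_gb: 16,
            initial_split_shards: 4,
            max_split_shards: 16,
        }
    }

    /// Shard count a tenant should have, given the size of its largest
    /// timeline and its current shard count.
    pub fn target_shards(&self, size_gb: u64, current_shards: u32) -> u32 {
        let mut shards = current_shards.max(1);
        // Eager initial split: only applies to unsharded tenants.
        if shards == 1 && size_gb > self.initial_split_threshold_gb {
            shards = self.initial_split_shards;
        }
        // Repeated splits: double until the per-shard size is within the
        // threshold or the shard cap is reached.
        while shards < self.max_split_shards
            && size_gb > u64::from(shards) * self.split_threshold_gb
        {
            shards *= 2;
        }
        shards.min(self.max_split_shards)
    }
}
```

Under these assumptions, a 17 GB unsharded tenant goes to 4 shards, a 4-shard tenant just past 1 TB goes to 8, and an 8-shard tenant just past 2 TB goes to 16, after which `max_split_shards` stops further splitting.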
Arpad Müller	5dd60933d3	Mark min_readable_lsn as required in the pageserver API spec (#11324 ) Fully applies the changes of https://github.com/neondatabase/cloud/pull/25233 to neon.git. The field is always present in the Rust struct definition, so it can be marked as required. cc #10707	2025-03-20 15:25:09 +00:00
Gleb Novikov	2065074559	fast_import: put job status to s3 (#11284 ) ## Problem The `fast_import` binary runs inside neonvms, which do not support proper `kubectl describe logs` for now; there are a bunch of other caveats as well: https://github.com/neondatabase/autoscaling/issues/1320 Anyway, we needed a signal for whether the job finished successfully and, if not, at least some error message for the cplane operation. After [a short discussion](https://neondb.slack.com/archives/C07PG8J1L0P/p1741954251813609), we concluded that an s3 object is the most convenient option at the moment. ## Summary of changes If `s3_prefix` was provided to the `fast_import` call, any job run puts a status object file into `{s3_prefix}/status/fast_import` with contents `{"done": true}` or `{"done": false, "error": "..."}`. Added a test as well	2025-03-20 15:23:35 +00:00
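The status object described in #11284 has a small, fixed shape, sketched below with the standard library only. The helper names `status_key`, `status_body`, and `escape_json` are hypothetical, the upload itself is omitted, and the hand-written JSON escaping stands in for a JSON library; only the key layout and the two body shapes come from the commit message.

```rust
/// Object key for the status file, following `{s3_prefix}/status/fast_import`.
pub fn status_key(s3_prefix: &str) -> String {
    format!("{}/status/fast_import", s3_prefix)
}

/// Minimal JSON string escaping (hypothetical stand-in for a JSON library).
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Body of the status object: `{"done": true}` on success, otherwise
/// `{"done": false, "error": "..."}` with the error message.
pub fn status_body(result: &Result<(), String>) -> String {
    match result {
        Ok(()) => String::from("{\"done\": true}"),
        Err(msg) => format!("{{\"done\": false, \"error\": \"{}\"}}", escape_json(msg)),
    }
}
```

A consumer such as the control plane then needs only one `GET` on the status key and a check of the `done` field to learn the outcome of a job run.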
Konstantin Knizhnik	3da70abfa5	Fix pageserver_try_receive (#11096 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1741176713523469 The problem is that this function uses `PQgetCopyData(shard->conn, &resp_buff.data, 1 /* async = true */)` to try to fetch the next message. But `PQgetCopyData` returns 0 if the whole message is not yet present in the buffer, and the input buffer may contain only part of a message, so the result is not fetched. ## Summary of changes Use `PQisBusy` + `WaitEventSetWait` to check if data is available and `PQgetCopyData(shard->conn, &resp_buff.data, 0)` to read the whole message in this case. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-03-20 15:21:00 +00:00
Anastasia Lubennikova	e5aef3747c	feat(compute_ctl): allow to change audit_log_level for existing (#11308 ) projects. Preserve the information about the current audit log level in compute state, so that we don't relaunch rsyslog on every spec change https://github.com/neondatabase/cloud/issues/25349 --------- Co-authored-by: Tristan Partin <tristan@neon.tech>	2025-03-20 11:23:20 +00:00
Arpad Müller	91dad2514f	storcon: also support tenant deletion for safekeepers (#11289 ) If a tenant gets deleted, delete also all of its timelines. We assume that by the time a tenant is being deleted, no new timelines are being created, so we don't need to worry about races with creation in this situation. Unlike #11233, which was very simple because it listed the timelines and invoked timeline deletion, this PR obtains a list of safekeepers to invoke the tenant deletion on, and then invokes tenant deletion on each safekeeper that has one or multiple timelines. Alternative to #11233 Builds on #11288 Part of #9011	2025-03-20 10:52:21 +00:00
Dmitrii Kovalkov	9bf59989db	storcon: add https API (#11239 ) ## Problem Pageservers use unencrypted HTTP requests for storage controller API. - Closes: https://github.com/neondatabase/cloud/issues/25524 ## Summary of changes - Replace hyper0::server::Server with http_utils::server::Server in storage controller. - Add HTTPS handler for storage controller API. - Support `ssl_ca_file` in pageserver.	2025-03-20 08:22:02 +00:00
Tristan Partin	c6f5a58d3b	Remove potential for SQL injection (#11260 ) Timeline IDs do not contain characters that may cause a SQL injection, but best to always play it safe. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-03-19 19:19:38 +00:00
Suhas Thalanki	5589efb6de	moving LastWrittenLSNCache to Neon Extension (#11031 ) ## Problem We currently have this code duplicated across different PG versions. Moving this to an extension would reduce duplication and simplify maintenance. ## Summary of changes Moving the LastWrittenLSN code from PG versions to the Neon extension and linking it with hooks. Related Postgres PR: https://github.com/neondatabase/postgres/pull/590 Closes: https://github.com/neondatabase/neon/issues/10973 --------- Co-authored-by: Tristan Partin <tristan@neon.tech>	2025-03-19 17:29:40 +00:00
Folke Behrens	019a29748d	proxy: Move release PR creation to Tuesday (#11306 ) Move the creation of the proxy release PR to Tuesday mornings.	2025-03-19 16:45:29 +00:00
StepSecurity Bot	88ea855cff	fix(ci): Fixing StepSecurity Flagged Issues (#11311 ) This pull request is created by [StepSecurity](https://app.stepsecurity.io/securerepo) at the request of @areyou1or0. ## Summary This pull request is created by [StepSecurity](https://app.stepsecurity.io/securerepo) at the request of @areyou1or0. Please merge the Pull Request to incorporate the requested changes. Please tag @areyou1or0 on your message if you have any questions related to the PR. ## Security Fixes ### Least Privileged GitHub Actions Token Permissions The GITHUB_TOKEN is an automatically generated secret to make authenticated calls to the GitHub API. GitHub recommends setting minimum token permissions for the GITHUB_TOKEN. - [GitHub Security Guide](https://docs.github.com/en/actions/security-guides/automatic-token-authentication#using-the-github_token-in-a-workflow) - [The Open Source Security Foundation (OpenSSF) Security Guide](https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions) ### Pinned Dependencies GitHub Action tags and Docker tags are mutable. This poses a security risk. GitHub's Security Hardening guide recommends pinning actions to full length commit. - [GitHub Security Guide](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-third-party-actions) - [The Open Source Security Foundation (OpenSSF) Security Guide](https://github.com/ossf/scorecard/blob/main/docs/checks.md#pinned-dependencies) ### Harden Runner [Harden-Runner](https://github.com/step-security/harden-runner) is an open-source security agent for the GitHub-hosted runner to prevent software supply chain attacks. 
It prevents exfiltration of credentials, detects tampering of source code during build, and enables running jobs without `sudo` access. See how popular open-source projects use Harden-Runner [here](https://docs.stepsecurity.io/whos-using-harden-runner). <details> <summary>Harden runner usage</summary> You can find link to view insights and policy recommendation in the build log <img src="https://github.com/step-security/harden-runner/blob/main/images/buildlog1.png?raw=true" width="60%" height="60%"> Please refer to [documentation](https://docs.stepsecurity.io/harden-runner) to find more details. </details> will fix https://github.com/neondatabase/cloud/issues/26141	2025-03-19 16:44:22 +00:00
Vlad Lazar	9ce3704ab5	pageseserver: rename cplane api to storage controller api (#11310 ) ## Problem The pageserver upcall api was designed to work with control plane or the storage controller. We have completed the transition period and now the upcall api only targets the storage controller. ## Summary of changes Rename types accordingly and tweak some comments.	2025-03-19 16:29:52 +00:00
Alexey Kondratov	518269ea6a	feat(compute): Add perf test for compute startup time breakdown (#11198 ) ## Problem We had a recent Postgres startup latency (`start_postgres_ms`) degradation, but it was only caught with SLO alerts. There was actually an existing test for the same purpose -- `start_postgres_ms`, but it's doing only two starts, so it's a bit noisy. ## Summary of changes Add new compute startup latency test that does 100 iterations and reports p50, p90 and p99 latencies. Part of https://github.com/neondatabase/cloud/issues/24882	2025-03-19 16:11:33 +00:00
a-masterov	cf9d817a21	Add tests for some extensions currently not covered by the regression tests (#11191 ) ## Problem Some extensions do not contain tests that can be easily run on top of docker-compose or staging. ## Summary of changes Added pg_regress-based tests for `pg_tiktoken`, `pgx_ulid`, `pg_rag`. For now they will be run on top of docker-compose, but I intend to adapt them to be run on top of staging in the next PRs	2025-03-19 15:38:33 +00:00
John Spray	55cb07f680	pageserver: improve debuggability of timeline creation failures during chaos testing (#11300 ) ## Problem We're seeing timeline creation failures that look suspiciously like some race with the cleanup-deletion of initdb temporary directories. I couldn't spot the bug, but we can make it a bit easier to debug. Related: https://github.com/neondatabase/neon/issues/11296 ## Summary of changes - Avoid surfacing distracting ENOENT failure to delete as a log error -- this is fine, and can happen if timeline is cancelled while doing initdb, or if initdb itself has an error where it doesn't write the dir (this error is surfaced separately) - Log after purging initdb temp directories	2025-03-19 13:38:03 +00:00
Christian Schwarz	0f20dae3c3	impr: merge `pageserver_api::models::TenantConfig` and `pageserver::tenant::config::TenantConfOpt` (#11298 ) The only difference between - `pageserver_api::models::TenantConfig` and - `pageserver::tenant::config::TenantConfOpt` at this point is that `TenantConfOpt` serializes with `skip_serializing_if = Option::is_none`. That is an efficiency improvement for all the places that currently serde `models::TenantConfig` because new serializations will no longer write `$fieldname: null` for each field that is `None` at runtime. This should be particularly beneficial for Storcon, which stores JSON-serialized `models::TenantConfig` in its DB. # Behavior Changes This PR changes the serialization behavior: we omit `None` fields instead of serializing `$fieldname: null`. So it's a data format change (see section on compatibility below). And it changes API responses from Storcon and Pageserver. ## API Response Compatibility Storcon returns the location description. Afaik it is passed through into - storcon_cli output - storcon UI in console admin UI These outputs will no longer contain `$fieldname: null` values, which de-bloats the output (good). But in storcon UI, it also serves as an editor "default", which will be eliminated after a storcon with this PR is released. ## Data Format Compatibility Backwards compat: new software reading old serialized data will deserialize to the same runtime value because all the field types are exactly the same and `skip_serializing_if` does not affect deserialization. Forward compat: old software reading data serialized by new software will map absent fields in the serialized form to runtime value `Option::None`.
This is serde default behavior, see this playground to convince yourself: https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=f7f4e1a169959a3085b6158c022a05eb The `serde(with="humantime_serde")` however behaves strangely: if used on an `Option<Duration>`, it still requires the field to be present, unlike the serde default behavior shown in the previous paragraph. The workaround is to set `serde(default)`. Previously it was set on each individual field, but, we do have the container attribute, so, set it there. This requires deriving a `Default` impl, which, because all fields are `Option`, is non-magic. See my notes here: https://gist.github.com/problame/eddbc225a5d12617e9f2c6413e0cf799 # Future Work We should have separate types (& crates) for - runtime types configuration (e.g. PageServerConf::tenant_config, AttachedLocationConf) - `config-v1` file pageserver local disk file format - `mgmt API` - `pageserver.toml` Right now they all use the same, which is convenient but makes it hard to reason about compatibility breakage. # Refs - corresponding docs.neon.build PR https://github.com/neondatabase/docs/pull/470	2025-03-19 12:47:17 +00:00
JC Grünhage	aedeb37220	fix(ci): put the BUILD_TAG of the upcoming release into RC PR artifacts (#11304 ) ## Problem #11061 changed how artifacts for releases are built, by reusing/retagging the artifacts from release PRs. This resulted in the BUILD_TAG that's baked into the images to not be as expected. Context: https://neondb.slack.com/archives/C08JBTT3R1Q/p1742333300129069 ## Summary of changes Set BUILD_TAG to the release tag of the upcoming release when running inside release PRs.	2025-03-19 09:34:28 +00:00
Anastasia Lubennikova	6af974548e	feat(compute_ctl): Add basic audit logging for computes. (#11170 ) if `audit_log_level` is set to Log, preload pgaudit extension and log DDL with masked parameters into standard postgresql log	2025-03-19 00:13:36 +00:00
Christian Schwarz	9fb77d6cdd	buffered writer: add cancellation sensitivity (#11052 ) In - https://github.com/neondatabase/neon/pull/10993#issuecomment-2690428336 I added infinite retries for buffered writer flush IOs, primarily to gracefully handle ENOSPC but more generally so that the buffered writer is not left in a state where reads from the surrounding InMemoryLayer cause panics. However, I didn't add cancellation sensitivity, which is concerning because then there is no way to detach a timeline/tenant that is encountering the write IO errors. That’s a legitimate scenario in the case of some edge case bug. See the #10993 description for details. This PR - first makes flush loop infallible, enabled by infinite retries - then adds sensitivity to `Timeline::cancel` to the flush loop, thereby making it fallible in one specific way again - finally fixes the InMemoryLayer/EphemeralFile/BufferedWriter amalgamate to remain read-available after flush loop is cancelled. The support for read-availability after cancellation is necessary so that reads from the InMemoryLayer that are already queued up behind the RwLock that wraps the BufferedWriter won't panic because of the `mutable=None` that we leave behind in case the flush loop gets cancelled. # Alternatives One might think that we can only ship the change for read-availability if flush encounters an error, without the infinite retrying and/or cancellation sensitivity complexity. The problem with that is that read-availability sounds good but is really quite useless, because we cannot ingest new WAL without a writable InMemoryLayer. Thus, very soon after we transition to read-only mode, reads from compute are going to wait anyway, but on `wait_lsn` instead of the RwLock, because ingest isn't progressing. Thus, having the infinite flush retries still makes more sense because they're just "slowness" to the user, whereas wait_lsn is hard errors.	2025-03-18 18:48:43 +00:00
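The control flow described in #11052, retrying flush IOs indefinitely while letting cancellation be the only way out, can be sketched as follows. This is a synchronous, std-only simplification and not the pageserver code: the real flush loop is async and reacts to `Timeline::cancel`, whereas `flush_with_retries`, `FlushError`, and the `AtomicBool` flag here are hypothetical stand-ins, and backoff and logging are omitted.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// The only way the retry loop can fail in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum FlushError {
    Cancelled,
}

/// Calls `flush_once` until it succeeds, checking `cancel` before every
/// attempt. Returns the number of attempts made on success.
pub fn flush_with_retries<F>(mut flush_once: F, cancel: &AtomicBool) -> Result<u32, FlushError>
where
    F: FnMut() -> Result<(), std::io::Error>,
{
    let mut attempts = 0;
    loop {
        // Cancellation sensitivity: without this check a persistently
        // failing flush (e.g. ENOSPC) could never be abandoned.
        if cancel.load(Ordering::Relaxed) {
            return Err(FlushError::Cancelled);
        }
        attempts += 1;
        match flush_once() {
            Ok(()) => return Ok(attempts),
            // IO errors are never returned; a real implementation would
            // log the error and back off before retrying.
            Err(_io_error) => continue,
        }
    }
}
```

The design point mirrored here is that IO errors never escape the loop: callers see either success or `Cancelled`, which matches the commit's description of a flush loop that is infallible except for cancellation.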
JC Grünhage	99639c26b4	fix(ci): update build-tools image references (#11293 ) ## Problem https://github.com/neondatabase/neon/pull/11210 migrated pushing images to ghcr. Unfortunately, it was incomplete in using images from ghcr, which resulted in a few places referencing the ghcr build-tools image, while trying to use docker hub credentials. ## Summary of changes Use build-tools image from ghcr consistently.	2025-03-18 15:21:22 +00:00
Ivan Efremov	86fe26c676	fix(proxy): Fix testodrome HTTP header handling in proxy (#11292 ) Relates to #22486	2025-03-18 15:14:08 +00:00
JC Grünhage	eb6efda98b	impr(ci): move some kinds of tests to PR runs only (#11272 ) ## Problem The pipelines after release merges are slower than they need to be at the moment. This is because some kinds of tests/checks run on all kinds of pipelines, even though they only matter in some of those. ## Summary of changes Run `check-codestyle-{rust,python,jsonnet}`, `build-and-test-locally` and `trigger-e2e-tests` only on regular PRs, not release PR or pushes to main or release branches.	2025-03-18 13:49:34 +00:00
Conrad Ludgate	fd41ab9bb6	chore: remove x509-parser (#11247 ) Both crates seem well maintained. x509-cert is part of the high quality RustCrypto project that we already make heavy use of, and I think it makes sense to reduce the dependencies where possible.	2025-03-18 13:05:08 +00:00
JC Grünhage	2dfff6a2a3	impr(ci): use ghcr.io as the default container registry (#11210 ) ## Problem Docker Hub has new rate limits coming up, and to avoid problems coming with those we're switching to GHCR. ## Summary of changes - Push images to GHCR initially and distribute them from there - Use images from GHCR in docker-compose	2025-03-18 11:30:49 +00:00
Arpad Müller	2cf6ae76fc	storcon: move safekeeper related stuff out of service.rs (#11288 ) There is no functional change here. We move safekeeper related code from `service.rs` to `service/safekeeper_service.rs`, so that safekeeper related stuff is contained in a single file. This also helps with preventing `service.rs` from growing even further. Part of #9011.	2025-03-18 09:00:53 +00:00
Dmitrii Kovalkov	57d51e949d	tests: suppress excessive pageserver errors in test_timeline_ancestor_detach_errors (#11277 ) ## Problem The test is flaky because of the same reasons as described in https://github.com/neondatabase/neon/issues/11177. The test has already suppressed these `WARN` and `ERROR` log messages, but the regexp didn't match all possible errors. ## Summary of changes - Change regexp to suppress all possible allowed error log messages.	2025-03-18 07:10:11 +00:00
Arpad Müller	0d3d639ef3	storcon: remove timeouts for safekeeper heartbeating (#11232 ) PRs #10891 and #10902 have time-bounded the safekeeper heartbeating of the storage controller. Those timeouts were not meant to be permanent, but temporary until we figured out the reasons for the safekeeper heartbeating causing problems. Now they are better understood and resolved. A comment is [here](https://github.com/neondatabase/cloud/issues/24396#issuecomment-2679342929), but most importantly, we've had: * #10954 to send heartbeats concurrently (before, the issue was that we sent them sequentially, so the total time was the number of nodes times the time for the timeout to be hit; now the total time is the maximum over all the nodes we are heartbeating) * work to actually make heartbeats work and not error, i.e. JWT rollout for storcon, not sending heartbeats to decommissioned safekeepers, removal of decommissioned safekeepers from the databases Part of https://github.com/neondatabase/cloud/issues/25473	2025-03-18 03:37:45 +00:00
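The timing argument in the commit above can be made concrete with a small calculation. This is only an illustrative Python sketch, not storage controller code; the function names and the 2-second per-node timeout are assumptions made for the example.

```python
def sequential_worst_case(timeouts_s: list[float]) -> float:
    """Worst case when heartbeats are sent one after another:
    every node may run into its timeout, so the waits add up."""
    return sum(timeouts_s)


def concurrent_worst_case(timeouts_s: list[float]) -> float:
    """Worst case when heartbeats are sent concurrently:
    the round ends when the slowest node has timed out."""
    return max(timeouts_s, default=0.0)


# Hypothetical fleet: 5 safekeepers, each with a 2 s heartbeat timeout.
timeouts = [2.0] * 5
sequential = sequential_worst_case(timeouts)
concurrent = concurrent_worst_case(timeouts)
```

With these assumed numbers the sequential round can take up to 10 s, while the concurrent round is bounded by 2 s.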
Alex Chi Z.	05ca27c981	fix(pagectl/benches): scope context with debug tools (#11285 ) ## Problem `7c462b3417` requires all contexts have scopes. pagectl/benches don't have such scopes. close https://github.com/neondatabase/neon/issues/11280 ## Summary of changes Adding scopes for the tools. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-17 21:27:27 +00:00
Alex Chi Z.	bb64beffbb	fix(pageserver): log compaction errors with timeline ids (#11231 ) ## Problem Makes it easier to debug. ## Summary of changes Log compaction errors with timeline ids. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-17 19:42:02 +00:00
Konstantin Knizhnik	24f41bee5c	Update LFC in case of unlogged build (#11262 ) ## Problem Unlogged build is used for GIST/SPGIST/GIN/HNSW indexes. In this mode we first change relation class to `RELPERSISTENCE_UNLOGGED` and save them on local disk. But we do not save unlogged relations in LFC. It may cause fetching incorrect value from LFC if relfilenode is reused. ## Summary of changes Save modified pages in LFC on second stage of unlogged build (when modified pages are WAL-logged). There is no need to save pages in LFC at first phase because they will in any case be overwritten with assigned LSN at second phase. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-03-17 19:06:42 +00:00
Suhas Thalanki	a05c99f487	fix: removed anon pg extension (#10936 ) ## Problem Removing the `anon` v1 extension in postgres as described in https://github.com/neondatabase/cloud/issues/22663. This extension is not built for postgres v17 and is out of date when compared to the upstream variant which is v2 (we have v1.4). ## Summary of changes Removed the `anon` v1 extension from being built or preloaded Related to https://github.com/neondatabase/cloud/issues/22663	2025-03-17 18:23:32 +00:00
JC Grünhage	486ffeef6d	fix(ci): don't have neon-test-extensions release tag push depend on compute-node-image build (#11281 ) ## Problem Failures like https://github.com/neondatabase/neon/actions/runs/13901493608/job/38896940612?pr=11272 are caused by the dependency on `compute-node-image`, which was wrong on release jobs anyway. ## Summary of changes Remove dependency on `compute-node-image` from the job `add-release-tag-to-neon-test-extension-image`.	2025-03-17 16:31:49 +00:00
Arpad Müller	56149a046a	Add test_explicit_timeline_creation_storcon and make it work (#11261 ) Adds a basic test that makes the storcon issue explicit creation of a timeline on safekeepers (main storcon PR in #11058). It was adapted from `test_explicit_timeline_creation` from #11002. Also, do a bunch of fixes needed to get the test to work (the API definitions weren't correct), and log more stuff when we can't create a new timeline due to no safekeepers being active. Part of #9011 --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2025-03-17 16:28:21 +00:00
Roman Zaynetdinov	db30e1669c	Add /configure_telemetry API endpoint (#11117 ) Work on https://github.com/neondatabase/cloud/issues/23721 and https://github.com/neondatabase/cloud/issues/23714 Depends on https://github.com/neondatabase/neon/pull/11111 - Add `/configure_telemetry` API endpoint - Support second rsyslog configuration for Postgres logs export - Enable logs export when compute feature is enabled and configure Postgres to send logs to syslog I have used `/configure_telemetry` name because in the future I see it also being used for configuring a `pg_tracing` extension to export traces. Let me know if you'd rather have these APIs separate. In this case we can rename it to `/configure_rsyslog`.	2025-03-17 13:53:23 +00:00
JC Grünhage	fdf04d4d81	fix(ci): use correct branch ref for checking whether this is a release merge queue (#11270 ) ## Problem https://github.com/neondatabase/neon/actions/runs/13894288475/job/38871819190 shows the "Add fast-fordward label to PR to trigger fast-forward merge" job being skipped. This is due to not using the right variable for checking which branch the merge queue is merging into. ## Summary of changes Use the `branch` output of the `meta` task for checking the target branch of a merge group.	2025-03-17 09:26:45 +00:00
Alexander Bayandin	136cae76c2	fix(ci): correct regex to detect release-compute RC PRs (#11269 ) ## Problem The regex in `_meta.yml` workflow doesn't detect RC PRs for compute releases: https://neondb.slack.com/archives/C059ZC138NR/p1742164884669389 ## Summary of changes - Fix regex --------- Co-authored-by: Peter Bendel <peterbendel@neon.tech>	2025-03-17 07:25:12 +00:00
Konstantin Knizhnik	15e63afe7d	Support DEBUG_COMPARE_LOCAL mode for unlogged index build (#11257 ) ## Problem In unlogged index build (used for GIST/SPGIST/GIN indexes) files are created on disk and then removed at the end. This contradicts the logic of DEBUG_COMPARE_LOCAL mode. ## Summary of changes Do not create and unlink files in unlogged build in DEBUG_COMPARE_LOCAL mode. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-03-17 06:07:24 +00:00
Alexey Kondratov	966abd3bd6	fix(compute_ctl): Dollar escaping helper fixes (#11263 ) ## Problem In the previous PR #11045, one edge-case wasn't covered, when an ident contains only one `$`, we were picking `$$` as a 'wrapper'. Yet, when this `$` is at the beginning or at the end of the ident, then we end up with `$$$` in a row which breaks the escaping. ## Summary of changes Start from `x` tag instead of a blank string. Slack: https://neondb.slack.com/archives/C08HV951W2W/p1742076675079769?thread_ts=1742004205.461159&cid=C08HV951W2W	2025-03-16 18:39:54 +00:00
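The edge case described above can be sketched in Python. This is not the `compute_ctl` helper itself: `dollar_quote` and `dollar_unquote` are hypothetical names, and the tag-selection rule used here (grow the tag until it no longer appears in the ident) is a deliberately conservative assumption. Starting from `x` instead of an empty tag means the delimiter is never a bare `$$`, so an ident that begins or ends with `$` cannot merge with it into `$$$`.

```python
def dollar_quote(ident: str) -> str:
    """Wrap `ident` in a Postgres-style dollar-quoted string.

    Hypothetical helper: the tag starts at "x" (never empty) and is
    extended until it does not occur anywhere inside the ident.
    """
    tag = "x"
    while tag in ident:
        tag += "x"
    return f"${tag}${ident}${tag}$"


def dollar_unquote(quoted: str) -> str:
    """Recover the body the way a parser would: read the opening
    delimiter, then stop at its first later occurrence."""
    end_of_tag = quoted.index("$", 1)
    delimiter = quoted[: end_of_tag + 1]
    body_start = end_of_tag + 1
    body_end = quoted.index(delimiter, body_start)
    return quoted[body_start:body_end]


# An ident that is a single "$" would become "$$$$$" with an empty tag.
quoted_single_dollar = dollar_quote("$")
```

The round trip through `dollar_unquote` is a quick way to check that a chosen tag does not terminate the literal early.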
Alexey Kondratov	8566cad23b	chore(docs): Refresh RFC guide to suggest using YYYY-MM-DD prefix (#11252 ) ## Problem Serial/numeric IDs lead to collisions, which is not critical but looks awkward. Previous discussion: https://neondb.slack.com/archives/C033A2WE6BZ/p1741891345869979 ## Summary of changes Suggest using the `YYYY-MM-DD` prefix, which i) has less chance of collision; ii) provides out-of-the-box lexicographic sorting; iii) even if it collides, it's not a big deal -- just two RFCs have been started on the same day. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-03-16 17:17:58 +00:00
github-actions[bot]	c33cf739e3	Storage release 2025-03-16	2025-03-16 17:05:15 +00:00
Peter Bendel	228bb75354	Extend large tenant OLTP workload ... (#11166 ) ... to better match the workload characteristics of real Neon customers ## Problem We analyzed workloads of large Neon users and want to extend the oltp workload to include characteristics seen in those workloads. ## Summary of changes - for re-use branch delete inserted rows from last run - adjust expected run-time (time-outs) in GitHub workflow - add queries that expose the prefetch getpages path - add I/U/D transactions for another table (so far the workload was insert/append-only) - add an explicit vacuum analyze step and measure its time - add reindex concurrently step and measure its time (and take care that this step succeeds even if prior reindex runs have failed or were canceled) - create a second connection string for the pooled connection that removes the `-pooler` suffix from the hostname because we want to run long-running statements (database maintenance) and bypass the pooler which doesn't support unlimited statement timeout ## Test run https://github.com/neondatabase/neon/actions/runs/13851772887/job/38760172415	2025-03-16 14:04:48 +00:00
Cihan Demirci	a5b00b87ba	CI(pre-merge-checks): use step-security/changed-files (#11265 ) Use Step Security maintained version of `tj-actions/changed-files`. https://www.stepsecurity.io/blog/harden-runner-detection-tj-actions-changed-files-action-is-compromised#use-the-stepsecurity-maintained-changed-files-action	2025-03-16 13:53:27 +00:00
John Spray	a674ed8caf	storcon: safety check when completing shard split (#11256 ) ## Problem There is a rare race between controller graceful deployment and shard splitting where we may incorrectly both abort _and_ complete the split (on different pods), and thereby leave no shards at all in the database. Related: #11254 ## Summary of changes - In complete_shard_split, refuse to delete anything if child shards are not found	2025-03-14 20:08:24 +00:00
Erik Grinaker	53d50c7ea5	pageserver: deflake compaction tests (#11246 ) These need to set `NoYield`, otherwise they may be preempted by pending L0 compaction.	2025-03-14 17:45:18 +00:00
Dmitrii Kovalkov	3168bd0e3a	tests: suppress "Cancelled request finished with an error" in test_timeline_archive (#11241 ) ## Problem Previous PR https://github.com/neondatabase/neon/pull/11190 didn't suppress `Cancelled request finished with an error` messages, which are also expected, so the test https://github.com/neondatabase/neon/issues/11177 is still flaky. ## Summary of changes - Suppress `Cancelled request finished with an error` in `test_timeline_archive`	2025-03-14 17:42:09 +00:00
Alexander Bayandin	4a97cd0b7e	test_runner: fix tests with jsonnet for Python 3.13 (#11240 ) ## Problem Python's `jsonnet` 0.20.0 doesn't support Python 3.13, so we have a couple of tests xfailed because of that. ## Summary of changes - Bump `jsonnet` to `0.21.0rc2` which supports Python 3.13 - Unxfail `test_sql_exporter_metrics_e2e` and `test_sql_exporter_metrics_smoke` on Python 3.13	2025-03-14 17:02:55 +00:00
JC Grünhage	1812fc7cf1	Merge pull request #11251 from neondatabase/rc/release/2025-03-14--storcon-hotfix Storage controller hotfix 2025-03-14	2025-03-14 16:43:22 +01:00
Anastasia Lubennikova	b7c6738524	feat(compute_ctl): add pgaudt log gc to compute_ctl (#11169 ) - add pgaudt_gc thread to compute_ctl to cleanup old pgaudit logs if they exist. pgaudit can rotate files, but it doesn't delete the old files - Add AUDIT_LOG_DIR_SIZE metric to compute_ctl to track the size of the audit log directory in bytes. - Fix permissions for rsyslog state files directory	2025-03-14 14:08:16 +00:00
Christian Schwarz	929d963a35	Merge branch 'hotfix/release/2025-03-14-storcon-optimizations' into rc/release/2025-03-14--storcon-hotfix	2025-03-14 15:06:25 +01:00
Conrad Ludgate	7fe5a689b4	feat(proxy): export ingress metrics (#11244 ) ## Problem We exposed the direction tag in #10925 but didn't actually include the ingress tag in the export to allow for an adaption period. ## Summary of changes We now export the ingress direction	2025-03-14 13:54:57 +00:00
Christian Schwarz	e405420458	CHERRY PICK: fix(storcon): optimization validation makes decisions based on wrong SecondaryProgress (#11229 ) (cherry picked from commit `04370b48b3`) Conflicts: storage_controller/src/service.rs Because `release` head doesn't yet have `storcon: timetime table, creation and deletion (#11058)`	2025-03-14 14:47:29 +01:00
Dmitrii Kovalkov	b0922967e0	Bump humantime version and remove advisories.ignore (#11242 ) ## Problem - Closes: https://github.com/neondatabase/neon/issues/11179#issuecomment-2724222041 ## Summary of changes - Bump humantime version to `2.2` - Remove `RUSTSEC-2025-0014` from `advisories.ignore`	2025-03-14 11:51:11 +00:00
Dmitrii Kovalkov	f68be2b5e2	safekeeper: https for management API (#11171 ) ## Problem Storage controller uses unencrypted HTTP requests for safekeeper management API. - Closes: https://github.com/neondatabase/cloud/issues/24836 ## Summary of changes - Replace `hyper0::server::Server` with `http_utils::server::Server` in safekeeper. - Add HTTPS handler for safekeeper management API.	2025-03-14 11:41:22 +00:00
Christian Schwarz	04370b48b3	fix(storcon): optimization validation makes decisions based on wrong SecondaryProgress (#11229 ) # Refs - fixes https://github.com/neondatabase/neon/issues/11228 # Problem High-Level When storcon validates whether a `ScheduleOptimizationAction` should be applied, it retrieves the `tenant_secondary_status` to determine whether a secondary is ready for the optimization. When collecting results, it associates secondary statuses with the wrong optimization actions in the batch of optimizations that we're validating. The result is that we make the decision for shard/location X based on the SecondaryStatus of a random secondary location Y in the current batch of optimizations. A possible symptom is an early cutover, as seen in this engineering investigation here: - https://github.com/neondatabase/cloud/issues/25734 # Problem Code-Level This code here in `optimize_all_validate` `97e2e27f68/storage_controller/src/service.rs (L7012-L7029)` zips the `want_secondary_status` with the Vec returned from `tenant_for_shards_api`. However, the Vec returned from `tenant_for_shards_api` is not ordered (it uses FuturesUnordered internally). # Solution Sort the Vec in input order before returning it. `optimize_all_validate` was the only caller affected by this problem. While at it, also future-proof similar-looking function `tenant_for_shards`. None of its callers care about the order, but this type of function signature is easy to use incorrectly. # Future Work Avoid the additional iteration, map, and allocation. Change API to leverage AsyncFn (async closure). And/or invert `tenant_for_shards_api` into a Future ext trait / iterator adaptor thing.	2025-03-14 11:21:16 +00:00
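The class of bug described above, and the "sort in input order" fix, can be illustrated with a minimal Python sketch. It uses `asyncio.as_completed` as a stand-in for Rust's `FuturesUnordered`; `fetch_status`, `statuses_in_input_order`, the shard names, and the delays are all hypothetical and only serve to make completion order differ from input order.

```python
import asyncio


async def fetch_status(shard: str, delay: float) -> str:
    """Pretend to query a secondary; slower shards finish later."""
    await asyncio.sleep(delay)
    return f"status-of-{shard}"


async def statuses_in_input_order(shards: list[str], delays: list[float]) -> list[str]:
    """Run all queries concurrently, then restore input order.

    Results arrive in completion order, so zipping them directly with
    `shards` would pair statuses with the wrong shard. Tagging each
    result with its input index and sorting afterwards avoids that.
    """
    async def tagged(index: int, shard: str, delay: float) -> tuple[int, str]:
        return index, await fetch_status(shard, delay)

    pending = [tagged(i, s, d) for i, (s, d) in enumerate(zip(shards, delays))]
    results = []
    for finished in asyncio.as_completed(pending):
        results.append(await finished)

    results.sort(key=lambda pair: pair[0])  # back to input order
    return [status for _, status in results]


shards = ["a", "b", "c"]
ordered = asyncio.run(statuses_in_input_order(shards, [0.03, 0.0, 0.01]))
pairs = list(zip(shards, ordered))
```

After the sort, zipping the results with the input list is safe again, which is the property the affected caller relied on.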
Arpad Müller	5359cf717c	storcon: add API definitions for exclude_timeline and term_bump (#11197 ) Adds API definitions for the safekeeper API endpoints `exclude_timeline` and `term_bump`. Also does a bugfix to return the correct type from `delete_timeline`. Part of #8614	2025-03-14 00:00:37 +00:00
Erik Grinaker	d6d78a050f	pageserver: disable `l0_flush_wait_upload` by default (#11215 ) ## Problem This is already disabled in production, as it is replaced by L0 flush delays. It will be removed in a later PR, once the config option is no longer specified in production. ## Summary of changes Disable `l0_flush_wait_upload` by default.	2025-03-13 21:08:28 +00:00
Erik Grinaker	4ff000c042	pageserver: deflake `test_metadata_image_creation` (#11230 ) ## Problem `test_metadata_image_creation ` became flaky with #11212, since image compaction may yield to L0 compaction. ## Summary of changes Set `NoYield` when compacting in tenant tests.	2025-03-13 20:46:21 +00:00
Conrad Ludgate	9a3020d2ce	chore(proxy): pre-initialise metricvecs (#11226 ) ## Problem We noticed that error metrics didn't show for some services with light load. This is not great and can cause problems for dashboards/alerts ## Summary of changes Pre-initialise some metricvecs.	2025-03-13 20:23:53 +00:00
Alex Chi Z.	23b713900e	feat(storcon): passthrough ancestor detach behavior (#11199 ) ## Problem https://github.com/neondatabase/neon/issues/10310 https://github.com/neondatabase/neon/pull/11158 ## Summary of changes We need to passthrough the new detach behavior through the storcon API. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-13 20:21:23 +00:00
Arpad Müller	b1a1be6a4c	switch pytests and neon_local to control_plane_hooks_api (#11195 ) We want to switch away from and deprecate the `--compute-hook-url` param for the storcon in favour of `--control-plane-url` because it allows us to construct urls with `notify-safekeepers`. This PR switches the pytests and neon_local from a `control_plane_compute_hook_api` to a new param named `control_plane_hooks_api` which is supposed to point to the parent of the `notify-attach` URL. We still support reading the old url from disk to not be too disruptive with existing deployments, but we just ignore it. Also add docs for the `notify-safekeepers` upcall API. Follow-up of #11173 Part of https://github.com/neondatabase/neon/issues/11163	2025-03-13 19:50:52 +00:00
Erik Grinaker	8afae9d03c	pageserver: enable `l0_flush_delay_threshold` by default (#11214 ) ## Problem `l0_flush_delay_threshold` has already been set to 30 in production for a couple of weeks. Let's harmonize the default. ## Summary of changes Update `DEFAULT_L0_FLUSH_DELAY_FACTOR` to 3 such that the default `l0_flush_delay_threshold` is `3 * compaction_threshold`. This differs from the production setting, which is hardcoded to 30 (with `compaction_threshold` at 10), and is more appropriate for any tenants that have custom `compaction_threshold` overrides.	2025-03-13 19:15:22 +00:00
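The difference between the relative default and the hardcoded production value is simple arithmetic. The Python sketch below mirrors only the numbers quoted in the description above; the function name is an assumption, and the real default lives in the pageserver's Rust configuration code.

```python
DEFAULT_L0_FLUSH_DELAY_FACTOR = 3  # factor quoted in the commit description


def default_l0_flush_delay_threshold(compaction_threshold: int) -> int:
    """Default threshold expressed relative to the compaction threshold."""
    return DEFAULT_L0_FLUSH_DELAY_FACTOR * compaction_threshold


PRODUCTION_HARDCODED = 30  # production setting quoted in the description

# With the stock compaction_threshold of 10, both approaches agree.
stock = default_l0_flush_delay_threshold(10)

# A tenant with a hypothetical custom compaction_threshold of 20 gets a
# threshold that scales with it, instead of staying at 30.
custom = default_l0_flush_delay_threshold(20)
```

This is why the relative default is described as more appropriate for tenants with custom `compaction_threshold` overrides.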
JC Grünhage	066b0a1be9	fix(ci): correctly push neon-test-extensions in releases and to ghcr (#11225 ) ## Problem `ef0d4a48a` adjusted how we build container images and how we push them, and the neon-test-extensions image was overlooked. Additionally, it was also missed in `1f0dea9a1`, which pushed our container images to GHCR. ## Summary of changes Push neon-test-extensions to GHCR and also push release tags for it.	2025-03-13 18:18:55 +00:00
Konstantin Knizhnik	398d2794eb	Handle DEBUG_COMPARE_LOCAL mode in neon_zeroextend (#11220 ) ## Problem DEBUG_COMPARE_LOCAL is not supported in neon_zeroextend added in PG16 ## Summary of changes Add support of DEBUG_COMPARE_LOCAL in neon_zeroextend Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-03-13 16:30:32 +00:00
Erik Grinaker	3c3b9dc919	pageserver: enable `image_creation_preempt_threshold` by default (#11216 ) ## Problem This is already set in production, we should harmonize the default. ## Summary of changes Default `image_creation_preempt_threshold` to 3.	2025-03-13 16:28:21 +00:00
Christian Schwarz	ed31dd2a3c	pageserver: better observability for slow wait_lsn (#11176 ) # Problem We leave too few observability breadcrumbs in the case where wait_lsn is exceptionally slow. # Changes - refactor: extract the monitoring logic out of `log_slow` into `monitor_slow_future` - add global + per-timeline counter for time spent waiting for wait_lsn - It is updated while we're still waiting, similar to what we do for page_service response flush. - add per-timeline counterpair for started & finished wait_lsn count - add slow-logging to leave breadcrumbs in logs, not just metrics For the slow-logging, we need to consider not flooding the logs during a broker or network outage/blip. The solution is a "log-streak-level" concurrency limit per timeline. At any given time, there is at most one slow wait_lsn that is logging the "still running" and "completed" sequence of logs. Other concurrent slow wait_lsn's don't log at all. This leaves at least one breadcrumb in each timeline's logs if some wait_lsn was exceptionally slow during a given period. The full degree of slowness can then be determined by looking at the per-timeline metric. # Performance Reran the `bench_log_slow` benchmark, no difference, so, existing call sites are fine. We do use a Semaphore, but only try_acquire it _after_ things have already been determined to be slow. So, no baseline overhead anticipated. # Refs - https://github.com/neondatabase/cloud/issues/23486#issuecomment-2711587222	2025-03-13 15:03:53 +00:00
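The "log-streak-level" concurrency limit described above can be sketched as a non-blocking permit per timeline. This Python version is only an analogy for the Rust `Semaphore` plus `try_acquire` approach; `LogStreakLimiter` and its method names are hypothetical.

```python
import threading
from collections import defaultdict


class LogStreakLimiter:
    """Allows at most one slow operation per timeline to emit its
    "still running" / "completed" log streak at any given time."""

    def __init__(self) -> None:
        # One single-permit semaphore per timeline id.
        self._permits = defaultdict(lambda: threading.Semaphore(1))

    def try_start(self, timeline_id: str) -> bool:
        """Called only once an operation is already known to be slow.
        Never blocks: returns False if another streak is in progress."""
        return self._permits[timeline_id].acquire(blocking=False)

    def finish(self, timeline_id: str) -> None:
        """Called by the streak owner after its final log line."""
        self._permits[timeline_id].release()


limiter = LogStreakLimiter()
first = limiter.try_start("timeline-1")    # this waiter logs
second = limiter.try_start("timeline-1")   # concurrent waiter stays silent
other = limiter.try_start("timeline-2")    # different timeline, own permit
limiter.finish("timeline-1")
after_finish = limiter.try_start("timeline-1")  # streak slot is free again
```

Because the permit is only requested after an operation has been classified as slow, the fast path never touches the semaphore, which matches the "no baseline overhead" reasoning in the description.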
Conrad Ludgate	3dec117572	feat(compute_ctl): use TLS if configured (#10972 ) Closes: https://github.com/neondatabase/cloud/issues/22998 If control-plane reports that TLS should be used, load the certificates (and watch for updates), make sure postgres uses them, and detect updates. Procedure: 1. Load certificates 2. Reconfigure postgres/pgbouncer 3. Loop on a timer until certificates have loaded 4. Go to 1 Notes: 1. We only run this procedure if requested on startup by control plane. 2. We needed to compile pgbouncer with openssl enabled 3. Postgres doesn't allow tls keys to be globally accessible - must be read only to the postgres user. I couldn't convince the autoscaling team to let me put this logic into the VM settings, so instead compute_ctl will copy the keys to be read-only by postgres. 4. To mitigate a race condition, we also verify that the key matches the cert.	2025-03-13 15:03:22 +00:00
Alex Chi Z.	b2286f5bcb	fix(pageserver): don't panic if gc-compaction finds no keys (#11200 ) ## Problem There was a panic on staging because compaction didn't find any keys. This is possible if all layers selected for compaction do not contain any keys within the current shard. ## Summary of changes Make panic an error. In the future, we can try creating an empty image layer so that GC can clean up those layers. Otherwise, for now, we can only rely on shard ancestor compaction to remove this data. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-13 14:38:45 +00:00
Erik Grinaker	c036fec065	pageserver: enable `compaction_l0_first` by default (#11212 ) ## Problem `compaction_l0_first` has already been enabled in production for a couple of weeks. ## Summary of changes Enable `compaction_l0_first` by default. Also set `CompactFlags::NoYield` in `timeline_checkpoint_handler`, to ensure explicitly requested compaction runs to completion. This endpoint is mainly used in tests, and caused some flakiness where tests expected compaction to complete.	2025-03-13 14:28:42 +00:00
JC Grünhage	89c7e4e917	fix(ci): use parentheses for error handling in jq when fetching release PRs (#11217 ) ## Problem #11061 introduced code fetching previous releases. #11151 introduced jq error handling, which has also been applied in #11061, but the parentheses were missed. ## Summary of changes Add parentheses around error handling code.	2025-03-13 13:40:43 +00:00
Erik Grinaker	5a245a837d	storcon: retain stripe size when autosplitting sharded tenants (#11194 ) ## Problem Autosplits always request `DEFAULT_STRIPE_SIZE` for splits. However, splits do not allow changing the stripe size of already-sharded tenants, and will error out if it differs. In #11168, we are changing the stripe size, which could hit this when attempting to autosplit already sharded tenants. Touches #11168. ## Summary of changes Pass `new_stripe_size: None` when autosplitting already sharded tenants. Otherwise, pass `DEFAULT_STRIPE_SIZE` instead of the shard identity's stripe size, since we want to use the current default rather than an old, persisted default.	2025-03-13 13:28:10 +00:00
devin-ai-integration[bot]	efb1df4362	fix: Change metric_unit from 'microseconds' to 'μs' in test_compute_ctl_api.py (#11209 ) # Fix metric_unit length in test_compute_ctl_api.py ## Description This PR changes the metric_unit from "microseconds" to "μs" in test_compute_ctl_api.py to fix the issue where perf test results were not being stored in the database due to the string exceeding the 10 character limit of the metric_unit column in the perf_test_results table. ## Problem As reported in Slack, the perf test results were not being uploaded to the database because the "microseconds" string (12 characters) exceeds the 10 character limit of the metric_unit column in the perf_test_results table. ## Solution Replace "microseconds" with "μs" in all metric_unit parameters in the test_compute_ctl_api.py file. ## Testing The changes have been committed and pushed. The PR is ready for review. Link to Devin run: https://app.devin.ai/sessions/e29edd672bd34114b059915820e8a853 Requested by: Peter Bendel Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: peterbendel@neon.tech <peterbendel@neon.tech>	2025-03-13 10:17:01 +00:00
JC Grünhage	803e6f908a	fix(ci): fix syntax of lint-release-pr (#11208 ) ## Problem A small adjustment in #11061 broke the lint-release-pr.sh script, and the new version was neither tested nor linted. This has been done now, the script is once again tested and passing `shellcheck`. ## Summary of changes Add missing `el` of `elif` condition chain.	2025-03-13 09:42:38 +00:00
JC Grünhage	afc9524bc7	fix(ci): run lint-release-pr on head-ref (#11206 ) ## Problem #11061 changed release pr creation, and I missed that the workflow will checkout a would-be-merge of the rc branch and the release branch instead of the head ref, unless explicitly instructed otherwise. ## Summary of changes Check out head ref for linting the release PRs.	2025-03-13 08:17:33 +00:00
JC Grünhage	507353404c	fix(ci): pass empty body when creating release PRs (#11203 ) ## Problem #11061 changed release pr creation, and I missed that creating PRs using `gh` in non-interactive environments requires `--body` instead of defaulting to an empty body. ## Summary of changes Explicitly set an empty body when creating release PRs.	2025-03-12 23:54:43 +00:00
JC Grünhage	48be4df3f3	fix(ci): fetch all refs in release PR creation (#11201 ) ## Problem #11061 changed release PR creation, and I missed that we need to explicitly fetch the whole history so that the relevant git refs and objects are available. ## Summary of changes - Fetch all git refs including history by setting fetch-depth to 0 - Reference release branch as a remote branch, because we haven't checked it out locally	2025-03-12 22:32:38 +00:00
Alex Chi Z.	c3b3b507f7	feat(pageserver): support detaching behavior v2 (#11158 ) ## Problem close https://github.com/neondatabase/neon/issues/10310 ## Summary of changes This patch adds a new behavior for the detach_ancestor API: detach with multi-level ancestor and no reparenting. Though we can potentially support multi-level + do reparenting / single-level + no-reparenting in the future, as it's not required for the recovery/snapshot epic, I'd prefer keeping things simple now that we only handle the old one and the new one instead of supporting the full feature matrix. I only added a test case of successful detaching instead of testing failures. I'd like to make this into staging and add more tests in the future. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-12 22:27:23 +00:00
JC Grünhage	ef0d4a48a8	Reuse artifacts from release PRs (#11061 ) ## Problem When we release our components, we perform builds in the release PR, then test the components, then merge the PR, and then build everything again, run tests again, and only then start deployments. To speed things up, we want to perform builds and run tests in the PR, and start deployments using the existing artifacts from the release PR. To make that possible, we need to have both CI pipelines running on the same commit hash, which requires fast forwarding release. That only works if we have a commit in the PR that has the current release branch state as an ancestor. ## Summary of changes - Changes to release PR creation: - Remove templates and automatic bodies for release PRs. The previous template wasn't used anymore, and the automatic body we created in the pipeline didn't contain any useful content anymore after the changes here. - Make it possible to select the source branch. For releases that aren't cut from `main`, like https://github.com/neondatabase/neon/pull/11051, we need a way to trigger the new flow from a different branch. - Determine `release-branch` automatically from the component name instead of passing that as well. - Changes to the merge queue job: - Rename `get-changed-files` to `meta` in preparation of additional data being fetched as part of that job - Fail the merge queue if we're trying to merge into a branch other than main - this is to prevent non-fast-forward merges. - Label PRs to branches other than main as `fast-forward`, to trigger the fast-forward job - Add a fast-forward job that can be triggered with the `fast-forward` label that performs a fast-forward merge. This only happens if the PR has `mergeable_state == clean`, so CI having passed. - Build and Test on releases now skips building images, skips testing images and skips triggering e2e tests. 
We add new tags to the images from the release PR to tag them as release images, and we push them to the prod registries.	2025-03-12 21:00:59 +00:00
Alex Chi Z.	8a5a739af0	test(pageserver): add small tenant compaction (#11049 ) ## Problem close https://github.com/neondatabase/neon/issues/10881 ## Summary of changes Mock a tenant with very small amount of data. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-12 20:34:19 +00:00
Tristan Partin	5eed0e4b94	Add docs to performance/test_logical_replication.py on how to run the suite (#10175 ) These docs are in tandem with what was recently published on the internal docs site. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-03-12 17:31:09 +00:00
Tristan Partin	bb3c0ff251	Make collecting the installed extensions metric async (#11071 ) If the goal is to make compute_ctl completely asynchronous, then this is one step to getting there. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-03-12 16:09:02 +00:00
Conrad Ludgate	7aec1364dd	chore(proxy): remove enum and composite type queries (#11178 ) In our json encoding, we only need to know about array types. Information about composites or enums is not actually used. Enums are quite popular, and needing to type query them when not needed can add some latency cost for no gain.	2025-03-12 15:47:17 +00:00
Tristan Partin	40672b739e	Move maybe_add_request_id_header middleware into middleware module (#11187 ) This matches the authorization middleware. --------- Signed-off-by: Tristan Partin <tristan@neon.tech> Co-authored-by: Mikhail Kot <mikhail@neon.tech>	2025-03-12 15:34:46 +00:00
Vlad Lazar	02a83913ec	storcon: do not update observed state on node activation (#11155 ) ## Problem When a node becomes active, we query its locations and update the observed state in-place. This can race with the observed state updates done when processing reconcile results. ## Summary of changes The argument for this reconciliation step is that it reduces the need for background reconciliations. I don't think this is actually true anymore. There are two cases. 1. Restart of node after drain. Usually the node does not go through the offline state here, so observed locations were not marked as none. In any case, there should be a handful of shards max on the node since we've just drained it. 2. Node comes back online after failure or network partition. When the node is marked offline, we reschedule everything away from it. When it later becomes active, the previous observed location is extraneous and requires a reconciliation anyway. Closes https://github.com/neondatabase/neon/issues/11148	2025-03-12 15:31:28 +00:00
Erik Grinaker	c7717c85c7	storcon,pageserver: use persisted stripe size when loading unsharded tenants (#11193 ) ## Problem When the storage controller and Pageserver load tenants from persisted storage, they use `ShardIdentity::unsharded()` for unsharded tenants. However, this replaces the persisted stripe size of unsharded tenants with the default stripe size. This doesn't really matter for practical purposes, since the stripe size is meaningless for unsharded tenants anyway, but can cause consistency check failures if the persisted stripe size differs from the default. This was seen in #11168, where we change the default stripe size. Touches #11168. ## Summary of changes Carry over the persisted stripe size from `TenantShardPersistence` for unsharded tenants, and from `LocationConf` on Pageservers. Also add bounds checks for type casts when loading persisted shard metadata.	2025-03-12 15:16:54 +00:00
Erik Grinaker	1436b8469c	pageserver: appease unused lint on macOS (#11192 ) ## Problem `info_span!` is only used in a `linux` branch, causing the unused lint to fire on macOS. ## Summary of changes Fully qualify the `info_span!` use.	2025-03-12 14:34:29 +00:00
JC Grünhage	fc515e7be2	chore(deps): bump env_logger to 0.11.7 (#11188 ) ## Problem `humantime` is unmaintained, we want to migrate to `jiff`, see https://github.com/neondatabase/neon/issues/11179. `env_logger` in older versions depend on `humantime`, and newer versions depend on `jiff`, so we need to update it. ## Summary of changes Update `env_logger` to the most recent release, which does not depend on `humantime` anymore.	2025-03-12 14:26:52 +00:00
John Spray	7015dbbdf0	storcon_cli: remove pre-warm helper (#11183 ) ## Problem This command was used when onboarding tenants to the storage controller. We no longer do that, so the command can go. ## Summary of changes - Remove `storcon_cli tenant-warmup` command	2025-03-12 14:02:11 +00:00
Dmitrii Kovalkov	73e37ae388	Suppress "request was dropped" errors in test_timeline_archive (#11190 ) ## Problem Test `test_timeline_archive` is flaky because it makes requests that are intended to fail. This sometimes leads to warnings in the pageserver's logs. More details are in the issue. - Closes: https://github.com/neondatabase/neon/issues/11177 ## Summary of changes - Suppress such errors.	2025-03-12 13:23:31 +00:00
Vlad Lazar	1c0ff3c04d	utils: explicit OTEL export config and OTEL enablement via common entry point (#11139 ) We want to export performance traces from the pageserver in OTEL format. End goal is to see them in Grafana. To this end, there are two changes here: 1. Update the `tracing-utils` crate to allow for explicitly specifying the export configuration. Pageserver configuration is loaded from a file on start-up. This allows us to use the same flow for export configs there. 2. Update the `utils::logging::init` common entry point to set up OTEL tracing infrastructure if requested. Note that an entirely different tracing subscriber is used. This is to avoid interference with the existing tracing set-up. For now, no service uses this functionality. PR to plug this into the pageserver is [here](https://github.com/neondatabase/neon/pull/11140). Related https://github.com/neondatabase/neon/issues/9873	2025-03-12 11:07:49 +00:00
John Spray	7bf6397334	pageserver: remove legacy `TimelineInfo::latest_gc_cutoff` field (1/2) (#11149 ) ## Problem This field was retained for backward compat only in https://github.com/neondatabase/neon/pull/10707. Once https://github.com/neondatabase/cloud/pull/25233 is released, nothing external will be reading this field. Internally, this was a mandatory field so storage controller is still trying to decode it, so we must do this removal in two steps: this PR makes the field optional, and after one release we can fully remove it. Related: https://github.com/neondatabase/cloud/issues/24250 ## Summary of changes - Rename field to `_unused` - Remove field from swagger - Make field optional	2025-03-12 10:23:41 +00:00
Konstantin Knizhnik	f60ffe3021	Rebase compare local debug mode (#11174 ) ## Problem DEBUG_COMPARE_LOCAL mode is broken See https://neondb.slack.com/archives/C03QLRH7PPD/p1732862608323269?thread_ts=1732711054.862919&cid=C03QLRH7PPD ## Summary of changes Fix compile errors and unlogged build issues. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-03-12 05:52:18 +00:00
Arpad Müller	da2431f11f	storcon: add --control-plane-url config option (#11173 ) Adds the `--control-plane-url` config option to the storcon, which we want to migrate to instead of using `notify-attach`. Part of #11163	2025-03-12 02:30:56 +00:00
JC Grünhage	e8396034ac	fix(ci): fail meta using jq halt_error if data is unexpectedly missing (#11151 ) ## Problem When the GitHub API is having problems, we might not get data back, and are happily setting vars as empty. This causes problems down the line. See https://github.com/neondatabase/neon/actions/runs/13718859397/job/38381946590?pr=11132#step:5:1 for example. ## Summary of changes Fail the `meta` job if we don't get expected data back from GitHub.	2025-03-11 22:59:30 +00:00
Tristan Partin	decd265c99	Revert notify to 6.0.0 (#11162 ) The upgrade to 8.0.0 caused severe performance regressions in the start_postgres_ms metric, which measures the time it takes from execing Postgres to the time Postgres marks itself as ready in the postmaster.pid file. We use the notify crate to watch for changes in the pgdata directory and the postmaster.pid file. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-03-11 22:18:09 +00:00
Christian Schwarz	158db414bf	buffered writer: handle write errors by retrying all write IO errors indefinitely (#10993 ) # Problem If the Pageserver ingest path (InMemoryLayer=>EphemeralFile=>BufferedWriter) encounters ENOSPC or any other write IO error when flushing the mutable buffer of the BufferedWriter, the buffered writer is left in a state where subsequent _reads_ from the InMemoryLayer will cause a `must not use after we returned an error` panic. The reason is that 1. the flush background task bails on flush failure, 2. causing the `FlushHandle::flush` function to fail at channel.recv() and 3. causing the `FlushHandle::flush` function to bail with the flush error, 4. leaving its caller `BufferedWriter::flush` with `BufferedWriter::mutable = None`, 5. once the InMemoryLayer's RwLock::write guard is dropped, subsequent reads can enter, 6. those reads find `mutable = None` and cause the panic. # Context It has always been the contract that writes against the BufferedWriter API must not be retried because the writer/stream-style/append-only interface makes no atomicity guarantees ("On error, did nothing or a piece of the buffer get appended?"). The idea was that the error would bubble up to upper layers that can throw away the buffered writer and create a new one. (See our [internal error handling policy document on how to handle e.g. `ENOSPC`](`c870a50bc0/src/storage/handling_io_and_logical_errors.md (L36-L43)`)). That _might_ be true for delta/image layer writers, I haven't checked. But it's certainly not true for the ingest path: there are no provisions to throw away an InMemoryLayer that encountered a write error and reingest the WAL already written to it. Adding such higher-level retries would involve resetting last_record_lsn to a lower value and restarting walreceiver. The code isn't flexible enough to do that, and such complexity likely isn't worth it given that write errors are rare. 
# Solution The solution in this PR is to retry _any_ failing write operation _indefinitely_ inside the buffered writer flush task, except of course those that are fatal as per `maybe_fatal_err`. Retrying indefinitely ensures that `BufferedWriter::mutable` is never left `None` in the case of IO errors, thereby solving the problem described above. It's a clear improvement over the status quo. However, while we're retrying, we build up backpressure because the `flush` is only double-buffered, not infinitely buffered. Backpressure here is generally good to avoid resource exhaustion, but blocks reads and hence stalls GetPage requests because InMemoryLayer reads and writes are mutually exclusive. That's orthogonal to the problem that is solved here, though. ## Caveats Note that there are some remaining conditions in the flush background task where it can bail with an error. I have annotated one of them with a TODO comment. Hence the `FlushHandle::flush` is still fallible and hence the overall scenario of leaving `mutable = None` on the bail path is still possible. We can clean that up in a later commit. Note also that retrying indefinitely is great for temporary errors like ENOSPC but likely undesirable in case the `std::io::Error` we get is really due to higher-level logic bugs. For example, we could fail to flush because the timeline or tenant directory got deleted and VirtualFile's reopen fails with ENOENT. Note finally that cancellation is not respected while we're retrying. This means we will block timeline/tenant/pageserver shutdown. The reason is that the existing cancellation story for the buffered writer background task was to recv from flush op channel until the sending side (FlushHandle) is explicitly shut down or dropped. Failing to handle cancellation carries the operational risk that even if a single timeline gets stuck because of a logic bug such as the one laid out above, we must still restart the whole pageserver process. 
# Alternatives Considered As pointed out in the `Context` section, throwing away an InMemoryLayer that encountered an error and reingesting the WAL is a lot of complexity that IMO isn't justified for such an edge case. Also, it's wasteful. I think it's a local optimum. A more general and simpler solution for ENOSPC is to `abort()` the process and run eviction on startup before bringing up the rest of pageserver. I argued for it in the past, the pro arguments are still valid and complete: https://neondb.slack.com/archives/C033RQ5SPDH/p1716896265296329 The trouble at the time was implementing eviction on startup. However, maybe things are simpler now that we are fully storcon-managed and all tenants have secondaries. For example, if pageserver `abort()`s on ENOSPC and then simply doesn't respond to storcon heartbeats while we're running eviction on startup, storcon will fail tenants over to the secondary anyway, giving us all the time we need to clean up. The downside is that if there's a systemic space management bug, the above proposal will just propagate the problem to other nodes. But I imagine that because of the delays involved with filling up disks, the system might reach a half-stable state, providing operators more time to react. # Demo Intermediary commit `a03f335121480afc0171b0f34606bdf929e962c5` is demoed in this (internal) screen recording: https://drive.google.com/file/d/1nBC6lFV2himQ8vRXDXrY30yfWmI2JL5J/view?usp=drive_link # Perf Testing Ran `bench_ingest` on tmpfs, no measurable difference. Spans are uniquely owned by the flush task, and the span stack isn't too deep, so, enter and exit should be cheap. Plus, each flush takes ~150us with direct IO enabled, so, not _that_ high frequency event anyways. # Refs - fixes https://github.com/neondatabase/neon/issues/10856	2025-03-11 20:40:23 +00:00
Christian Schwarz	083a30b1e2	storage broker: disable deploy by default (#11172 ) context - https://github.com/neondatabase/cloud/issues/23486#issuecomment-2711587222 - companion infra.git PR: https://github.com/neondatabase/infra/pull/3249	2025-03-11 19:45:06 +00:00
Alex Chi Z.	7d221214bb	feat(pageserver): support no-yield for gc-compaction (#11184 ) ## Problem This should also resolve the test flakiness of `test_gc_feedback`. close https://github.com/neondatabase/neon/issues/11144 ## Summary of changes If `NoYield` is set, do not yield in gc-compaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-11 19:13:52 +00:00
Dmitrii Kovalkov	8983677f29	Ignore cargo deny advisory RUSTSEC-2025-0014 for humantime (#11180 ) ## Problem `humantime` is not maintained and `cargo deny check` fails - Will be addressed in https://github.com/neondatabase/neon/issues/11179 ## Summary of changes Ignore RUSTSEC-2025-0014 advisory for now	2025-03-11 19:09:32 +00:00
Ivan Efremov	011f7c21a3	fix(proxy): Add testodrome query id HTTP header (#11167 ) Handle "X-Neon-Query-ID" header to glue data with testodrome queries. Relates to the #22486	2025-03-11 17:17:30 +00:00
Alex Chi Z.	7588983168	fix(scrubber): log even if no refs are found (#11160 ) ## Problem Investigate https://github.com/neondatabase/neon/issues/11159 ## Summary of changes This doesn't fix the issue, but at least we can narrow down the cause next time it happens by logging ancestor referenced layer cnt even if it's 0. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-11 14:33:35 +00:00
Arseny Sher	359c64c779	walproposer: pre generations refactoring (#11060 ) ## Problem https://github.com/neondatabase/neon/issues/10851 ## Summary of changes Do some refactoring before making walproposer generations aware. - Rename SS_VOTING to SS_WAIT_VOTING, SS_IDLE to SS_WAIT_ELECTED - Continue to get rid of epochs: rename GetEpoch to GetLastLogTerm, donorEpoch to donorLastLogTerm - Instead of counting n_votes, n_connected, introduce explicit WalProposerState (collecting terms / voting / elected). Refactor out TermsCollected and VotesCollected; they will determine state transition differently depending whether generations are enabled or not. There is no new logic in this PR and thus no new tests.	2025-03-11 14:01:00 +00:00
Erik Grinaker	f466c01995	pageserver: add `max_logical_size_per_shard` for `get_top_tenants` (#11157 ) ## Problem In #11122, we want to split shards once the logical size of the largest timeline exceeds a split threshold. However, `get_top_tenants` currently only returns `max_logical_size`, which tracks the max _total_ logical size of a timeline across all shards. This is problematic, because the storage controller needs to fetch a list of N tenants that are eligible for splits, but the API doesn't currently have a way to express this. For example, with a split threshold of 1 GB, a tenant with `max_logical_size` of 4 GB is eligible to split if it has 1 or 2 shards, but not if it already has 4 shards. We need to express this in per-shard terms, otherwise the `get_top_tenants` endpoint may end up only returning tenants that can't be split, blocking splits entirely. Touches https://github.com/neondatabase/neon/pull/11122. Touches https://github.com/neondatabase/cloud/issues/22532. ## Summary of changes Add `TenantShardItem::max_logical_size_per_shard` containing `max_logical_size / shard_count`, and `TenantSorting::MaxLogicalSizePerShard` to order and filter by it.	2025-03-11 11:43:55 +00:00
Conrad Ludgate	d1b60fa0b6	fix(proxy): delete prepared statements when discarding (#11165 ) Fixes https://github.com/neondatabase/serverless/issues/144 When tables have enums, we need to perform type queries for that data. We cache these query statements for performance reasons. In Neon RLS, we run "discard all" for security reasons, which discards all the statements. When we need to type check again, the statements are no longer valid. This fixes it to discard the statements as well. I've also added some new logs and error types to monitor this. Currently we don't see the prepared statement errors in our logs.	2025-03-11 10:48:50 +00:00
Christian Schwarz	7c462b3417	impr: propagate VirtualFile metrics via RequestContext (#7202 ) # Refs - fixes https://github.com/neondatabase/neon/issues/6107 # Problem `VirtualFile` currently parses the path it is opened with to identify the `tenant,shard,timeline` labels to be used for the `STORAGE_IO_SIZE` metric. Further, for each read or write call to VirtualFile, it uses `with_label_values` to retrieve the correct metrics object, which under the hood is a global hashmap guarded by a parking_lot mutex. We perform tens of thousands of reads and writes per second on every pageserver instance; thus, doing the mutex lock + hashmap lookup is wasteful. # Changes Apply the technique we use for all other timeline-scoped metrics to avoid the repeat `with_label_values`: add it to `TimelineMetrics`. Wrap `TimelineMetrics` into an `Arc`. Propagate the `Arc<TimelineMetrics>` down to `VirtualFile`, and use `Timeline::metrics::storage_io_size`. To avoid contention on the `Arc<TimelineMetrics>`'s refcount atomics between different connection handlers for the same timeline, we wrap it into another Arc. To avoid frequent allocations, we store that Arc<Arc<TimelineMetrics>> inside the per-connection timeline cache. Preliminary refactorings to enable this change: - https://github.com/neondatabase/neon/pull/11001 - https://github.com/neondatabase/neon/pull/11030 # Performance I ran the benchmarks in `test_runner/performance/pageserver/pagebench/test_pageserver_max_throughput_getpage_at_latest_lsn.py` on an `i3en.3xlarge` because that's what we currently run them on. None of the benchmarks shows a meaningful difference in latency or throughput or CPU utilization. I would have expected some improvement in the many-tenants-one-client-each workload because they all hit that hashmap constantly, and clone the same `UintCounter` / `Arc` inside of it. But apparently the overhead is minuscule compared to the remaining work we do per getpage. 
Yet, since the changes are already made, the added complexity is manageable, and the perf overhead of `with_label_values` demonstrable in micro-benchmarks, let's have this change anyway. Also, propagating TimelineMetrics through RequestContext might come in handy down the line. The micro-benchmark that demonstrates perf impact of `with_label_values`, along with other pitfalls and mitigation techniques around the `metrics`/`prometheus` crate: - https://github.com/neondatabase/neon/pull/11019 # Alternative Designs An earlier iteration of this PR stored an `Arc<Arc<Timeline>>` inside `RequestContext`. The problem is that this risks reference cycles if the RequestContext gets stored in an object that is owned directly or indirectly by `Timeline`. Ideally, we wouldn't be using this mess of Arc's at all and propagate Rust references instead. But tokio requires tasks to be `'static`, and so, we wouldn't be able to propagate references across task boundaries, which is incompatible with any sort of fan-out code we already have (e.g. concurrent IO) or future code (parallel compaction). So, opt for Arc for now.	2025-03-11 07:23:06 +00:00
Christian Schwarz	420f7b07b4	add benchmark demonstrating `metrics`/`prometheus` crate multicore scalability pitfalls & workarounds (#11019 ) We use the `metrics` / `prometheus` crate in the Pageserver code base. This PR demonstrates - typical performance pitfalls with that crate - our current set of techniques to avoid most of these pitfalls. refs - https://github.com/neondatabase/neon/issues/10948 - https://github.com/neondatabase/neon/pull/7202 - I applied the `label_values__cache_label_values_lookup` technique there. - It didn't yield measurable results in high-level benchmarks though.	2025-03-11 07:22:56 +00:00
Arpad Müller	4d3c477689	storcon: timelines table, creation and deletion (#11058 ) This PR extends the storcon with basic safekeeper management of timelines, mainly timeline creation and deletion. We want to make the storcon manage safekeepers in the future. Timeline creation is controlled by the `--timelines-onto-safekeepers` flag. 1. it adds the `timelines` and `safekeeper_timeline_pending_ops` tables to the storcon db 2. it extends the code for timeline creation and deletion 4. it adds per-safekeeper reconciler tasks TODO: * maybe not immediately schedule reconciliations for deletions but have a prior manual step * tenant deletions * add exclude API definitions (probably separate PR) * how to choose safekeeper to do exclude on vs deletion? this can be a bit hairy because the safekeeper might go offline in the meantime. * error/failure case handling * tests (cc test_explicit_timeline_creation from #11002) * single safekeeper mode: we often only have one SK (in tests for example) * `notify-safekeepers` hook: https://github.com/neondatabase/neon/issues/11163 TODOs implemented: * cancellations of enqueued reconciliations on a per-timeline basis, helpful if there is an ongoing deletion * implement pending ops overwrite behavior * load pending operations from db RFC section for important reading: [link](https://github.com/neondatabase/neon/blob/main/docs/rfcs/035-safekeeper-dynamic-membership-change.md#storage_controller-implementation) Implements the bulk of #9011 Successor of #10440. --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2025-03-11 02:31:22 +00:00
Alex Chi Z.	1b5258ef6a	Merge pull request #11161 from neondatabase/arpad/release-db-sk-loading release branch: re-apply "storcon db: load safekeepers from DB again"	2025-03-10 21:50:58 -04:00
Arpad Müller	9fbb33e9d9	storcon db: load safekeepers from DB again (#11087 ) Earlier PR #11041 soft-disabled the loading code for safekeepers from the storcon db. This PR makes us load the safekeepers from the database again, now that we have [JWTs available on staging](https://github.com/neondatabase/neon/pull/11087) and soon on prod. This reverts commit `23fb8053c5`. Part of https://github.com/neondatabase/cloud/issues/24727	2025-03-11 01:12:21 +01:00
Alex Chi Z.	3451bdd3d2	fix(test): force L0 compaction before gc-compaction (#11143 ) ## Problem Fix test flakiness of `test_gc_feedback` Closes: https://github.com/neondatabase/neon/issues/11153 ## Summary of changes Looking at the log, gc-compaction is interrupted by L0 compaction. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-10 20:03:49 +00:00
Konstantin Knizhnik	fb1957936c	Fix calculation of LFC used_pages (#11095 ) ## Problem The async prefetch in LFC PR causes incorrect calculation of LFC `used_pages` when a page is overwritten ## Summary of changes Decrement `used_pages` if a page is overwritten. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Matthias van de Meent <matthias@neon.tech>	2025-03-10 18:28:55 +00:00
Matthias van de Meent	bc052fd0fc	Add configuration options to disable prevlink checks (#11138 ) This allows for improved decoding of otherwise broken WAL. ## Problem Currently, if (or when) a WAL record has a wrong prevptr, that breaks decoding. With this, we don't have to break on that if we decide it's OK to proceed after that. ## Summary of changes Use a Neon GUC to allow the system to enable the NEON-specific skip_lsn_checks option in XLogReader.	2025-03-10 17:02:30 +00:00
Vlad Lazar	8c553297cb	safekeeper: use max end lsn as start of next batch (#11152 ) ## Problem Partial reads are still problematic. They are stored in the buffer of the wal decoder and result in gaps being reported too eagerly on the pageserver side. ## Summary of changes Previously, we always used the start LSN of the chunk of WAL that was just read. This patch switches to using the end LSN of the last record that was decoded in the previous iteration.	2025-03-10 16:33:28 +00:00
Dmitrii Kovalkov	63b22d3fb1	pageserver: https for management API (#11025 ) ## Problem Storage controller uses unencrypted HTTP requests for pageserver management API. Closes: https://github.com/neondatabase/cloud/issues/24283 ## Summary of changes - Implement `http_utils::server::Server` with TLS support. - Replace `hyper0::server::Server` with `http_utils::server::Server` in pageserver. - Add HTTPS handler for pageserver management API. - Generate local SSL certificates in neon local.	2025-03-10 15:07:59 +00:00
JC Grünhage	f17931870f	fix(ci): use <!subteam^ID> syntax for pinging groups on slack (#11135 ) ## Problem Pinging groups on slack didn't work, because I didn't use the correct syntax. ## Summary of changes Use the correct syntax for pinging groups.	2025-03-10 13:27:23 +00:00
Arpad Müller	33c3c34c95	Appease cargo deny errors (#11142 ) * pprof can also use `prost` as a backend, switch to it as `protobuf` has no update available but a security issue. * `paste` is a build time dependency, so add the unmaintained warning as an exception.	2025-03-10 13:24:14 +00:00
Ivan Efremov	5d38fd6c43	fix(proxy): Use testodrome query id for latency measurement (#11150 ) Add a new neon option "neon_query_id" to glue data with testodrome queries. Log latency in microseconds always. Relates to the #22486	2025-03-10 12:55:16 +00:00
Dmitrii Kovalkov	66881b4394	storcon: update scheduler stats when changing node's preferred az (#11147 ) ## Problem `home_shard_count` is not updated on the preferred AZ change. Closes: https://github.com/neondatabase/neon/issues/10493 ## Summary of changes - Update scheduler stats (node ref counts) on preferred AZ change.	2025-03-10 11:33:00 +00:00
Konstantin Knizhnik	c87d307e8c	Print state of connection buffer when no response is received from PS for a long time (#11145 ) ## Problem See https://neondb.slack.com/archives/C08DE6Q9C3B Sometimes compute is not able to receive responses from PS for a long time (minutes). I do not think that the problem is on the compute side, but in order to exclude this possibility I want to see more information about the connection state on the compute side, particularly the amount of data cached in the connection buffer. ## Summary of changes Right now we are dumping the state of the socket buffer. This PR adds printing the state of the connection buffer. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-03-09 18:36:36 +00:00
Tristan Partin	1b8c4286c4	Fetch remote extension in ALTER EXTENSION UPDATE statements (#11102 ) Previously, remote extensions were not fetched unless they were used in some other manner. For instance, loading a BM25 index in pg_search fetches the pg_search extension. However, if on a fresh compute with pg_search 0.15.5 installed, the user ran `ALTER EXTENSION pg_search UPDATE TO '0.15.6'` without first using the pg_search extension, we would not fetch the extension and fail to find an update path. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-03-09 17:29:44 +00:00
Alex Chi Z.	93983ac5fc	Merge pull request #11128 from neondatabase/rc/release/2025-03-07 Storage release 2025-03-07	2025-03-07 14:02:55 -05:00
Tristan Partin	3fe5650039	Fix dropping role with table privileges granted by non-neon_superuser (#10964 ) We were previously only revoking privileges granted by neon_superuser. However, we need to do it for all grantors. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-03-07 19:00:11 +00:00
Alex Chi Z	72dd540c87	Merge branch 'release' of https://github.com/neondatabase/neon into rc/release/2025-03-07	2025-03-07 13:04:10 -05:00
Alex Chi Z.	612d0aea4f	feat(pageserver): add force patch index_part API (#11119 ) ## Problem As part of the disaster recovery tool. Partly for https://github.com/neondatabase/neon/issues/9114. ## Summary of changes * Add a new pageserver API to force patch the fields in index_part and modify the timeline internal structures. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-07 12:54:44 -05:00
Alex Chi Z.	cd438406fb	feat(pageserver): add force patch index_part API (#11119 ) ## Problem As part of the disaster recovery tool. Partly for https://github.com/neondatabase/neon/issues/9114. ## Summary of changes * Add a new pageserver API to force patch the fields in index_part and modify the timeline internal structures. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-07 17:42:52 +00:00
Dmitrii Kovalkov	e876794ce5	storcon: use https safekeeper api (#11065 ) ## Problem Storage controller uses http for requests to safekeeper management API. Closes: https://github.com/neondatabase/cloud/issues/24835 ## Summary of changes - Add `use_https_safekeeper_api` option to storcon to use https api - Use https for requests to safekeeper management API if this option is enabled - Add `ssl_ca_file` option to storcon for ability to specify custom root CA certificate	2025-03-07 17:22:47 +00:00
John Spray	87e6117dfd	storage controller: API-driven graceful migrations (#10913 ) ## Problem The current migration API does a live migration, but if the destination doesn't already have a secondary, that live migration is unlikely to be able to warm up a tenant properly within its timeout (full warmup of a big tenant can take tens of minutes). Background optimisation code knows how to do this gracefully by creating a secondary first, but we don't currently give a human a way to trigger that. Closes: https://github.com/neondatabase/neon/issues/10540 ## Summary of changes - Add `preferred_node` parameter to TenantShard, which is respected by optimize_attachment - Modify migration API to have optional prewarm=true mode, in which we set preferred_node and call optimize_attachment, rather than directly modifying intentstate - Require override_scheduler=true flag if migrating somewhere that is a less-than-optimal scheduling location (e.g. wrong AZ) - Add `origin_node_id` to migration API so that callers can ensure they're moving from where they think they're moving from - Add tests for the above The storcon_cli wrapper for this has a 'watch' mode that waits for eventual cutover. This doesn't show how the warmth of the secondary evolves because we don't currently have an API for that in the controller, as the passthrough API only targets attached locations, not secondaries. It would be straightforward to add later as a dedicated endpoint for getting secondary status, then extend the storcon_cli to consume that and print a nice progress indicator.	2025-03-07 17:02:38 +00:00
Vlad Lazar	aa3a75a0a7	pageserver: enable previous heatmaps by default (#11132 ) We added the off-by-default configs in https://github.com/neondatabase/neon/pull/11088 because the unarchival heatmap was causing oversized secondary locations. That was fixed in https://github.com/neondatabase/neon/pull/11098, so let's turn them on by default.	2025-03-07 11:14:38 -05:00
Arpad Müller	6389c9184c	update ring to 0.17.13 (#11131 ) Update ring from 0.17.6 to 0.17.13. Addresses the advisory: https://rustsec.org/advisories/RUSTSEC-2025-0009	2025-03-07 11:14:26 -05:00
Vlad Lazar	084fc4a757	pageserver: enable previous heatmaps by default (#11132 ) We added the off-by-default configs in https://github.com/neondatabase/neon/pull/11088 because the unarchival heatmap was causing oversized secondary locations. That was fixed in https://github.com/neondatabase/neon/pull/11098, so let's turn them on by default.	2025-03-07 16:05:31 +00:00
Vlad Lazar	e177927476	safekeeper: don't skip empty records for shard zero (#11137 ) ## Problem Shard zero needs to track the start LSN of the latest record in addition to the LSN of the next record to ingest. The former is included in basebackup persisted by the compute in WAL. Previously, empty records were skipped for all shards. This caused the prev LSN tracking on the PS to fall behind and led to logical replication issues. ## Summary of changes Shard zero now receives empty interpreted records for LSN tracking purposes. A test is included too.	2025-03-07 11:04:30 -05:00
Vlad Lazar	937876cbe2	safekeeper: don't skip empty records for shard zero (#11137 ) ## Problem Shard zero needs to track the start LSN of the latest record in addition to the LSN of the next record to ingest. The former is included in basebackup persisted by the compute in WAL. Previously, empty records were skipped for all shards. This caused the prev LSN tracking on the PS to fall behind and led to logical replication issues. ## Summary of changes Shard zero now receives empty interpreted records for LSN tracking purposes. A test is included too.	2025-03-07 15:52:01 +00:00
Alexander Lakhin	a4ce20db5c	Support workflow_dispatch event in _meta.yml (#11133 ) ## Problem Allow for using _meta.yml with workflow_dispatch event. ## Summary of changes Handle this event in the run-kind step; fix and update the description of the run-kind output.	2025-03-07 15:00:06 +00:00
Erik Grinaker	eedd179f0c	storcon: initial autosplit tweaks (#11134 ) ## Problem This patch makes some initial tweaks as preparation for https://github.com/neondatabase/cloud/issues/22532, where we will be introducing additional autosplit logic. The plan is outlined in https://github.com/neondatabase/cloud/issues/22532#issuecomment-2706215907. ## Summary of changes Minor code cleanups and behavioral changes: * Decide that we'll split based on `max_logical_size` (later possibly `total_logical_size`). * Fix a bug that would split the smallest candidate rather than the largest. * Pick the largest candidate by `max_logical_size` rather than `resident_size`, for consistency (debatable). * Split out `get_top_tenant_shards()` to fetch split candidates. * Fetch candidates concurrently from all nodes. * Make `TenantShard.get_scheduling_policy()` return a copy instead of a reference.	2025-03-07 14:38:01 +00:00
Arpad Müller	f1b18874c3	storcon: require safekeeper jwt's in strict mode (#11116 ) We have introduced the ability to specify safekeeper JWTs for the storage controller. It now does heartbeats. We now want to also require the presence of those JWTs. Let's merge this PR shortly after the release cutoff. Part of / follow-up of https://github.com/neondatabase/cloud/issues/24727	2025-03-07 13:29:48 +00:00
Fedor Dikarev	db77896e92	remove CODEOWNER assignment for the test_runner/ (#11130 ) ## Problem That adds an extra unnecessary load to the PerfCorr team ## Summary of changes Remove `CODEOWNERS` assignment for the `test_runner/` folder	2025-03-07 12:38:27 +00:00
Alexey Kondratov	f5aa8c3eac	feat(compute_ctl): Add a basic HTTP API benchmark (#11123 ) ## Problem We just had a regression reported at https://neondb.slack.com/archives/C08EXUJF554/p1741102467515599, which clearly came with one of the releases. It's not a huge problem yet, but it's annoying that we cannot quickly attribute it to a specific commit. ## Summary of changes Add a very simple `compute_ctl` HTTP API benchmark that does 10k requests to `/status` and `metrics.json` and reports p50 and p99. --------- Co-authored-by: Peter Bendel <peterbendel@neon.tech>	2025-03-07 12:35:42 +00:00
Arpad Müller	cea67fc062	update ring to 0.17.13 (#11131 ) Update ring from 0.17.6 to 0.17.13. Addresses the advisory: https://rustsec.org/advisories/RUSTSEC-2025-0009	2025-03-07 12:17:04 +00:00
github-actions[bot]	e9bbafebbd	Storage release 2025-03-07	2025-03-07 06:02:10 +00:00
Alex Chi Z.	e825974a2d	feat(pageserver): yield gc-compaction to L0 compaction (#11120 ) ## Problem Part of https://github.com/neondatabase/neon/issues/9114 ## Summary of changes gc-compaction could take a long time in some cases, for example, if the job split heuristics is wrong and we selected a too large region for compaction that can't be finished within a reasonable amount of time. We will give up such tasks and yield to L0 compaction. Each gc-compaction sub-compaction job is atomic and cannot be split further so we have to give up (instead of storing a state and continue later as in image creation). --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-06 20:30:11 +00:00
Fedor Dikarev	50d883d516	Add performance-correctness to the CODEOWNERS (#11124 ) ## Problem After splitting teams it became a bit more complicated for the PerfCorr team to work on test changes. ## Summary of changes 1. Add PerfCorr team co-owners for `.github/` folder 2. Add PerfCorr team as owner for `test_runner/` folder	2025-03-06 19:59:17 +00:00
Alexey Kondratov	a485022300	fix(compute_ctl): Properly escape identifiers inside PL/pgSQL blocks (#11045 ) ## Problem In `f37eeb56`, I properly escaped the identifier, but I haven't noticed that the resulting string is used in the `format('...')`, so it needs additional escaping. Yet, after looking at it closer and with Heikki's and Tristan's help, it appeared to be that it's a full can of worms and we have problems all over the code in places where we use PL/pgSQL blocks. ## Summary of changes Add a new `pg_quote_dollar()` helper to deal with it, as dollar-quoting of strings seems to be the only robust way to escape strings in dynamic PL/pgSQL blocks. We mimic the Postgres' `pg_get_functiondef` logic here [1]. While on it, I added more tests and caught a couple of more bugs with string escaping: 1. `get_existing_dbs_async()` was wrapping `owner` in additional double-quotes if it contained special characters 2. `construct_superuser_query()` was flawed in even more ways than the rest of the code. It wasn't realistic to fix it quickly, but after thinking about it more, I realized that we could drop most of it altogether. IIUC, it was added as some sort of migration, probably back when we haven't had migrations yet. So all the complicated code was needed to properly update existing roles and DBs. In the current Neon, this code only runs before we create the very first DB and role. When we create roles and DBs, all `neon_superuser` grants are added in the different places. So the worst thing that could happen is that there is an ancient branch somewhere, so when users poke it, they will realize that not all Neon features work as expected. Yet, the fix is simple and self-serve -- just create a new role via UI or API, and it will get a proper `neon_superuser` grant. [1]: `8b49392b27/src/backend/utils/adt/ruleutils.c (L3153)` Closes neondatabase/cloud#25048	2025-03-06 19:54:29 +00:00
Anastasia Lubennikova	3dee29eb00	Spawn rsyslog from neonvm (#11111 ) then configure it from compute_ctl, to make it more robust in case of restarts and rsyslogd crashes.	2025-03-06 19:14:19 +00:00
Peter Bendel	3bb318a295	run periodic page bench more frequently to simplify bi-secting regressions (#11121 ) ## Problem When periodic pagebench runs only once a day a lot of commits can be in between a good run and a regression. ## Summary of changes Run the workflow every 3 hours	2025-03-06 17:47:54 +00:00
Alex Chi Z.	11334a2cdb	feat(pageserver): more statistics for gc-compaction (#11103 ) ## Problem part of https://github.com/neondatabase/neon/issues/9114 ## Summary of changes * Add timers for each phase of the gc-compaction. * Add a final ratio computation to directly show the garbage collection ratio in the logs. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-06 16:44:00 +00:00
Alexey Kondratov	4b77807de9	fix(compute/sql_exporter): Ignore invalid DBs when collecting size (#11097 ) ## Problem Original Slack discussion: https://neondb.slack.com/archives/C04DGM6SMTM/p1739915430147169 TL;DR in Postgres, it's totally normal to have 'invalid' DBs (state after the interrupted `DROP DATABASE`). Yet, some of our metrics collected with `sql_exporter` try to get the size of such invalid DBs. Typical log lines: ``` time=2025-03-05T16:30:32.368Z level=ERROR source=promhttp.go:52 msg="Error gathering metrics" error="[from Gatherer #1] [collector=neon_collector,query=pg_stats_userdb] pq: [NEON_SMGR] [reqid 0] could not read db size of db 173228 from page server at lsn 0/44A0E8C0" time=2025-03-05T16:30:32.369Z level=ERROR source=promhttp.go:52 msg="Error gathering metrics" error="[from Gatherer #1] [collector=neon_collector,query=db_total_size] pq: [NEON_SMGR] [reqid 0] could not read db size of db 173228 from page server at lsn 0/44A0E8C0" ``` ## Summary of changes Ignore invalid DBs in these two metrics -- `pg_stats_userdb` and `db_total_size`	2025-03-06 15:32:17 +00:00
Vlad Lazar	5ceb8c994d	pageserver: mark unarchival heatmap layers as cold (#11098 ) ## Problem On unarchival, we update the previous heatmap with all visible layers. When the primary generates a new heatmap it includes all those layers, so the secondary will download them. Since they're not actually resident on the primary (we didn't call the warm up API), they'll never be evicted, so they remain in the heatmap. We want these layers in the heatmap, since we might wish to warm-up an unarchived timeline after a shard migration. However, we don't want them to be downloaded on the secondary until we've warmed up the primary. ## Summary of Changes Include these layers in the heatmap and mark them as cold. All heatmap operations act on non-cold layers apart from the attached location warming up API, which will download the cold layers. Once the cold layers are downloaded on the primary, they'll be included in the next heatmap as hot and the secondary starts fetching them too.	2025-03-06 11:25:02 +00:00
Vlad Lazar	43cea0df91	pageserver: allow for unit test stress test (#11112 ) ## Problem I like using `cargo stress` to hammer on a test, but it doesn't work out of the box because it does parallel runs by default and tests always use the same repo dir. ## Summary of changes Add a UUID to the test repo dir when generating it.	2025-03-06 11:23:25 +00:00
Erik Grinaker	ab7efe9e47	pageserver: add amortized read amp metrics (#11093 ) ## Problem In a batch, `pageserver_layers_per_read_global` counts all layer visits towards every read in the batch, since this directly affects the observed latency of the read. However, this doesn't give a good picture of the amortized read amplification due to batching. ## Summary of changes Add two more global read amp metrics: * `pageserver_layers_per_read_batch_global`: number of layers visited per batch. * `pageserver_layers_per_read_amortized_global`: number of layers divided by reads in a batch.	2025-03-06 10:23:48 +00:00
Folke Behrens	16b8a3f598	Update Jinja2 to 3.1.6 (#11109 ) https://github.com/neondatabase/neon/security/dependabot/89	2025-03-06 09:55:41 +00:00
Folke Behrens	f343537e4d	proxy: Small adjustments to json logging (#11107 ) * Remove callsite identifier registration on span creation. Forgot to remove from last PR. Was part of alternative idea. * Move "spans" object to right after "fields", so event and span fields are listed together.	2025-03-06 09:18:28 +00:00
Alex Chi Z.	78b322f616	rfc: add 041-rel-sparse-keyspace (#10412 ) Based on the PoC patch I've done in #10316, I'd like to put an RFC in advance to ensure everyone is on the same page, and start incrementally porting the code to the main branch. https://github.com/neondatabase/neon/issues/9516 [Rendered](https://github.com/neondatabase/neon/blob/skyzh/rfc-041-rel-sparse-keyspace/docs/rfcs/041-rel-sparse-keyspace.md) --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Erik Grinaker <erik@neon.tech>	2025-03-05 21:43:16 +00:00
Alex Chi Z.	2de3629b88	test(pageserver): use reldirv2 by default in regress tests (#11081 ) ## Problem For pg_regress test, we do both v1 and v2; for all the rest, we default to v2. part of https://github.com/neondatabase/neon/issues/9516 ## Summary of changes Use reldir v2 across test cases by default. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-05 21:02:44 +00:00
Em Sharnoff	1fe23fe8d2	compute/lfc: Add chunk size to neon_lfc_stats (#11100 ) This PR adds a new key to neon.neon_lfc_stats — 'file_cache_chunk_size_pages'. It just returns the value of BLOCKS_PER_CHUNK from the LFC implementation. The new value should (eventually) allow changing the chunk size without breaking any places that rely on LFC stats values measured in number of chunks. See neondatabase/cloud#25170 for more.	2025-03-05 20:35:08 +00:00
Peter Bendel	604eb5e8d4	fix grafana dashboard link for pooler endpoints (#11099 ) ## Problem Our benchmarking workflows contain links to grafana dashboards to troubleshoot problems. This works fine for non-pooled endpoints. For pooled endpoints we need to remove the `-pooler` suffix from the endpoint's hostname to get a valid endpoint ID. Example link that doesn't work in this run https://github.com/neondatabase/neon/actions/runs/13678933253/job/38246028316#step:8:311 ## Summary of changes Check if connection string is a -pooler connection string and if so remove this suffix from the endpoint ID. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-03-05 20:01:17 +00:00
Tristan Partin	d599d2df80	Update postgres_exporter to 0.17.1 (#11094 ) Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-03-05 18:32:45 +00:00
Alexey Kondratov	8263107f6c	feat(compute): Add filename label to remote ext requests metric (#11091 ) ## Problem We realized that we may use this metric for more 'live' info about extension installations vs. what we have with installed extensions metric, which is only updated at start, atm. ## Summary of changes Add `filename` label to `compute_ctl_remote_ext_requests_total`. Note that it contains the raw archive name with `.tar.zst` at the end, so the consumer may need to strip this suffix. Closes https://github.com/neondatabase/cloud/issues/24694	2025-03-05 18:17:57 +00:00
Anastasia Lubennikova	d94fc75cfc	Setup compute_ctl pgaudit and rsyslog (#10615 ) Setup pgaudit and pgauditlogtofile extensions in compute_ctl when the ComputeAuditLogLevel is set to 'hipaa'. See cloud PR https://github.com/neondatabase/cloud/pull/24568 Add rsyslog setup for compute_ctl. Spin up a rsyslog server in the compute VM, and configure it to send logs to the endpoint specified in AUDIT_LOGGING_ENDPOINT env.	2025-03-05 18:01:00 +00:00
Alex Chi Z.	9cdc8c0e6c	feat(pageserver): revisit error types for gc-compaction (#11082 ) ## Problem part of https://github.com/neondatabase/neon/issues/9114 We used anyhow::Error everywhere and it's time to fix. ## Summary of changes * Make sure that cancel errors are correctly propagated as CompactionError::ShuttingDown. * Skip all the trigger computation work if gc_cutoff is not generated yet. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-05 15:57:38 +00:00
Arpad Müller	2d45522fa6	storcon db: load safekeepers from DB again (#11087 ) Earlier PR #11041 soft-disabled the loading code for safekeepers from the storcon db. This PR makes us load the safekeepers from the database again, now that we have [JWTs available on staging](https://github.com/neondatabase/neon/pull/11087) and soon on prod. This reverts commit `23fb8053c5`. Part of https://github.com/neondatabase/cloud/issues/24727	2025-03-05 15:45:43 +00:00
Vlad Lazar	7430fb9836	Merge pull request #11090 from neondatabase/vlad/release-gate-previous-heatmap Storage release 2025-03-05	2025-03-05 14:37:24 +00:00
JC Grünhage	94e6897ead	fix(ci): make deploy job depend on pushing images to dev registries (#11089 ) ## Problem If an image fails to push to dev registries, we shouldn't trigger the deploy job, because that depends on images existing in dev registries. To ensure this is the case, the deploy job needs to depend on pushing to dev registries. ## Summary of changes Make `deploy` depend on `push-neon-image-dev` and `push-compute-image-dev`.	2025-03-05 14:28:43 +00:00
Erik Grinaker	332aae1484	test_runner/regress: speed up `test_check_visibility_map` (#11086 ) ## Problem `test_check_visibility_map` is the slowest test in CI, and can cause timeouts under particularly slow configurations (`debug` and `without-lfc`). ## Summary of changes * Reduce the `pgbench` scale factor from 10 to 8. * Omit a redundant vacuum during `pgbench` init. * Remove a final `vacuum freeze` + `pg_check_visible` pass, which has questionable value (we've already done a vacuum freeze previously, and we don't flush the compute cache before checking anyway).	2025-03-05 13:50:35 +00:00
Vlad Lazar	c45d169527	pageserver: gate previous heatmap behind config flag (#11088 ) ## Problem On unarchival, we update the previous heatmap with all visible layers. When the primary generates a new heatmap it includes all those layers, so the secondary will download them. Since they're not actually resident on the primary (we didn't call the warm up API), they'll never be evicted, so they remain in the heatmap. This leads to oversized secondary locations like we saw in pre-prod. ## Summary of changes Gate the loading of the previous heatmaps and the heatmap generation on unarchival behind configuration flags. They are disabled by default, but enabled in tests.	2025-03-05 13:29:46 +01:00
Vlad Lazar	8c12ccf729	pageserver: gate previous heatmap behind config flag (#11088 ) ## Problem On unarchival, we update the previous heatmap with all visible layers. When the primary generates a new heatmap it includes all those layers, so the secondary will download them. Since they're not actually resident on the primary (we didn't call the warm up API), they'll never be evicted, so they remain in the heatmap. This leads to oversized secondary locations like we saw in pre-prod. ## Summary of changes Gate the loading of the previous heatmaps and the heatmap generation on unarchival behind configuration flags. They are disabled by default, but enabled in tests.	2025-03-05 12:20:18 +00:00
Vlad Lazar	abae7637d6	pageserver: do big reads to fetch slru segment (#11029 ) ## Problem Each page of the slru segment is fetched individually when it's loaded on demand. ## Summary of Changes Use `Timeline::get_vectored` to fetch 16 at a time.	2025-03-05 11:55:55 +00:00
Anastasia Lubennikova	38a883118a	Skip dropping tablesync replication slots on the publisher from branch (#11073 ) fixes https://github.com/neondatabase/cloud/issues/24292 Do not drop tablesync replication slots on the publisher when we're in the process of dropping subscriptions inherited by a neon branch, because these slots are still needed by the parent branch subscriptions. For regular slots we handle this by setting the slot_name to NONE before calling DROP SUBSCRIPTION, but tablesync slots are not exposed to SQL. Rely on GUC disable_logical_replication_subscribers=true to know that we're in the Neon-specific process of dropping subscriptions.	2025-03-05 11:29:46 +00:00
Erik Grinaker	40aa4d7151	utils: log Sentry initialization (#11077 ) ## Problem We don't have any logging for Sentry initialization. This makes it hard to verify that it has been configured correctly. ## Summary of changes Log some basic info when Sentry has been initialized, but omit the public key (which allows submitting events). Also log when `SENTRY_DSN` isn't specified at all, and when it fails to initialize (which is supposed to panic, but we may as well).	2025-03-05 11:23:07 +00:00
Folke Behrens	8e51bfc597	proxy: JSON logging field refactor (#11078 ) ## Problem Grafana Loki's JSON handling is somewhat limited and the log message should be structured in a way that it's easy to sift through logs and filter. ## Summary of changes * Drop span_id. It's too short lived to be of value and only bloats the logs. * Use the span's name as the object key, but append a unique numeric value to prevent name collisions. * Extract interesting span fields into a separate object at the root. New format: ```json { "timestamp": "2025-03-04T18:54:44.134435Z", "level": "INFO", "message": "connected to compute node at 127.0.0.1 (127.0.0.1:5432) latency=client: 22.002292ms, cplane: 0ns, compute: 5.338875ms, retry: 0ns", "fields": { "cold_start_info": "unknown" }, "process_id": 56675, "thread_id": 9122892, "task_id": "24", "target": "proxy::compute", "src": "proxy/src/compute.rs:288", "trace_id": "5eb89b840ec63fee5fc56cebd633e197", "spans": { "connect_request#1": { "ep": "endpoint", "role": "proxy", "session_id": "b8a41818-12bd-4c3f-8ef0-9a942cc99514", "protocol": "tcp", "conn_info": "127.0.0.1" }, "connect_to_compute#6": {}, "connect_once#8": { "compute_id": "compute", "pid": "853" } }, "extract": { "session_id": "b8a41818-12bd-4c3f-8ef0-9a942cc99514" } } ```	2025-03-05 10:27:46 +00:00
Peter Bendel	906d7468cc	exclude separate perf tests from bench step (#11084 ) ## Problem Our benchmarking workflow has a job step `bench`which runs all tests in test_runner/performance/* except those that we want to run separately. We recently added two test cases to that testcase directory that we want to run separately but forgot to ignore them during the bench step. This is now causing [failures](https://github.com/neondatabase/neon/actions/runs/13667689340/job/38212087331#step:7:392). ## Summary of changes Ignore the separately run tests in the bench step.	2025-03-05 10:14:51 +00:00
Konstantin Knizhnik	438f7bb726	Check response status in prefetch_lookup (#11080 ) ## Problem New async prefetch introduces the `prefetch_lookup` function, which is called before LFC lookup to check if a prefetch request is already completed. This function does not currently check that the response is actually `T_NeonGetPageResponse` (and not an error). ## Summary of changes Add checks for response tag. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-03-05 10:03:09 +00:00
Peter Bendel	f62ddb11ed	Distinguish manually submitted runs for periodic pagebench in grafana dashboard (#11079 ) ## Problem Periodic pagebench workflow runs periodically from latest main commit and also allows to dispatch it manually for a given commit hash to bi-sect regressions. However in the dashboards we can not distinguish manual runs from periodic runs which makes it harder to follow the trend. ## Summary of changes Send an additional flag commit type to the benchmark runner instance to distinguish the run type. Note: this needs a follow-up PR on the receiving side.	2025-03-04 18:11:43 +00:00
Tristan Partin	7b7e4a9fd3	Authorize compute_ctl requests from the control plane (#10530 ) The compute should only act if requests come from the control plane. Signed-off-by: Tristan Partin <tristan@neon.tech> Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-03-04 18:08:00 +00:00
Erik Grinaker	4bbdb758ec	compute_tools: appease unused lint on macOS (#11074 ) ## Problem On macOS, the `unused` lint complains about two variables not used in `!linux` builds. These were introduced in #11007. ## Summary of changes Appease the linter by explicitly using the variables in `!linux` branches.	2025-03-04 16:39:32 +00:00
Alex Chi Z.	20af9cef17	fix(test): use the same value for reldir v1+v2 (#11070 ) ## Problem part of https://github.com/neondatabase/neon/issues/11067 My observation is that with the current value of settings, x86-v1 usually takes 30s, arm-v1 1m30s, x86-v2 1m, arm-v2 3m. But sometimes the system could run too slow and cause test to timeout on arm with reldir v2. While I investigate what's going on and further improve the performance, I'd like to set both of them to use the same test input, so that it doesn't timeout and we don't abuse this test case as a performance test. ## Summary of changes Use the same settings for both test cases. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-04 14:55:50 +00:00
Erik Grinaker	a2902e774a	http-utils: generate heap profiles with jemalloc_pprof (#11075 ) ## Problem The code to generate symbolized pprof heap profiles and flamegraph SVGs has been upstreamed to the `jemalloc_pprof` crate: * https://github.com/polarsignals/rust-jemalloc-pprof/pull/22 * https://github.com/polarsignals/rust-jemalloc-pprof/pull/23 ## Summary of changes Use `jemalloc_pprof` to generate symbolized pprof heap profiles and flamegraph SVGs. This reintroduces a bunch of internal jemalloc stack frames that we'd previously strip, e.g. each stack now always ends with `prof_backtrace_impl` (where jemalloc takes a stack trace for heap profiling), but that seems ok.	2025-03-04 12:13:41 +00:00
Vlad Lazar	435bf452e6	tests: remove obsolete err log whitelisting (#11069 ) The pageserver read path now supports overlapped in-memory and image layers via https://github.com/neondatabase/neon/pull/11000. These allowed errors are now obsolete.	2025-03-04 08:18:19 +00:00
Erik Grinaker	65addfc524	storcon: add per-tenant rate limiting for API requests (#10924 ) ## Problem Incoming requests often take the service lock, and sometimes even do database transactions. That creates a risk that a rogue client can starve the controller of the ability to do its primary job of reconciling tenants to an available state. ## Summary of changes * Use the `governor` crate to rate limit tenant requests at 10 requests per second. This is ~10-100x lower than the worst "attack" we've seen from a client bug. Admin APIs are not rate limited. * Add a `storage_controller_http_request_rate_limited` histogram for rate limited requests. * Log a warning every 10 seconds for rate limited tenants. The rate limiter is parametrized on TenantId, because the kinds of client bug we're protecting against generally happen within tenant scope, and the rates should be somewhat stable: we expect the global rate of requests to increase as we do more work, but we do not expect the rate of requests to one tenant to increase. --------- Co-authored-by: John Spray <john@neon.tech>	2025-03-03 22:04:59 +00:00
Alex Chi Z.	6d0976dad5	feat(pageserver): persist reldir v2 migration status (#10980 ) ## Problem part of https://github.com/neondatabase/neon/issues/9516 ## Summary of changes Similar to the aux v2 migration, we persist the relv2 migration status into index_part, so that even if the config item is set to false, we will still read from the v2 storage to avoid loss of data. Note that only the two variants `None` and `Some(RelSizeMigration::Migrating)` are used for now. We don't have full migration implemented so it will never be set to `RelSizeMigration::Migrated`. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-03 21:05:43 +00:00
Alex Chi Z.	dbf9a80261	fix(pageserver): avoid flooding gc-compaction logs (#11024 ) ## Problem The "did not trigger" gets logged at 10k/minute in staging. ## Summary of changes Change it to debug level. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-03 20:23:20 +00:00
Vlad Lazar	6ca49b4d0c	safekeeper: fix a gap tracking edge case (#11054 ) The interpreted reader tracks a record-aligned current position in the WAL stream. Partial reads move the stream internally, but not from the pov of the interpreted WAL reader. Hence, new shards may subscribe with a start position that matches the reader's current position even though we've also done some partial reads. This confuses the gap tracking. To make it more robust, update the current batch start to the min between the new start position and its current value. Since no record has been decoded yet (position matches), we can't have lost it.	2025-03-03 19:16:03 +00:00
Vlad Lazar	5197e43396	pageserver: add recurse flag to layer download spec (#11068 ) I missed updating the open api spec in the original PR. We need this so that the cplane auto-generated client sees the flag.	2025-03-03 19:04:01 +00:00
Alexander Lakhin	9a4e2eab61	Fix artifact name for build with sanitizers (#11066 ) ## Problem When a build is made with sanitizers, this is not reflected in the artifact name, which can lead to overriding normal builds with sanitized ones. ## Summary of changes Take this property of a build into account when constructing the artifact name.	2025-03-03 18:00:53 +00:00
Vlad Lazar	8298bc903c	pageserver: handle in-memory layer overlaps with persistent layers (#11000 ) ## Problem Image layers may be nested inside in-memory layers as diagnosed [here](https://github.com/neondatabase/neon/issues/10720#issuecomment-2649419252). The read path doesn't support this and may skip over the image layer, resulting in a failure to reconstruct the page. ## Summary of changes We already support nesting of image layers inside delta layers. The logic lives in `LayerMap::select_layer`. The main goal of this PR is to propagate the candidate in-memory layer down to that point and update the selection logic. Important changes are: 1. Support partial reads for the in-memory layer. Previously, we could only specify the start LSN of the read. We need to control the end LSN too. 2. `LayerMap::ranged_search` considers in-memory layers too. Previously, the search for in-memory layers was done explicitly in `Timeline::get_reconstruct_data_timeline`. Note that `LayerMap::ranged_search` now returns a weak readable layer which the `LayerManager` can upgrade. This dance is such that we can unit test the layer selection logic. 3. Update `LayerMap::select_layer` to consider the candidate in-memory layer too Loosely related drive bys: 1. Remove the "keys not found" tracking in the ranged search. This wasn't used anywhere and it just complicates things. 2. Remove the difficulty map stuff from the layer map. Again, not used anywhere. Closes https://github.com/neondatabase/neon/issues/9185 Closes https://github.com/neondatabase/neon/issues/10720	2025-03-03 17:52:59 +00:00
John Spray	b953daa21f	safekeeper: allow remote deletion to proceed after dropped requests (#11042 ) ## Problem If a caller times out on safekeeper timeline deletion on a large timeline, and waits a while before retrying, the deletion will not progress while the retry is waiting. The net effect is very very slow deletion as it only proceeds in 30 second bursts across 5 minute idle periods. Related: https://github.com/neondatabase/neon/issues/10265 ## Summary of changes - Run remote deletion in a background task - Carry a watch::Receiver on the Timeline for other callers to join the wait - Restart deletion if the API is called again and the previous attempt failed	2025-03-03 16:03:51 +00:00
Peter Bendel	a07599949f	First version of a new benchmark to test larger OLTP workload (#11053 ) ## Problem We want to support larger tenants (regarding logical database size, number of transactions per second etc.) and should increase our test coverage of OLTP transactions at larger scale. ## Summary of changes Start a new benchmark that over time will add more OLTP tests at larger scale. This PR covers the first version and will be extended in further PRs. Also fix some infrastructure: - default for new connections and large tenants is to use connection pooler pgbouncer, however our fixture always added `statement_timeout=120` which is not compatible with pooler [see](https://neon.tech/docs/connect/connection-errors#unsupported-startup-parameter) - action to create branch timed out after 10 seconds and 10 retries but for large tenants it can take longer so use increasing back-off for retries ## Test run https://github.com/neondatabase/neon/actions/runs/13593446706	2025-03-03 15:25:48 +00:00
Vlad Lazar	38277497fd	pageserver: log shutdown at info level for basebackup (#11046 ) ## Problem Timeline shutdown during basebackup logs at error level because the cancellation error is smushed into BasebackupError::Server. ## Summary of changes Introduce BasebackupError::Shutdown and use it. `log_query_error` will now see `QueryError::Shutdown` and log at info level.	2025-03-03 13:46:50 +00:00
Arseny Sher	ef2b50994c	walproposer: basic infra to enable generations (#11002 ) ## Problem Preparation for https://github.com/neondatabase/neon/issues/10851 ## Summary of changes Add walproposer `safekeepers_generations` field which can be set by prefixing `neon.safekeepers` GUC with `g#n:`. Non zero value (n) forces walproposer to use generations. In particular, this also disables implicit timeline creation as timeline will be created by storcon. Add test checking this. Also add missing infra: `--safekeepers-generation` flag to neon_local endpoint start + fix `--start-timeout` flag: it existed but value wasn't used.	2025-03-03 13:20:20 +00:00
Konstantin Knizhnik	8669bfe493	Do not store zero pages in inmem SMGR for walredo (#11043 ) ## Problem See https://neondb.slack.com/archives/C033RQ5SPDH/p1740157873114339 smgrextend for FSM fork is called during page reconstruction by walredo process causing overflow of inmem SMGR (64 pages). ## Summary of changes Do not store zero pages in inmem SMGR because `inmem_read` returns zero page if it is not able to locate specified block. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-03-03 12:50:07 +00:00
Misha Sakhnov	625c526bdd	ci: create multiarch vm images (#11017 ) ## Problem We build compute-nodes as multi-arch images, but not the vm-compute-nodes. The PR adds multiarch vm images the same way as in autoscaling repo. ## Summary of changes Add architecture to the matrix for vm compute build steps Add merge job --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-03-03 11:47:09 +00:00
a-masterov	df0767176a	Change the tag names according to the current state (#11059 ) ## Problem We have not synced `force-test-extensions-upgrade.yml` with the last changes. The variable `TEST_EXTENSIONS_UPGRADE` was ignored in the script and actually set to `NEW_COMPUTE_TAG` while it should be set to `OLD_COMPUTE_TAG` as we are about to run compatibility tests. ## Summary of changes The tag names were synced, the logic was fixed.	2025-03-03 09:40:49 +00:00
Heikki Linnakangas	38ddfab643	compute_ctl: Perform more startup actions in parallel (#11008 ) To speed up compute startup. Resizing swap in particular takes about 100 ms on my laptop. By performing it in parallel with downloading the basebackup, that latency is effectively hidden. I would imagine that downloading remote extensions can also take a non-trivial amount of time, although I didn't try to measure that. In any case that's now also performed in parallel with downloading the basebackup.	2025-03-03 00:29:37 +00:00
Heikki Linnakangas	066324d6ec	compute_ctl: Rearrange startup code (#11007 ) Move most of the code to compute.rs, so that all the major startup steps are visible in one place. You can now get a pretty good picture of what happens in the latency-critical path at compute startup by reading ComputeNode::start_compute(). This also clarifies the error handling in start_compute. Previously, the start_postgres function sometimes returned an Err, and sometimes Ok but with the compute status already set to Failed. Now the start_compute function always returns Err on failure, and it's the caller's responsibility to change the compute status to Failed. Separately from that, it returns a handle to the Postgres process via a `&mut` reference if it had already started Postgres (i.e. on success, or if the failure happens after launching the Postgres process). --------- Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>	2025-02-28 22:48:05 +00:00
Suhas Thalanki	ee0c8ca8fd	Add -fsigned-char for cross platform signed chars (#10852 ) ## Problem In multi-character keys, the GIN index creates a CRC Hash of the first 3 bytes of the key. The hash can have its first bit set or unset, so `char` needs a consistent representation across architectures for consistent results. GIN stores these keys by their hashes, which determines the order in which the keys are obtained from the GIN index. By default, chars are signed in x86 and unsigned in arm, leading to inconsistent behavior across different platform architectures. Adding the `-fsigned-char` flag to the GCC compiler forces chars to be treated as signed across platforms, ensuring that the order in which the keys are obtained is consistent. ## Summary of changes Added `-fsigned-char` to the `CFLAGS` to force GCC to use signed chars across platforms. Added a test to check this across platforms. Fixes: https://github.com/neondatabase/cloud/issues/23199	2025-02-28 21:07:21 +00:00
Vlad Lazar	a1e67cfe86	Merge pull request #11051 from neondatabase/vlad/release-8005-and-cherry-picks Storage release 2025-02-28	2025-02-28 19:40:33 +00:00
Ivan Efremov	56033189c1	feat(proxy): Log latency after connect to compute (#11048 ) ## Problem To measure latency accurately, we should associate the testodrome role with the latency data. ## Summary of changes Add latency logging to associate different roles with the latency. Relates to the #22486	2025-02-28 17:58:42 +00:00
Erik Grinaker	0263c92c47	pageserver: fix race that can wedge background tasks (#11047 ) ## Problem `wait_for_active_tenant()`, used when starting background tasks, has a race condition that can cause it to wait forever (until cancelled). It first checks the current tenant state, and then subscribes for state updates, but if the state changes between these then it won't be notified about it. We've seen this wedge compaction tasks, which can cause unbounded layer file buildup and read amplification. ## Summary of changes Use `watch::Receiver::wait_for()` to check both the current and new tenant states.	2025-02-28 18:11:10 +01:00
Vlad Lazar	5fc599d653	storcon: soft disable SK heartbeats (#11041 ) ## Problem JWT tokens aren't in place, so all SK heartbeats fail. This is equivalent to a wait before applying the PS heartbeats and makes things more flaky. ## Summary of Changes Add a flag that skips loading SKs from the db on start-up and at runtime.	2025-02-28 18:11:10 +01:00
Erik Grinaker	d857f63e3b	pageserver: fix race that can wedge background tasks (#11047 ) ## Problem `wait_for_active_tenant()`, used when starting background tasks, has a race condition that can cause it to wait forever (until cancelled). It first checks the current tenant state, and then subscribes for state updates, but if the state changes between these then it won't be notified about it. We've seen this wedge compaction tasks, which can cause unbounded layer file buildup and read amplification. ## Summary of changes Use `watch::Receiver::wait_for()` to check both the current and new tenant states.	2025-02-28 17:00:22 +00:00
Alex Chi Z.	f79ee0bb88	fix(storcon): loop in chaos injection (#11004 ) ## Problem Somehow the previous patch loses the loop in the chaos injector function so everything will only run once. https://github.com/neondatabase/neon/pull/10934 ## Summary of changes Add back the loop. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-28 15:49:15 +00:00
Vlad Lazar	23fb8053c5	storcon: soft disable SK heartbeats (#11041 ) ## Problem JWT tokens aren't in place, so all SK heartbeats fail. This is equivalent to a wait before applying the PS heartbeats and makes things more flaky. ## Summary of Changes Add a flag that skips loading SKs from the db on start-up and at runtime.	2025-02-28 15:49:09 +00:00
Conrad Ludgate	d9ced89ec0	feat(proxy): require TLS to compute if prompted by cplane (#10717 ) https://github.com/neondatabase/cloud/issues/23008 For TLS between proxy and compute, we are using an internally provisioned CA to sign the compute certificates. This change ensures that proxy will load them from a supplied env var pointing to the correct file - this file and env var will be configured later, using a kubernetes secret. Control plane responds with a `server_name` field if and only if the compute uses TLS. This server name is the name we use to validate the certificate. Control plane still sends us the IP to connect to as well (to support overlay IP). To support this change, I had to split `host` and `host_addr` into separate fields. We use `host_addr` and bypass `lookup_addr` if possible (which is what happens in production). `host` is then only used for the TLS connection. There's no blocker to merging this. The code paths will not be triggered until the new control plane is deployed and the `enableTLS` compute flag is enabled on a project.	2025-02-28 14:20:25 +00:00
Erik Grinaker	c7ff3c4c9b	safekeeper: downgrade interpreted reader errors (#11034 ) ## Problem This `critical!` could fire on IO errors, which is just noisy. Resolves #11027. ## Summary of changes Downgrade to error, except for decode errors. These could be either data corruption or a bug, but seem worth investigating either way.	2025-02-28 14:06:56 +00:00
Christian Schwarz	7c53fd0d56	refactor(page_service / timeline::handle): the GateGuard need not be a special case (#11030 ) # Changes While working on - https://github.com/neondatabase/neon/pull/7202 I found myself needing to cache another expensive Arc::clone inside the timeline::handle::Cache by wrapping it in another Arc. Before this PR, it seemed like the only expensive thing we were caching was the connection handler tasks' clone of `Arc<Timeline>`. But in fact the GateGuard was another such thing, but it was special-cased in the implementation. So, this refactoring PR de-special-cases the GateGuard. # Performance With this PR we are doing strictly _fewer_ operations per `Cache::get`. The reason is that we wrap the entire `Types::Timeline` into one Arc. Before this PR, it was a separate Arc around the Arc<Timeline> and one around the Arc<GateGuard>. With this PR, we avoid an allocation per cached item, namely, the separate Arc around the GateGuard. This PR does not change the amount of shared mutable state. So, all in all, it should be a net positive, albeit probably not noticeable with our small non-NUMA instances and generally high CPU usage per request. # Reviewing To understand the refactoring logistics, look at the changes to the unit test types first. Then read the improved module doc comment. Then the remaining changes. In the future, we could rename things to be even more generic. For example, `Types::TenantMgr` could really be a `Types::Resolver`. And `Types::Timeline` should, to avoid constant confusion in the doc comment, be called `Types::Cached` or `Types::Resolved`. Because the `handle` module, after this PR, really doesn't care that we're using it for storing Arc's and GateGuards. Then again, specificity is sometimes more useful than being generic. And writing the module doc comment in a totally generic way would probably also be more confusing than helpful.	2025-02-28 12:31:52 +00:00
a-masterov	7607686f25	Make test extensions upgrade work with absent images (#11036 ) ## Problem CI does not pass for the compute release due to the absence of some images ## Summary of changes Now we use the images from the old non-compute releases for non-compute images	2025-02-28 11:16:22 +00:00
Vlad Lazar	0d6d58bd3e	pageserver: make heatmap layer download API more cplane friendly (#10957 ) ## Problem We intend for cplane to use the heatmap layer download API to warm up timelines after unarchival. It's tricky for them to recurse in the ancestors, and the current implementation doesn't work well when unarchiving a chain of branches and warming them up. ## Summary of changes * Add a `recurse` flag to the API. When the flag is set, the operation recurses into the parent timeline after the current one is done. * Be resilient to warming up a chain of unarchived branches. Let's say we unarchived `B` and `C` from the `A -> B -> C` branch hierarchy. `B` got unarchived first. We generated the unarchival heatmaps and stashed them in `A` and `B`. When `C` was unarchived, it dropped its unarchival heatmap since `A` and `B` already had one. If `C` needed layers from `A` and `B`, it was out of luck. Now, when choosing whether to keep an unarchival heatmap we look at its end LSN. If it's more inclusive than what we currently have, keep it.	2025-02-28 10:36:53 +00:00
JC Grünhage	66d2592d04	Merge pull request #11032 from neondatabase/rc/release/2025-02-28 Storage release 2025-02-28	2025-02-28 11:12:07 +01:00
Jan Christian Grünhage	517ae7a60e	Merge remote-tracking branch 'origin/release' into rc/release/2025-02-28	2025-02-28 10:10:19 +01:00
John Spray	55633ebe3a	storcon: enable API passthrough to nonzero shards (#11026 ) ## Problem Storage controller will proxy GETs to pageserver-like tenant/timeline paths through to the pageserver. Usually GET passthroughs make sense to go to shard 0, e.g. if you want to list timelines. But sometimes you really want to know about a particular shard, e.g. reading its cache state or similar. ## Summary of changes - Accept shard IDs as well as tenant IDs in the passthrough route - Refactor node lookup to take a shard ID and make the tenant ID case a layer on top of that. This is one more lock take-drop during these requests, but it's not particularly expensive and these requests shouldn't be terribly frequent This is not immediately used by anything, but will be there any time we want to e.g. do a pass-through query to check the warmth of a tenant cache on a particular shard or somesuch.	2025-02-28 08:42:08 +00:00
github-actions[bot]	b2a769cc86	Storage release 2025-02-28	2025-02-28 06:02:09 +00:00
Heikki Linnakangas	a4b2009800	compute_ctl: Refactor, moving spec_apply functions to spec_apply.rs (#11006 ) Seems nice to have the function and all its subroutines in the same source file.	2025-02-27 20:13:06 +00:00
Alex Chi Z.	ab1f22b7d1	fix(pageserver): correctly access layer map in gc-compaction (#11021 ) ## Problem layer_map access was unwrapped. It might return an error during shutdown. ## Summary of changes Propagate the layer_map access error back to the compaction loop. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-27 16:26:55 +00:00
JC Grünhage	7ed236e17e	fix(ci): push prod container images again (#11020 ) ## Problem https://github.com/neondatabase/neon/pull/10841 made building compute and neon images optional on releases that don't need them. The `push-<component>-image-prod` jobs had transitive dependencies that were skipped due to that, causing the images not to be pushed to production registries. ## Summary of changes Add `!failure() && !cancelled() &&` to the beginning of the conditions for these jobs to ensure they run even if some of their transitive dependencies are skipped.	2025-02-27 16:16:14 +00:00
Konstantin Knizhnik	e58f264a05	Increase inmem SMGR size for walredo process to 100 pages (#10937 ) ## Problem We see `Inmem storage overflow` in page server logs: https://neondb.slack.com/archives/C033RQ5SPDH/p1740157873114339 The walredo process uses the inmem SMGR with storage size limited to 64 pages and a warning watermark of 32 (based on the assumption that XLR_MAX_BLOCK_ID is 32, so a WAL record cannot access more than 32 pages). Actually, that is not true. We can update up to 3 forks for each block (including updates of the FSM and VM forks). ## Summary of changes This PR increases the inmem SMGR size for the walredo process to 100 pages and prints a stack trace in case of overflow. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-27 14:31:05 +00:00
Matthias van de Meent	a283edaccf	PS/Prefetch: Use a timeout for reading data from TCP (#10834 ) This reduces pressure on OS TCP buffers, reducing flush times in other systems like PageServer.	2025-02-27 14:00:18 +00:00
a-masterov	ad37199745	Separate the upgrade tests in timelines (#10974 ) ## Problem We created extensions in a single database. The tests could interfere, i.e., discover some service tables left by other extensions and produce unexpected results. ## Summary of changes The tests are now run in a separate timeline, so only one extension owns the database, which prevents interference.	2025-02-27 13:45:18 +00:00
Erik Grinaker	93b59e65a2	pageserver: remove stale comment (#11016 ) No longer true now that we eagerly notify the compaction loop.	2025-02-27 12:56:28 +00:00
Christian Schwarz	e35f7758d8	impr(controller_upcall_client): clean up copy-pasta code & add context to retries (#10991 ) Before this PR, re-attach and validate would log the same warning ``` calling control plane generation validation API failed ``` on retry errors. This can be confusing. This PR makes the message generically valid for any upcall and adds additional tracing spans to capture context. Along the way, clean up some copy-pasta variable naming. refs - https://github.com/neondatabase/neon/issues/10381#issuecomment-2684755827 --------- Co-authored-by: Alexander Lakhin <alexander.lakhin@neon.tech>	2025-02-27 10:59:43 +00:00
Peter Bendel	3a3d62dc4f	Bodobolero/test cum stats persistence (#10995 ) ## Problem So far cumulative statistics have not been persisted when Neon scales to zero (suspends endpoint). With PR https://github.com/neondatabase/neon/pull/6560 the cumulative statistics should now survive endpoint restarts and correctly trigger the auto-vacuum and auto-analyze maintenance. So far we did not have a testcase that validates that improvement in our dev cloud environment with a real project. ## Summary of changes Introduce testcase `test_cumulative_statistics_persistence` in the benchmarking workflow running daily: - It verifies that the cumulative statistics are correctly persisted across restarts. - Cumulative statistics are important to persist across restarts because they are used when auto-vacuum and auto-analyze trigger conditions are met. - The test performs the following steps: - Seed a new project using pgbench - insert tuples that by themselves are not enough to trigger auto-vacuum - suspend the endpoint - resume the endpoint - insert additional tuples that by themselves are not enough to trigger auto-vacuum but in combination with the previous tuples are - verify that autovacuum is triggered by the combination of tuples inserted before and after endpoint suspension ## Test run https://github.com/neondatabase/neon/actions/runs/13546879714/job/37860609089#step:6:282	2025-02-27 10:45:13 +00:00
Arpad Müller	a22be5af72	Migrate the last crates to edition 2024 (#10998 ) Migrates the remaining crates to edition 2024. We like to stay on the latest edition if possible. There are no functional changes, however some code changes had to be done to accommodate the edition's breaking changes. Like the previous migration PRs, this is comprised of three commits: * the first does the edition update and makes `cargo check`/`cargo clippy` pass. We had to update bindgen to make its output [satisfy the requirements of edition 2024](https://doc.rust-lang.org/edition-guide/rust-2024/unsafe-extern.html) * the second commit does a `cargo fmt` for the new style edition. * the third commit reorders imports as a one-off change. As before, it is entirely optional. Part of #10918	2025-02-27 09:40:40 +00:00
Christian Schwarz	f09843ef17	refactor(pageserver): propagate RequestContext to layer downloads (#11001 ) For some reason the layer download API never fully got `RequestContext`-infected. This PR fixes that as a precursor to - https://github.com/neondatabase/neon/issues/6107	2025-02-27 09:26:25 +00:00
JC Grünhage	c92a36740b	fix(ci): support PR-on-top-of-PR usecase again (#11013 ) ## Problem https://github.com/neondatabase/neon/pull/10841 broke CI on PRs that aren't based on main or a release branch but want to merge into another PR. ## Summary of changes Replace `run-kind=pr-main` with `run-kind=pr`, so that all PRs that aren't release PRs are treated equally.	2025-02-27 09:05:15 +00:00
Arseny Sher	8b86cd1154	safekeeper: follow membership configuration rules (#10781 ) ## Problem safekeepers must ignore walproposer messages with non matching membership conf. ## Summary of changes Make safekeepers reject vote request, proposer elected and append request messages with non matching generation. Switch to the configuration in the greeting message if it is higher. In passing, fix one comment and WAL truncation. Last part of https://github.com/neondatabase/neon/issues/9965	2025-02-27 06:13:30 +00:00
Heikki Linnakangas	c50b38ab72	compute_ctl: Fix comment on start_postgres (#11005 ) The comment was woefully outdated and outright wrong. It applied a long time ago (before commit `e5cc2f92c4` to be precise), but nowadays the function just launches postgres and waits until it starts accepting connections. The other things the comment talked about are done in other functions.	2025-02-26 23:38:45 +00:00
Fedor Dikarev	4f4a3910d0	fix error (Line: 74, Col: 26): Unexpected value 'false' (#10999 ) ## Problem Check neon with extra platform builds is failing on main with: ``` The template is not valid. .github/workflows/neon_extra_builds.yml (Line: 74, Col: 26): Unexpected value 'false' ``` https://github.com/neondatabase/neon/actions/runs/13549634905 ## Summary of changes Use `fromJson()` to have `false` as a boolean value. Thanks to @skyzh for pointing out the issue.	2025-02-26 19:54:46 +00:00
Alex Chi Z.	11aab9f0de	fix(pageserver): further stabilize gc-compaction tests (#10975 ) ## Problem Yet another source of flakiness for https://github.com/neondatabase/neon/issues/10517 ## Summary of changes The test scenario we want to create is that we have an image layer in index_part and then overwrite it, so we have to ensure it gets persisted in index_part by doing a force checkpoint. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-26 19:50:10 +00:00
Heikki Linnakangas	5cfdb1244f	compute_ctl: Add OTEL tracing to incoming HTTP requests and startup (#10971 ) We lost this with the switch to axum for the HTTP server. Add it back. In addition to just resurrecting the functionality we had before, pass the tracing context of the /configure HTTP request to the start_postgres operation that runs in the main thread. This way, the 'start_postgres' and all its sub-spans like getting the basebackup become children of the HTTP request span. This allows end-to-end tracing of a compute start, all the way from the proxy to the SQL queries executed by compute_ctl as part of compute startup.	2025-02-26 19:27:16 +00:00
Arseny Sher	643a48210f	safekeeper: exclude API (#10757 ) ## Problem https://github.com/neondatabase/neon/pull/10241 added configuration switch endpoint, but it didn't delete timeline if node was excluded. ## Summary of changes Add separate /exclude API endpoint which similarly accepts membership configuration where sk is supposed to be excluded. Implementation deletes the timeline locally. Some more small related tweaks: - make mconf switch API PUT instead of POST as it is idempotent; - return 409 if switch was refused instead of 200 with requested & current; - remove unused was_active flag from delete response; - remove meaningless _force suffix from delete functions names; - reuse timeline.rs delete_dir function in timelines_global_map instead of its own copy. part of https://github.com/neondatabase/neon/issues/9965	2025-02-26 19:26:33 +00:00
Arseny Sher	c1a040447d	walproposer: send valid timeline_start_lsn in v2 (#10994 ) ## Problem https://github.com/neondatabase/neon/pull/10647 dropped timeline_start_lsn from protocol messages as it can be taken from term history. In v2 0 was sent in the placeholder. However, until safekeepers are deployed with that PR they still use the value, setting timeline_start_lsn to 0, which confuses WAL reading; problem appears only when compute includes 10647 but safekeepers don't. ref https://neondb.slack.com/archives/C04DGM6SMTM/p1740577649644269?thread_ts=1740572363.541619&cid=C04DGM6SMTM ## Summary of changes Send real value instead of 0 in v2.	2025-02-26 17:38:44 +00:00
Alex Chi Z.	30f3be9840	fix(test): reduce number of relations in test_tx_abort_with_many_relations (#10997 ) ## Problem I see a lot of timeout errors, which indicates that this test is too slow. It seems that creating relations is fast, but the subsequent truncating step is slow. ## Summary of changes Reduce number of relations for now, and investigate later. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-26 17:19:14 +00:00
JC Grünhage	8dfa8f0b94	feat(ci): don't build storage on compute-releases and vice versa (#10841 ) ## Problem Release CI is slow, because we're doing unnecessary work, for example building compute images on storage releases and vice versa. ## Summary of changes - Extract tag generation into reusable workflow and extend it with fetching of previous component releases - Don't build neon images on compute releases and don't build compute images on proxy and storage releases - Reuse images from previous releases for tests on branches where we don't build those images ## Open questions - We differentiate between `TAG` and `COMPUTE_TAG` in a few places, but we don't differentiate between storage and proxy releases. Since they use the same image, this will continue to work, but I'm not sure this is what we want.	2025-02-26 17:17:26 +00:00
Alex Chi Z.	a138a6de9b	fix(pageserver): correctly handle collect_keyspace errors (#10976 ) ## Problem ref https://github.com/neondatabase/neon/issues/10927 ## Summary of changes * Implement `is_critical` and `is_cancel` over `CompactionError`. * Revisit all places that use `CollectKeyspaceError` to ensure they are handled correctly. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-26 17:09:50 +00:00
Arpad Müller	14347630a4	ancestor detach: delete hardlinked layers on error (#10977 ) Delete layers that we have hardlinked so far when there is an error in `remote_copy`. This prevents a retry of the ancestor detach from stumbling over already present layer files: the hardlink would fail with an error. If there is a crash, we already clean up during the timeline attach: we loop over all layer files and purge all layers that are not referenced by the `index_part.json`. Make sure to hold the timeline gate to prevent races with detach&attach&read from the layer file. These cleanups aren't completely enough however, as there is code after `prepare` as well. To handle errors there, we add a special case for `AlreadyExists` errors during the hardlink, where we check if the layer is an orphan, and if yes, we delete it from local disk. That is ideally not the case we hit, as it is less clear in that scenario where the layer came from, but it provides good defense in depth. Related #10729 Fixes #10970	2025-02-26 16:11:15 +00:00
Erik Grinaker	86b9703f06	pageserver: set `SO_KEEPALIVE` on the page service socket (#10992 ) ## Problem If the client connection goes dead without an explicit close (e.g. due to network infrastructure dropping the connection) then we currently won't detect it for a long time, which may e.g. block GetPage flushes and keep the task running. Touches https://github.com/neondatabase/cloud/issues/23515. ## Summary of changes Enable `SO_KEEPALIVE` on the page service socket, to enable periodic TCP keepalive probes. These are configured via Linux sysctls, which will be deployed separately. By default, the first probe is sent after 2 hours, so this doesn't have a practical effect until we change the sysctls.	2025-02-26 14:36:05 +00:00
Arseny Sher	01581f3af5	safekeeper: drop json_ctrl (#10722 ) ## Problem json_ctrl.rs is an obsolete attempt to have tests with fine control of feeding messages into safekeeper superseded by desim framework. ## Summary of changes Drop it.	2025-02-26 13:32:37 +00:00
Arpad Müller	f94286f0c9	Upgrade compute_tools and compute_api to edition 2024 (#10983 ) Updates `compute_tools` and `compute_api` crates to edition 2024. We like to stay on the latest edition if possible. There are no functional changes, however some code changes had to be done to accommodate the edition's breaking changes. The PR has three commits: * the first commit updates the named crates to edition 2024 and appeases `cargo clippy` by changing code. * the second commit performs a `cargo fmt` that does some minor changes (not many) * the third commit performs a cargo fmt with nightly options to reorder imports as a one-time thing. It's completely optional, but I offer it here for the compute team to review it. I'd like to hear opinions about the third commit, if it's wanted and felt worth the diff or not. I think most attention should be put onto the first commit. Part of #10918	2025-02-26 13:12:26 +00:00
Fedor Dikarev	c2a768086d	add credentials for pulling containers for the jobs (#10987 ) Ref: https://github.com/neondatabase/cloud/issues/24939 ## Problem I found that we are missing authorization for some container jobs, which makes them use anonymous pulls. It's not an issue for now, with high enough limits, but it could become one when new limits are introduced in DockerHub (10 pulls / hour) ## Summary of changes - add credentials for the jobs that run in containers	2025-02-26 12:50:06 +00:00
Vlad Lazar	622a9def6f	tests: use generated record lsn instead of hardcoded one (#10990 ) ... and start the initial reader with the correct lsn Closes https://github.com/neondatabase/neon/issues/10978	2025-02-26 12:47:13 +00:00
Arpad Müller	26bda17551	storcon: use the SchedulingPolicy enum in SafekeeperPersistence (#10897 ) We don't want to serialize to/from string all the time, so use `SchedulingPolicy` in `SafekeeperPersistence` via the use of a wrapper. Stacked atop #10891	2025-02-26 12:12:50 +00:00
Folke Behrens	0d36f52a6c	proxy: Record and export user-agent header (#10955 ) neondatabase/cloud#24464	2025-02-26 11:39:34 +00:00
Heikki Linnakangas	40ad42d556	Silence "sudo: unable to resolve host" messages at compute startup (#10985 )	2025-02-26 10:10:05 +00:00
Heikki Linnakangas	e452f2a5a3	Remove some redundant log lines at postgres startup (#10958 )	2025-02-26 10:06:42 +00:00
Heikki Linnakangas	43b109af69	compute_ctl: Add more detailed tracing spans to startup subroutines (#10979 ) In local dev environment, these steps take around 100 ms, and they are in the critical path of a compute startup on a compute pool hit. I don't know if it's like that in production, but as a first step, add tracing spans to the functions so that they can be measured more easily.	2025-02-26 09:51:07 +00:00
Arthur Petukhovsky	3684162d9f	Bump vm-builder v0.37.1 -> v0.42.2 (#10981 ) Bump version to pick up changes introduced in https://github.com/neondatabase/autoscaling/pull/1286 It's better to have a compute release for this change first, because: - vm-runner changes kernel loglevel from 7 to 6 - vm-builder has a change to bring it back to 7 after startup Previous update: https://github.com/neondatabase/neon/pull/10015	2025-02-26 09:19:19 +00:00
Arpad Müller	920040e402	Update storage components to edition 2024 (#10919 ) Updates storage components to edition 2024. We like to stay on the latest edition if possible. There are no functional changes, however some code changes had to be done to accommodate the edition's breaking changes. The PR has two commits: * the first commit updates storage crates to edition 2024 and appeases `cargo clippy` by changing code. I accidentally ran the formatter on some files that had other edits. * the second commit performs a `cargo fmt` I would recommend a closer review of the first commit and a less close review of the second one (as it just runs `cargo fmt`). part of https://github.com/neondatabase/neon/issues/10918	2025-02-25 23:51:37 +00:00
Konstantin Knizhnik	dc975d554a	Increment getpage histogram in prefetch_lookup (#10965 ) ## Problem PR https://github.com/neondatabase/neon/pull/10442 added prefetch_lookup function. It changed handling of getpage requests in compute. Before: 1. Lookup in LFC (return if found) 2. Register prefetch buffer 3. Wait prefetch result (increment getpage_hist) Now: 1. Lookup prefetch ring (return if prefetch request is already completed) 2. Lookup in LFC (return if found) 3. Register prefetch buffer 4. Wait prefetch result (increment getpage_hist) So if the prefetch result is already available, the getpage histogram is not incremented. This caused failures of some test_throughtput benchmarks: https://neondb.slack.com/archives/C033RQ5SPDH/p1740425527249499 ## Summary of changes Increment getpage histogram in `prefetch_lookup` Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-25 19:51:38 +00:00
Suhas Thalanki	d05606252d	fix: only showing LSN for static computes in `neon endpoint list` (#10931 ) ## Problem `neon endpoint list` shows a different LSN than what the state of the replica is. This is mainly down to what we define as LSN in this output. If we define it as the LSN that a compute was started with, it only makes sense to show it for static computes. ## Summary of changes Removed the output of `last_record_lsn` for primary/hot standby computes. Closes: https://github.com/neondatabase/neon/issues/5825 --------- Co-authored-by: Tristan Partin <tristan@neon.tech>	2025-02-25 19:26:14 +00:00
Alex Chi Z.	c69ebb4486	fix(ci): extend timeout to 75min (#10963 ) 60min is not enough for debug builds Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-25 17:37:23 +00:00
a-masterov	1fb2faab5b	Rename the patch files for the semver test (#10966 ) ## Problem The patch for `semver` extensions relies on the `PG_VERSION` environment variable. The files were named without the letter `v`, so the script could not find them. ## Summary of changes The patch files were renamed.	2025-02-25 16:00:43 +00:00
Alex Chi Z.	015092d259	feat(pageserver): add automatic trigger for gc-compaction (#10798 ) ## Problem part of https://github.com/neondatabase/neon/issues/9114 ## Summary of changes Add the auto trigger for gc-compaction. It computes two values: L1 size and L2 size. When L1 size >= initial trigger threshold, we will trigger an initial gc-compaction. When l1_size / l2_size >= gc_compaction_ratio_percent, we will trigger the "tiered" gc-compaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-25 14:50:39 +00:00
Alex Chi Z.	b7fcf2c7a7	test(pageserver): add reldir v2 into tests (#10750 ) ## Problem We have `test_perf_many_relations` but it only runs on remote clusters, and we cannot directly modify tenant config. Therefore, I patched one of the current tests to benchmark relv2 performance. close https://github.com/neondatabase/neon/issues/9986 ## Summary of changes * Add `v1/v2` selector to `test_tx_abort_with_many_relations`. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-25 14:50:22 +00:00
Erik Grinaker	8deeddd4f0	pageserver: ignore `CollectKeySpaceError::Cancelled` during compaction (#10968 ) This pops up a few times during deployment. Not sure why it fires without `self.cancel` being cancelled, but could be e.g. ancestor timelines or sth.	2025-02-25 14:49:41 +00:00
a-masterov	f78ac44748	Use the Dockerfile COPY instead of docker cp (#10943 ) ## Problem We use `docker cp` to copy the files required for the extension tests now. It causes problems if we run older images with the newer source tree. ## Summary of changes Copying the files was moved to the compute Dockerfile.	2025-02-25 12:44:06 +00:00
Folke Behrens	f4fefd9f2f	pre-commit: Switch to cargo fmt to handle per-crate editions (#10969 ) cargo knows what edition each crate uses.	2025-02-25 12:29:27 +00:00
Konstantin Knizhnik	8f82c661d4	Move neon_pgstat_file_size_limit to the extension (#10959 ) ## Problem PG14 uses a separate backend for the stats collector, which has no access to shared memory. Since the AUX mechanism requires access to shared memory, persisting the pgstat.stat file is not supported on PG14, so there is no definition of the `neon_pgstat_file_size_limit` variable. This makes it impossible to provide the same config for all Postgres versions. ## Summary of changes Move neon_pgstat_file_size_limit to Neon extension. Postgres submodules PR: https://github.com/neondatabase/postgres/pull/587 https://github.com/neondatabase/postgres/pull/588 https://github.com/neondatabase/postgres/pull/589 --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Tristan Partin <tristan@neon.tech>	2025-02-25 12:23:04 +00:00
Arseny Sher	758f597280	compute <-> sk protocol v3 (#10647 ) ## Problem As part of https://github.com/neondatabase/neon/issues/8614 we need to pass membership configurations between compute and safekeepers. ## Summary of changes Add version 3 of the protocol carrying membership configurations. Greeting messages on both sides carry the full conf, and other messages the generation number only. Use protocol bump to include other accumulated changes: - stop packing whole structs on the wire as is; - make the tag u8 instead of u64; - send all ints in network order; - drop proposer_uuid, we can pass it in START_WAL_PUSH and it wasn't very useful anyway. Per message changes, apart from mconf: - ProposerGreeting: tenant / timeline id is sent now as hex cstring. Remove proto version, it is passed outside in START_WAL_PUSH. Remove postgres timeline, it is unused. Reorder fields a bit. - AcceptorGreeting: reorder fields - VoteResponse: timeline_start_lsn is removed. It can be taken from first member of term history, and later we won't need it at all once all timelines are explicitly created. Vote itself is u8 instead of u64. - ProposerElected: timeline_start_lsn is removed for the same reasons. - AppendRequest: epoch_start_lsn removed, it is known from term history in ProposerElected. Both compute and sk are able to talk v2 and v3 to make rollbacks (in case we need them) easier; neon.safekeeper_proto_version GUC sets the client version. v2 code can be dropped later. So far empty conf is passed everywhere, future PRs will handle them. To test, add param to some tests choosing proto version; we want to test both 2 and 3 until we fully migrate. ref https://github.com/neondatabase/neon/issues/10326 --------- Co-authored-by: Arthur Petukhovsky <petuhovskiy@yandex.ru>	2025-02-25 11:56:05 +00:00
Vlad Lazar	0d9a45a475	safekeeper: invalidate start of interpreted batch on reader resets (#10951 ) ## Problem The interpreted WAL reader tracks the start of the current logical batch. This needs to be invalidated when the reader is reset. This bug caused a couple of WAL gap alerts in staging. ## Summary of changes * Refactor to make it possible to write a reproducer * Add repro unit test * Fix by resetting the start with the reader Related https://github.com/neondatabase/cloud/issues/23935	2025-02-25 10:21:35 +00:00
Vlad Lazar	5d17640944	storcon: send heartbeats concurrently (#10954 ) ## Problem While looking at logs I noticed that heartbeats are sent sequentially. The loop polling the UnorderedSet is at the wrong level of indentation. Instead of polling after we have the full set, we polled after each entry. ## Summary of Changes Poll the UnorderedSet properly.	2025-02-25 09:33:08 +00:00
Erik Grinaker	12f0e525c6	Merge pull request #10961 from neondatabase/erik/release-7930-slow-getpage pageserver: tweak slow GetPage logging (#10956)	2025-02-24 23:05:10 +01:00
Erik Grinaker	6621be6b7b	pageserver: tweak slow GetPage logging (#10956 ) ## Problem We recently added slow GetPage request logging. However, this unintentionally included the flush time when logging (which we already have separate logging for). It also logs at WARN level, which is a bit aggressive since we see this fire quite frequently. Follows https://github.com/neondatabase/neon/pull/10906. ## Summary of changes * Only log the request execution time, not the flush time. * Extract a `pagestream_dispatch_batched_message()` helper. * Rename `warn_slow()` to `log_slow()` and downgrade to INFO.	2025-02-24 22:01:14 +00:00
Erik Grinaker	6c6e5bfc2b	pageserver: tweak slow GetPage logging	2025-02-24 21:27:38 +01:00
Heikki Linnakangas	565a9e62a1	compute: Disconnect if no response to a pageserver request is received (#10882 ) We've seen some cases in production where a compute doesn't get a response to a pageserver request for several minutes, or even more. We haven't found the root cause for that yet, but whatever the reason is, it seems overly optimistic to think that if the pageserver hasn't responded for 2 minutes, we'd get a response if we just wait patiently a little longer. More likely, the pageserver is dead or there's some kind of a network glitch so that the TCP connection is dead, or at least stuck for a long time. Either way, it's better to disconnect and reconnect. I set the default timeout to 2 minutes, which should be enough for any GetPage request under normal circumstances, even if the pageserver has to download several layer files from remote storage. Make the disconnect timeout configurable. Also make the "log interval", after which we print a message to the log configurable, so that if you change the disconnect timeout, you can set the log timeout correspondingly. The default log interval is still 10 s. The new GUCs are called "neon.pageserver_response_log_timeout" and "neon.pageserver_response_disconnect_timeout". Includes a basic test for the log and disconnect timeouts. Implements issue #10857	2025-02-24 20:16:37 +00:00
Peter Bendel	8fd0f89b94	rename libduckdb.so in pg_duckdb context to avoid conflict with pg_mooncake (#10915 ) ## Problem Introducing pg_duckdb caused a conflict with pg_mooncake. Both use libduckdb.so in different versions. ## Summary of changes - Rename the libduckdb.so to libduckdb_pg_duckdb.so in the context of pg_duckdb so that it doesn't conflict with libduckdb.so referenced by pg_mooncake. - use a version map to rename the duckdb symbols to a version specific name - DUCKDB_1.1.3 for pg_mooncake - DUCKDB_1.2.0 for pg_duckdb For the concept of version maps see - https://www.man7.org/conf/lca2006/shared_libraries/slide19a.html - https://peeterjoot.com/2019/09/20/an-example-of-linux-glibc-symbol-versioning/ - https://akkadia.org/drepper/dsohowto.pdf	2025-02-24 17:50:49 +00:00
JC Grünhage	1f0dea9a1a	feat(ci): push container images to ghcr.io as well (#10945 ) ## Problem There's new rate-limits coming on docker hub. To reduce our reliance on docker hub and the problems the limits are going to cause for us, we want to prepare for this by also pushing our container images to ghcr.io ## Summary of changes Push our images to ghcr.io as well and not just docker hub.	2025-02-24 17:45:23 +00:00
Heikki Linnakangas	40acb0c06d	Fix usage of WaitEventSetWait() with timeout (#10947 ) WaitEventSetWait() returns the number of "events" that happened, and only that many events in the WaitEvent array are updated. When the timeout is reached, it returns 0 and does not modify the WaitEvent array at all. We were reading 'event.events' without checking the return value, which would be uninitialized when the timeout was hit. No test included, as this is harmless at the moment. But this caused the test I'm including in PR #10882 to fail. That PR changes the logic to loop back to retry the PQgetCopyData() call if WL_SOCKET_READABLE was set. Currently, making an extra call to PQconsumeInput() is harmless, but with that change in logic, it turns into a busy-wait.	2025-02-24 17:15:07 +00:00
Arpad Müller	df362de0dd	Reject basebackup requests for archived timelines (#10828 ) For archived timelines, we would like to prevent all non-pageserver issued getpage requests, as users are not supposed to send these. Instead, they should unarchive a timeline before issuing any external read traffic. As this is non-trivial to do, at least prevent launches of new computes, by erroring on basebackup requests for archived timelines. In #10688, we started issuing a warning instead of an error, because an error would mean a stuck project. Now that we can confirm the warning has not been present in the logs for about a week, we can issue errors. Follow-up of #10688 Related: #9548	2025-02-24 16:38:13 +00:00
Alex Chi Z.	5fad4a4cee	feat(storcon): chaos injection of force exit (#10934 ) ## Problem close https://github.com/neondatabase/cloud/issues/24485 ## Summary of changes This patch adds a new chaos injection mode for the storcon. The chaos injector reads the crontab and exits immediately at the configured time. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-24 15:30:21 +00:00
Arpad Müller	fdde58120c	Upgrade proxy crates to edition 2024 (#10942 ) This upgrades the `proxy/` crate as well as the forked libraries in `libs/proxy/` to edition 2024. Also reformats the imports of those forked libraries via: ``` cargo +nightly fmt -p proxy -p postgres-protocol2 -p postgres-types2 -p tokio-postgres2 -- -l --config imports_granularity=Module,group_imports=StdExternalCrate,reorder_imports=true ``` It can be read commit-by-commit: the first commit has no formatting changes, only changes to accommodate the new edition. Part of #10918	2025-02-24 15:26:28 +00:00
Vlad Lazar	459446fcb8	pagesever: include visible layers in heatmaps after unarchival (#10880 ) ## Problem https://github.com/neondatabase/neon/pull/10788 introduced an API for warming up attached locations by downloading all layers in the heatmap. We intend to use it for warming up timelines after unarchival too, but it doesn't work. Any heatmap generated after the unarchival will not include our timeline, so we've lost all those layers. ## Summary of changes Generate a cheeky heatmap on unarchival. It includes all the visible layers. Use that as the `PreviousHeatmap` which inputs into actual heatmap generation. Closes: https://github.com/neondatabase/neon/issues/10541	2025-02-24 15:21:17 +00:00
Alexander Bayandin	17724a19e6	CI(allure-reports): update dependencies and cleanup code (#10794 ) ## Problem There are a bunch of minor improvements that are too small and insignificant as is, so collecting them in one PR. ## Summary of changes - Add runner arch to artifact name to make it easier to distinguish files on S3 ([ref](https://neondb.slack.com/archives/C059ZC138NR/p1739365938371149)) - Use `github.event.pull_request.number` instead of parsing `$GITHUB_EVENT_PATH` file - Update Allure CLI and `allure-pytest`	2025-02-24 15:07:14 +00:00
John Spray	2a5d7e5a78	tests: improve compat test coverage of controller-pageserver interaction (#10848 ) ## Problem We failed to detect https://github.com/neondatabase/neon/pull/10845 before merging, because the tests we run with a matrix of component versions didn't include the ones that did live migrations. ## Summary of changes - Do a live migration during the storage controller smoke test, since this is a pretty core piece of functionality - Apply a compat version matrix to the graceful cluster restart test, since this is the functionality that we most urgently need to work across versions to make deploys work. I expect the first CI run of this to fail, because https://github.com/neondatabase/neon/pull/10845 isn't merged yet.	2025-02-24 12:22:22 +00:00
Conrad Ludgate	fb77f28326	feat(proxy): add direction and private link id to billing export (#10925 ) ref: https://github.com/neondatabase/cloud/issues/23385 Adds a direction flag as well as private-link ID to the traffic reporting pipeline. We do not yet actually count ingress, but we include the flag anyway. I have additionally moved vpce_id string parsing earlier, since we expect it to be utf8 (ascii).	2025-02-24 11:49:11 +00:00
Heikki Linnakangas	a6f315c9c9	Remove unnecessary dependencies to synchronous 'postgres' crate (#10938 ) The synchronous 'postgres' crate is just a wrapper around the async 'tokio_postgres' crate. Some places were unnecessarily using the re-exported NoTls and Error from the synchronous 'postgres' crate, even though they were otherwise using the 'tokio_postgres' crate. Tidy up by using the tokio_postgres types directly.	2025-02-24 09:40:25 +00:00
Alexey Kondratov	df264380b9	fix(compute_ctl): Skip invalid DBs in PerDatabasePhase (#10910 ) ## Problem After refactoring the configuration code to phases, it became a bit fuzzy who filters out DBs that are not present in Postgres, are invalid, or have `datallowconn = false`. The first 2 are important for the DB dropping case, as we could be in operation retry, so DB could be already absent in Postgres or invalid (interrupted `DROP DATABASE`). Recent case: https://neondb.slack.com/archives/C03H1K0PGKH/p1740053359712419 ## Summary of changes Add a common code that filters out inaccessible DBs inside `ApplySpecPhase::RunInEachDatabase`.	2025-02-21 21:50:50 +00:00
Arpad Müller	4bbe75de8c	Update vm_monitor to edition 2024 (#10916 ) Updates `vm_monitor` to edition 2024. We like to stay on the latest edition if possible. There are no functional changes; the only changes are due to the rustfmt edition. part of https://github.com/neondatabase/neon/issues/10918	2025-02-21 20:29:05 +00:00
Tristan Partin	c0c3ed94a9	Fix flaky test_compute_installed_extensions_metric test (#10933 ) There was a race condition with compute_ctl and the metric being collected related to whether the neon extension had been updated or not. compute_ctl will run `ALTER EXTENSION neon UPDATE` on compute start in the postgres database. Fixes: https://github.com/neondatabase/neon/issues/10932 Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-02-21 18:29:48 +00:00
Erik Grinaker	9c0fefee25	Merge pull request #10921 from neondatabase/rc/release/2025-02-21 Storage release 2025-02-21	2025-02-21 18:55:58 +01:00
Konstantin Knizhnik	b1d8771d5f	Store prefetch results in LFC cache once as soon as they are received (#10442 ) ## Problem Prefetch is performed locally, so different backends can request the same pages from PS. Such duplicated requests increase the load on the page server and network traffic. Making prefetch global seems to be very difficult and undesirable, because different queries can access chunks at different speeds. Storing prefetch chunks in LFC will not completely eliminate duplicates, but can minimise such requests. The problem with storing prefetch results in LFC is that in this case the page is not protected by a shared buffer lock. So we will have to perform extra synchronisation on the LFC side. See: https://neondb.slack.com/archives/C0875PUD0LC/p1736772890602029?thread_ts=1736762541.116949&cid=C0875PUD0LC @MMeent implementation of prewarm: See https://github.com/neondatabase/neon/pull/10312/ ## Summary of changes Use condition variables to synchronize access to LFC entry. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-21 16:56:16 +00:00
Vlad Lazar	97e2e27f68	storcon: use `Duration` for duration's in the storage controller tenant config (#10928 ) ## Problem The storage controller treats durations in the tenant config as strings. These are loaded from the db. The pageserver maps these durations to a seconds only format and we always get a mismatch compared to what's in the db. ## Summary of changes Treat durations as durations inside the storage controller and not as strings. Nothing changes in the cross service APIs themselves or the way things are stored in the db. I also added some logging which would have made the investigation a 10min job: 1. Reason for why the reconciliation was spawned 2. Location config diff between the observed and wanted states	2025-02-21 17:14:43 +01:00
Vlad Lazar	3e82addd64	storcon: use `Duration` for duration's in the storage controller tenant config (#10928 ) ## Problem The storage controller treats durations in the tenant config as strings. These are loaded from the db. The pageserver maps these durations to a seconds only format and we always get a mismatch compared to what's in the db. ## Summary of changes Treat durations as durations inside the storage controller and not as strings. Nothing changes in the cross service APIs themselves or the way things are stored in the db. I also added some logging which would have made the investigation a 10min job: 1. Reason for why the reconciliation was spawned 2. Location config diff between the observed and wanted states	2025-02-21 15:45:00 +00:00
John Spray	5e3c234edc	storcon: do more concurrent optimisations (#10929 ) ## Problem Now that we rely on the optimisation logic to handle fixing things up after some tenants are in the wrong AZ (e.g. after node failure), it's no longer appropriate to treat optimisations as an ultra-low-priority task. We used to reflect that low priority with a very low limit on concurrent execution, such that we would only migrate 2 things every 20 seconds. ## Summary of changes - Increase MAX_OPTIMIZATIONS_EXEC_PER_PASS from 2 to 16 - Increase MAX_OPTIMIZATIONS_PLAN_PER_PASS from 8 to 64. Since we recently gave user-initiated actions their own semaphore, this should not risk starving out API requests.	2025-02-21 14:58:49 +00:00
Erik Grinaker	cea2b222d0	Merge branch 'release' into rc/release/2025-02-21	2025-02-21 12:38:14 +01:00
Arpad Müller	ff3819efc7	storcon: infrastructure for safekeeper specific JWT tokens (#10905 ) Safekeepers only respond to requests with the per-token scope, or the `safekeeperdata` JWT scope. Therefore, add infrastructure in the storage controller for safekeeper JWTs. Also, rename the ambiguous `jwt_token` to `pageserver_jwt_token`. Part of #9011 Related: https://github.com/neondatabase/cloud/issues/24727	2025-02-21 11:02:02 +00:00
Arpad Müller	f927ae6e15	Return a json response in scheduling_policy handler (#10904 ) Return an empty json response in the `scheduling_policy` handler. This prevents errors of the form: ``` Error: receive body: error decoding response body: EOF while parsing a value at line 1 column 0 ``` when setting the scheduling policy via the `storcon_cli`. part of #9011.	2025-02-21 11:01:57 +00:00
Heikki Linnakangas	61d385caea	Split plv8 build into two parts (#10920 ) Plv8 consists of two parts: 1. the V8 engine, which is built from vendored sources, and 2. the PostgreSQL extension. Split those into two separate steps in the Dockerfile. The first step doesn't need any PostgreSQL sources or any other files from the neon repository, just the build tools and the upstream plv8 sources. Use the build-deps image as the base for that step, so that the layer can be cached and doesn't need to be rebuilt every time. This is worthwhile because the V8 build takes a very long time.	2025-02-21 09:03:54 +00:00
github-actions[bot]	5b2afd953c	Storage release 2025-02-21	2025-02-21 06:02:10 +00:00
Alex Chi Z.	c214c32d3f	fix(pageserver): avoid creating empty job for gc-compaction (#10917 ) ## Problem This should be one last fix for https://github.com/neondatabase/neon/issues/10517. ## Summary of changes If a keyspace is empty, we might produce a gc-compaction job which covers no layer files. We should avoid generating such jobs so that the gc-compaction image layer can cover the full key range. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-21 01:56:30 +00:00
Erik Grinaker	9b42d1ce1a	pageserver: periodically log slow ongoing getpage requests (#10906 ) ## Problem We don't have good observability for "stuck" getpage requests. Resolves https://github.com/neondatabase/cloud/issues/23808. ## Summary of changes Log a periodic warning (every 30 seconds) if GetPage request execution is slow to complete, to aid in debugging stuck GetPage requests. This does not cover response flushing (we have separate logging for that), nor reading the request from the socket and batching it (expected to be insignificant and not straightforward to handle with the current protocol). This costs 95 nanoseconds on the happy path when awaiting a `tokio::task::yield_now()`: ``` warn_slow/enabled=false time: [45.716 ns 46.116 ns 46.687 ns] warn_slow/enabled=true time: [141.53 ns 141.83 ns 142.18 ns] ```	2025-02-20 21:38:42 +00:00
Konstantin Knizhnik	0b9b391ea0	Fix calculation of prefetch ring position to fit in-flight request in resized ring buffer (#10899 ) ## Problem Refer to https://github.com/neondatabase/neon/issues/10885 The wait position in the ring buffer, used to restrict the number of in-flight requests, is not correctly calculated. ## Summary of changes Update condition and remove redundant assertion Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-20 20:21:44 +00:00
Heikki Linnakangas	3f376e44ba	Temporarily disable pg_duckdb (#10909 ) It clashed with pg_mooncake This is the same as the hotfix #10908 , but for the main branch, to keep the release and main branches in sync. In particular, we don't want to accidentally revert this temporary fix, if we cut a new release from main.	2025-02-20 19:48:06 +00:00
Arpad Müller	5b81a774fc	Update rust to 1.85.0 (#10914 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. [Announcement blog post](https://blog.rust-lang.org/2025/02/20/Rust-1.85.0.html). Prior update was in #10618.	2025-02-20 19:16:22 +00:00
Konstantin Knizhnik	bd335fa751	Fix prototype of CheckPointReplicationState (#10907 ) ## Problem `(void)` was accidentally removed from the definition of the `CheckPointReplicationState` function ## Summary of changes Restore function prototype. https://github.com/neondatabase/postgres/pull/585 https://github.com/neondatabase/postgres/pull/586 --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-20 18:29:14 +00:00
Vlad Lazar	34996416d6	pageserver: guard against WAL gaps in the interpreted protocol (#10858 ) ## Problem The interpreted SK <-> PS protocol does not guard against gaps (neither does the Vanilla one, but that's beside the point). ## Summary of changes Extend the protocol to include the start LSN of the PG WAL section from which the records were interpreted. Validation is enabled via a config flag on the pageserver and works as follows: Case 1: `raw_wal_start_lsn` is smaller than the requested LSN There can't be gaps here, but we check that the shard received records which it hasn't seen before. Case 2: `raw_wal_start_lsn` is equal to the requested LSN This is the happy case. No gap and nothing to check. Case 3: `raw_wal_start_lsn` is greater than the requested LSN This is a gap. To make Case 3 work I had to bend the protocol a bit. We read record chunks of WAL which aren't record aligned and feed them to the decoder. The picture below shows a shard which subscribes at a position somewhere within Record 2. We already have a WAL reader which is below that position so we wait to catch up. We read some WAL in Read 1 (all of Record 1 and some of Record 2). The new shard doesn't need Record 1 (it has already processed it according to the starting position), but we read past its starting position. When we do Read 2, we decode Record 2 and ship it off to the shard, but the starting position of Read 2 is greater than the starting position the shard requested. This looks like a gap. ![image](https://github.com/user-attachments/assets/8aed292e-5d62-46a3-9b01-fbf9dc25efe0) To make it work, we extend the protocol to send an empty `InterpretedWalRecords` to shards if the WAL the records originated from ends the requested start position. On the pageserver, that just updates the tracking LSNs in memory (no-op really). This gives us a workaround for the fake gap. As a drive by, make `InterpretedWalRecords::next_record_lsn` mandatory in the application level definition. 
It's always included. Related: https://github.com/neondatabase/cloud/issues/23935	2025-02-20 17:49:05 +00:00
Tristan Partin	d571553d8a	Remove hacks in compute_ctl related to compute ID (#10751 )	2025-02-20 17:31:52 +00:00
Tristan Partin	f7474d3f41	Remove forward compatibility hacks related to compute HTTP servers (#10797 ) These hacks were added to appease the forward compatibility tests and can be removed. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-02-20 17:31:42 +00:00
Dmitrii Kovalkov	e808e9432a	storcon: use https for pageservers (#10759 ) ## Problem Storage controller uses unsecure http for pageserver API. Closes: https://github.com/neondatabase/cloud/issues/23734 Closes: https://github.com/neondatabase/cloud/issues/24091 ## Summary of changes - Add an optional `listen_https_port` field to storage controller's Node state and its API (RegisterNode/ListNodes/etc). - Allow updating `listen_https_port` on node registration to gradually add https port for all nodes. - Add `use_https_pageserver_api` CLI option to storage controller to enable https. - Pageserver doesn't support https for now and always reports `https_port=None`. This will be addressed in follow-up PR.	2025-02-20 17:16:04 +00:00
Anastasia Lubennikova	7c7180a79d	Fix deadlock in drop_subscriptions_before_start (#10806 ) ALTER SUBSCRIPTION requires AccessExclusive lock which conflicts with iteration over pg_subscription when multiple databases are present and operations are applied concurrently. Fix by explicitly locking pg_subscription in the beginning of the transaction in each database. ## Problem https://github.com/neondatabase/cloud/issues/24292	2025-02-20 17:14:16 +00:00
Erik Grinaker	07bee60037	pageserver: make compaction walredo errors critical (#10884 ) Mark walredo errors as critical too. Also pull the pattern matching out into the outer `match`. Follows #10872.	2025-02-20 12:08:54 +00:00
Erik Grinaker	f7edcf12e3	pageserver: downgrade ephemeral layer roll wait message (#10883 ) We already log a message for this during the L0 flush, so the additional message is mostly noise.	2025-02-20 12:08:30 +00:00
a-masterov	1d9346f8b7	Add pg_repack test (#10638 ) ## Problem We don't test `pg_repack` ## Summary of changes The test for `pg_repack` is added	2025-02-20 10:05:01 +00:00
Konstantin Knizhnik	a6d8640d6f	Persist pg_stat information in pageserver (#6560 ) ## Problem Statistics are saved in a local file and so are lost on compute restart. Persist them in the page server using the same AUX file mechanism used for replication slots See more about motivation in https://neondb.slack.com/archives/C04DGM6SMTM/p1703077676522789 ## Summary of changes Persist pgstat file using AUX mechanism Postgres PRs: https://github.com/neondatabase/postgres/pull/547 https://github.com/neondatabase/postgres/pull/446 https://github.com/neondatabase/postgres/pull/445 Related to #6684 and #6228 Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-20 06:38:55 +00:00
Arpad Müller	bb7e244a42	storcon: fix heartbeats timing out causing a panic (#10902 ) Fix an issue caused by PR https://github.com/neondatabase/neon/pull/10891: we introduced the concept of timeouts for heartbeats, where we would hang up on the other side of the oneshot channel if a timeout happened (future gets cancelled, receiver is dropped). This hang up would make the heartbeat task panic when it did obtain the response, as we unwrap the result of the result sending operation. The panic in the heartbeat task is then, according to logs, the last sign of life we see of that process invocation. I'm not sure what brings down the process, in theory tokio [should continue](https://docs.rs/tokio/latest/tokio/runtime/enum.UnhandledPanic.html#variant.Ignore), but idk. Alternative to #10901.	2025-02-19 23:04:05 +00:00
Arpad Müller	787b98f8f2	storcon: log all safekeepers marked as offline (#10898 ) Doing this to help debugging offline safekeepers. Part of https://github.com/neondatabase/neon/issues/9011	2025-02-19 20:45:22 +00:00
Vlad Lazar	f148d71d9b	test: disable background heatmap uploads and downloads in cold migration test (#10895 ) ## Problem Background heatmap uploads and downloads were blocking the ones done manually by the test. ## Summary of changes Disable Background heatmap uploads and downloads for the cold migration test. The test does them explicitly.	2025-02-19 19:30:17 +00:00
JC Grünhage	aad817d806	refactor(ci): use reusable push-to-container-registry workflow for pinning the build-tools image (#10890 ) ## Problem Pinning build tools still replicated the ACR/ECR/Docker Hub login and pushing, even though we have a reusable workflow for this. Was mentioned as a TODO in https://github.com/neondatabase/neon/pull/10613. ## Summary of changes Reuse `_push-to-container-registry.yml` for pinning the build-tools images.	2025-02-19 17:26:09 +00:00
Erik Grinaker	0b3db74c44	libs: remove unnecessary regex in `pprof::symbolize` (#10893 ) `pprof::symbolize()` used a regex to strip the Rust monomorphization suffix from generic methods. However, the `backtrace` crate can do this itself if formatted with the `:#` flag. Also tighten up the code a bit.	2025-02-19 17:11:12 +00:00
Arpad Müller	9ba2a87e69	storcon: sk heartbeat fixes (#10891 ) This PR does the following things: * The initial heartbeat round blocks the storage controller from becoming online again. If all safekeepers are unresponsive, this can cause storage controller startup to be very slow. The original intent of #10583 was that heartbeats don't affect normal functionality of the storage controller. So add a short timeout to prevent it from impeding storcon functionality. * Fix the URL of the utilization endpoint. * Don't send heartbeats to safekeepers which are decommissioned. Part of https://github.com/neondatabase/neon/issues/9011 context: https://neondb.slack.com/archives/C033RQ5SPDH/p1739966807592589	2025-02-19 16:57:11 +00:00
Alex Chi Z.	1f9511dbd9	feat(pageserver): yield image creation to L0 compactions across timelines (#10877 ) ## Problem A simpler version of https://github.com/neondatabase/neon/pull/10812 ## Summary of changes Image layer creation will be preempted by L0 accumulated on other timelines. We stop image layer generation if there's a pending L0 compaction request. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-19 15:10:12 +00:00
Erik Grinaker	aab5482fd5	storcon: add CPU/heap profiling endpoints (#10894 ) Adds CPU/heap profiling for storcon. Also fixes allowlists to match on the path only, since profiling endpoints take query parameters. Requires #10892 for heap profiling.	2025-02-19 14:43:29 +00:00
Erik Grinaker	3720cf1c5a	storcon: use jemalloc (#10892 ) ## Problem We'd like to enable CPU/heap profiling for storcon. This requires jemalloc. ## Summary of changes Use jemalloc as the global allocator, and enable heap sampling for profiling.	2025-02-19 14:20:51 +00:00
Erik Grinaker	0453eaf65c	pageserver: reduce default `compaction_upper_limit` to 20 (#10889 ) ## Problem We've seen the previous default of 50 cause OOMs. Compacting many L0 layers at once now has limited benefit, since the cost is mostly linear anyway. This is already being reduced to 20 in production settings. ## Summary of changes Reduce `DEFAULT_COMPACTION_UPPER_LIMIT` to 20. Once released, let's remove the config overrides.	2025-02-19 14:12:05 +00:00
Heikki Linnakangas	2d96134a4e	Remove unused dependencies (#10887 ) Per cargo machete.	2025-02-19 14:09:01 +00:00
JC Grünhage	e52e93797f	refactor(ci): use variables for AWS account IDs (#10886 ) ## Problem Our AWS account IDs are copy-pasted all over the place. A wrong paste might only be caught late if we hardcode them, but will get flagged instantly by actionlint if we access them from github actions variables. Resolves https://github.com/neondatabase/neon/issues/10787, follow-up for https://github.com/neondatabase/neon/pull/10613. ## Summary of changes Access AWS account IDs using Github Actions variables.	2025-02-19 12:34:41 +00:00
Erik Grinaker	aa115a774c	storcon: eagerly attempt autosplits (#10849 ) ## Problem Autosplits are crucial for bulk ingest performance. However, autosplits were only attempted when there was no other pending work. This could cause e.g. mass AZ affinity violations following Pageserver restarts to starve out autosplits for hours. Resolves #10762. ## Summary of changes Always attempt autosplits in the background reconciliation loop, regardless of other pending work.	2025-02-19 09:01:02 +00:00
Peter Bendel	2f0d6571a9	add a variant to ingest benchmark with shard-splitting disabled (#10876 ) ## Problem we measure ingest performance for a few variants (stripe-sizes, pre-sharded, shard-splitted). However some phenomena (e.g. related to L0 compaction) in PS can be better observed and optimized with un-sharded tenants. ## Summary of changes - Allow to create projects with a policy that disables sharding (`{"scheduling": "Essential"}`) - add a variant to ingest_benchmark that uses that policy for the new project ## Test run https://github.com/neondatabase/neon/actions/runs/13396325970	2025-02-19 08:43:53 +00:00
a-masterov	7199919f04	Fix the problems discovered in the upgrade test (#10826 ) ## Problem The nightly test discovered problems in the extensions upgrade test. 1. `PLv8` has different versions on PGv17 and PGv16 and a different test set, which was not implemented correctly [sample](https://github.com/neondatabase/neon/actions/runs/13382330475/job/37372930271) 2. The same for `semver` [sample](https://github.com/neondatabase/neon/actions/runs/13382330475/job/37372930017) 3. `pgtap` interfered with the other tests, e.g. tables, created by other extensions caused the tests to fail. ## Summary of changes The discovered problems were fixed. 1. The tests list for `PLv8` is now generated using the original Makefile 2. The patches for `semver` are now split for PGv16 and PGv17. 3. `pgtap` is being tested in a separate database now. --------- Co-authored-by: Mikhail Kot <mikhail@neon.tech>	2025-02-19 06:40:09 +00:00
Alex Chi Z.	74e789b155	Merge pull request #10878 from neondatabase/rc/release/2025-02-18	2025-02-18 23:04:21 -05:00
Alex Chi Z.	235439e639	fix(pageserver): make repartition error critical (#10872 ) ## Problem Read errors during repartition should be a critical error. ## Summary of changes <del>We only have one call site</del> We have two call sites of `repartition` where one of them is during the initial image upload optimization and another is during image layer creation, so I added a `critical!` here instead of inside `collect_keyspace`. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-18 15:29:19 -05:00
Alex Chi Z.	a4e3989c8d	fix(pageserver): make repartition error critical (#10872 ) ## Problem Read errors during repartition should be a critical error. ## Summary of changes <del>We only have one call site</del> We have two call sites of `repartition` where one of them is during the initial image upload optimization and another is during image layer creation, so I added a `critical!` here instead of inside `collect_keyspace`. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-18 20:19:23 +00:00
Peter Bendel	9d074db18d	Use link to cross-service-endpoint dashboard in allure reports and benchmarking workflow logs (#10874 ) ## Problem We have links to deprecated dashboards in our logs Example https://github.com/neondatabase/neon/actions/runs/13382454571/job/37401983608#step:8:348 ## Summary of changes Use link to cross service endpoint instead. Example: https://github.com/neondatabase/neon/actions/runs/13395407925/job/37413056148#step:7:345	2025-02-18 19:54:21 +00:00
Alex Chi Z.	538ea03f73	feat(pageserver): allow read path debug in getpagelsn API (#10748 ) ## Problem The usual workflow for me to debug read path errors in staging is: download the tenant to my laptop, import, and then run some read tests. With this patch, we can do this directly over staging pageservers. ## Summary of changes * Add a new `touchpagelsn` API that does a page read but does not return page info back. * Allow read from latest record LSN from get/touchpagelsn * Add read_debug config in the context. * The read path will read the context config to decide whether to enable read path tracing or not. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-18 18:54:53 +00:00
Alex Chi Z.	a78438f15c	Revert "feat(pageserver): repartition on L0-L1 boundary (#10548 )" (#10870 ) This reverts commit `443c8d0b4b`. ## Problem We observe a massive amount of compaction errors. ## Summary of changes If the tenant did not write any L1 layers (i.e., they accumulate L0 layers where number of them is below L0 threshold), image creation will always fail. Therefore, it's not correct to simply use the disk_consistent_lsn or L0/L1 boundary for the image creation.	2025-02-18 13:39:01 -05:00
Erik Grinaker	cb8060545d	pageserver: don't log noop image compaction (#10873 ) ## Problem We log image compaction stats even when no image compaction happened. This is logged every 10 seconds for every timeline. ## Summary of changes Only log when we actually performed any image compaction.	2025-02-18 17:49:01 +00:00
JC Grünhage	9151d3a318	feat(ci): notify storage oncall if deploy job fails on release branch (#10865 ) ## Problem If the deploy job on the release branch doesn't succeed, the preprod deployment will not have happened. It was requested that this triggers a notification in https://github.com/neondatabase/neon/issues/10662. ## Summary of changes If we're on the release branch and the deploy job doesn't end up in "success", notify storage oncall on slack.	2025-02-18 17:20:03 +00:00
Anastasia Lubennikova	381115b68e	Add pgaudit and pgauditlogtofile extensions (#10763 ) to compute image. This commit doesn't enable anything yet. It is a preparatory work for enabling audit logging in computes.	2025-02-18 16:32:32 +00:00
Vlad Lazar	1a69a8cba7	storage: add APIs for warming up location after cold migrations (#10788 ) ## Problem We lack an API for warming up attached locations based on the heatmap contents. This is problematic in two places: 1. If we manually migrate and cut over while the secondary is still cold 2. When we re-attach a previously offloaded tenant ## Summary of changes https://github.com/neondatabase/neon/pull/10597 made heatmap generation additive across migrations, so we won't clobber it after a cold migration. This allows us to implement: 1. An endpoint for downloading all missing heatmap layers on the pageserver: `/v1/tenant/:tenant_shard_id/timeline/:timeline_id/download_heatmap_layers`. Only one such operation per timeline is allowed at any given time. The granularity is tenant shard. 2. An endpoint to the storage controller to trigger the downloads on the pageserver: `/v1/tenant/:tenant_shard_id/timeline/:timeline_id/download_heatmap_layers`. This works both at tenant and tenant shard level. If an unsharded tenant id is provided, the operation is started on all shards, otherwise only the specified shard. 3. A storcon cli command. Again, tenant and tenant-shard level granularities are supported. Cplane will call into storcon and trigger the downloads for all shards. When we want to rescue a migration, we will use storcon cli targeting the specific tenant shard. Related: https://github.com/neondatabase/neon/issues/10541	2025-02-18 16:09:06 +00:00
Alex Chi Z.	ed98f6d57e	feat(pageserver): log lease request (#10832 ) ## Problem To investigate https://github.com/neondatabase/cloud/issues/23650 ## Summary of changes We log lease requests to see why there are clients accessing things below gc_cutoff. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-18 16:06:39 +00:00
Alex Chi Z.	f9a063e2e9	test(pageserver): fix test_pageserver_gc_compaction_idempotent (#10833 ) ## Problem ref https://github.com/neondatabase/neon/issues/10517 ## Summary of changes For some reason, the job split algorithm decides to have a different image coverage range for the two compactions before/after restart. So we remove the subcompaction key range and let it generate an image covering the full range, which should make the test more stable. Also slightly tuned the logging span. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-18 16:06:20 +00:00
Heikki Linnakangas	f36ec5c84b	chore(compute): Postgres 17.4, 16.8, 15.12 and 14.17 (#10868 ) Update all minor versions. No conflicts. Postgres repository PRs: - https://github.com/neondatabase/postgres/pull/584 - https://github.com/neondatabase/postgres/pull/583 - https://github.com/neondatabase/postgres/pull/582 - https://github.com/neondatabase/postgres/pull/581	2025-02-18 15:56:43 +00:00
Alexander Bayandin	274cb13293	test_runner: fix mismatch versions tests on linux (#10869 ) ## Problem Tests with mixed-version binaries always use the latest binaries on CI ([an example](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10848/13378137061/index.html#suites/8fc5d1648d2225380766afde7c428d81/1ccefc4cfd4ef176/)): The versions of new `storage_broker` and old `pageserver` are the same: `b45254a5605f6fdafdf475cdd3e920fe00898543`. This affects only Linux, on macOS the version mixed correctly. ## Summary of changes - Use hardlinks instead of symlinks to create a directory with mixed-version binaries	2025-02-18 15:52:00 +00:00
Alex Chi Z.	290f007b8e	Revert "feat(pageserver): repartition on L0-L1 boundary (#10548 )" (#10870 ) This reverts commit `443c8d0b4b`. ## Problem We observe a massive amount of compaction errors. ## Summary of changes If the tenant did not write any L1 layers (i.e., they accumulate L0 layers where number of them is below L0 threshold), image creation will always fail. Therefore, it's not correct to simply use the disk_consistent_lsn or L0/L1 boundary for the image creation.	2025-02-18 15:43:33 +00:00
Alexander Lakhin	29e4ca351e	Pass asan/ubsan options to pg_dump/pg_restore started by fast_import (#10866 )	2025-02-18 15:41:20 +00:00
Arpad Müller	caece02da7	move pull_timeline to safekeeper_api and add SafekeeperGeneration (#10863 ) Preparations for a successor of #10440: * move `pull_timeline` to `safekeeper_api` and add it to `SafekeeperClient`. We want to do `pull_timeline` on any creations that we couldn't do initially. * Add a `SafekeeperGeneration` type instead of relying on a type alias. We want to maintain a safekeeper-specific generation number now in the storcon database. A separate type is important to make it impossible to mix it up with the tenant's pageserver-specific generation number. We absolutely want to avoid that for correctness reasons. If someone mixes up a safekeeper and pageserver id (both use the `NodeId` type), that's bad but there are no wrong generations flying around. part of #9011	2025-02-18 14:02:22 +00:00
Arseny Sher	d36baae758	Add gc_blocking and restore latest_gc_cutoff in openapi spec (#10867 ) ## Problem gc_blocking is missing in the tenant info, but cplane wants to use it. Also, https://github.com/neondatabase/neon/pull/10707/ removed latest_gc_cutoff from the spec, renaming it to applied_gc_cutoff. Temporarily get it back until cplane migrates. ## Summary of changes Add them. ref https://neondb.slack.com/archives/C03438W3FLZ/p1739877734963979	2025-02-18 13:57:12 +00:00
Alexander Lakhin	f81259967d	Add test to make sure sanitizers really work when expected (#10838 )	2025-02-18 13:23:18 +00:00
Conrad Ludgate	719ec378cd	fix(local_proxy): discard all in tx (#10864 ) ## Problem `discard all` cannot run in a transaction (even if implicit) ## Summary of changes Split up the query into two, we don't need transaction support.	2025-02-18 08:54:20 +00:00
Alexander Bayandin	27241f039c	test_runner: fix `neon_local` usage for version mismatch tests (#10859 ) ## Problem Tests with mixed versions of binaries always pick up new versions if services are started using `neon_local`. ## Summary of changes - Set `neon_local_binpath` along with `neon_binpath` and `pg_distrib_dir` for tests with mixed versions	2025-02-17 20:29:14 +00:00
Heikki Linnakangas	811506aaa2	fast_import: Use rust s3 client for uploading (#10777 ) This replaces the use of the awscli utility. awscli binary is massive, it added about 200 MB to the docker image size, while the s3 client was already a dependency so using that is essentially free, as far as binary size is concerned. I implemented a simple upload function that tries to keep 10 uploads going in parallel. I believe that's the default behavior of the "aws s3 sync" command too.	2025-02-17 20:07:31 +00:00
Heikki Linnakangas	2884917bd4	compute: Allow postgres user to power off the VM also on <= v16 (#10860 ) I did this for debian bookworm variant in PR #10710, but forgot to update the "bullseye" dockerfile that is used to build older PostgreSQL versions.	2025-02-17 19:42:57 +00:00
Tristan Partin	b34598516f	Warn when PR may require regenerating cloud PG settings (#10229 ) These generated Postgres settings JSON files can get out of sync, causing the control plane to reject updates to an endpoint or project's Postgres settings. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-02-17 19:02:16 +00:00
Erik Grinaker	84bbe87d60	pageserver: tweak `pageserver_layers_per_read` histogram resolution (#10847 ) ## Problem The current `pageserver_layers_per_read` histogram buckets don't represent the current reality very well. For the percentiles we care about (e.g. p50 and p99), we often see fairly high read amp, especially during ingestion, and anything below 4 can be considered very good. ## Summary of changes Change the per-timeline read amp histogram buckets to `[4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0]`.	2025-02-17 17:24:17 +00:00
Vlad Lazar	b10890b81c	tests: compare digests in test_peer_recovery (#10853 ) ## Problem Test fails when comparing the first WAL segment because the system id in the segment header is different. The system id is not consistently set correctly since segments are usually inited on the safekeeper sync step with sysid 0. ## Summary of changes Compare timeline digests instead. This skips the header. Closes https://github.com/neondatabase/neon/issues/10596	2025-02-17 16:32:24 +00:00
Conrad Ludgate	3204efc860	chore(proxy): use specially named prepared statements for type-checking (#10843 ) I was looking into https://github.com/neondatabase/serverless/issues/144, and I recall previous cases where proxy would trigger these prepared statements, which would conflict with other statements prepared by our client downstream. Because of that, and also to aid in debugging, I've made sure all prepared statements that proxy needs to make have specific names that likely won't conflict and make it clear in an error log if it's our statements that are causing issues.	2025-02-17 16:19:57 +00:00
Arseny Sher	3d370679a1	Merge pull request #10855 from neondatabase/rel-02-14-39d42d846ae38 Cherry-pick #10845 into release	2025-02-17 18:46:22 +03:00
John Spray	c37e3020ab	pageserver_api: fix decoding old-version TimelineInfo (#10845 ) ## Problem In #10707 some new fields were introduced in TimelineInfo. I forgot that we do not only use TimelineInfo for encoding, but also decoding when the storage controller calls into a pageserver, so this broke some calls from controller to pageserver while in a mixed-version state. ## Summary of changes - Make new fields have default behavior so that they are optional	2025-02-17 18:43:14 +03:00
Tristan Partin	da79cc5eee	Add neon.extension_server_{connect,request}_timeout (#10801 ) Instead of hardcoding the request timeout, let's make it configurable as a PGC_SUSET GUC. Additionally, add a connect timeout GUC. Although the extension server runs on the compute, it is always best to keep operations from hanging. Better to present a timeout error to the user than a stuck backend. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-02-17 15:40:43 +00:00
John Spray	39d42d846a	pageserver_api: fix decoding old-version TimelineInfo (#10845 ) ## Problem In #10707 some new fields were introduced in TimelineInfo. I forgot that we do not only use TimelineInfo for encoding, but also decoding when the storage controller calls into a pageserver, so this broke some calls from controller to pageserver while in a mixed-version state. ## Summary of changes - Make new fields have default behavior so that they are optional	2025-02-17 15:04:47 +00:00
Arpad Müller	0330b61729	Azure SDK: use neon branch again (#10844 ) Originally I wanted to switch back to the `neon` branch before merging #10825, but I forgot to do it. Do it in a separate PR now. No actual change of the source code, only changes the branch name (so that maybe in a few weeks we can delete the temporary branch `arpad/neon-rebase`).	2025-02-17 14:59:01 +00:00
Erik Grinaker	8a2d95b4b5	pageserver: appease unused lint on macOS (#10846 ) ## Problem `SmgrOpFlushInProgress::measure()` takes a `socket_fd` argument which is only used on Linux. This causes linter warnings on macOS. Touches #10823. ## Summary of changes Add a noop use of `socket_fd` on non-Linux branch.	2025-02-17 14:41:22 +00:00
Konstantin Knizhnik	8c6d133d31	Fix out-of-boundaries access in addSHLL function (#10840 ) ## Problem See https://github.com/neondatabase/neon/issues/10839 The rho(x,b) function returns values in the range [1,b+1] and addSHLL tries to store them in an array of size b+1. ## Summary of changes Subtract 1 from the value returned by rho --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-17 12:54:17 +00:00
Arpad Müller	81f08d304a	Rebase Azure SDK and apply newest patch (#10825 ) The [upstream PR](https://github.com/Azure/azure-sdk-for-rust/pull/1997) has been merged with some changes to use threads with async, so apply them to the neon specific fork to be nice to the executor (before, we had the state as of filing of that PR). Also, rebase onto the latest version of upstream's `legacy` branch. current SDK commits: [link](https://github.com/neondatabase/azure-sdk-for-rust/commits/neon-2025-02-14) now: [link](https://github.com/neondatabase/azure-sdk-for-rust/commits/arpad/neon-refresh) Prior update was in #10790	2025-02-17 10:44:44 +00:00
Peter Bendel	d566d604cf	feat(compute) add pg_duckdb extension v0.3.1 (#10829 ) We want to host pg_duckdb (starting with v0.3.1) on Neon. This PR replaces https://github.com/neondatabase/neon/pull/10350 which was for older pg_duckdb v0.2.0 Use cases - faster OLAP queries - access to data lake files (e.g. parquet) on S3 buckets from Neon PostgreSQL Because neon does not provide superuser role to neon customers we need to grant some additional permissions to neon_superuser: Note: some grants that we require are already granted to `PUBLIC` in new release of pg_duckdb [here](`3789e4c509/sql/pg_duckdb--0.2.0--0.3.0.sql (L1054)`) ```sql GRANT ALL ON FUNCTION duckdb.install_extension(TEXT) TO neon_superuser; GRANT ALL ON TABLE duckdb.extensions TO neon_superuser; GRANT ALL ON SEQUENCE duckdb.extensions_table_seq TO neon_superuser; ```	2025-02-17 10:43:16 +00:00
Alexander Lakhin	f739773edd	Fix format of milliseconds in pytest output (#10836 ) ## Problem The timestamp prefix of pytest log lines contains milliseconds without leading zeros, so values of milliseconds less than 100 printed incorrectly. For example: ``` 2025-02-15 12:02:51.997 INFO [_internal.py:97] 127.0.0.1 - - ... 2025-02-15 12:02:52.4 INFO [_internal.py:97] 127.0.0.1 - - ... 2025-02-15 12:02:52.9 INFO [_internal.py:97] 127.0.0.1 - - ... 2025-02-15 12:02:52.23 INFO [_internal.py:97] 127.0.0.1 - - ... ``` ## Summary of changes Fix log_format for pytest so that milliseconds are printed with leading zeros.	2025-02-16 04:59:52 +00:00
Heikki Linnakangas	2dae0612dd	fast_import: Fix shared_buffers setting (#10837 ) In commit `9537829ccd` I made shared_buffers be derived from the system's available RAM. However, I failed to remove the old hard-coded shared_buffers=10GB setting, so shared_buffers was set twice. Oopsie.	2025-02-16 00:01:19 +00:00
Alexander Bayandin	2ec8dff6f7	CI(build-and-test-locally): set `session-timeout` for pytest (#10831 ) ## Problem Sometimes, a regression test run gets stuck (taking more than 60 minutes) and is killed by GitHub's `timeout-minutes` without leaving any traces in the test results database. I find no correlation between this and either the build type, the architecture, or the Postgres version. See: https://neonprod.grafana.net/goto/nM7ih7cHR?orgId=1 ## Summary of changes - Bump `pytest-timeout` to the version that supports `--session-timeout` - Set `--session-timeout` to (timeout-minutes - 10 minutes) * 60 seconds in an attempt to stop tests gracefully and generate test reports before they are forcibly stopped by the stricter `timeout-minutes` limit.	2025-02-15 10:34:11 +00:00
Alex Chi Z.	ae091c6913	feat(pageserver): store reldir in sparse keyspace (#10593 ) ## Problem Part of https://github.com/neondatabase/neon/issues/9516 ## Summary of changes This patch adds support for storing reldir in the sparse keyspace. All logic is guarded with the `rel_size_v2_enabled` flag, so if it's set to false, the code path is exactly the same as what's currently in prod. Note that we did not persist the `rel_size_v2_enabled` flag and the logic around it will be implemented in the next patch. (i.e., what if we enabled it, restart the pageserver, and then it gets set to false? we should still read from v2 using the rel_size_v2_migration_status in the index_part). The persistence logic I'll implement in the next patch will disallow switching from v2->v1 via config item. I also refactored the metrics so that they can work with the new reldir store. However, this metric was not correctly computed for reldirs before (see the comments). With the refactor, the value will be computed only when we have an initial value for the reldir size. The refactor keeps the incorrectness of the computation when there is more than 1 database. For the tests, we currently run all the tests with v2, and I'll set it to false and add some v2-specific tests before merging, probably also v1->v2 migration tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-14 20:31:54 +00:00
Christian Schwarz	a32e8871ac	compute/pageserver: correlation of logs through backend PID (via `application_name`) (#10810 ) This PR makes compute set the `application_name` field to the PG backend process PID which is also included in each compute log line. This allows correlation of Pageserver connection logs with compute logs in a way that was guesswork before this PR. In future, we can switch for a more unique identifier for a page_service session. Refs - discussion in https://neondb.slack.com/archives/C08DE6Q9C3B/p1739465208296169?thread_ts=1739462628.361019&cid=C08DE6Q9C3B - fixes https://github.com/neondatabase/neon/issues/10808	2025-02-14 20:11:42 +00:00
Christian Schwarz	9177312ba6	basebackup: use `Timeline::get` for `get_rel` instead of `get_rel_page_at_lsn` (#10476 ) I noticed the opportunity to simplify here while working on https://github.com/neondatabase/neon/pull/9353 . The only difference is the zero-fill behavior: if one reads past rel size, `get_rel_page_at_lsn` returns a zeroed page whereas `Timeline::get` returns an error. However, the `endblk` is at most rel size large, because `nblocks` is eq `get_rel_size`, see a few lines above this change. We're using the same LSN (`self.lsn`) for everything, so there is no chance of non-determinism. Refs: - Slack discussion debating correctness: https://neondb.slack.com/archives/C033RQ5SPDH/p1737457010607119	2025-02-14 17:57:18 +00:00
Arseny Sher	a036708da1	Merge pull request #10820 from neondatabase/rc/release/2025-02-14 Storage release 2025-02-14	2025-02-14 19:36:36 +03:00
Christian Schwarz	b992a1a62a	page_service: include socket send & recv queue length in slow flush log message (#10823 ) # Summary In - https://github.com/neondatabase/neon/pull/10813 we added slow flush logging but it didn't log the TCP send & recv queue length. This PR adds that data to the log message. I believe the implementation to be safe & correct right now, but it's brittle and thus this PR should be reverted or improved upon once the investigation is over. Refs: - stacked atop https://github.com/neondatabase/neon/pull/10813 - context: https://neondb.slack.com/archives/C08DE6Q9C3B/p1739464533762049?thread_ts=1739462628.361019&cid=C08DE6Q9C3B - improves https://github.com/neondatabase/neon/issues/10668 - part of https://github.com/neondatabase/cloud/issues/23515 # How It Works The trouble is two-fold: 1. getting to the raw socket file descriptor through the many Rust types that wrap it and 2. integrating with the `measure()` function Rust wraps it in types to model file descriptor lifetimes and ownership, and usually one can get access using `as_raw_fd()`. However, we `split()` the stream and the resulting [`tokio::io::WriteHalf`](https://docs.rs/tokio/latest/tokio/io/struct.WriteHalf.html) . Check the PR commit history for my attempts to do it. My solution is to get the socket fd before we wrap it in our protocol types, and to store that fd in the new `PostgresBackend::socket_fd` field. I believe it's safe because the lifetime of `PostgresBackend::socket_fd` value == the lifetime of the `TcpStream` that we wrap and store in `PostgresBackend::framed`. Specifically, the only place that close()s the socket is the `impl Drop for TcpStream`. I think the protocol stack calls `TcpStream::shutdown()`, but, that doesn't `close()` the file descriptor underneath. Regarding integration with the `measure()` function, the trouble is that `flush_fut` is currently a generic `Future` type. So, we just pass in the `socket_fd` as a separate argument. 
A clean implementation would convert the `pgb_writer.flush()` to a named future that provides an accessor for the socket fd while not being polled. I tried (see PR history), but failed to break through the `WriteHalf`. # Testing Tested locally by running ``` ./target/debug/pagebench get-page-latest-lsn --num-clients=1000 --queue-depth=1000 ``` in one terminal, waiting a bit, then ``` pkill -STOP pagebench ``` then wait for slow logs to show up in `pageserver.log`. Pick one of the slow log message's port pairs, e.g., `127.0.0.1:39500`, and then checking sockstat output ``` ss -ntp \| grep '127.0.0.1:39500' ``` to ensure that send & recv queue size match those in the log message.	2025-02-14 16:20:07 +00:00
Gleb Novikov	3d7a32f619	fast import: allow restore to provided connection string (#10407 ) Within https://github.com/neondatabase/cloud/issues/22089 we decided that it would be nice to start with an import that runs dump-restore into a running compute (more on this [here](https://www.notion.so/neondatabase/2024-Jan-13-Migration-Assistant-Next-Steps-Proposal-Revised-17af189e004780228bdbcad13eeda93f?pvs=4#17af189e004780de816ccd9c13afd953)) We could do it by writing another tool or by extending existing `fast_import.rs`, we chose the latter. In this PR, I have added optional `restore_connection_string` as a cli arg and as a part of the json spec. If specified, the script will not run postgres and will just perform restore into provided connection string. TODO: - [x] fast_import.rs: - [x] cli arg in the fast_import.rs - [x] encoded connstring in json spec - [x] simplify `fn main` a little, take out too verbose stuff to some functions - [ ] ~~allow streaming from dump stdout to restore stdin~~ will do in a separate PR - [ ] ~~address https://github.com/neondatabase/neon/pull/10251#pullrequestreview-2551877845~~ will do in a separate PR - [x] tests: - [x] restore with cli arg in the fast_import.rs - [x] restore with encoded connstring in json spec in s3 - [ ] ~~test with custom dbname~~ will do in a separate PR - [ ] ~~test with s3 + pageserver + fast import binary~~ https://github.com/neondatabase/neon/pull/10487 - [ ] ~~https://github.com/neondatabase/neon/pull/10271#discussion_r1923715493~~ will do in a separate PR neondatabase/cloud#22775 --------- Co-authored-by: Eduard Dykman <bird.duskpoet@gmail.com>	2025-02-14 16:10:06 +00:00
Christian Schwarz	fac5db3c8d	page_service: emit periodic log message while response flush is slow (#10813 ) The logic might seem a bit intricate / over-optimized, but I recently spent time benchmarking this code path in the context of a nightly pagebench regression (https://github.com/neondatabase/cloud/issues/21759) and I want to avoid regressing it any further. Ideally would also log the socket send & recv queue length like we do on the compute side in - https://github.com/neondatabase/neon/pull/10673 But that is proving difficult due to the Rust abstractions that wrap the socket fd. Work in progress on that is happening in - https://github.com/neondatabase/neon/pull/10823 Regarding production impact, I am worried at a theoretical level that the additional logging may cause a downward spiral in the case where a pageserver is slow to flush because there is not enough CPU. The logging would consume more CPU and thereby slow down flushes even more. However, I don't think this matters practically speaking. # Refs - context: https://neondb.slack.com/archives/C08DE6Q9C3B/p1739464533762049?thread_ts=1739462628.361019&cid=C08DE6Q9C3B - fixes https://github.com/neondatabase/neon/issues/10668 - part of https://github.com/neondatabase/cloud/issues/23515 # Testing Tested locally by running ``` ./target/debug/pagebench get-page-latest-lsn --num-clients=1000 --queue-depth=1000 ``` in one terminal, waiting a bit, then ``` pkill -STOP pagebench ``` then wait for slow logs to show up in `pageserver.log`. To see that the completion log message is logged, run ``` pkill -CONT pagebench ```	2025-02-14 14:37:03 +00:00
John Spray	1e7ad80ee7	storage controller: prioritize reconciles for user-facing operations (#10822 ) ## Problem Some situations may produce a large number of pending reconciles. If we experience an issue where reconciles are processed more slowly than expected, that can prevent us responding promptly to user requests like tenant/timeline CRUD. This is a cleaner implementation of the hotfix in https://github.com/neondatabase/neon/pull/10815 ## Summary of changes - Introduce a second semaphore for high priority tasks, with configurable units (default 256). The intent is that in practical situations these user-facing requests should never have to wait. - Use the high priority semaphore for: tenant/timeline CRUD, and shard splitting operations. Use normal priority for everything else.	2025-02-14 17:34:51 +03:00
John Spray	581be23100	storcon: fix eliding parameters from proxied URL labels (#10817 ) ## Problem We had code for stripping IDs out of proxied paths to reduce cardinality of metrics, but it was only stripping out tenant IDs, and leaving in timeline IDs and query parameters (e.g. LSN in lsn->timestamp lookups). ## Summary of changes - Use a more general regex approach. There is still some risk that a future pageserver API might include a parameter in `/the/path/`, but we control that API and it is not often extended. We will also alert on metrics cardinality in staging so that if we made that mistake we would notice.	2025-02-14 17:34:25 +03:00
John Spray	a82a6631fd	storage controller: prioritize reconciles for user-facing operations (#10822 ) ## Problem Some situations may produce a large number of pending reconciles. If we experience an issue where reconciles are processed more slowly than expected, that can prevent us responding promptly to user requests like tenant/timeline CRUD. This is a cleaner implementation of the hotfix in https://github.com/neondatabase/neon/pull/10815 ## Summary of changes - Introduce a second semaphore for high priority tasks, with configurable units (default 256). The intent is that in practical situations these user-facing requests should never have to wait. - Use the high priority semaphore for: tenant/timeline CRUD, and shard splitting operations. Use normal priority for everything else.	2025-02-14 13:25:43 +00:00
Folke Behrens	da7496e1ee	proxy: Post-refactor + future clippy lint cleanup (#10824 ) * Clean up deps and code after logging and binary refactor * Also include future clippy lint cleanup	2025-02-14 12:34:09 +00:00
a-masterov	646e011c4d	Tests the test-upgrade scripts themselves (#10664 ) ## Problem We run the compatibility tests only if we are upgrading the extension. An accidental code change may break the test itself, so we have to check this code as well. ## Summary of changes The test is scheduled once a day to save time and resources. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-02-14 11:41:57 +00:00
Arpad Müller	878c1c7110	offload_timeline: check if the timeline is archived on HasChildren error (#10776 ) PR #10305 makes sure that there is no actual race, i.e. we will never attempt to offload a timeline that has just been unarchived, or similar. However, if a timeline has been unarchived and has children that are unarchived too, we will get an error log line. Such races can occur as in compaction we check if the timeline can be offloaded way before we attempt to offload it: the result might change in the meantime. This patch checks if the delete guard can't be obtained because the timeline has unarchived children, and if yes, it does another check for whether the timeline has become unarchived or not. If it is unarchived, it just prints an info log msg and integrates itself into the error suppression logic of the compaction calling into it. If you squint at it really closely, there is still a possible race in which we print an error log, but this one is unlikely because the timeline and its children need to be archived right after the check for whether the timeline has any unarchived children, and right before the check whether the timeline is archived. Archival involves a network operation while nothing between these two checks does that, so it's very unlikely to happen in real life. https://github.com/neondatabase/cloud/issues/23979#issuecomment-2651265729	2025-02-14 10:21:50 +00:00
John Spray	996f0a3753	storcon: fix eliding parameters from proxied URL labels (#10817 ) ## Problem We had code for stripping IDs out of proxied paths to reduce cardinality of metrics, but it was only stripping out tenant IDs, and leaving in timeline IDs and query parameters (e.g. LSN in lsn->timestamp lookups). ## Summary of changes - Use a more general regex approach. There is still some risk that a future pageserver API might include a parameter in `/the/path/`, but we control that API and it is not often extended. We will also alert on metrics cardinality in staging so that if we made that mistake we would notice.	2025-02-14 09:57:19 +00:00
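As a rough illustration of the label scrubbing described in #10817: the PR itself uses a regex, whereas the sketch below swaps in plain string handling so that it needs only the standard library. The function names, the `:id` placeholder, and the assumption that IDs appear as 32-character hex path segments are all hypothetical and not taken from the storage controller code.

```rust
/// Returns true if a path segment looks like an ID.
/// Assumption for this sketch: IDs are exactly 32 hex characters.
fn looks_like_id(segment: &str) -> bool {
    segment.len() == 32 && segment.chars().all(|c| c.is_ascii_hexdigit())
}

/// Reduce a proxied URL path to a low-cardinality metrics label:
/// drop the query string and replace ID-like segments with a placeholder.
pub fn path_label(url_path: &str) -> String {
    // Everything after '?' is query parameters (e.g. an LSN or timestamp).
    let path = url_path.split('?').next().unwrap_or("");
    path.split('/')
        .map(|segment| if looks_like_id(segment) { ":id" } else { segment })
        .collect::<Vec<_>>()
        .join("/")
}
```

With this sketch, a path containing a tenant ID, a timeline ID, and a query string collapses to a single label, so the number of distinct label values depends on the number of routes rather than the number of tenants.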
Konstantin Knizhnik	8bdb1828c8	Perform seqscan to fill LFC chunks with data so that on-disk file size included size of table (#10775 ) ## Problem See https://github.com/neondatabase/neon/issues/10755 Random access pattern of pgbench leaves sparse chunks, which makes the on-disk size of file.cache unpredictable. ## Summary of changes Perform seqscan to fill LFC chunks with data so that on-disk file size included size of table. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-14 08:19:56 +00:00
github-actions[bot]	8ca7ea859d	Storage release 2025-02-14	2025-02-14 06:02:05 +00:00
Alexander Bayandin	3e8bf2159d	CI(build-and-test): run `benchmarks` after `deploy` job (#10791 ) ## Problem `benchmarks` is a long-running and non-blocking job. If, on Staging, a deploy-blocking job fails, restarting it requires cancelling any running `benchmarks` jobs, which is a waste of CI resources and requires a couple of extra clicks for a human to do. Ref: https://neondb.slack.com/archives/C059ZC138NR/p1739292995400899 ## Summary of changes - Run `benchmarks` after `deploy` job - Handle `benchmarks` run in PRs with `run-benchmarks` label but without `deploy` job.	2025-02-13 22:03:47 +00:00
Arpad Müller	5008324460	Fix utilization URL and ensure heartbeats work (#10811 ) There was a typo in the name of the utilization endpoint URL, fix it. Also, ensure that the heartbeat mechanism actually works. Related: #10583, #10429 Part of #9011	2025-02-13 20:55:53 +00:00
Christian Schwarz	487f3202fe	pageserver read path: abort on fatal IO errors from disk / filesystem (#10786 ) Before this PR, an IO error returned from the kernel, e.g., due to a bad disk, would get bubbled up, all the way to a user-visible query failing. This is against the IO error handling policy that we have established and is hence being rectified in this PR. [[(internal Policy document link)]](`bef44149f7/src/storage/handling_io_and_logical_errors.md (L33-L35)`) The practice on the write path seems to be that we call `maybe_fatal_err()` or `fatal_err()` fairly high up the stack. That is, regardless of whether std::fs, tokio::fs, or VirtualFile is used to perform the IO. For the read path, I chose a centralized approach in this PR by checking for errors as close to the kernel interface as possible. I believe this is better for long-term consistency. To mitigate the problem of missing context if we abort so far down in the stack, the `on_fatal_io_error` now captures and logs a backtrace. I grepped the pageserver code base for `fs::read` to convince myself that all non-VirtualFile reads already handle IO errors according to policy. Refs - fixes https://github.com/neondatabase/neon/issues/10454	2025-02-13 20:53:39 +00:00
Alex Chi Z.	6a741fd1c2	fix(pageserver): ensure all basebackup client errors are caught (#10793 ) ## Problem We didn't catch all client errors causing alerts. ## Summary of changes Client errors should be wrapped with ClientError so that it doesn't fire alerts. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-13 19:38:02 +00:00
a-masterov	7ac7755dad	Add tests for pgtap (#10589 ) ## Problem We do not test `pgtap` which is shipped with Neon ## Summary of changes Test and binaries for `pgtap` are added.	2025-02-13 19:04:08 +00:00
Arseny Sher	98e18e9a54	Add s3 storage to test_s3_wal_replay (#10809 ) ## Problem The test is flaky: WAL in remote storage appears to be corrupted. One of the hypotheses so far is that the corruption is the result of the local fs implementation being non-atomic, and safekeepers may concurrently PUT the same segment. That's dubious though, because from looking at the local_fs impl I'd then expect an early EOF on segment read rather than the zeros observed in test failures, but other directions seem even less probable. ## Summary of changes Let's add s3 backend as well and see if it is also flaky. Also add some more logging around segment uploads. ref https://github.com/neondatabase/neon/issues/10761	2025-02-13 18:05:15 +00:00
Tristan Partin	0cf9157adc	Handle new compute_ctl_config parameter in compute spec requests (#10746 ) There is now a compute_ctl_config field in the response that currently only contains a JSON Web Key set. compute_ctl currently doesn't do anything with the keys, but will in the future. The reasoning for the new field is due to the nature of empty computes. When an empty compute is created, it does not have a tenant. A compute spec is the primary means of communicating the details of an attached tenant. In the empty compute state, there is no spec. Instead we wait for the control plane to pass us one via /configure. If we were to include the jwks field in the compute spec, we would have a partial compute spec, which doesn't logically make sense. Instead, we can have two means of passing settings to the compute: - spec: tenant specific config details - compute_ctl_config: compute specific settings For instance, the JSON Web Key set passed to the compute is independent of any tenant. It is a setting of the compute whether it is attached or not. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-02-13 18:04:36 +00:00
Tristan Partin	b6f972ed83	Increase the extension server request timeout to 1 minute (#10800 ) pg_search is 46ish MB. All other remote extensions are around hundreds of KB. 3 seconds is not long enough to download the tarball if the S3 gateway cache doesn't already contain a copy. According to our setup, the cache is limited to 10 GB in size and anything that has not been accessed for an hour is purged. This is really bad for scaling to 0, even more so if you're the only project actively using the extension in a production Kubernetes cluster. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-02-13 17:33:27 +00:00
John Spray	a4d0a34591	tests: flush in test_isolation (#10658 ) ## Problem This test occasionally fails while the test teardown tries to do a graceful shutdown, because the test has quickly written lots of data into the pageserver. Closes: #10654 ## Summary of changes - Call `post_checks` at the end of `test_isolation`, as we already do for test_pg_regress -- this improves our detection of issues, and as a nice side effect flushes the pageserver. - Ignore pg_notify files when validating state at end of test, these are not expected to be the same	2025-02-13 16:23:51 +00:00
John Spray	ae463f366b	tests: broaden allow-list for #10720 workaround (#10807 ) ## Problem In #10752 I used an overly-strict regex that only ignored error on a particular key. ## Summary of changes - Drop key from regex so it matches all such errors	2025-02-13 16:15:04 +00:00
Alexey Kondratov	8c2f85b209	chore(compute): Postgres 17.3, 16.7, 15.11 and 14.16 (#10771 ) ## Summary of changes Bump all minor versions. The only non-trivial conflict was between - `0350b876b0` - and `bd09a752f4` It seems that just adding this extra argument is enough. I also got conflict with `c1c9df3159` but for some reason only in PG 15. Yet, that was a trivial one around ```c if (XLogCtl) LWLockRelease(ControlFileLock); /* durable_rename already emitted log message */ return false; ``` in `xlog.c` ## Postgres PRs - https://github.com/neondatabase/postgres/pull/580 - https://github.com/neondatabase/postgres/pull/579 - https://github.com/neondatabase/postgres/pull/577 - https://github.com/neondatabase/postgres/pull/578	2025-02-13 13:28:05 +00:00
JC Grünhage	e37ba8642d	Integrate cargo-chef into Dockerfile (#10782 ) ## Problem The build of the neon container image is not caching any part of the rust build, making it fairly slow. ## Summary of changes Cache dependency building using cargo-chef.	2025-02-13 13:08:46 +00:00
Vlad Lazar	8fea43a5ba	pageserver: make heatmap generation additive (#10597 ) ## Problem Previously, when cutting over to cold secondary locations, we would clobber the previous, good, heatmap with a cold one. This is because heatmap generation used to include only resident layers. Once this merges, we can add an endpoint which triggers full heatmap hydration on attached locations to heal cold migrations. ## Summary of changes With this patch, heatmap generation becomes additive. If we have a heatmap from when this location was secondary, the new uploaded heatmap will be the result of a reconciliation between the old one and the on disk resident layers. More concretely, when we have the previous heatmap: 1. Filter the previous heatmap and keep layers that are (a) present in the current layer map, (b) visible, (c) not resident. Call this set of layers `visible_non_resident`. 2. From the layer map, select all layers that are resident and visible. Call this set of layers `resident`. 3. The new heatmap is the result of merging the two disjoint sets. Related https://github.com/neondatabase/neon/issues/10541	2025-02-13 12:48:47 +00:00
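The three reconciliation steps from #10597 can be sketched as follows. This is a minimal illustration, not the pageserver implementation: `Layer`, its fields, and `merge_heatmap` are hypothetical names, and the heatmap is reduced to a set of layer names.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified layer descriptor; the real pageserver types differ.
#[derive(Debug, Clone)]
pub struct Layer {
    pub name: String,
    pub visible: bool,
    pub resident: bool,
}

/// Build the new heatmap (as a set of layer names) from the previous heatmap
/// and the current layer map.
pub fn merge_heatmap(previous: &BTreeSet<String>, layer_map: &[Layer]) -> BTreeSet<String> {
    // Step 1: keep previous-heatmap layers that are still in the layer map,
    // visible, and not resident (`visible_non_resident`).
    let visible_non_resident = layer_map
        .iter()
        .filter(|l| previous.contains(&l.name) && l.visible && !l.resident)
        .map(|l| l.name.clone());

    // Step 2: select all layers that are resident and visible (`resident`).
    let resident = layer_map
        .iter()
        .filter(|l| l.resident && l.visible)
        .map(|l| l.name.clone());

    // Step 3: the two sets are disjoint (non-resident vs. resident),
    // so merging them is a plain union.
    visible_non_resident.chain(resident).collect()
}
```

A layer that appeared in the previous heatmap but is no longer in the layer map, or is no longer visible, drops out of the result. A non-resident layer that was never in the previous heatmap is not added.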
Arpad Müller	536bdb3209	storcon: track safekeepers in memory, send heartbeats to them (#10583 ) In #9011, we want to schedule timelines to safekeepers. In order to do such scheduling, we need information about how utilized a safekeeper is and if it's available or not. Therefore, send constant heartbeats to the safekeepers and try to figure out if they are online or not. Includes some code from #10440.	2025-02-13 11:06:30 +00:00
John Spray	b8095f84a0	pageserver: make true GC cutoff visible in admin API, rebrand `latest_gc_cutoff` as `applied_gc_cutoff` (#10707 ) ## Problem We expose `latest_gc_cutoff` in our API, and callers understandably were using that to validate LSNs for branch creation. However, this is _not_ the true GC cutoff from a user's point of view: it's just the point at which we last actually did GC. The actual cutoff used when validating branch creations and page_service reads is the min() of latest_gc_cutoff and the planned GC lsn in GcInfo. Closes: https://github.com/neondatabase/neon/issues/10639 ## Summary of changes - Expose the more useful min() of GC cutoffs as `gc_cutoff_lsn` in the API, so that the most obviously named field is really the one people should use. - Retain the ability to read the LSN at which GC was actually done, in an `applied_gc_cutoff_lsn` field. - Internally rename `latest_gc_cutoff_lsn` to `applied_gc_cutoff_lsn` ("latest" was a confusing name, as the value in GcInfo is more up to date in terms of what a user experiences) - Temporarily preserve the old `latest_gc_cutoff_lsn` field for compat with control plane until we update it to use the new field. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2025-02-13 10:33:47 +00:00
Ivan Efremov	356cca23a5	fix(proxy): Change HSet to HDel for cancellation key metric (#10789 )	2025-02-13 10:22:13 +00:00
JC Grünhage	7b966a2b71	CI(trigger-e2e-tests): fix checking for successful image pushes (#10803 ) ## Problem https://github.com/neondatabase/neon/pull/10613 changed how images are pushed, and therefore also how we have to wait for images to be pushed in `trigger-e2e-tests`. The `trigger-e2e-tests` workflow is triggered in three different ways: - When a pull request is pushed to that is already ready to review, here we call the workflow from `build_and_test` - When a pull request is marked ready for review, then the workflow is triggered directly - When a push to `main` or `release(-.*)?` triggers `build_and_test` and that indirectly calls `trigger-e2e-tests`. The second of these paths had a bug, which was not tested in the PR, because this path being different wasn't clear to me. ## Summary of changes Fix the jq statement that caused the bug.	2025-02-13 10:13:26 +00:00
JC Grünhage	e38694742c	fix(ci): don't try pushing to prod container registries from main (#10795 ) ## Problem https://github.com/neondatabase/neon/pull/10613 changed how images are pushed, and there was a small mismatch between the github workflow and the script generating what to push where. This resulted in the workflow trying to push images to prod registries from the main branch, even though we don't do that and therefore didn't generate a mapping for those registries in the script that decides what to push where. This misconception happened because promote-images-dev pushed to dev registries, and promote-images-prod pushed to prod registries, but promote-images-prod also updated the latest tag in the dev registries if and only if we are on the main branch. This last bit is why the push-<component>-image-prod jobs were trying to run on the main branch. ## Summary of changes Don't try pushing to prod registries from the main branch.	2025-02-12 20:26:05 +00:00
Arpad Müller	922f3ee17d	Compress git history of Azure SDK (#10790 ) Switch the Azure SDK git fork to one with a compressed git history. This helps with download speed of the git repository. closes #10732	2025-02-12 19:48:11 +00:00
Arpad Müller	61d2474632	Also check by the planned gc cutoff for lease creation (#10764 ) We don't want to allow new leases below the planned gc cutoff either. Other APIs like branch creation or getpage requests already enforce this.	2025-02-12 19:29:17 +00:00
JC Grünhage	b77dd66bc4	refactor(ci): overhaul container image pushing (#10613 ) ## Problem Retagging container images and pushing container images taken from one registry to another is very tangled up with artifact building and not separated by component. This makes not building compute for storage releases and vice versa pretty tricky. To enable that, I want to clean up retagging and pushing of container images and then continue on making the pipelines for releases leaner by not building unnecessary things. ## Summary of changes - Add a reusable workflow that can push to ACR, ECR and Docker Hub, while being very flexible in terms of source and target images. This allows for retagging and pushing images between container registries. - Stop pushing images to registries aside of docker hub in the jobs that build the images - Split image pushing into 4 different jobs (not mentioning special cases): - neon-dev - neon-prod - compute-dev - compute-prod ## TODO - Consider also using this for `pin-build-tools-image`, as it's basically another instance of the same thing. ## Known limitations - The ECR part of this workflow supports authenticating to multiple AWS accounts and therefore multiple ECR endpoints, but the ACR part only supports one Azure Account. If someone with more knowledge on Azure can tell me whether an equivalent to https://github.com/aws-actions/amazon-ecr-login?tab=readme-ov-file#login-to-ecr-on-multiple-aws-accounts is easily possible, that'd be great. - The `image_map` input is a bit complex. 
It expects something along the lines of ``` { "docker.io/neondatabase/compute-node-v14:13196061314": [ "docker.io/neondatabase/compute-node-v14:13196061314", "369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-node-v14:13196061314", "neoneastus2.azurecr.io/neondatabase/compute-node-v14:13196061314" ], "docker.io/neondatabase/compute-node-v15:13196061314": [ "docker.io/neondatabase/compute-node-v15:13196061314", "369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-node-v15:13196061314", "neoneastus2.azurecr.io/neondatabase/compute-node-v15:13196061314" ] } ``` to map from source to target image. We have a small python step to generate this map for the 4 main image pushing jobs. The concrete example is taken from https://github.com/neondatabase/neon/actions/runs/13196061314/job/36838584098?pr=10613#step:3:6 and shortened to two images.	2025-02-12 17:54:51 +00:00
Alexey Kondratov	49775d28e4	fix(compute): Respect skip_pg_catalog_updates in reconfigure() (#10696 ) ## Problem We respect `skip_pg_catalog_updates` at the initial start, but ignore it at the follow-up `/configure`. Yet, it's used for storage->cplane->compute notify requests after migrations, shard split, etc. So every time we get them, applying the new config takes much longer than it should because we go through Postgres catalog checks. Cplane sets this flag when it serves a notify-attach call `9068c7d743` Related to `inc-403`, for example ## Summary of changes Look at `skip_pg_catalog_updates` in `compute.reconfigure()`	2025-02-12 17:54:21 +00:00
Alexander Bayandin	f45f9209b9	CI(trigger-e2e-tests): check permissions before running jobs (#10785 ) ## Problem PRs created by external contributors, in some cases might list failed jobs - `Trigger E2E Tests / cancel-previous-e2e-tests` - `Trigger E2E Tests / tag` They don't block the merge, and tests in fact pass (their counterparts in internal PR), but because jobs are triggered from an external PR (and not from the corresponding internal one) they still present as red marks. For example https://github.com/neondatabase/neon/pull/10778 ## Summary of changes - Check permissions before triggering e2e tests	2025-02-12 17:00:23 +00:00
Cheng Chen	20fe4b8ec3	chore(compute): pg_mooncake v0.1.2 (#10778 ) ## Problem Upgrade pg_mooncake to v0.1.2 ## Summary of changes https://github.com/Mooncake-Labs/pg_mooncake/blob/main/CHANGELOG.md#012-2025-02-11	2025-02-12 16:29:19 +00:00
Erik Grinaker	f62047ae97	pageserver: add separate semaphore for L0 compaction (#10780 ) ## Problem L0 compaction frequently gets starved out by other background tasks and image/GC compaction. L0 compaction must be responsive to keep read amplification under control. Touches #10694. Resolves #10689. ## Summary of changes Use a separate semaphore for the L0-only compaction pass. * Add a `CONCURRENT_L0_COMPACTION_TASKS` semaphore and `BackgroundLoopKind::L0Compaction`. * Add a setting `compaction_l0_semaphore` (default off via `compaction_l0_first`). * Use the L0 semaphore when doing an `OnlyL0Compaction` pass. * Use the background semaphore when doing a regular compaction pass (which includes an initial L0 pass). * While waiting for the background semaphore, yield for L0 compaction if triggered. * Add `CompactFlags::NoYield` to disable L0 yielding, and set it for the HTTP API route. * Remove the old `use_compaction_semaphore` setting and compaction-scoped semaphore. * Remove the warning when waiting for a semaphore; it's noisy and we have metrics.	2025-02-12 16:12:21 +00:00
Fedor Dikarev	ec354884ea	Feat/pin docker images to sha (#10730 ) ## Problem With the current approach to base images in `Dockerfiles`, it's hard to track when an image is updated, and because these are base images, an update invalidates all the layers built on top of them. This is further complicated by the fact that we have a number of runners, which may hold different images under the tag `bookworm-slim`, so caches get invalidated when an image built on one runner is used on another. To fix that problem, we can pin our base images to a specific sha, which not only aligns images across runners but also gives us reproducible builds that don't depend on spontaneous changes upstream. Fix: https://github.com/neondatabase/cloud/issues/24084 ## Summary of changes Besides the main goal, this PR also includes some small changes around Dockerfiles: 1. Main change: use `SHA` for `bookworm-slim` and `bullseye-slim` debian images 2. For the layers requiring `curl`, add `curl` and `unzip` to the `build-deps` image and use it as the base image for all the steps, removing the extra dependency on `alpine/curl` 3. Add `retry-on-host-error=on` to `wgetrc`, because I ran into a failure to resolve a hostname	2025-02-12 14:03:10 +00:00
John Spray	9989d8bfae	tests: make Workload more determinstic (#10741 ) ## Problem Previously, Workload was reconfiguring the compute before each run of writes, which was meant to be a no-op when nothing changed, but was actually writing extra data due to an issue being fixed in https://github.com/neondatabase/neon/pull/10696. The row counts in tests were too low in some cases, these tests were only working because of those extra writes that shouldn't have been happening, and moreover were relying on checkpoints happening. ## Summary of changes - Only reconfigure compute if the attached pageserver actually changed. If pageserver is set to None, that means controller is managing everything, so never reconfigure compute. - Update tests that wrote too few rows. --------- Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>	2025-02-12 12:35:29 +00:00
Heikki Linnakangas	9537829ccd	fast_import: Make CPU & memory size configurable (#10709 ) The old values assumed that you have at least about 18 GB of RAM available (shared_buffers=10GB and maintenance_work_mem=8GB). That's a lot when testing locally. Make it configurable, and make the default assumption much smaller: 256 MB. This is nice for local testing, but it's also in preparation for starting to use VMs to run these jobs. When launched in a VM, the control plane can set these env variables according to the max size of the VM. Also change the formula for how RAM is distributed: use 10% of RAM for shared_buffers, and 70% for maintenance_work_mem. That leaves a good amount for misc. other stuff and the OS. A very large shared_buffers setting won't typically help with bulk loading. It won't help with the network and I/O of processing all the tables, unless maybe if the whole database fits in shared buffers, but even then it's not much faster than using local disk. Bulk loading is all sequential I/O. It also won't help much with index creation, which is also sequential I/O. A large maintenance_work_mem can be quite useful, however, so that's where we put most of the RAM.	2025-02-12 11:43:23 +00:00
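The RAM split described in #10709 (10% for shared_buffers, 70% for maintenance_work_mem) amounts to the arithmetic below. This is only a sketch: `MemorySettings` and `split_memory` are hypothetical names, and truncating integer division in megabytes is an assumption, not necessarily how fast_import rounds.

```rust
/// Hypothetical container for the two derived Postgres settings, in megabytes.
#[derive(Debug, PartialEq, Eq)]
pub struct MemorySettings {
    pub shared_buffers_mb: u64,
    pub maintenance_work_mem_mb: u64,
}

/// Split the available RAM: 10% to shared_buffers, 70% to maintenance_work_mem.
/// The remaining 20% is left for the OS and everything else.
pub fn split_memory(total_mb: u64) -> MemorySettings {
    MemorySettings {
        // Integer division truncates; the rounding choice is an assumption.
        shared_buffers_mb: total_mb / 10,
        maintenance_work_mem_mb: total_mb * 7 / 10,
    }
}
```

Under these assumptions, the 256 MB default mentioned in the commit message yields 25 MB of shared_buffers and 179 MB of maintenance_work_mem.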
Mikhail Kot	2c4c6e6330	fix(neon): Add tests clarifying postgres sigabrt on pageserver unavailability (#10666 ) Resolves: https://github.com/neondatabase/neon/issues/5734 When we query pageserver and it's unavailable after some retries, postgres sigabrt's. This is intended behavior so I've added tests checking it	2025-02-12 10:52:26 +00:00
Erik Grinaker	71c30e52fa	pageserver: properly yield for L0 compaction (#10769 ) ## Problem When image compaction yields for L0 compaction, it may not immediately schedule L0 compaction, because it just goes on to compact the next pending timeline. Touches #10694. Requires #10744. ## Summary of changes Extend `CompactionOutcome` with `YieldForL0` and `Skipped` variants, and immediately schedule an L0 compaction pass in the `YieldForL0` case.	2025-02-11 23:43:58 +00:00
Erik Grinaker	6c83ac3fd2	pageserver: do all L0 compaction before image compaction (#10744 ) ## Problem Image compaction can starve out L0 compaction if a tenant has several timelines with L0 debt. Touches #10694. Requires #10740. ## Summary of changes * Add an initial L0 compaction pass, in order of L0 count. * Add a tenant option `compaction_l0_first` to control the L0 pass (disabled by default). * Add `CompactFlags::OnlyL0Compaction` to run an L0-only compaction pass. * Clean up the compaction iteration logic. A later PR will use separate semaphores for the L0 and image compaction passes to avoid cross-tenant L0 starvation. That PR will also make image compaction yield if _any_ of the tenant's timelines have pending L0 compaction to further avoid starvation.	2025-02-11 22:08:46 +00:00
Heikki Linnakangas	635b67508b	Split utils::http to separate crate (#10753 ) Avoids compiling the crate and its dependencies into binaries that don't need them. Shrinks the compute_ctl binary from about 31MB to 28MB in the release-line-debug-size-lto profile.	2025-02-11 22:06:53 +00:00
dependabot[bot]	9491154eae	build(deps): bump cryptography from 43.0.1 to 44.0.1 in the pip group (#10773 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-02-11 21:23:17 +00:00
Christian Schwarz	efea8223bb	Merge pull request #10774 from neondatabase/releases/2025-02-11-smgr-op-latency-metrics-hotfix	2025-02-11 21:16:44 +01:00
Heikki Linnakangas	b5e09fdaf3	Re-order Dockerfile steps for putting together final compute image (#10736 ) Run "apt install" first, and only then COPY the files from the intermediary build layers to the final image. This way, if you modify any of the sources that trigger e.g. rebuilding compute_ctl, the "apt install" step can still be cached.	2025-02-11 20:10:06 +00:00
John Spray	cd51ed2f86	tests: parametrize test_graceful_cluster_restart on AZ count (#10427 ) ## Problem In https://github.com/neondatabase/neon/pull/10411 fill logic changes such that it benefits us to test it with & without AZs set up. I didn't extend the test inline in that PR because there were overlapping test changes in flight to add `num_az` parameter. ## Summary of changes - Parameterise test on AZ count (1 or 2) - When AZ count is 2, use a different balance check that just asserts the _tenants_ are balanced (since AZ affinity is chosen on a per-tenant basis)	2025-02-11 20:09:41 +00:00
Folke Behrens	f62bc28086	proxy: Move binaries into the lib (#10758 ) * This way all clippy lints defined in the lib also cover the binary code. * It's much easier to detect unused code. * Fix all discovered lints.	2025-02-11 19:46:23 +00:00
Christian Schwarz	d3d3bfc6d0	fix(page_service / batching): smgr op latency metric of dropped responses include flush time (#10756 ) # Problem Say we have a batch of 10 responses to send out. Then, even with - #10728 we've still only called observe_execution_end_flush_start for the first 3 responses. The remaining 7 response timers are still ticking. When compute now closes the connection, the waiting flush fails with an error and we `drop()` the remaining 7 responses' smgr op timers. The `impl Drop for SmgrOpTimer` will observe an execution time that includes the flush time. In practice, this is suspected to produce the `+Inf` observations in the smgr op latency histogram we've seen since the introduction of pipelining, even after shipping #10728. refs: - fixup of https://github.com/neondatabase/neon/pull/10042 - fixup of https://github.com/neondatabase/neon/pull/10728 - fixes https://github.com/neondatabase/neon/issues/10754	2025-02-11 20:25:13 +01:00
Christian Schwarz	b3a911ff8c	fix(page_service / batching): smgr op latency metrics includes the flush time of preceding requests (#10728 ) Before this PR, if a batch contains N responses, the smgr op latency reported for response (N-i) would include the time we spent flushing the preceding requests. refs: - fixup of https://github.com/neondatabase/neon/pull/10042 - fixes https://github.com/neondatabase/neon/issues/10674	2025-02-11 20:25:08 +01:00
Tristan Partin	da9c101939	Implement a second HTTP server within compute_ctl (#10574 ) The compute_ctl HTTP server has the following purposes: - Allow management via the control plane - Provide an endpoint for scraping metrics - Provide APIs for compute internal clients - Neon Postgres extension for installing remote extensions - local_proxy for installing extensions and adding grants The first two purposes require the HTTP server to be available outside the compute. The Neon threat model is a bad actor within our internal network. We need to reduce the surface area of attack. By exposing unnecessary unauthenticated HTTP endpoints to the internal network, we increase the surface area of attack. For endpoints described in the third bullet point, we can just run an extra HTTP server, which is only bound to the loopback interface since all consumers of those endpoints are within the compute.	2025-02-11 18:02:22 +00:00
Arpad Müller	f7b2293317	Hardlink resident layers during detach ancestor (#10729 ) After a detach ancestor operation, we don't want to on-demand download layers that are already resident. This has shown to impede performance, sometimes quite a lot (50 seconds: https://github.com/neondatabase/neon/issues/8828#issuecomment-2643735644) Fixes #8828.	2025-02-11 16:58:34 +00:00
Arpad Müller	be447ba4f8	Change timeline_offloading setting default to true (#10760 ) This changes the default value of the `timeline_offloading` pageserver and tenant configs to true, now that offloading has been rolled out without problems. There is also a small fix in the tenant config merge function, where we applied the `lazy_slru_download` value instead of `timeline_offloading`. Related issue: https://github.com/neondatabase/cloud/issues/21353	2025-02-11 16:36:54 +00:00
Christian Schwarz	9247331c67	fix(page_service / batching): smgr op latency metric of dropped responses include flush time (#10756 ) # Problem Say we have a batch of 10 responses to send out. Then, even with - #10728 we've still only called observe_execution_end_flush_start for the first 3 responses. The remaining 7 response timers are still ticking. When compute now closes the connection, the waiting flush fails with an error and we `drop()` the remaining 7 responses' smgr op timers. The `impl Drop for SmgrOpTimer` will observe an execution time that includes the flush time. In practice, this is suspected to produce the `+Inf` observations in the smgr op latency histogram we've seen since the introduction of pipelining, even after shipping #10728. refs: - fixup of https://github.com/neondatabase/neon/pull/10042 - fixup of https://github.com/neondatabase/neon/pull/10728 - fixes https://github.com/neondatabase/neon/issues/10754	2025-02-11 14:05:59 +00:00
John Spray	fcedd10226	tests: temporarily permit a log error (#10752 ) ## Problem These tests can encounter a bug in the pageserver read path (#9185) which occurs under the very specific circumstances that the tests create, but is very unlikely to happen in the field. We will fix the bug, but in the meantime let's un-flake the tests. Related: https://github.com/neondatabase/neon/issues/10720 ## Summary of changes - Permit "could not find data for key" errors in tests affected by #9185	2025-02-11 12:37:09 +00:00
Arpad Müller	a4ea1e53ae	Apply Azure SDK patch to periodically load workload identity file (#10415 ) The SDK bug https://github.com/Azure/azure-sdk-for-rust/issues/1739 was originally worked around via #10378, but now upstream has provided a fix in [this](https://github.com/Azure/azure-sdk-for-rust/pull/1997) PR, which we've been asked to test. So this is what this PR is doing: revert #10378 (to make sure we fail if the bug isn't fixed by the SDK PR), and apply the SDK PR to our fork. Currently pointing to my local branch to check CI. I'd like to merge the [SDK fork PR](https://github.com/neondatabase/azure-sdk-for-rust/pull/2) before merging this to main.	2025-02-11 09:40:22 +00:00
Heikki Linnakangas	c26131c2b3	Link pgbouncer dynamically (#10749 ) I don't see the point of static linking, postgres itself and many of the extensions are already built dynamically. One reason for the change is that I'm working on bigger changes to start using systemd in the compute, and as part of that I wanted to add the --with-systemd configure option to pgbouncer, and there doesn't seem to be a static version of libsystemd (at least not on Debian).	2025-02-11 07:48:54 +00:00
Andrew Rudenko	4ab18444ec	compute_ctl: database_schema should keep process::Child as part of returned value (#10273 ) ## Problem /database_schema endpoint returns incomplete output from `pg_dump` ## Summary of changes The Tokio process was not used properly. The returned stream does not include `process::Child`, and the process is scheduled to be killed immediately after the `get_database_schema` call when `cmd` goes out of scope. The solution in this PR is to return a special Stream implementation that retains `process::Child`.	2025-02-11 07:02:13 +00:00
Heikki Linnakangas	98883e4b30	compute_ctl: Use a single tokio runtime (#10743 ) compute_ctl is mostly written in synchronous fashion, intended to run in a single thread. However various parts had become async, and they launched their own tokio runtimes to run the async code. For example, VM monitor ran in its own multi-threaded runtime, and apply_spec_sql() launched another multi-threaded runtime to run the per-database SQL commands in parallel. In addition to that, a few places used a current-thread runtime to run async code in the main thread, or launched a current-thread runtime in a different thread to run background tasks. Unify the runtimes so that there is only one tokio runtime. It's created very early at process startup, and the main thread "enters" the runtime, so that it's always available for tokio::spawn() and runtime.block_on() calls. All code that needs to run async code uses the same runtime. The main thread still mostly runs in a synchronous fashion. When it needs to run async code, it uses rt.block_on(). Spawn fewer additional threads, prefer to spawn tokio tasks instead. Convert some code that ran synchronously in background threads into async. I didn't go all the way, though, some background threads are still spawned.	2025-02-11 00:39:44 +00:00
Tristan Partin	3d143ad799	Unbrick the forward compatibility test failures (#10747 ) Since the merge of https://github.com/neondatabase/neon/pull/10523, forward compatibility tests have been broken everywhere. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-02-10 22:22:10 +00:00
Alex Chi Z.	b0c7ee0175	feat(pageserver): better gc_compaction_split heuristics (#10727 ) ## Problem close https://github.com/neondatabase/neon/issues/10213 `range_search` only returns the top-most layers that may satisfy the search, so it doesn't include all layers that might be accessed (the user needs to recursively call this function). We need to retrieve the full layer map and find overlaps in order to have correct heuristics for the job split. ## Summary of changes Retrieve all layers and find overlaps instead of doing `range_search`. The patch also reduces the time holding the layer map read guard. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-10 19:33:34 +00:00
Erik Grinaker	8c4e94107d	pageserver: notify compaction loop at threshold (#10740 ) ## Problem The compaction loop currently runs periodically, which can cause it to wait for up to 20 seconds before starting L0 compaction by default. Also, when we later separate the semaphores for L0 compaction and image compaction, we want to give up waiting for the image compaction semaphore if L0 compaction is needed on any timeline. Touches #10694. ## Summary of changes Notify the compaction loop when an L0 flush (on any timeline) exceeds `compaction_threshold`. Also do some opportunistic cleanups in the area.	2025-02-10 17:48:09 +00:00
Heikki Linnakangas	c368b0fe14	Use a cache mount to speed up rebuilding compute node image (#10737 ) Building the compute rust binaries from scratch is pretty slow: it takes between 4-15 minutes on my laptop, depending on which compiler flags and other tricks I use. A cache mount allows caching the dependencies and incremental builds, which speeds up rebuilding significantly when you only make a small change in a source file.	2025-02-10 16:58:29 +00:00
Heikki Linnakangas	aba61a3712	Download awscli in separate layer in Dockerfile, to allow caching (#10733 ) The awscli was downloaded at the last stages of the overall compute image build, which meant that if you modified any part of the build, it would trigger a re-download of the awscli. That's a bit annoying when developing locally and rebuilding the compute image repeatedly. Move it to a separate layer, to cache separately and to avoid the spurious rebuilds.	2025-02-10 16:48:28 +00:00
Tristan Partin	946da3f7e2	Require --compute-id when running compute_ctl (#10523 ) The compute_id will be used when verifying claims sent by the control plane. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-02-10 16:46:20 +00:00
Ivan Efremov	73633e27ed	fix(proxy): Log errors from the local proxy in auth-broker (#10659 ) Handle errors from local proxy by parsing HTTP response in auth broker code Closes [#19476](https://github.com/neondatabase/cloud/issues/19476)	2025-02-10 16:06:13 +00:00
Konstantin Knizhnik	0cf0119751	Add --save_records option to pg_waldump (#10626 ) ## Problem Make it possible to dump WAL records in format recognised by walredo process. Intended usage: ``` pg_waldump -R 1663/5/16396 -B 771727 000000010000000100000034 --save-records=/tmp/walredo.records postgres --wal-redo < /tmp/walredo.records > /tmp/page.img ``` ## Summary of changes Related Postgres PRs: https://github.com/neondatabase/postgres/pull/575 https://github.com/neondatabase/postgres/pull/572 --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-10 15:48:03 +00:00
Alex Chi Z.	b37f52fdf1	feat(pageserver): dump read path on missing key error (#10528 ) ## Problem helps investigate https://github.com/neondatabase/neon/issues/10482 ## Summary of changes In debug mode and testing mode, we will record all files visited by a read operation, and print it out when it errors. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-10 14:25:56 +00:00
Alex Chi Z.	443c8d0b4b	feat(pageserver): repartition on L0-L1 boundary (#10548 ) ## Problem Reduce the read amplification when doing `repartition`. ## Summary of changes Compute the L0-L1 boundary LSN and do repartition here. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-10 14:25:48 +00:00
Alexander Bayandin	2f36bdb218	CI(build-neon): fix duplicated builds (#10731 ) ## Problem Parameterising the `build-neon` job with `test-cfg` makes it build exactly the same thing several times. See - `874accd6ed/.github/workflows/_build-and-test-locally.yml (L51-L52)` - https://github.com/neondatabase/neon/actions/runs/13215068271/job/36893373038 ## Summary of changes - Extract `sanitizers` to a separate input from `test-cfg` and set it separately - Don't parametrise `build-neon` with `test-cfg`	2025-02-10 12:29:39 +00:00
Ivan Efremov	e7118213ab	impr(proxy): Set TTL for Redis cancellation map keys (#10671 ) Use expire() op to set TTL for Redis cancellation key	2025-02-10 10:51:53 +00:00
a-masterov	d204d51faf	Fix the upgrade test for pg_jwt by adding the database name (#10738 ) ## Problem The upgrade test for pg_jwt does not work correctly. ## Summary of changes The script for the upgrade test is modified to use the database `contrib_regression`.	2025-02-10 09:56:46 +00:00
Erik Grinaker	ac55e2dbe5	pageserver: improve tenant housekeeping task (#10725 ) # Problem walredo shutdown is done in the compaction task. Let's move it to tenant housekeeping. # Summary of changes * Rename "ingest housekeeping" to "tenant housekeeping". * Move walredo shutdown into tenant housekeeping. * Add a constant `WALREDO_IDLE_TIMEOUT` set to 3 minutes (previously 10x compaction threshold).	2025-02-08 12:42:55 +00:00
Erik Grinaker	874accd6ed	pageserver: misc task cleanups (#10723 ) This patch does a bunch of superficial cleanups of `tenant::tasks` to avoid noise in subsequent PRs. There are no functional changes. PS: enable "hide whitespace" when reviewing, due to the unindentation of large async blocks.	2025-02-08 11:02:13 +00:00
Christian Schwarz	6cd3b501ec	fix(page_service / batching): smgr op latency metrics includes the flush time of preceding requests (#10728 ) Before this PR, if a batch contains N responses, the smgr op latency reported for response (N-i) would include the time we spent flushing the preceding requests. refs: - fixup of https://github.com/neondatabase/neon/pull/10042 - fixes https://github.com/neondatabase/neon/issues/10674	2025-02-08 09:28:09 +00:00
Christian Schwarz	bf20d78292	fix(page_service): page reconstruct error log does not include `shard_id` label (#10680 ) # Problem Before this PR, the `shard_id` field was missing when page_service logs a reconstruct error. This was caused by batching-related refactorings. Example from staging: ``` 2025-01-30T07:10:04.346022Z ERROR page_service_conn_main{peer_addr=...}:process_query{tenant_id=... timeline_id=...}:handle_pagerequests:request:handle_get_page_at_lsn_request_batched{req_lsn=FFFFFFFF/FFFFFFFF}: error reading relation or page version: Read error: whole vectored get request failed because one or more of the requested keys were missing: could not find data for key ... ``` # Changes Delay creation of the handler-specific span until after shard routing This also avoids the need for the record() call in the pagestream hot path. # Testing Manual testing with a failpoint that is part of this PR's history but will be squashed away. # Refs - fixes https://github.com/neondatabase/neon/issues/10599	2025-02-07 19:45:39 +00:00
John Spray	a54853abd5	Merge pull request #10712 from neondatabase/rc/release/2025-02-07 Storage release 2025-02-07	2025-02-07 18:21:13 +00:00
Arpad Müller	69007f7ac8	Revert recent AWS SDK update (#10724 ) We've been seeing some regressions in staging since the AWS SDK updates: https://github.com/neondatabase/neon/issues/10695 . We aren't sure the regression was caused by the SDK update, but the issues do involve S3, so it's not unlikely. By reverting the SDK update we find out whether it was really the SDK update, or something else. Reverts the two PRs: * https://github.com/neondatabase/neon/pull/10588 * https://github.com/neondatabase/neon/pull/10699 https://neondb.slack.com/archives/C08C2G15M6U/p1738576986047179	2025-02-07 18:18:45 +00:00
Arpad Müller	2656c713a4	Revert recent AWS SDK update (#10724 ) We've been seeing some regressions in staging since the AWS SDK updates: https://github.com/neondatabase/neon/issues/10695 . We aren't sure the regression was caused by the SDK update, but the issues do involve S3, so it's not unlikely. By reverting the SDK update we find out whether it was really the SDK update, or something else. Reverts the two PRs: * https://github.com/neondatabase/neon/pull/10588 * https://github.com/neondatabase/neon/pull/10699 https://neondb.slack.com/archives/C08C2G15M6U/p1738576986047179	2025-02-07 17:37:53 +00:00
John Spray	5e95860e70	tests: wait for manifest persistence in test_timeline_archival_chaos (#10719 ) ## Problem This test would sometimes fail its assertion that a timeline does not revert to active once archived. That's because it was using the in-memory offload state, not the persistent state, so this was sometimes lost across a pageserver restart. Closes: https://github.com/neondatabase/neon/issues/10389 ## Summary of changes - When reading offload status, read from pageserver API _and_ remote storage before considering the timeline offloaded	2025-02-07 16:27:39 +00:00
Heikki Linnakangas	0abff59e97	compute: Allow postgres user to power off the VM (#10710 ) I plan to use this when launching a fast_import job in a VM. There's currently no good way for an executable running in a NeonVM to exit gracefully and have the VM shut down. The inittab we use always respawns the payload command. The idea is that the control plane can use "fast_import ... && poweroff" as the command, so that when fast_import completes successfully, the VM is terminated, and the k8s Pod and VirtualMachine object are marked as completed successfully. I'm working on bigger changes to how we launch VMs, and will try to come up with a nicer system for that, but in the meanwhile, this quick hack allows us to proceed with using VMs for one-off jobs like fast_import.	2025-02-07 16:03:01 +00:00
John Spray	9609f7547e	tests: address warnings in timeline shutdown (#10702 ) ## Problem There are a couple of log warnings tripping up `test_timeline_archival_chaos` - `[stopping left-over name="timeline_delete" tenant_shard_id=2d526292b67dac0e6425266d7079c253 timeline_id=Some(44ba36bfdee5023672c93778985facd9) kind=TimelineDeletionWorker\n')](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10672/13161357302/index.html#/testresult/716b997bb1d8a021)` - `ignoring attempt to restart exited flush_loop 503d8f401d8887cfaae873040a6cc193/d5eed0673ba37d8992f7ec411363a7e3\n')` Related: https://github.com/neondatabase/neon/issues/10389 ## Summary of changes - Downgrade the 'ignoring attempt to restart' to info -- there's nothing in the design that forbids this happening, i.e. someone calling maybe_spawn_flush_loop concurrently with shutdown() - Prevent timeline deletion tasks outliving tenants by carrying a gateguard. This logically makes sense because the deletion process does call into Tenant to update manifests.	2025-02-07 15:29:34 +00:00
Erik Grinaker	d6e87a3a9c	pageserver: add separate, disabled compaction semaphore (#10716 ) ## Problem L0 compaction can get starved by other background tasks. It needs to be responsive to avoid read amp blowing up during heavy write workloads. Touches #10694. ## Summary of changes Add a separate semaphore for compaction, configurable via `use_compaction_semaphore` (disabled by default). This is primarily for testing in staging; it needs further work (in particular to split image/L0 compaction jobs) before it can be enabled.	2025-02-07 15:11:31 +00:00
Arpad Müller	f5243992fa	safekeeper: make timeline deletions a bit more verbose (#10721 ) Make timeline deletion print the sub-steps, so that we can narrow down some stuck timeline deletion issues we are observing. https://neondb.slack.com/archives/C08C2G15M6U/p1738930694716009	2025-02-07 15:06:26 +00:00
John Spray	95220ba43e	tests: fix flaky endpoint in test_ingest_logical_message (#10700 ) ## Problem Endpoint kept running while timeline was deleted, causing forbidden warnings on the pageserver when the tenant is not found. ## Summary of changes - Explicitly stop the endpoint before the end of the test, so that it isn't trying to talk to the pageserver in the background while things are torn down	2025-02-07 14:51:36 +00:00
John Spray	08f92bb916	pageserver: clean up DeletionQueue push_layers_sync (#10701 ) ## Problem This is tech debt. While we introduced generations for tenants, some legacy situations without generations needed to delete things inline (async operation) instead of enqueuing them (sync operation). ## Summary of changes - Remove the async code, replace calls with the sync variant, and assert that the generation is always set	2025-02-07 13:03:01 +00:00
Fedor Dikarev	8f651f9582	switch from localtest.me to local.neon.build (#10714 ) ## Problem Ref: https://github.com/neondatabase/neon/issues/10632 We use DNS names under `.localtest.me` in our tests. That domain is well known and widely used for this purpose: all of its records resolve to localhost, both IPv4 and IPv6 (`127.0.0.1` and `::1`). In some cases on our runners these names resolve only to the IPv6 address, so components fail to connect when the runner doesn't have an IPv6 address. We suspect an issue in systemd-resolved (https://github.com/systemd/systemd/issues/17745). To work around that and improve test stability, we introduced our own domain, `.local.neon.build`, which resolves only to the IPv4 address `127.0.0.1`. See the full details and troubleshooting log in the referenced issue. P.S. If you're a FritzBox user, don't forget to add the domain `local.neon.build` to the `DNS Rebind Protection` section under `Home Network -> Network -> Network Settings`; otherwise FritzBox will block names that resolve to local addresses. For other devices/vendors, please check the corresponding documentation if resolving `local.neon.build` produces an empty answer for you. ## Summary of changes Replace all the occurrences of `localtest.me` with `local.neon.build`	2025-02-07 12:25:16 +00:00
Arseny Sher	b5a239c4ae	Add reconciliation details to sk membership change rfc (#10514 ) ## Problem The RFC pointed out the need for reconciliation, but didn't detail how it can be done. ## Summary of changes Add these details.	2025-02-07 11:20:49 +00:00
Alexander Lakhin	de05258419	Adjust diesel schema check for build with sanitizers (#10711 ) We need to disable the detection of memory leaks when running `neon_local init` for builds with sanitizers to avoid an error thrown by AddressSanitizer.	2025-02-07 08:56:39 +00:00
github-actions[bot]	d255fa4b7e	Storage release 2025-02-07	2025-02-07 06:02:18 +00:00
Peter Bendel	e73d681a0e	Patch pgcopydb and fix another segfault (#10706 ) ## Problem Found another pgcopydb segfault in error handling ```bash 2025-02-06 15:30:40.112 51299 ERROR pgsql.c:2330 [TARGET -738302813] FATAL: terminating connection due to administrator command 2025-02-06 15:30:40.112 51298 ERROR pgsql.c:2330 [TARGET -1407749748] FATAL: terminating connection due to administrator command 2025-02-06 15:30:40.112 51297 ERROR pgsql.c:2330 [TARGET -2073308066] FATAL: terminating connection due to administrator command 2025-02-06 15:30:40.112 51300 ERROR pgsql.c:2330 [TARGET 1220908650] FATAL: terminating connection due to administrator command 2025-02-06 15:30:40.432 51300 ERROR pgsql.c:2536 [Postgres] FATAL: terminating connection due to administrator command 2025-02-06 15:30:40.513 51290 ERROR copydb.c:773 Sub-process 51300 exited with code 0 and signal Segmentation fault 2025-02-06 15:30:40.578 51299 ERROR pgsql.c:2536 [Postgres] FATAL: terminating connection due to administrator command 2025-02-06 15:30:40.613 51290 ERROR copydb.c:773 Sub-process 51299 exited with code 0 and signal Segmentation fault 2025-02-06 15:30:41.253 51298 ERROR pgsql.c:2536 [Postgres] FATAL: terminating connection due to administrator command 2025-02-06 15:30:41.314 51290 ERROR copydb.c:773 Sub-process 51298 exited with code 0 and signal Segmentation fault 2025-02-06 15:30:43.133 51297 ERROR pgsql.c:2536 [Postgres] FATAL: terminating connection due to administrator command 2025-02-06 15:30:43.215 51290 ERROR copydb.c:773 Sub-process 51297 exited with code 0 and signal Segmentation fault 2025-02-06 15:30:43.215 51290 ERROR indexes.c:123 Some INDEX worker process(es) have exited with error, see above for details 2025-02-06 15:30:43.215 51290 ERROR indexes.c:59 Failed to create indexes, see above for details 2025-02-06 15:30:43.232 51271 ERROR copydb.c:768 Sub-process 51290 exited with code 12 ``` ```bash admin@ip-172-31-38-164:~/pgcopydb$ gdb /usr/local/pgsql/bin/pgcopydb core GNU gdb 
(Debian 13.1-3) 13.1 Copyright (C) 2023 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "aarch64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/local/pgsql/bin/pgcopydb... [New LWP 51297] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1". Core was generated by `pgcopydb: create index ocr.ocr_pipeline_step_results_version_pkey '. Program terminated with signal SIGSEGV, Segmentation fault. 
#0 0x0000aaaac3a4b030 in splitLines (lbuf=lbuf@entry=0xffffd8b86930, buffer=<optimized out>) at string_utils.c:630 630 newLinePtr = '\0'; (gdb) bt #0 0x0000aaaac3a4b030 in splitLines (lbuf=lbuf@entry=0xffffd8b86930, buffer=<optimized out>) at string_utils.c:630 #1 0x0000aaaac3a3a678 in pgsql_execute_log_error (pgsql=pgsql@entry=0xffffd8b87040, result=result@entry=0x0, sql=sql@entry=0xffff81fe9be0 "CREATE UNIQUE INDEX IF NOT EXISTS ocr_pipeline_step_results_version_pkey ON ocr.ocr_pipeline_step_results_version USING btree (id, transaction_id);", debugParameters=debugParameters@entry=0xaaaaec5f92f0, context=context@entry=0x0) at pgsql.c:2322 #2 0x0000aaaac3a3bbec in pgsql_execute_with_params (pgsql=pgsql@entry=0xffffd8b87040, sql=0xffff81fe9be0 "CREATE UNIQUE INDEX IF NOT EXISTS ocr_pipeline_step_results_version_pkey ON ocr.ocr_pipeline_step_results_version USING btree (id, transaction_id);", paramCount=paramCount@entry=0, paramTypes=paramTypes@entry=0x0, paramValues=paramValues@entry=0x0, context=context@entry=0x0, parseFun=parseFun@entry=0x0) at pgsql.c:1649 #3 0x0000aaaac3a3c468 in pgsql_execute (pgsql=pgsql@entry=0xffffd8b87040, sql=<optimized out>) at pgsql.c:1522 #4 0x0000aaaac3a245f4 in copydb_create_index (specs=specs@entry=0xffffd8b8ec98, dst=dst@entry=0xffffd8b87040, index=index@entry=0xffff81f71800, ifNotExists=<optimized out>) at indexes.c:846 #5 0x0000aaaac3a24ca8 in copydb_create_index_by_oid (specs=specs@entry=0xffffd8b8ec98, dst=dst@entry=0xffffd8b87040, indexOid=<optimized out>) at indexes.c:410 #6 0x0000aaaac3a25040 in copydb_index_worker (specs=specs@entry=0xffffd8b8ec98) at indexes.c:297 #7 0x0000aaaac3a25238 in copydb_start_index_workers (specs=specs@entry=0xffffd8b8ec98) at indexes.c:209 #8 0x0000aaaac3a252f4 in copydb_index_supervisor (specs=specs@entry=0xffffd8b8ec98) at indexes.c:112 #9 0x0000aaaac3a253f4 in copydb_start_index_supervisor (specs=0xffffd8b8ec98) at indexes.c:57 #10 copydb_start_index_supervisor 
(specs=specs@entry=0xffffd8b8ec98) at indexes.c:34 #11 0x0000aaaac3a51ff4 in copydb_process_table_data (specs=specs@entry=0xffffd8b8ec98) at table-data.c:146 #12 0x0000aaaac3a520dc in copydb_copy_all_table_data (specs=specs@entry=0xffffd8b8ec98) at table-data.c:69 #13 0x0000aaaac3a0ccd8 in cloneDB (copySpecs=copySpecs@entry=0xffffd8b8ec98) at cli_clone_follow.c:602 #14 0x0000aaaac3a0d2cc in start_clone_process (pid=0xffffd8b743d8, copySpecs=0xffffd8b8ec98) at cli_clone_follow.c:502 #15 start_clone_process (copySpecs=copySpecs@entry=0xffffd8b8ec98, pid=pid@entry=0xffffd8b89788) at cli_clone_follow.c:482 #16 0x0000aaaac3a0d52c in cli_clone (argc=<optimized out>, argv=<optimized out>) at cli_clone_follow.c:164 #17 0x0000aaaac3a53850 in commandline_run (command=command@entry=0xffffd8b9eb88, argc=0, argc@entry=22, argv=0xffffd8b9edf8, argv@entry=0xffffd8b9ed48) at /home/admin/pgcopydb/src/bin/pgcopydb/../lib/subcommands.c/commandline.c:71 #18 0x0000aaaac3a01464 in main (argc=22, argv=0xffffd8b9ed48) at main.c:140 (gdb) ``` The problem is most likely that the following call returned a message in a read-only memory segment, where we cannot replace \n with \0 in the splitLines() function in string_utils.c: ```C char *message = PQerrorMessage(pgsql->connection); ``` ## Summary of changes Modified the patch to also address this problem.	2025-02-06 20:21:18 +00:00
Anastasia Lubennikova	44b905d14b	Fix remote extension lookup (#10708 ) when library name doesn't match extension name. The bug was introduced by recent commit `ebc55e6a`	2025-02-06 19:21:38 +00:00
Arseny Sher	186199f406	Update aws sdk (#10699 ) ## Problem We have an unclear issue with a stuck S3 client, probably after a partial AWS SDK update that did not update sdk-s3: https://github.com/neondatabase/neon/pull/10588 Let's try updating S3 as well. ## Summary of changes Result of running cargo update -p aws-types -p aws-sigv4 -p aws-credential-types -p aws-smithy-types -p aws-smithy-async -p aws-sdk-kms -p aws-sdk-iam -p aws-sdk-s3 -p aws-config ref https://github.com/neondatabase/neon/issues/10695	2025-02-06 17:28:27 +00:00
OBBO67	82cbab7512	Switch reqlsns[0].request_lsn to arrow operator in neon_read_at_lsnv() (#10620 ) (#10687 ) ## Problem Currently the following line uses array subscript notation, which is confusing since `reqlsns` is not an array but just a pointer to a struct. ``` XLogWaitForReplayOf(reqlsns[0].request_lsn); ``` ## Summary of changes Switch from array subscript notation to arrow operator to improve readability of code. Close #10620.	2025-02-06 17:26:26 +00:00
Erik Grinaker	2943590694	pageserver: use histogram for background job semaphore waits (#10697 ) ## Problem We don't have visibility into how long an individual background job is waiting for a semaphore permit. ## Summary of changes * Make `pageserver_background_loop_semaphore_wait_seconds` a histogram rather than a sum. * Add a paced warning when a task takes more than 10 minutes to get a permit (for now). * Drive-by cleanup of some `EnumMap` usage.	2025-02-06 17:17:47 +00:00
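A small sketch of why a histogram gives visibility that a running sum does not. The bucket bounds and the `bucket_counts` helper are hypothetical and chosen only for illustration; they are not the pageserver's actual metric configuration.

```python
import bisect

# Hypothetical bucket upper bounds in seconds (assumption, not the real config).
BUCKETS = [0.1, 1.0, 10.0, 60.0, 600.0]


def bucket_counts(waits):
    """Count observations per bucket; the last slot is the overflow bucket."""
    counts = [0] * (len(BUCKETS) + 1)
    for wait in waits:
        # bisect_left finds the first bucket whose upper bound is >= wait.
        counts[bisect.bisect_left(BUCKETS, wait)] += 1
    return counts


# Two very different workloads with the same total wait time.
many_short = [0.5] * 1400   # 1400 tasks, each waiting half a second
one_stuck = [700.0]         # a single task waiting more than 10 minutes

same_sum = sum(many_short) == sum(one_stuck)
short_hist = bucket_counts(many_short)
stuck_hist = bucket_counts(one_stuck)
```

Both workloads produce the same sum, but only the histogram shows that one of them contains a task stuck past the largest bucket.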
John Spray	df06c41085	tests: don't detach from controller in test_issue_5878 (#10675 ) ## Problem This test called NeonPageserver.tenant_detach, which as well as detaching locally on the pageserver, also updates the storage controller to put the tenant into Detached mode. When the test runs slowly in debug mode, it sometimes takes long enough that the background_reconcile loop wakes up and drops the tenant from memory in response, such that the pageserver can't validate its deletions and the test does not behave as expected. Closes: https://github.com/neondatabase/neon/issues/10513 ## Summary of changes - Call the pageserver HTTP client directly rather than going via NeonPageserver.tenant_detach	2025-02-06 15:18:50 +00:00
Alexander Bayandin	ddd7c36343	CI(approved-for-ci-run): Use internal CI_ACCESS_TOKEN for cloning repo (#10693 ) ## Problem The default `GITHUB_TOKEN` is used to push changes created with `approved-for-ci-run`, which doesn't work: ``` Run git push --force origin "${BRANCH}" remote: Permission to neondatabase/neon.git denied to github-actions[bot]. fatal: unable to access 'https://github.com/neondatabase/neon/': The requested URL returned error: 403 ``` Ref: https://github.com/neondatabase/neon/actions/runs/13166108303/job/36746518291?pr=10687 ## Summary of changes - Use `CI_ACCESS_TOKEN` to clone an external repo - Remove unneeded `actions/checkout`	2025-02-06 14:40:22 +00:00
Peter Bendel	839f41f5bb	fix pgcopydb seg fault and -c idle_in_transaction_session_timeout=0 (#10692 ) ## Problem During ingest_benchmark, which uses `pgcopydb` ([see](https://github.com/dimitri/pgcopydb)), we sometimes had outages. - when the PostgreSQL COPY step failed we got a segfault (reported [here](https://github.com/dimitri/pgcopydb/issues/899)) - the root cause was that Neon's idle_in_transaction_session_timeout is set to 5 minutes, which is suboptimal for long-running tasks like project import (reported [here](https://github.com/dimitri/pgcopydb/issues/900)) ## Summary of changes Patch pgcopydb to avoid the segfault. Override idle_in_transaction_session_timeout and set it to "unlimited".	2025-02-06 14:39:45 +00:00
Alex Chi Z.	f22d41eaec	feat(pageserver): num of background job metrics (#10690 ) ## Problem We need metrics to know what's going on in pageserver's background jobs. ## Summary of changes * Waiting tasks: tasks still waiting for the semaphore. * Running tasks: tasks doing their actual jobs. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Erik Grinaker <erik@neon.tech>	2025-02-06 14:39:37 +00:00
Alexander Lakhin	977781e423	Enable sanitizers for postgres v17 (#10401 ) Add a build with sanitizers (asan, ubsan) to the CI pipeline and run tests on it. See https://github.com/neondatabase/neon/issues/6053 --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-02-06 12:53:43 +00:00
Arpad Müller	67b71538d0	Limit returned lsn for timestamp by the planned gc cutoff (#10678 ) Often the output of the timestamp->lsn API is used as input for branch creation, and branch creation takes the planned lsn into account, i.e. it rejects branch lsns that are before the planned lsn. This patch doesn't fix all race conditions; it's still racy. But at least it is a step in the right direction. For #10639	2025-02-06 11:17:08 +00:00
Erik Grinaker	f4cfa725b8	pageserver: add a few critical errors (#10657 ) ## Problem Following #10641, let's add a few critical errors. Resolves #10094. ## Summary of changes Adds the following critical errors: * WAL sender read/decode failure. * WAL record ingestion failure. * WAL redo failure. * Missing key during compaction. We don't add an error for missing keys during GetPage requests, since we've seen a handful of these in production recently, and the cause is still unclear (most likely a benign race).	2025-02-06 10:30:27 +00:00
Arpad Müller	05326cc247	Skip gc cutoff lsn check at timeline creation if lease exists (#10685 ) Right now, branch creation doesn't care if a lsn lease exists or not, it just fails if the passed lsn is older than either the last or the planned gc cutoff. However, if an lsn lease exists for a given lsn, we can actually create a branch at that point: nothing has been gc'd away. This prevents race conditions that #10678 still leaves around. Related: #10639 https://github.com/neondatabase/cloud/issues/23667	2025-02-06 10:10:11 +00:00
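The rule described in this commit can be sketched as a small predicate. This is a simplified model under stated assumptions, not the pageserver's code: `can_branch_at`, its parameters, and the use of plain integers for LSNs are all hypothetical.

```python
def can_branch_at(lsn, applied_gc_cutoff, planned_gc_cutoff, leased_lsns):
    """Decide whether a branch may be created at `lsn` (illustrative model).

    Assumption: a lease pins exactly one LSN, and LSNs compare as integers.
    """
    if lsn in leased_lsns:
        # A lease guarantees nothing at this LSN has been garbage collected,
        # so the cutoff checks can be skipped.
        return True
    # Without a lease, reject LSNs older than either cutoff.
    return lsn >= applied_gc_cutoff and lsn >= planned_gc_cutoff


# LSN 90 is below the planned cutoff: allowed only when it is leased.
without_lease = can_branch_at(90, applied_gc_cutoff=80, planned_gc_cutoff=100, leased_lsns=set())
with_lease = can_branch_at(90, applied_gc_cutoff=80, planned_gc_cutoff=100, leased_lsns={90})
```

Checking the lease first is what closes the window in which the planned cutoff advances between the timestamp lookup and the branch creation request.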
Arpad Müller	b66fbd6176	Warn on basebackups for archived timelines (#10688 ) We don't want any external requests for an archived timeline. This includes basebackup requests, i.e. when a compute is being started up. Therefore, we'd like to forbid such basebackup requests: any attempt to get a basebackup on an archived timeline (or any getpage request really) is a cplane bug. Make this a warning for now so that, if there is potentially a bug, we can detect cases in the wild before they cause stuck operations, but the intention is to return an error eventually. Related: #9548	2025-02-06 10:09:20 +00:00
Vlad Lazar	95588dab98	safekeeper: fix wal fan-out shard subscription data race (#10677 ) ## Problem [This select arm](https://github.com/neondatabase/neon/blob/main/safekeeper/src/send_interpreted_wal.rs#L414) runs when we want to attach a new reader to the current cursor. It checks the current position of the cursor and resets it if required. The current position of the cursor is updated in the [other select arm](https://github.com/neondatabase/neon/blob/main/safekeeper/src/send_interpreted_wal.rs#L336-L345). That runs when we get some WAL to send. Now, what happens if we want to attach two shards consecutively to the cursor? Let's say [this select arm](https://github.com/neondatabase/neon/blob/main/safekeeper/src/send_interpreted_wal.rs#L397) runs twice in a row. Let's assume cursor is currently at LSN X. First shard wants to attach at position V and the other one at W. Assume X > W > V. First shard resets the stream to position V. Second shard comes in, sees stale cursor position X and resets it to W. This means that the first shard doesn't get wal in the [V, W) range. ## Summary of changes Ultimately, this boils down to the current position not being kept in sync with the reset of the WAL stream. This patch fixes the race by updating it when resetting the WAL stream and adds a unit test repro. Closes https://github.com/neondatabase/cloud/issues/23750	2025-02-06 09:24:28 +00:00
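The race in this commit can be reproduced with a toy model. The `Cursor` class below is a hypothetical simplification (integers for LSNs, a reset only when the requested position is behind the tracked one); it is not the safekeeper's implementation.

```python
class Cursor:
    """Toy model of a shared WAL cursor that shards attach to."""

    def __init__(self, position, sync_on_reset):
        self.stream_pos = position    # where the WAL stream will read next
        self.tracked_pos = position   # what the attach logic believes the position is
        self.sync_on_reset = sync_on_reset

    def attach(self, start):
        # Assumed rule: rewind the stream if the new reader starts behind the cursor.
        if start < self.tracked_pos:
            self.stream_pos = start
            if self.sync_on_reset:
                # The fix: keep the tracked position in sync with the reset.
                self.tracked_pos = start


X, W, V = 100, 50, 10  # X > W > V, as in the commit message

buggy = Cursor(X, sync_on_reset=False)
buggy.attach(V)   # stream rewinds to V, tracked position stays at X
buggy.attach(W)   # stale X makes the stream jump forward to W

fixed = Cursor(X, sync_on_reset=True)
fixed.attach(V)   # stream and tracked position both move to V
fixed.attach(W)   # W is not behind V, so no reset happens
```

In the buggy run the stream ends at W, so the first shard never sees the [V, W) range; in the fixed run it stays at V.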
Christian Schwarz	1686d9e733	perf(page_service): dont `.instrument(span.clone())` the response flush (#10686 ) On my AX102 Hetzner box, removing this line removes about 20us from the `latency_mean` result in `test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant`. If the same 20us can be removed in the nightly benchmark run, this will be a ~10% improvement because there, mean latencies are about ~220us. This span was added during batching refactors, we didn't have it before, and I don't think it's terribly useful. refs - https://github.com/neondatabase/cloud/issues/21759	2025-02-06 08:33:37 +00:00
Erik Grinaker	abcd00181c	pageserver: set a concurrency limit for LocalFS (#10676 ) ## Problem The local filesystem backend for remote storage doesn't set a concurrency limit. While it can't/won't enforce a concurrency limit itself, this also bounds the upload queue concurrency. Some tests create thousands of uploads, which slows down the quadratic scheduling of the upload queue, and there is no point spawning that many Tokio tasks. Resolves #10409. ## Summary of changes Set a concurrency limit of 100 for the LocalFS backend. Before: `test_layer_map[release-pg17].test_query: 68.338 s` After: `test_layer_map[release-pg17].test_query: 5.209 s`	2025-02-06 07:24:36 +00:00
Konstantin Knizhnik	01f0be03b5	Fix bugs in lfc_cache_containsv (#10682 ) ## Problem Incorrect handling of the iteration index in `lfc_cache_containsv` ## Summary of changes ``` - int this_chunk = Min(nblocks, BLOCKS_PER_CHUNK - chunk_offs); + int this_chunk = Min(nblocks - i, BLOCKS_PER_CHUNK - chunk_offs); ``` ``` - if (i + 1 >= nblocks) + if (i >= nblocks) ``` Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-06 07:00:00 +00:00
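The effect of the `Min(nblocks - i, ...)` change can be shown with a small model of walking a block range chunk by chunk. This is only a sketch: the loop structure, the `chunk_spans` helper, and the value of `BLOCKS_PER_CHUNK` are assumptions made for illustration and do not mirror the C function.

```python
BLOCKS_PER_CHUNK = 128  # assumed value, for illustration only


def chunk_spans(blkno, nblocks, fixed=True):
    """Split [blkno, blkno + nblocks) into (start_block, length) spans,
    one span per chunk. With fixed=False the pre-fix arithmetic is used."""
    spans = []
    i = 0  # number of blocks handled so far
    while i < nblocks:
        chunk_offs = (blkno + i) % BLOCKS_PER_CHUNK
        # The fix: bound the span by the blocks still remaining (nblocks - i),
        # not by the total request size.
        remaining = nblocks - i if fixed else nblocks
        this_chunk = min(remaining, BLOCKS_PER_CHUNK - chunk_offs)
        spans.append((blkno + i, this_chunk))
        i += this_chunk
    return spans


fixed_spans = chunk_spans(120, 20)
buggy_spans = chunk_spans(120, 20, fixed=False)
```

For a 20-block request starting at block 120, the fixed version covers 8 blocks in the first chunk and 12 in the second. The pre-fix arithmetic takes 20 blocks from the second chunk and so reads past the end of the request.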
Konstantin Knizhnik	81cd30e4d6	Use #ifdef instead of #if USE_ASSERT_CHECKING (#10683 ) ## Problem USE_ASSERT_CHECKING is defined as an empty macro, but it is checked using #if ## Summary of changes Replace `#if USE_ASSERT_CHECKING` with `#ifdef USE_ASSERT_CHECKING`, as done in other places in Postgres Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-06 05:47:56 +00:00
Konstantin Knizhnik	7fc6953da4	Is neon superuser (#10625 ) ## Problem The is_neon_superuser() function is public in pg14/pg15 but statically defined in publicationcmd.c in pg16/pg17 ## Summary of changes Make this function public for all Postgres versions. It is intended to be used not only in publicationcmd.c See https://github.com/neondatabase/postgres/pull/573 https://github.com/neondatabase/postgres/pull/576 --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-06 05:42:14 +00:00
Christian Schwarz	77f9e74d86	pgxn: include socket send & recv queue size in slow response logs (#10673 ) # Problem When we see an apparent slow request, one possible cause is that the client is failing to consume responses, but we don't have a clear way to see that. # Solution - Log the socket queue depths on slow/stuck connections, so that we have an indication of whether the compute is keeping up with processing the connection's responses. refs - slack https://neondb.slack.com/archives/C036U0GRMRB/p1738652644396329 - refs https://github.com/neondatabase/cloud/issues/23515 - refs https://github.com/neondatabase/cloud/issues/23486	2025-02-06 01:14:29 +00:00
Alex Chi Z.	0ceeec9be3	fix(pageserver): schedule compaction immediately if pending (#10684 ) ## Problem The code is intended to reschedule compaction immediately if there are pending tasks. Previously we set the duration to 0 if there were pending tasks, but that goes through the `if period == Duration::ZERO {` branch and sleeps for another 10 seconds. ## Summary of changes Set duration to 1 so that it doesn't sleep for too long. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-05 22:11:50 +00:00
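A minimal model of the scheduling pitfall described above, assuming a zero period doubles as the "compaction disabled" sentinel. The function name, the `pending_delay` parameter, and the millisecond unit are hypothetical; the commit message only says the duration is set to 1.

```python
from datetime import timedelta

DISABLED_SLEEP = timedelta(seconds=10)  # back-off used when the period is zero


def sleep_before_next_run(period, has_pending, pending_delay):
    """Return how long the loop sleeps before the next compaction pass."""
    wait = pending_delay if has_pending else period
    if wait == timedelta(0):
        # Zero is treated as "disabled", so the loop backs off instead of
        # running again immediately.
        return DISABLED_SLEEP
    return wait


period = timedelta(seconds=20)
# Before the fix: a zero delay collides with the sentinel.
before = sleep_before_next_run(period, True, timedelta(0))
# After the fix: the smallest non-zero delay (unit assumed) avoids the branch.
after = sleep_before_next_run(period, True, timedelta(milliseconds=1))
```

Using a distinct non-zero value keeps "run again right away" separate from "disabled" without changing the sentinel check itself.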
Alex Chi Z.	733a57247b	fix(pageserver): disallow gc-compaction produce l0 layer (#10679 ) ## Problem Any compaction should never produce l0 layers. This never happened in my experiments, but would be good to guard it early. ## Summary of changes Disallow gc-compaction to produce l0 layers. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-05 20:44:28 +00:00
Heikki Linnakangas	6699a30a49	Make it easy to build only a subset of extensions into compute image (#10655 ) The full build of all extensions takes a long time. When working locally on parts that don't need extensions, you can iterate more quickly by skipping the unnecessary extensions. This adds a build argument to the dockerfile to specify extensions to build. There are three options: - EXTENSIONS=all (default) - EXTENSIONS=minimal: Build only a few extensions that are listed in shared_preload_libraries in the default neon config. - EXTENSIONS=none: Build no extensions (except for the mandatory 'neon' extension).	2025-02-05 18:07:51 +00:00
Alex Chi Z.	133b89a83d	feat(pageserver): continue from last incomplete image layer creation (#10660 ) ## Problem close https://github.com/neondatabase/neon/issues/10651 ## Summary of changes * Image layer creation starts from the next partition of the last processed partition if the previous attempt was not complete. * Add tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-05 17:35:39 +00:00
Arseny Sher	fba22a7123	Record more timings in test_layer_map (#10670 ) ## Problem It is not very clear how much time different operations take. ## Summary of changes Record more timings. ref https://github.com/neondatabase/neon/issues/10409	2025-02-05 17:00:26 +00:00
John Spray	14e05276a3	storcon: fix a case where optimise could get stuck on unschedulable node (#10648 ) ## Problem When a shard has two secondary locations, but one of them is on a node with MaySchedule::No, the optimiser would get stuck, because it couldn't decide which secondary to remove. This is generally okay if a node is offline, but if a node is in Pause mode for a long period of time, it's a problem. Closes: https://github.com/neondatabase/neon/issues/10646 ## Summary of changes - Instead of insisting on finding a node in the wrong AZ to remove, find an available node in the _right_ AZ, and remove all the others. This ensures that if there is one live suitable node, then other offline/paused nodes cannot hold things up.	2025-02-05 16:05:12 +00:00
Tristan Partin	ebc55e6ae8	Fix logic for checking if a compute can install a remote extension (#10656 ) Given a remote extensions manifest of the following: ```json { "public_extensions": [], "custom_extensions": null, "library_index": { "pg_search": "pg_search" }, "extension_data": { "pg_search": { "control_data": { "pg_search.control": "comment = 'pg_search: Full text search for PostgreSQL using BM25'\ndefault_version = '0.14.1'\nmodule_pathname = '$libdir/pg_search'\nrelocatable = false\nsuperuser = true\nschema = paradedb\ntrusted = true\n" }, "archive_path": "13117844657/v14/extensions/pg_search.tar.zst" } } } ``` We were allowing a compute to install a remote extension that wasn't listed in either public_extensions or custom_extensions. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-02-05 14:58:33 +00:00
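The intended check can be sketched as follows. This is an illustration only: `may_install` is a hypothetical helper, and treating a missing or `null` list as empty is an assumption of this sketch, not something the commit message states.

```python
def may_install(ext_name, manifest):
    """Allow installation only if the extension is listed in
    public_extensions or custom_extensions."""
    public = manifest.get("public_extensions") or []   # null treated as empty (assumption)
    custom = manifest.get("custom_extensions") or []   # null treated as empty (assumption)
    return ext_name in public or ext_name in custom


# Trimmed version of the manifest from the commit message.
manifest = {
    "public_extensions": [],
    "custom_extensions": None,
    "library_index": {"pg_search": "pg_search"},
}

allowed = may_install("pg_search", manifest)
allowed_when_listed = may_install(
    "pg_search", {**manifest, "custom_extensions": ["pg_search"]}
)
```

With the manifest above, `pg_search` appears in `library_index` but in neither list, so the check rejects it. Being present in `library_index` or `extension_data` is not sufficient on its own.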
Erik Grinaker	f07119cca7	pageserver: add `pageserver_wal_ingest_values_committed` metric (#10653 ) ## Problem We don't have visibility into the ratio of image vs. delta pages ingested in Pageservers. This might be useful to determine whether we should compress WAL records before storing them, which in turn might make compaction more efficient. ## Summary of changes Add `pageserver_wal_ingest_values_committed` metric with dimensions `class=metadata\|data` and `kind=image\|delta`.	2025-02-05 14:33:04 +00:00
Vlad Lazar	47975d06d9	storcon: silence cplane 404s on tenant creation (#10665 ) ## Problem We get WARN log noise on tenant creations. Cplane creates tenants via /location_config. That returns the attached locations in the response and spawns a reconciliation which will also attempt to notify cplane. If the notification is attempted before cplane persists the shards to its database, storcon gets back a 404. The situation is harmless, but annoying. ## Summary of Changes * Add a tenant creation hint to the reconciler config * If the hint is true and we get back a 404 on the notification from cplane, ignore the error, but still queue the reconcile up for a retry. Closes https://github.com/neondatabase/cloud/issues/20732	2025-02-05 12:41:09 +00:00
Fedor Dikarev	472007dd7c	ci: unify Dockerfiles, set bash as SHELL for debian layers, make cpan step as separate RUN (#10645 ) ## Problem Ref: https://github.com/neondatabase/cloud/issues/23461 and follow-up after: https://github.com/neondatabase/neon/pull/10553 We used `echo` to set up `.wgetrc` and `.curlrc`, and there we used `\n` to make these multiline configs with one echo command. The problem is that Debian `/bin/sh`'s built-in echo command behaves differently from the `/bin/echo` executable and from the `echo` built-in in `bash`. Namely, it does not support the `-e` option, and while it does treat `\n` as a newline, passing `-e` here will add that `-e` to the output. At the same time, when we use different base images, for example `alpine/curl`, their `/bin/sh` supports and requires `-e` for treating escape sequences like `\n`. Having different `echo` implementations and remembering the differences in their behaviour is a poor experience for the developer and makes maintaining Dockerfiles harder. Work-arounds: - Explicitly use `/bin/bash` (like in this PR) - Use `/bin/echo` instead of the shell's built-in echo function - Use printf "foo\n" instead of echo -e "foo\n" ## Summary of changes 1. To fix that, we proceed with the option of setting `/bin/bash` as the SHELL for the debian-based layers 2. No changes for `alpine/curl` based layers. 3. One more change here: the `extensions` layer is split into 2 steps: installing dependencies from `CPAN` and installing `lcov` from github, so upgrading `lcov` can reuse the previous layer with installed cpan modules.	2025-02-04 18:58:02 +00:00
Vlad Lazar	f9009d6b80	pageserver: write heatmap to disk after uploading it (#10650 ) ## Problem We wish to make heatmap generation additive in https://github.com/neondatabase/neon/pull/10597. However, if the pageserver restarts and has a heatmap on disk from when it was a secondary long ago, we can end up keeping extra layers on the secondary's disk. ## Summary of changes Persist the heatmap after a successful upload.	2025-02-04 17:52:54 +00:00
Alex Chi Z.	cab60b6d9f	fix(pagesever): stablize gc-compaction tests (#10621 ) ## Problem Hopefully this can resolve https://github.com/neondatabase/neon/issues/10517. The test is flaky because, after restart, the compute node might write some data, so the pageserver flushes some layers, which in the end causes L0 compaction to run, and we cannot get the test scenario we want. ## Summary of changes Ensure all L0 layers are compacted before starting the test. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-04 16:11:31 +00:00
Erik Grinaker	06090bbccd	pageserver: log critical error on `ClearVmBits` for unknown pages (#10634 ) ## Problem In #9895, we fixed some issues where `ClearVmBits` were broadcast to all shards, even those not owning the VM relation. As part of that, we found some ancient code from #1417, which discarded spurious incorrect `ClearVmBits` records for pages outside of the VM relation. We added observability in #9911 to see how often this actually happens in the wild. After two months, we have not seen this happen once in production or staging. However, out of caution, we don't want a hard error and break WAL ingestion. Resolves #10067. ## Summary of changes Log a critical error when ingesting `ClearVmBits` for unknown VM relations or pages.	2025-02-04 14:55:11 +00:00
Folke Behrens	dcf335a251	proxy: Switch proxy to JSON logging (#9857 ) ## Problem We want to switch proxy and ideally all Rust services to structured JSON logging to support better filtering and cross-referencing with tracing. ## Summary of changes * Introduce a custom tracing-subscriber to write the JSON. In a first attempt a customized tracing::fmt::FmtSubscriber was used, but it's very inefficient and can still generate invalid JSON. It also doesn't allow us to add important fields to the root object. * Make this opt in: the `LOGFMT` env var can be set to `"json"` to enable the new logger at startup.	2025-02-04 14:50:53 +00:00
Arpad Müller	b6e9daea9a	storcon: only allow errrors of the server cert verification (#10644 ) This PR does a bunch of things: * only allow errors of the server cert verification, not of the TLS handshake. The TLS handshake doesn't cause any errors for us so we can just always require it to be valid. This simplifies the code a little. * As the solution is more permanent than originally anticipated, I think it makes sense to move the `AcceptAll` verifier outside. * log the connstr information. this helps with figuring out which domain names are configured in the connstr, etc. I think it is generally useful to print it. make extra sure that the password is not leaked. Follow-up of #10640	2025-02-04 14:01:57 +00:00
a-masterov	d5c3a4e2b9	Add support for pgjwt test (#10611 ) ## Problem We don't currently test pgjwt, while it is based on pg_prove and can be easily added ## Summary of changes The test for pgjwt was added.	2025-02-04 13:49:44 +00:00
Heikki Linnakangas	8107140f7f	Refactor compute dockerfile (#10371 ) Refactor how extensions are built in compute Dockerfile 1. Rename some of the extension layers, so that names correspond more precisely to the upstream repository name and the source directory name. For example, instead of "pg-jsonschema-pg-build", spell it "pg_jsonschema-build". Some of the layer names had the extra "pg-" part, and some didn't; harmonize on not having it. And use an underscore if the upstream project name uses an underscore. 2. Each extension now consists of two dockerfile targets: [extension]-src and [extension]-build. By convention, the -src target downloads the sources and applies any neon-specific patches if necessary. The source tarball is downloaded and extracted under /ext-src. For example, the 'pgvector' extension creates the following files and directory: /ext-src/pgvector.tar.gz # original tarball /ext-src/pgvector.patch # neon-specific patch, copied from patches/ dir /ext-src/pgvector-src/ # extracted tarball, with patch applied This separation avoids re-downloading the sources every time the extension is recompiled. The 'extension-tests' target also uses the [extension]-src layers, by copying the /ext-src/ dirs from all the extensions together into one image. This refactoring came about when I was experimenting with different ways of splitting up the Dockerfile so that each extension would be in a separate file. That's not part of this PR yet, but this is a good step in modularizing the extensions.	2025-02-04 10:35:43 +00:00
Alex Chi Z.	e219d48bfe	refactor(pageserver): clearify compaction return value (#10643 ) ## Problem ## Summary of changes Make the return value of the set of compaction functions less confusing. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-03 21:56:55 +00:00
Alex Chi Z.	c1be84197e	feat(pageserver): preempt image layer generation if L0 piles up (#10572 ) ## Problem Image layer generation could block L0 compactions for a long time. ## Summary of changes * Refactored the return value of `create_image_layers_for_` functions to make it self-explainable. Preempt image layer generation in `Try` mode if L0 piles up. Note that we might potentially run into a state that only the beginning part of the keyspace gets image coverage. In that case, we either need to implement something to prioritize some keyspaces with image coverage, or tune the image_creation_threshold to ensure that the frequency of image creation could keep up with L0 compaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Erik Grinaker <erik@neon.tech>	2025-02-03 20:55:47 +00:00
dependabot[bot]	d80cbb2443	build(deps): bump openssl from 0.10.66 to 0.10.70 in /test_runner/pg_clients/rust/tokio-postgres in the cargo group across 1 directory (#10642 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-02-03 19:42:40 +00:00
Erik Grinaker	06b45fd0fd	utils/logging: add `critical!` macro and metric (#10641 ) ## Problem We don't currently have good alerts for critical errors, e.g. data loss/corruption. Touches #10094. ## Summary of changes Add a `critical!` macro and corresponding `libmetrics_tracing_event_count{level="critical"}` metric. This will: * Emit an `ERROR` log message with prefix `"CRITICAL:"` and a backtrace. * Increment `libmetrics_tracing_event_count{level="critical"}`, and indirectly `level="error"`. * Trigger a pageable alert (via the metric above). * In debug builds, panic the process. I'll add uses of the macro separately.	2025-02-03 19:23:12 +00:00
John Spray	715e20343a	storage controller: improve scheduling of tenants created in PlacementPolicy::Secondary (#10590 ) ## Problem I noticed when onboarding lots of tenants that the AZ scheduling violation stat was climbing, before falling later as optimisations happened. This was happening because we first add the tenant with PlacementPolicy::Secondary, and then later go to PlacementPolicy::Attached, and the scheduler's behavior led to a bad AZ choice: 1. Create a secondary location in the non-preferred AZ 2. Upgrade to Attached where we promote that non-preferred-AZ location to attached and then create another secondary 3. Optimiser later realises we're in the wrong AZ and moves us ## Summary of changes - Extend some logging to give more information about AZs - When scheduling secondary location in PlacementPolicy::Secondary, select it as if we were attached: in this mode, our business goal is to have a warm pageserver location that we can make available as attached quickly if needed, therefore we want it to be in the preferred AZ. - Make optimize_secondary logic the same, so that it will consider a secondary location in the preferred AZ to be optimal when in PlacementPolicy::Secondary - When transitioning from PlacementPolicy::Attached(N) to PlacementPolicy::Secondary, instead of arbitrarily picking a location to keep, prefer to keep the location in the preferred AZ	2025-02-03 19:01:16 +00:00
Arpad Müller	c774f0a147	storcon db: allow accepting any TLS certificate (#10640 ) We encountered some TLS validation errors for the storcon since applying #10614. Add an option to downgrade them to logged errors instead to allow us to debug with more peace. cc issue https://github.com/neondatabase/cloud/issues/23583	2025-02-03 18:21:01 +00:00
Arpad Müller	40d6b3a34e	Merge pull request #10602 from neondatabase/rc/release/2025-01-31 Storage release 2025-01-31	2025-02-03 16:43:04 +01:00
Folke Behrens	628a9616c4	fix(proxy): Don't use --is-private-access-proxy to disable IP check (#10633 ) ## Problem * The behavior of this flag changed. Plus, it's not necessary to disable the IP check as long as there are no IPs listed in the local postgres. ## Summary of changes * Drop the flag from the command in the README.md section. * Change the postgres URL passed to proxy to not use the endpoint hostname. * Also swap postgres creation and proxy startup, so the DB is running when proxy comes up.	2025-02-03 14:12:41 +00:00
Alexander Bayandin	43682624b5	CI(pg-clients): fix logical replication tests (#10623 ) ## Problem Tests for logical replication (on Staging) have been failing for some time because logical replication is not enabled for them. This issue occurred after switching to an org API key with a different default setting, where logical replication was not enabled by default. ## Summary of changes - Add `enable_logical_replication` input to `actions/neon-project-create` - Enable logical replication in `test-logical-replication` job	2025-02-03 13:41:41 +00:00
Em Sharnoff	e617a3a075	vm-monitor: Improve error display (#10542 ) Logging errors with the debug format specifier causes multi-line errors, which are sometimes a pain to deal with. Instead, we should use anyhow's alternate display format, which shows the same information on a single line. Also adjusted a couple of error messages that were stale. Fixes neondatabase/cloud#14710.	2025-02-03 13:34:11 +00:00
Fedor Dikarev	23ca8b061b	Use actions/checkout for checkout (#10630 ) ## Problem 1. First of all it's more correct 2. Current usage allows ` Time-of-Check-Time-of-Use (TOCTOU) 'Pwn Request' vulnerabilities`. Please check security slack channel or reach me for more details. I will update PR description after merge. ## Summary of changes 1. Use `actions/checkout` with `ref: ${{ github.event.pull_request.head.sha }}` Discovered by and Co-author: @varunsh-coder	2025-02-03 12:55:48 +00:00
Anastasia Lubennikova	b1bc33eb4d	Fix logical_replication_sync test fixture (#10531 ) Fixes flaky test_lr_with_slow_safekeeper test #10242 Fix query to `pg_catalog.pg_stat_subscription` catalog to handle table synchronization and parallel LR correctly.	2025-02-03 12:44:47 +00:00
OBBO67	b1e451091a	pageserver: clean up references to timeline delete marker, uninit marker (#5718 ) (#10627 ) ## Problem Since [#5580](https://github.com/neondatabase/neon/pull/5580) the delete and uninit file markers are no longer needed. ## Summary of changes Remove the remaining code for the delete and uninit markers. Additionally removes the `ends_with_suffix` function as it is no longer required. Closes [#5718](https://github.com/neondatabase/neon/issues/5718).	2025-02-03 11:54:07 +00:00
Arpad Müller	87ad50c925	storcon: use diesel-async again, now with tls support (#10614 ) Successor of #10280 after it was reverted in #10592. Re-introduce the usage of diesel-async again, but now also add TLS support so that we connect to the storcon database using TLS. By default, diesel-async doesn't support TLS, so add some code to make us explicitly request TLS. cc https://github.com/neondatabase/cloud/issues/23583	2025-02-03 11:53:51 +00:00
Alexander Bayandin	89b9f74077	CI(pre-merge-checks): do not run `conclusion` job for PRs (#10619 ) ## Problem While working on https://github.com/neondatabase/neon/pull/10617 I (unintentionally) merged the PR before the main CI pipeline has finished. I suspect this happens because we have received all the required job results from the pre-merge-checks workflow, which runs on PRs that include changes to relevant files. ## Summary of changes - Skip the `conclusion` job in `pre-merge-checks` workflows for PRs	2025-02-03 09:40:12 +00:00
John Spray	f071800979	tests: stabilize shard locations earlier in test_scrubber_tenant_snapshot (#10606 ) ## Problem This test would sometimes emit unexpected logs from the storage controller's requests to do migrations, which overlap with the test's restarts of pageservers, where those migrations are happening some time after a shard split as the controller moves load around. Example: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10602/13067323736/index.html#testresult/f66f1329557a1fc5/retries ## Summary of changes - Do a reconcile_until_idle after shard split, so that the rest of the test doesn't run concurrently with migrations	2025-02-03 09:02:21 +00:00
Peter Bendel	4dfe60e2ad	revert https://github.com/neondatabase/neon/pull/10616 (#10631 ) ## Problem https://github.com/neondatabase/neon/pull/10616 was only intended as a temporary change during the weekend; we want to reset to the prior state ## Summary of changes revert https://github.com/neondatabase/neon/pull/10616 but keep fixes in https://github.com/neondatabase/neon/pull/10622	2025-02-03 09:00:23 +00:00
Arpad Müller	8ae6f656a6	Don't require partial backup semaphore capacity for deletions (#10628 ) In the safekeeper, we block deletions on the timeline's gate closing, and any `WalResidentTimeline` keeps the gate open (because it owns a gate lock object). Thus, until the `main_task` function of a partial backup returns, we can't delete the associated timeline. In order to make these tasks exit early, we trigger the cancellation token of the timeline upon its shutdown. However, the partial backup task wasn't looking for the cancellation while waiting to acquire a partial backup permit. On a staging safekeeper we have been in a situation in the past where the semaphore was already empty for many hours, rendering all attempted deletions unable to proceed until a restart where the semaphore was reset: https://neondb.slack.com/archives/C03H1K0PGKH/p1738416586442029	2025-02-03 04:11:06 +00:00
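The idea of watching for cancellation while waiting on a permit can be sketched with asyncio. This is an analogy, not the safekeeper implementation: `acquire_or_cancel`, the `asyncio.Event` standing in for the cancellation token, and the timings are all assumptions made for illustration.

```python
import asyncio


async def acquire_or_cancel(semaphore, cancel):
    """Wait for a permit, but give up as soon as `cancel` is set.
    Returns True if a permit was acquired, False if cancelled first."""
    acquire = asyncio.ensure_future(semaphore.acquire())
    cancelled = asyncio.ensure_future(cancel.wait())
    done, pending = await asyncio.wait(
        {acquire, cancelled}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    # Let the cancelled waiters finish so nothing is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    if cancelled in done:
        if acquire in done:
            semaphore.release()  # both fired at once: hand the permit back
        return False
    return True


async def demo():
    semaphore = asyncio.Semaphore(0)  # no permits available, as in the incident
    cancel = asyncio.Event()
    # Simulate timeline shutdown shortly after the task starts waiting.
    asyncio.get_running_loop().call_later(0.01, cancel.set)
    return await acquire_or_cancel(semaphore, cancel)


got_permit = asyncio.run(demo())
```

Because the wait returns on cancellation, the task can exit and release its gate lock even when the semaphore never hands out a permit.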
Peter Bendel	b9e1a67246	fix generate matrix for olap for saturdays (#10622 ) ## Problem when introducing pg17 for job step `Generate matrix for OLAP benchmarks` I introduced a syntax error that only hits on Saturdays. ## Summary of changes Remove trailing comma ## successful test run https://github.com/neondatabase/neon/actions/runs/13086363907	2025-02-01 11:09:45 +00:00
Folke Behrens	6318828c63	Update rust to 1.84.1 (#10618 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. [Release notes](https://releases.rs/docs/1.84.1/). Prior update was in https://github.com/neondatabase/neon/pull/10328. Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2025-01-31 20:52:17 +00:00
Stefan Radig	6dd48ba148	feat(proxy): Implement access control with VPC endpoint checks and block for public internet / VPC (#10143 ) - Wired up filtering on VPC endpoints - Wired up block access from public internet / VPC depending on per project flag - Added cache invalidation for VPC endpoints (partially based on PR from Raphael) - Removed BackendIpAllowlist trait --------- Co-authored-by: Ivan Efremov <ivan@neon.tech>	2025-01-31 20:32:57 +00:00
Conrad Ludgate	ad1a41157a	feat(proxy): optimizing the chances of large write in copy_bidirectional (#10608 ) We forked copy_bidirectional to solve some issues like fast-shutdown (disallowing half-open connections) and to introduce better error tracking (which side of the conn closed down). A change recently made its way upstream offering performance improvements: https://github.com/tokio-rs/tokio/pull/6532. These seem applicable to our fork, thus it makes sense to apply them here as well.	2025-01-31 19:14:27 +00:00
Tristan Partin	fcd195c2b6	Migrate compute_ctl arg parsing to clap derive (#10497 ) The primary benefit is that all the ad hoc get_matches() calls are no longer necessary. Now all it takes to get at the CLI arguments is referencing a struct member. It's also great that we can replace the ad hoc CLI struct we had with this more formal solution. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-31 19:04:26 +00:00
Peter Bendel	bc7822d90c	temporarily disable some steps and run more often to expose more pgbench --initialize in benchmarking workflow (#10616 ) ## Problem we want to disable some steps in benchmarking workflow that do not initialize new projects and instead run the test more frequently Test run https://github.com/neondatabase/neon/actions/runs/13077737888	2025-01-31 18:41:17 +00:00
Alexander Bayandin	48c87dc458	CI(pre-merge-checks): fix condition (#10617 ) ## Problem Merge Queue fails if changes include Rust code. ## Summary of changes - Fix condition for `build-build-tools-image` - Add a couple of no-op `false \|\|` to make predicates look symmetric	2025-01-31 18:07:26 +00:00
John Spray	aedeb1c7c2	pageserver: revise logging of cancelled request results (#10604 ) ## Problem When a client dropped before a request completed, and a handler returned an ApiError, we would log that at error severity. That was excessive in the case of a request erroring on a shutdown, and could cause test flakes. example: https://neon-github-public-dev.s3.amazonaws.com/reports/main/13067651123/index.html#suites/ad9c266207b45eafe19909d1020dd987/6021ce86a0d72ae7/ ``` Cancelled request finished with an error: ShuttingDown ``` ## Summary of changes - Log at info level instead for ShuttingDown and ResourceUnavailable API errors from cancelled requests	2025-01-31 17:43:54 +00:00
John Spray	a93e9f22fc	pageserver: remove faulty debug assertion in compaction (#10610 ) ## Problem This assertion is incorrect: it is legal to see another shard's data at this point, after a shard split. Closes: https://github.com/neondatabase/neon/issues/10609 ## Summary of changes - Remove faulty assertion	2025-01-31 17:43:31 +00:00
JC Grünhage	10cf5e7a38	Move cargo-deny into a separate workflow on a schedule (#10289 ) ## Problem There are two (related) problems with the previous handling of `cargo-deny`: - When a new advisory is added to rustsec that affects a dependency, unrelated pull requests will fail. - New advisories rely on pushes or PRs to be surfaced. Problems that already exist on main will only be found if we try to merge new things into main. ## Summary of changes We split out `cargo-deny` into a separate workflow that runs on all PRs that touch `Cargo.lock`, and on a schedule on `main`, `release`, `release-compute` and `release-proxy` to find new advisories.	2025-01-31 13:42:59 +00:00
Arpad Müller	dce617fe07	Update to rebased rust-postgres (#10584 ) Update to a rebased version of our rust-postgres patches, rebased on [this](`98f5a11bc0`) commit this time. With #10280 reapplied, this means that the rust-postgres crates will be deduplicated, as the new crate versions are finally compatible with the requirements of diesel-async. Earlier update: #10561 rust-postgres PR: https://github.com/neondatabase/rust-postgres/pull/39	2025-01-31 12:40:20 +00:00
Alexander Bayandin	503bc72d31	CI: add `diesel print-schema` check (#10527 ) ## Problem We want to check that `diesel print-schema` doesn't generate any changes (`storage_controller/src/schema.rs`) in comparison with the list of migrations. ## Summary of changes - Add `diesel_cli` to `build-tools` image - Add `Check diesel schema` step to `build-neon` job; at this stage we have all required binaries, so we don't need to compile anything additionally - Check runs only on x86 release builds to be sure we do it at least once per CI run.	2025-01-31 11:48:46 +00:00
Fedor Dikarev	89cff08354	unify pg-build-nonroot-with-cargo base layer and config retries in curl (#10575 ) Ref: https://github.com/neondatabase/cloud/issues/23461 ## Problem While making changes in this area, I saw that these 2 base layers could be optimised. Also, after a review comment from @myrrc, set up timeouts and retries in the `alpine/curl` image ## Summary of changes	2025-01-31 11:46:33 +00:00
Erik Grinaker	afbcebe7f7	test_runner: force-compact in `test_sharding_autosplit` (#10605 ) ## Problem This test may not fully detect data corruption during splits, since we don't force-compact the entire keyspace. ## Summary of changes Force-compact all data in `test_sharding_autosplit`.	2025-01-31 11:31:58 +00:00
Arpad Müller	7d5c70c717	Update AWS SDK crates (#10588 ) We want to keep the AWS SDK up to date as that way we benefit from new developments and improvements. Prior update was in #10056	2025-01-31 11:23:12 +00:00
John Spray	f09cfd11cb	pageserver: exclude archived timelines from freeze+flush on shutdown (#10594 ) ## Problem If offloading races with normal shutdown, we get a "failed to freeze and flush: cannot flush frozen layers when flush_loop is not running, state is Exited". This is harmless but points to it being quite strange to try and freeze and flush such a timeline. flushing on shutdown for an archived timeline isn't useful. Related: https://github.com/neondatabase/neon/issues/10389 ## Summary of changes - During Timeline::shutdown, ignore ShutdownMode::FreezeAndFlush if the timeline is archived	2025-01-31 10:54:14 +00:00
Arseny Sher	765ba43438	Allow pageserver unreachable errors in test_scrubber_tenant_snapshot (#10585 ) ## Problem test_scrubber_tenant_snapshot restarts pageservers, but log validation fails tests on any non white listed storcon warnings, making the test flaky. ## Summary of changes Allow warns like 2025-01-29T12:37:42.622179Z WARN reconciler{seq=1 tenant_id=2011077aea9b4e8a60e8e8a19407634c shard_id=0004}: Call to node 2 (localhost:15352) management API failed, will retry (attempt 1): receive body: error sending request for url (http://localhost:15352/v1/tenant/2011077aea9b4e8a60e8e8a19407634c-0004/location_config): client error (Connect) ref https://github.com/neondatabase/neon/issues/10462	2025-01-31 10:33:24 +00:00
Folke Behrens	6041a93591	Update tokio base crates (#10556 ) Update `tokio` base crates and their deps. Pin `tokio` to at least 1.41 which stabilized task ID APIs. To dedup `mio` dep the `notify` crate is updated. It's used in `compute_tools`. `9f81828429/compute_tools/src/pg_helpers.rs (L258-L367)`	2025-01-31 09:54:31 +00:00
Conrad Ludgate	738bf83583	chore: replace dashmap with clashmap (#10582 ) ## Problem Because dashmap 6 switched to hashbrown RawTable API, it required us to use unsafe code in the upgrade: https://github.com/neondatabase/neon/pull/8107 ## Summary of changes Switch to clashmap, a fork maintained by me which removes much of the unsafe and ultimately switches to HashTable instead of RawTable to remove much of the unsafe requirement on us.	2025-01-31 09:53:43 +00:00
Anna Stepanyan	423e239617	[infra/notes] impr: add issue types to issue templates (#10018 ) refs #0000 --------- Co-authored-by: Fedor Dikarev <fedor@neon.tech>	2025-01-31 06:29:06 +00:00
github-actions[bot]	a018878e27	Storage release 2025-01-31	2025-01-31 06:02:08 +00:00
Heikki Linnakangas	df87a55609	tests: Speed up test_pgdata_import_smoke on Postgres v17 (#10567 ) The test runs this query: select count(*), sum(data::bigint)::bigint from t to validate the test results between each part of the test. It performs a simple sequential scan and aggregation, but was taking an order of magnitude longer on v17 than on previous Postgres versions, which sometimes caused the test to time out. There were two reasons for that: 1. On v17, the planner estimates the table to have only one row. In reality it has 305790 rows, and older versions estimated it at 611580, which is not too bad given that the table has not been analyzed so the planner bases that estimate just on the number of pages and the widths of the datatypes. The new estimate of 1 row is much worse, and it leads the planner to disregard parallel plans, whereas on older versions you got a Parallel Seq Scan. I tracked this down to upstream commit 29cf61ade3, "Consider fillfactor when estimating relation size". With that commit, table_block_relation_estimate_size() function calculates that each page accommodates less than 1 row when the fillfactor is taken into account, which rounds down to 0. In reality, the executor will always place at least one row on a page regardless of fillfactor, but the new estimation formula doesn't take that into account. I reported this to pgsql-hackers (https://www.postgresql.org/message-id/2bf9d973-7789-4937-a7ca-0af9fb49c71e%40iki.fi), we don't need to do anything more about it in neon. It's OK to not use parallel scans here; once issue 2. below is addressed, the queries are fast enough without parallelism. 2. On v17, prefetching was not happening for the sequential scan. That's because starting with v17, buffers are reserved in the shared buffer cache before prefetching is initiated, and we use a tiny shared_buffers=1MB setting in the tests. 
The prefetching is effectively disabled with such a small shared_buffers setting, to protect the system from completely starving out of buffers. To address that, simply bump up shared_buffers in the test. This patch addresses the second issue, which is enough to fix the problem.	2025-01-30 22:55:17 +00:00
John Spray	5e0c40709f	storcon: refine chaos selection logic (#10600 ) ## Problem In https://github.com/neondatabase/neon/pull/10438 it was pointed out that it would be good to avoid picking tenants in ID order, and also to avoid situations where we might double-select the same tenant. There was an initial swing at this in https://github.com/neondatabase/neon/pull/10443, where Chi suggested a simpler approach which is done in this PR ## Summary of changes - Split total set of tenants into in and out of home AZ - Consume out of home AZ first, and if necessary shuffle + consume from in home AZ	2025-01-30 22:45:43 +00:00
John Spray	e1273acdb1	pageserver: handle shutdown cleanly in layer download API (#10598 ) ## Problem This API is used in tests and occasionally for support. It cast all errors to 500. That can cause a failure on the log checks: https://neon-github-public-dev.s3.amazonaws.com/reports/main/13056992876/index.html#suites/ad9c266207b45eafe19909d1020dd987/683a7031d877f3db/ ## Summary of changes - Avoid using generic anyhow::Error for layer downloads - Map shutdown cases to 503 in http route	2025-01-30 22:43:36 +00:00
John Spray	d18f6198e1	storcon: fix AZ-driven tenant selection in chaos (#10443 ) ## Problem In https://github.com/neondatabase/neon/pull/10438 I had got the function for picking tenants backwards, and it was preferring to move things _away_ from their preferred AZ. ## Summary of changes - Fix condition in `is_attached_outside_preferred_az`	2025-01-30 22:17:07 +00:00
John Spray	6da7c556c2	pageserver: fix race cleaning up timeline files when shut down during bootstrap (#10532 ) ## Problem Timeline bootstrap starts a flush loop, but doesn't reliably shut down the timeline (incl. waiting for flush loop to exit) before destroying UninitializedTimeline, and that destructor tries to clean up local storage. If local storage is still being written to, then this is unsound. Currently the symptom is that we see a "Directory not empty" error log, e.g. https://neon-github-public-dev.s3.amazonaws.com/reports/main/12966756686/index.html#testresult/5523f7d15f46f7f7/retries ## Summary of changes - Move fallible IO part of bootstrap into a function (notably, this is fallible in the case of the tenant being shut down while creation is happening) - When that function returns an error, call shutdown() on the timeline	2025-01-30 20:33:22 +00:00
a-masterov	bf6d5e93ba	Run tests of the contrib extensions (#10392 ) ## Problem We don't test the extensions, shipped with contrib ## Summary of changes The tests are now running	2025-01-30 19:32:35 +00:00
Arpad Müller	4d2c2e9460	Revert "storcon: switch to diesel-async and tokio-postgres (#10280 )" (#10592 ) There was a regression of #10280, tracked in [#23583](https://github.com/neondatabase/cloud/issues/23583). I have ideas how to fix the issue, but we are too close to the release cutoff, so revert #10280 for now. We can revert the revert later :).	2025-01-30 19:23:25 +00:00
John Spray	bae0de643e	tests: relax constraints on test_timeline_archival_chaos (#10595 ) ## Problem The test asserts that it completes at least 10 full timeline lifecycles, but the noisy CI environment sometimes doesn't meet that goal. Related: https://github.com/neondatabase/neon/issues/10389 ## Summary of changes - Sleep for longer between pageserver restarts, so that the timeline workers have more chance to make progress - Sleep for shorter between retries from timeline worker, so that they have better chance to get in while a pageserver is up between restarts - Relax the success condition to complete at least 5 iterations instead of 10	2025-01-30 19:22:59 +00:00
Cheng Chen	8293b252b2	chore(compute): pg_mooncake v0.1.1 (#10578 ) ## Problem Upgrade pg_mooncake to v0.1.1 ## Summary of changes https://github.com/Mooncake-Labs/pg_mooncake/blob/main/CHANGELOG.md#011-2025-01-29	2025-01-30 18:33:25 +00:00
Peter Bendel	6c8fc909d6	Benchmarking PostgreSQL17: for OLAP need specific connstr secrets (#10587 ) ## Problem for OLAP benchmarks we need specific connstr secrets with different database names for each job step This is a follow-up for https://github.com/neondatabase/neon/pull/10536 In previous PR we used a common GitHub secret for a shared re-use project that has 4 databases: neondb, tpch, clickbench and userexamples. [Failure example](https://neon-github-public-dev.s3.amazonaws.com/reports/main/13044872855/index.html#suites/54d0af6f403f1d8611e8894c2e07d023/fc029330265e9f6e/): ```log # /tmp/neon/pg_install/v17/bin/psql user=neondb_owner dbname=neondb host=ep-broad-brook-w2luwzzv.us-east-2.aws.neon.build sslmode=require options='-cstatement_timeout=0 ' -c -- $ID$ -- TPC-H/TPC-R Pricing Summary Report Query (Q1) -- Functional Query Definition -- Approved February 1998 ... ERROR: relation "lineitem" does not exist ``` ## Summary of changes We need dedicated GitHub secrets and dedicated connection strings for each of the use cases. ## Test run https://github.com/neondatabase/neon/actions/runs/13053968231	2025-01-30 16:41:46 +00:00
Heikki Linnakangas	efe42db264	tests: test_pgdata_import_smoke requires the 'testing' cargo feature (#10569 ) It took me ages to figure out why it was failing on my laptop. What I saw was that when the test makes the 'import_pgdata' in the pageserver, the pageserver actually performs a regular 'bootstrap' timeline creation by running initdb, with no importing. It boiled down to the json request that the test uses: ``` { "new_timeline_id": str(timeline_id), "import_pgdata": { "idempotency_key": str(idempotency), "location": {"LocalFs": {"path": str(importbucket.absolute())}}, }, }, ``` and how serde deserializes into rust structs. The 'LocalFs' enum variant in `models.rs` is gated on the 'testing' cargo feature. On a non-testing build, that got deserialized into the default Bootstrap enum variant, as a valid TimelineCreateRequestModeImportPgdata variant could not be formed. PS. IMHO we should get rid of the testing feature, compile in all the functionality, and have a runtime flag to disable anything dangerous. With that, you would've gotten a nice "feature only enabled in testing mode" error in this case, or the test would've simply worked. But that's another story.	2025-01-30 16:11:26 +00:00
Alex Chi Z.	cf6dee946e	fix(pageserver): gc-compaction race with read (#10543 ) ## Problem close https://github.com/neondatabase/neon/issues/10482 ## Summary of changes Add an extra lock on the read path to protect against races. The read path has an implication that only certain kinds of compactions can be performed. Garbage keys must first have an image layer covering the range, and then be gc-ed -- they cannot be done in one operation. An alternative to fix this is to move the layers read guard to be acquired at the beginning of `get_vectored_reconstruct_data_timeline`, but that was intentionally optimized out and I don't want to regress. The race is not limited to image layers. Gc-compaction will consolidate deltas automatically and produce a flat delta layer (i.e., when we have retain_lsns below the gc-horizon). The same race would also cause behaviors like getting an un-replayable key history as in https://github.com/neondatabase/neon/issues/10049. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-30 15:25:29 +00:00
Alexey Kondratov	be51b10da7	chore(compute): Print some compute_ctl errors in debug mode (#10586 ) ## Problem In some cases, we were returning a very shallow error like `error sending request for url (XXX)`, which made it very hard to figure out the actual error. ## Summary of changes Use `{:?}` in a few places, and remove it from places where we were printing a string anyway.	2025-01-30 14:31:49 +00:00
Arpad Müller	93714c4c7b	secondary downloader: load metadata on loading of timeline (#10539 ) Related to #10308, we might have legitimate changes in file size or generation. Those changes should not cause warn log lines. In order to detect changes of the generation number while the file size stayed the same, load the metadata that we store on disk on loading of the timeline. Still do a comparison with the on-disk layer sizes to find any discrepancies that might occur due to race conditions (new metadata file gets written but layer file has not been updated yet, and PS shuts down). However, as it's possible to hit it in a race condition, downgrade it to a warning. Also fix a mistake in #10529: we want to compare the old with the new metadata, not the old metadata with itself.	2025-01-30 12:03:36 +00:00
John Spray	ab627ad9fd	storcon_cli: fix spurious error setting preferred AZ (#10568 ) ## Problem The client code for `tenant-set-preferred-az` declared response type `()`, so printed a spurious error on each use: ``` Error: receive body: error decoding response body: invalid type: map, expected unit at line 1 column 0 ``` The requests were successful anyway. ## Summary of changes - Declare the proper return type, so that the command succeeds quietly.	2025-01-30 11:54:02 +00:00
Erik Grinaker	6a2afa0c02	pageserver: add per-timeline read amp histogram (#10566 ) ## Problem We don't have per-timeline observability for read amplification. Touches https://github.com/neondatabase/cloud/issues/23283. ## Summary of changes Add a per-timeline `pageserver_layers_per_read` histogram. NB: per-timeline histograms are expensive, but probably worth it in this case.	2025-01-30 11:24:49 +00:00
Alexander Bayandin	8804d58943	Nightly Benchmarks: use pgbench from artifacts (#10370 ) We don't use statically linked OpenSSL anymore (#10302), it's ok to switch to Neon's pgbench for pgvector benchmarks	2025-01-30 11:18:07 +00:00
Erik Grinaker	d3db96c211	pageserver: add `pageserver_deltas_per_read_global` metric (#10570 ) ## Problem We suspect that Postgres checkpoints will limit the number of page deltas necessary to reconstruct a page, but don't know for certain. Touches https://github.com/neondatabase/cloud/issues/23283. ## Summary of changes Add `pageserver_deltas_per_read_global` metric. This pairs with `pageserver_layers_per_read_global` from #10573.	2025-01-30 10:55:07 +00:00
Erik Grinaker	b24727134c	pageserver: improve read amp metric (#10573 ) ## Problem The current global `pageserver_layers_visited_per_vectored_read_global` metric does not appear to accurately measure read amplification. It divides the layer count by the number of reads in a batch, but this means that e.g. 10 reads with 100 L0 layers will only measure a read amp of 10 per read, while the actual read amp was 100. While the cost of layer visits is amortized across the batch, and some layers may not intersect with a given key, each visited layer contributes directly to the observed latency for every read in the batch, which is what we care about. Touches https://github.com/neondatabase/cloud/issues/23283. Extracted from #10566. ## Summary of changes * Count the number of layers visited towards each read in the batch, instead of the average across the batch. * Rename `pageserver_layers_visited_per_vectored_read_global` to `pageserver_layers_per_read_global`. * Reduce the read amp log warning threshold down from 512 to 100.	2025-01-30 09:27:40 +00:00
Alexander Lakhin	a7a706cff7	Fix submodule reference after #10473 (#10577 )	2025-01-30 09:09:43 +00:00
Alex Chi Z.	77ea9b16fe	fix(pageserver): use the larger one of upper limit and threshold (#10571 ) ## Problem Follow up of https://github.com/neondatabase/neon/pull/10550 in case the upper limit is set larger than threshold. It does not make sense for someone to enforce the behavior like "if there are >= 50 L0s, only compact 10 of them". ## Summary of changes Use the maximum of compaction threshold and upper limit when selecting L0 files to compact. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-30 00:05:40 +00:00
Alex Chi Z.	9dff6cc2a4	fix(pageserver): skip repartition if we need L0 compaction (#10547 ) ## Problem Repartition is slow, but it's only used in image layer creation. We can skip it if we have a lot of L0 layers to ingest. ## Summary of changes If L0 compaction is not complete, do not repartition and do not create image layers. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-29 21:32:50 +00:00
Erik Grinaker	ff298afb97	pageserver: add `level` for timeline layer metrics (#10563 ) ## Problem We don't have good observability for per-timeline compaction debt, specifically the number of delta layers in the frozen, L0, and L1 levels. Touches https://github.com/neondatabase/cloud/issues/23283. ## Summary of changes * Add a `level` label for `pageserver_layer_{count,size}` with values `l0`, `l1`, and `frozen`. * Track metrics for frozen layers. There is already a `kind={delta,image}` label. `kind=image` is only possible for `level=l1`. We don't include the currently open ephemeral layer, only frozen layers. There is always exactly 1 ephemeral layer, with a dynamic size which is already tracked in `pageserver_timeline_ephemeral_bytes`.	2025-01-29 21:10:56 +00:00
Fedor Dikarev	de1c35fab3	add retries for apt, wget and curl (#10553 ) Ref: https://github.com/neondatabase/cloud/issues/23461 ## Problem > recent CI failure due to apt-get: ``` 4.266 E: Failed to fetch http://deb.debian.org/debian/pool/main/g/gcc-10/libgfortran5_10.2.1-6_arm64.deb Error reading from server - read (104: Connection reset by peer) [IP: 146.75.122.132 80] ``` https://github.com/neondatabase/neon/actions/runs/11144974698/job/30973537767?pr=9186 thinking about if there should be a mirror-selector at the beginning of the dockerfile so that it uses a debian mirror closer to the build server? ## Summary of changes We could consider adding a local mirror or proxy and keeping it close to our self-hosted runners. For now let's just add retries for `apt`, `wget` and `curl`. Thanks to @skyzh for reporting that in October 2024, I just finally found time to take a look here :)	2025-01-29 21:02:54 +00:00
Peter Bendel	62819aca36	Add PostgreSQL version 17 benchmarks (#10536 ) ## Problem benchmarking.yml so far is only running benchmarks with PostgreSQL version 16. However neon recently changed the default for new customers to PostgreSQL version 17. See related [epic](https://github.com/neondatabase/cloud/issues/23295) ## Summary of changes We do not want to run every job step with both pg 16 and 17 because this would need excessive resources (runners, computes) and extend the benchmarking run wall clock time too much. So we select an opinionated subset of testcases that we also report in weekly reporting and add a postgres v17 job step. For re-use projects associated Neon projects have been created and connection strings have been added to neon database organization secrets. A follow up is to add the reporting for these new runs to some grafana dashboards.	2025-01-29 20:21:42 +00:00
Tristan Partin	707a926057	Remove unused compute_ctl HTTP routes (#10544 ) These are not used anywhere within the platform, so let's remove dead code. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-29 19:22:01 +00:00
Alex Chi Z.	5bcefb4ee1	fix(pageserver): compaction perftest wrt upper limit (#10564 ) ## Problem The config is added in https://github.com/neondatabase/neon/pull/10550 causing behavior change for l0 compaction. close https://github.com/neondatabase/neon/issues/10562 ## Summary of changes Fix the test case to consider the effect of upper_limit. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-29 18:43:39 +00:00
Alexey Kondratov	34322b2424	chore(compute): Simplify new compute_ctl metrics and fix flaky test (#10560 ) ## Problem 1. `d04d924` added separate metrics for total requests and failures, but it doesn't make much sense. We could just have a unified counter with `http_status`. 2. `test_compute_migrations_retry` had a race, i.e., it was waiting for the last successful migration, not an actual failure. This was revealed after adding an assert on failure metric in `d04d924`. ## Summary of changes 1. Switch to unified counters for `compute_ctl` requests. 2. Add a waiting loop into `test_compute_migrations_retry` to eliminate the race. Part of neondatabase/cloud#17590	2025-01-29 18:09:25 +00:00
Vlad Lazar	fdfbc7b358	pageserver: hold GC while reading from a timeline (#10559 ) ## Problem If we are GC-ing because a new image layer was added while traversing the timeline, then it will remove layers that are required for fulfilling the current get request (read-path cannot "look back" and notice the new image layer). ## Summary of Changes Prevent GC from progressing on the current timeline while it is being visited for a read. Epic: https://github.com/neondatabase/neon/issues/9376	2025-01-29 17:08:25 +00:00
Conrad Ludgate	190c19c034	chore: update rust-postgres on rebase (#10561 ) I tried a full update of our tokio-postgres fork before. We hit some breaking change. This PR only pulls in ~50% of the changes from upstream: https://github.com/neondatabase/rust-postgres/pull/38.	2025-01-29 17:02:07 +00:00
Mikhail Kot	34e560fe37	download exporters from releases rather than using docker images (#10551 ) Use releases for postgres-exporter, pgbouncer-exporter, and sql-exporter	2025-01-29 15:52:00 +00:00
Tristan Partin	7922458b98	Use num_cpus from the workspace in pageserver (#10545 ) Luckily they were the same version, so we didn't spend time compiling two versions, which could have been the case in the future. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-29 15:45:36 +00:00
a-masterov	34d9e2d8e3	Add a test for GraphQL (#10156 ) ## Problem We currently don't run the tests shipped with `pg_graphql`. ## Summary of changes The tests for `pg_graphql` are added.	2025-01-29 15:01:56 +00:00
Conrad Ludgate	2f82c21c63	chore: update rust-postgres fork (#10557 ) I updated the fork to fix some lints. Cargo keeps getting confused by it so let's just update the lockfile here	2025-01-29 12:55:24 +00:00
Ivan Efremov	222cc181e9	impr(proxy): Move the CancelMap to Redis hashes (#10364 ) ## Problem The approach of having CancelMap as an in-memory structure increases code complexity, as well as putting additional load for Redis streams. ## Summary of changes - Implement a set of KV ops for Redis client; - Remove cancel notifications code; - Send KV ops over the bounded channel to the handling background task for removing and adding the cancel keys. Closes #9660	2025-01-29 11:19:10 +00:00
alexanderlaw	4d2328ebe3	Fix C code to satisfy sanitizers (#10473 )	2025-01-29 10:05:43 +00:00
a-masterov	9f81828429	Test extension upgrade compatibility (#10244 ) ## Problem We have to test the extensions, shipped with Neon for compatibility before the upgrade. ## Summary of changes Added the test for compatibility with the upgraded extensions.	2025-01-29 09:19:11 +00:00
Arseny Sher	9ab13d6e2c	Log statements in test_layer_map (#10554 ) ## Problem test_layer_map doesn't log statements and it is not clear how long they take. ## Summary of changes Do log them. ref https://github.com/neondatabase/neon/issues/10409	2025-01-29 09:16:00 +00:00
Alex Chi Z.	983e18e63e	feat(pageserver): add compaction_upper_limit config (#10550 ) ## Problem Follow-up of the incident, we should not use the same bound on lower/upper limit of compaction files. This patch adds an upper bound limit, which is set to 50 for now. ## Summary of changes Add `compaction_upper_limit`. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2025-01-28 23:18:32 +00:00
Alex Chi Z.	b735df6ff0	fix(pageserver): make image layer generation atomic (#10516 ) ## Problem close https://github.com/neondatabase/neon/issues/8362 ## Summary of changes Use `BatchLayerWriter` to ensure we clean up image layers after failed compaction. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-28 21:29:51 +00:00
Fedor Dikarev	68cf0ba439	run benchmark tests on small-metal runners (#10549 ) ## Problem Ref: https://github.com/neondatabase/cloud/issues/23314 We suspect some inconsistency in benchmark test runs could be due to the different types of runners they land on. To align them in terms of both failure rates and benchmark results, let's run them for now on `small-metal` servers and watch how test stability progresses. ## Summary of changes	2025-01-28 21:26:38 +00:00
Alexey Kondratov	d04d924649	feat(compute): Add some basic compute_ctl metrics (#10504 ) ## Problem There are several parts of `compute_ctl` with a very low visibility of errors: 1. DB migrations that run async in the background after compute start. 2. Requests made to control plane (currently only `GetSpec`). 3. Requests made to the remote extensions server. ## Summary of changes Add new counters to quickly evaluate the amount of errors among the fleet. Part of neondatabase/cloud#17590	2025-01-28 19:24:07 +00:00
JC Grünhage	f5fdaa6dc6	feat(ci): generate basic release notes with links (#10511 ) ## Problem https://github.com/neondatabase/neon/pull/10448 removed release notes, because if their generation failed, the whole release was failing. People liked them though, and wanted some basic release notes as a fall-back instead of completely removing them. ## Summary of changes Include basic release notes that link to the release PR and to a diff to the previous release.	2025-01-28 19:13:39 +00:00
Vlad Lazar	c54cd9e76a	storcon: signal LSN wait to pageserver during live migration (#10452 ) ## Problem We've seen the ingest connection manager get stuck shortly after a migration. ## Summary of changes A speculative mitigation is to use the same mechanism as get page requests for kicking LSN ingest. The connection manager monitors LSN waits and queries the broker if no updates are received for the timeline. Closes https://github.com/neondatabase/neon/issues/10351	2025-01-28 17:33:07 +00:00
Erik Grinaker	1010b8add4	pageserver: add `l0_flush_wait_upload` setting (#10534 ) ## Problem We need a setting to disable the flush upload wait, to test L0 flush backpressure in staging. ## Summary of changes Add `l0_flush_wait_upload` setting.	2025-01-28 17:21:05 +00:00
Folke Behrens	ae4b2af299	fix(proxy): Use correct identifier for usage metrics upload (#10538 ) ## Problem The request data and usage metrics S3 requests use the same identifier shown in logs, causing confusion about what type of upload failed. ## Summary of changes Use the correct identifier for usage metrics uploads. neondatabase/cloud#23084	2025-01-28 17:08:17 +00:00
Tristan Partin	15fecb8474	Update axum to 0.8.1 (#10332 ) Only a few things that needed updating: - async_trait was removed - Message::Text takes a Utf8Bytes object instead of a String Signed-off-by: Tristan Partin <tristan@neon.tech> Co-authored-by: Conrad Ludgate <connor@neon.tech>	2025-01-28 15:32:59 +00:00
Erik Grinaker	47677ba578	pageserver: disable L0 backpressure by default (#10535 ) ## Problem We'll need further improvements to compaction before enabling L0 flush backpressure by default. See: https://neondb.slack.com/archives/C033RQ5SPDH/p1738066068960519?thread_ts=1737818888.474179&cid=C033RQ5SPDH. Touches #5415. ## Summary of changes Disable `l0_flush_delay_threshold` by default.	2025-01-28 14:51:30 +00:00
Arpad Müller	83b6bfa229	Re-download layer if its local and on-disk metadata diverge (#10529 ) In #10308, we noticed many warnings about the local layer having different sizes on-disk compared to the metadata. However, the layer downloader would never redownload layer files if the sizes or generation numbers change. This is obviously a bug, which we aim to fix with this PR. This change also moves the code deciding what to do about a layer to a dedicated function: before we handled the "routing" via control flow, but now it's become too complicated and it is nicer to have the different verdicts for a layer spelled out in a list/match.	2025-01-28 13:39:53 +00:00
Erik Grinaker	ed942b05f7	Revert "pageserver: revert flush backpressure (#10402 )" (#10533 ) This reverts commit `9e55d79803`. We'll still need this until we can tune L0 flush backpressure and compaction. I'll add a setting to disable this separately.	2025-01-28 13:33:58 +00:00
Vlad Lazar	62a717a2ca	pageserver: use PS node id for SK appname (#10522 ) ## Problem This one is fairly embarrassing. Safekeeper node id was used in the pageserver application name when connecting to safekeepers. ## Summary of changes Use the right node id. Closes https://github.com/neondatabase/neon/issues/10461	2025-01-28 13:11:51 +00:00
Peter Bendel	c8fbbb9b65	Test ingest_benchmark with different stripe size and also PostgreSQL version 17 (#10510 ) We want to verify if pageserver stripe size has an impact on ingest performance. We want to verify if ingest performance has improved or regressed with postgres version 17. ## Summary of changes - Allow to create new project with different postgres versions - allow to pre-shard new project with different stripe sizes instead of relying on storage manager to shard_split the project once a threshold is exceeded Replaces https://github.com/neondatabase/neon/pull/10509 Test run https://github.com/neondatabase/neon/actions/runs/12986410381	2025-01-27 21:06:05 +00:00
John Spray	d73f4a6470	pageserver: retry wrapper on manifest upload (#10524 ) ## Problem On remote storage errors (e.g. I/O timeout) uploading tenant manifest, all of compaction could fail. This is a problem IRL because we shouldn't abort compaction on a single IO error, and in tests because it generates spurious failures. Related: https://github.com/orgs/neondatabase/projects/51/views/2?sliceBy%5Bvalue%5D=jcsp&pane=issue&itemId=93692919&issue=neondatabase%7Cneon%7C10389 ## Summary of changes - Use `backoff::retry` when uploading tenant manifest	2025-01-27 21:02:25 +00:00
Heikki Linnakangas	5477d7db93	fast_import: fixes for Postgres v17 (#10414 ) Now that the tests are run on v17, they're also run in debug mode, which is slow. Increase statement_timeout in the test to work around that.	2025-01-27 19:47:49 +00:00
Arpad Müller	eb9832d846	Remove PQ_LIB_DIR env var (#10526 ) We now don't need libpq any more for the build of the storage controller, as we use `diesel-async` since #10280. Therefore, we remove the env var that gave cargo/rustc the location for libpq. Follow-up of #10280	2025-01-27 19:38:18 +00:00
Christian Schwarz	3d36dfe533	fix: noisy `broker subscription failed` error during storage broker deploys (#10521 ) During broker deploys, pageservers log this noisy WARN en masse. I can trivially reproduce the WARN message in neon_local by SIGKILLing broker during e.g. `pgbench -i`. I don't understand why tonic is not detecting the error as `Code::Unavailable`. Until we find time to understand that / fix upstream, this PR adds the error message to the existing list of known error messages that get demoted to INFO level. Refs: - refs https://github.com/neondatabase/neon/issues/9562	2025-01-27 19:19:55 +00:00
John Spray	ebf44210ba	remote_storage: less sensitive timeout logging in ABS listings (#10518 ) ## Problem We were logging a warning after a single request timeout, while listing objects. Closes: https://github.com/neondatabase/neon/issues/10166 ## Summary of changes - These timeouts are a pretty normal part of life, so back it off to only log a warning after two in a row.	2025-01-27 17:44:18 +00:00
John Spray	aabf455dfb	README: clarify that neon_local is a dev/test tool (#10512 ) ## Problem From time to time, folks discover our `control_plane/` folder and make the (reasonable) mistake of thinking it's a tool for running full-sized Neon systems, whereas in reality it is a tool for dev/test. ## Summary of changes - Change control_plane's readme title to "Local Development Control Plane (`neon_local`)" - Change "Running local installation" to "Running a local development environment" in the main readme	2025-01-27 17:24:42 +00:00
John Spray	aec92bfc34	pageserver: decrease utilization MAX_SHARDS (#10489 ) ## Problem The intent of this parameter is to have pageservers consider themselves "full" if they've got lots of shards, even if they have plenty of capacity. It works, but because we typically successfully oversubscribe capacity up to 200%, the MAX_SHARDS limit is effectively doubled, so this 20,000 value ends up meaning 40,000, whereas the original intent was to limit nodes to ~10000 shards. ## Summary of changes - Change MAX_SHARDS to 5000, so that a node with 5000 will get a 100% utilization, which is equivalent in practice to being considered "half full" by the storage controller in capacity terms. This is all a bit subtle and indirect. Originally the limit was baked into the pageserver with the idea that the pageserver knows better what its own resources tolerate than the storage controller does, but in practice it would probably be easier to understand all this if we just did it controller-side. So there's scope to refactor here in future.	2025-01-27 17:03:32 +00:00
Arpad Müller	b0b4b7dd8f	storcon: switch to diesel-async and tokio-postgres (#10280 ) Switches the storcon away from using diesel's synchronous APIs in favour of `diesel-async`. Advantages: * less C dependencies, especially no openssl, which might be behind the bug: https://github.com/neondatabase/cloud/issues/21010 * Better to only have async than mix of async plus `spawn_blocking` We had to turn off usage of the connection pool for migrations, as diesel migrations don't support async APIs. Thus we still use `spawn_blocking` in that one place. But this is explicitly done in one of the `diesel-async` examples.	2025-01-27 14:25:11 +00:00
Mikhail Kot	4dd4096f11	Pgbouncer exporter in compute image (#10503 ) https://github.com/neondatabase/cloud/issues/19081 Include pgbouncer_exporter in compute image and run it at port 9127	2025-01-27 14:09:21 +00:00
Erik Grinaker	be718ed121	pageserver: disable L0 flush stalls, tune delay threshold (#10507 ) ## Problem In ingest benchmarks, we see L0 compaction delays of over 10 minutes due to image compaction. We can't stall L0 flushes for that long. ## Summary of changes Disable L0 flush stalls, and bump the default L0 flush delay threshold from 20 to 30 L0 layers.	2025-01-25 16:51:54 +00:00
Christian Schwarz	e5b3eb1e64	Merge pull request #10500 from neondatabase/rc/release/2025-01-24 Storage release 2025-01-24	2025-01-25 00:54:56 +01:00
Konstantin Knizhnik	9f1408fdf3	Do not assign max(lsn) to maxLastWrittenLsn in SetLastWrittenLSNForBlockv (#10474 ) ## Problem See https://github.com/neondatabase/neon/issues/10281 `SetLastWrittenLSNForBlockv` is assigning max(lsn) to `maxLastWrittenLsn` while it should contain only the max LSN not present in the LwLSN cache. This causes unnecessary waits in PS. ## Summary of changes Restore status-quo for pg17. Related Postgres PR: https://github.com/neondatabase/postgres/pull/563 --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-01-24 14:57:32 +00:00
Conrad Ludgate	7000aaaf75	chore: fix h2 stubgen (#10491 ) ## Problem ## Summary of changes --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-01-24 14:55:48 +00:00
Erik Grinaker	ef2a2555b1	pageserver: tighten compaction failure detection (#10502 ) ## Problem If compaction fails, we disable L0 flush stalls to avoid persistent stalls. However, the logic would unset the failure marker on offload failures or shutdown. This can lead to sudden L0 flush stalls if we try and fail to offload a timeline with compaction failures, or if there is some kind of shutdown race. Touches #10405. ## Summary of changes Don't touch the compaction failure marker on offload failures or shutdown.	2025-01-24 13:55:05 +00:00
Konstantin Knizhnik	d8ab6ddb0f	Check if relation has storage in calculate_relation_size (#10477 ) ## Problem The parent of a partitioned table has no storage; its relfilelocator is zero. It can be incorrectly hashed and produce wrong results. See https://github.com/neondatabase/postgres/pull/518 ## Summary of changes This problem is already addressed in pg17. Add the same check for all other PG versions. Postgres PRs: https://github.com/neondatabase/postgres/pull/566 https://github.com/neondatabase/postgres/pull/565 https://github.com/neondatabase/postgres/pull/564 Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-01-24 12:43:52 +00:00
JC Grünhage	dcc437da1d	Make promote-images-prod depend on promote-images-dev (#10494 ) ## Problem After talking about it again with @bayandin, this should replace the changes from https://github.com/neondatabase/neon/pull/10475. While the previous changes worked, they are less visually clear in what happens, and we might end up in a situation where we update `latest`, but don't actually have the tagged image pushed that contains the same changes. The latter would result in potentially hard to debug situations. ## Summary of changes Revert `c283aaaf8d` and make promote-images-prod depend on promote-images-dev instead.	2025-01-24 11:03:39 +00:00
a-masterov	c286fea018	Print logs in extensions test in another step to improve readability (#10483 ) ## Problem The containers' log output is mixed with the tests' output, so you must scroll up to find the error. ## Summary of changes Printing of containers' logs moved to a separate step.	2025-01-24 10:44:48 +00:00
Vlad Lazar	de8276488d	tests: enable wal reader fanout in tests (#10301 ) Note: this has to merge after the release is cut on `2025-01-17` for compat tests to start passing. ## Problem SK wal reader fan-out is not enabled in tests by default. ## Summary of changes Enable it.	2025-01-24 10:34:57 +00:00
Erik Grinaker	ddb9ae1214	pageserver: add compaction backpressure for layer flushes (#10405 ) ## Problem There is no direct backpressure for compaction and L0 read amplification. This allows a large buildup of compaction debt and read amplification. Resolves #5415. Requires #10402. ## Summary of changes Delay layer flushes based on the number of level 0 delta layers: * `l0_flush_delay_threshold`: delay flushes such that they take 2x as long (default `2 * compaction_threshold`). * `l0_flush_stall_threshold`: stall flushes until level 0 delta layers drop below threshold (default `4 * compaction_threshold`). If either threshold is reached, ephemeral layer rolls also synchronously wait for layer flushes to propagate this backpressure up into WAL ingestion. This will bound the number of frozen layers to 1 once backpressure kicks in, since all other frozen layers must flush before the rolled layer. ## Analysis This will significantly change the compute backpressure characteristics. Recall the three compute backpressure knobs: * `max_replication_write_lag`: 500 MB (based on Pageserver `last_received_lsn`). * `max_replication_flush_lag`: 10 GB (based on Pageserver `disk_consistent_lsn`). * `max_replication_apply_lag`: disabled (based on Pageserver `remote_consistent_lsn`). Previously, the Pageserver would keep ingesting WAL and build up ephemeral layers and L0 layers until the compute hit `max_replication_flush_lag` at 10 GB and began backpressuring. Now, once we delay/stall WAL ingestion, the compute will begin backpressuring after `max_replication_write_lag`, i.e. 500 MB. This is probably a good thing (we're not building up a ton of compaction debt), but we should consider tuning these settings. `max_replication_flush_lag` probably doesn't serve a purpose anymore, and we should consider removing it. Furthermore, the removal of the upload barrier in #10402 will mean that we no longer backpressure flushes based on S3 uploads, since `max_replication_apply_lag` is disabled. 
We should consider enabling this as well. ### When and what do we compact? Default compaction settings: * `compaction_threshold`: 10 L0 delta layers. * `compaction_period`: 20 seconds (between each compaction loop check). * `checkpoint_distance`: 256 MB (size of L0 delta layers). * `l0_flush_delay_threshold`: 20 L0 delta layers. * `l0_flush_stall_threshold`: 40 L0 delta layers. Compaction characteristics: * Minimum compaction volume: 10 layers * 256 MB = 2.5 GB. * Additional compaction volume (assuming 128 MB/s WAL): 128 MB/s * 20 seconds = 2.5 GB (10 L0 layers). * Required compaction bandwidth: 5.0 GB / 20 seconds = 256 MB/s. ### When do we hit `max_replication_write_lag`? Depending on how fast compaction and flushes happen, the compute will backpressure somewhere between `l0_flush_delay_threshold` and `l0_flush_stall_threshold`, plus `max_replication_write_lag`. * Minimum compute backpressure lag: 20 layers * 256 MB + 500 MB = 5.6 GB * Maximum compute backpressure lag: 40 layers * 256 MB + 500 MB = 10.7 GB This seems like a reasonable range to me.	2025-01-24 09:47:28 +00:00
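The figures in the analysis of #10405 above can be reproduced with a few lines of arithmetic. The sketch below is not Neon code: the variable and function names are chosen for illustration, the defaults are the ones quoted in the commit message, and the 128 MB/s WAL rate is the assumption stated there. Results are kept in MB so that no GB rounding convention has to be picked.

```python
# Defaults quoted in the commit message (illustrative variable names).
compaction_threshold = 10            # L0 delta layers
compaction_period_s = 20             # seconds between compaction loop checks
checkpoint_distance_mb = 256         # size of one L0 delta layer
max_replication_write_lag_mb = 500
wal_rate_mb_s = 128                  # assumed ingest rate from the message

l0_flush_delay_threshold = 2 * compaction_threshold  # 20 layers
l0_flush_stall_threshold = 4 * compaction_threshold  # 40 layers


def compaction_figures() -> dict:
    """Compute the compaction and backpressure figures, all in MB or MB/s."""
    min_volume = compaction_threshold * checkpoint_distance_mb
    extra_volume = wal_rate_mb_s * compaction_period_s
    return {
        "min_compaction_volume_mb": min_volume,
        "additional_volume_mb": extra_volume,
        "required_bandwidth_mb_s": (min_volume + extra_volume) / compaction_period_s,
        "min_backpressure_lag_mb": l0_flush_delay_threshold * checkpoint_distance_mb
        + max_replication_write_lag_mb,
        "max_backpressure_lag_mb": l0_flush_stall_threshold * checkpoint_distance_mb
        + max_replication_write_lag_mb,
    }


if __name__ == "__main__":
    for name, value in compaction_figures().items():
        print(f"{name}: {value}")
```

Changing `compaction_threshold` or `checkpoint_distance_mb` shows how quickly the compute backpressure window moves, which is useful when considering the tuning the message suggests.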
Erik Grinaker	9e55d79803	Reapply "pageserver: revert flush backpressure" (#10270 ) (#10402 ) This reapplies #10135. Just removing this flush backpressure without further mitigations caused read amp increases during bulk ingestion (predictably), so it was reverted. We will replace it by compaction-based backpressure. ## Problem In #8550, we made the flush loop wait for uploads after every layer. This was to avoid unbounded buildup of uploads, and to reduce compaction debt. However, the approach has several problems: * It prevents upload parallelism. * It prevents flush and upload pipelining. * It slows down ingestion even when there is no need to backpressure. * It does not directly backpressure based on compaction debt and read amplification. We will instead implement compaction-based backpressure in a PR immediately following this removal (#5415). Touches #5415. Touches #10095. ## Summary of changes Remove waiting on the upload queue in the flush loop.	2025-01-24 08:35:35 +00:00
github-actions[bot]	f35e1356a1	Storage release 2025-01-24	2025-01-24 06:02:13 +00:00
Alex Chi Z.	8d47a60de2	fix(pageserver): handle dup layers during gc-compaction (#10430 ) ## Problem If gc-compaction decides to rewrite an image layer, it will now cause index_part to lose reference to that layer. In detail, * Assume there's only one image layer of key 0000...AAAA at LSN 0x100 and generation 0xA in the system. * gc-compaction kicks in at gc-horizon 0x100, and then produces 0000...AAAA at LSN 0x100 and generation 0xB. * It submits a compaction result update into the index part that unlinks 0000-AAAA-100-A and adds 0000-AAAA-100-B On the remote storage / local disk side, this is fine -- it unlinks things correctly and uploads the new file. However, the `index_part.json` itself doesn't record generations. The buggy procedure is as follows: 1. upload the new file 2. update the index part to remove the old file and add the new file 3. remove the new file Therefore, the correct update result process for gc-compaction should be as follows: * When modifying the layer map, delete the old one and upload the new one. * When updating the index, upload the new one in the index without deleting the old one. ## Summary of changes * Modify `finish_gc_compaction` to correctly order insertions and deletions. * Update the way gc-compaction uploads the layer files. * Add new tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-23 21:54:44 +00:00
Alexey Kondratov	6166482589	feat(compute): Automatically create release PRs (#10495 ) We've finally transitioned to using a separate `release-compute` branch. Now, we can finally automatically create release PRs on Fri and release them during the following week. Part of neondatabase/cloud#11698	2025-01-23 20:47:20 +00:00
Arpad Müller	ca6d72ba2a	Increase reconciler timeout after shard split (#10490 ) Sometimes, especially when the host running the tests is overloaded, we can run into reconcile timeouts in `test_timeline_ancestor_detach_idempotent_success`, making the test flaky. By increasing the timeouts from 30 seconds to 120 seconds, we can address the flakiness. Fixes #10464	2025-01-23 16:43:04 +00:00
a-masterov	b6c0f66619	CI(autocomment): add the lfc state (#10121 ) ## Problem Currently, the report does not contain the LFC state of the failed tests. ## Summary of changes Added the LFC state to the link to the allure report. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-01-23 14:52:07 +00:00
Mikhail Kot	3702ec889f	Enable postgres_fdw (#10426 ) Update compute image to include postgres_fdw #3720	2025-01-23 13:22:31 +00:00
Anastasia Lubennikova	8e8df1b453	Disable logical replication subscribers (#10249 ) Drop logical replication subscribers before compute starts on a non-main branch. Add new compute_ctl spec flag: drop_subscriptions_before_start If it is set, drop all the subscriptions from the compute node before it starts. To avoid race on compute start, use new GUC neon.disable_logical_replication_subscribers to temporarily disable logical replication workers until we drop the subscriptions. Ensure that we drop subscriptions exactly once when endpoint starts on a new branch. It is essential, because otherwise, we may drop not only inherited, but newly created subscriptions. We cannot rely only on spec.drop_subscriptions_before_start flag, because if for some reason compute restarts inside VM, it will start again with the same spec and flag value. To handle this, we save the fact of the operation in the database in the neon.drop_subscriptions_done table. If the table does not exist, we assume that the operation was never performed, so we must do it. If table exists, we check if the operation was performed on the current timeline. fixes: https://github.com/neondatabase/neon/issues/8790	2025-01-23 11:02:15 +00:00
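The exactly-once rule described in #10249 above can be sketched as a small decision function. This is a hedged illustration rather than the `compute_ctl` implementation: the function and parameter names are hypothetical, and the lookup of the `neon.drop_subscriptions_done` table is represented by plain arguments instead of a database query.

```python
from typing import Optional


def should_drop_subscriptions(
    flag_set: bool,
    done_table_exists: bool,
    recorded_timeline: Optional[str],
    current_timeline: str,
) -> bool:
    """Hypothetical helper: decide whether subscriptions must be dropped now.

    flag_set           -- value of the spec flag drop_subscriptions_before_start
    done_table_exists  -- whether neon.drop_subscriptions_done exists
    recorded_timeline  -- timeline recorded in that table, if any (assumed shape)
    current_timeline   -- timeline the compute is starting on
    """
    if not flag_set:
        return False
    if not done_table_exists:
        # No record at all: assume the operation was never performed.
        return True
    # A record exists: drop only if it was made for a different timeline.
    return recorded_timeline != current_timeline


if __name__ == "__main__":
    # A restart inside the VM reuses the same spec and flag, but must not drop again.
    print(should_drop_subscriptions(True, True, "tl-b", "tl-b"))
```

Keeping the decision separate from the SQL that performs the drop makes the restart case easy to reason about: the flag alone never triggers a second drop on the same timeline.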
Alex Chi Z.	92d95b08cf	fix(pageserver): extend split job key range to the end (#10484 ) ## Problem Not really a bug fix, but hopefully can reproduce https://github.com/neondatabase/neon/issues/10482 more. If the layer map does not contain layers that end at exactly the end range of the compaction job, the current split algorithm will produce the last job that ends at the maximum layer key. This patch extends it all the way to the compaction job end key. For example, the user requests a compaction of 0000...FFFF. However, we only have a layer 0000..3000 in the layer map, and the split job will have a range of 0000..3000 instead of 0000..FFFF. This is not a correctness issue but it would be better to fix it so that we can get consistent job splits. ## Summary of changes Compaction job split will always cover the full specified key range. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-23 00:15:46 +00:00
Arpad Müller	0af40b5494	Only churn rows once in test_scrubber_physical_gc_ancestors (#10481 ) ## Problem PR #10457 was supposed to fix the flakiness of `test_scrubber_physical_gc_ancestors`, but instead it made it even more flaky. The original error causes disappeared, only to be replaced by key-not-found errors. See this for a longer explanation: https://github.com/neondatabase/neon/issues/10391#issuecomment-2608018967 ## Solution This churns rows once, after all compactions and before we do any timeline gc's. That way, we remain more accessible at older lsn's.	2025-01-22 19:45:12 +00:00
Arpad Müller	c60b91369a	Expose safekeeper APIs for creation and deletion (#10478 ) Add APIs for timeline creation and deletion to the safekeeper client crate. Going to be used later in #10440. Split off from #10440. Part of https://github.com/neondatabase/neon/issues/9011	2025-01-22 18:52:16 +00:00
a-masterov	f1473dd438	Fix the connection error for extension tests (#10480 ) ## Problem The trust connection to the compute required for `pg_anon` was removed. However, the PGPASSWORD environment variable was not added to `docker-compose.yml`. This caused connection errors, which were interpreted as success due to errors in the bash script. ## Summary of changes The environment variable was added, and the logic in the bash script was fixed.	2025-01-22 16:34:57 +00:00
JC Grünhage	c283aaaf8d	Tag images from docker-hub in promote-images-prod (#10475 ) ## Problem https://github.com/neondatabase/neon/actions/runs/12896686483/job/35961290336#step:5:107 showed that `promote-images-prod` was missing another dependency. ## Summary of changes Modify `promote-images-prod` to tag based on docker-hub images, so that `promote-images-prod` does not rely on `promote-images-dev`. The result should be the exact same, but allows the two jobs to run in parallel.	2025-01-22 16:09:41 +00:00
Vlad Lazar	414ed82c1f	pageserver: issue concurrent IO on the read path (#9353 ) ## Refs - Epic: https://github.com/neondatabase/neon/issues/9378 Co-authored-by: Vlad Lazar <vlad@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech> ## Problem The read path does its IOs sequentially. This means that if N values need to be read to reconstruct a page, we will do N IOs and getpage latency is `O(N * IoLatency)`. ## Solution With this PR we gain the ability to issue IO concurrently within one layer visit and to move on to the next layer without waiting for IOs from the previous visit to complete. This is an evolved version of the work done at the Lisbon hackathon, cf https://github.com/neondatabase/neon/pull/9002. ## Design ### `will_init` now sourced from disk btree index keys On the algorithmic level, the only change is that the `get_values_reconstruct_data` now sources `will_init` from the disk btree index key (which is PS-page_cache'd), instead of from the `Value`, which is only available after the IO completes. ### Concurrent IOs, Submission & Completion To separate IO submission from waiting for its completion, while simultaneously feature-gating the change, we introduce the notion of an `IoConcurrency` struct through which IO futures are "spawned". An IO is an opaque future, and waiting for completions is handled through `tokio::sync::oneshot` channels. The oneshot `Receiver`s take the place of the `img` and `records` fields inside `VectoredValueReconstructState`. When we're done visiting all the layers and submitting all the IOs along the way we concurrently `collect_pending_ios` for each value, which means for each value there is a future that awaits all the oneshot receivers and then calls into walredo to reconstruct the page image. Walredo is now invoked concurrently for each value instead of sequentially. Walredo itself remains unchanged. 
The spawned IO futures are driven to completion by a sidecar tokio task that is separate from the task that performs all the layer visiting and spawning of IOs. That task receives the IO futures via an unbounded mpsc channel and drives them to completion inside a `FuturesUnordered`. (The behavior from before this PR is available through `IoConcurrency::Sequential`, which awaits the IO futures in place, without "spawning" or "submitting" them anywhere.) #### Alternatives Explored A few words on the rationale behind having a sidecar task and what alternatives were considered. One option is to queue up all IO futures in a FuturesUnordered that is polled the first time when we `collect_pending_ios`. Firstly, the IO futures are opaque, compiler-generated futures that need to be polled at least once to submit their IO. "At least once" because tokio-epoll-uring may not be able to submit the IO to the kernel on first poll right away. Second, there are deadlocks if we don't drive the IO futures to completion independently of the spawning task. The reason is that both the IO futures and the spawning task may hold some _and_ try to acquire _more_ shared limited resources. For example, both spawning task and IO future may try to acquire * a VirtualFile file descriptor cache slot async mutex (observed during impl) * a tokio-epoll-uring submission slot (observed during impl) * a PageCache slot (currently this is not the case but we may move more code into the IO futures in the future) Another option is to spawn a short-lived `tokio::task` for each IO future. We implemented and benchmarked it during development, but found little throughput improvement and moderate mean & tail latency degradation. Concerns about pressure on the tokio scheduler made us discard this variant. The sidecar task could be obsoleted if the IOs were not arbitrary code but a well-defined struct. However, 1. the opaque futures approach taken in this PR allows leaving the existing code unchanged, which 2. 
allows us to implement the `IoConcurrency::Sequential` mode for feature-gating the change. Once the new mode sidecar task implementation is rolled out everywhere, and `::Sequential` removed, we can think about a descriptive submission & completion interface. The problems around deadlocks pointed out earlier will need to be solved then. For example, we could eliminate VirtualFile file descriptor cache and tokio-epoll-uring slots. The latter has been drafted in https://github.com/neondatabase/tokio-epoll-uring/pull/63. See the lengthy doc comment on `spawn_io()` for more details. ### Error handling There are two error classes during reconstruct data retrieval: * traversal errors: index lookup, move to next layer, and the like * value read IO errors A traversal error fails the entire get_vectored request, as before this PR. A value read error only fails that value. In any case, we preserve the existing behavior that once `get_vectored` returns, all IOs are done. Panics and failing to poll `get_vectored` to completion will leave the IOs dangling, which is safe but shouldn't happen, and so, a rate-limited log statement will be emitted at warning level. There is a doc comment on `collect_pending_ios` giving more code-level details and rationale. ### Feature Gating The new behavior is opt-in via pageserver config. The `Sequential` mode is the default. The only significant change in `Sequential` mode compared to before this PR is the buffering of results in the `oneshot`s. ## Code-Level Changes Prep work: * Make `GateGuard` clonable. Core Feature: * Traversal code: track `will_init` in `BlobMeta` and source it from the Delta/Image/InMemory layer index, instead of determining `will_init` after we've read the value. This avoids having to read the value to determine whether traversal can stop. * Introduce `IoConcurrency` & its sidecar task. * `IoConcurrency` is the clonable handle. * It connects to the sidecar task via an `mpsc`. 
* Plumb through `IoConcurrency` from high level code to the individual layer implementations' `get_values_reconstruct_data`. We piggy-back on the `ValuesReconstructState` for this. * The sidecar task should be long-lived, so, `IoConcurrency` needs to be rooted up "high" in the call stack. * Roots as of this PR: * `page_service`: outside of pagestream loop * `create_image_layers`: when it is called * `basebackup`(only auxfiles + replorigin + SLRU segments) * Code with no roots that uses `IoConcurrency::sequential` * any `Timeline::get` call * `collect_keyspace` is a good example * follow-up: https://github.com/neondatabase/neon/issues/10460 * `TimelineAdaptor` code used by the compaction simulator, unused in practice * `ingest_xlog_dbase_create` * Transform Delta/Image/InMemoryLayer to * do their values IO in a distinct `async {}` block * extend the residence of the Delta/Image layer until the IO is done * buffer their results in a `oneshot` channel instead of straight in `ValuesReconstructState` * the `oneshot` channel is wrapped in `OnDiskValueIo` / `OnDiskValueIoWaiter` types that aid in expressiveness and are used to keep track of in-flight IOs so we can print warnings if we leave them dangling. * Change `ValuesReconstructState` to hold the receiving end of the `oneshot` channel aka `OnDiskValueIoWaiter`. * Change `get_vectored_impl` to `collect_pending_ios` and issue walredo concurrently, in a `FuturesUnordered`. Testing / Benchmarking: * Support queue-depth in pagebench for manual benchmarking. * Add test suite support for setting concurrency mode ps config field via a) an env var and b) via NeonEnvBuilder. * Hacky helper to have sidecar-based IoConcurrency in tests. This will be cleaned up later. More benchmarking will happen post-merge in nightly benchmarks, plus in staging/pre-prod. Some intermediate helpers for manual benchmarking have been preserved in https://github.com/neondatabase/neon/pull/10466 and will be landed in later PRs. 
(L0 layer stack generator!) Drive-By: * test suite actually didn't enable batching by default because `config.compatibility_neon_binpath` is always Truthy in our CI environment => https://neondb.slack.com/archives/C059ZC138NR/p1737490501941309 * initial logical size calculation wasn't always polled to completion, which was surfaced through the added WARN logs emitted when dropping a `ValuesReconstructState` that still has inflight IOs. * remove the timing histograms `pageserver_getpage_get_reconstruct_data_seconds` and `pageserver_getpage_reconstruct_seconds` because with planning, value read IO, and walredo happening concurrently, one can no longer attribute latency to any one of them; we'll revisit this when Vlad's work on tracing/sampling through RequestContext lands. * remove code related to `get_cached_lsn()`. The logic around this has been dead at runtime for a long time, ever since the removal of the materialized page cache in #8105. ## Testing Unit tests use the sidecar task by default and run both modes in CI. Python regression tests and benchmarks also use the sidecar task by default. We'll test more in staging and possibly preprod. # Future Work Please refer to the parent epic for the full plan. The next step will be to fold the plumbing of IoConcurrency into RequestContext so that the function signatures get cleaned up. Once `Sequential` isn't used anymore, we can take the next big leap which is replacing the opaque IOs with structs that have well-defined semantics. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2025-01-22 15:30:23 +00:00
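The difference between awaiting IOs in place and submitting them to a sidecar, as described in #9353 above, can be sketched with Python's asyncio. This is an analogy only, not the pageserver's Rust implementation: `read_value`, `sidecar`, `get_sequential`, `get_concurrent` and the 20 ms latency are all hypothetical. An `asyncio.Queue` stands in for the unbounded mpsc channel and an `asyncio.Future` for each oneshot channel. asyncio has no direct counterpart of `FuturesUnordered`, so the sidecar here drives each IO as its own task, which is precisely the variant the PR evaluated and discarded for tokio.

```python
import asyncio
import time

IO_LATENCY = 0.02  # seconds; assumed stand-in for one disk read


async def read_value(key):
    """Hypothetical value read that only simulates IO latency."""
    await asyncio.sleep(IO_LATENCY)
    return f"value-{key}"


async def sidecar(queue):
    """Receive submitted IOs and drive them concurrently to completion."""
    running = set()
    while True:
        item = await queue.get()
        if item is None:  # sentinel: no more submissions
            break
        coro, done = item

        async def drive(coro=coro, done=done):
            try:
                done.set_result(await coro)
            except Exception as exc:  # a failed read only fails its own value
                done.set_exception(exc)

        task = asyncio.create_task(drive())
        running.add(task)
        task.add_done_callback(running.discard)
    await asyncio.gather(*running)


async def get_sequential(keys):
    # Await each IO in place: latency grows as N * IO_LATENCY.
    return [await read_value(k) for k in keys]


async def get_concurrent(keys):
    queue = asyncio.Queue()  # unbounded, like the mpsc channel
    side = asyncio.create_task(sidecar(queue))
    loop = asyncio.get_running_loop()
    waiters = []
    for k in keys:  # "layer visit": submit the IO, do not wait for it
        done = loop.create_future()  # plays the role of a oneshot channel
        queue.put_nowait((read_value(k), done))
        waiters.append(done)
    values = [await w for w in waiters]  # collect all pending IOs
    queue.put_nowait(None)
    await side
    return values


async def timed(fn, keys):
    start = time.perf_counter()
    values = await fn(keys)
    return values, time.perf_counter() - start


async def main():
    keys = list(range(8))
    seq_values, seq_t = await timed(get_sequential, keys)
    con_values, con_t = await timed(get_concurrent, keys)
    return seq_values, con_values, seq_t, con_t


if __name__ == "__main__":
    seq_values, con_values, seq_t, con_t = asyncio.run(main())
    print(f"sequential: {seq_t:.3f}s, concurrent: {con_t:.3f}s")
```

Both modes return the same values in the same order; only the waiting differs. The sequential path corresponds loosely to `IoConcurrency::Sequential`, which the message says remains the default.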
Alexey Kondratov	881e351f69	feat(compute): Allow installing both 0.8.0 and 0.7.4 pgvector (#10345 ) ## Problem Both these versions are binary compatible, but the way pgvector structures the SQL files forbids installing 0.7.4 if you have a 0.8.0 distribution. Yet, some users may need a previous version for backward compatibility, e.g., restoring the dump. See this thread for discussion https://neondb.slack.com/archives/C04DGM6SMTM/p1735911490242919?thread_ts=1731343604.259169&cid=C04DGM6SMTM ## Summary of changes Put `vector--0.7.4.sql` file into compute image to allow installing this version as well. Tested on staging and it seems to be working as expected: ```sql select * from pg_available_extensions where name = 'vector'; name \| default_version \| installed_version \| comment --------+-----------------+-------------------+------------------------------------------------------ vector \| 0.8.0 \| (null) \| vector data type and ivfflat and hnsw access methods create extension vector version '0.7.4'; select * from pg_available_extensions where name = 'vector'; name \| default_version \| installed_version \| comment --------+-----------------+-------------------+------------------------------------------------------ vector \| 0.8.0 \| 0.7.4 \| vector data type and ivfflat and hnsw access methods alter extension vector update; select * from pg_available_extensions where name = 'vector'; name \| default_version \| installed_version \| comment --------+-----------------+-------------------+------------------------------------------------------ vector \| 0.8.0 \| 0.8.0 \| vector data type and ivfflat and hnsw access methods drop extension vector; create extension vector; select * from pg_available_extensions where name = 'vector'; name \| default_version \| installed_version \| comment --------+-----------------+-------------------+------------------------------------------------------ vector \| 0.8.0 \| 0.8.0 \| vector data type and ivfflat and hnsw access methods ``` 
If we find out it's a good approach, we can adopt the same for other extensions with a stable ABI -- support both `current` and `current - 1` releases.	2025-01-22 12:38:23 +00:00
Christian Schwarz	b31ce14083	initial logical size calculation: always poll to completion (#10471 ) # Refs - extracted from https://github.com/neondatabase/neon/pull/9353 # Problem Before this PR, when task_mgr shutdown is signalled, e.g. during pageserver shutdown or Tenant shutdown, initial logical size calculation stops polling and drops the future that represents the calculation. This is against the current policy that we poll all futures to completion. This became apparent during development of concurrent IO which warns if we drop a `Timeline::get_vectored` future that still has in-flight IOs. We may revise the policy in the future, but, right now initial logical size calculation is the only part of the codebase that doesn't adhere to the policy, so let's fix it. ## Code Changes - make sensitive exclusively to `Timeline::cancel` - This should be sufficient for all cases of shutdowns; the sensitivity to task_mgr shutdown is unnecessary. - this broke the various cancel tests in `test_timeline_size.py`, e.g., `test_timeline_initial_logical_size_calculation_cancellation` - the tests would time out because the await point was not sensitive to cancellation - to fix this, refactor `pausable_failpoint` so that it accepts a cancellation token - side note: we _really_ should write our own failpoint library; maybe after we get heap-allocated RequestContext, we can plumb failpoints through there.	2025-01-22 12:28:26 +00:00
Christian Schwarz	b4d87b9dfe	fix(tests): actually enable pipelining by default in the test suite (#10472 ) ## Problem PR #9993 was supposed to enable `page_service_pipelining` by default for all `NeonEnv`s, but this was ineffective in our CI environment. Thus, CI Python-based tests and benchmarks, unless explicitly configuring pipelining, were still using serial protocol handling. ## Analysis The root cause was that in our CI environment, `config.compatibility_neon_binpath` is always Truthy. It's not in local environments, which is why this slipped through in local testing. Lesson: always add a log line to pageserver startup and spot-check tests to ensure the intended default is picked up. ## Summary of changes Fix it. Since enough time has passed, the compatibility snapshot contains a recent enough software version so we don't need to worry about `compatibility_neon_binpath` anymore. ## Future Work The question of how to add a new default except for compatibility tests, which is what the broken code was supposed to do, is still unsolved. Slack discussion: https://neondb.slack.com/archives/C059ZC138NR/p1737490501941309	2025-01-22 10:10:43 +00:00
Conrad Ludgate	2b49d6ee05	feat: adjust the tonic features to remove axum dependency (#10348 ) To help facilitate an upgrade to axum 0.8 (https://github.com/neondatabase/neon/pull/10332#pullrequestreview-2541989619) this massages the tonic dependency features so that tonic does not depend on axum.	2025-01-22 09:15:52 +00:00
Erik Grinaker	14e1f89053	pageserver: eagerly notify flush waiters (#10469 ) ## Problem Currently, the layer flush loop will continue flushing layers as long as any are pending, and only notify waiters once there are no further layers to flush. This can cause waiters to wait longer than necessary, and potentially starve them if pending layers keep arriving faster than they can be flushed. The impact of this will increase when we add compaction backpressure and propagate it up into the WAL receiver. Extracted from #10405. ## Summary of changes Break out of the layer flush loop once we've flushed up to the requested LSN. If further flush requests have arrived in the meanwhile, flushing will resume immediately after.	2025-01-21 22:01:27 +00:00
Erik Grinaker	8a8c656c06	pageserver: add `LayerMap::watch_layer0_deltas()` (#10470 ) ## Problem For compaction backpressure, we need a mechanism to signal when compaction has reduced the L0 delta layer count below the backpressure threshold. Extracted from #10405. ## Summary of changes Add `LayerMap::watch_level0_deltas()` which returns a `tokio::sync::watch::Receiver` signalling the current L0 delta layer count.	2025-01-21 21:18:09 +00:00
Erik Grinaker	a75e11cc00	pageserver: return duration from `StorageTimeMetricsTimer` (#10468 ) ## Problem It's sometimes useful to obtain the elapsed duration from a `StorageTimeMetricsTimer` for purposes beyond just recording it in metrics (e.g. to log it). Extracted from #10405. ## Summary of changes Add `StorageTimeMetricsTimer.elapsed()` and return the duration from `stop_and_record()`.	2025-01-21 20:56:34 +00:00
Alex Chi Z.	7d4bfcdc47	feat(pageserver): add config items for gc-compaction auto trigger (#10455 ) ## Problem part of https://github.com/neondatabase/neon/issues/9114 The automatic trigger is already implemented at https://github.com/neondatabase/neon/pull/10221 but I need to write some tests and finish my experiments in staging before I can merge it with confidence. Given that I have some other patches that will modify the config items, I'd like to get the config items merged first to reduce conflicts. ## Summary of changes * add `l2_lsn` to index_part.json -- below that LSN, data have been processed by gc-compaction * add a set of gc-compaction auto trigger control items into the config --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-21 19:29:38 +00:00
a-masterov	737888e5c9	Remove the tests for `pg_anon` (#10382 ) ## Problem We are removing the `pg_anon` v1 extension from Neon. So we don't need to test it anymore and can remove the code for simplicity. ## Summary of changes The code required for testing `pg_anon` is removed.	2025-01-21 19:17:14 +00:00
Gleb Novikov	19bf7b78a0	fast import: basic python test (#10271 ) We did not have any tests on fast_import binary yet. In this PR I have introduced: - `FastImport` class and tools for testing in python - basic test that runs fast import against vanilla postgres and checks that data is there Should be merged after https://github.com/neondatabase/neon/pull/10251	2025-01-21 16:50:44 +00:00
Arpad Müller	7e4a39ea53	Fix two flakiness sources in test_scrubber_physical_gc_ancestors (#10457 ) We currently have some flakiness in `test_scrubber_physical_gc_ancestors`, see #10391. The first flakiness kind is about the reconciler not actually becoming idle within the timeout of 30 seconds. We see continuous forward progress so this is likely not a hang. We also see this happen in parallel to a test failure, so is likely due to runners being overloaded. Therefore, we increase the timeout. The second flakiness kind is an assertion failure. This one is a little bit more tricky, but we saw in the successful run that there was some advance of the lsn between the compaction ran (which created layer files) and the gc run. Apparently gc rejects reductions to the single image layer setting if the cutoff lsn is the same as the lsn of the image layer: it will claim that that layer is newer than the space cutoff and therefore skip it, while thinking the old layer (that we want to delete) is the latest one (so it's not deleted). We address the second flakiness kind by inserting a tiny amount of WAL between the compaction and gc. This should hopefully fix things. Related issue: #10391 (not closing it with the merger of the PR as we'll need to validate that these changes had the intended effect). Thanks to Chi for going over this together with me in a call.	2025-01-21 15:40:04 +00:00
JC Grünhage	624a507544	Create Github releases with empty body for now (#10448 ) ## Problem When releasing `release-7574`, the Github Release creation failed with "body is too long" (see https://github.com/neondatabase/neon/actions/runs/12834025431/job/35792346745#step:5:77). There's lots of room for improvement of the release notes, but for now we'll disable them instead. ## Summary of changes - Disable automatic generation of release notes for Github releases - Enable creation of Github releases for proxy/compute	2025-01-21 12:45:21 +00:00
Arpad Müller	2ab9f69825	Simplify pageserver_physical_gc function (#10104 ) This simplifies the code in `pageserver_physical_gc` a little bit after the feedback in #10007 that the code is too complicated. Most importantly, we don't pass around `GcSummary` any more in a complicated fashion, and we save on async stream-combinator-inception in one place in favour of `try_stream!{}`. Follow-up of #10007	2025-01-20 21:57:15 +00:00
Alex Chi Z.	2de2b26c62	feat(pageserver): add reldir migration configs (#10439 ) ## Problem Part of #9516 per RFC at https://github.com/neondatabase/neon/pull/10412 ## Summary of changes Adding the necessary config items and index_part items for the large relation count work. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-20 20:44:12 +00:00
Matthias van de Meent	e781cf6dd8	Compute/LFC: Apply limits consistently (#10449 ) Otherwise we might hit ERRORs in otherwise safe situations (such as user queries), which isn't a great user experience. ## Problem https://github.com/neondatabase/neon/pull/10376 ## Summary of changes Instead of accepting internal errors as acceptable, we ensure we don't exceed our allocated usage.	2025-01-20 18:29:21 +00:00
Christian Schwarz	72130d7d6c	fix(page_service / handle): panic when parallel client disconnect & Timeline shutdown (#10445 ) ## Refs - fixes https://github.com/neondatabase/neon/issues/10444 ## Problem We're seeing a panic `handles are only shut down once in their lifetime` in our performance testbed. ## Hypothesis Annotated code in https://github.com/neondatabase/neon/issues/10444#issuecomment-2602286415. ``` T1: drop Cache, executes up to (1) => HandleInner is now in state ShutDown T2: Timeline::shutdown => PerTimelineState::shutdown executes shutdown() again => panics ``` Likely this snuck in the final touches of #10386 where I narrowed down the locking rules. ## Summary of changes Make duplicate shutdowns a no-op.	2025-01-20 17:51:30 +00:00
John Spray	2657b7ec75	rfcs: add sharded ingest RFC (#8754 ) ## Summary Whereas currently we send all WAL to all pageserver shards, and each shard filters out the data that it needs, in this RFC we add a mechanism to filter the WAL on the safekeeper, so that each shard receives only the data it needs. This will place some extra CPU load on the safekeepers, in exchange for reducing the network bandwidth for ingesting WAL back to scaling as O(1) with shard count, rather than O(N_shards). Touches #9329. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Vlad Lazar <vlalazar.vlad@gmail.com> Co-authored-by: Vlad Lazar <vlad@neon.tech>	2025-01-20 17:33:07 +00:00
Christian Schwarz	4dec0dddc6	Merge pull request #10447 from neondatabase/releases/2025-01-20-hotfix Release: storage hotfix 2025-01-20	2025-01-20 15:55:44 +01:00
Christian Schwarz	02fc58b878	impr(timeline handles): add more tests covering reference cycle (#10446 ) The other tests focus on the external interface usage while the tests added in this PR add some testing around HandleInner's lifecycle, ensuring we don't leak it once either connection gets dropped or per-timeline-state is shut down explicitly.	2025-01-20 14:37:24 +00:00
Christian Schwarz	e0c504af38	fix(page_service / handle): panic when parallel client disconnect & Timeline shutdown Refs - fixes https://github.com/neondatabase/neon/issues/10444	2025-01-20 14:37:16 +01:00
Arpad Müller	b312a3c320	Move DeleteTimelineFlow::prepare to separate function and use enum (#10334 ) It was requested by review in #10305 to use an enum or something like it for distinguishing the different modes instead of two parameters, because two flags allow four combinations, and two of them don't really make sense/ aren't used. follow-up of #10305	2025-01-20 12:50:44 +00:00
John Spray	7d761a9d22	storage controller: make chaos less disruptive to AZ locality (#10438 ) ## Problem Since #9916 , the chaos code is actively fighting the optimizer: tenants tend to be attached in their preferred AZ, so most chaos migrations were moving them to a non-preferred AZ. ## Summary of changes - When picking migrations, prefer to migrate things _toward_ their preferred AZ when possible. Then pick shards to move the other way when necessary. The resulting behavior should be an alternating "back and forth" where the chaos code migrates things away from home, and then migrates them back on the next iteration. The side effect will be that the chaos code actively helps to push things into their home AZ. That's not contrary to its purpose though: we mainly just want it to continuously migrate things to exercise migration+notification code.	2025-01-20 09:47:23 +00:00
John Spray	8bdaee35f3	pageserver: safety checks on validity of uploaded indices (#10403 ) ## Problem Occasionally, we encounter bugs in test environments that can be detected at the point of uploading an index, but we proceed to upload it anyway and leave a tenant in a broken state that's awkward to handle. ## Summary of changes - Validate index when submitting it for upload, so that we can see the issue quickly e.g. in an API invoking compaction - Validate index before executing the upload, so that we have a hard enforcement that any code path that tries to upload an index will not overwrite a valid index with an invalid one.	2025-01-20 09:20:31 +00:00
Arpad Müller	b0f34099f9	Add safekeeper utilization endpoint (#10429 ) Add an endpoint to obtain the utilization of a safekeeper. Future changes to the storage controller can use this endpoint to find the most suitable safekeepers for newly created timelines, analogously to how it's done for pageservers already. Initially we just want to assign by timeline count, then we can iterate from there. Part of https://github.com/neondatabase/neon/issues/9011	2025-01-17 21:43:52 +00:00
Alex Chi Z.	3399eea2ed	Merge pull request #10436 from neondatabase/rc/release/2025-01-17 Storage release 2025-01-17	2025-01-17 12:36:17 -05:00
Alex Chi Z	6a29c809d5	Merge branch 'release' of https://github.com/neondatabase/neon into rc/release/2025-01-17	2025-01-17 10:44:25 -05:00
Vlad Lazar	6975228a76	pageserver: add initdb metrics (#10434 ) ## Problem Initdb observability is poor. ## Summary of changes Add some metrics so we can figure out which part, if any, is slow. Closes https://github.com/neondatabase/neon/issues/10423	2025-01-17 14:51:33 +00:00
JC Grünhage	053abff71f	Fix dependency on neon-image in promote-images-dev (#10437 ) ## Problem `871e8b325f` failed CI on main because a job ran too soon. This was caused by `ea84ec357f`. While `promote-images-dev` does not inherently need `neon-image`, a few jobs depending on `promote-images-dev` do need it, and previously had it when it was `promote-images`, which depended on `test-images`, which in turn depended on `neon-image`. ## Summary of changes To ensure jobs depending on `docker.io/neondatabase/neon` images get them, `promote-images-dev` gets the dependency to `neon-image` back which it previously had transitively through `test-images`.	2025-01-17 14:21:30 +00:00
github-actions[bot]	a62c01df4c	Storage release 2025-01-17	2025-01-17 06:02:11 +00:00
Tristan Partin	871e8b325f	Use the request ID given by the control plane in compute_ctl (#10418 ) Instead of generating our own request ID, we can just use the one provided by the control plane. In the event, we get a request from a client which doesn't set X-Request-ID, then we just generate one which is useful for tracing purposes. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-16 20:46:53 +00:00
Christian Schwarz	c47c5f4ace	fix(page_service pipelining): tenant cannot shut down because gate kept open while flushing responses (#10386 ) # Refs - fixes https://github.com/neondatabase/neon/issues/10309 - fixup of batching design, first introduced in https://github.com/neondatabase/neon/pull/9851 - refinement of https://github.com/neondatabase/neon/pull/8339 # Problem `Tenant::shutdown` was occasionally taking many minutes (sometimes up to 20) in staging and prod if the `page_service_pipelining.mode="concurrent-futures"` is enabled. # Symptoms The issue happens during shard migration between pageservers. There is page_service unavailability and hence effectively downtime for customers in the following case: 1. The source (state `AttachedStale`) gets stuck in `Tenant::shutdown`, waiting for the gate to close. 2. Cplane/Storcon decides to transition the target `AttachedMulti` to `AttachedSingle`. 3. That transition comes with a bump of the generation number, causing the `PUT .../location_config` endpoint to do a full `Tenant::shutdown` / `Tenant::attach` cycle for the target location. 4. That `Tenant::shutdown` on the target gets stuck, waiting for the gate to close. 5. Eventually the gate closes (`close completed`), correlating with a `page_service` connection handler logging that it's exiting because of a network error (`Connection reset by peer` or `Broken pipe`). While in (4): - `Tenant::shutdown` is stuck waiting for all `Timeline::shutdown` calls to complete. So, really, this is a `Timeline::shutdown` bug. - retries from Cplane/Storcon to complete above state transitions, fail with errors related to the tenant mgr slot being in state `TenantSlot::InProgress`, the tenant state being `TenantState::Stopping`, and the timelines being in `TimelineState::Stopping`, and the `Timeline::cancel` being cancelled. - Existing (and/or new?) 
page_service connections log errors `error reading relation or page version: Not found: Timed out waiting 30s for tenant active state. Latest state: None` # Root-Cause After a lengthy investigation ([internal write-up](https://www.notion.so/neondatabase/2025-01-09-batching-deadlock-Slow-Log-Analysis-in-Staging-176f189e00478050bc21c1a072157ca4?pvs=4)) I arrived at the following root cause. The `spsc_fold` channel (`batch_tx`/`batch_rx`) that connects the Batcher and Executor stages of the pipelined mode was storing a `Handle` and thus `GateGuard` of the Timeline that was not shutting down. The design assumption with pipelining was that this would always be a short transient state. However, that was incorrect: the Executor was stuck on writing/flushing an earlier response into the connection to the client, i.e., socket write being slow because of TCP backpressure. The probable scenario of how we end up in that case: 1. Compute backend process sends a continuous stream of getpage prefetch requests into the connection, but never reads the responses (why this happens: see Appendix section). 2. Batch N is processed by Batcher and Executor, up to the point where Executor starts flushing the response. 3. Batch N+1 is processed by Batcher and queued in the `spsc_fold`. 4. Executor is still waiting for batch N flush to finish. 5. Batcher eventually hits the `TimeoutReader` error (10min). From here on it waits on the `spsc_fold.send(Err(QueryError(TimeoutReader_error)))` which will never finish because the batch already inside the `spsc_fold` is not being read by the Executor, because the Executor is still stuck in the flush. (This state is not observable at our default `info` log level) 6. Eventually, Compute backend process is killed (`close()` on the socket) or Compute as a whole gets killed (probably no clean TCP shutdown happening in that case). 7. Eventually, Pageserver TCP stack learns about (6) through RST packets and the Executor's flush() call fails with an error. 8. 
The Executor exits, dropping `cancel_batcher` and its end of the spsc_fold. This wakes Batcher, causing the `spsc_fold.send` to fail. Batcher exits. The pipeline shuts down as intended. We return from `process_query` and log the `Connection reset by peer` or `Broken pipe` error. The following diagram visualizes the wait-for graph at (5) ```mermaid flowchart TD Batcher --spsc_fold.send(TimeoutReader_error)--> Executor Executor --flush batch N responses--> socket.write_end socket.write_end --wait for TCP window to move forward--> Compute ``` # Analysis By holding the GateGuard inside the `spsc_fold` open, the pipelining implementation violated the principle established in (https://github.com/neondatabase/neon/pull/8339). That is, that `Handle`s must only be held across an await point if that await point is sensitive to the `<Handle as Deref<Target=Timeline>>::cancel` token. In this case, we were holding the Handle inside the `spsc_fold` while awaiting the `pgb_writer.flush()` future. One may jump to the conclusion that we should simply peek into the spsc_fold to get that Timeline cancel token and be sensitive to it during flush, then. But that violates another principle of the design from https://github.com/neondatabase/neon/pull/8339. That is, that the page_service connection lifecycle and the Timeline lifecycles must be completely decoupled. It must be possible to shut down one shard without shutting down the page_service connection, because on that single connection we might be serving other shards attached to this pageserver. (The current compute client opens separate connections per shard, but, there are plans to change that.) # Solution This PR adds a `handle::WeakHandle` struct that does _not_ hold the timeline gate open. It must be `upgrade()`d to get a `handle::Handle`. That `handle::Handle` _does_ hold the timeline gate open. The batch queued inside the `spsc_fold` only holds a `WeakHandle`. 
We only upgrade it while calling into the various `handle_` methods, i.e., while interacting with the `Timeline` via `<Handle as Deref<Target=Timeline>>`. All that code has always been required to be (and is!) sensitive to `Timeline::cancel`, and therefore we're guaranteed to bail from it quickly when `Timeline::shutdown` starts. We will drop the `Handle` immediately, before we start `pgb_writer.flush()`ing the responses. Thereby letting go of our hold on the `GateGuard`, allowing the timeline shutdown to complete while the page_service handler remains intact. # Code Changes * Reproducer & Regression Test * Developed and proven to reproduce the issue in https://github.com/neondatabase/neon/pull/10399 * Add a `Test` message to the pagestream protocol (`cfg(feature = "testing")`). * Drive-by minimal improvement to the parsing code, we now have a `PagestreamFeMessageTag`. * Refactor `pageserver/client` to allow sending and receiving `page_service` requests independently. * Add a Rust helper binary to produce situation (4) from above * Rationale: (4) and (5) are the same bug class, we're holding a gate open while `flush()`ing. * Add a Python regression test that uses the helper binary to demonstrate the problem. * Fix * Introduce and use `WeakHandle` as explained earlier. * Replace the `shut_down` atomic with two enum states for `HandleInner`, wrapped in a `Mutex`. * To make `WeakHandle::upgrade()` and `Handle::downgrade()` cache-efficient: * Wrap the `Types::Timeline` in an `Arc` * Wrap the `GateGuard` in an `Arc` * The separate `Arc`s enable uncontended cloning of the timeline reference in `upgrade()` and `downgrade()`. If instead we were `Arc<Timeline>::clone`, different connection handlers would be hitting the same cache line on every upgrade()/downgrade(), causing contention. * Please read the updated module-level comment in `mod handle` for details. 
# Testing & Performance The reproducer test that failed before the changes now passes, and obviously other tests are passing as well. We'll do more testing in staging, where the issue happens every ~4h if chaos migrations are enabled in storcon. Existing perf testing will be sufficient, no perf degradation is expected. It's a few more allocations due to the added Arc's, but, they're low frequency. # Appendix: Why Compute Sometimes Doesn't Read Responses Remember, the whole problem surfaced because flush() was slow because Compute was not reading responses. Why is that? In short, the way the compute works, it only advances the page_service protocol processing when it has an interest in data, i.e., when the pagestore smgr is called to return pages. Thus, if compute issues a bunch of requests as part of prefetch but then it turns out it can service the query without reading those pages, it may very well happen that these messages stay in the TCP until the next smgr read happens, either in that session, or possibly in another session. If there’s too many unread responses in the TCP, the pageserver kernel is going to backpressure into userspace, resulting in our stuck flush(). All of this stems from the way vanilla Postgres does prefetching and "async IO": it issues `fadvise()` to make the kernel do the IO in the background, buffering results in the kernel page cache. It then consumes the results through synchronous `read()` system calls, which hopefully will be fast because of the `fadvise()`. If it turns out that some / all of the prefetch results are not needed, Postgres will not be issuing those `read()` system calls. The kernel will eventually react to that by reusing page cache pages that hold completed prefetched data. Uncompleted prefetch requests may or may not be processed -- it's up to the kernel. In Neon, the smgr + Pageserver together take on the role of the kernel in above paragraphs. 
In the current implementation, all prefetches are sent as GetPage requests to Pageserver. The responses are only processed in the places where vanilla Postgres would do the synchronous `read()` system call. If we never get to that, the responses are queued inside the TCP connection, which, once buffers run full, will backpressure into Pageserver's sending code, i.e., the `pgb_writer.flush()` that was the root cause of the problems we're fixing in this PR.	2025-01-16 20:34:02 +00:00
Tristan Partin	b0838a68e5	Enable pgx_ulid on Postgres 17 (#10397 ) The extension now supports Postgres 17. The release also seems to be binary compatible with the previous version. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-16 19:49:04 +00:00
John Spray	8f2ebc0684	tests: stabilize test_storage_controller_node_deletion (#10420 ) ## Problem `test_storage_controller_node_deletion` sometimes failed because shards were moving around during timeline creation, and neon_local isn't tolerant of that. The movements were unexpected because the shards had only just been created. This was a regression from #9916 Closes: #10383 ## Summary of changes - Make this test use multiple AZs -- this makes the storage controller's scheduling reliably stable Why this works: in #9916 , I made a simplifying assumption that we would have multiple AZs to get nice stable scheduling -- it's much easier, because each tenant has a well defined primary+secondary location when they have an AZ preference and nodes have different AZs. Everything still works if you don't have multiple AZs, but you just have this quirk that sometimes the optimizer can disagree with initial scheduling, so once in a while a shard moves after being created -- annoying for tests, harmless IRL.	2025-01-16 19:00:16 +00:00
Vlad Lazar	3a285a046b	pageserver: include node id when subscribing to SK (#10432 ) ## Problem All pageservers have the same application name which makes it hard to distinguish them. ## Summary of changes Include the node id in the application name sent to the safekeeper. This should give us more visibility in logs. There's a few metrics that will increase in cardinality by `pageserver_count`, but that's fine.	2025-01-16 18:51:56 +00:00
John Spray	da13154791	storcon: revise fill logic to prioritize AZ (#10411 ) ## Problem Node fills were limited to moving (total shards / node_count) shards. In systems that aren't perfectly balanced already, that leads us to skip migrating some of the shards that belong on this node, generating work for the optimizer later to gradually move them back. ## Summary of changes - Where a shard has a preferred AZ and is currently attached outside this AZ, then always promote it during fill, irrespective of target fill count	2025-01-16 17:33:46 +00:00
John Spray	2e13a3aa7a	storage controller: handle legacy TenantConf in consistency_check (#10422 ) ## Problem We were comparing serialized configs from the database with serialized configs from memory. If fields have been added/removed to TenantConfig, this generates spurious consistency errors. This is fine in test environments, but limits the usefulness of this debug API in the field. Closes: https://github.com/neondatabase/neon/issues/10369 ## Summary of changes - Do a decode/encode cycle on the config before comparing it, so that it will have exactly the expected fields.	2025-01-16 16:56:44 +00:00
Alex Chi Z.	cccc196848	refactor(pageserver): make partitioning an ArcSwap (#10377 ) ## Problem gc-compaction needs the partitioning data to decide the job split. This refactor allows concurrent access/computing the partitioning. ## Summary of changes Make `partitioning` an ArcSwap so that others can access the partitioning while we compute it. Fully eliminate the `repartition is called concurrently` warning when gc-compaction is going on. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-16 15:33:37 +00:00
Arpad Müller	e436dcad57	Rename "disabled" safekeeper scheduling policy to "pause" (#10410 ) Rename the safekeeper scheduling policy "disabled" to "pause". A rename was requested in https://github.com/neondatabase/neon/pull/10400#discussion_r1916259124, as the "disabled" policy is meant to be analogous to the "pause" policy for pageservers. Also simplify the `SkSchedulingPolicyArg::from_str` function, relying on the `from_str` implementation of `SkSchedulingPolicy`. Latter is used for the database format as well, so it is quite stable. If we ever want to change the UI, we'll need to duplicate the function again but this is cheap.	2025-01-16 14:30:49 +00:00
John Spray	21d7b6a258	tests: refactor test_tenant_delete_races_timeline_creation (#10425 ) ## Problem Threads spawned in `test_tenant_delete_races_timeline_creation` are not joined before the test ends, and can generate `PytestUnhandledThreadExceptionWarning` in other tests. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10419/12805365523/index.html#/testresult/53a72568acd04dbd ## Summary of changes - Wrap threads in ThreadPoolExecutor which will join them before the test ends - Remove a spurious deletion call -- the background thread doing deletion ought to succeed.	2025-01-16 14:11:33 +00:00
JC Grünhage	86dbc44db1	CI: Run check-codestyle-rust as part of pre-merge-checks (#10387 ) ## Problem When multiple changes are grouped in a merge group to be merged as part of the merge queue, the changes might individually pass `check-codestyle-rust` but not in their combined form. ## Summary of changes - Move `check-codestyle-rust` into a reusable workflow that is called from its previous location in `build_and_test.yml`, and additionally call it from `pre_merge_checks.yml`. The additional call does not run on ARM, only x86, to ensure the merge queue continues being responsive. - Trigger `pre_merge_checks.yml` on PRs that change any of the workflows running in `pre_merge_checks.yml`, so that we get feedback on those early and not only after trying to merge those changes.	2025-01-16 09:20:24 +00:00
Tristan Partin	58f6af6c9a	Clean up compute_ctl extension server code (#10417 )	2025-01-16 08:35:36 +00:00
Matthias van de Meent	7be971081a	Make sure we request pages with a known-flushed LSN. (#10413 ) This should fix the largest source of flakiness of test_nbtree_pagesplit_cycleid. ## Problem https://github.com/neondatabase/neon/issues/10390 ## Summary of changes By using a guaranteed-flushed LSN, we ensure that PS won't have to wait forever. (If it does wait forever, we know the issue can't be with Compute's WAL)	2025-01-16 08:34:11 +00:00
Arseny Sher	6fe4c6798f	Add START_WAL_PUSH proto_version and allow_timeline_creation options. (#10406 ) ## Problem As part of https://github.com/neondatabase/neon/issues/8614 we need to pass options to START_WAL_PUSH. ## Summary of changes Add two options. `allow_timeline_creation`, default true, disables implicit timeline creation in the connection from compute. Eventually such creation will be forbidden completely, but as we migrate to configurations we need to support both: current mode and configurations enabled where creation by compute is disabled. `proto_version` specifies compute <-> sk protocol version. We have it currently in the first greeting package also, but I plan to change tag size from u64 to u8, which would make it hard to use. Command is more appropriate place for it anyway.	2025-01-16 08:01:19 +00:00
Matthias van de Meent	2eda484ef6	prefetch: Read more frequently from TCP buffer (#10394 ) This reduces pressure on the OS TCP read buffer by increasing the moments we read data out of the receive buffer, and increasing the number of bytes we can pull from that buffer when we do reads. ## Problem A backend may not always consume its prefetch data quick enough ## Summary of changes We add a new function `prefetch_pump_state` which pulls as many prefetch requests from the OS TCP receive buffer as possible, but without blocking. This thus reduces pressure on OS-level TCP buffers, thus increasing throughput by limiting throttling caused by full TCP buffers.	2025-01-16 02:43:47 +00:00
Mikhail Kot	c7429af8a0	Enable dblink (#10358 ) Update compute image to include dblink #3720	2025-01-15 22:29:18 +00:00
Alex Chi Z.	a753349cb0	feat(pageserver): validate data integrity during gc-compaction (#10131 ) ## Problem part of https://github.com/neondatabase/neon/issues/9114 part of investigation of https://github.com/neondatabase/neon/issues/10049 ## Summary of changes * If `cfg!(test) or cfg!(feature = testing)`, then we will always try generating an image to ensure the history is replayable, but not put the image layer into the final layer results, therefore discovering wrong key history before we hit a read error. * I suspect it's easier to trigger some races if gc-compaction is continuously run on a timeline, so I increased the frequency to twice per 10 churns. * Also, create branches in gc-compaction smoke tests to get more test coverage. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad@neon.tech>	2025-01-15 22:04:06 +00:00
Gleb Novikov	55a68b28a2	fast import: restore to neondb (not postgres) database (#10251 ) ## Problem `postgres` is a system database at neon, so we need to do `pg_restore` into `neondb` instead https://github.com/neondatabase/cloud/issues/22100 ## Summary of changes Changed fast_import a little bit: 1. After a successful connection, create `neondb` in the postgres instance 2. Changed restore connstring to use new db 3. Added optional `source_connection_string`, which allows to skip `s3_prefix` and just connect directly. 4. Added `-i` that stops process until sigterm ## TODO - [x] test image in cplane e2e - [ ] Change import job image back to latest after this is merged (partial revert of https://github.com/neondatabase/cloud/pull/22338)	2025-01-15 20:51:09 +00:00
John Spray	fb0e2acb2f	pageserver: add `page_trace` API for debugging (#10293 ) ## Problem When a pageserver is receiving high rates of requests, we don't have a good way to efficiently discover what the client's access pattern is. Closes: https://github.com/neondatabase/neon/issues/10275 ## Summary of changes - Add `/v1/tenant/x/timeline/y/page_trace?size_limit_bytes=...&time_limit_secs=...` API, which returns a binary buffer. - Add `pagectl page-trace` tool to decode and analyze the output. --------- Co-authored-by: Erik Grinaker <erik@neon.tech>	2025-01-15 19:07:22 +00:00
Arpad Müller	efaec6cdf8	Add endpoint and storcon cli cmd to set sk scheduling policy (#10400 ) Implementing the last missing endpoint of #9981, this adds support to set the scheduling policy of an individual safekeeper, as specified in the RFC. However, unlike in the RFC we call the endpoint `scheduling_policy` not `status` Closes #9981. As for why not use the upsert endpoint for this: we want to have the safekeeper upsert endpoint be used for testing and for deploying new safekeepers, but not for changes of the scheduling policy. We don't want to change any of the other fields when marking a safekeeper as decommissioned for example, so we'd have to first fetch them only to then specify them again. Of course one can also design an endpoint where one can omit any field and it doesn't get modified, but it's still not great for observability to put everything into one big "change something about this safekeeper" endpoint.	2025-01-15 18:15:30 +00:00
Tristan Partin	3d41069dc4	Update pgrx in extension builds to 0.12.9 (#10372 ) Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-15 16:26:58 +00:00
Vlad Lazar	dbebede7bf	safekeeper: fan out from single wal reader to multiple shards (#10190 ) ## Problem Safekeepers currently decode and interpret WAL for each shard separately. This is wasteful in terms of CPU and memory usage - we've seen this in profiles. ## Summary of changes Fan-out interpreted WAL to multiple shards. The basic idea is that wal decoding and interpretation happens in a separate tokio task and senders attach to it. Senders only receive batches concerning their shard and only past the Lsn they've last seen. Fan-out is gated behind the `wal_reader_fanout` safekeeper flag (disabled by default for now). When fan-out is enabled, it might be desirable to control the absolute delta between the current position and a new shard's desired position (i.e. how far behind or ahead a shard may be). `max_delta_for_fanout` is a new optional safekeeper flag which dictates whether to create a new WAL reader or attach to the existing one. By default, this behaviour is disabled. Let's consider enabling it if we spot the need for it in the field. ## Testing Tests passed [here](https://github.com/neondatabase/neon/pull/10301) with wal reader fanout enabled as of `34f6a71718`. Related: https://github.com/neondatabase/neon/issues/9337 Epic: https://github.com/neondatabase/neon/issues/9329	2025-01-15 15:33:54 +00:00
Tristan Partin	3e529f124f	Remove leading slashes when downloading remote files (#10396 ) Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-15 15:29:52 +00:00
Arseny Sher	05a71c7d6a	safekeeper: add membership configuration switch endpoint (#10241 ) ## Problem https://github.com/neondatabase/neon/issues/9965 ## Summary of changes Add a safekeeper http endpoint to switch membership configuration. Also add it to the python client for tests, and add a simple test itself.	2025-01-15 14:16:04 +00:00
Alexander Bayandin	b9464865b6	benchmarks: report successful runs to slack as well (#10393 ) ## Problem Successful `benchmarks` runs don't have enough visibility Ref https://neondb.slack.com/archives/C069Z2199DL/p1736868055094539 ## Summary of changes - Report both successful and failed `benchmarks` to Slack - Update `slackapi/slack-github-action` action	2025-01-15 13:05:05 +00:00
Vlad Lazar	1577430408	safekeeper: decode and interpret for multiple shards in one go (#10201 ) ## Problem Currently, we call `InterpretedWalRecord::from_bytes_filtered` from each shard. To serve multiple shards at the same time, the API needs to allow for enquiring about multiple shards. ## Summary of changes This commit tweaks it in a pretty brute-force way. Naively, we could just generate the shard for a key, but pre and post split shards may be subscribed at the same time, so doing it efficiently is more complex.	2025-01-15 11:10:24 +00:00
Erik Grinaker	05d17a10ae	rfc: add CPU and heap profiling RFC (#10085 ) This document proposes a standard cross-team pattern for CPU and memory profiling across applications and languages, using the [pprof](https://github.com/google/pprof) profile format. It enables both ad hoc profiles via HTTP endpoints, and continuous profiling across the fleet via [Grafana Cloud Profiles](https://grafana.com/docs/grafana-cloud/monitor-applications/profiles/). Continuous profiling incurs an overhead of about 0.1% CPU usage and 3% slower heap allocations. [Rendered](https://github.com/neondatabase/neon/blob/erik/profiling-rfc/docs/rfcs/040-profiling.md) Touches #9534. Touches https://github.com/neondatabase/cloud/issues/14888.	2025-01-15 10:35:38 +00:00
Arseny Sher	2d0ea08524	Add safekeeper membership conf to control file. (#10196 ) ## Problem https://github.com/neondatabase/neon/issues/9965 ## Summary of changes Add the safekeeper membership configuration struct itself and store it in the control file. In passing also add creation timestamp to the control file (there were cases where I wanted it in the past). Remove obsolete unused PersistedPeerInfo struct from control file (still keep it in control_file_upgrade.rs to have it in old upgrade code). Remove the binary representation of cfile in the roundtrip test. Updating it is annoying, and we still test the actual roundtrip. Also add configuration to timeline creation http request, currently used only in one python test. In passing, slightly change LSNs meaning in the request: normally start_lsn is passed (the same as ancestor_start_lsn in similar pageserver call), but we allow specifying higher commit_lsn for manual intervention if needed. Also, when an LSN is given, initialize term_history with it.	2025-01-15 09:45:58 +00:00
Arseny Sher	c98cbbeac1	Add migration details to safekeeper membership RFC. (#10272 ) ## Problem https://github.com/neondatabase/neon/pull/8455 wasn't specific enough on migration from current situation to enabling generations. ## Summary of changes Describe the missing parts, including control plane pushing generation to compute, which also defines whether generations are enabled -- non zero value does it.	2025-01-15 09:41:49 +00:00
John Spray	47c1640acc	storage controller: pagination for tenant listing API (#10365 ) ## Problem For large deployments, the `control/v1/tenant` listing API can time out transmitting a monolithic serialized response. ## Summary of changes - Add `limit` and `start_after` parameters to listing API - Update storcon_cli to use these parameters and limit requests to 1000 items at a time	2025-01-14 21:37:32 +00:00
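A minimal sketch of how `limit` and `start_after` pagination of this kind typically works, assuming the listing is sorted by ID. The functions `list_page` and `list_all` are hypothetical illustrations, not the storage controller's or storcon_cli's actual code.

```rust
/// Server side (hypothetical): return up to `limit` IDs strictly greater than
/// `start_after`, from a listing that is already sorted.
fn list_page<'a>(
    sorted_ids: &'a [String],
    start_after: Option<&str>,
    limit: usize,
) -> &'a [String] {
    // Index of the first ID strictly after the cursor.
    let start = match start_after {
        Some(cursor) => sorted_ids.partition_point(|id| id.as_str() <= cursor),
        None => 0,
    };
    let end = (start + limit).min(sorted_ids.len());
    &sorted_ids[start..end]
}

/// Client side (hypothetical): fetch pages until an empty page comes back,
/// using the last ID of each page as the next cursor.
fn list_all(sorted_ids: &[String], limit: usize) -> Vec<String> {
    let mut out = Vec::new();
    let mut cursor: Option<String> = None;
    loop {
        let page = list_page(sorted_ids, cursor.as_deref(), limit);
        if page.is_empty() {
            break;
        }
        out.extend_from_slice(page);
        cursor = page.last().cloned();
    }
    out
}
```

Because the cursor is the last ID seen rather than an offset, each response stays bounded in size and tenants inserted between requests do not shift later pages.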
Erik Grinaker	6debb49b87	pageserver: coalesce index uploads when possible (#10248 ) ## Problem With upload queue reordering in #10218, we can easily get into a situation where multiple index uploads are queued back to back, which can't be parallelized. This will happen e.g. when multiple layer flushes enqueue layer/index/layer/index/... and the layers skip the queue and are uploaded in parallel. These index uploads will incur serial S3 roundtrip latencies, and may block later operations. Touches #10096. ## Summary of changes When multiple back-to-back index uploads are ready to upload, only upload the most recent index and drop the rest.	2025-01-14 21:10:17 +00:00
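The coalescing rule from this commit can be illustrated on a plain list of operations: within a run of back-to-back index uploads, only the most recent one is kept. This is a simplified sketch with hypothetical types (`UploadOp`, `coalesce_index_uploads`); the real upload queue also tracks readiness and in-progress operations, which are ignored here.

```rust
// Hypothetical, simplified upload queue operation.
#[derive(Debug, Clone, PartialEq)]
enum UploadOp {
    Layer(String),
    Index(u64), // index version; higher means more recent
}

/// Collapse each run of consecutive index uploads to its last element.
fn coalesce_index_uploads(queue: Vec<UploadOp>) -> Vec<UploadOp> {
    let mut out: Vec<UploadOp> = Vec::with_capacity(queue.len());
    for op in queue {
        let prev_is_index = matches!(out.last(), Some(UploadOp::Index(_)));
        // A newer index directly behind an older one supersedes it.
        if prev_is_index && matches!(op, UploadOp::Index(_)) {
            out.pop();
        }
        out.push(op);
    }
    out
}
```

Index uploads separated by a layer upload are left alone in this sketch, since only adjacent index uploads are treated as superseding each other.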
Erik Grinaker	e58e29e639	pageserver: limit number of upload queue tasks (#10384 ) ## Problem The upload queue can currently schedule an arbitrary number of tasks. This can both spawn an unbounded number of Tokio tasks, and also significantly slow down upload queue scheduling as it's quadratic in number of operations. Touches #10096. ## Summary of changes Limit the number of inprogress tasks to the remote storage upload concurrency. While this concurrency limit is shared across all tenants, there's certainly no point in scheduling more than this -- we could even consider setting the limit lower, but don't for now to avoid artificially constraining tenants.	2025-01-14 18:01:14 +00:00
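As a rough illustration of bounding in-progress work, the sketch below starts queued operations only while the number of in-progress tasks is below a limit. The function name and the use of plain strings for operations are assumptions made for the example; the pageserver's scheduler additionally checks for conflicts between operations.

```rust
use std::collections::VecDeque;

/// Move operations from `queued` to `inprogress` until `limit` tasks are
/// in progress or the queue is empty. Returns how many were started.
fn start_ready_tasks(
    queued: &mut VecDeque<String>,
    inprogress: &mut Vec<String>,
    limit: usize,
) -> usize {
    let mut started = 0;
    while inprogress.len() < limit {
        match queued.pop_front() {
            Some(op) => {
                inprogress.push(op);
                started += 1;
            }
            None => break,
        }
    }
    started
}
```

With the limit set to the remote storage upload concurrency, scheduling more tasks than this would not increase throughput, since the extra tasks would only wait for an upload slot.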
Heikki Linnakangas	d36112d20f	Simplify compute dockerfile by setting PATH just once (#10357 ) By setting PATH in the 'pg-build' layer, all the extension build layers will inherit. No need to pass PG_CONFIG to all the various make invocations either: once pg_config is in PATH, the Makefiles will pick it up from there.	2025-01-14 17:02:35 +00:00
Erik Grinaker	ffaa52ff5d	pageserver: reorder upload queue when possible (#10218 ) ## Problem The upload queue currently sees significant head-of-line blocking. For example, index uploads act as upload barriers, and for every layer flush we schedule a layer and index upload, which effectively serializes layer uploads. Resolves #10096. ## Summary of changes Allow upload queue operations to bypass the queue if they don't conflict with preceding operations, increasing parallelism. NB: the upload queue currently schedules an explicit barrier after every layer flush as well (see #8550). This must be removed to enable parallelism. This will require a better mechanism for compaction backpressure, see e.g. #8390 or #5415.	2025-01-14 16:31:59 +00:00
John Spray	aa7323a384	storage controller: quality of life improvements for AZ handling (#10379 ) ## Problem Since https://github.com/neondatabase/neon/pull/9916, the preferred AZ of a tenant is much more impactful, and we would like to make it more visible in tooling. ## Summary of changes - Include AZ in node describe API - Include AZ info in node & tenant outputs in CLI - Add metrics for per-node shard counts, labelled by AZ - Add a CLI for setting preferred AZ on a tenant - Extend AZ-setting API+CLI to handle None for clearing preferred AZ	2025-01-14 15:30:43 +00:00
Christian Schwarz	2466a2f977	page_service: throttle individual requests instead of the batched request (#10353 ) ## Problem Before this PR, the pagestream throttle was applied weighted on a per-batch basis. This had several problems: 1. The throttle occurrence counters were only bumped by `1` instead of `batch_size`. 2. The throttle wait time aggregator metric only counted one wait time, irrespective of `batch_size`. That makes sense in some ways of looking at it but not in others. 3. If the last request in the batch runs into the throttle, the other requests in the batch are also throttled, i.e., over-throttling happens (theoretical, didn't measure it in practice). ## Solution It occurred to me that we can simply push the throttling upwards into `pagestream_read_message`. This has the added benefit that in pipeline mode, the `executor` stage will, if it is idle, steal whatever requests already made it into the `spsc_fold` and execute them; before this change, that was not the case - the throttling happened in the `executor` stage instead of the `batcher` stage. ## Code Changes There are four changes in this PR: 1. Lifting up the throttling into the `pagestream_read_message` method. 2. Move the throttling metrics out of the `Throttle` type into `SmgrOpMetrics`. Unlike the other smgr metrics, throttling is per-tenant, hence the Arc. 3. Refactor the `SmgrOpTimer` implementation to account for the new observation states, and simplify its design. 4. Drive-by-fix flush time metrics. It was using the same `now` in the `observe_guard` every time. The `SmgrOpTimer` is now a state machine. Each observation point moves the state machine forward. If a timer object is dropped early some "pair"-like metrics still require an increment or observation. That's done in the Drop implementation, by driving the state machine to completion.	2025-01-14 15:28:01 +00:00
Alex Chi Z.	9bdb14c1c0	fix(pageserver): ensure initial image layers have correct key ranges (#10374 ) ## Problem Discovered during the relation dir refactor work. If we do not create images as in this patch, we would get two sets of image layers: ``` 0000...METADATA_KEYS 0000...REL_KEYS ``` They overlap at the same LSN and would cause data loss for relation keys. This doesn't happen in prod because initial image layer generation is never called, but better to be fixed to avoid future issues with the reldir refactors. ## Summary of changes * Consolidate create_image_layers call into a single one. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-14 15:27:48 +00:00
Conrad Ludgate	df4abd8b14	fix: force-refresh azure identity token (#10378 ) ## Problem Because of https://github.com/Azure/azure-sdk-for-rust/issues/1739, our identity token file was not being refreshed. This caused our uploads to start failing when the storage token expired. ## Summary of changes Drop and recreate the remote storage config every time we upload in order to force reload the identity token file.	2025-01-14 12:53:32 +00:00
Konstantin Knizhnik	a039f8381f	Optimize vector get last written LSN (#10360 ) ## Problem See https://github.com/neondatabase/neon/issues/10281 pg17 performs extra lock/unlock operation when fetching LwLSN. ## Summary of changes Perform all lookups under one lock, moving initialization of not found keys to separate loop. Related Postgres PR: https://github.com/neondatabase/postgres/pull/553 --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-01-14 05:54:30 +00:00
Tristan Partin	430b556b34	Update postgres-exporter and sql_exporter in computes (#10349 ) The postgres-exporter was much further out of date, but let's just bump both. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-14 00:44:39 +00:00
Konstantin Knizhnik	1783501eaa	Increase max connection for replica to prevent test flukyness (#10306 ) ## Problem See https://github.com/neondatabase/neon/issues/10167 Too small a `max_connections` value (2) can cause failures of the test_physical_replication_config_mismatch_too_many_known_xids test ## Summary of changes Increase `max_connections` to 5 Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-01-13 20:01:03 +00:00
John Spray	fd1368d31e	storcon: rework scheduler optimisation, prioritize AZ (#9916 ) ## Problem We want to do a more robust job of scheduling tenants into their home AZ: https://github.com/neondatabase/neon/issues/8264. Closes: https://github.com/neondatabase/neon/issues/8969 ## Summary of changes ### Scope This PR combines prioritizing AZ with a larger rework of how we do optimisation. The rationale is that just bumping AZ in the order of Score attributes is a very tiny change: the interesting part is lining up all the optimisation logic to respect this properly, which means rewriting it to use the same scores as the scheduler, rather than the fragile hand-crafted logic that we had before. Separating these changes out is possible, but would involve doing two rounds of test updates instead of one. ### Scheduling optimisation `TenantShard`'s `optimize_attachment` and `optimize_secondary` methods now both use the scheduler to pick a new "favourite" location. Then there is some refined logic for whether + how to migrate to it: - To decide if a new location is sufficiently "better", we generate scores using some projected ScheduleContexts that exclude the shard under consideration, so that we avoid migrating from a node with AffinityScore(2) to a node with AffinityScore(1), only to migrate back later. - Score types get a `for_optimization` method so that when we compare scores, we will only do an optimisation if the scores differ by their highest-ranking attributes, not just because one pageserver is lower in utilization. Eventually we _will_ want a mode that does this, but doing it here would make scheduling logic unstable and harder to test, and to do this correctly one needs to know the size of the tenant that one is migrating. - When we find a new attached location that we would like to move to, we will create a new secondary location there, even if we already had one on some other node. 
This handles the case where we have a home AZ A, and want to migrate the attachment between pageservers in that AZ while retaining a secondary location in some other AZ as well. - A unit test is added for https://github.com/neondatabase/neon/issues/8969, which is implicitly fixed by reworking optimisation to use the same scheduling scores as scheduling.	2025-01-13 19:33:00 +00:00
Alex Chi Z.	e9ed53b14f	feat(pageserver): support inherited sparse keyspace (#10313 ) ## Problem In preparation for https://github.com/neondatabase/neon/issues/9516. We need to store rel size and directory data in the sparse keyspace, but it does not support inheritance yet. ## Summary of changes Add a new type of keyspace "sparse but inherited" into the system. On the read path: we don't remove the key range when we descend into the ancestor. The search will stop when (1) the full key range is covered by image layers (which has already been implemented before), or (2) we reach the end of the ancestor chain. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-13 15:43:01 +00:00
Conrad Ludgate	a338aee132	feat(local_proxy): use ed25519 signatures with pg_session_jwt (#10290 ) Generally ed25519 seems to be much preferred for cryptographic strength to P256 nowadays, and it is NIST approved finally. We should use it where we can as it's also faster than p256. This PR makes the re-signed JWTs between local_proxy and pg_session_jwt use ed25519. This does introduce a new dependency on ed25519, but I do recall some Neon Authorise customers asking for support for ed25519, so I am justifying this dependency addition in the context that we can then introduce support for customer ed25519 keys. Sources: * https://csrc.nist.gov/pubs/fips/186-5/final subsection 7 (EdDSA) * https://datatracker.ietf.org/doc/html/rfc8037#section-3.1	2025-01-13 15:20:46 +00:00
Heikki Linnakangas	96243af651	Stop building unnecessary extension tarballs (#10355 ) We build "custom extensions" from a different repository nowadays.	2025-01-13 15:01:13 +00:00
John Spray	ef8bfacd6b	storage controller: API + CLI for migrating secondary locations (#10284 ) ## Problem Currently, if we want to move a secondary there isn't a neat way to do that: we just have migration API for the attached location, and it is only clean to use that if you've manually created a secondary via pageserver API in the place you're going to move it to. Secondary migration API enables: - Moving the secondary somewhere because we would like to later move the attached location there. - Move the secondary location because we just want to reclaim some disk space from its current location. ## Summary of changes - Add `/migrate_secondary` API - Add `tenant-shard-migrate-secondary` CLI - Add tests for above	2025-01-13 14:52:43 +00:00
Konstantin Knizhnik	ceacc29609	Start with minimal prefetch distance to minimize prefetch overhead for exact or limited index scans (#10359 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1736526089437179 For index-scan queries with a LIMIT clause, multiple backends can concurrently send a large number of duplicate prefetch requests whose results are not stored in the LFC, so the work is wasted. The current implementation of index prefetch starts with the maximal prefetch distance (10 by default now) when there are no key bounds, so for queries with a LIMIT clause like `select * from T order by pk limit 1` compute can send a lot of useless prefetch requests to the pageserver. ## Summary of changes Always start with the minimal prefetch distance, even if there are no key bounds. Related Postgres PRs: https://github.com/neondatabase/postgres/pull/552 https://github.com/neondatabase/postgres/pull/551 https://github.com/neondatabase/postgres/pull/550 https://github.com/neondatabase/postgres/pull/549 Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-01-13 14:26:11 +00:00
Erik Grinaker	b31ed0acd1	utils: add ?force=true hint for CPU profiler (#10368 ) This makes it less annoying to try to take a CPU profile when a continuous profile is already running.	2025-01-13 14:23:42 +00:00
Alexander Bayandin	b2d0e1a519	Link OpenSSL dynamically (#10302 ) ## Problem Statically linked OpenSSL is buggy in multithreaded environment: - https://github.com/neondatabase/cloud/issues/16155 - https://github.com/neondatabase/neon/issues/8275 ## Summary of changes - Link OpenSSL dynamically (revert OpenSSL part from https://github.com/neondatabase/neon/pull/8074) Before: ``` ldd /usr/local/v17/lib/libpq.so linux-vdso.so.1 (0x0000ffffb5ce4000) libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffb5c10000) libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffb5650000) /lib/ld-linux-aarch64.so.1 (0x0000ffffb5ca7000) ``` After: ``` ldd /usr/local/v17/lib/libpq.so linux-vdso.so.1 (0x0000ffffbf3e8000) libssl.so.3 => /lib/aarch64-linux-gnu/libssl.so.3 (0x0000ffffbf260000) libcrypto.so.3 => /lib/aarch64-linux-gnu/libcrypto.so.3 (0x0000ffffbec00000) libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffffbf1c0000) libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffffbea50000) /lib/ld-linux-aarch64.so.1 (0x0000ffffbf3ab000) ```	2025-01-13 14:13:02 +00:00
John Spray	d1bc36f536	storage controller: fix retries of compute hook notifications while a secondary node is offline (#10352 ) ## Problem We would sometimes fail to retry compute notifications: 1. Try and send, set compute_notify_failure if we can't 2. On next reconcile, reconcile() fails for some other reason (e.g. tried to talk to an offline node), and we fail the `result.is_ok() && must_notify` condition around the re-sending. Closes: https://github.com/neondatabase/cloud/issues/22612 ## Summary of changes - Clarify the meaning of the reconcile result: it should be Ok(()) if configuring attached location worked, even if secondary or detach locations cannot be reached. - Skip trying to talk to secondaries if they're offline - Even if reconcile fails and we can't send the compute notification (we can't send it because we're not sure if it's really attached), make sure we save the `compute_notify_failure` flag so that subsequent reconciler runs will try again - Add a regression test for the above	2025-01-13 13:31:57 +00:00
Erik Grinaker	0b9032065e	utils: allow 60-second CPU profiles (#10367 ) Taking continuous profiles every 20 seconds is likely too expensive (in dollar terms). Let's try 60-second profiles. We can now interrupt running profiles via `?force=true`, so this should be fine.	2025-01-13 13:14:23 +00:00
Heikki Linnakangas	09fe3b025c	Add a websockets tunnel and a test for the proxy's websockets support. (#3823 ) For testing the proxy's websockets support. I wrote this to test https://github.com/neondatabase/neon/issues/3822. Unfortunately, that bug can not be reproduced with this tunnel. The bug only appears when the client pipelines the first query with the authentication messages. The tunnel doesn't do that. --- Update (@conradludgate 2025-01-10): We have since added some websocket tests, but they manually implemented a very simplistic setup of the postgres protocol. Introducing the tunnel would make more complex testing simpler in the future. --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2025-01-13 11:35:39 +00:00
John Spray	12053cf832	storage controller: improve consistency_check_api (#10363 ) ## Problem Limitations found while using this to investigate https://github.com/neondatabase/neon/issues/10234: - If we hit a node consistency issue, we drop out and don't check shards for consistency - The messages printed after a shard consistency issue are huge, and grafana appears to drop them. ## Summary of changes - Defer node consistency errors until the end of the function, so that we always proceed to check shards for consistency - Print out smaller log lines that just point out the diffs between expected and persistent state	2025-01-13 11:18:14 +00:00
Conrad Ludgate	de199d71e1	chore: Address lints introduced in rust 1.85.0 beta (#10340 ) With a new beta build of the rust compiler, it's good to check out the new lints. Either to find false positives, or find flaws in our code. Additionally, it helps reduce the effort required to update to 1.85 in 6 weeks.	2025-01-13 10:34:36 +00:00
Erik Grinaker	22a6460010	libs/utils: add `force` parameter for `/profile/cpu` (#10361 ) ## Problem It's only possible to take one CPU profile at a time. With Grafana continuous profiling, a (low-frequency) CPU profile will always be running, making it hard to take an ad hoc CPU profile at the same time. Resolves #10072. ## Summary of changes Add a `force` parameter for `/profile/cpu` which will end and return an already running CPU profile, starting a new one for the current caller.	2025-01-13 10:01:18 +00:00
Erik Grinaker	cd982a82ec	pageserver,safekeeper: increase heap profiling frequency to 2 MB (#10362 ) ## Problem Currently, the heap profiling frequency is every 1 MB allocated. Taking a profile stack trace takes about 1 µs, and allocating 1 MB takes about 15 µs, so the overhead is about 6.7% which is a bit high. This is a fixed cost regardless of whether heap profiles are actually accessed. ## Summary of changes Increase the heap profiling sample frequency from 1 MB to 2 MB, which reduces the overhead to about 3.3%. This seems acceptable, considering performance-sensitive code will avoid allocations as far as possible anyway.	2025-01-13 09:44:59 +00:00
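The overhead figures quoted in this commit follow from dividing the cost of one sampled stack trace by the time spent allocating one sampling interval's worth of memory. A small sketch of that arithmetic, using the commit's own estimates (about 1 µs per stack trace, about 15 µs per MB allocated); the function name is hypothetical:

```rust
/// Fraction of allocation time spent on heap-profile sampling, given the cost
/// of one sample and the allocation time per MB, for a given sampling interval.
fn sampling_overhead(sample_cost_us: f64, alloc_us_per_mb: f64, interval_mb: f64) -> f64 {
    sample_cost_us / (alloc_us_per_mb * interval_mb)
}
```

With these inputs, `sampling_overhead(1.0, 15.0, 1.0)` is 1/15, roughly 6.7%, and `sampling_overhead(1.0, 15.0, 2.0)` is 1/30, roughly 3.3%, matching the numbers in the commit message.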
Heikki Linnakangas	8327f68043	Minor cleanup of extension build commands (#10356 ) There used to be some pg version dependencies in these extensions, but now that there isn't, follow the simpler pattern used in other extensions. No change in the produced images.	2025-01-11 17:39:27 +00:00
Heikki Linnakangas	846e8fdce4	Remove obsolete hnsw extension (#8008 ) This has been deprecated and disabled for new installations for a long time. Let's remove it for good.	2025-01-11 14:20:50 +00:00
Heikki Linnakangas	70a3bf37a0	Stop building 'compute-tools' image (#10333 ) It's been unused from time immemorial. --------- Co-authored-by: Matthias van de Meent <matthias@neon.tech>	2025-01-11 13:09:55 +00:00
Arpad Müller	23c0748cdd	Remove active column (#10335 ) We don't need or want the `active` column. Remove it. Vlad pointed out that this is safe. Thanks to the separation of the schemata in earlier PRs, this is easy. follow-up of #10205 Part of https://github.com/neondatabase/neon/issues/9981	2025-01-11 02:52:45 +00:00
Alex Chi Z.	b5d54ba52a	refactor(pageserver): move queue logic to compaction.rs (#10330 ) ## Problem close https://github.com/neondatabase/neon/issues/10031, part of https://github.com/neondatabase/neon/issues/9114 ## Summary of changes Move the compaction job generation to `compaction.rs`, thus making the code more readable and debuggable. We now also return the running job through the get compaction job API, whereas before we only returned scheduled jobs. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-10 20:53:00 +00:00
Christian Schwarz	58332cb361	pageserver: remove unused metric `pageserver_layers_visited_per_read_global` (#10141 ) As of commit "pageserver: remove legacy read path" (#8601) we always use vectored get, which has a separate metric.	2025-01-10 20:35:50 +00:00
Vlad Lazar	4c093c6314	Merge pull request #10338 from neondatabase/rc/release/2025-01-10 Storage release 2025-01-10	2025-01-10 19:21:43 +00:00
Christian Schwarz	9b43204893	fix(page_service): Timeline::gate held open while throttling (#10314 ) When we moved throttling up from Timeline::get into page_service, we stopped being sensitive to `Timeline::cancel`, even though we're holding a Handle and thus a guard on the `Timeline::gate` open. This PR rectifies the situation. Refs - Found while investigating #10309 (hung detach because gate kept open), but not expected to be the root cause of that issue because the affected tenants are not being throttled according to their metrics.	2025-01-10 19:21:01 +00:00
Christian Schwarz	cdd34dfc12	impr(utils/spsc_fold): add another test case (#10319 ) Wondered about the case covered here while investigating #10309.	2025-01-10 19:19:48 +00:00
Vlad Lazar	3cd5034eac	storcon: don't assume host:port format in storcon client (#10347 ) ## Problem https://github.com/neondatabase/infra/pull/2725 updated the scrubber to use a storcon endpoint that is not in `host:port` form. That breaks when unwrapping the port. ## Summary of changes Support both `host:port` and `host` formats for the storcon api.	2025-01-10 18:35:16 +00:00
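Accepting both `host:port` and bare `host` amounts to treating the port as optional instead of unwrapping it. A minimal sketch under that assumption follows; `split_host_port` is a hypothetical helper rather than the storcon client's actual code, and it does not handle bracketed IPv6 literals.

```rust
/// Split an endpoint into host and optional port. A trailing `:suffix` is
/// only treated as a port if it parses as a `u16`; otherwise the whole
/// string is returned as the host.
fn split_host_port(endpoint: &str) -> (&str, Option<u16>) {
    match endpoint.rsplit_once(':') {
        Some((host, port)) => match port.parse::<u16>() {
            Ok(p) => (host, Some(p)),
            Err(_) => (endpoint, None),
        },
        None => (endpoint, None),
    }
}
```

Returning `Option<u16>` makes the caller decide what a missing port means (for example, falling back to the scheme's default) instead of panicking on `unwrap`.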
Cheng Chen	425b777840	chore(compute): pg_mooncake v0.1.0 (#10337 ) ## Problem Upgrade pg_mooncake to v0.1.0 ## Summary of changes	2025-01-10 16:38:13 +00:00
John Spray	4398051385	tests: smaller datasets in LFC tests (#10346 ) ## Problem These two tests came up in #9537 as doing multi-gigabyte I/O, and from inspection of the tests it doesn't seem like they need that to fulfil their purpose. ## Summary of changes - In test_local_file_cache_unlink, run fewer background threads with a smaller number of rows. These background threads AFAICT exist to make sure some I/O is going on while we unlink the LFC directory, but 5 threads should be enough for "some". - In test_lfc_resize, tweak the test to validate that the cache size is larger than the final size before resizing it, so that we're sure we're writing enough data to really be doing something. Then decrease the pgbench scale.	2025-01-10 15:53:23 +00:00
Folke Behrens	71bca6f580	poetry: Update packaging for poetry v2 (#10344 ) ## Problem When poetry v2 (released Jan 5) is used it needs `packaging.metadata` module, but we downgrade `packaging` to 23.0. `packaging==23.1` introduced the metadata submodule. ## Summary of changes Update `packaging` to 24.2.	2025-01-10 14:32:26 +00:00
John Spray	105f66c4ce	tests: move test_parallel_copy into performance tree (#10343 ) ## Problem This test writes ~5GB of data. It is not suitable to run in parallel with all the other small tests in test_runner/regress. via #9537 ## Summary of changes - Move test_parallel_copy into the performance directory, so that it does not run in parallel with other tests	2025-01-10 13:57:26 +00:00
John Spray	0d4fce2d35	tests: refine how compat snapshot is generated (#10342 ) ## Problem I noticed in https://github.com/neondatabase/neon/pull/9537 that tests which work with compat snapshots were writing several hundred MB of data, which isn't really necessary. Also, the snapshots are large but don't have the proper variety of storage format features, e.g. they could just have L0 deltas. ## Summary of changes - Use smaller scale factor and runtime to generate less data - Configure a small layer size and use force image layer generation so that our output contains L1 deltas and image layers, and has a decent number of entries in the layer map	2025-01-10 13:57:23 +00:00
Erik Grinaker	2b8ea1e768	utils: add flamegraph for heap profiles (#10223 ) ## Problem Unlike CPU profiles, the `/profile/heap` endpoint can't automatically generate SVG flamegraphs. This requires the user to install and use `pprof` tooling, which is unnecessary and annoying. Resolves #10203. ## Summary of changes Add `format=svg` for the `/profile/heap` route, and generate an SVG flamegraph using the `inferno` crate, similarly to what `pprof-rs` already does for CPU profiles.	2025-01-10 12:14:29 +00:00
Christian Schwarz	db00eb41a1	fix(spsc_fold): potentially missing wakeup when send()ing in state `SenderWaitsForReceiverToConsume` (#10318 ) # Problem Before this PR, there were cases where send() in state SenderWaitsForReceiverToConsume would never be woken up by the receiver, because it never registered with `wake_sender`. Example Scenario 1: we stop polling a send() future A that was waiting for the receiver to consume. We drop A and create a new send() future B. B would return Poll::Pending and never register a waker. Example Scenario 2: a send() future A transitions from HasData to SenderWaitsForReceiverToConsume. This registers the context X with `wake_sender`. But before the Receiver consumes the data, we poll A from a different context Y. The state is still SenderWaitsForReceiverToConsume, but we wouldn't register the new context with `wake_sender`. When the Receiver comes around to consume and `wake_sender.notify()`s, it wakes the old context X instead of Y. # Fix Register the waker in the case where we're polled in state `SenderWaitsForReceiverToConsume`. # Relation to #10309 I found this bug while investigating #10309. There was never proof that this bug here is the root cause for #10309. In the meantime we found a more probable hypothesis for the root cause than what is being fixed here. Regardless, let's walk through my thought process about how it might have been relevant: There (in page_service), Scenario 1 does not apply because we poll the send() future to completion. Scenario 2 (`tokio::join!`) also does not apply with the current `tokio::join!()` impl, because it will just poll each future every time, each with the same context. Although if we ever used something like a FuturesUnordered anywhere, that will be using a different context, so, in that case, the bug might materialize. 
Regarding tokio & spurious poll in general: @conradludgate is not aware of any spurious wakeup cases in current tokio, but within a `tokio::join!()`, any wake meant for one future will poll all the futures, so that can appear as a spurious wake up to the N-1 futures of the `tokio::join!()`.	2025-01-10 11:06:03 +00:00
Conrad Ludgate	735c66dc65	fix(proxy): propagate the existing ComputeUserInfo to connect for cancellation (#10322 ) ## Problem We were incorrectly constructing the ComputeUserInfo, used for cancellation checks, based on the return parameters from postgres. This didn't contain the correct info. ## Summary of changes Propagate down the existing ComputeUserInfo.	2025-01-10 09:36:51 +00:00
Folke Behrens	77660f3d88	proxy: Fix parsing of UnknownTopic with payload (#10339 ) ## Problem When the proxy receives a `Notification` with an unknown topic it's supposed to use the `UnknownTopic` unit variant. Unfortunately, in adjacently tagged enums serde will not simply ignore the configured content if found; it tries to deserialize a map/object instead. ## Summary of changes * Use a custom deserialize function to ignore variant content. * Add a little unit test covering both cases.	2025-01-10 09:12:31 +00:00
Folke Behrens	b6205af4a5	Update tracing/otel crates (#10311 ) Update the tracing(-x) and opentelemetry(-x) crates. Some breaking changes require updating our code: * Initialization is done via builders now https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-otlp/CHANGELOG.md#0270 * Errors from OTel SDK are logged via tracing crate as well. https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry/CHANGELOG.md#0270	2025-01-10 08:48:03 +00:00
github-actions[bot]	32f58f8228	Storage release 2025-01-10	2025-01-10 06:02:00 +00:00
Arpad Müller	6149ac8834	Handle race between auto-offload and unarchival (#10305 ) ## Problem Auto-offloading as requested by the compaction task is racy with unarchival, in that the compaction task might attempt to offload an unarchived timeline. By that point it will already have set the timeline to the `Stopping` state however, which makes it unusable for any purpose. For example: 1. compaction task decides to offload timeline 2. timeline gets unarchived 3. `offload_timeline` gets called by compaction task * sets timeline's state to `Stopping` * realizes that the timeline can't be unarchived, errors out 6. endpoint can't be started as the timeline is `Stopping` and thus 'can't be found'. A future iteration of the compaction task can't "heal" this state either as the timeline will still not be archived, same goes for other automatic stuff. The only way to heal this is a tenant detach+attach, or alternatively a pageserver restart. Furthermore, the compaction task is especially amenable for such races as it first stores `can_offload` into a variable, figures out whether compaction is needed (which takes some time), and only then does it attempt an offload operation: the time difference between "check" and "use" is non-trivially small. To make it even worse, we start the compaction task right after attach of a tenant, and it is a common pattern by pageserver users to attach a tenant to then immediately unarchive a timeline, so that an endpoint can be started. ## Solutions not adopted The simplest solution is to move the `can_offload` check to right before attempting of the offload. But this is not a good solution, as no lock is held between that check and timeline shutdown. So races would still be possible, just become less likely. I explored using the timeline state for this, as in adding an additional enum variant. But `Timeline::set_state` is racy (#10297). 
## Adopted solution We use the lock on the timeline's upload queue as an arbiter: either unarchival gets to it first and sours the state for auto-offloading, or auto-offloading shuts it down, which stops any parallel unarchival in its tracks. The key part is not releasing the upload queue's lock between the check whether the timeline is archived or not, and shutting it down (the actual implementation only sets `shutting_down` but it has the same effect on `initialized_mut()` as a full shutdown). The rest of the patch is stuff that follows from this. We also move the part where we set the state to `Stopping` to after that arbiter has decided the fate of the timeline. For deletions, we do keep it inside `DeleteTimelineFlow::prepare` however, so that it is called with all of the timeline's locks held that the function allocates (timelines lock most importantly). This is only a precautionary measure however, as I didn't want to analyze deletion-related code for possible races. ## Future changes It might make sense to move `can_offload` to right before the offload attempt. Maybe some other properties might have changed as well. Although this will not be perfect either as no lock is held. I want to keep it out of this change to emphasize that this move wasn't the main reason we are race-free now. Fixes #10220	2025-01-09 20:41:49 +00:00
Tristan Partin	49756a0d01	Implement compute_ctl management API in Axum (#10099 ) This is a refactor to create better abstractions related to our management server. It cleans up the code, and prepares everything for authorized communication to and from the control plane. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-09 20:08:26 +00:00
Arpad Müller	99b5a6705f	Update rust to 1.84.0 (#10328 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. [Release notes](https://releases.rs/docs/1.84.0/). Prior update was in #9926.	2025-01-09 18:29:09 +00:00
Alexey Kondratov	f37eeb56ad	fix(compute_ctl): Resolve issues with dropping roles having dangling permissions (#10299 ) ## Problem In Postgres, one cannot drop a role if it has any dependent objects in the DB. In `compute_ctl`, we automatically reassign all dependent objects in every DB to the corresponding DB owner. Yet, it seems that it doesn't help with some implicit permissions. The issue is reproduced by installing a `postgis` extension because it creates some views and tables in the public schema. ## Summary of changes Added a repro test without using a `postgis`: i) create a role via `compute_ctl` (with `neon_superuser` grant); ii) create a test role, a table in schema public, and grant permissions via the role in `neon_superuser`. To fix the issue, I added a new `compute_ctl` code that removes such dangling permissions before dropping the role. It's done in the least invasive way, i.e., only touches the schema public, because i) that's the problem we had with PostGIS; ii) it creates a smaller chance of messing anything up and getting a stuck operation again, just for a different reason. Properly, any API-based catalog operations should fail gracefully and provide an actionable error and status code to the control plane, allowing the latter to unwind the operation and propagate an error message and hint to the user. In this sense, it's aligned with another feature request https://github.com/neondatabase/cloud/issues/21611 Resolve neondatabase/cloud#13582	2025-01-09 16:39:53 +00:00
Arpad Müller	bebc46e713	Add scheduling_policy column to safekeepers table (#10205 ) Add a `scheduling_policy` column to the safekeepers table of the storage controller. Part of #9981	2025-01-09 15:55:02 +00:00
John Spray	ad51622568	remote_storage: enable Azure connection pooling by default (#10324 ) ## Problem Initially we defaulted this to zero to reduce risk. We have now been using pooling in staging for some time without issues, so let's make it the default for anyone using this software without setting the config explicitly. Closes: https://github.com/neondatabase/cloud/issues/20971 ## Summary of changes - Set Azure blob storage connection pool size to 8 by default	2025-01-09 15:34:06 +00:00
John Spray	ac6cca17ac	storcon: don't log a heartbeat error during shutdown (#10325 ) ## Problem Occasionally we see an unexpected error like: ``` ERROR spawn_heartbeat_driver: Failed to update node state 1 after heartbeat round: Shutting down\n') Hint: use scripts/check_allowed_errors.sh to test any new allowed_error you add ``` https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10324/12690404952/index.html#/testresult/63406a0687bf6eca ## Summary of changes - Explicitly handle ApiError::ShuttingDown as a no-op when mutating node status	2025-01-09 15:33:44 +00:00
Alex Chi Z.	640ac4fc9e	fix(pageserver): report timestamp is in the past if the key is missing (#10210 ) ## Problem If for some reason we have already garbage-collected the data under an LSN but the caller uses a past LSN for the find_time_cutoff function, we currently report a missing key error and GC will never proceed. Note that a missing key error can also happen if the key is really missing (i.e., during the past offload incidents). ## Summary of changes Make sure GC proceeds by bumping the LSN. When time_cutoff=None, we will not increase the time_cutoff (it will be set to latest_gc_cutoff). If we really need to bump the GC LSN for maintenance purposes, we need a separate API to do that. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-09 14:43:20 +00:00
Konstantin Knizhnik	20c40eb733	Add response tag to getpage request in V3 protocol version (#8686 ) ## Problem We have had several serious data corruption incidents caused by a mismatch of getpage requests: https://neondb.slack.com/archives/C07FJS4QF7V/p1723032720164359 We hope that the problem is fixed now, but it is better to prevent this kind of problem in the future. Part of https://github.com/neondatabase/cloud/issues/16472 ## Summary of changes This PR introduces a new V3 version of the compute<->pageserver protocol, adding a tag to the getpage response, so the compute can now check that it really received the response to the requested page. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-01-09 13:12:04 +00:00
Vlad Lazar	f4739d49e3	pageserver: tweak interpreted ingest record metrics (#10291 ) ## Problem The filtered record metric doesn't make sense for interpreted ingest. ## Summary of changes While of dubious utility in the first place, this patch replaces them with records received and records observed metrics for interpreted ingest: * received records cause the pageserver to do _something_: write a key, value pair to storage, update some metadata or flush pending modifications * observed records are a shard 0 concept and contain only key metadata used in tracking relation sizes (received records include observed records)	2025-01-09 12:31:02 +00:00
Arseny Sher	030ab1c0e8	TLA+ spec for safekeeper membership change (#9966 ) ## Problem We want to define the algorithm for safekeeper membership change. ## Summary of changes Add spec for it, several models and logs of checking them. ref https://github.com/neondatabase/neon/issues/8699	2025-01-09 12:26:17 +00:00
John Spray	5baa4e7f0a	docker: don't set LD_LIBRARY_PATH (#10321 ) ## Problem This was causing storage controller to still use neon-built libpq instead of vanilla libpq. Since https://github.com/neondatabase/neon/pull/10269 we have a vanilla postgres in the system path -- anything that wants a postgres library will use that. ## Summary of changes - Remove LD_LIBRARY_PATH assignment in Dockerfile	2025-01-09 11:47:55 +00:00
Tristan Partin	5b2751397d	Refactor MigrationRunner::run_migrations() to call a helper (#10232 ) This will make it easier to add per-db migrations, such as that for CVE-2024-4317. Link: https://www.postgresql.org/support/security/CVE-2024-4317/ Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-09 07:05:07 +00:00
Ivan Efremov	fcfff72454	impr(proxy): Decouple ip_allowlist from the CancelClosure (#10199 ) This PR removes the direct dependency of the IP allowlist from CancelClosure, allowing for more scalable and flexible IP restrictions and enabling the future use of Redis-based CancelMap storage. Changes: - Introduce a new BackendAuth async trait that retrieves the IP allowlist through existing authentication methods; - Improve cancellation error handling by instrument() async cancel_sesion() rather than dropping it. - Set and store IP allowlist for SCRAM Proxy to consistently perform IP allowance check Relates to #9660	2025-01-08 19:34:53 +00:00
Anastasia Lubennikova	0ad0db6ff8	compute: dropdb DROP SUBSCRIPTION fix (#10066 ) ## Problem A project gets stuck if a database with subscriptions was deleted via API / UI. https://github.com/neondatabase/cloud/issues/18646 ## Summary of changes Before dropping the database, drop all the subscriptions in it. Do not drop the slot on the publisher, because we have no guarantee that the slot still exists or that the publisher is reachable. Add a `DropSubscriptionsForDeletedDatabases` phase to run these operations in all databases we're about to delete. Ignore the error if the database does not exist.	2025-01-08 18:55:04 +00:00
John Spray	68d8acfd05	storage controller: don't hold detached tenants in memory (#10264 ) ## Problem Typical deployments of neon have some tenants that stay in use continuously, and a background churning population of tenants that are created and then fall idle, and are configured to Detached state. Currently, this churn of short lived tenants results in an ever-increasing memory footprint. Closes: https://github.com/neondatabase/neon/issues/9712 ## Summary of changes - At startup, filter to only load shards that don't have Detached policy - In process_result, check if a tenant's shards are all Detached and observed=={}, and if so drop them from memory - In tenant_location_conf and other tenant mutators, load the tenants' shards on-demand if they are not present	2025-01-08 18:12:09 +00:00
Vlad Lazar	dc284247a5	storage_controller: fix node flap detach race (#10298 ) ## Problem The observed state removal may race with the inline updates of the observed state done from `Service::node_activate_reconcile`. This was intended to work as follows: 1. Detaches while the node is unavailable remove the entry from the observed state. 2. `Service::node_activate_reconcile` diffs the locations returned by the pageserver with the observed state and detaches in-line when required. ## Summary of changes This PR removes step (1) and lets background reconciliations deal with the mismatch between the intent and observed state. A follow up will attempt to remove `Service::node_activate_reconcile` altogether. Closes https://github.com/neondatabase/neon/issues/10253	2025-01-08 10:26:53 +00:00
Alex Chi Z.	5c76e2a983	fix(storage-scrubber): ignore errors if index_part is not consistent (#10304 ) ## Problem Consider the pageserver is doing the following sequence of operations: * upload X files * update index_part to add X and remove Y * delete Y files When storage scrubber obtains the initial timeline snapshot before "update index_part" (that is the old version that contains Y but not X), and then obtains the index_part file after it gets updated, it will report all Y files are missing. ## Summary of changes Do not report layer file missing if index_part listed and downloaded are not the same (i.e. different last_modified times) Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-07 23:24:17 +00:00
Erik Grinaker	237dae71a1	Revert "pageserver,safekeeper: disable heap profiling (#10268 )" (#10303 ) This reverts commit `b33299dc37`. Heap profiles weren't the culprit after all. Touches #10225.	2025-01-07 22:49:00 +00:00
Fedor Dikarev	43a5e575d6	ci: use reusable workflow for MacOs build (#9889 ) Closes: https://github.com/neondatabase/cloud/issues/17784 ## Problem Currently, we run the whole CI pipeline for any changes. It's slow and expensive. ## Suggestion Starting with MacOs builds: - check what files were changed - rebuild only needed parts - reuse results from previous builds when available - run builds in parallel when possible --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-01-07 20:00:56 +00:00
Folke Behrens	0a117fb1f1	proxy: Parse Notification twice only for unknown topic (#10296 ) ## Problem We currently parse Notification twice even in the happy path. ## Summary of changes Use `#[serde(other)]` to catch unknown topics and defer the second parsing.	2025-01-07 15:24:54 +00:00
JC Grünhage	4aa9786c6b	Fix promote-images-prod after splitting it out (#10292 ) ## Problem `promote-images` was split into `promote-images-dev` and `promote-images-prod` in https://github.com/neondatabase/neon/pull/10267. `dev` credentials were loaded in `promote-images-dev` and `prod` credentials were loaded in `promote-images-prod`, but `promote-images-prod` needs `dev` credentials as well to access the `dev` images to replicate them from `dev` to `prod`. ## Summary of changes Load `dev` credentials in `promote-images-prod` as well.	2025-01-07 13:45:18 +00:00
Matthias van de Meent	be38123e62	Fix accounting of dropped prefetched GetPage requests (#10276 ) Apparently, we failed to do this bookkeeping in quite a few places... ## Problem Fixes https://github.com/neondatabase/cloud/issues/22364 ## Summary of changes Add accounting of dropped requests. Note that this includes prefetches dropped due to things like "PS connection dropped unexpectedly" or "prefetch queue is already full", but not (yet?) "dropped due to backend shutdown".	2025-01-07 10:41:52 +00:00
JC Grünhage	ea84ec357f	Split promote-images into promote-images-dev and promote-images-prod (#10267 ) ## Problem `trigger-e2e-tests` waits half an hour before starting to run. Nearly half of that time can be saved by promoting images before tests on them are complete, so the e2e tests can run in parallel. On `main` and `release{,-proxy,-compute}`, `promote-images` updates `latest` and pushes things to prod ecr, so we want to run `promote-images` only after `test-images` is done, but on other branches, there is no harm in promoting images that aren't tested yet. ## Summary of changes To promote images into dev container registries sooner, `promote-images` is split into `promote-images-dev` and `promote-images-prod`. The former pushes to dev container registries, the latter to prod ones. The latter also waits for `test-images`, while the former doesn't. This allows `trigger-e2e-tests` to run sooner.	2025-01-07 10:36:05 +00:00
Matthias van de Meent	30863c0104	libpagestore: timeout = max(0, difference), not min(0, difference) (#10274 ) Using `min(0, ...)` causes us to fail to wait in most situations, so a lack of data would be a hot wait loop, which is bad. ## Problem We noticed high CPU usage in some situations	2025-01-07 09:07:38 +00:00
Alexander Bayandin	02f81b6469	Fix clippy warning on macOS (#10282 ) ## Problem On macOS: ``` error: unused variable: `disable_lfc_resizing` --> compute_tools/src/bin/compute_ctl.rs:431:9 \| 431 \| disable_lfc_resizing, \| ^^^^^^^^^^^^^^^^^^^^ help: try ignoring the field: `disable_lfc_resizing: _` \| = note: `-D unused-variables` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(unused_variables)]` ``` ## Summary of changes - Initialise `disable_lfc_resizing` only on Linux (because it is only used on Linux in a later block)	2025-01-06 20:28:33 +00:00
Alexander Bayandin	ad7f14d526	test_runner: update packages for Python 3.13 (#10285 ) ## Problem It's impossible to run regression tests with Python 3.13 as some dependencies don't support it (some of them are outdated, and `jsonnet` doesn't support it at all yet) ## Summary of changes - Update dependencies for Python 3.13 - Install `jsonnet` only on Python < 3.13 and skip relevant tests on Python 3.13 Closes #10237	2025-01-06 20:25:31 +00:00
Erik Grinaker	b342a02b1c	Dockerfile: build with `force-frame-pointers=yes` (#10286 ) See https://github.com/neondatabase/neon/pull/10226#issuecomment-2573725182.	2025-01-06 20:17:43 +00:00
Alex Chi Z.	4a6556e269	fix(pageserver): ensure GC computes time cutoff using the same start time (#10193 ) ## Problem close https://github.com/neondatabase/neon/issues/10192 ## Summary of changes * `find_gc_time_cutoff` takes `now` parameter so that all branches compute the cutoff based on the same start time, avoiding races. * gc-compaction uses a single `get_gc_compaction_watermark` function to get the safe LSN to compact. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2025-01-06 19:29:18 +00:00
Erik Grinaker	95f1920231	cargo: build with frame pointers (#10226 ) ## Problem Frame pointers are typically disabled by default (depending on CPU architecture), to improve performance. This frees up a CPU register, and avoids a couple of instructions per function call. However, it makes stack unwinding much more inefficient, since it has to use DWARF debug information instead, and gives worse results with e.g. `perf` and eBPF profiles. The `backtrace` implementation of `libunwind` is also suspected to cause seg faults. The performance benefit of frame pointer omission doesn't appear to matter that much on modern 64-bit CPU architectures (which have plenty of registers and optimized instruction execution), and benchmarks did not show measurable overhead. The Rust standard library and jemalloc already enable frame pointers by default. For more information, see https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html. Resolves #10224. Touches #10225. ## Summary of changes Enable frame pointers in all builds, and use frame pointers for pprof-rs stack sampling.	2025-01-06 17:27:08 +00:00
Conrad Ludgate	fda52a0005	feat(proxy): dont trigger error alerts for unknown topics (#10266 ) ## Problem Before the holidays, and just before our code freeze, a change to cplane was made that started publishing the topics from #10197. This triggered our alerts and put us in a sticky situation as it was not an error, and we didn't want to silence the alert for the entire holidays, and we didn't want to release proxy 2 days in a row if it was not essential. We fixed it eventually by rewriting the alert based on logs, but this is not a good solution. ## Summary of changes Introduces an intermediate parsing step to check the topic name first, to allow us to ignore parsing errors for any topics we do not know about.	2025-01-06 13:05:35 +00:00
Busra Kugler	406cca643b	Update neon_fixtures.py - remove logs (#10219 ) We need to remove this line to prevent AWS keys from being exposed in the public S3 buckets	2025-01-06 10:44:23 +00:00
dependabot[bot]	b368e62cfc	build(deps): bump jinja2 from 3.1.4 to 3.1.5 in the pip group (#10236 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-01-04 15:40:50 +00:00
Erik Grinaker	96c36c0894	Merge pull request #10263 from neondatabase/rc/release/2025-01-03 Storage release 2025-01-03	2025-01-03 20:32:37 +01:00
John Spray	4b2f56862d	docker: include vanilla debian postgres client (#10269 ) ## Problem We are chasing down segfaults in the storage controller https://github.com/neondatabase/cloud/issues/21010 This is for use by the storage controller, which links dynamically with `libpq`. We currently use the neon-built libpq, but this may be unsafe for use from multi-threaded programs like the controller, as it uses a statically linked openssl Precursor to https://github.com/neondatabase/neon/pull/10258 ## Summary of changes - Include `postgresql-15` in container builds. The reason for using version 15 is simply because that is what's available in Debian 12 without adding any extra repositories, and we don't have any special need for latest version in our libpq usage.	2025-01-03 16:16:04 +00:00
Erik Grinaker	a77e87a48a	pageserver: assert that uploads don't modify indexed layers (#10228 ) ## Problem It's not legal to modify layers that are referenced by the current layer index. Assert this in the upload queue, as preparation for upload queue reordering. Touches #10096. ## Summary of changes Add a debug assertion that the upload queue does not modify layers referenced by the current index. I could be convinced that this should be a plain assertion, but will be conservative for now.	2025-01-03 16:03:19 +00:00
Erik Grinaker	d719709316	Revert "pageserver: revert flush backpressure (#8550 ) (#10135 )" (#10270 ) This reverts commit `f3ecd5d76a`. It is [suspected](https://neondb.slack.com/archives/C033RQ5SPDH/p1735907405716759) to have caused significant read amplification in the [ingest benchmark](https://neonprod.grafana.net/d/de3mupf4g68e8e/perf-test3a-ingest-benchmark?orgId=1&from=now-30d&to=now&timezone=utc&var-new_project_endpoint_id=ep-solitary-sun-w22bmut6&var-large_tenant_endpoint_id=ep-holy-bread-w203krzs) (specifically during index creation). We will revisit an intermediate improvement here to unblock [upload parallelism](https://github.com/neondatabase/neon/issues/10096) before properly addressing [compaction backpressure](https://github.com/neondatabase/neon/issues/8390).	2025-01-03 16:51:16 +01:00
Erik Grinaker	97912f19fc	pageserver,safekeeper: disable heap profiling (#10268 ) ## Problem Since enabling continuous profiling in staging, we've seen frequent seg faults. This is suspected to be because jemalloc and pprof-rs take a stack trace at the same time, and the handlers aren't signal safe. jemalloc does this probabilistically on every allocation, regardless of whether someone is taking a heap profile, which means that any CPU profile has a chance to cause a seg fault. Touches #10225. ## Summary of changes For now, just disable heap profiles -- CPU profiles are more important, and we need to be able to take them without risking a crash.	2025-01-03 16:51:16 +01:00
Erik Grinaker	1393cc668b	Revert "pageserver: revert flush backpressure (#8550 ) (#10135 )" (#10270 ) This reverts commit `f3ecd5d76a`. It is [suspected](https://neondb.slack.com/archives/C033RQ5SPDH/p1735907405716759) to have caused significant read amplification in the [ingest benchmark](https://neonprod.grafana.net/d/de3mupf4g68e8e/perf-test3a-ingest-benchmark?orgId=1&from=now-30d&to=now&timezone=utc&var-new_project_endpoint_id=ep-solitary-sun-w22bmut6&var-large_tenant_endpoint_id=ep-holy-bread-w203krzs) (specifically during index creation). We will revisit an intermediate improvement here to unblock [upload parallelism](https://github.com/neondatabase/neon/issues/10096) before properly addressing [compaction backpressure](https://github.com/neondatabase/neon/issues/8390).	2025-01-03 15:38:51 +00:00
Erik Grinaker	b33299dc37	pageserver,safekeeper: disable heap profiling (#10268 ) ## Problem Since enabling continuous profiling in staging, we've seen frequent seg faults. This is suspected to be because jemalloc and pprof-rs take a stack trace at the same time, and the handlers aren't signal safe. jemalloc does this probabilistically on every allocation, regardless of whether someone is taking a heap profile, which means that any CPU profile has a chance to cause a seg fault. Touches #10225. ## Summary of changes For now, just disable heap profiles -- CPU profiles are more important, and we need to be able to take them without risking a crash.	2025-01-03 15:21:31 +00:00
John Spray	e9d30edc7f	pageserver: fix a 500 during timeline creation + shutdown (#10259 ) ## Problem The test_create_churn_during_restart test fails if timeline creation calls return 500 errors (because the API shouldn't do it), and it's sometimes failing, for example: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10256/12582034135/index.html#/testresult/3ce2e7045465012e ## Summary of changes - Avoid handling UploadQueueShutDownOrStopped case as an Other (i.e. 500)	2025-01-03 13:13:22 +00:00
Arpad Müller	1303cd5d05	Fix defusing race between Tenant::shutdown and offload_timeline (#10150 ) There is a race condition between `Tenant::shutdown`'s `defuse_for_drop` loop and `offload_timeline`, where timeline offloading can insert into a tenant that is in the process of shutting down, in fact so far progressed that the `defuse_for_drop` has already been called. This prevents warn log lines of the form: ``` offloaded timeline <hash> was dropped without having cleaned it up at the ancestor ``` The solution piggybacks on the `offloaded_timelines` lock: both the defuse loop and the offloaded timeline insertion need to acquire the lock, and we know that the defuse loop only runs after the tenant has set its `TenantState` to `Stopping`. So if we hold the `offloaded_timelines` lock, and know that the `TenantState` is not `Stopping`, then we know that the defuse loop has not run yet, and holding the lock ensures that it doesn't start running while we are inserting the offloaded timeline. Fixes #10070	2025-01-03 12:36:01 +00:00
John Spray	c08759f367	storcon: verbose logs in rare case of shards not attached yet (#10262 ) ## Problem When we do a timeline CRUD operation, we check that the shards we need to mutate are currently attached to a pageserver, by reading `generation` and `generation_pageserver` from the database. If any don't appear to be attached, we respond with a 503 and "One or more shards in tenant is not yet attached". This is happening more often than expected, and it's not obvious with current logging what's going on: specifically which shard has a problem, and exactly what we're seeing in these persistent generation columns. (Aside: it's possible that we broke something with the change in #10011 which clears generation_pageserver when we detach a shard, although if so the mechanism isn't trivial: what should happen is that if we stamp on generation_pageserver if a reconciler is running, then it shouldn't matter because we're about to ## Summary of changes - When we are in Attached mode but find that generation_pageserver/generation are unset, output details while looping over shards.	2025-01-03 10:55:15 +00:00
John Spray	ba9722a2fd	tests: add upload wait in test_scrubber_physical_gc_ancestors (#10260 ) ## Problem We see periodic failures in `test_scrubber_physical_gc_ancestors`, where the logs show that the pageserver is creating image layers that should cause child shards to no longer reference their parents' layers, but then the scrubber runs and doesn't find any unreferenced layers. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10256/12582034135/index.html#/testresult/78ea06dea6ba8dd3 From inspecting the code & test, it seems like this could be as simple as the test failing to wait for uploads before running the scrubber. It had a 2 second delay built in to satisfy the scrubber's time threshold checks, which on a lightly loaded machine would also have been easily enough for uploads to complete, but our test machines are more heavily loaded all the time. ## Summary of changes - Wait for uploads to complete after generating image layers in test_scrubber_physical_gc_ancestors, so that the scrubber should reliably see the post-compaction metadata.	2025-01-03 10:55:07 +00:00
John Spray	2d4f267983	cargo: update diesel, pq-sys (#10256 ) ## Problem Versions of `diesel` and `pq-sys` were somewhat stale. I was checking on libpq->openssl versions while investigating a segfault via https://github.com/neondatabase/cloud/issues/21010. I don't think these rust bindings are likely to be the source of issues, but we might as well freshen them as a precaution. ## Summary of changes - Update diesel to 2.2.6 - Update pq-sys to 0.6.3	2025-01-03 10:20:18 +00:00
Ivan Efremov	7a598b9842	[proxy/docs]imprv: Add local testing section to proxy README (#10230 ) Add commands to run proxy locally with the mocked control plane	2025-01-03 10:04:58 +00:00
github-actions[bot]	49724aa3b6	Storage release 2025-01-03	2025-01-03 06:02:03 +00:00
Tristan Partin	eefad27538	Inline various migration queries (#10231 ) There was no value in saving them off to temporary variables. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-02 22:12:56 +00:00
Em Sharnoff	cd10c719f9	compute: Add spec support for disabling LFC resizing (#10132 ) ref neondatabase/cloud#21731 ## Problem When we manually override the LFC size for particular computes, autoscaling will typically undo that because vm-monitor will resize LFC itself. So, we'd like a way to make vm-monitor not set LFC size — this actually already exists, if we just don't give vm-monitor a postgres connection string. ## Summary of changes Add a new field to the compute spec, `disable_lfc_resizing`. When set to `true`, we pass in `None` for its postgres connection string. That matches the configuration tested in `neondatabase/autoscaling` CI.	2025-01-02 19:45:59 +00:00
Tristan Partin	363ea97f69	Add more substantial tests for compute migrations (#9811 ) The previous tests really didn't do much. This set should be quite a bit more encompassing. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-02 18:37:50 +00:00
Conrad Ludgate	56e6ebfe17	chore: building compute_tools and local_proxy together (#10257 ) ## Problem Building local_proxy and compute_tools features the same dependency tree, but as they are currently built in separate clean layers all that progress is wasted. For our arm builds that's an extra 10 minutes. ## Summary of changes Combines the compute_tools and local_proxy build layers.	2025-01-02 16:05:14 +00:00
Raphael 'kena' Poss	1622fd8bda	proxy: recognize but ignore the 3 new redis message types (#10197 ) ## Problem https://neondb.slack.com/archives/C085MBDUSS2/p1734604792755369 ## Summary of changes Recognize and ignore the 3 new broadcast messages: - `/block_public_or_vpc_access_updated` - `/allowed_vpc_endpoints_updated_for_org` - `/allowed_vpc_endpoints_updated_for_projects`	2025-01-02 16:02:48 +00:00
Konstantin Knizhnik	8c7dcd2598	Set heartbeat interval for chaos test (#10222 ) ## Problem See https://neondb.slack.com/archives/C033RQ5SPDH/p1734707873215729 test_timeline_archival_chaos becomes more flaky with increased heartbeat interval. Resolves #10250. ## Summary of changes Override heartbeat interval for `test_timeline_archival_chaos.py` --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-01-02 14:14:18 +00:00
Folke Behrens	ee22d4c9ef	proxy: Set TCP_NODELAY for compute connections (#10240 ) neondatabase/cloud#19184	2025-01-02 13:32:24 +00:00
JC Grünhage	26600f2973	Skip running clippy without default features (#10098 ) ## Problem Running clippy with `cargo hack --feature-powerset` in CI isn't particularly fast. This PR follows-up on https://github.com/neondatabase/neon/pull/8912 to improve the speed of our clippy runs. Parallelism as suggested in https://github.com/neondatabase/neon/issues/9901 was tested, but didn't show consistent enough improvements to be worth it. It actually increased the amount of work done, as there are fewer cache hits when clippy runs are spread out over multiple target directories. Additionally, parallelism makes it so caching needs to be thought about more actively and copying around target directories to enable parallelism eats the rest of the performance gains from parallel execution. After some discussion, the decision was to instead cut down on the number of jobs that are running further. The easiest way to do this is to not run clippy without default features. The list of default features is empty for all crates, and I haven't found anything using `cfg(feature = "default")` either, so this is likely not going to change anything except speeding the runs up. ## Summary of changes Reduce the number of feature combinations tried by `cargo hack` (as suggested in https://github.com/neondatabase/neon/pull/8912#pullrequestreview-2286482368) by never disabling default features. ## Alternatives - We can split things out into different jobs which reduces the time until everything is finished by running more things in parallel. This does, however, decrease the number of cache hits and increase the amount of time spent on overhead tasks like repo cloning and restoring caches by doing those multiple times instead of once. - We could replace `cargo hack [...] clippy` with `cargo clippy [...]; cargo clippy --features testing`. I'm not 100% sure how this compares to the change here in the PR, but it does seem to run a bit faster. 
That likely means it's doing less work, but without understanding what exactly we loose by that I'd rather not do that for now. I'd appreciate input on this though.	2025-01-02 11:33:42 +00:00
Konstantin Knizhnik	b3cd883f93	Unlock LFC mutex when LFC cache is disabled (#10235 ) ## Problem See https://github.com/neondatabase/neon/issues/10233 `lfc_containsv` returns with holding lock when LFC was disabled. This bug was introduced in commit `78938d1b59` ## Summary of changes Release lock before return. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-01-02 11:28:15 +00:00
Conrad Ludgate	38c7a2abfc	chore(proxy): pre-load native tls certificates and propagate compute client config (#10182 ) Now that we construct the TLS client config for cancellation as well as connect, it feels appropriate to construct the same config once and re-use it elsewhere. It might also help should #7500 require any extra setup, so we can easily add it to all the appropriate call sites.	2025-01-02 09:36:13 +00:00
Conrad Ludgate	f94248a594	chore(libs/proxy): refactor tokio-postgres connection control flow (#10247 ) In #10207 it was clear there was some confusion with the current connection logic. To analyse the flow to make sure there was no poll stalling, I ended up with the following refactor. Notable changes: 1. Now all functions called `poll_xyz` and that have a `cx: &mut Context` argument must return a `Poll<_>` type, and can only return `Pending` iff an internal poll call also returned `Pending` 2. State management is handled entirely by `poll_messages`. There are now only 2 states which makes it much easier to keep track of. Each commit should be self-reviewable and should be simple to verify that it keeps the same behaviour	2025-01-02 09:35:28 +00:00
Alex Chi Z.	9c53b41245	fix(pageserver): update remote latest_gc_cutoff after gc-compaction (#10209 ) ## Problem close https://github.com/neondatabase/neon/issues/10208 part of #9114 ## Summary of changes * Ensure remote `latest_gc_cutoff` is up-to-date before removing any files for gc-compaction. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-12-19 18:40:20 +00:00
Konstantin Knizhnik	197a89ab3d	Increase default storage controller heartbeat interval from 100msec … (#10206 ) ## Problem Currently the default value of the storage controller heartbeat interval is 100msec. This means that 10 times per second it establishes a connection to PS, which seems to be quite expensive. On macOS right now storage_controller consumes 70% CPU and trusts - 30%. So together they completely utilize one core. A lot of us have Macs. Let's save the environment a little bit and not waste electricity and contribute to global warming. By the way, on prod we have an interval of 10 seconds ## Summary of changes Increase heartbeat interval from 100msec to 1 second. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-12-19 18:32:32 +00:00
Alex Chi Z.	b89e02f3e8	fix(pageserver): consider partial compaction layer map in layer check (#10044 ) ## Problem In https://github.com/neondatabase/neon/pull/9897 we temporarily disabled the layer valid check because the current one only considers the end result of all compaction algorithms, but partial gc-compaction would temporarily produce an "invalid" layer map. part of https://github.com/neondatabase/neon/issues/9114 ## Summary of changes Allow LSN splits to overlap in the slow path check. Currently, the valid check is only used in storage scrubber (background job) and during gc-compaction (without taking layer lock). Therefore, it's fine for such checks to be a little bit inefficient but more accurate. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-12-19 18:04:53 +00:00
Konstantin Knizhnik	04517c6ff3	Do not reload config file on PS reconnect (#10204 ) ## Problem See https://github.com/neondatabase/neon/issues/10184 and https://neondb.slack.com/archives/C04DGM6SMTM/p1733997259898819 Reloading the config file inside a parallel worker causes its termination ## Summary of changes Remove the call to `HandleMainLoopInterrupts()`. The update of the page server URL is propagated by postmaster through shared memory, so we should not reload the config for it. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-12-19 15:22:39 +00:00
Vlad Lazar	628451d68e	safekeeper: short-circuit interpreted wal sender (#10202 ) ## Problem Safekeeper may currently send a batch to the pageserver even if it hasn't decoded a new record. I think this is quite unlikely in the field, but worth addressing. ## Summary of changes Don't send anything if we haven't decoded a full record. Once this merges and releases, the `InterpretedWalRecords` struct can be updated to remove the Option wrapper for `next_record_lsn`.	2024-12-19 14:04:46 +00:00
Vlad Lazar	502d512fe2	safekeeper: lift benchmarking utils into safekeeper crate (#10200 ) ## Problem The benchmarking utilities are also useful for testing. We want to write tests in the safekeeper crate. ## Summary of changes This commit lifts the utils to the safekeeper crate. They are compiled if the benchmarking feature is enabled or if in test mode.	2024-12-19 14:04:42 +00:00
John Spray	afda6d4700	storage_scrubber: don't report half-created timelines as corruption (#10198 ) ## Problem test_timeline_archival_chaos does timeline creation with failure injection, and thereby sometimes leaves timelines in a partially created state. This was being reported as corruption by the scrubber on test teardown, because it considered a layer without an index to be an invalid state. This was incorrect: the scrubber should accept this state, as it occurs legitimately during timeline creation. Closes: https://github.com/neondatabase/neon/issues/9988 ## Summary of changes - Report a timeline with layers but no index as Relic rather than MissingIndexPart. - We retain the MissingIndexPart variant for the case where an index _was_ found in the listing, but was not found by a subsequent GET, i.e. racing with deletion.	2024-12-19 12:55:05 +00:00
John Spray	65042cbadd	tests: use high IO concurrency in `test_pgdata_import_smoke`, use `effective_io_concurrency=2` in tests by default (#10114 ) ## Problem `test_pgdata_import_smoke` writes two gigabytes of pages and then reads them back serially. This is CPU bottlenecked and results in a long runtime, and sensitivity to CPU load from other tests on the same machine. Closes: https://github.com/neondatabase/neon/issues/10071 ## Summary of changes - Use effective_io_concurrency=32 when doing sequential scans through 2GiB of pages in test_pgdata_import_smoke. This is a ~10x runtime decrease in the parts of the test that do sequential scans. - Also set `effective_io_concurrency=2` for tests, as I noticed while debugging that we were doing all getpage requests serially, which is bad for checking the stability of the batching code.	2024-12-19 10:58:49 +00:00
Folke Behrens	b135194090	proxy: Delay SASL complete message until auth is done (#10189 ) The final SASL complete message can be bundled with the remainder of the auth flow messages until ReadyForQuery. neondatabase/cloud#19184	2024-12-19 10:37:08 +00:00
Peter Bendel	43dc03459d	Run pgbench on 10 GB scale factor on database with n relations (e.g. 10k) (#10172 ) ## Problem We want to verify whether, and by how much, pgbench throughput and latency on Neon suffer if the database also contains many other relations. ## Summary of changes Modify the benchmarking.yml pgbench-compare job to - create an additional project at scale factor 10 GiB - before running pgbench add n tables (initially 10k) to the database - then compare the pgbench throughput and latency to the existing pgbench-compare at 10 GiB scale factor We use a realistic template for the n relations that is a partitioned table with some realistic data types, indexes and constraints - similar to a table that we use internally. Example run: https://github.com/neondatabase/neon/actions/runs/12377565956/job/34547386959	2024-12-19 10:25:44 +00:00
Christian Schwarz	a1b0558493	fast import: importer: use aws s3 cli (#10162 ) ## Problem s5cmd doesn't pick up the pod service account ``` 2024/12/16 16:26:01 Ignoring, HTTP credential provider invalid endpoint host, "169.254.170.23", only loopback hosts are allowed. <nil> ERROR "ls s3://neon-dev-bulk-import-us-east-2/import-pgdata/fast-import/v1/br-wandering-hall-w2xobawv": NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors ``` ## Summary of changes Switch to the official CLI. ## Testing Tested the pre-merge image in staging, using `job_image` override in project settings. https://neondb.slack.com/archives/C033RQ5SPDH/p1734554944391949?thread_ts=1734368383.258759&cid=C033RQ5SPDH ## Future Work Switch back to s5cmd once https://github.com/peak/s5cmd/pull/769 gets merged. ## Refs - fixes https://github.com/neondatabase/cloud/issues/21876 --------- Co-authored-by: Gleb Novikov <NanoBjorn@users.noreply.github.com>	2024-12-19 10:04:17 +00:00
Alex Chi Z.	cc138b56f9	fix(pageserver): run psql in thread to avoid blocking (#10177 ) ## Problem ref https://github.com/neondatabase/neon/issues/10170 ref https://github.com/neondatabase/neon/issues/9994 The psql command will block the main thread, causing other async tasks to timeout (i.e., HTTP connect). Therefore, we need to move it to an I/O executor thread. ## Summary of changes * run psql connection in a thread --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: John Spray <john@neon.tech>	2024-12-19 09:45:06 +00:00
Konstantin Knizhnik	61fcf64c22	Fix flakiness of test_physical_and_logical_replication.py (#10176 ) ## Problem See https://github.com/neondatabase/neon/issues/10037 test_physical_and_logical_replication.py sometimes failed. ## Summary of changes Add `wait_replica_caughtup` to wait for replica sync Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-12-18 19:15:38 +00:00
Alex Chi Z.	6d3e8096fc	refactor(test): tighten up test_gc_feedback (#10126 ) ## Problem In https://github.com/neondatabase/neon/pull/8103 we changed the test case to have more test coverage of gc_compaction. Now that we have `test_gc_compaction_smoke`, we can revert this test case to serve its original purpose and revert the parameter changes. part of https://github.com/neondatabase/neon/issues/9114 ## Summary of changes * Revert pitr_interval from 60s to 10s. * Assert the physical/logical size ratio in the benchmark. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-12-18 18:10:05 +00:00
Alex Chi Z.	3d1c3a80ae	feat(pageserver): add compact queue http endpoint (#10173 ) ## Problem We cannot get the size of the compaction queue and access the info. Part of #9114 ## Summary of changes * Add an API endpoint to get the compaction queue. * gc_compaction test case now waits until the compaction finishes. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-12-18 18:09:02 +00:00
John Spray	835287ba3a	neon_local: add a `flock` to protect against concurrent execution (#10185 ) ## Problem `neon_local` has always been unsafe to run concurrently with itself: it uses simple text files for persistent state, and concurrent runs will step on each other. In some test environments we intentionally handle this with mutexes in python land, but it's fragile to try and always remember to do that. ## Summary of changes - Add a `flock` based mutex around the `main` function of neon_local, using the repo directory as the file to lock - Clean up an Option<> around control_plane_api, this is a drive-by change because it was one of the fields that had a weird effect when previous concurrent stuff stamped on it.	2024-12-18 16:29:47 +00:00
Conrad Ludgate	d63602cc78	chore(proxy): fully remove allow-self-signed-compute flag (#10168 ) When https://github.com/neondatabase/cloud/pull/21856 is merged, this flag is no longer necessary.	2024-12-18 16:03:14 +00:00
Erik Grinaker	1668d39b7c	safekeeper: fix typo in allowlist for `/profile/heap` (#10186 )	2024-12-18 15:51:53 +00:00
Alex Chi Z.	1d12efc428	fix(pageserver): allow repartition errors during gc-compaction smoke tests (#10164 ) ## Problem part of https://github.com/neondatabase/neon/issues/9114 In https://github.com/neondatabase/neon/pull/10127 we fixed the race, but we didn't add the errors to the allowlist. ## Summary of changes * Allow repartition errors in the gc-compaction smoke test. I think it might be worth to refactor the code to allow multiple threads getting a copy of repartition status (i.e., using Rcu) in the future. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-12-18 15:37:26 +00:00
Arpad Müller	85696297c5	Add safekeepers command to storcon_cli for listing (#10151 ) Add a `safekeepers` subcommand to `storcon_cli` that allows listing the safekeepers. ``` $ curl -X POST --url http://localhost:1234/control/v1/safekeeper/42 --data \ '{"active":true, "id":42, "created_at":"2023-10-25T09:11:25Z", "updated_at":"2024-08-28T11:32:43Z","region_id":"neon_local","host":"localhost","port":5454,"http_port":0,"version":123,"availability_zone_id":"us-east-2b"}' $ cargo run --bin storcon_cli -- --api http://localhost:1234 safekeepers Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.38s Running `target/debug/storcon_cli --api 'http://localhost:1234' safekeepers` +----+---------+-----------+------+-----------+------------+ \| Id \| Version \| Host \| Port \| Http Port \| AZ Id \| +==========================================================+ \| 42 \| 123 \| localhost \| 5454 \| 0 \| us-east-2b \| +----+---------+-----------+------+-----------+------------+ ``` Also: * Don't return the raw `SafekeeperPersistence` struct that contains the raw database presentation, but instead a new `SafekeeperDescribeResponse` struct. * The `SafekeeperPersistence` struct leaves out the `active` field on purpose because we want to deprecate it and replace it with a `scheduling_policy` one. Part of https://github.com/neondatabase/neon/issues/9981	2024-12-18 12:47:56 +00:00
Konstantin Knizhnik	aaf980f70d	Online checkpoint replication state (#9976 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1733180965970089 Replication state is checkpointed only by shutdown checkpoint. It means that replication snapshots are not removed till compute shutdown. ## Summary of changes Checkpoint replication state during online checkpoint Related Postgres PR: https://github.com/neondatabase/postgres/pull/546 Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-12-18 09:34:38 +00:00
a-masterov	c52514ab02	Fix allure report creation on periodic `pg_regress` testing (#10171 ) ## Problem The allure report finishes with the error `HttpError: Resource not accessible by integration` while running the `pg_regress` test against a cloud staging project due to a lack of permissions. ## Summary of changes The permissions are added.	2024-12-17 20:47:44 +00:00
Conrad Ludgate	2ee6bc5ec4	chore(proxy): update vendored postgres libs to edition 2021 (#10139 ) I ran `cargo fix --edition` in each project prior, and it found nothing that needed fixing.	2024-12-17 20:06:18 +00:00
John Spray	fd230227f2	storcon: include preferred AZ in compute notifications (#9953 ) ## Problem It is unreliable for the control plane to infer the AZ for computes from where the tenant is currently attached, because if a tenant happens to be in a degraded state or a release is ongoing while a compute starts, then the tenant's attached AZ can be a different one to where it will run long-term, and the control plane doesn't check back later to restart the compute. This can land in parallel with https://github.com/neondatabase/neon/pull/9947 ## Summary of changes - Thread through the preferred AZ into the compute hook code via the reconciler - Include the preferred AZ in the body of compute hook notifications	2024-12-17 20:04:09 +00:00
Ivan Efremov	93e958341f	[proxy]: Use TLS for cancellation queries (#10152 ) ## Problem pg_sni_router assumes that all the streams are upgradable to TLS. Cancellation requests were declined because of using NoTls config. ## Summary of changes Provide TLS client config for cancellation requests. Fixes [#21789](https://github.com/orgs/neondatabase/projects/65/views/1?pane=issue&itemId=90911361&issue=neondatabase%7Ccloud%7C21789)	2024-12-17 19:26:54 +00:00
Tristan Partin	7dddbb9570	Add pg_repack extension (#10100 ) Our solutions engineers and some customers would like to have this extension available. Link: https://github.com/neondatabase/cloud/issues/18890 Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-12-17 18:36:55 +00:00
Erik Grinaker	a55853f67f	utils: symbolize heap profiles (#10153 ) ## Problem Jemalloc heap profiles aren't symbolized. This is inconvenient, and doesn't work with Grafana Cloud Profiles. Resolves #9964. ## Summary of changes Symbolize the heap profiles in-process, and strip unnecessary cruft. This uses about 100 MB additional memory to cache the DWARF information, but I believe this is already the case with CPU profiles, which use the same library for symbolization. With cached DWARF information, the symbolization CPU overhead is negligible. Example profiles: * [pageserver.pb.gz](https://github.com/user-attachments/files/18141395/pageserver.pb.gz) * [safekeeper.pb.gz](https://github.com/user-attachments/files/18141396/safekeeper.pb.gz)	2024-12-17 16:51:58 +00:00
Mikhail Kot	007b13b79a	Don't build tests in compute image, use ninja (#10149 ) Don't build tests in h3 and rdkit: ~15 min speedup. Use Ninja as cmake generator where possible: ~10 min speedup. Clean apt cache for smaller images: around 250mb size loss for intermediate layers	2024-12-17 16:43:54 +00:00
Alexey Kondratov	2dfd3cab8c	fix(compute): Report compute_backpressure_throttling_seconds as counter (#10125 ) ## Problem It was reported as `gauge`, but it's actually a `counter`. Also add `_total` suffix as that's the convention for counters. The corresponding flux-fleet PR: https://github.com/neondatabase/flux-fleet/pull/386	2024-12-17 16:14:07 +00:00
John Spray	b5833ef259	remote_storage: configurable connection pooling for ABS (#10169 ) ## Problem The ABS SDK's default behavior is to do no connection pooling, i.e. open and close a fresh connection for each request. Under high request rates, this can result in an accumulation of TCP connections in TIME_WAIT or CLOSE_WAIT state, and in extreme cases exhaustion of client ports. Related: https://github.com/neondatabase/cloud/issues/20971 ## Summary of changes - Add a configurable `conn_pool_size` parameter for Azure storage, defaulting to zero (current behavior) - Construct a custom reqwest client using this connection pool size.	2024-12-17 12:24:51 +00:00
Erik Grinaker	b0e43c2f88	postgres_ffi: add `WalStreamDecoder::complete_record()` benchmark (#10158 ) Touches #10097.	2024-12-17 10:35:00 +00:00
a-masterov	e226d7a3d1	Fix docker compose with PG17 (#10165 ) ## Problem It's impossible to run docker compose with compute v17 due to `pg_anon` extension which is not supported under PG17. ## Summary of changes The auto-loading of `pg_anon` is disabled by default	2024-12-17 08:16:54 +00:00
Folke Behrens	aa7ab9b3ac	proxy: Allow dumping TLS session keys for debugging (#10163 ) ## Problem To debug issues with TLS connections there's no easy way to decrypt packets unless a client has special support for logging the keys. ## Summary of changes Add TLS session keys logging to proxy via `SSLKEYLOGFILE` env var gated by flag.	2024-12-16 18:56:24 +00:00
Erik Grinaker	28ccda0a63	test_runner: ignore error in `test_timeline_archival_chaos` (#10161 ) Resolves #10159.	2024-12-16 17:10:55 +00:00
Conrad Ludgate	59b7ff8988	chore(proxy): disallow unwrap and unimplemented (#10142 ) As the title says, I updated the lint rules to no longer allow unwrap or unimplemented. Three special cases: * Tests are allowed to use them * std::sync::Mutex lock().unwrap() is common because it's usually correct to continue panicking on poison * `tokio::spawn_blocking(...).await.unwrap()` is common because it will only error if the blocking fn panics, so continuing the panic is also correct I've introduced two extension traits to help with these last two, that are a bit more explicit so they don't need an expect message every time.	2024-12-16 16:37:15 +00:00
Conrad Ludgate	2e4c9c5704	chore(proxy): remove allow_self_signed from regular proxy (#10157 ) I noticed that the only place we use this flag is for testing console redirect proxy. Makes sense to me to make this assumption more explicit.	2024-12-16 16:11:39 +00:00
Erik Grinaker	3d30a7a934	pageserver: make `RemoteTimelineClient::schedule_index_upload` infallible (#10155 ) Remove an unnecessary `Result` and address a `FIXME`.	2024-12-16 15:54:47 +00:00
Conrad Ludgate	6565fd4056	chore: fix clippy lints 2024-12-06 (#10138 )	2024-12-16 15:33:21 +00:00
Arseny Sher	c5e3314c6e	Add test restarting compute at WAL page boundary (#10111 ) ## Problem We had a similar test in test_logical_replication, but then removed it because it wasn't needed to trigger the LR-related bug. Restarting at a WAL page boundary is still a useful test, so add it back separately. ## Summary of changes Add the test.	2024-12-16 14:53:04 +00:00
Arseny Sher	1ed0e52bc8	Extract safekeeper http client to separate crate. (#10140 ) ## Problem We want to use safekeeper http client in storage controller and neon_local. ## Summary of changes Extract it to separate crate. No functional changes.	2024-12-16 12:07:24 +00:00
Conrad Ludgate	24d6587914	chore(proxy): refactor self-signed config (#10154 ) ## Problem While reviewing #10152 I found it tricky to actually determine whether the connection used `allow_self_signed_compute` or not. I've tried to remove this setting in the past: * https://github.com/neondatabase/neon/pull/7884 * https://github.com/neondatabase/neon/pull/7437 * https://github.com/neondatabase/cloud/pull/13702 But each time it turned out to be used by e2e tests ## Summary of changes The `node_info.allow_self_signed_computes` is always initialised to false, and then sometimes inherits the proxy config value. There's no need for this to be in the node_info, so removing it and propagating it via `TcpMechansim` is simpler.	2024-12-16 11:15:25 +00:00
John Spray	ebcbc1a482	pageserver: tighten up code around SLRU dir key handling (#10082 ) ## Problem Changes in #9786 were functionally complete but missed some edges that made testing less robust than it should have been: - `is_key_disposable` didn't consider SLRU dir keys disposable - Timeline `init_empty` was always creating SLRU dir keys on all shards The result was that when we had a bug (https://github.com/neondatabase/neon/pull/10080), it wasn't apparent in tests, because one would only encounter the issue if running on a long-lived timeline with enough compaction to drop the initially created empty SLRU dir keys, _and_ some CLog truncation going on. Closes: https://github.com/neondatabase/cloud/issues/21516 ## Summary of changes - Update is_key_global and init_empty to handle SLRU dir keys properly -- the only functional impact is that we avoid writing some spurious keys in shards >0, but this makes testing much more robust. - Make `test_clog_truncate` explicitly use a sharded tenant The net result is that if one reverts #10080, then tests fail (i.e. this PR is a reproducer for the issue)	2024-12-16 10:06:08 +00:00
Konstantin Knizhnik	117c1b5dde	Do not perform prefetch for temp relations (#10146 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1734002916827019 With recent prefetch fixes for pg17 and `effective_io_concurrency=100`, the pg_regress test stats.sql fails when temp_buffers is set to 100. The Stream API will try to lock all 100 of these buffers for prefetch. ## Summary of changes Disable such behaviour for temp relations. Postgres PR: https://github.com/neondatabase/postgres/pull/548 Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-12-16 06:03:53 +00:00
Erik Grinaker	f3ecd5d76a	pageserver: revert flush backpressure (#8550 ) (#10135 ) ## Problem In #8550, we made the flush loop wait for uploads after every layer. This was to avoid unbounded buildup of uploads, and to reduce compaction debt. However, the approach has several problems: * It prevents upload parallelism. * It prevents flush and upload pipelining. * It slows down ingestion even when there is no need to backpressure. * It does not directly backpressure WAL ingestion (only via `disk_consistent_lsn`), and will build up in-memory layers. * It does not directly backpressure based on compaction debt and read amplification. An alternative solution to these problems is proposed in #8390. In the meanwhile, we revert the change to reduce the impact on ingest throughput. This does reintroduce some risk of unbounded upload/compaction buildup. Until https://github.com/neondatabase/neon/issues/8390, this can be addressed in other ways: * Use `max_replication_apply_lag` (aka `remote_consistent_lsn`), which will more directly limit upload debt. * Shard the tenant, which will spread the flush/upload work across more Pageservers and move the bottleneck to Safekeeper. Touches #10095. ## Summary of changes Remove waiting on the upload queue in the flush loop.	2024-12-15 09:45:12 +00:00
Mikhail Kot	cf161e1556	fix(adapter): password not set in role drop (#10130 ) ## Problem When entry was dropped and password wasn't set, new entry had uninitialized memory in controlplane adapter Resolves: https://github.com/neondatabase/cloud/issues/14914 ## Summary of changes Initialize password in all cases, add tests. Minor formatting for less indentation	2024-12-14 17:37:13 +00:00
Konstantin Knizhnik	2521eba674	Check for invalid down link while prefetching B-Tree leaf pages for index-only scan (#9867 ) ## Problem See #9866 The index-only scan prefetch implementation doesn't take into account that the down link may be invalid ## Summary of changes Check that the down link is a valid block number Corresponding Postgres PRs: https://github.com/neondatabase/postgres/pull/534 https://github.com/neondatabase/postgres/pull/535 https://github.com/neondatabase/postgres/pull/536 https://github.com/neondatabase/postgres/pull/537 --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-12-13 20:46:41 +00:00
Alexander Bayandin	d56fea680e	CI: always require aws-oicd-role-arn input to be set (#10145 ) ## Problem `benchmarking` job fails because `aws-oicd-role-arn` input is not set ## Summary of changes: - Set `aws-oicd-role-arn` for `benchmarking` job - Always require `aws-oicd-role-arn` to be set - Rename `aws_oicd_role_arn` to `aws-oicd-role-arn` for consistency	2024-12-13 19:56:32 +00:00
Alex Chi Z.	7ee5dca752	fix(pageserver): race between gc-compaction and repartition (#10127 ) ## Problem close https://github.com/neondatabase/neon/issues/10124 gc-compaction split_gc_jobs is holding the repartition lock for too long. ## Summary of changes * Ensure split_gc_compaction_jobs drops the repartition lock once it finishes cloning the structures. * Update comments. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-12-13 18:22:25 +00:00
Tristan Partin	07d1db54b3	Improve comments and log messages in the logical replication monitor (#9974 ) Improved comments will help others when they read the code, and the log messages will help others understand why the logical replication monitor works the way it does. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-12-13 18:10:42 +00:00
Konstantin Knizhnik	eeabecd89f	Correctly update LFC used_pages in case of LFC resize (#10128 ) ## Problem LFC used_pages statistic is not updated in case of LFC resize (shrinking `neon.file_cache_size_limit`) ## Summary of changes Update `lfc_ctl->used_pages` in `lfc_change_limit_hook` Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-12-13 17:40:26 +00:00
Christian Schwarz	fcff752851	fix(test_timeline_archival_chaos): flakiness caused by orphan layers (#10083 ) The test was failing with the scary but generic message `Remote storage metadata corrupted`. The underlying scrubber error is `Orphan layer detected: ...`. The test kills pageserver at random points, hence it's expected that we leak layers if we're killed in the window after layer upload but before it's referenced from index part. Refer to generation numbers RFC for details. Refs: - fixes https://github.com/neondatabase/neon/issues/9988 - root-cause analysis https://github.com/neondatabase/neon/issues/9988#issuecomment-2520673167	2024-12-13 16:28:21 +00:00
Alexander Bayandin	2c91062828	test_prefetch: reduce timeout to default 5m from 10m (#10105 ) ## Problem `test_prefetch` is flaky (https://github.com/neondatabase/neon/issues/9961), but if it passes, the run time is less than 30 seconds — we don't need an extended timeout for it. ## Summary of changes - Remove extended test timeout for `test_prefetch`	2024-12-13 14:52:54 +00:00
Arseny Sher	ce8eb089f3	Extract public sk types to safekeeper_api (#10137 ) ## Problem We want to extract safekeeper http client to separate crate for use in storage controller and neon_local. However, many types used in the API are internal to safekeeper. ## Summary of changes Move them to safekeeper_api crate. No functional changes. ref https://github.com/neondatabase/neon/issues/9011	2024-12-13 14:06:27 +00:00
a-masterov	7dc382601c	Fix pg_regress tests on a cloud staging instance (#10134 ) ## Problem pg_regress tests start failing due to unique ids added to Neon error messages ## Summary of changes Patches updated	2024-12-13 13:59:04 +00:00
Rahul Patil	2451969d5c	fix(ci): Allow github-action-script to post reports (#10136 ) Allow github-action-script to post reports. Failed CI: https://github.com/neondatabase/neon/actions/runs/12304655364/job/34342554049#step:13:514	2024-12-13 12:22:15 +00:00
Arpad Müller	671889b0e9	Merge pull request #10133 from neondatabase/rc/release/2024-12-13 Storage release 2024-12-13	2024-12-13 13:08:40 +01:00
github-actions[bot]	aeb79d1bb6	Storage release 2024-12-13	2024-12-13 06:02:24 +00:00
JC Grünhage	59ef701925	CI(deploy): fix git tag/release creation (#10119 ) ## Problem When moving the comment on proxy-releases from the yaml doc into a javascript code block, I missed converting the comment marker from `#` to `//`. ## Summary of changes Correctly convert comment marker.	2024-12-12 23:38:20 +00:00
Alexander Bayandin	ac04bad457	CI: don't run debug builds with LFC (#10123 ) ## Problem I've noticed that debug builds with LFC fail more frequently and, for some reason, their failures block merging (but they should not) ## Summary of changes - Do not run Debug builds with LFC	2024-12-12 22:55:38 +00:00
Peter Bendel	2f3f98a319	use OIDC role instead of AWS access keys for managing test runner (#10117 ) in periodic pagebench workflow ## Problem for background see https://github.com/neondatabase/cloud/issues/21545 ## Summary of changes use OIDC role to manage runners instead of AWS access key which needs to be periodically rotated ## logs seems to work in https://github.com/neondatabase/neon/actions/runs/12298575888/job/34322306127#step:6:1	2024-12-12 20:25:39 +00:00
Alex Chi Z.	5ff4b991c7	feat(pageserver): gc-compaction split over LSN (#9900 ) ## Problem part of https://github.com/neondatabase/neon/issues/9114, stacked PR over https://github.com/neondatabase/neon/pull/9897, partially refactored to help with https://github.com/neondatabase/neon/issues/10031 ## Summary of changes * gc-compaction takes `above_lsn` parameter. We only compact the layers above this LSN, and all data below the LSN are treated as if they are on the ancestor branch. * refactored gc-compaction to take `GcCompactJob` that describes the rectangular range to be compacted. * Added unit test for this case. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-12-12 20:23:24 +00:00
John Spray	a93e3d31cc	storcon: refine logic for choosing AZ on tenant creation (#10054 ) ## Problem When we update our scheduler/optimization code to respect AZs properly (https://github.com/neondatabase/neon/pull/9916), the choice of AZ becomes a much higher-stakes decision. We will pretty much always run a tenant in its preferred AZ, and that AZ is fixed for the lifetime of the tenant (unless a human intervenes) Eventually, when we do auto-balancing based on utilization, I anticipate that part of that will be to automatically change the AZ of tenants if our original scheduling decisions have caused imbalance, but as an interim measure, we can at least avoid making this scheduling decision based purely on which AZ contains the emptiest node. This is a precursor to https://github.com/neondatabase/neon/pull/9947 ## Summary of changes - When creating a tenant, instead of scheduling a shard and then reading its preferred AZ back, make the AZ decision first. - Instead of choosing AZ based on which node is emptiest, use the median utilization of nodes in each AZ to pick the AZ to use. This avoids bad AZ decisions during periods when some node has very low utilization (such as after replacing a dead node) I considered also making the selection a weighted pseudo-random choice based on utilization, but wanted to avoid destabilising tests with that for now.	2024-12-12 19:35:38 +00:00
Rahul Patil	6d5687521b	fix(ci): Allow github-script to post test reports (#10120 ) Allow github-script to post test reports	2024-12-12 18:53:35 +00:00
Heikki Linnakangas	53721266f1	Disable connection logging in pgbouncer by default (#10118 ) It can produce a lot of logs, making pgbouncer itself consume all CPU in extreme cases. We saw that happen in stress testing.	2024-12-12 17:05:58 +00:00
a-masterov	2f3433876f	Change the channel for notification. (#10112 ) ## Problem Notifications about failures in `pg_regress` tests run on the staging cloud instance currently reach the channel `on-call-staging-stream`, while they should reach `on-call-qa-staging-stream`. ## Summary of changes The channel was changed.	2024-12-12 16:34:07 +00:00
Rahul Patil	58d45c6e86	ci(fix): Use OIDC auth to login on ECR (#10055 ) ## Problem CI currently uses static credentials in some places. These are less secure and hard to maintain, so we are going to deprecate them and use OIDC auth. ## Summary of changes - ci(fix): Use OIDC auth to upload artifact on s3 - ci(fix): Use OIDC auth to login on ECR	2024-12-12 15:13:08 +00:00
Conrad Ludgate	e502e880b5	chore(proxy): remove code for old API (#10109 ) ## Problem Now that https://github.com/neondatabase/cloud/issues/15245 is done, we can remove the old code. ## Summary of changes Removes support for the ManagementV2 API, in favour of the ProxyV1 API.	2024-12-12 13:42:50 +00:00
Arseny Sher	c9a773af37	Fix test_subscriber_synchronous_commit flakiness. (#10057 ) `6f7aeaa` configured LFC for USE_LFC case, but omitted setting shared_buffers for non USE_LFC, causing flakiness. ref https://github.com/neondatabase/neon/issues/9989	2024-12-12 11:57:00 +00:00
Vlad Lazar	ec0ce06c16	tests: default interpreted proto in tests (#10079 ) ## Problem We aren't using the sharded interpreted wal receiver protocol in all tests. ## Summary of changes Default to the interpreted protocol.	2024-12-12 10:53:10 +00:00
Alexander Bayandin	0bd8eca9ca	Storage: create release PRs On Fridays (#10017 ) ## Problem To give Storage more time on preprod — create a release branch on Friday ## Summary of changes - Automatically create Storage release PR on Friday instead of Monday	2024-12-12 09:18:50 +00:00
Misha Sakhnov	739f627b96	Bump vm-builder v0.35.0 -> v0.37.1 (#10015 ) Bump version to pick up changes introduced in the neonvm-daemon to support sys fs based CPU scaling (https://github.com/neondatabase/autoscaling/issues/1082). Previous update: https://github.com/neondatabase/neon/pull/9208	2024-12-12 08:45:52 +00:00
Arpad Müller	342cbea255	storcon: add safekeeper list API (#10089 ) This adds an API to the storage controller to list safekeepers registered to it. This PR does a `diesel print-schema > storage_controller/src/schema.rs` because of an inconsistency between up.sql and schema.rs, introduced by [this](`2c142f14f7`) commit, so there are some updates to `schema.rs` due to that. As a followup to this, we should maybe think about running `diesel print-schema` in CI. Part of #9981	2024-12-12 01:09:24 +00:00
Tristan Partin	b391b29bdc	Improve typing in test_runner/fixtures/httpserver.py (#10103 ) Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-12-11 22:21:42 +00:00
Erik Grinaker	5126ebbfed	test_runner: bump test_check_visibility_map timeout (#10091 ) ## Problem `test_check_visibility_map` has been seen to time out in debug tests. ## Summary of changes Bump the timeout to 10 minutes (test reports indicate 7 minutes is sufficient). We don't want to disable the test entirely in debug builds, to exercise this with debug assertions enabled. Resolves #10069.	2024-12-11 21:37:25 +00:00
Arpad Müller	7fa986bc92	Do tenant manifest validation with index-part (#10007 ) This adds some validation of invariants that we want to uphold wrt the tenant manifest and `index_part.json`: * the data the manifest has about a timeline must match with the data in `index_part.json`. It might actually change, e.g. when we do reparenting during detach ancestor, but that requires the timeline to be unoffloaded, i.e. removed from the manifest. * any timeline mentioned in index part, must, if present, be archived. If we unarchive, we first update the tenant manifest to unoffload, and only then update index part. And one needs to archive before offloading. * it is legal for timelines to be mentioned in the manifest but have no `index_part`: this is a temporary state visible during deletion of the timeline. if the pageserver crashed, an attach of the tenant will clean the state up. * it is also legal for offloaded timelines to have an `ancestor_retain_lsn` of None while having an `ancestor_timeline_id`. This is for the to-be-added flattening functionality: the plan is to set former to None if we have flattened a timeline. follow-up of #9942 part of #8088	2024-12-11 20:10:22 +00:00
Vlad Lazar	e8395807a5	storcon: allow for more concurrency in drain/fill operations (#10093 ) ## Problem We saw the drain/fill operations not drain fast enough in ap-southeast. ## Summary of changes These are some quick changes to speed it up: * double reconcile concurrency - this is now half of the available reconcile bandwidth * reduce the waiter polling timeout - this way we can spawn new reconciliations faster	2024-12-11 19:43:40 +00:00
Vlad Lazar	a3e80448e8	pageserver/storcon: add patch endpoints for tenant config metrics (#10020 ) ## Problem Cplane and storage controller tenant config changes are not additive. Any change overrides all existing tenant configs. This would be fine if both did client side patching, but that's not the case. Once this merges, we must update cplane to use the PATCH endpoint. ## Summary of changes ### High Level Allow for patching of tenant configuration with a `PATCH /v1/tenant/config` endpoint. It takes the same data as its PUT counterpart. For example, the payload below will update `gc_period` and unset `compaction_period`. All other fields are left in their original state. ``` { "tenant_id": "1234", "gc_period": "10s", "compaction_period": null } ``` ### Low Level * PS and storcon gain `PATCH /v1/tenant/config` endpoints. PS endpoint is only used for cplane managed instances. * `storcon_cli` is updated to have separate commands for `set-tenant-config` and `patch-tenant-config` Related https://github.com/neondatabase/cloud/issues/21043	2024-12-11 19:16:33 +00:00
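The patch semantics from #10020 (a value updates the field, `null` unsets it, an absent field is left alone) can be sketched as follows. This is a minimal illustration and not the pageserver implementation: `apply_config_patch`, the starting configuration, and the `pitr_interval` value are assumptions, and `tenant_id` is treated here as an identifier rather than a config field.

```python
import json


def apply_config_patch(current, patch):
    """Return a new config: None unsets a field, other values overwrite it."""
    updated = dict(current)
    for field, value in patch.items():
        if value is None:
            updated.pop(field, None)  # JSON null: unset the field
        else:
            updated[field] = value    # any other value: overwrite
    return updated


# Payload taken from the commit message above.
payload = json.loads(
    '{ "tenant_id": "1234", "gc_period": "10s", "compaction_period": null }'
)
tenant_id = payload.pop("tenant_id")

# Hypothetical existing configuration for the tenant.
current = {"gc_period": "1h", "compaction_period": "20s", "pitr_interval": "7d"}
patched = apply_config_patch(current, payload)
```

Here `patched` ends up with the new `gc_period`, no `compaction_period`, and an untouched `pitr_interval`, whereas a PUT-style replacement would have dropped `pitr_interval` as well.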
Anastasia Lubennikova	ef233e91ef	Update compute_installed_extensions metric: (#9891 ) add owned_by_superuser field to filter out system extensions. While on it, also correct related code: - fix the metric setting: use set() instead of inc() in a loop. inc() is not idempotent and can lead to incorrect results if the function is called multiple times. Currently it is only called at compute start, but this will change soon. - fix the return type of the installed_extensions endpoint to match the metric. Currently it is only used in the test.	2024-12-11 16:43:26 +00:00
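The `set()` versus `inc()` point in #9891 can be shown with a toy gauge. The `Gauge` class below is a stand-in written for this sketch, not the metrics library used by compute_ctl, and the extension names are made up.

```python
from collections import Counter


class Gauge:
    """Toy labelled gauge, assumed for illustration only."""

    def __init__(self):
        self.values = {}

    def inc(self, label):
        self.values[label] = self.values.get(label, 0) + 1

    def set(self, label, value):
        self.values[label] = value


def collect_with_inc(gauge, extensions):
    # Not idempotent: every call adds to whatever was recorded before.
    for name in extensions:
        gauge.inc(name)


def collect_with_set(gauge, extensions):
    # Idempotent: each call overwrites the gauge with the current count.
    for name, count in Counter(extensions).items():
        gauge.set(name, count)


# One entry per database in which the extension is installed (hypothetical).
installed = ["pg_stat_statements", "pg_stat_statements", "postgis"]

inc_gauge, set_gauge = Gauge(), Gauge()
for _ in range(2):  # simulate the collection function running twice
    collect_with_inc(inc_gauge, installed)
    collect_with_set(set_gauge, installed)
```

After two collection runs the `inc()` variant reports double the real counts, while the `set()` variant still reports the correct ones.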
Mikhail Kot	dee2041cd3	walproposer: fix link error on debian 12 / ubuntu 22 (#10090 ) ## Problem Linking walproposer library (e.g. `cargo t`) produces linker errors: /home/myrrc/neon/pgxn/neon/walproposer_compat.c:169: undefined reference to `pg_snprintf' The library with these symbols (libpgcommon.a) is present ## Summary of changes Changed order of libraries resolution for linker	2024-12-11 16:23:59 +00:00
Vlad Lazar	5525abdadb	Merge pull request #10087 from neondatabase/vlad/cherry-pick-multixact-truncation-fix storage: cherry-pick SLRU, metrics and sharded ingest fixes into the release branch	2024-12-11 16:02:54 +00:00
Arseny Sher	e4bb1ca7d8	Increase neon_local http client to compute timeout in reconfigure. (#10088 ) Seems like 30s is sometimes not enough when CI runners are overloaded, causing pull_timeline flakiness. ref https://github.com/neondatabase/neon/issues/9731#issuecomment-2535946443	2024-12-11 15:46:50 +00:00
a-masterov	b987648e71	Enable LFC for all the PG versions. (#10068 ) ## Problem We added support for LFC for tests but are still using it only for the PG17 release. ## Summary of changes LFC is enabled for all PG versions. Errors in tests with LFC enabled now block merging as usual. We keep tests with disabled LFC for PG17 release. Tests on debug builds with LFC enabled still don't affect permission to merge.	2024-12-11 15:28:10 +00:00
Mikhail Kot	c79c1dd8e9	compute_ctl: don't panic if control plane can't be reached (#10078 ) ## Problem If the control plane cannot be reached for some reason, compute_ctl panics ## Summary of changes panic is removed in favour of returning an error. Code is reformatted a bit for more flat control flow Resolves: #5391	2024-12-11 15:03:11 +00:00
Vlad Lazar	a53db73851	pageserver: don't drop multixact slrus on non zero shards (#10086 ) ## Problem We get slru truncation commands on non-zero shards. Compaction will drop the slru dir keys and ingest will fail when receiving such records. https://github.com/neondatabase/neon/pull/10080 fixed it for clog, but not for multixact. ## Summary of changes Only truncate multixact slrus on shard zero. I audited the rest of the ingest code and it looks fine from this pov.	2024-12-11 14:28:18 +00:00
Christian Schwarz	c4ce4ac25a	page_service: don't count time spent in Batcher towards smgr latency metrics (#10075 ) ## Problem With pipelining enabled, the time a request spends in the batcher stage counts towards the smgr op latency. If pipelining is disabled, that time is not accounted for. In practice, this results in a jump in smgr getpage latencies in various dashboards and degrades the internal SLO. ## Solution In a similar vein to #10042 and with a similar rationale, this PR stops counting the time spent in batcher stage towards smgr op latency. The smgr op latency metric is reduced to the actual execution time. Time spent in batcher stage is tracked in a separate histogram. I expect to remove that histogram after batching rollout is complete, but it will be helpful in the meantime to reason about the rollout.	2024-12-11 14:48:54 +01:00
Vlad Lazar	fde1046278	wal_decoder: fix compact key protobuf encoding (#10074 ) ## Problem Protobuf doesn't support 128 bit integers, so we encode the keys as two 64 bit integers. The issue is that when we split the 128 bit compact key we use signed 64 bit integers to represent the two halves. This may result in a negative lower half when relnode is larger than `0x00800000`. When we convert the lower half to an i128 we get a negative `CompactKey`. ## Summary of Changes Use unsigned integers when encoding into Protobuf. ## Deployment * Prod: We disabled the interpreted proto, so no compat concerns. * Staging: Disable the interpreted proto, do one release, and then release the fixed version. We do this because a negative int32 will convert to a large uint32 value and could give a key in the actual pageserver space. In production we would work around this by adding new fields to the proto and deprecating the old ones, but we can make our lives easy here. * Pre-prod: Same as staging	2024-12-11 14:48:45 +01:00
Christian Schwarz	9ae980bf4f	page_service: don't count time spent in Batcher towards smgr latency metrics (#10075 ) ## Problem With pipelining enabled, the time a request spends in the batcher stage counts towards the smgr op latency. If pipelining is disabled, that time is not accounted for. In practice, this results in a jump in smgr getpage latencies in various dashboards and degrades the internal SLO. ## Solution In a similar vein to #10042 and with a similar rationale, this PR stops counting the time spent in batcher stage towards smgr op latency. The smgr op latency metric is reduced to the actual execution time. Time spent in batcher stage is tracked in a separate histogram. I expect to remove that histogram after batching rollout is complete, but it will be helpful in the meantime to reason about the rollout.	2024-12-11 13:37:08 +00:00
Vlad Lazar	fcfd1c7d0a	pageserver: don't drop multixact slrus on non zero shards	2024-12-11 13:41:35 +01:00
Vlad Lazar	665369c439	wal_decoder: fix compact key protobuf encoding (#10074 ) ## Problem Protobuf doesn't support 128 bit integers, so we encode the keys as two 64 bit integers. The issue is that when we split the 128 bit compact key we use signed 64 bit integers to represent the two halves. This may result in a negative lower half when relnode is larger than `0x00800000`. When we convert the lower half to an i128 we get a negative `CompactKey`. ## Summary of Changes Use unsigned integers when encoding into Protobuf. ## Deployment * Prod: We disabled the interpreted proto, so no compat concerns. * Staging: Disable the interpreted proto, do one release, and then release the fixed version. We do this because a negative int32 will convert to a large uint32 value and could give a key in the actual pageserver space. In production we would work around this by adding new fields to the proto and deprecating the old ones, but we can make our lives easy here. * Pre-prod: Same as staging	2024-12-11 12:35:02 +00:00
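The sign problem described in #10074 can be reproduced in a few lines. The sketch below does not model the real `CompactKey` bit layout or the protobuf schema: the helper names and the sample key are assumptions, and Python integers stand in for `i64`/`u64`/`i128`. It only shows that widening a signed lower half sign-extends it and corrupts the reassembled 128-bit value.

```python
import struct

MASK64 = (1 << 64) - 1


def split_key(key):
    """Split a 128-bit key into (high, low) unsigned 64-bit halves."""
    return (key >> 64) & MASK64, key & MASK64


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


def join_signed(high, low):
    # Buggy variant: both halves travel as signed 64-bit integers, so a low
    # half with its top bit set is sign-extended when widened.
    return (as_signed64(high) << 64) | as_signed64(low)


def join_unsigned(high, low):
    # Fixed variant: halves stay unsigned, so the round trip is lossless.
    return (high << 64) | low


# Hypothetical key whose lower half has its most significant bit set.
key = (0x01 << 64) | 0x8000_0000_0000_0001
high, low = split_key(key)
```

With this key, `join_unsigned(high, low)` returns the original value, while `join_signed(high, low)` returns a negative number, which is the failure mode the commit message describes.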
JC Grünhage	d7aeca2f34	CI(deploy): create git tags/releases before triggering deploy workflows (#10022 ) ## Problem When dev deployments are disabled (or fail), the tags for releases aren't created. It makes more sense to have tag and release creation before the deployment to prevent situations like [this](https://github.com/neondatabase/neon/pull/9959). It is not enough to move the tag creation before the deployment. If the deployment fails, re-running the job isn't possible because the API call to create the tag will fail. ## Summary of changes - Tag/Release creation now happens before the deployment - The two steps for tag and release have been merged into a bigger one - There are new checks to ensure that if the tags/releases already exist as expected, things will continue just fine.	2024-12-11 09:41:34 +00:00
John Spray	38415a9816	pageserver: fix ingest handling of CLog truncate (#10080 ) ## Problem In #9786 we stop storing SLRUs on non-zero shards. However, there was one code path during ingest that still tries to enumerate SLRU relations on all shards. This fails if it sees a tenant who has never seen any write to an SLRU, or who has done such thorough compaction+GC that it has dropped its SLRU directory key. ## Summary of changes - Avoid trying to list SLRU relations on nonzero shards	2024-12-11 09:16:11 +00:00
Alex Chi Z.	2455dca403	Merge pull request #10081 from neondatabase/skyzh/cherry-pick-fix pageserver: fix CLog truncate walingest	2024-12-10 22:53:46 -05:00
John Spray	bc6354921f	pageserver: fix CLog truncate walingest	2024-12-10 22:30:25 -05:00
Matthias van de Meent	597125e124	Disable readstream's reliance on seqscan readahead (#9860 ) Neon doesn't have seqscan detection of its own, so stop read_stream from trying to utilize that readahead, and instead make it issue readahead of its own. ## Problem @knizhnik noticed that we didn't issue smgrprefetch[v] calls for seqscans in PG17 due to the move to the read_stream API, which assumes that the underlying IO facilities do seqscan detection for readahead. That is a wrong assumption when Neon is involved, so let's remove the code that applies that assumption. ## Summary of changes Remove the cases where seqscans are detected and prefetch is disabled as a consequence, and instead don't do that detection. PG PR: https://github.com/neondatabase/postgres/pull/532	2024-12-11 00:51:05 +00:00
Matthias van de Meent	e71d20d392	Emit nbtree vacuum cycle id in nbtree xlog through forced FPIs (#9932 ) This fixes neondatabase/neon#9929. ## Postgres repo PRS: - PG17: https://github.com/neondatabase/postgres/pull/538 - PG16: https://github.com/neondatabase/postgres/pull/539 - PG15: https://github.com/neondatabase/postgres/pull/540 - PG14: https://github.com/neondatabase/postgres/pull/541 ## Problem see #9929 ## Summary of changes We update the split code to force the code to emit an FPI whenever the cycle ID might be interesting for concurrent btree vacuum.	2024-12-10 19:42:52 +00:00
Alex Chi Z.	aa0554fd1e	feat(test_runner): allowed_errors in storage scrubber (#10062 ) ## Problem resolve https://github.com/neondatabase/neon/issues/9988#issuecomment-2528239437 ## Summary of changes * New verbose mode for storage scrubber scan metadata (pageserver) that contains the error messages. * Filter allowed_error list from the JSON output to determine the healthy flag status. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-12-10 17:00:47 +00:00
Heikki Linnakangas	b853f78136	Print a log message if GetPage response takes too long (#10046 ) We have metrics for GetPage request latencies, but this is an extra measure to capture requests that take way too long in the logs. The log message is printed every 10 s, until the response is received: ``` PG:2024-12-09 16:02:07.715 GMT [1782845] LOG: [NEON_SMGR] [shard 0] no response received from pageserver for 10.000 s, still waiting (sent 10613 requests, received 10612 responses) PG:2024-12-09 16:02:17.723 GMT [1782845] LOG: [NEON_SMGR] [shard 0] no response received from pageserver for 20.008 s, still waiting (sent 10613 requests, received 10612 responses) PG:2024-12-09 16:02:19.719 GMT [1782845] LOG: [NEON_SMGR] [shard 0] received response from pageserver after 22.006 s ```	2024-12-10 16:26:56 +00:00
Alex Chi Z.	6ad99826c1	fix(pageserver): refresh_gc_info should always increase cutoff (#9862 ) ## Problem close https://github.com/neondatabase/cloud/issues/19671 ``` Timeline ----------------------------- ^ last GC happened LSN ^ original retention period setting = 24hr > refresh-gc-info updates the gc_info ^ planned cutoff (gc_info) ^ customer set retention to 48hr, and it's still within the last GC LSN ^1 ^2 we have two choices: (1) update the planned cutoff to move backwards, or (2) keep the current one ``` In this patch, we decided to keep the current cutoff instead of moving back the gc_info to avoid races. In the future, we could allow the planned gc cutoff to go back once cplane sends a retention_history tenant config update, but this requires a careful revisit of the code. ## Summary of changes Ensure that GC cutoffs never go back if retention settings get changed. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-12-10 15:23:26 +00:00
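The rule chosen in #9862 (option 2, keep the current cutoff) amounts to a monotonic update. A minimal sketch, assuming LSNs are plain integers and using a hypothetical `refresh_planned_cutoff` helper rather than the pageserver's actual `refresh_gc_info` code:

```python
def refresh_planned_cutoff(current_cutoff, proposed_cutoff):
    """Never move the planned GC cutoff backwards, even if retention grows."""
    return max(current_cutoff, proposed_cutoff)


cutoff = 1000  # planned cutoff computed under the original 24hr retention

# Retention is raised to 48hr, so the newly computed cutoff is older (lower).
cutoff_after_retention_change = refresh_planned_cutoff(cutoff, 600)

# Later, as the timeline advances, the computed cutoff passes the old one.
cutoff_after_progress = refresh_planned_cutoff(cutoff_after_retention_change, 1500)
```

The first refresh leaves the cutoff at 1000 instead of moving it back to 600, and the second advances it to 1500.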
Konstantin Knizhnik	311ee793b9	Fix handling in-flight requests in prefetch buffer resize (#9968 ) ## Problem See https://github.com/neondatabase/neon/issues/9961 Current implementation of prefetch buffer resize doesn't correctly handle in-flight requests ## Summary of changes 1. Fix index of entry we should wait for if new prefetch buffer size is smaller than number of in-flight requests. 2. Correctly set flush position Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-12-10 15:01:40 +00:00
Erik Grinaker	ad472bd4a1	test_runner: add visibility map test (#9940 ) Verifies that visibility map pages are correctly maintained across shards. Touches #9914.	2024-12-10 12:07:00 +00:00
Arpad Müller	c51db1db61	Replace MAX_KEYS_PER_DELETE constant with function (#10061 ) Azure has a different per-request limit of 256 items for bulk deletion compared to the number of 1000 on AWS. Therefore, we need to support multiple values. Due to `GenericRemoteStorage`, we can't add an associated constant, but it has to be a function. The PR replaces the `MAX_KEYS_PER_DELETE` constant with a function of the same name, implemented on both the `RemoteStorage` trait as well as on `GenericRemoteStorage`. The value serves as hint of how many objects to pass to the `delete_objects` function. Reading: * https://learn.microsoft.com/en-us/rest/api/storageservices/blob-batch * https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html Part of #7931	2024-12-10 11:29:38 +00:00
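The per-backend limit from #10061 is a hint for how many keys to pass to each bulk delete call. The sketch below uses the two limits stated in the commit message (256 for Azure, 1000 for AWS); the function names, the backend strings, and the batching helper are assumptions for illustration and do not mirror the `RemoteStorage` trait.

```python
def max_keys_per_delete(backend):
    """Per-request bulk delete limit, keyed by a hypothetical backend name."""
    limits = {"aws-s3": 1000, "azure": 256}
    return limits[backend]


def delete_batches(keys, backend):
    """Split keys into batches no larger than the backend's limit."""
    limit = max_keys_per_delete(backend)
    return [keys[i:i + limit] for i in range(0, len(keys), limit)]


keys = [f"layer-{i:05d}" for i in range(2500)]
aws_batches = delete_batches(keys, "aws-s3")
azure_batches = delete_batches(keys, "azure")
```

For 2500 keys this yields 3 requests on AWS and 10 on Azure, which is why a single compile-time constant no longer fits both backends.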
Ivan Efremov	34c1295594	[proxy] impr: Additional logging for cancellation queries (#10039 ) ## Problem Since cancellation tasks are spawned in the background, their logs are sometimes missing context. https://neondb.slack.com/archives/C060N3SEF9D/p1733427801527419?thread_ts=1733419882.560159&cid=C060N3SEF9D ## Summary of changes Add `session_id` and change loglevel for cancellation queries	2024-12-10 10:14:28 +00:00
Evan Fleming	b593e51eae	safekeeper: use arc for global timelines and config (#10051 ) Hello! I was interested in potentially making some contributions to Neon and looking through the issue backlog I found [8200](https://github.com/neondatabase/neon/issues/8200) which seemed like a good first issue to attempt to tackle. I see it was assigned a while ago so apologies if I'm stepping on any toes with this PR. I also apologize for the size of this PR. I'm not sure if there is a simple way to reduce it given the footprint of the components being changed. ## Problem This PR is attempting to address part of the problem outlined in issue [8200](https://github.com/neondatabase/neon/issues/8200). Namely to remove global static usage of timeline state in favour of `Arc<GlobalTimelines>` and to replace wasteful clones of `SafeKeeperConf` with `Arc<SafeKeeperConf>`. I did not opt to tackle `RemoteStorage` in this PR to minimize the amount of changes as this PR is already quite large. I also did not opt to introduce a `SafekeeperApp` wrapper struct to similarly minimize changes but I can tackle either or both of these omissions in this PR if folks would like. ## Summary of changes - Remove static usage of `GlobalTimelines` in favour of `Arc<GlobalTimelines>` - Wrap `SafeKeeperConf` in `Arc` to avoid wasteful clones of the underlying struct ## Some additional thoughts - We seem to currently store `SafeKeeperConf` in `GlobalTimelines` and then expose it through a public `get_global_config` function which requires locking. This seems needlessly wasteful and based on observed usage we could remove this public accessor and force consumers to acquire `SafeKeeperConf` through the new Arc reference.	2024-12-09 21:09:20 +00:00
Alex Chi Z.	4c4cb80186	fix(pageserver): fix gc-compaction racing with legacy gc (#10052 ) ## Problem close https://github.com/neondatabase/neon/issues/10049, close https://github.com/neondatabase/neon/issues/10030, close https://github.com/neondatabase/neon/issues/8861 part of https://github.com/neondatabase/neon/issues/9114 The legacy gc process calls `get_latest_gc_cutoff`, which uses a Rcu different than the gc_info struct. In the gc_compaction_smoke test case, the "latest" cutoff could be lower than the gc_info struct, causing gc-compaction to collect data that could be accessed by `latest_gc_cutoff`. Technically speaking, there's nothing wrong with gc-compaction using gc_info without considering latest_gc_cutoff, because gc_info is the source of truth. But anyways, let's fix it. ## Summary of changes * gc-compaction uses `latest_gc_cutoff` instead of gc_info to determine the gc horizon. * if a gc-compaction is scheduled via tenant compaction iteration, it will take the gc_block lock to avoid racing with functionalities like detach ancestor (if it's triggered via manual compaction API without scheduling, then it won't take the lock) --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-12-09 20:06:06 +00:00
a-masterov	92273b6d5e	Enable the pg_regress tests on staging for PG17 (#9978 ) ## Problem Currently, we run the `pg_regress` tests only for PG16 However, PG17 is a part of Neon and should be tested as well ## Summary of changes Modified the workflow and added a patch for PG17 enabling the `pg_regress` tests. The problem with leftovers was solved by using branches.	2024-12-09 19:30:39 +00:00
Vlad Lazar	7ac2a5560f	Merge pull request #10060 from neondatabase/vlad/manual-release-2024-12-09 Manual storage release 2024-12-09	2024-12-09 18:14:40 +00:00
Arpad Müller	e74e7aac93	Use updated patched azure SDK crates (#10036 ) For a while already, we've been unable to update the Azure SDK crates due to Azure adopting use of a non-tokio async runtime, see #7545. The effort to upstream the fix got stalled, and I think it's better to switch to a patched version of the SDK that is up to date. Now we have a fork of the SDK under the neondatabase github org, to which I have applied Conrad's rebased patches: https://github.com/neondatabase/azure-sdk-for-rust/tree/neon . The existence of a fork will also help with shipping bulk delete support before it's upstreamed (#7931). Also, in related news, the Azure SDK has gotten a rift in development, where the main branch pertains to a future, to-be-officially-blessed release of the SDK, and the older versions, which we are currently using, are on the `legacy` branch. Upstream doesn't really want patches for the `legacy` branch any more, they want to focus on the `main` efforts. However, even then, the `legacy` branch is still newer than what we are having right now, so let's switch to `legacy` for now. Depending on how long it takes, we can switch to the official version of the SDK once it's released or switch to the upstream `main` branch if there are changes we want before that. As a nice side effect of this PR, we now use reqwest 0.12 everywhere, dropping the dependency on version 0.11. Fixes #7545	2024-12-09 15:50:06 +00:00
Vlad Lazar	4cca5cdb12	deps: update url to 2.5.4 for RUSTSEC-2024-0421 (#10059 ) ## Problem See https://rustsec.org/advisories/RUSTSEC-2024-0421 ## Summary of changes Update url crate to 2.5.4.	2024-12-09 14:57:42 +00:00
Arpad Müller	9d425b54f7	Update AWS SDK crates (#10056 ) Result of running: cargo update -p aws-types -p aws-sigv4 -p aws-credential-types -p aws-smithy-types -p aws-smithy-async -p aws-sdk-kms -p aws-sdk-iam -p aws-sdk-s3 -p aws-config We want to keep the AWS SDK up to date as that way we benefit from new developments and improvements.	2024-12-09 12:46:59 +00:00
Vlad Lazar	5f4559ecd2	Merge pull request #10053 from neondatabase/rc/release/2024-12-09 Storage release 2024-12-09	2024-12-09 12:28:51 +00:00
github-actions[bot]	6c349e76d9	Storage release 2024-12-09	2024-12-09 06:05:40 +00:00
John Spray	ec790870d5	storcon: automatically clear Pause/Stop scheduling policies to enable detaches (#10011 ) ## Problem We saw a tenant get stuck when it had been put into Pause scheduling mode to pin it to a pageserver, then it was left idle for a while and the control plane tried to detach it. Close: https://github.com/neondatabase/neon/issues/9957 ## Summary of changes - When changing policy to Detached or Secondary, set the scheduling policy to Active. - Add a test that exercises this - When persisting tenant shards, set their `generation_pageserver` to null if the placement policy is not Attached (this enables consistency checks to work, and avoids leaving state in the DB that could be confusing/misleading in future)	2024-12-07 13:05:09 +00:00
Christian Schwarz	4d7111f240	page_service: don't count time spent flushing towards smgr latency metrics (#10042 ) ## Problem In #9962 I changed the smgr metrics to include time spent on flush. It isn't under our (=storage team's) control how long that flush takes because the client can stop reading requests. ## Summary of changes Stop the timer as soon as we've buffered up the response in the `pgb_writer`. Track flush time in a separate metric. --------- Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com>	2024-12-07 08:57:55 +00:00
Alex Chi Z.	b1fd086c0c	test(pageserver): disable gc_compaction smoke test for now (#10045 ) ## Problem The test is flaky. ## Summary of changes Disable the test. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-12-06 22:30:04 +00:00
Heikki Linnakangas	b6eea65597	Fix error message if PS connection is lost while receiving prefetch (#9923 ) If the pageserver connection is lost while receiving the prefetch request, the prefetch queue is cleared. The error message prints the values from the prefetch slot, but because the slot was already cleared, they're all zeros: LOG: [NEON_SMGR] [shard 0] No response from reading prefetch entry 0: 0/0/0.0 block 0. This can be caused by a concurrent disconnect To fix, make local copies of the values. In passing, also add a sanity check that if the receive() call succeeds, the prefetch slot is still intact.	2024-12-06 20:56:57 +00:00
Alex Chi Z.	c42c28b339	feat(pageserver): gc-compaction split job and partial scheduler (#9897 ) ## Problem part of https://github.com/neondatabase/neon/issues/9114, stacked PR over #9809 The compaction scheduler now schedules partial compaction jobs. ## Summary of changes * Add the compaction job splitter based on size. * Schedule subcompactions using the compaction scheduler. * Test subcompaction scheduler in the smoke regress test. * Temporarily disable layer map checks --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-12-06 18:44:26 +00:00
Tristan Partin	e4837b0a5a	Bump sql_exporter to 0.16.0 (#10041 ) Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-12-06 17:43:55 +00:00
Erik Grinaker	14c4fae64a	test_runner/performance: add improved bulk insert benchmark (#9812 ) Adds an improved bulk insert benchmark, including S3 uploads. Touches #9789.	2024-12-06 15:17:15 +00:00
Vlad Lazar	cc70fc802d	pageserver: add metric for number of wal records received by each shard (#10035 ) ## Problem With the current metrics we can't identify which shards are ingesting data at any given time. ## Summary of changes Add a metric for the number of wal records received for processing by each shard. This is per (tenant, timeline, shard).	2024-12-06 12:51:41 +00:00
Alexey Kondratov	fa07097f2f	chore: Reorganize and refresh CODEOWNERS (#10008 ) ## Problem We didn't have a codeowner for `/compute`, so nobody was auto-assigned for PRs like #9973 ## Summary of changes While on it: 1. Group codeowners into sections. 2. Remove control plane from the `/compute_tools` because it's primarily the internal `compute_ctl` code. 3. Add control plane (and compute) to `/libs/compute_api` because that's the shared public interface of the compute.	2024-12-06 11:44:50 +00:00
Erik Grinaker	7838659197	pageserver: assert that keys belong to shard (#9943 ) We've seen cases where stray keys end up on the wrong shard. This shouldn't happen. Add debug assertions to prevent this. In release builds, we should be lenient in order to handle changing key ownership policies. Touches #9914.	2024-12-06 10:24:13 +00:00
Vlad Lazar	3f1c542957	pageserver: add disk consistent and remote lsn metrics (#10005 ) ## Problem There are no metrics for disk consistent LSN and remote LSN. This stuff is useful when looking at ingest performance. ## Summary of changes Two per timeline metrics are added: `pageserver_disk_consistent_lsn` and `pageserver_projected_remote_consistent_lsn`. I went for the projected remote lsn instead of the visible one because that more closely matches remote storage write tput. Ideally we would have both, but these metrics are expensive.	2024-12-06 10:21:52 +00:00
Erik Grinaker	ec4072f845	pageserver: add `wait_until_flushed` parameter for timeline checkpoint (#10013 ) ## Problem I'm writing an ingest benchmark in #9812. To time S3 uploads, I need to schedule a flush of the Pageserver's in-memory layer, but don't actually want to wait around for it to complete (which will take a minute). ## Summary of changes Add a parameter `wait_until_flushed` (default `true`) for `timeline/checkpoint` to control whether to wait for the flush to complete.	2024-12-06 10:12:39 +00:00
Erik Grinaker	56f867bde5	pageserver: only zero truncated FSM page on owning shard (#10032 ) ## Problem FSM pages are managed like regular relation pages, and owned by a single shard. However, when truncating the FSM relation the last FSM page was zeroed out on all shards. This is unnecessary and potentially confusing. The superfluous keys will be removed during compactions, as they do not belong on these shards. Resolves #10027. ## Summary of changes Only zero out the truncated FSM page on the owning shard.	2024-12-06 07:22:22 +00:00
Arpad Müller	d1ab7471e2	Fix desc_str for Azure container (#10021 ) Small logs fix I've noticed while working on https://github.com/neondatabase/cloud/issues/19963 .	2024-12-05 20:51:57 +00:00
Tristan Partin	6ff4175fd7	Send Content-Type header on reconfigure request from neon_local (#10029 ) Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-12-05 20:30:35 +00:00
Tristan Partin	6331cb2161	Bump anyhow to 1.0.94 (#10028 ) We were over a year out of date. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-12-05 19:42:52 +00:00
Alex Chi Z.	71f38d1354	feat(pageserver): support schedule gc-compaction (#9809 ) ## Problem part of https://github.com/neondatabase/neon/issues/9114 gc-compaction can take a long time. This patch adds support for scheduling a gc-compaction job. The compaction loop will first handle L0->L1 compaction, and then gc compaction. The scheduled jobs are stored in a non-persistent queue within the tenant structure. This will be the building block for the partial compaction trigger -- if the system determines that we need to do a gc compaction, it will partition the keyspace and schedule several jobs. Each of these jobs will run for a short amount of time (i.e., 1 min). L0 compaction will be prioritized over gc compaction. ## Summary of changes * Add compaction scheduler in tenant. * Run scheduled compaction in integration tests. * Change the manual compaction API to allow scheduling a compaction instead of running it immediately. * Add LSN upper bound as gc-compaction parameter. If we schedule partial compactions, gc_cutoff might move across different runs. Therefore, we need to pass a pre-determined gc_cutoff beforehand. (TODO: support LSN lower bound so that we can compact an arbitrary "rectangle" in the layer map) * Refactor the gc_compaction internal interface. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-12-05 19:37:17 +00:00
Tristan Partin	c0ba416967	Add compute_logical_snapshots_bytes metric (#9887 ) This metric exposes the size of all non-temporary logical snapshot files. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-12-05 19:04:33 +00:00
Alexey Kondratov	13e8105740	feat(compute): Allow specifying the reconfiguration concurrency (#10006 ) ## Problem We need a higher concurrency during reconfiguration in case of many DBs, but the instance is already running and used by the client. We can easily get out of `max_connections` limit, and the current code won't handle that. ## Summary of changes Default to 1, but also allow control plane to override this value for specific projects. It's also recommended to bump `superuser_reserved_connections` += `reconfigure_concurrency` for such projects to ensure that we always have enough spare connections for reconfiguration process to succeed. Quick workaround for neondatabase/cloud#17846	2024-12-05 17:57:25 +00:00
Erik Grinaker	db79304416	storage_controller: increase shard scan timeout (#10000 ) ## Problem The node shard scan timeout of 1 second is a bit too aggressive, and we've seen this cause test failures. The scans are performed in parallel across nodes, and the entire operation has a 15 second timeout. Resolves #9801. ## Summary of changes Increase the timeout to 5 seconds. This is still enough to time out on a network failure and retry successfully within 15 seconds.	2024-12-05 17:29:21 +00:00
Ivan Efremov	ffc9c33eb2	proxy: Present new auth backend cplane_proxy_v1 (#10012 ) Implement a new auth backend based on the current Neon backend to switch to the new Proxy V1 cplane API. Implements [#21048](https://github.com/neondatabase/cloud/issues/21048)	2024-12-05 05:30:38 +00:00
Yuchen Liang	ed2d892113	pageserver: fix buffered-writer on macos build (#10019 ) ## Problem In https://github.com/neondatabase/neon/pull/9693, we forgot to check macos build. The [CI run](https://github.com/neondatabase/neon/actions/runs/12164541897/job/33926455468) on main showed that macos build failed with unused variables and dead code. ## Summary of changes - add `allow(dead_code)` and `allow(unused_variables)` to the relevant code that is not used on macos. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-12-05 02:16:09 +00:00
Conrad Ludgate	131585eb6b	chore: update rust-postgres (#10002 ) Like #9931 but without rebasing upstream just yet, to try and minimise the differences. Removes all proxy-specific commits from the rust-postgres fork, now that proxy no longer depends on them. Merging upstream changes to come later.	2024-12-04 21:07:44 +00:00
Conrad Ludgate	0bab7e3086	chore: update clap (#10009 ) This updates clap to use a new version of anstream	2024-12-04 17:42:17 +00:00
Yuchen Liang	e6cd5050fc	pageserver: make `BufferedWriter` do double-buffering (#9693 ) Closes #9387. ## Problem `BufferedWriter` cannot proceed while the owned buffer is flushing to disk. We want to implement double buffering so that the flush can happen in the background. See #9387. ## Summary of changes - Maintain two owned buffers in `BufferedWriter`. - The writer is in charge of copying the data into an owned, aligned buffer and, once it is full, submitting it to the flush task. - The flush background task is in charge of flushing the owned buffer to disk, and returning the buffer to the writer for reuse. - The writer and the flush background task communicate through a bi-directional channel. For in-memory layer, we also need to be able to read from the buffered writer in `get_values_reconstruct_data`. To handle this case, we did the following - Replace `VirtualFile::write_all` with `VirtualFile::write_all_at`, and use `Arc` to share it between writer and background task. - Leverage `IoBufferMut::freeze` to get a cheaply clonable `IoBuffer`, one clone will be submitted to the channel, the other clone will be saved within the writer to serve reads. When we want to reuse the buffer, we can invoke `IoBuffer::into_mut`, which gives us back the mutable aligned buffer. - InMemoryLayer reads are now aware of the maybe_flushed part of the buffer. Caveat - We removed the owned version of write, because this interface does not work well with buffer alignment. The result is that without direct IO enabled, [`download_object`](`a439d57050/pageserver/src/tenant/remote_timeline_client/download.rs (L243)`) does one more memcpy than before this PR due to the switch to use `_borrowed` version of the write. - "Bypass aligned part of write" could be implemented later to avoid a large amount of memcpy. Testing - use a oneshot-channel-based control mechanism to make flush behavior deterministic in test. 
- test reading from `EphemeralFile` when the last submitted buffer is not flushed, in-progress, and done flushing to disk. ## Performance We see performance improvement for small values, and regression on big values, likely due to being CPU bound + disk write latency. [Results](https://www.notion.so/neondatabase/Benchmarking-New-BufferedWriter-11-20-2024-143f189e0047805ba99acda89f984d51?pvs=4) --------- Signed-off-by: Yuchen Liang <yuchen@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-12-04 16:54:56 +00:00
John Spray	60c0d19f57	tests: make storcon scale test AZ-aware (#9952 ) ## Problem We have a scale test for the storage controller which also acts as a good stress test for scheduling stability. However, it created nodes with no AZs set. ## Summary of changes - Bump node count to 6 and set AZs on them. This is a precursor to other AZ-related PRs, to make sure any new code that's landed is getting scale tested in an AZ-aware environment.	2024-12-04 15:04:04 +00:00
a-masterov	dec2e2fb29	Create a branch for compute release (#9637 ) ## Problem We practice a manual release flow for the compute module. This will allow automation of the compute release process. ## Summary of changes The workflow was modified to make a compute release automatically on the branch release-compute.	2024-12-04 13:10:00 +00:00
Erik Grinaker	699a213c5d	Display reqwest error source (#10004 ) ## Problem Reqwest errors don't include details about the inner source error. This means that we get opaque errors like: ``` receive body: error sending request for url (http://localhost:9898/v1/location_config) ``` Instead of the more helpful: ``` receive body: error sending request for url (http://localhost:9898/v1/location_config): operation timed out ``` Touches #9801. ## Summary of changes Include the source error for `reqwest::Error` wherever it's displayed.	2024-12-04 13:05:53 +00:00
Alexey Kondratov	9a4157dadb	feat(compute): Set default application_name for pgbouncer connections (#9973 ) ## Problem When a client specifies `application_name`, pgbouncer propagates it to Postgres. Yet, if the client doesn't do it, we have a hard time figuring out who opens a lot of Postgres connections (including the `cloud_admin` ones). See this investigation as an example: https://neondb.slack.com/archives/C0836R0RZ0D ## Summary of changes I haven't found this documented, but it looks like pgbouncer accepts standard Postgres connstring parameters in the connstring in the `[databases]` section, so put the default `application_name=pgbouncer` there. That way, we will always see who opens Postgres connections. I did tests, and if the client specifies an `application_name`, pgbouncer overrides this default, so it only works if it's not specified or set to blank `&application_name=` in the connection string. This is the last place we could potentially open some Postgres connections without `application_name`. Everything else should be one of two things: 1. Direct client connections without `application_name`, but these should be strictly non-`cloud_admin` ones 2. Some ad-hoc internal connections, so if we see spikes of unidentified `cloud_admin` connections, we will need to investigate it again. Fixes neondatabase/cloud#20948	2024-12-04 13:05:31 +00:00
Conrad Ludgate	bd52822e14	feat(proxy): add option to forward startup params (#9979 ) (stacked on #9990 and #9995) Partially fixes #1287 with a custom option field to enable the fixed behaviour. This allows us to gradually roll out the fix without silently changing the observed behaviour for our customers. related to https://github.com/neondatabase/cloud/issues/15284	2024-12-04 12:58:35 +00:00
Folke Behrens	dcd016bbfc	Assign /libs/proxy/ to proxy team (#10003 )	2024-12-04 12:58:31 +00:00
Erik Grinaker	7b18e33997	pageserver: return proper status code for heatmap_upload errors (#9991 ) ## Problem During deploys, we see a lot of 500 errors due to heatmap uploads for inactive tenants. These should be 503s instead. Resolves #9574. ## Summary of changes Make the secondary tenant scheduler use `ApiError` rather than `anyhow::Error`, to propagate the tenant error and convert it to an appropriate status code.	2024-12-04 12:53:52 +00:00
Peter Bendel	9d75218ba7	fix parsing human time output like "50m37s" (#10001 ) ## Problem In ingest_benchmark.yml workflow we use pgcopydb tool to migrate project. pgcopydb logs human time. Our parsing of the human time doesn't work for times like "50m37s". [Example workflow](https://github.com/neondatabase/neon/actions/runs/12145539948/job/33867418065#step:10:479) contains "57m45s" but we [reported](https://github.com/neondatabase/neon/actions/runs/12145539948/job/33867418065#step:10:500) only the seconds part: 45.000 s ## Summary of changes add a regex pattern for Minute/Second combination	2024-12-04 11:37:24 +00:00
Peter Bendel	1b3558df7a	optimize parms for ingest bench (#9999 ) ## Problem We tried different parallelism settings for ingest bench. ## Summary of changes The following settings seem optimal after merging - SK side WAL filtering - batched getpages Settings: - effective_io_concurrency 100 - concurrency limit 200 (different from Prod!) - jobs 4, maintenance workers 7 - 10 GB chunk size	2024-12-04 11:07:22 +00:00
Vlad Lazar	68205c48ed	storcon: return an error for drain attempts while paused (#9997 ) ## Problem We currently allow drain operations to proceed while the node policy is paused. ## Summary of changes Return a precondition failed error in such cases. The orchestrator is updated in https://github.com/neondatabase/infra/pull/2544 to skip drain and fills if the pageserver is paused. Closes: https://github.com/neondatabase/neon/issues/9907	2024-12-04 09:25:29 +00:00
Christian Schwarz	8d93d02c2f	page_service: enable batching in Rust & Python Tests + Python benchmarks (#9993 ) This is the first step towards batching rollout. Refs - rollout plan: https://github.com/neondatabase/cloud/issues/20620 - task https://github.com/neondatabase/neon/issues/9377 - uber-epic: https://github.com/neondatabase/neon/issues/9376	2024-12-04 00:07:49 +00:00
Alexander Bayandin	023821a80c	test_page_service_batching: fix non-numeric metrics (#9998 ) ## Problem ``` 2024-12-03T15:42:46.5978335Z + poetry run python /__w/neon/neon/scripts/ingest_perf_test_result.py --ingest /__w/neon/neon/test_runner/perf-report-local 2024-12-03T15:42:49.5325077Z Traceback (most recent call last): 2024-12-03T15:42:49.5325603Z File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 165, in <module> 2024-12-03T15:42:49.5326029Z main() 2024-12-03T15:42:49.5326316Z File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 155, in main 2024-12-03T15:42:49.5326739Z ingested = ingest_perf_test_result(cur, item, recorded_at_timestamp) 2024-12-03T15:42:49.5327488Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2024-12-03T15:42:49.5327914Z File "/__w/neon/neon/scripts/ingest_perf_test_result.py", line 99, in ingest_perf_test_result 2024-12-03T15:42:49.5328321Z psycopg2.extras.execute_values( 2024-12-03T15:42:49.5328940Z File "/github/home/.cache/pypoetry/virtualenvs/non-package-mode-_pxWMzVK-py3.11/lib/python3.11/site-packages/psycopg2/extras.py", line 1299, in execute_values 2024-12-03T15:42:49.5335618Z cur.execute(b''.join(parts)) 2024-12-03T15:42:49.5335967Z psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type numeric: "concurrent-futures" 2024-12-03T15:42:49.5336287Z LINE 57: 'concurrent-futures', 2024-12-03T15:42:49.5336462Z ^ ``` ## Summary of changes - `test_page_service_batching`: save non-numeric params as `labels` - Add a runtime check that `metric_value` is NUMERIC	2024-12-03 22:46:18 +00:00
Christian Schwarz	944c1adc4c	tests & benchmarks: unify the way we customize the default tenant config (#9992 ) Before this PR, some override callbacks used `.default()`, others used `.setdefault()`. As of this PR, all callbacks use `.setdefault()` which I think is least prone to failure. Aligning on a single way will set the right example for future tests that need such customization. The `test_pageserver_getpage_throttle.py` technically is a change in behavior: before, it replaced the `tenant_config` field, now it just configures the throttle. This is what I believe is intended anyway.	2024-12-03 22:07:03 +00:00
Arpad Müller	ca85f364ba	Support tenant manifests in the scrubber (#9942 ) Support tenant manifests in the storage scrubber: * list the manifests, order them by generation * delete all manifests except for the two most recent generations * for the latest manifest: try parsing it. I've tested this patch by running it against a staging bucket and it successfully deleted stuff (and avoided deleting the latest two generations). In follow-up work, we might want to also check some invariants of the manifest, as mentioned in #8088. Part of #9386 Part of #8088 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-12-03 20:39:10 +00:00
Conrad Ludgate	9ef0662a42	chore(proxy): enforce single host+port (#9995 ) proxy doesn't ever provide multiple hosts/ports, so this code adds a lot of complexity of error handling for no good reason. (stacked on #9990)	2024-12-03 20:00:14 +00:00
Alexey Immoreev	3baef0bca3	Improvement: add console redirect timeout warning (#9985 ) ## Problem At the moment, the user gets no information that the session will be cancelled in 2 minutes. ## Summary of changes The timeout is now logged for the user.	2024-12-03 18:59:44 +00:00
Erik Grinaker	f312c6571f	pageserver: respond to multiple shutdown signals (#9982 ) ## Problem The Pageserver signal handler would only respond to a single signal and initiate shutdown. Subsequent signals were ignored. This meant that a `SIGQUIT` sent after a `SIGTERM` had no effect (e.g. in the case of a slow or stalled shutdown). The `test_runner` uses this to force shutdown if graceful shutdown is slow. Touches #9740. ## Summary of changes Keep responding to signals after the initial shutdown signal has been received. Arguably, the `test_runner` should also use `SIGKILL` rather than `SIGQUIT` in this case, but it seems reasonable to respond to `SIGQUIT` regardless.	2024-12-03 18:47:17 +00:00
Conrad Ludgate	27a42d0f96	chore(proxy): remove postgres config parser and md5 support (#9990 ) Keeping the `mock` postgres cplane adaptor using "stock" tokio-postgres allows us to remove a lot of dead weight from our actual postgres connection logic.	2024-12-03 18:39:23 +00:00
John Spray	b04ab468ee	pageserver: more detailed logs when calling re-attach (#9996 ) ## Problem We saw a peculiar case where a pageserver apparently got a 0-tenant response to `/re-attach` but we couldn't see the request landing on a storage controller. It was hard to confirm retrospectively that the pageserver was configured properly at the moment it sent the request. ## Summary of changes - Log the URL to which we are sending the request - Log the NodeId and metadata that we sent	2024-12-03 18:36:37 +00:00
John Spray	dcb629532b	pageserver: only store SLRUs & aux files on shard zero (#9786 ) ## Problem Since https://github.com/neondatabase/neon/pull/9423 the non-zero shards no longer need SLRU content in order to do GC. This data is now redundant on shards >0. One release cycle after merging that PR, we may merge this one, which also stops writing those pages to shards > 0, reaping the efficiency benefit. Closes: https://github.com/neondatabase/neon/issues/7512 Closes: https://github.com/neondatabase/neon/issues/9641 ## Summary of changes - Avoid storing SLRUs on non-zero shards - Bonus: avoid storing aux files on non-zero shards	2024-12-03 17:22:49 +00:00
John Spray	71d004289c	storcon: in shard splits, inherit parent's AZ (#9946 ) ## Problem Sharded tenants should be run in a single AZ for best performance, so that computes have AZ-local latency to all the shards. Part of https://github.com/neondatabase/neon/issues/8264 ## Summary of changes - When we split a tenant, instead of updating each shard's preferred AZ to wherever it is scheduled, propagate the preferred AZ from the parent. - Drop the check in `test_shard_preferred_azs` that asserts shards end up in their preferred AZ: this will not be true again until the optimize_attachment logic is updated to make this so. The existing check wasn't testing anything about scheduling, it was just asserting that we set preferred AZ in a way that matches the way things happen to be scheduled at time of split.	2024-12-03 16:55:00 +00:00
Christian Schwarz	4d422b937c	pageserver: only throttle pagestream requests & bring back throttling deduction for smgr latency metrics (#9962 ) ## Problem In the batching PR - https://github.com/neondatabase/neon/pull/9870 I stopped deducting the time-spent-in-throttle from latency metrics, i.e., - smgr latency metrics (`SmgrOpTimer`) - basebackup latency (+scan latency, which I think is part of basebackup). The reason for stopping the deduction was that with the introduction of batching, the trick with tracking time-spent-in-throttle inside RequestContext and swap-replacing it from the `impl Drop for SmgrOpTimer` no longer worked with >1 requests in a batch. However, deducting time-spent-in-throttle is desirable because our internal latency SLO definition does not account for throttling. ## Summary of changes - Redefine throttling to be a page_service pagestream request throttle instead of a throttle for repository `Key` reads through `Timeline::get` / `Timeline::get_vectored`. - This means reads done by `basebackup` are no longer subject to any throttle. - The throttle applies after batching, before handling of the request. - Drive-by fix: make throttle sensitive to cancellation. - Rename metric label `kind` from `timeline_get` to `pagestream` to reflect the new scope of throttling. To avoid config format breakage, we leave the config field named `timeline_get_throttle` and ignore the `task_kinds` field. This will be cleaned up in a future PR. ## Trade-Offs Ideally, we would apply the throttle before reading a request off the connection, so that we queue the minimal amount of work inside the process. However, that's not possible because we need to do shard routing. The redefinition of the throttle to limit pagestream request rate instead of repository `Key` rate comes with several downsides: - We're no longer able to use the throttle mechanism for other tasks, e.g. image layer creation. However, in practice, we never used that capability anyway. 
- We no longer throttle basebackup.	2024-12-03 15:25:58 +00:00
Erik Grinaker	bbe4dfa991	test_runner: use immediate shutdown in `test_sharded_ingest` (#9984 ) ## Problem `test_sharded_ingest` ingests a lot of data, which can cause shutdown to be slow e.g. due to local "S3 uploads" or compactions. This can cause test flakes during teardown. Resolves #9740. ## Summary of changes Perform an immediate shutdown of the cluster.	2024-12-03 14:33:31 +00:00
Erik Grinaker	dcb24ce170	safekeeper,pageserver: add heap profiling (#9778 ) ## Problem We don't have good observability for memory usage. This would be useful e.g. to debug OOM incidents or optimize performance or resource usage. We would also like to use continuous profiling with e.g. [Grafana Cloud Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/) (see https://github.com/neondatabase/cloud/issues/14888). This PR is intended as a proof of concept, to try it out in staging and drive further discussions about profiling more broadly. Touches https://github.com/neondatabase/neon/issues/9534. Touches https://github.com/neondatabase/cloud/issues/14888. Depends on #9779. Depends on #9780. ## Summary of changes Adds an HTTP route `/profile/heap` that takes a heap profile and returns it. Query parameters: * `format`: output format (`jemalloc` or `pprof`; default `pprof`). Unlike CPU profiles (see #9764), heap profiles are not symbolized and require the original binary to translate addresses to function names. To make this work with Grafana, we'll probably have to symbolize the profile server-side -- this is left as future work, as are other output formats like SVG. Heap profiles don't work on macOS due to limitations in jemalloc.	2024-12-03 11:35:59 +00:00
a-masterov	a2a942f93c	Add support for the extensions test for Postgres v17 (#9748 ) ## Problem The extensions for Postgres v17 are ready but we do not test the extensions shipped with v17 ## Summary of changes Build the test image based on Postgres v17. Run the tests for v17. --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2024-12-03 11:25:29 +00:00
Christian Schwarz	cb10be710d	page_service: batching observability & include throttled time in smgr metrics (#9870 ) This PR - fixes smgr metrics https://github.com/neondatabase/neon/issues/9925 - adds an additional startup log line logging the current batching config - adds a histogram of batch sizes global and per-tenant - adds a metric exposing the current batching config The issue described in #9925 is that before this PR, request latency was only observed after batching. This means that smgr latency metrics (most importantly getpage latency) don't account for - `wait_lsn` time - time spent waiting for batch to fill up / the executor stage to pick up the batch. The fix is to use a per-request batching timer, like we did before the initial batching PR. We funnel those timers through the entire request lifecycle. I noticed that even before the initial batching changes, we weren't accounting for the time spent writing & flushing the response to the wire. This PR drive-by fixes that deficiency by dropping the timers at the very end of processing the batch, i.e., after the `pgb.flush()` call. I was unable to maintain the behavior that we deduct time-spent-in-throttle from various latency metrics. The reason is that we're using a single counter in `RequestContext` to track micros spent in throttle. But there are N metrics timers in the batch, one per request. As a consequence, the practice of consuming the counter in the drop handler of each timer no longer works because all but the first timer will encounter error `close() called on closed state`. A failed attempt to maintain the current behavior can be found in https://github.com/neondatabase/neon/pull/9951. So, this PR removes the deduction behavior from all metrics. 
I started a discussion on Slack about the implications this has for our internal SLO calculation: https://neondb.slack.com/archives/C033RQ5SPDH/p1732910861704029 # Refs - fixes https://github.com/neondatabase/neon/issues/9925 - sub-issue https://github.com/neondatabase/neon/issues/9377 - epic: https://github.com/neondatabase/neon/issues/9376	2024-12-03 11:03:23 +00:00
Christian Schwarz	15d01b257a	storcon_cli tenant-describe: include tenant-wide information in output (#9899 ) Before this PR, the storcon_cli didn't have a way to show the tenant-wide information of the TenantDescribeResponse. Sadly, the `Serialize` impl for the tenant config doesn't skip on `None`, so, the output becomes a bit bloated. Maybe we can use `skip_serializing_if(Option::is_none)` in the future. => https://github.com/neondatabase/neon/issues/9983	2024-12-03 10:55:13 +00:00
John Spray	aaee713e53	storcon: use proper schedule context during node delete (#9958 ) ## Problem I was touching `test_storage_controller_node_deletion` because for AZ scheduling work I was adding a change to the storage controller (kick secondaries during optimisation) that made a FIXME in this test defunct. While looking at it I also realized that we can easily fix the way node deletion currently doesn't use a proper ScheduleContext, using the iterator type recently added for that purpose. ## Summary of changes - A testing-only behavior in storage controller where if a secondary location isn't yet ready during optimisation, it will be actively polled. - Remove workaround in `test_storage_controller_node_deletion` that previously was needed because optimisation would get stuck on cold secondaries. - Update node deletion code to use a `TenantShardContextIterator` and thereby a proper ScheduleContext	2024-12-03 08:59:38 +00:00
Alexey Kondratov	2e9207fdf3	fix(testing): Use 1 MB shared_buffers even with LFC (#9969 ) ## Problem After enabling LFC in tests and lowering `shared_buffers` we started having more problems with `test_pg_regress`. ## Summary of changes Set `shared_buffers` to 1MB to both exercise getPage requests/LFC, and still have enough room for Postgres to operate. Anything smaller might not be enough for Postgres under load, and can cause errors like 'no unpinned buffers available'. See Konstantin's comment [1] as well. Fixes #9956 [1]: https://github.com/neondatabase/neon/issues/9956#issuecomment-2511608097	2024-12-02 18:46:06 +00:00
Tristan Partin	d8ebd33fe6	Stop changing the value of neon.extension_server_port at runtime (#9972 ) On reconfigure, we no longer passed a port for the extension server which caused us to not write out the neon.extension_server_port line. Thus, Postgres thought we were setting the port to the default value of 0. PGC_POSTMASTER GUCs cannot be set at runtime, which causes the following log messages: > LOG: parameter "neon.extension_server_port" cannot be changed without restarting the server > LOG: configuration file "/var/db/postgres/compute/pgdata/postgresql.conf" contains errors; unaffected changes were applied Fixes: https://github.com/neondatabase/neon/issues/9945 Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-12-02 18:06:19 +00:00
Conrad Ludgate	2dc238e5b3	feat(proxy): emit JWT auth method and JWT issuer in parquet logs (#9971 ) Fix the HTTP AuthMethod to accommodate the JWT authorization method. Introduces the JWT issuer as an additional field in the parquet logs	2024-12-02 17:54:32 +00:00
Folke Behrens	243bca1c49	Bump OTel, tracing, reqwest crates (#9970 )	2024-12-02 17:24:48 +00:00
Arseny Sher	fa909c27fc	Update consensus protocol spec (#9607 ) The spec was written for the buggy protocol which we had before the one more similar to Raft was implemented. Update the spec with what we currently have. ref https://github.com/neondatabase/neon/issues/8699	2024-12-02 16:10:44 +00:00
Folke Behrens	1b60571636	proxy: Create Elasticache credentials provider lazily (#9967 ) ## Problem The credentials provider tries to connect to AWS STS even when we use plain Redis connections. ## Summary of changes * Construct the CredentialsProvider only when needed ("irsa").	2024-12-02 15:38:12 +00:00
Alexander Bayandin	c18716bb3f	CI(replication-tests): fix notifications about replication-tests failures (#9950 ) ## Problem `if: ${{ github.event.schedule }}` gets skipped if a previous step has failed, but we want to run the step for both `success` and `failure` ## Summary of changes - Add `!cancelled()` to notification step if-condition, to skip only cancelled jobs	2024-12-02 12:46:07 +00:00
Conrad Ludgate	cd1d2d1996	fix(proxy): forward notifications from authentication (#9948 ) Fixes https://github.com/neondatabase/cloud/issues/20973. This refactors `connect_raw` in order to return direct access to the delayed notices. I cannot find a way to test this with psycopg2 unfortunately, although testing it with psql does return the expected results.	2024-12-02 12:29:57 +00:00
John Spray	73ad44ae25	Merge pull request #9959 from neondatabase/rc/release/2024-12-02 Storage & Compute release 2024-12-02	2024-12-02 12:19:16 +00:00
John Spray	bd09369198	storcon: add metric for AZ scheduling violations (#9949 ) ## Problem We can't easily tell how far the state of shards is from their AZ preferences. This can be a cause of performance issues, so it's important for diagnosability that we can tell easily if there are significant numbers of shards that aren't running in their preferred AZ. Related: https://github.com/neondatabase/cloud/issues/15413 ## Summary of changes - In reconcile_all, count shards that are scheduled into the wrong AZ (if they have a preference), and publish it as a prometheus gauge. - Also calculate a statistic for how many shards wanted to reconcile but couldn't. This is clearly a lazy calculation: reconcile all only runs periodically. But that's okay: shards in the wrong AZ is something that only matters if it stays that way for some period of time.	2024-12-02 11:50:22 +00:00
Erik Grinaker	5330122049	test_runner: improve `wait_until` (#9936 ) Improves `wait_until` by: * Use `timeout` instead of `iterations`. This allows changing the timeout/interval parameters independently. * Make `timeout` and `interval` optional (default 20s and 0.5s). Most callers don't care. * Only output status every 1s by default, and add optional `status_interval` parameter. * Remove `show_intermediate_error`, this was always emitted anyway. Most callers have been updated to use the defaults, except where they had good reason otherwise.	2024-12-02 10:26:15 +00:00
Anastasia Lubennikova	45658ccccb	Update pgvector to 0.8.0 (#9733 )	2024-12-02 10:10:51 +00:00
github-actions[bot]	304af5c9e3	Storage & Compute release 2024-12-02	2024-12-02 06:05:37 +00:00
John Spray	14853a3284	storcon: don't take any Service locks in /status and /ready (#9944 ) ## Problem We saw unexpected container terminations when running in k8s with small CPU resource requests. The /status and /ready handlers called `maybe_forward`, which always takes the lock on Service::inner. If there is a lot of writer lock contention, and the container is starved of CPU, this increases the likelihood that we will get killed by the kubelet. It isn't certain that this was a cause of issues, but it is a potential source that we can eliminate. ## Summary of changes - Revise logic to return immediately if the URL is in the non-forwarded list, rather than calling maybe_forward	2024-12-01 18:09:58 +00:00
Konstantin Knizhnik	aad809b048	Fix issues with prefetch ring buffer resize (#9847 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1732110190129479 We observe the following error in the logs ``` [XX000] ERROR: [NEON_SMGR] [shard 3] Incorrect prefetch read: status=1 response=0x7fafef335138 my=128 receive=128 ``` most likely caused by changing `neon.readahead_buffer_size` ## Summary of changes 1. Copy shard state 2. Do not use prefetch_set_unused in readahead_buffer_resize 3. Change prefetch buffer overflow criteria --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-12-01 15:47:28 +00:00
Alexander Bayandin	fae8e7ba76	Compute image: prepare Postgres v14-v16 for Debian 12 (#9954 ) ## Problem Current compute images for Postgres 14-16 don't build on Debian 12 because of issues with extensions. This PR fixes that, but for the current setup, it is mostly a no-op change. ## Summary of changes - Use `/bin/bash -euo pipefail` as SHELL to fail earlier - Fix `plv8` build: backport a trivial patch for v8 - Fix `postgis` build: depend `sfgal` version on Debian version instead of Postgres version Tested in: https://github.com/neondatabase/neon/pull/9849	2024-12-01 13:04:37 +00:00
Konstantin Knizhnik	97a9abd181	Add GUC controlling whether to pause recovery if some critical GUCs at replica have smaller value than on primary (#9057 ) ## Problem See https://github.com/neondatabase/neon/issues/9023 ## Summary of changes Add GUC `recovery_pause_on_misconfig` allowing not to pause in case of replica and primary configuration mismatch See https://github.com/neondatabase/postgres/pull/501 See https://github.com/neondatabase/postgres/pull/502 See https://github.com/neondatabase/postgres/pull/503 See https://github.com/neondatabase/postgres/pull/504 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-12-01 12:23:10 +00:00
Folke Behrens	4abc8e5282	Merge the consumption metric pushes (#9939 ) #8564 ## Problem The main and backup consumption metric pushes are completely independent, resulting in different event time windows and different idempotency keys. ## Summary of changes * Merge the push tasks, but keep chunks the same size.	2024-11-30 10:11:37 +00:00
Christian Schwarz	aa4ec11af9	page_service: rewrite batching to work without a timeout (#9851 ) # Problem The timeout-based batching adds latency to unbatchable workloads. We can choose a short batching timeout (e.g. 10us) but that requires high-resolution timers, which tokio doesn't have. I thoroughly explored options to use OS timers (see [this](https://github.com/neondatabase/neon/pull/9822) abandoned PR). In short, it's not an attractive option because any timer implementation adds non-trivial overheads. # Solution The insight is that, in the steady state of a batchable workload, the time we spend in `get_vectored` will be hundreds of microseconds anyway. If we prepare the next batch concurrently to `get_vectored`, we will have a sizeable batch ready once `get_vectored` of the current batch is done and do not need an explicit timeout. This can be reasonably described as pipelining of the protocol handler. # Implementation We model the sub-protocol handler for pagestream requests (`handle_pagerequests`) as two futures that form a pipeline: 1. Batching: read requests from the connection and fill the current batch 2. Execution: `take` the current batch, execute it using `get_vectored`, and send the response. The Batching and Execution stages are connected through a new type of channel called `spsc_fold`. See the long comment in the `handle_pagerequests_pipelined` for details. 
# Changes - Refactor `handle_pagerequests` - separate functions for - reading one protocol message; produces a `BatchedFeMessage` with just one page request in it - batching; tries to merge an incoming `BatchedFeMessage` into an existing `BatchedFeMessage`; returns `None` on success and returns back the incoming message in case merging isn't possible - execution of a batched message - unify the timeline handle acquisition & request span construction; it now happens in the function that reads the protocol message - Implement serial and pipelined model - serial: what we had before any of the batching changes - read one protocol message - execute protocol messages - pipelined: the design described above - optionality for execution of the pipeline: either via concurrent futures or tokio tasks - Pageserver config - remove batching timeout field - add ability to configure pipelining mode - add ability to limit max batch size for pipelined configurations (required for the rollout, cf https://github.com/neondatabase/cloud/issues/20620 ) - ability to configure execution mode - Tests - remove `batch_timeout` parametrization - rename `test_getpage_merge_smoke` to `test_throughput` - add parametrization to test different max batch sizes and execution modes - rename `test_timer_precision` to `test_latency` - rename the test case file to `test_page_service_batching.py` - better descriptions of what the tests actually do ## On holding the `TimelineHandle` in the pending batch While batching, we hold the `TimelineHandle` in the pending batch. Therefore, the timeline will not finish shutting down while we're batching. This is not a problem in practice because the concurrently ongoing `get_vectored` call will fail quickly with an error indicating that the timeline is shutting down. This results in the Execution stage returning a `QueryError::Shutdown`, which causes the pipeline / entire page service connection to shut down. 
This drops all references to the `Arc<Mutex<Option<Box<BatchedFeMessage>>>>` object, thereby dropping the contained `TimelineHandle`s. - => fixes https://github.com/neondatabase/neon/issues/9850 # Performance Local run of the benchmarks, results in [this empty commit](`1cf5b1463f`) in the PR branch. Key take-aways: * `concurrent-futures` and `tasks` deliver identical `batching_factor` * tail latency impact unknown, cf https://github.com/neondatabase/neon/issues/9837 * `concurrent-futures` has higher throughput than `tasks` in all workloads (=lower `time` metric) * In unbatchable workloads, `concurrent-futures` has 5% higher `CPU-per-throughput` than that of `tasks`, and 15% higher than that of `serial`. * In batchable-32 workload, `concurrent-futures` has 8% lower `CPU-per-throughput` than that of `tasks` (comparison to tput of `serial` is irrelevant) * In unbatchable workloads, mean and tail latencies of `concurrent-futures` are practically identical to `serial`, whereas `tasks` adds 20-30us of overhead Overall, `concurrent-futures` seems like a slightly more attractive choice. # Rollout This change is disabled-by-default. Rollout plan: - https://github.com/neondatabase/cloud/issues/20620 # Refs - epic: https://github.com/neondatabase/neon/issues/9376 - this sub-task: https://github.com/neondatabase/neon/issues/9377 - the abandoned attempt to improve batching timeout resolution: https://github.com/neondatabase/neon/pull/9820 - closes https://github.com/neondatabase/neon/issues/9850 - fixes https://github.com/neondatabase/neon/issues/9835	2024-11-30 00:16:24 +00:00
Matthias van de Meent	973a8d2680	Fix timeout value used in XLogWaitForReplayOf (#9937 ) The previous value assumed usec precision, while the timeout used is in milliseconds, causing replica backends to wait for (potentially) many hours for WAL replay without the expected progress reports in logs. This fixes the issue. Reported-By: Alexander Lakhin <exclusion@gmail.com> ## Problem https://github.com/neondatabase/postgres/pull/279#issuecomment-2507671817 The timeout value was configured with the assumption the indicated value would be microseconds, where it's actually milliseconds. That causes the backend to wait for much longer (2h46m40s) before it emits the "I'm waiting for recovery" message. While we do have wait events configured on this, it's not great to have stuck backends without clear logs, so this fixes the timeout value in all our PostgreSQL branches. ## PG PRs * PG14: https://github.com/neondatabase/postgres/pull/542 * PG15: https://github.com/neondatabase/postgres/pull/543 * PG16: https://github.com/neondatabase/postgres/pull/544 * PG17: https://github.com/neondatabase/postgres/pull/545	2024-11-29 19:10:26 +00:00
Gleb Novikov	c848f25ec2	Fixed fast_import pgbin in calling get_pg_version (#9933 ) Was working on https://github.com/neondatabase/cloud/pull/20795 and discovered that fast_import is not working normally.	2024-11-29 17:58:36 +00:00
John Spray	d5624cc505	pageserver: download small objects using a smaller timeout (#9938 ) ## Problem It appears that the Azure storage API tends to hang TCP connections more than S3 does. Currently we use a 2 minute timeout for all downloads. This is large because sometimes the objects we download are large. However, waiting 2 minutes when doing something like downloading a manifest on tenant attach is problematic, because when someone is doing a "create tenant, create timeline" workflow, that 2 minutes is long enough for them reasonably to give up creating that timeline. Rather than propagate oversized timeouts further up the stack, we should use a different timeout for objects that we expect to be small. Closes: https://github.com/neondatabase/neon/issues/9836 ## Summary of changes - Add a `small_timeout` configuration attribute to remote storage, defaulting to 30 seconds (still a very generous period to do something like download an index) - Add a DownloadKind parameter to DownloadOpts, so that callers can indicate whether they expect the object to be small or large. - In the azure client, use small timeout for HEAD requests, and for GET requests if DownloadKind::Small is used. - Use DownloadKind::Small for manifests, indices, and heatmap downloads. This PR intentionally does not make the equivalent change to the S3 client, to reduce blast radius in case this has unexpected consequences (we could accomplish the same thing by editing lots of configs, but just skipping the code is simpler for right now)	2024-11-29 15:11:44 +00:00
Alexey Kondratov	538e2312a6	feat(compute_ctl): Always set application_name (#9934 ) ## Problem It was not always possible to judge what exactly some `cloud_admin` connections were doing because we didn't consistently set `application_name` everywhere. ## Summary of changes Unify the way we connect to Postgres: 1. Switch to building configs everywhere 2. Always set `application_name` and make naming consistent Follow-up for #9919 Part of neondatabase/cloud#20948	2024-11-29 13:55:56 +00:00
Erik Grinaker	a6073b5013	safekeeper: use jemalloc (#9780 ) ## Problem To add Safekeeper heap profiling in #9778, we need to switch to an allocator that supports it. Pageserver and proxy already use jemalloc. Touches #9534. ## Summary of changes Use jemalloc in Safekeeper.	2024-11-29 13:38:04 +00:00
John Spray	ea3798e3b3	storage controller: use proper ScheduleContext when evacuating a node (#9908 ) ## Problem When picking locations for a shard, we should use a ScheduleContext that includes all the other shards in the tenant, so that we apply proper anti-affinity between shards. If we don't do this, then it can lead to unstable scheduling, where we place a shard somewhere that the optimizer will then immediately move it away from. We didn't always do this, because it was a bit awkward to accumulate the context for a tenant rather than just walking tenants. This was a TODO in `handle_node_availability_transition`: ``` // TODO: populate a ScheduleContext including all shards in the same tenant_id (only matters // for tenants without secondary locations: if they have a secondary location, then this // schedule() call is just promoting an existing secondary) ``` This is a precursor to https://github.com/neondatabase/neon/issues/8264, where the current imperfect scheduling during node evacuation hampers testing. ## Summary of changes - Add an iterator type that yields each shard along with a schedulecontext that includes all the other shards from the same tenant - Use the iterator to replace hand-crafted logic in optimize_all_plan (functionally identical) - Use the iterator in `handle_node_availability_transition` to apply proper anti-affinity during node evacuation.	2024-11-29 13:27:49 +00:00
Conrad Ludgate	1d642d6a57	chore(proxy): vendor a subset of rust-postgres (#9930 ) Our rust-postgres fork is getting messy. Mostly because proxy wants more control over the raw protocol than tokio-postgres provides. As such, it's diverging more and more. Storage and compute also make use of rust-postgres, but in more normal usage, thus they don't need our crazy changes. Idea: * proxy maintains their subset * other teams use a minimal patch set against upstream rust-postgres Reviewing this code will be difficult. To implement it, I 1. Copied tokio-postgres, postgres-protocol and postgres-types from `00940fcdb5` 2. Updated their package names with the `2` suffix to make them compile in the workspace. 3. Updated proxy to use those packages 4. Copied in the code from tokio-postgres-rustls 0.13 (with some patches applied https://github.com/jbg/tokio-postgres-rustls/pull/32 https://github.com/jbg/tokio-postgres-rustls/pull/33) 5. Removed as much dead code as I could find in the vendored libraries 6. Updated the tokio-postgres-rustls code to use our existing channel binding implementation	2024-11-29 11:08:01 +00:00
Erik Grinaker	3ffe6de0b9	test_runner/performance: add logical message ingest benchmark (#9749 ) Adds a benchmark for logical message WAL ingestion throughput end-to-end. Logical messages are essentially noops, and thus ignored by the Pageserver. Example results from my MacBook, with fsync enabled: ``` postgres_ingest: 14.445 s safekeeper_ingest: 29.948 s pageserver_ingest: 30.013 s pageserver_recover_ingest: 8.633 s wal_written: 10,340 MB message_count: 1310720 messages postgres_throughput: 715 MB/s safekeeper_throughput: 345 MB/s pageserver_throughput: 344 MB/s pageserver_recover_throughput: 1197 MB/s ``` See https://github.com/neondatabase/neon/issues/9642#issuecomment-2475995205 for running analysis. Touches #9642.	2024-11-29 09:40:08 +00:00
Heikki Linnakangas	1ca9b56faf	Merge pull request #9935 from neondatabase/compute-rc-2024-11-28 Compute release 2024-11-28	2024-11-29 09:58:00 +02:00
Alexey Kondratov	42fb3c4d30	fix(compute_ctl): Allow usage of DB names with whitespaces (#9919 ) ## Problem We used `set_path()` to replace the database name in the connection string. It automatically does url-safe encoding if the path is not already encoded, but it does it as per the URL standard, which assumes that tabs can be safely removed from the path without changing the meaning of the URL. See, e.g., https://url.spec.whatwg.org/#concept-basic-url-parser. It also breaks for DBs with properly %-encoded names, like with `%20`, as they are kept intact, but actually should be escaped. Yet, this is not true for Postgres, where it's completely valid to have trailing tabs in the database name. I think this is the PR that caused this regression https://github.com/neondatabase/neon/pull/9717, as it switched from `postgres::config::Config` back to `set_path()`. This was fixed a while ago already [1], btw, I just hadn't added a test to catch this regression back then :( ## Summary of changes This commit changes the code back to use `postgres/tokio_postgres::Config` everywhere. While on it, also do some changes around, as I had to touch this code: 1. Bump some logging from `debug` to `info` in the spec apply path. We do not use `debug` in prod, and it was tricky to understand what was going on with this bug in prod. 2. Refactor configuration concurrency calculation code so it is reusable. Yet, still keep `1` in the case of reconfiguration. The database can be actively used at this moment, so we cannot guarantee that there will be enough spare connection slots, and the underlying code won't handle connection errors properly. 3. Simplify the installed extensions code. It was spawning a blocking task inside an async function, which doesn't make much sense. Instead, just have a main sync function and call it with `spawn_blocking` in the API code -- the only place we need it to be async. 4. Add regression python test to cover this and related problems in the future. 
Also, add more extensive testing of schema dump and DBs and roles listing API. [1]: `4d1e48f3b9` [2]: https://www.postgresql.org/message-id/flat/20151023003445.931.91267%40wrigleys.postgresql.org Resolves neondatabase/cloud#20869	2024-11-28 21:38:30 +00:00
Alexander Bayandin	e04dd3be0b	test_runner: rerun all failed tests (#9917 ) ## Problem Currently, we rerun only known flaky tests. This approach was chosen to reduce the number of tests that go unnoticed (by forcing people to take a look at failed tests and rerun the job manually), but it has some drawbacks: - In PRs, people tend to push new changes without checking failed tests (that's ok) - In the main, tests are just restarted without checking (understandable) - Parametrised tests become flaky one by one, i.e. if `test[1]` is flaky, `test[2]` is not marked as flaky automatically (which may or may not be the case). I suggest rerunning all failed tests to increase the stability of GitHub jobs and using the Grafana Dashboard with flaky tests for deeper analysis. ## Summary of changes - Rerun all failed tests twice at max	2024-11-28 19:02:57 +00:00
Vlad Lazar	eb520a14ce	pageserver: return correct LSN for interpreted proto keep alive responses (#9928 ) ## Problem For the interpreted proto the pageserver is not returning the correct LSN in replies to keep alive requests. This is because the interpreted protocol arm was not updating `last_rec_lsn`. ## Summary of changes * Return correct LSN in keep-alive responses * Fix shard field in wal sender traces	2024-11-28 17:38:47 +00:00
Arpad Müller	eb5d832e6f	Update rust to 1.83.0, also update cargo adjacent tools (#9926 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. [Release notes](https://releases.rs/docs/1.83.0/). Also update `cargo-hakari`, `cargo-deny`, `cargo-hack` and `cargo-nextest` to their latest versions. Prior update was in #9445.	2024-11-28 15:49:30 +00:00
Erik Grinaker	70780e310c	Makefile: build pg_visibility (#9922 ) Build the `pg_visibility` extension for use with `neon_local`. This is useful to inspect the visibility map for debugging. Touches #9914.	2024-11-28 15:48:18 +00:00
Vlad Lazar	e82f7f0dfc	remote_storage/abs: count 404 and 304 for get as ok for metrics (#9912 ) ## Problem We currently see elevated levels of errors for GetBlob requests. This is because 404 and 304 are counted as errors for metric reporting. ## Summary of Changes Bring the implementation in line with the S3 client and treat 404 and 304 responses as ok for metric purposes. Related: https://github.com/neondatabase/cloud/issues/20666	2024-11-28 10:11:08 +00:00
Ivan Efremov	8173dc600a	proxy: spawn cancellation checks in the background (#9918 ) ## Problem For cancellation, a connection is open during all the cancel checks. ## Summary of changes Spawn cancellation checks in the background, and close connection immediately. Use task_tracker for cancellation checks.	2024-11-28 06:32:22 +00:00
Erik Grinaker	da1daa2426	pageserver: only apply `ClearVmBits` on relevant shards (#9895 ) # Problem VM (visibility map) pages are stored and managed as any regular relation page, in the VM fork of the main relation. They are also sharded like other pages. Regular WAL writes to the VM pages (typically performed by vacuum) are routed to the correct shard as usual. However, VM pages are also updated via `ClearVmBits` metadata records emitted when main relation pages are updated. These metadata records were sent to all shards, like other metadata records. This had the following effects: * On shards responsible for VM pages, the `ClearVmBits` applies as expected. * On shard 0, which knows about the VM relation and its size but doesn't necessarily have any VM pages, the `ClearVmBits` writes may have been applied without also having applied the explicit WAL writes to VM pages. * If VM pages are spread across multiple shards (unlikely with 256MB stripe size), all shards may have applied `ClearVmBits` if the pages fall within their local view of the relation size, even for pages they do not own. * On other shards, this caused a relation size cache miss and a DbDir and RelDir lookup before dropping the `ClearVmBits`. With many relations, this could cause significant CPU overhead. This is not believed to be a correctness problem, but this will be verified in #9914. Resolves #9855. # Changes Route `ClearVmBits` metadata records only to the shards responsible for the VM pages. Verification of the current VM handling and cleanup of incomplete VM pages on shard 0 (and potentially elsewhere) is left as follow-up work.	2024-11-27 19:44:24 +00:00
Alex Chi Z.	9e3cb75bc7	fix(pageserver): flush deletion queue in `reload` shutdown mode (#9884 ) ## Problem close https://github.com/neondatabase/neon/issues/9859 ## Summary of changes Ensure that the deletion queue gets fully flushed (i.e., the deletion lists get applied) during a graceful shutdown. It is still possible that an incomplete shutdown would leave a deletion list behind and cause a race upon the next startup, but we assume this is unlikely to happen, and even if it happened, the pageserver should already be in a tainted state and the tenant should be moved to a new tenant with a new generation number. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-27 18:30:54 +00:00
Folke Behrens	5c41707bee	proxy: promote two logs to error, fix multiline log (#9913 ) * Promote two logs from mpsc send errors to error level. The channels are unbounded and there shouldn't be errors. * Fix one multiline log from anyhow::Error. Use Debug instead of Display.	2024-11-27 18:05:46 +00:00
Erik Grinaker	cc37fa0f33	pageserver: add metrics for unknown `ClearVmBits` pages (#9911 ) ## Problem When ingesting implicit `ClearVmBits` operations, we silently drop the writes if the relation or page is unknown. There are implicit assumptions around VM pages wrt. explicit/implicit updates, sharding, and relation sizes, which can possibly drop writes incorrectly. Adding a few metrics will allow us to investigate further and tighten up the logic. Touches #9855. ## Summary of changes Add a `pageserver_wal_ingest_clear_vm_bits_unknown` metric to record dropped `ClearVmBits` writes. Also add comments clarifying the behavior of relation sizes on non-zero shards.	2024-11-27 17:16:41 +00:00
Alex Chi Z.	23f5a27146	fix(storage-scrubber): valid layermap error degrades to warning (#9902 ) Valid layer assumption is a necessary condition for a layer map to be valid. It's a stronger check imposed by gc-compaction than the actual valid layermap definition. Actually, the system can work as long as there are no overlapping layer maps. Therefore, we degrade that into a warning. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-27 16:07:39 +00:00
Erik Grinaker	e4f437a354	pageserver: add relsize cache metrics (#9890 ) ## Problem We don't have any observability for the relation size cache. We have seen cache misses cause significant performance impact with high relation counts. Touches #9855. ## Summary of changes Adds the following metrics: * `pageserver_relsize_cache_entries` * `pageserver_relsize_cache_hits` * `pageserver_relsize_cache_misses` * `pageserver_relsize_cache_misses_old`	2024-11-27 13:54:14 +00:00
Vlad Lazar	8fdf786217	pageserver: add tenant config override for wal receiver proto (#9888 ) ## Problem Can't change protocol at tenant granularity. ## Summary of changes Add tenant config level override for wal receiver protocol. ## Links Related: https://github.com/neondatabase/neon/issues/9336 Epic: https://github.com/neondatabase/neon/issues/9329	2024-11-27 13:46:23 +00:00
Vlad Lazar	9e0148de11	safekeeper: use protobuf for sending compressed records to pageserver (#9821 ) ## Problem https://github.com/neondatabase/neon/pull/9746 lifted decoding and interpretation of WAL to the safekeeper. This reduced the ingested amount on the pageservers by around 10x for a tenant with 8 shards, but doubled the ingested amount for single sharded tenants. Also, https://github.com/neondatabase/neon/pull/9746 uses bincode which doesn't support schema evolution. Technically the schema can be evolved, but it's very cumbersome. ## Summary of changes This patch set addresses both problems by adding protobuf support for the interpreted wal records and adding compression support. Compressed protobuf reduced the ingested amount by 100x on the 32 shards `test_sharded_ingest` case (compared to non-interpreted proto). For the 1 shard case the reduction is 5x. Sister change to `rust-postgres` is [here](https://github.com/neondatabase/rust-postgres/pull/33). ## Links Related: https://github.com/neondatabase/neon/issues/9336 Epic: https://github.com/neondatabase/neon/issues/9329	2024-11-27 12:12:21 +00:00
Alexander Bayandin	7b41ee872e	CI(pre-merge-checks): build only one build-tools-image (#9718 ) ## Problem The `pre-merge-checks` workflow relies on the build-tools image. If changes to the `build-tools` image have been merged into the main branch since the last CI run for a PR (with other changes to the `build-tools`), the image will be rebuilt during the merge queue run. Otherwise, cached images are used. Rebuilding the image adds approximately 10 minutes on x86-64 and 20 minutes on arm64 to the process. ## Summary of changes - parametrise `build-build-tools-image` job with arch and Debian version - Run `pre-merge-checks` only on Debian 12 x86-64 image	2024-11-27 10:42:26 +00:00
Peter Bendel	277c33ba3f	ingest benchmark: after effective_io_concurrency = 100 we can increase compute side parallelism (#9904 ) ## Problem ingest benchmark tests project migration to Neon involving steps - COPY relation data - create indexes - create constraints Previously we used only 4 copy jobs, 4 create index jobs and 7 maintenance workers. After increasing effective_io_concurrency on compute we see that we can sustain more parallelism in the ingest bench ## Summary of changes Increase copy jobs to 8, create index jobs to 8 and maintenance workers to 16	2024-11-27 10:09:01 +00:00
Tristan Partin	2b788cb53f	Bump neon.logical_replication_max_snap_files default to 10000 (#9896 ) This bump comes from a recommendation from Chi. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-26 17:49:37 +00:00
Peter Bendel	13feda0669	track how much time the flush loop is stalled waiting for uploads (#9885 ) ## Problem We don't know how much time PS is losing during ingest when waiting for remote storage uploads in the flush frozen layer loop. Also we don't know how many remote storage requests get a permit without waiting (not throttled by remote_storage concurrency_limit). ## Summary of changes - Add a metric that accumulates the time waited per shard/PS - in [remote storage semaphore wait seconds](https://neonprod.grafana.net/d/febd9732-9bcf-4992-a821-49b1f6b02724/remote-storage?orgId=1&var-datasource=HUNg6jvVk&var-instance=pageserver-26.us-east-2.aws.neon.build&var-instance=pageserver-27.us-east-2.aws.neon.build&var-instance=pageserver-28.us-east-2.aws.neon.build&var-instance=pageserver-29.us-east-2.aws.neon.build&var-instance=pageserver-30.us-east-2.aws.neon.build&var-instance=pageserver-31.us-east-2.aws.neon.build&var-instance=pageserver-36.us-east-2.aws.neon.build&var-instance=pageserver-37.us-east-2.aws.neon.build&var-instance=pageserver-38.us-east-2.aws.neon.build&var-instance=pageserver-39.us-east-2.aws.neon.build&var-instance=pageserver-40.us-east-2.aws.neon.build&var-instance=pageserver-41.us-east-2.aws.neon.build&var-request_type=put_object&from=1731961336340&to=1731964762933&viewPanel=3) add a first bucket with 100 microseconds to count requests that do not need to wait on semaphore Update: created a new version that uses a Gauge (one increasing value per PS/shard) instead of histogram as suggested by review	2024-11-26 11:46:58 +00:00
Conrad Ludgate	96a1b71c84	chore(proxy): discard request context span during passthrough (#9882 ) ## Problem The RequestContext::span shouldn't live for the entire postgres connection, only the handshake. ## Summary of changes * Slight refactor to the RequestContext to discard the span upon handshake completion. * Make sure the temporary future for the handshake is dropped (not bound to a variable) * Runs our nightly fmt script	2024-11-25 21:32:53 +00:00
Arpad Müller	a74ab9338d	fast_import: remove hardcoding of pg_version (#9878 ) Before, we hardcoded the pg_version to 140000, while the code expected version numbers like 14. Now we use an enum, and code from `extension_server.rs` to auto-detect the correct version. The enum helps when we add support for a version: enums ensure that compilation fails if one forgets to put the version to one of the `match` locations. cc https://github.com/neondatabase/neon/pull/9218	2024-11-25 20:23:42 +00:00
Folke Behrens	7404887b81	proxy: Demote errors from cplane request routines to debug (#9886 ) ## Problem Any errors from these async blocks are unconditionally logged at error level even though we already handle such errors based on context. ## Summary of changes * Log raw errors from creating and executing cplane requests at debug level. * Inline macro calls to retain the correct callsite.	2024-11-25 19:35:32 +00:00
Folke Behrens	87e4dd23a1	proxy: Demote all cplane error replies to info log level (#9880 ) ## Problem The vast majority of the error/warn logs from cplane are about time or data transfer quotas exceeded or endpoint-not-found errors and not operational errors in proxy or cplane. ## Summary of changes * Demote cplane error replies to info level. * Raise other errors from warn back to error.	2024-11-25 17:53:26 +00:00
Vlad Lazar	7a2f0ed8d4	safekeeper: lift decoding and interpretation of WAL to the safekeeper (#9746 ) ## Problem For any given tenant shard, pageservers receive all of the tenant's WAL from the safekeeper. This soft-blocks us from using larger shard counts due to bandwidth concerns and CPU overhead of filtering out the records. ## Summary of changes This PR lifts the decoding and interpretation of WAL from the pageserver into the safekeeper. A customised PG replication protocol is used where instead of sending raw WAL, the safekeeper sends filtered, interpreted records. The receiver drives the protocol selection, so, on the pageserver side, usage of the new protocol is gated by a new pageserver config: `wal_receiver_protocol`. More granularly the changes are: 1. Optionally inject the protocol and shard identity into the arguments used for starting replication 2. On the safekeeper side, implement a new wal sending primitive which decodes and interprets records before sending them over 3. On the pageserver side, implement the ingestion of this new replication message type. It's very similar to what we already have for raw wal (minus decoding and interpreting). ## Notes * This PR currently uses my [branch of rust-postgres](https://github.com/neondatabase/rust-postgres/tree/vlad/interpreted-wal-record-replication-support) which includes the deserialization logic for the new replication message type. PR for that is open [here](https://github.com/neondatabase/rust-postgres/pull/32). * This PR contains changes for both pageservers and safekeepers. It's safe to merge because the new protocol is disabled by default on the pageserver side. We can gradually start enabling it in subsequent releases. * CI tests are running on https://github.com/neondatabase/neon/pull/9747 ## Links Related: https://github.com/neondatabase/neon/issues/9336 Epic: https://github.com/neondatabase/neon/issues/9329	2024-11-25 17:29:28 +00:00
Christian Schwarz	5c2356988e	page_service: add benchmark for batching (#9820 ) This PR adds two benchmarks to demonstrate the effect of server-side getpage request batching added in https://github.com/neondatabase/neon/pull/9321. For the CPU usage, I found that the `prometheus` crate's built-in CPU usage accounts the seconds at integer granularity. That's not enough when you reduce the target benchmark runtime for local iteration. So, add a new `libmetrics` metric and report that. The benchmarks are disabled because [on our benchmark nodes, timer resolution isn't high enough](https://neondb.slack.com/archives/C059ZC138NR/p1732264223207449). They work (no statement about quality) on my bare-metal devbox. They will be refined and enabled once we find a fix. Candidates at time of writing are: - https://github.com/neondatabase/neon/pull/9822 - https://github.com/neondatabase/neon/pull/9851 Refs: - Epic: https://github.com/neondatabase/neon/issues/9376 - Extracted from https://github.com/neondatabase/neon/pull/9792	2024-11-25 15:52:39 +00:00
Christian Schwarz	23e579d01f	Merge pull request #9881 from neondatabase/rc/release/2024-11-25--2 Fixup Storage & Compute Release 2024-11-25	2024-11-25 16:26:02 +01:00
Konstantin Knizhnik	441612c1ce	Prefetch on macos (#9875 ) ## Problem Prefetch is disabled on macOS because `posix_fadvise` is not available. But Neon prefetch does not use this function, and for testing on macOS it is very convenient to have prefetch available. ## Summary of changes Define `USE_PREFETCH` in Makefile. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-11-25 15:21:52 +00:00
Christian Schwarz	166f33f96b	Fixup Storage & Compute Release 2024-11-25	2024-11-25 16:19:36 +01:00
Arpad Müller	77630e5408	Address beta clippy lint needless_lifetimes (#9877 ) The 1.82.0 version of Rust will be stable soon, let's get the clippy lint fixes in before the compiler version upgrade.	2024-11-25 14:59:12 +00:00
Alexander Bayandin	3d380acbd1	Bump default Debian version to Bookworm everywhere (#9863 ) ## Problem We have a couple of CI workflows that still run on Debian Bullseye, and the default Debian version in images is Bullseye as well (we explicitly set building on Bookworm) ## Summary of changes - Run `pgbench-pgvector` on Bookworm (fix a couple of packages) - Run `trigger_bench_on_ec2_machine_in_eu_central_1` on Bookworm - Change default `DEBIAN_VERSION` in Dockerfiles to Bookworm - Make `pinned` docker tag an alias to `pinned-bookworm`	2024-11-25 14:43:32 +00:00
Alex Chi Z.	4630b70962	fix(pageserver): ensure all layers are flushed before measuring RSS (#9861 ) ## Problem close https://github.com/neondatabase/neon/issues/9761 The test assumed that no new L0 layers are flushed throughout the process, which is not true. ## Summary of changes Fix the test case `test_compaction_l0_memory` by flushing in-memory layers before compaction. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-25 14:25:18 +00:00
Conrad Ludgate	6f6749c4a9	chore: update rustls (#9871 )	2024-11-25 12:01:30 +00:00
Christian Schwarz	aada2ee61a	Merge pull request #9869 from neondatabase/rc/release/2024-11-25 Storage & Compute release 2024-11-25	2024-11-25 12:59:32 +01:00
Folke Behrens	0d1e82f0a7	Bump futures-* crates, drop unused license, hide duplicate crate warnings (#9858 ) * The futures-util crate we use was yanked. Bump it and its siblings to new patch release. https://github.com/rust-lang/futures-rs/releases/tag/0.3.31 * cargo-deny: Drop an unused license. * cargo-deny: Don't warn about duplicate crate. Duplicate crates are unavoidable and the noise just hides real warnings.	2024-11-25 10:59:49 +00:00
Alexander Bayandin	6f7aeaa1c5	test_runner: use LFC by default (#8613 ) ## Problem LFC is not enabled by default in tests, but it is enabled in production. This increases the risk of errors in the production environment, which were not found during the routine workflow. However, enabling LFC for all the tests may overload the disk on our servers and increase the number of failures. So, we try enabling LFC in one case to evaluate the possible risk. ## Summary of changes A new environment variable, USE_LFC is introduced. If it is set to true, LFC is enabled by default in all the tests. In our workflow, we enable LFC for PG17, release, x86-64, and disabled for all other combinations. --------- Co-authored-by: Alexey Masterov <alexeymasterov@neon.tech> Co-authored-by: a-masterov <72613290+a-masterov@users.noreply.github.com>	2024-11-25 09:01:05 +00:00
github-actions[bot]	0fc6f6af8e	Storage & Compute release 2024-11-25	2024-11-25 06:05:23 +00:00
Christian Schwarz	450be26bbb	fast imports: initial Importer and Storage changes (#9218 ) Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Stas Kelvic <stas@neon.tech> # Context This PR contains PoC-level changes for a product feature that allows onboarding large databases into Neon without going through the regular data path. # Changes This internal RFC provides all the context * https://github.com/neondatabase/cloud/pull/19799 In the language of the RFC, this PR covers * the Importer code (`fast_import`) * all the Pageserver changes (mgmt API changes, flow implementation, etc) * a basic test for the Pageserver changes # Reviewing As acknowledged in the RFC, the code added in this PR is not ready for general availability. Also, the architecture is not to be discussed in this PR, but in the RFC and associated Slack channel instead. Reviewers of this PR should take that into consideration. The quality bar to apply during review depends on what area of the code is being reviewed: * Importer code (`fast_import`): practically anything goes * Core flow (`flow.rs`): * Malicious input data must be expected and the existing threat models apply. * The code must be safe to execute on dedicated Pageserver instances: * This means in particular that tenants on other Pageserver instances must not be affected negatively wrt data confidentiality, integrity or availability. * Other code: the usual quality bar * Pay special attention to correct use of gate guards, timeline cancellation in all places during shutdown & migration, etc. * Consider the broader system impact; if you find potentially problematic interactions with Storage features that were not covered in the RFC, bring that up during the review. I recommend submitting three separate reviews, for the three high-level areas with different quality bars. 
# References (Internal-only) * refs https://github.com/neondatabase/cloud/issues/17507 * refs https://github.com/neondatabase/company_projects/issues/293 * refs https://github.com/neondatabase/company_projects/issues/309 * refs https://github.com/neondatabase/cloud/issues/20646 --------- Co-authored-by: Stas Kelvich <stas.kelvich@gmail.com> Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: John Spray <john@neon.tech>	2024-11-22 22:47:06 +00:00
Anastasia Lubennikova	3245f7b88d	Rename 'installed_extensions' metric to 'compute_installed_extensions' (#9759 ) to keep it consistent with existing compute metrics. flux-fleet change is not needed, because it doesn't have any filter by metric name for compute metrics.	2024-11-22 19:27:04 +00:00
Alex Chi Z.	c1937d073f	fix(pageserver): ensure upload happens after delete (#9844 ) ## Problem Follow up of https://github.com/neondatabase/neon/pull/9682, that patch didn't fully address the problem: what if shutdown fails due to whatever reason and then we reattach the tenant? Then we will still remove the future layer. The underlying problem is that the fix for #5878 gets voided because of the generation optimizations. Of course, we also need to ensure that delete happens after uploads, but note that we only schedule deletes when there are no ongoing upload tasks, so that's fine. ## Summary of changes * Add a test case to reproduce the behavior (by changing the original test case to attach the same generation). * If layer upload happens after the deletion, drain the deletion queue before uploading. * If blocked_deletion is enabled, directly remove it from the blocked_deletion queue. * Local fs backend fix to avoid race between deletion and preload. * test_emergency_mode does not need to wait for uploads (and it's generally not possible to wait for uploads). * ~~Optimize deletion executor to skip validation if there are no files to delete.~~ this doesn't work --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-22 18:30:53 +00:00
Alex Chi Z.	6f8b1eb5a6	test(pageserver): add detach ancestor smoke test (#9842 ) ## Problem Follow up to https://github.com/neondatabase/neon/pull/9682, hopefully we can detect some issues or assure ourselves that this is ready for production. ## Summary of changes * Add a compaction-detach-ancestor smoke test. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-22 18:21:51 +00:00
Erik Grinaker	e939d36dd4	safekeeper,pageserver: fix CPU profiling allowlists (#9856 ) ## Problem The HTTP router allowlists matched both on the path and the query string. This meant that only `/profile/cpu` would be allowed without auth, while `/profile/cpu?format=svg` would require auth. Follows #9764. ## Summary of changes * Match allowlists on URI path, rather than the entire URI. * Fix the allowlist for Safekeeper to use `/profile/cpu` rather than the old `/pprof/profile`. * Just use a constant slice for the allowlist; it's only a handful of items, and these handlers are not on hot paths.	2024-11-22 17:50:33 +00:00
Alex Chi Z.	211e4174d2	fix(pageserver): preempt and retry azure list operation (#9840 ) ## Problem close https://github.com/neondatabase/neon/issues/9836 Looking at Azure SDK, the only related issue I can find is https://github.com/azure/azure-sdk-for-rust/issues/1549. Azure uses reqwest as the backend, so I assume there's some underlying magic unknown to us that might have caused the hang in #9836. The observation is: * We didn't get an explicit out of resource HTTP error from Azure. * The connection simply gets stuck and times out. * But when we retry after we reach the timeout, it succeeds. This issue is hard to identify -- maybe something went wrong on the ABS side, or something is wrong on our side. But we know that a retry will usually succeed if we give up the stuck connection. Therefore, I propose the fix that we preempt the stuck HTTP operation and actively retry. This would mitigate the problem, while in the long run, we need to keep an eye on ABS usage and see if we can fully resolve this problem. The reasoning behind this timeout mechanism: we use a much smaller timeout than before to preempt, but a normal listing operation may take longer than the initial timeout if it contains a lot of keys. Therefore, after we terminate the connection, we should double the timeout, so that such requests would eventually succeed. ## Summary of changes * Use exponential growth for ABS list timeout. * Rather than using a fixed timeout, use a timeout that starts small and grows * Rather than exposing timeouts to the list_streaming caller as soon as we see them, only do so after we have retried a few times Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-22 17:50:00 +00:00
Ivan Efremov	3b1ac8b14a	proxy: Implement cancellation rate limiting (#9739 ) Implement cancellation rate limiting and ip allowlist checks. Add ip_allowlist to the cancel closure Fixes [#16456](https://github.com/neondatabase/cloud/issues/16456)	2024-11-22 16:46:38 +00:00
Alexander Bayandin	b3b579b45e	test_bulk_insert: fix typing for PgVersion (#9854 ) ## Problem Along with the migration to Python 3.11, I switched `C(str, Enum)` with `C(StrEnum)`; one such example is the `PgVersion` enum. It required more changes in `PgVersion` itself (before, it accepted both `str` and `int`, and after it, it supports only `str`), which caused the `test_bulk_insert` test to fail. ## Summary of changes - `test_bulk_insert`: explicitly cast pg_version from `timeline_detail` to str	2024-11-22 16:13:53 +00:00
Conrad Ludgate	8ab96cc71f	chore(proxy/jwks): reduce the rightward drift of jwks renewal (#9853 ) I found the rightward drift of the `renew_jwks` function hard to review. This PR splits out some major logic and uses early returns to make the happy path more linear.	2024-11-22 14:51:32 +00:00
Alexander Bayandin	51d26a261b	build(deps): bump mypy from 1.3.0 to 1.13.0 (#9670 ) ## Problem We use a pretty old version of `mypy` 1.3 (released 1.5 years ago), it produces false positives for `typing.Self`. ## Summary of changes - Bump `mypy` from 1.3 to 1.13 - Fix new warnings and errors - Use `typing.Self` whenever we `return self`	2024-11-22 14:31:36 +00:00
Tristan Partin	c10b7f7de9	Write a newline after adding dynamic_shared_memory_type to PG conf (#9843 ) Without adding a newline, we can end up with a conf line that looks like the following: dynamic_shared_memory_type = mmap# Managed by compute_ctl: begin This leads to Postgres logging: LOG: configuration file "/var/db/postgres/compute/pgdata/postgresql.conf" contains errors; unaffected changes were applied Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-22 13:37:06 +00:00
Heikki Linnakangas	7372312a73	Avoid unnecessary send_replace calls in seqwait (#9852 ) The notifications need to be sent whenever the waiters heap changes, per the comment in `update_status`. But if 'advance' is called when there are no waiters, or the new LSN is lower than the waiters so that no one needs to be woken up, there's no need to send notifications. This saves some CPU cycles in the common case that there are no waiters.	2024-11-22 13:29:49 +00:00
John Spray	d9de65ee8f	pageserver: permit reads behind GC cutoff during LSN grace period (#9833 ) ## Problem In https://github.com/neondatabase/neon/issues/9754 and the flakiness of `test_readonly_node_gc`, we saw that although our logic for controlling GC was sound, the validation of getpage requests was not, because it could not consider LSN leases when requests arrived shortly after restart. Closes https://github.com/neondatabase/neon/issues/9754 ## Summary of changes This is the "Option 3" discussed verbally -- rather than holding back gc cutoff, we waive the usual validation of request LSN if we are still waiting for leases to be sent after startup - When validating LSN in `wait_or_get_last_lsn`, skip the validation relative to GC cutoff if the timeline is still in its LSN lease grace period - Re-enable test_readonly_node_gc	2024-11-22 09:24:23 +00:00
Fedor Dikarev	83b73fc24e	Batch scrape workflows up to last 30 days and stop ad-hoc (#9846 ) Comparing the Batch and Ad-hoc collectors, there is no big difference; we just need to scrape over a longer duration to catch retries. Dashboard with comparison: https://neonprod.grafana.net/d/be3pjm7c9ne2oe/compare-ad-hoc-and-batch?orgId=1&from=1731345095814&to=1731946295814 I should raise a support case with GitHub about that anyway; meanwhile, this should be a working solution and should save us some cost, so it is worth switching to Batch now. Ref: https://github.com/neondatabase/cloud/issues/17503	2024-11-22 09:06:00 +00:00
Peter Bendel	1e05e3a6e2	minor PostgreSQL update in benchmarking (#9845 ) ## Problem in benchmarking.yml job pgvector we install postgres from deb packages. After the minor postgres update the referenced packages no longer exist [Failing job: ](https://github.com/neondatabase/neon/actions/runs/11965785323/job/33360391115#step:4:41) ## Summary of changes Reference and install the updated packages. [Successful job after this fix](https://github.com/neondatabase/neon/actions/runs/11967959920/job/33366011934#step:4:45)	2024-11-22 08:31:54 +00:00
Tristan Partin	37962e729e	Fix panic in compute_ctl metrics collection (#9831 ) Calling unwrap on the encoder is a little overzealous. One of the errors that can be returned by the encode function in particular is the non-existence of metrics for a metric family, so we should filter instances like that out beforehand. I believe that this panic was caused by a race condition between the prometheus collector and the compute collecting the installed extensions metric for the first time. The HTTP server is spawned on a separate thread before we even start bringing up Postgres. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-21 20:19:02 +00:00
Erik Grinaker	190e8cebac	safekeeper,pageserver: add CPU profiling (#9764 ) ## Problem We don't have a convenient way to gather CPU profiles from a running binary, e.g. during production incidents or end-to-end benchmarks, nor during microbenchmarks (particularly on macOS). We would also like to have continuous profiling in production, likely using [Grafana Cloud Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/). We may choose to use either eBPF profiles or pprof profiles for this (pending testing and discussion with SREs), but pprof profiles appear useful regardless for the reasons listed above. See https://github.com/neondatabase/cloud/issues/14888. This PR is intended as a proof of concept, to try it out in staging and drive further discussions about profiling more broadly. Touches #9534. Touches https://github.com/neondatabase/cloud/issues/14888. ## Summary of changes Adds a HTTP route `/profile/cpu` that takes a CPU profile and returns it. Defaults to a 5-second pprof Protobuf profile for use with e.g. `pprof` or Grafana Alloy, but can also emit an SVG flamegraph. Query parameters: * `format`: output format (`pprof` or `svg`) * `frequency`: sampling frequency in microseconds (default 100) * `seconds`: number of seconds to profile (default 5) Also integrates pprof profiles into Criterion benchmarks, such that flamegraph reports can be taken with `cargo bench ... --profile-duration <seconds>`. Output under `target/criterion//profile/flamegraph.svg`. Example profiles: pprof profile (use [`pprof`](https://github.com/google/pprof)): [profile.pb.gz](https://github.com/user-attachments/files/17756788/profile.pb.gz) * Web interface: `pprof -http :6060 profile.pb.gz` * Interactive flamegraph: [profile.svg.gz](https://github.com/user-attachments/files/17756782/profile.svg.gz)	2024-11-21 18:59:46 +00:00
Conrad Ludgate	725a5ff003	fix(proxy): CancelKeyData display log masking (#9838 ) Fixes the masking for the CancelKeyData display format. Due to a negative i32 being cast to u64, the top bits all had a `0xffffffff` prefix. On the bitwise-or that followed, these took priority. This PR also compresses 3 logs during sql-over-http into 1 log with durations as label fields, as previously discussed.	2024-11-21 16:46:30 +00:00
Alexander Bayandin	8d1c44039e	Python 3.11 (#9515 ) ## Problem On Debian 12 (Bookworm), Python 3.11 is the latest available version. ## Summary of changes - Update Python to 3.11 in build-tools - Fix ruff check / format - Fix mypy - Use `StrEnum` instead of pair `str`, `Enum` - Update docs	2024-11-21 16:25:31 +00:00
Konstantin Knizhnik	0713ff3176	Bump Postgres version (#9808 ) ## Problem I made a mistake in merging Postgres PRs ## Summary of changes Restore consistency of submodule references. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-11-21 14:56:56 +00:00
John Spray	42bda5d632	pageserver: revise metrics lifetime for SecondaryTenant (#9818 ) ## Problem We saw a scale test failure when one shard went secondary->attached->secondary in a short period of time -- the metrics for the shard failed a validation assertion that is meant to ensure the size metric matches the sum of layer sizes in the SecondaryDetail struct. This appears to be due to two SecondaryTenants being alive at the same time -- the first one was shut down but still had its contributions to the metrics. Closes: https://github.com/neondatabase/neon/issues/9628 ## Summary of changes - Refactor code for validating metrics and call it in shutdown as well as during downloads - Move code for dropping per-tenant secondary metrics from drop() into shutdown(), so that once shutdown() completes it is definitely safe to instantiate another SecondaryTenant for the same tenant.	2024-11-21 08:31:24 +00:00
Arpad Müller	59c2c3f8ad	compute_ctl: print OpenTelemetry errors via tracing, not stdout (#9830 ) Before, `OpenTelemetry` errors were printed to stdout/stderr directly, causing one of the few log lines without a timestamp, like: ``` OpenTelemetry trace error occurred. error sending request for url (http://localhost:4318/v1/traces) ``` Now, we print: ``` 2024-11-21T02:24:20.511160Z INFO OpenTelemetry error: error sending request for url (http://localhost:4318/v1/traces) ``` I found this while investigating #9731.	2024-11-21 04:46:01 +00:00
Ivan Efremov	2d6bf176a0	proxy: Refactor http conn pool (#9785 ) - Use the same ConnPoolEntry for http connection pool. - Rename EndpointConnPool to the HttpConnPool. - Narrow clone bound for client Fixes #9284	2024-11-20 19:36:29 +00:00
Vadim Kharitonov	313ebfdb88	[proxy] chore: allow bypassing empty `params` to `/sql` endpoint (#9827 ) ## Problem ``` curl -H "Neon-Connection-String: postgresql://neondb_owner:PASSWORD@ep-autumn-rain-a58lubg0.us-east-2.aws.neon.tech/neondb?sslmode=require" https://ep-autumn-rain-a58lubg0.us-east-2.aws.neon.tech/sql -d '{"query":"SELECT 1","params":[]}' ``` For such a query, I also need to send `params`. Do I really need it? ## Summary of changes I've marked `params` as optional	2024-11-20 19:36:23 +00:00
Arpad Müller	811fab136f	scrubber: allow restricting find_garbage to a partial tenant id prefix (#9814 ) Adds support to the `find_garbage` command to restrict itself to a partial tenant ID prefix, say `a`, and then it only traverses tenants with IDs starting with `a`. One can now pass the `--tenant-id-prefix` parameter. That way, one can shard the `find_garbage` command and make it run in parallel. The PR also changes how `remote_storage` first removes trailing `/`s, only to then add them in the listing function. It turns out that this isn't necessary and it prevents the prefix functionality from working. S3 doesn't do this either.	2024-11-20 19:31:02 +00:00
Vlad Lazar	ee26f09e45	pageserver: remove shard split hard link assertion (#9829 ) ## Problem We were hitting this assertion in debug mode tests sometimes. This case was being hit when the parent shard has no resident layers. For instance, this is the case on split retry where the previous attempt shut-down the parent and deleted local state for it. If the logical size calculation does not download some layers before we get to the hardlinking, then the assertion is hit. ## Summary of Changes Remove the assertion. It's fine for the ancestor to not have any resident layers at the time of the split. Closes https://github.com/neondatabase/neon/issues/9412	2024-11-20 18:33:05 +00:00
Conrad Ludgate	f36f0068b8	chore(proxy): demote more logs during successful connection attempts (#9828 ) Follow up to #9803 See https://github.com/neondatabase/cloud/issues/14378 In collaboration with @cloneable and @awarus, we sifted through logs and simply demoted some logs to debug. This is not at all finished and there are more logs to review, but we ran out of time in the session we organised. In any slightly more nuanced cases, we didn't touch the log, instead leaving a TODO comment. I've also slightly refactored the sql-over-http body read/length reject code. I can split that into a separate PR. It just felt natural after I switched to `read_body_with_limit` as we discussed during the meet.	2024-11-20 17:50:39 +00:00
John Spray	5ff2f1ee7d	pageserver: enable compaction to proceed while live-migrating (#5397 ) ## Problem Long ago, in #5299 the tenant states for migration were added, but respected only in a coarse-grained way: when hinted not to do deletions, tenants will just avoid doing all GC or compaction. Skipping compaction is not necessary for AttachedMulti, as we will soon become the primary attached location, and it is not a waste of resources to proceed with compaction. Instead, per the RFC (https://github.com/neondatabase/neon/pull/5029/files), deletions should be queued up in this state, and executed later when we switch to AttachedSingle. Avoiding compaction in AttachedMulti can have an operational impact if a tenant is under significant write load, as a long-running migration can result in a large accumulation of delta layers with commensurate impact on read latency. Closes: https://github.com/neondatabase/neon/issues/5396 ## Summary of changes - Add a 'config' part to RemoteTimelineClient so that it can be aware of the mode of the tenant it belongs to, and wire this through for construction + updates - Add a special buffer for delayed deletions, and when in AttachedMulti route deletions here instead of into the main remote client queue. This is drained when transitioning to AttachedSingle. If the tenant is detached or our process dies before then, then these objects are leaked. - As a quality of life improvement, also use the remote timeline client's knowledge of the tenant state to avoid submitting remote consistent LSN updates for validation when in AttachedStale (as we know these will fail)	2024-11-20 17:31:55 +00:00
John Spray	67f5f83edc	pageserver: avoid reading SLRU blocks for GC on shards >0 (#9423 ) ## Problem SLRU blocks, which can add up to several gigabytes, are currently ingested by all shards, multiplying their capacity cost by the shard count and slowing down ingest. We do this because all shards need the SLRU pages to do timestamp->LSN lookup for GC. Related: https://github.com/neondatabase/neon/issues/7512 ## Summary of changes - On non-zero shards, learn the GC offset from shard 0's index instead of calculating it. - Add a test `test_sharding_gc` that exercises this - Do GC in test_pg_regress as a general smoke test that GC functions run (e.g. this would fail if we were using SLRUs we didn't have) In this PR we are still ingesting SLRUs everywhere, but not using them any more. Part 2 PR (https://github.com/neondatabase/neon/pull/9786) makes the change to not store them at all.	2024-11-20 15:56:14 +00:00
John Spray	593e35027a	tests: use fewer pageservers in test_sharding_split_smoke (#9804 ) ## Problem This test uses a gratuitous number of pageservers (16). This works fine when there are plenty of system resources, but causes issues on test runners that have limited resources and run many tests concurrently. Related: https://github.com/neondatabase/neon/issues/9802 ## Summary of changes - Split from 2 shards to 4, instead of 4 to 8 - Don't give every shard a separate pageserver, let two locations share each pageserver. Net result is 4 pageservers instead of 16	2024-11-20 14:57:59 +00:00
Folke Behrens	bf7d859a8b	proxy: Rename RequestMonitoring to RequestContext (#9805 ) ## Problem It is called context/ctx everywhere and the Monitoring suffix needlessly confuses with proper monitoring code. ## Summary of changes * Rename RequestMonitoring to RequestContext * Rename RequestMonitoringInner to RequestContextInner	2024-11-20 12:50:36 +00:00
Alexander Bayandin	899933e159	scan_log_for_errors: check that regex is correct (#9815 ) ## Problem I've noticed that we have 2 flaky tests which failed with error: ``` re.error: missing ), unterminated subpattern at position 21 ``` - `test_timeline_archival_chaos` — has been already fixed - `test_sharded_tad_interleaved_after_partial_success` — I didn't manage to find the incorrect regex [Internal link](https://neonprod.grafana.net/goto/yfmVHV7NR?orgId=1) ## Summary of changes - Wrap `re.match` in `try..except` block and print incorrect regex	2024-11-20 12:48:21 +00:00
Alexander Bayandin	46beecacce	CI(benchmarking): route test failures to on-call-qa-staging-stream (#9813 ) ## Problem We want to keep `#on-call-staging-stream` channel close to the prod one and redirect notifications from failing benchmarks to another channel for investigation. ## Summary of changes - Send notifications regarding failures in `benchmarking` job to `#on-call-qa-staging-stream` - Send notifications regarding failures in `periodic_pagebench` job to `#on-call-qa-staging-stream`	2024-11-20 12:23:41 +00:00
Fedor Dikarev	94e4a0e2a0	update macos version for runner (#9817 ) Closes: https://github.com/neondatabase/neon/issues/9816 Run macOS builds on `macos-15`. As `pkg-config` is bundled in the runner image, don't install it with `brew`	2024-11-20 13:04:14 +01:00
John Spray	33dce25af8	safekeeper: block deletion on protocol handler shutdown (#9364 ) ## Problem Two recently observed log errors indicate safekeeper tasks for a timeline running after that timeline's deletion has started. - https://github.com/neondatabase/neon/issues/8972 - https://github.com/neondatabase/neon/issues/8974 These code paths do not have a mechanism that coordinates task shutdown with the overall shutdown of the timeline. ## Summary of changes - Add a `Gate` to `Timeline` - Take the gate as part of resident timeline guard: any code that holds a guard over a timeline staying resident should also hold a guard over the timeline's total lifetime. - Take the gate from the wal removal task - Respect Timeline::cancel in WAL send/recv code, so that we do not block shutdown indefinitely. - Add a test that deletes timelines with open pageserver+compute connections, to check these get torn down as expected. There is some risk to introducing gates: if there is code holding a gate which does not properly respect a cancellation token, it can cause shutdown hangs. The risk of this for safekeepers is lower in practice than it is for other services, because in a healthy timeline deletion, the compute is shutdown first, then the timeline is deleted on the pageserver, and finally it is deleted on the safekeepers -- that makes it much less likely that some protocol handler will still be running. Closes: #8972 Closes: #8974	2024-11-20 11:07:45 +00:00
Conrad Ludgate	3ae0b2149e	chore(proxy): demote a ton of logs for successful connection attempts (#9803 ) See https://github.com/neondatabase/cloud/issues/14378 In collaboration with @cloneable and @awarus, we sifted through logs and simply demoted some logs to debug. This is not at all finished and there are more logs to review, but we ran out of time in the session we organised. In any slightly more nuanced cases, we didn't touch the log, instead leaving a TODO comment.	2024-11-20 10:14:28 +00:00
Arpad Müller	0a499a3176	Don't preload offloaded timelines (#9646 ) In timeline preloading, we also do a preload for offloaded timelines. This includes the download of `index-part.json`. Ultimately, such a download is wasteful, therefore avoid it. Same goes for the remote client, we just discard it immediately thereafter. Part of #8088 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-11-20 05:44:23 +00:00
Matthias van de Meent	ea1858e3b6	compute_ctl: Streamline and Pipeline startup SQL (#9717 ) Before, compute_ctl didn't have a good registry for what command would run when, depending exclusively on sync code to apply changes. When users have many databases/roles to manage, this step can take a substantial amount of time, breaking assumptions about low (re)start times in other systems. This commit reduces the time compute_ctl takes to restart when changes must be applied, by making all commands more or less blind writes, and applying these commands in an asynchronous context, only waiting for completion once we know the commands have all been sent. Additionally, this reduces time spent by batching per-database operations where previously we would create a new SQL connection for every user-database operation we planned to execute.	2024-11-20 02:14:58 +01:00
Alexander Bayandin	2281a02c49	CODEOWNERS: add developer-productivity team (#9810 ) Notify @neondatabase/developer-productivity team about changes in CI (i.e. in `.github/` directory)	2024-11-20 00:30:24 +00:00
Alexander Bayandin	725e0a1ac9	CI(release): create reusable workflow for releases (#9806 ) ## Problem We have a bunch of duplicated code for automated releases. There will be even more, once we have `release-compute` branch (https://github.com/neondatabase/neon/pull/9637). Another issue with the current `release` workflow is that it creates a PR from the main as is. If we create 2 different releases from the same commit, GitHub could mix up results from different PRs. ## Summary of changes - Create a reusable workflow for releases - Create an empty commit to differentiate releases	2024-11-19 23:03:15 +00:00
Konstantin Knizhnik	770ac34ae6	Register custom xlog reader callbacks for on-demand WAL download in StartupDecodingContext (#9007 ) ## Problem See https://github.com/neondatabase/neon/issues/8931 On-demand WAL download callbacks are not set in all cases where WAL is accessed by logical replication ## Summary of changes Set custom xlog reader callbacks in StartupDecodingContext Related changes in Postgres modules: https://github.com/neondatabase/postgres/pull/495 https://github.com/neondatabase/postgres/pull/496 https://github.com/neondatabase/postgres/pull/497 https://github.com/neondatabase/postgres/pull/498 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-11-19 22:29:57 +02:00
Alex Chi Z.	b22a84a7bf	feat(pageserver): support key range for manual compaction trigger (#9723 ) part of https://github.com/neondatabase/neon/issues/9114, we want to be able to run partial gc-compaction in tests. In the future, we can also expand this functionality to legacy compaction, so that we can trigger compaction for a specific key range. ## Summary of changes * Support passing compaction key range through pageserver routes. * Refactor input parameters of compact related function to take the new `CompactOptions`. * Add tests for partial compaction. Note that the test may or may not trigger compaction based on GC horizon. We need to improve the test case to ensure things always get below the gc_horizon and the gc-compaction can be triggered. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-19 19:38:41 +00:00
Arpad Müller	b092126c94	scrubber: fix parsing issue with Azure (#9797 ) Apparently Azure returns timelines ending with `/` which confuses the parsing. So remove all trailing `/`s before attempting to parse. Part of https://github.com/neondatabase/cloud/issues/19963	2024-11-19 20:10:53 +01:00
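A minimal sketch of the trailing-slash cleanup described in the scrubber fix above. The function name and the example key layout are assumptions for illustration, not the scrubber's actual code:

```rust
/// Strip every trailing `/` from a listed key before parsing it.
/// (Hypothetical helper; the real scrubber code may be structured differently.)
fn strip_trailing_slashes(key: &str) -> &str {
    key.trim_end_matches('/')
}

fn main() {
    // Hypothetical listing entry with the kind of trailing slash the commit describes.
    let listed = "tenants/t1/timelines/abc123/";
    let cleaned = strip_trailing_slashes(listed);
    // After cleanup the last path segment is the id rather than an empty string.
    let last = cleaned.rsplit('/').next().unwrap_or("");
    println!("{} -> {}", cleaned, last);
}
```

Without the cleanup, splitting on `/` would yield an empty final segment, which is the kind of input that can confuse an id parser.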
Alex Chi Z.	5e3fbef721	fix(pageserver): queue stopped error should be ignored during create timeline (#9767 ) close https://github.com/neondatabase/neon/issues/9730 The test case checks whether anything goes wrong when the pageserver restarts while timeline creation is still incomplete. A "queue is stopped" error is therefore normal in this case, but it should be categorized as a shutdown error instead of a real error. ## Summary of changes * More comments for the test case. * Queue stopped error will now be forwarded as CreateTimelineError::ShuttingDown. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-19 14:10:09 -05:00
dependabot[bot]	15468cd23c	build(deps): bump aiohttp from 3.10.2 to 3.10.11 (#9794 )	2024-11-19 19:08:00 +00:00
Peter Bendel	a8ac895b83	re-acquire S3 OIDC token after long running tests for report upload to S3 (#9799 ) ## Problem If a benchmark or test case runs longer than the AWS OIDC token lifetime, subsequent uploads of test reports to S3 fail - example: https://github.com/neondatabase/neon/actions/runs/11905529176/job/33176168174#step:9:243 ## Summary of changes In actions that require access to S3 and which are invoked after a long-running python testcase, we re-acquire the OIDC token explicitly. Note that we need to pass down the aws_oicd_role_arn from the workflow to the action because actions have no access to GitHub vars for security reasons. Sample run https://github.com/neondatabase/neon/actions/runs/11912328276/job/33195676867	2024-11-19 18:22:51 +01:00
Heikki Linnakangas	ada84400b7	PostgreSQL minor version updates (17.2, 16.6, 15.10, 14.15) (#9795 ) The community decided to make a new off-schedule release due to ABI breakage in last week's release. We're not affected by the ABI breakage because we rebuild all extensions in our docker images, but let's stay up-to-date. There were a few other fixes in the release too.	2024-11-19 17:01:05 +02:00
Conrad Ludgate	191f745c81	fix(proxy/auth_broker): ignore -pooler suffix (#9800 ) Fixes https://github.com/neondatabase/cloud/issues/20400 We cannot mix local_proxy and pgbouncer, so we are filtering out the `-pooler` suffix prior to calling wake_compute.	2024-11-19 13:58:26 +00:00
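The suffix filtering described above can be sketched as follows. The function name and the example endpoint names are hypothetical; only the `-pooler` suffix itself comes from the commit message:

```rust
/// Return the endpoint name without a trailing `-pooler`, if present.
/// (Illustrative helper, not the proxy's actual function.)
fn strip_pooler_suffix(endpoint: &str) -> &str {
    endpoint.strip_suffix("-pooler").unwrap_or(endpoint)
}

fn main() {
    // Made-up endpoint names; one carries the pooler suffix, one does not.
    for name in ["ep-example-123456-pooler", "ep-example-123456"] {
        println!("{} -> {}", name, strip_pooler_suffix(name));
    }
}
```

Using `strip_suffix` rather than a substring replace leaves names untouched when `-pooler` appears anywhere other than the end.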
Conrad Ludgate	37b97b3a68	chore(local_proxy): reduce some startup logging (#9798 ) Currently, local_proxy will write an error log if it doesn't find the config file. This is expected for startup, so it's just noise. It is an error if we do receive an explicit SIGHUP though. I've also demoted the build info logs to be debug level. We don't need them in the compute image since we have other ways to determine what code is running. Lastly, I've demoted SIGHUP signal handling from warn to info, since it's not really a warning event. See https://github.com/neondatabase/cloud/issues/10880 for more details	2024-11-19 13:58:11 +00:00
Konstantin Knizhnik	c9acd214ae	Do not create DSM segment for wal_redo_postgres (#9793 ) ## Problem See https://github.com/neondatabase/neon/issues/9738 ## Summary of changes Do not create DSM segment for wal_redo Postgres --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-11-19 11:56:40 +02:00
Peter Bendel	982cb1c15d	Move logic for ingest benchmark from GitHub workflow into python testcase (#9762 ) ## Problem The first version of the ingest benchmark had some parsing and reporting logic in a shell script inside the GitHub workflow. It is better to move that logic into a python testcase so that we can also run it locally. ## Summary of changes - Create new python testcase - invoke pgcopydb inside python test case - move the following logic into python testcase - determine backpressure - invoke pgcopydb and report its progress - parse pgcopydb log and extract metrics - insert metrics into perf test database - add additional column to perf test database that can receive endpoint ID used for pgcopydb run to have it available in grafana dashboard when retrieving other metrics for an endpoint ## Example run https://github.com/neondatabase/neon/actions/runs/11860622170/job/33056264386	2024-11-19 09:46:46 +00:00
Arpad Müller	9b6af2bcad	Add the ability to configure GenericRemoteStorage for the scrubber (#9652 ) Earlier work (#7547) has made the scrubber internally generic, but one could only configure it to use S3 storage. This is the final piece to make (most of, snapshotting still requires S3) the scrubber be able to be configured via GenericRemoteStorage. I.e. you can now set an env var like: ``` REMOTE_STORAGE_CONFIG='remote_storage = { bucket_name = "neon-dev-safekeeper-us-east-2d", bucket_region = "us-east-2" }' ``` and the scrubber will read it instead.	2024-11-18 21:01:48 +00:00
Arpad Müller	4fc3af15dd	Remove at most one retain_lsn entry from (possibly offloaded) timeline's parent (#9791 ) There is a potential data corruption issue, not one I've encountered, but it's still not hard to hit with some correct looking code given our current architecture. It has to do with the timeline's memory object storage via reference counted `Arc`s, and the removal of `retain_lsn` entries at the drop of the last `Arc` reference. The corruption steps are as follows: 1. timeline gets offloaded. timeline object A doesn't get dropped though, because some long-running task accesses it 2. the same timeline gets unoffloaded again. timeline object B gets created for it, timeline object A still referenced. both point to the same timeline. 3. the task keeping the reference to timeline object A exits. destructor for object A runs, removing `retain_lsn` in the timeline's parent. 4. the timeline's parent runs gc without the `retain_lsn` of the still extant child timeline, leading to data corruption. In general we are susceptible each time when we recreate a `Timeline` object in the same process, which happens both during a timeline offload/unoffload cycle, as well as during an ancestor detach operation. The solution this PR implements is to make the destructor for a timeline as well as an offloaded timeline remove at most one `retain_lsn`. PR #9760 has added a log line to print the refcounts at timeline offload, but this only detects one of the places where we do such a recycle operation. Plus it doesn't prevent the actual issue. I doubt that this occurs in practice. It is more a defense in depth measure. Usually I'd assume that the timeline gets dropped immediately in step 1, as there are no background tasks referencing it after its shutdown. But one never knows, and reducing the stakes of step 1 actually occurring is a really good idea, from potential data corruption to waste of CPU time. Part of #8088	2024-11-18 21:42:19 +01:00
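The "remove at most one entry" idea from the commit above can be illustrated with a small sketch. The `RetainLsn` struct, its fields, and the assumption that each live timeline object contributes its own entry to the parent are all hypothetical simplifications, not the pageserver's actual data structures:

```rust
/// Hypothetical stand-in for one branch-point registration held by a parent timeline.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RetainLsn {
    lsn: u64,
    child_id: u32,
}

/// Remove at most one entry for `child_id`; returns whether an entry was removed.
fn remove_one(entries: &mut Vec<RetainLsn>, child_id: u32) -> bool {
    match entries.iter().position(|e| e.child_id == child_id) {
        Some(pos) => {
            entries.remove(pos);
            true
        }
        None => false,
    }
}

fn main() {
    // Assumption: stale object A and fresh object B of the same child each registered an entry.
    let mut entries = vec![
        RetainLsn { lsn: 100, child_id: 7 },
        RetainLsn { lsn: 100, child_id: 7 },
    ];
    // The destructor of stale object A removes only one entry, so B's entry survives.
    remove_one(&mut entries, 7);
    println!("{} entry left for child 7", entries.len());
}
```

A destructor that instead removed every entry matching the child id (for example with `retain`) would also drop the registration belonging to the still-live object, which is the hazard the commit message describes.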
Vlad Lazar	d7662fdc7b	feat(page_service): timeout-based batching of requests (#9321 ) ## Problem We don't take advantage of queue depth generated by the compute on the pageserver. We can process getpage requests more efficiently by batching them. ## Summary of changes Batch up incoming getpage requests that arrive within a configurable time window (`server_side_batch_timeout`). Then process the entire batch via one `get_vectored` timeline operation. By default, no merging takes place. ## Testing * Functional: https://github.com/neondatabase/neon/pull/9792 * Performance: will be done in staging/pre-prod # Refs * https://github.com/neondatabase/neon/issues/9377 * https://github.com/neondatabase/neon/issues/9376 Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-11-18 20:24:03 +00:00
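As a rough illustration of timeout-based batching, the sketch below groups request arrival times into batches, where a request joins the current batch if it arrives within the window measured from that batch's first request. This is a simplified, synchronous model under stated assumptions: the function name, the microsecond timestamps, and the use of `None` to model the "no merging by default" behaviour are illustrative, and the real page service is asynchronous and may define the window differently.

```rust
/// Group sorted arrival times (in microseconds) into batches.
/// `timeout_us = None` models batching being disabled: every request is its own batch.
fn batch_by_window(arrivals_us: &[u64], timeout_us: Option<u64>) -> Vec<Vec<u64>> {
    let mut batches: Vec<Vec<u64>> = Vec::new();
    for &t in arrivals_us {
        // A request fits if batching is enabled and it arrived within the
        // window that started with the first request of the current batch.
        let fits = match (timeout_us, batches.last()) {
            (Some(window), Some(batch)) => t.saturating_sub(batch[0]) <= window,
            _ => false,
        };
        if fits {
            if let Some(batch) = batches.last_mut() {
                batch.push(t);
            }
        } else {
            batches.push(vec![t]);
        }
    }
    batches
}

fn main() {
    let arrivals = [0, 10, 20, 500, 505];
    println!("{:?}", batch_by_window(&arrivals, Some(100)));
    println!("{:?}", batch_by_window(&arrivals, None));
}
```

Each resulting batch would then correspond to one vectored read instead of one read per request.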
Alex Chi Z.	e5c89f3da3	feat(pageserver): drop disposable keys during gc-compaction (#9765 ) close https://github.com/neondatabase/neon/issues/9552, close https://github.com/neondatabase/neon/issues/8920, part of https://github.com/neondatabase/neon/issues/9114 ## Summary of changes * Drop keys not belonging to this shard during gc-compaction to avoid constructing history that might have been truncated during shard compaction. * Run gc-compaction at the end of shard compaction test. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-18 19:27:52 +00:00
Alexey Kondratov	5f0e9c9a94	feat(compute/tests): Report successful replication test runs as well (#9787 ) It should increase the visibility of whether they run and pass.	2024-11-18 16:05:09 +00:00
Alexander Bayandin	44f33b2bd6	Bump default Postgres version for tests to v17 (#9777 ) ## Problem Tests that are marked with `run_only_on_default_postgres` do not run on debug builds on CI because we run debug builds only for the latest Postgres version (which is 17) ## Summary of changes - Bump `PgVersion.DEFAULT` to `v17` - Skip `test_timeline_archival_chaos` in debug builds	2024-11-18 15:06:24 +00:00
Alexander Bayandin	913b5b7027	CI: remove separate check-build-tools-image workflow (#9708 ) ## Problem We call `check-build-tools-image` twice for each workflow whenever we use it, along with `build-build-tools-image`, once as a workflow itself, and the second time from `build-build-tools-image`. This is not necessary. ## Summary of changes - Inline `check-build-tools-image` into `build-build-tools-image` - Remove separate `check-build-tools-image` workflow	2024-11-18 13:14:28 +00:00
John Spray	3f401a328f	tests: mitigate bug to stabilize test_storage_controller_many_tenants (#9771 ) ## Problem Due to #9471 , the scale test occasionally gets 404s while trying to modify the config of a timeline that belongs to a tenant being migrated. We rarely see this narrow race in the field, but the test is quite good at reproducing it. ## Summary of changes - Ignore 404 errors in this test.	2024-11-18 11:33:27 +00:00
Peter Bendel	c3eecf6763	adapt pgvector bench to minor version upgrades of PostgreSQL (#9784 ) ## Problem pgvector benchmark is failing because after a PostgreSQL minor version upgrade the previous version's packages are no longer available in the deb repository [example failure](https://github.com/neondatabase/neon/actions/runs/11875503070/job/33092787149#step:4:40) ## Summary of changes Update postgres minor version of packages to current version [Example run after this change](https://github.com/neondatabase/neon/actions/runs/11888978279/job/33124614605)	2024-11-18 10:47:43 +00:00
Konstantin Knizhnik	6fa9b0cd8c	Use DATA_DIR instead of current working directory in restore_from_wal script (#9729 ) ## Problem See https://github.com/neondatabase/neon/issues/7750 test_wal_restore.sh copies a file to the current working directory, which can cause interference between test_wal_restore.py tests spawned for different configurations. ## Summary of changes Copy file to $DATA_DIR Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-11-18 11:55:38 +02:00
a-masterov	10bc1903e1	Fix the regression test running against the staging instance (#9773 ) ## Problem The Postgres version was updated. The patch has to be updated accordingly. ## Summary of changes The patch of the regression test was updated.	2024-11-18 10:30:50 +01:00
Arseny Sher	1388bbae73	Merge pull request #9783 from neondatabase/rc/2024-11-18 Storage & Compute release 2024-11-18	2024-11-18 12:22:58 +03:00
John Spray	261d065e6f	pageserver: respect no_sync in `VirtualFile` (#9772 ) ## Problem `no_sync` initially just skipped syncfs on startup (#9677). I'm also interested in flaky tests that time out during pageserver shutdown while flushing l0s, so to eliminate disk throughput as a source of issues there, `VirtualFile` should respect `no_sync` as well. ## Summary of changes - Drive-by change for test timeouts: add a couple more ::info logs during pageserver startup so it's obvious which part got stuck. - Add a SyncMode enum to configure VirtualFile and respect it in sync_all and sync_data functions - During pageserver startup, set SyncMode according to `no_sync`	2024-11-18 08:59:05 +00:00
Christian Schwarz	b6154b03f4	build(deps): bump smallvec to 1.13.2 to get UB fix (#9781 ) Smallvec 1.13.2 contains [a UB fix](https://github.com/servo/rust-smallvec/pull/345). Upstream opened [a request](https://github.com/rustsec/advisory-db/issues/1960) for this in the advisory-db but it never got acted upon. Found while working on https://github.com/neondatabase/neon/pull/9321.	2024-11-17 21:25:16 +01:00
Erik Grinaker	8880134171	Cargo.toml: upgrade tikv-jemallocator to 0.6.0 (#9779 )	2024-11-17 19:52:05 +01:00
Erik Grinaker	de7e4a34ca	safekeeper: send `AppendResponse` on segment flush (#9692 ) ## Problem When processing pipelined `AppendRequest`s, we explicitly flush the WAL every second and return an `AppendResponse`. However, the WAL is also implicitly flushed on segment bounds, but this does not result in an `AppendResponse`. Because of this, concurrent transactions may take up to 1 second to commit and writes may take up to 1 second before sending to the pageserver. ## Summary of changes Advance `flush_lsn` when a WAL segment is closed and flushed, and emit an `AppendResponse`. To accommodate this, track the `flush_lsn` in addition to the `flush_record_lsn`.	2024-11-17 18:19:14 +01:00
Vlad Lazar	ac689ab014	wal_decoder: rename end_lsn to next_record_lsn (#9776 ) ## Problem It turns out that `WalStreamDecoder::poll_decode` returns the start LSN of the next record and not the end LSN of the current record. They are not always equal. For example, they're not equal when the record in question is an XLOG SWITCH record. ## Summary of changes Rename things to reflect that.	2024-11-15 21:53:11 +00:00
Tristan Partin	23eabb9919	Fix PG_MAJORVERSION_NUM typo In `ea32f1d0a3`, Matthias added a feature to our extension to expose more granular wait events. However, due to the typo, those wait events were never registered, so we used the more generic wait events instead. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-15 15:17:23 -06:00
Vlad Lazar	2af791ba83	wal_decoder: make InterpretedWalRecord serde (#9775 ) ## Problem We want to serialize interpreted records to send them over the wire from safekeeper to pageserver. ## Summary of changes Make `InterpretedWalRecord` ser/de. This is a temporary change to get the bulk of the lift merged in https://github.com/neondatabase/neon/pull/9746. For going to prod, we don't want to use bincode since we can't evolve the schema. Questions on serialization will be tackled separately.	2024-11-15 20:34:48 +00:00
Mikhail Kot	e12628fe93	Collect max_connections metric (#9770 ) This will further allow us to expose this metric to users	2024-11-15 17:42:41 +00:00
Arpad Müller	7880c246f1	Correct mistakes in offloaded timeline retain_lsn management (#9760 ) PR #9308 has modified tenant activation code to take offloaded child timelines into account for populating the list of `retain_lsn` values. However, there are more places than just tenant activation where one needs to update the `retain_lsn`s. This PR fixes some bugs of the current code that could lead to corruption in the worst case: 1. Deleting of an offloaded timeline would not get its `retain_lsn` purged from its parent. With the patch we now do it, but as the parent can be offloaded as well, the situation is a bit trickier than for non-offloaded timelines which can just keep a pointer to their parent. Here we can't keep a pointer because the parent might get offloaded, then unoffloaded again, creating a dangling pointer situation. Keeping a pointer to the tenant is not good either, because we might drop the offloaded timeline in a context where an `offloaded_timelines` lock is already held: so we don't want to acquire a lock in the drop code of OffloadedTimeline. 2. Unoffloading a timeline would not get its `retain_lsn` values populated, leading to it maybe garbage collecting values that its children might need. We now call `initialize_gc_info` on the parent. 3. Offloading of a timeline would not get its `retain_lsn` values registered as offloaded at the parent. So if we drop the `Timeline` object, and its registration is removed, the parent would not have any of the child's `retain_lsn`s around. Also, before, the `Timeline` object would delete anything related to its timeline ID, now it only deletes `retain_lsn`s that have `MaybeOffloaded::No` set. Incorporates Chi's reproducer from #9753. cc https://github.com/neondatabase/cloud/issues/20199 The `test_timeline_retain_lsn` test is extended: 1. it gains a new dimension, duplicating each mode, to either have the "main" branch be the direct parent of the timeline we archive, or the "test_archived_parent" branch intermediary, creating a three timeline structure. This doesn't test anything fixed by this PR in particular, just explores the vast space of possible configurations a little bit more. 2. it gains two new modes, `offload-parent`, which tests the second point, and `offload-no-restart` which tests the third point. It's easy to verify the test actually is "sharp" by removing one of the respective `self.initialize_gc_info()`, `gc_info.insert_child()` or `ancestor_children.push()`. Part of #8088 --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Alex Chi Z <chi@neon.tech>	2024-11-15 14:22:29 +01:00
John Spray	04938d9d55	tests: tolerate pageserver 500s in test_timeline_archival_chaos (#9769 ) ## Problem Test exposes cases where pageserver gives 500 responses, causing failures like https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9766/11844529470/index.html#suites/d1acc79950edeb0563fc86236c620898/3546be2ffed99ba6 ## Summary of changes - Tolerate such messages, and link an issue for cleaning up the pageserver not to return such 500s.	2024-11-15 13:22:05 +00:00
Erik Grinaker	19f7d40c1d	deny.toml: allow CDDL-1.0 license (#9766 ) #9764, which adds profiling support to Safekeeper, pulls in the dependency [`inferno`](https://crates.io/crates/inferno) via [`pprof-rs`](https://crates.io/crates/pprof). This is licenced under the [Common Development and Distribution License 1.0](https://spdx.org/licenses/CDDL-1.0.html), which is not allowed by `cargo-deny`. This patch allows the CDDL-1.0 license. It is a derivative of the Mozilla Public License, which we already allow, but avoids some issues around European copyright law that the MPL had. As such, I don't expect this to be problematic.	2024-11-15 10:41:43 +00:00
John Spray	38563de7dd	storcon: exclude non-Active tenants from shard autosplitting (#9743 ) ## Problem We didn't have a neat way to prevent auto-splitting of tenants. This could be useful during incidents or for testing. Closes https://github.com/neondatabase/neon/issues/9332 ## Summary of changes - Filter splitting candidates by scheduling policy	2024-11-14 19:41:10 +00:00
John Spray	93939f123f	tests: add test_timeline_archival_chaos (#9609 ) ## Problem - We lack test coverage of cases where multiple timelines fight for updates to the same manifest (https://github.com/neondatabase/neon/pull/9557), and in timeline archival changes while dual-attached (https://github.com/neondatabase/neon/pull/9555) ## Summary of changes - Add a chaos test for timeline creation->archival->offload->deletion	2024-11-14 17:31:35 +00:00
Tristan Partin	49b599c113	Remove the replication slot in test_snap_files at the end of the test Analysis of the LR benchmarking tests indicates that in the duration of test_subscriber_lag, a leftover 'slotter' replication slot can lead to retained WAL growing on the publisher. This replication slot is not used by any subscriber. The only purpose of the slot is to generate snapshot files for the purpose of test_snap_files. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-14 10:59:15 -06:00
Yuchen Liang	8cde37bc0b	test: disable test_readonly_node_gc until proper fix (#9755 ) ## Problem After investigation, we think to make `test_readonly_node_gc` less flaky, we need to make a proper fix (likely involving persisting part of the lease state). See https://github.com/neondatabase/neon/issues/9754 for details. ## Summary of changes - skip the test until proper fix. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-14 15:26:58 +00:00
Konstantin Knizhnik	f70611c8df	Correctly truncate VM (#9342 ) ## Problem https://github.com/neondatabase/neon/issues/9240 ## Summary of changes Correctly truncate VM page instead of just replacing it with zero page. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-11-14 17:19:13 +02:00
Vlad Lazar	21282aa113	cargo: use neon branch of rust-postgres (#9757 ) ## Problem We are pinning our fork of rust-postgres to a commit hash and that prevents us from making further changes to it. The latest commit in rust-postgres requires https://github.com/neondatabase/neon/pull/8747, but that seems to have gone stale. I reverted rust-postgres `neon` branch to the pinned commit in https://github.com/neondatabase/rust-postgres/pull/31. ## Summary of changes Switch back to using the `neon` branch of the rust-postgres fork.	2024-11-14 15:16:43 +00:00
Arseny Sher	d06bf4b0fe	safekeeper: fix atomicity of WAL truncation (#9685 ) If WAL truncation fails in the middle it might leave some data on disk above the write/flush LSN. In theory, concatenated with previous records it might form bogus WAL (though very unlikely in practice because CRC would protect from that). To protect from that, set the pending_wal_truncation flag: it means that truncation must be retried until it succeeds before any WAL writes. We already did that in case of safekeeper restart, now extend this mechanism for failures without restart. Also, importantly, reset LSNs in the beginning of the operation, not in the end, because once on-disk deletion starts, previous pointers are wrong. All this most likely hasn't created any problems in practice because CRC protects from the consequences. Tests for this are hard; simulation infrastructure might be useful here in the future, but not yet.	2024-11-14 13:06:42 +03:00
Tristan Partin	1280b708f1	Improve error handling for NeonAPI fixture Move error handling to the common request function and add a debug log. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-13 20:35:48 -06:00
Alexey Kondratov	6dba1a36b8	Merge pull request #9745 from neondatabase/compute-release-2024-11-13 Compute release 2024-11-13 Includes Postgres minor version upgrades and various other bugfixes and improvements.	2024-11-13 19:11:15 +01:00
John Spray	b4e00b8b22	pageserver: refuse to load tenants with suspiciously old indices in old generations (#9719 ) ## Problem Historically, if a control component passed a pageserver "generation: 1" this could be a quick way to corrupt a tenant by loading a historic index. Follows https://github.com/neondatabase/neon/pull/9383 Closes #6951 ## Summary of changes - Introduce a Fatal variant to DownloadError, to enable index downloads to signal when they have encountered a scary enough situation that we shouldn't proceed to load the tenant. - Handle this variant by putting the tenant into a broken state (no matter which timeline within the tenant reported it) - Add a test for this case In the event that this behavior fires when we don't want it to, we have ways to intervene: - "Touch" an affected index to update its mtime (download+upload S3 object) - If this behavior is triggered, it indicates we're attaching in some old generation, so we should be able to fix that by manually bumping generation numbers in the storage controller database (this should never happen, but it's an option if it does)	2024-11-13 18:07:39 +00:00
Heikki Linnakangas	10aaa3677d	PostgreSQL minor version updates (17.1, 16.5, 15.9, 14.14) (#9727 ) This includes a patch to temporarily disable one test in the pg_anon test suite. It is an upstream issue, the test started failing with the new PostgreSQL minor versions because of a change in the default timezone used in tests. We don't want to block the release for this, so just disable the test for now. See `199f0a392b (note_2148017485)` Corresponding postgres repository PRs: https://github.com/neondatabase/postgres/pull/524 https://github.com/neondatabase/postgres/pull/525 https://github.com/neondatabase/postgres/pull/526 https://github.com/neondatabase/postgres/pull/527	2024-11-13 15:08:58 +02:00
Heikki Linnakangas	d5435b1a81	tests: Increase timeout in test_create_churn_during_restart (#9736 ) This test was seen to be flaky, e.g. at: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9457/11804246485/index.html#suites/ec4311502db344eee91f1354e9dc839b/982bd121ea698414/. If I _reduce_ the timeout from 10s to 8s on my laptop, it reliably hits that timeout and fails. That suggests that the test is pretty close to the edge even when it passes. Let's bump up the timeout to 30 s to make it more robust. See also https://github.com/neondatabase/neon/issues/9730, although the error message is different there.	2024-11-13 12:20:32 +02:00
Anastasia Lubennikova	080d585b22	Add installed_extensions prometheus metric (#9608 ) and add /metrics endpoint to compute_ctl to expose such metrics metric format example for extension pg_rag with versions 1.2.3 and 1.4.2 installed in 3 and 1 databases respectively: neon_extensions_installed{extension="pg_rag", version="1.2.3"} = 3 neon_extensions_installed{extension="pg_rag", version="1.4.2"} = 1 ------ infra part: https://github.com/neondatabase/flux-fleet/pull/251 --------- Co-authored-by: Tristan Partin <tristan@neon.tech>	2024-11-13 09:36:48 +00:00
John Spray	7595d3afe6	pageserver: add `no_sync` for use in regression tests (2/2) (#9678 ) ## Problem Followup to https://github.com/neondatabase/neon/pull/9677 which enables `no_sync` in tests. This can be merged once the next release has happened. ## Summary of changes - Always run pageserver with `no_sync = true` in tests.	2024-11-13 09:17:26 +00:00
Konstantin Knizhnik	1ff5333a1b	Do not wallog AUX files at replica (#9457 ) ## Problem Attempting to persist LR state at a replica causes a `cannot make new WAL entries during recovery` error. See https://neondb.slack.com/archives/C07S7RBFVRA/p1729280401283389 ## Summary of changes Do not wallog AUX files at replica. Related Postgres PRs: https://github.com/neondatabase/postgres/pull/517 https://github.com/neondatabase/postgres/pull/516 https://github.com/neondatabase/postgres/pull/515 https://github.com/neondatabase/postgres/pull/514 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-11-13 08:50:01 +02:00
Tristan Partin	d8f5d43549	Fix autocommit footguns in performance tests psycopg2 has the following warning related to autocommit: > By default, any query execution, including a simple SELECT will start > a transaction: for long-running programs, if no further action is > taken, the session will remain “idle in transaction”, an undesirable > condition for several reasons (locks are held by the session, tables > bloat…). For long lived scripts, either ensure to terminate a > transaction as soon as possible or use an autocommit connection. In the 2.9 release notes, psycopg2 also made the following change: > `with connection` starts a transaction on autocommit transactions too Some of these connections are indeed long-lived, so we were retaining tons of WAL on the endpoints because we had a transaction pinned in the past. Link: https://www.psycopg.org/docs/news.html#what-s-new-in-psycopg-2-9 Link: https://github.com/psycopg/psycopg2/issues/941 Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-12 15:48:19 -06:00
Erik Grinaker	2256a5727a	safekeeper: use `WAL_SEGMENT_SIZE` for empty timeline state (#9734 ) ## Problem `TimelinePersistentState::empty()`, used for tests and benchmarks, had a hardcoded 16 MB WAL segment size. This caused confusion when attempting to change the global segment size. ## Summary of changes Inherit from `WAL_SEGMENT_SIZE` in `TimelinePersistentState::empty()`.	2024-11-12 20:35:44 +00:00
Tristan Partin	3f80af8b1d	Add neon.logical_replication_max_logicalsnapdir_size This GUC will drop replication slots if the size of the pg_logical/snapshots directory (not including temp snapshot files) becomes larger than the specified size. Keeping the size of this directory smaller will help with basebackup size from the pageserver. Part-of: https://github.com/neondatabase/neon/issues/8619 Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-12 13:13:28 -06:00
Tristan Partin	a61d81bbc7	Calculate compute_backpressure_throttling_seconds correctly The original value that we get is measured in microseconds. It comes from a calculation using Postgres' GetCurrentTimestamp(), which is implemented in terms of gettimeofday(2). Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-12 13:12:08 -06:00
Erik Grinaker	05381a48f0	utils: remove unnecessary fsync in `durable_rename()` (#9686 ) ## Problem WAL segment fsyncs significantly affect WAL ingestion throughput. `durable_rename()` is used when initializing every 16 MB segment, and issues 3 fsyncs of which 1 was unnecessary. ## Summary of changes Remove an fsync in `durable_rename` which is unnecessary with Linux and ext4 (which we currently use). This improves WAL ingestion throughput by up to 23% with large appends on my MacBook.	2024-11-12 18:57:31 +01:00
Alex Chi Z.	cef165818c	test(pageserver): add gc-compaction tests with delta will_init (#9724 ) I had the impression that gc-compaction didn't test the case where the first record of the key history is will_init, because some code paths would panic in this case. Luckily it got fixed in https://github.com/neondatabase/neon/pull/9026 so we can now implement such tests. Part of https://github.com/neondatabase/neon/issues/9114 ## Summary of changes * Randomly changed some images into will_init neon wal record * Split `test_simple_bottom_most_compaction_deltas` into two test cases, one of them has the bottom layer as delta layer with will_init flags, while the other is the original one with image layers. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-12 10:37:31 -05:00
Erik Grinaker	6b19867410	safekeeper: don't flush control file on WAL ingest path (#9698 ) ## Problem The control file is flushed on the WAL ingest path when the commit LSN advances by one segment, to bound the amount of recovery work in case of a crash. This involves 3 additional fsyncs, which can have a significant impact on WAL ingest throughput. This is to some extent mitigated by `AppendResponse` not being emitted on segment bound flushes, since this will prevent commit LSN advancement, which will be addressed separately. ## Summary of changes Don't flush the control file on the WAL ingest path at all. Instead, leave that responsibility to the timeline manager, but ask it to flush eagerly if the control file lags the in-memory commit LSN by more than one segment. This should not cause more than `REFRESH_INTERVAL` (300 ms) additional latency before flushing the control file, which is negligible.	2024-11-12 15:17:03 +00:00
Tristan Partin	cc8029c4c8	Update pg_cron to 1.6.4 This comes with PG 17 support. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-11 20:10:53 -06:00
Tristan Partin	5be6b07cf1	Improve typing related to regress/test_logical_replication.py (#9725 ) Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-11 17:36:45 -06:00
Arpad Müller	b018bc7da8	Add a retain_lsn test (#9599 ) Add a test that ensures the `retain_lsn` functionality works. Right now, there is not a single test that is broken if offloaded or non-offloaded timelines don't get registered at their parents, preventing gc from discarding the ancestor_lsns of the children. This PR fills that gap. The test has four modes: * `offloaded`: offload the child timeline, run compaction on the parent timeline, unarchive the child timeline, then try reading from it. hopefully the data is still there. * `offloaded-corrupted`: offload the child timeline, corrupts the manifest in a way that the pageserver believes the timeline was flattened. This is the closest we can get to pretend the `retain_lsn` mechanism doesn't exist for offloaded timelines, so we can avoid adding endpoints to the pageserver that do this manually for tests. The test then checks that indeed data is corrupted and the endpoint can't be started. That way we know that the test is actually working, and actually tests the `retain_lsn` mechanism, instead of say the lsn lease mechanism, or one of the many other mechanisms that impede gc. * `archived`: the child timeline gets archived but doesn't get offloaded. this currently matches the `None` case but we might have refactors in the future that make archived timelines sufficiently different from non-archived ones. * `None`: the child timeline doesn't even get archived. this tests that normal timelines participate in `retain_lsn`. I've made them locally not participate in `retain_lsn` (via commenting out the respective `ancestor_children.push` statement in tenant.rs) and ran the testsuite, and not a single test failed. So this test is first of its kind. Part of #8088.	2024-11-11 22:29:21 +00:00
Tristan Partin	4b075db7ea	Add a postgres_exporter config file This exporter logs an ERROR if a file called `postgres_exporter.yml` is not located in its current working directory. We can silence it by adding an empty config file and pointing the exporter at it. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-11 14:49:37 -06:00
Fedor Dikarev	fde16f8614	use batch gh-workflow-stats-action with separate table (#9722 ) We found that exporting GH Workflow Runs in batch is more efficient because: - it makes better use of the GitHub API - GH runner usage is rounded to minutes, so even when an ad-hoc export finishes in 5-10 seconds, we are billed for a full minute of usage So now we introduce batch exporting, with version v0.2.x of github workflow stats exporter. How it's expected to work now: - every 15 minutes we query for the workflow runs created in the last 2 hours - to avoid missing workflows that ran for more than 2 hours, every night (00:25) we will query workflows created in the past 24 hours and export them as well - should we query for even longer periods? - let's see how it works with the current schedule - for longer periods such as days or weeks, the querying logic and concurrency may need adjusting, so let's use the simpler version for now	2024-11-11 20:33:29 +00:00
Alex Chi Z.	5a138d08a3	feat(pageserver): support partial gc-compaction for delta layers (#9611 ) The final patch for partial compaction, part of https://github.com/neondatabase/neon/issues/9114, close https://github.com/neondatabase/neon/issues/8921 (note that we didn't implement parallel compaction or compaction scheduler for partial compaction -- currently this needs to be scheduled by using a Python script to split the keyspace, and in the future, automatically split based on the key partitioning when the pageserver wants to trigger a gc-compaction) ## Summary of changes * Update the layer selection algorithm to use the same selection as full compaction (everything intersect/below gc horizon) * Update the layer selection algorithm to also generate a list of delta layers that need to be rewritten * Add the logic to rewrite delta layers and add them back to the layer map * Update test case to do partial compaction on deltas --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-11 20:30:32 +00:00
Tristan Partin	2d9652c434	Clean up C.UTF-8 locale changes Removes some unnecessary initdb arguments, and fixes Neon for MacOS since it doesn't seem to ship a C.UTF-8 locale. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-11 13:53:12 -06:00
Alex Chi Z.	61ff18dbae	Merge pull request #9721 from neondatabase/skyzh/locale-changes cherry-pick Clean up C.UTF-8 locale changes	2024-11-11 14:29:57 -05:00
Tristan Partin	96d66a201d	Clean up C.UTF-8 locale changes Removes some unnecessary initdb arguments, and fixes Neon for MacOS since it doesn't seem to ship a C.UTF-8 locale. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-11 14:10:30 -05:00
Alexander Bayandin	e9dcfa2eb2	test_runner: skip more tests using decorator instead of pytest.skip (#9704 ) ## Problem Running `pytest.skip(...)` in a test body instead of marking the test with `@pytest.mark.skipif(...)` causes all fixtures to be initialised, which is not necessary if the test is going to be skipped anyway. Also, some tests are unnecessarily skipped (e.g. `test_layer_bloating` on Postgres 17, or `test_idle_reconnections` at all) or run (e.g. `test_parse_project_git_version_output_positive` on more than one configuration) according to comments. ## Summary of changes - Move `skip_on_postgres` / `xfail_on_postgres` / `run_only_on_default_postgres` decorators to `fixture.utils` - Add new `skip_in_debug_build` and `skip_on_ci` decorators - Replace `pytest.skip(...)` calls with decorators where possible	2024-11-11 18:07:01 +00:00
Peter Bendel	8db84d9964	new ingest benchmark (#9711 ) ## Problem We have no specific benchmark testing project migration of postgresql project with existing data into Neon. Typical steps of such a project migration are - schema creation in the neon project - initial COPY of relations - creation of indexes and constraints - vacuum analyze ## Summary of changes Add a periodic benchmark running 9 AM UTC every day. In each run: - copy a 200 GiB project that has realistic schema, data, tables, indexes and constraints from another project into - a new Neon project (7 CU fixed) - an existing tenant, (but new branch and new database) that already has 4 TiB of data - use pgcopydb tool to automate all steps and parallelize COPY and index creation - parse pgcopydb output and report performance metrics in Neon performance test database ## Logs This benchmark has been tested first manually and then as part of benchmarking.yml workflow, example run see https://github.com/neondatabase/neon/actions/runs/11757679870	2024-11-11 17:51:15 +00:00
Alexander Bayandin	1aab34715a	Remove checklist from the PR template (#9702 ) ## Problem Once we enable the merge queue for the `main` branch, it won't be possible to adjust the commit message right after pressing the "Squash and merge" button and the PR title + description will be used as is. To avoid extra noise in the commits in the `main` with the checklist leftovers, I propose removing the checklist from the PR template and keeping only the Problem / Summary of changes. ## Summary of changes - Remove the checklist from the PR template	2024-11-11 17:01:02 +00:00
Erik Grinaker	f63de5f527	safekeeper: add `initialize_segment` variant of `safekeeper_wal_storage_operation_seconds` (#9691 ) ## Problem We don't have a metric capturing the latency of segment initialization. This can be significant due to fsyncs. ## Summary of changes Add an `initialize_segment` variant of `safekeeper_wal_storage_operation_seconds`.	2024-11-11 17:55:50 +01:00
Alex Chi Z.	b24850bdb5	Merge pull request #9710 from neondatabase/rc/2024-11-11 Storage & Compute release 2024-11-11	2024-11-11 11:05:41 -05:00
Alex Chi Z.	54a1676680	rfc: update aux file rfc to reflect latest optimizations (#9681 ) Reflects https://github.com/neondatabase/neon/pull/9631 in the RFC. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-11 09:19:03 -05:00
Alex Chi Z.	04f91eea45	fix(pageserver): increase frozen layer warning threshold; ignore in tests (#9705 ) Perf benchmarks produce a lot of layers. ## Summary of changes Bumping the threshold and ignore the warning. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-11 09:15:15 -05:00
Alex Chi Z.	48c06d9f7b	fix(pageserver): increase frozen layer warning threshold; ignore in tests (#9705 ) Perf benchmarks produce a lot of layers. ## Summary of changes Bumping the threshold and ignore the warning. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-11 09:13:46 -05:00
Alexander Bayandin	f510647c7e	CI: retry `actions/github-script` for 5XX errors (#9703 ) ## Problem GitHub API can return error 500, and it fails jobs that use `actions/github-script` action. ## Summary of changes - Add `retry: 500` to all `actions/github-script` usage	2024-11-11 12:42:32 +00:00
Vlad Lazar	ceaa80ffeb	storcon: add peer token for peer to peer communication (#9695 ) ## Problem We wish to stop using admin tokens in the infra repo, but step down requests use the admin token. ## Summary of Changes Introduce a new "ControllerPeer" scope and use it for step-down requests.	2024-11-11 09:58:41 +00:00
Alexander Bayandin	2fcac0e66b	CI(pre-merge-checks): add required checks (#9700 ) ## Problem The Merge queue doesn't work because it expects certain jobs, which we don't have in the `pre-merge-checks` workflow. But it turns out we can just create jobs/checks with the same names in any workflow that we run. ## Summary of changes - Add `conclusion` jobs - Create `neon-cloud-e2e` status check - Add a bunch of `if`s to handle cases with no relevant changes found and prepare the workflow to run rust checks in the future - List the workflow in `report-workflow-stats` to collect stats about it	2024-11-09 01:02:54 +00:00
Tristan Partin	ecde8d7632	Improve type safety according to pyright Pyright found many issues that mypy doesn't seem to want to catch or mypy isn't configured to catch. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-08 14:43:15 -06:00
Alex Chi Z.	af8238ae52	fix(pageserver): drain upload queue before offloading timeline (#9682 ) It is possible at the point we shutdown the timeline, there are still layer files we did not upload. ## Summary of changes * If the queue is not empty, avoid offloading. * Shutdown the timeline gracefully using the flush mode to ensure all local files are uploaded before deleting the timeline directory. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-08 14:28:55 -05:00
Erik Grinaker	ab47804d00	safekeeper: remove unused `WriteGuardSharedState::skip_update` (#9699 )	2024-11-08 19:25:31 +00:00
Alex Chi Z.	ecca62a45d	feat(pageserver): more log lines around frozen layers (#9697 ) We saw pageserver OOMs https://github.com/neondatabase/cloud/issues/19715 for tenants doing large writes. Add log lines around in-memory layers to hopefully collect some info during my on-call shift next week. ## Summary of changes * Estimate in-memory size of an in-mem layer. * Print frozen layer number if there are too many layers accumulated in memory. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-08 18:44:00 +00:00
Tristan Partin	34a4eb6f2a	Switch compute-related locales to C.UTF-8 by default Right now, our environments create databases with the C locale, which is really unfortunate for users who have data stored in other languages that they want to analyze. For instance, show_trgm on Hebrew text currently doesn't work in staging or production. I don't envision this being the final solution. I think this is just a way to set a known value so the pageserver doesn't use its parent environment. The final solution to me is exposing initdb parameters to users in the console. Then they could use a different locale or encoding if they so chose. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-08 12:19:18 -06:00
Alexander Bayandin	b6bc954c5d	CI: move check codestyle python to reusable workflow and run on a merge_group (#9683 ) ## Problem To prevent breaking main after the Python 3.11 PR gets merged, we need to enable the merge queue and run the `check-codestyle-python` job on it ## Summary of changes - Move `check-codestyle-python` to a reusable workflow - Run this workflow on `merge_group` event	2024-11-08 17:32:56 +00:00
Vlad Lazar	30680d1f32	tests: use tighter storcon scopes (#9696 ) ## Problem https://github.com/neondatabase/neon/pull/9596 did not update tests because that would've broken the compat tests. ## Summary of Changes Use infra scope where possible.	2024-11-08 17:00:31 +00:00
Alex Chi Z.	f561cbe1c7	fix(pageserver): drain upload queue before detaching ancestor (#9651 ) In INC-317 https://neondb.slack.com/archives/C033RQ5SPDH/p1730815677932209, we saw an interesting series of operations that would remove valid layer files existing in the layer map. * Timeline A starts compaction and generates an image layer Z but has not uploaded it yet. * Timeline B/C start ancestor detaching (which should not affect timeline A) * The tenant gets restarted as part of the ancestor detaching process, without increasing the generation number. * Timeline A reloads, discovering the layer Z is a future layer, and schedules a deletion into the deletion queue. This means that the file will be deleted any time in the future. * Timeline A starts compaction and generates layer Z again, adding it to the layer map. Note that because we don't bump generation number during ancestor detach, it has the same filename + generation number as the original Z. * Timeline A deletes layer Z from s3 + disk, and now we have a dangling reference in the layer map, blocking all compaction/logical_size_calculation process. ## Summary of changes * We wait for all layers to be uploaded before shutting down the tenants in `Flush` mode. * Ancestor detach restarts now use this mode. * Ancestor detach also waits for remote queue completion before starting the detaching process. * The patch ensures that we don't have any future image layer (or something similar) after restart, but does not fix the underlying problem around generation numbers. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-08 10:35:27 -05:00
Tristan Partin	3525d2e381	Update TimescaleDB to 2.17.1 for PG 17 Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-08 09:15:38 -06:00
Konstantin Knizhnik	17c002b660	Do not copy logical replication slots to replica (#9458 ) ## Problem Replication slots are now persisted using the AUX files mechanism and included in basebackup when a replica is launched. These slots are not used at the replica but hold WAL, which may cause local disk space exhaustion. ## Summary of changes Add `--replica` parameter to basebackup request and do not include replication slot state files in basebackup for replica. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-11-08 14:54:58 +02:00
John Spray	aa9112efce	pageserver: add `no_sync` for use in regression tests (1/2) (#9677 ) ## Problem In test environments, the `syncfs` that the pageserver does on startup can take a long time, as other tests running concurrently might have many gigabytes of dirty pages. ## Summary of changes - Add a `no_sync` option to the pageserver's config. - Skip syncfs on startup if this is set - A subsequent PR (https://github.com/neondatabase/neon/pull/9678) will enable this by default in tests. We need to wait until after the next release to avoid breaking compat tests, which would fail if we set no_sync & use an old pageserver binary. Q: Why is this a different mechanism than safekeeper, which has a --no-sync CLI flag? A: Because the way we manage pageservers in neon_local depends on the pageserver.toml containing the full configuration, whereas safekeepers have a config file which is neon-local-specific and can drive a CLI flag. Q: Why is the option no_sync rather than sync? A: For boolean configs with a dangerous value, it's preferable to make "false" the safe option, so that any downstream future config tooling that might have a "booleans are false by default" behavior (e.g. golang structs) is safe by default. Q: Why only skip the syncfs, and not all fsyncs? A: Skipping all fsyncs would require more code changes, and the most acute problem isn't fsyncs themselves (these just slow down a running test), it's the syncfs (which makes a pageserver startup slow as a result of _other_ tests)	2024-11-08 10:16:04 +00:00
JC Grünhage	027889b06c	ci: use set-docker-config-dir from dev-actions (#9638 ) set-docker-config-dir was replicated over multiple repositories. The replica of this action was removed from this repository and it's using the version from github.com/neondatabase/dev-actions instead	2024-11-08 10:44:59 +01:00
Heikki Linnakangas	79929bb1b6	Disable `rust_2024_compatibility` lint option (#9615 ) Compiling with nightly rust compiler, I'm getting a lot of errors like this: error: `if let` assigns a shorter lifetime since Edition 2024 --> proxy/src/auth/backend/jwt.rs:226:16 \| 226 \| if let Some(permit) = self.try_acquire_permit() { \| ^^^^^^^^^^^^^^^^^^^------------------------- \| \| \| this value has a significant drop implementation which may observe a major change in drop order and requires your discretion \| = warning: this changes meaning in Rust 2024 = note: for more information, see issue #124085 <https://github.com/rust-lang/rust/issues/124085> help: the value is now dropped here in Edition 2024 --> proxy/src/auth/backend/jwt.rs:241:13 \| 241 \| } else { \| ^ note: the lint level is defined here --> proxy/src/lib.rs:8:5 \| 8 \| rust_2024_compatibility \| ^^^^^^^^^^^^^^^^^^^^^^^ = note: `#[deny(if_let_rescope)]` implied by `#[deny(rust_2024_compatibility)]` and this: error: these values and local bindings have significant drop implementation that will have a different drop order from that of Edition 2021 --> proxy/src/auth/backend/jwt.rs:376:18 \| 369 \| let client = Client::builder() \| ------ these values have significant drop implementation and will observe changes in drop order under Edition 2024 ... 376 \| map: DashMap::default(), \| ^^^^^^^^^^^^^^^^^^ \| = warning: this changes meaning in Rust 2024 = note: for more information, see issue #123739 <https://github.com/rust-lang/rust/issues/123739> = note: `#[deny(tail_expr_drop_order)]` implied by `#[deny(rust_2024_compatibility)]` They are caused by the `rust_2024_compatibility` lint option. When we actually switch to the 2024 edition, it makes sense to go through all these and check that the drop order changes don't break anything, but in the meanwhile, there's no easy way to avoid these errors. Disable it, to allow compiling with nightly again. 
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-11-08 08:35:03 +00:00
Peter Bendel	9132d80aa3	add pgcopydb tool to build tools image (#9658 ) ## Problem build-tools image does not provide superuser, so additional packages can not be installed during GitHub benchmarking workflows but need to be added to the image ## Summary of changes install pgcopydb version 0.17-1 or higher into build-tools bookworm image ```bash docker run -it neondatabase/build-tools:<tag>-bookworm-arm64 /bin/bash ... nonroot@c23c6f4901ce:~$ LD_LIBRARY_PATH=/pgcopydb/lib /pgcopydb/bin/pgcopydb --version; 13:58:19.768 8 INFO Running pgcopydb version 0.17 from "/pgcopydb/bin/pgcopydb" pgcopydb version 0.17 compiled with PostgreSQL 16.4 (Debian 16.4-1.pgdg120+2) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit compatible with Postgres 11, 12, 13, 14, 15, and 16 ``` Example usage of that image in a workflow https://github.com/neondatabase/neon/actions/runs/11725718371/job/32662681172#step:7:14	2024-11-07 19:00:25 +01:00
Conrad Ludgate	82e3f0ecba	[proxy/authorize]: improve JWKS reliability (#9676 ) While setting up some tests, I noticed that we didn't support keycloak. They make use of encryption JWKs as well as signature ones. Our current jwks crate does not support parsing encryption keys which caused the entire jwk set to fail to parse. Switching to lazy parsing fixes this. Also while setting up tests, I couldn't use localhost jwks server as we require HTTPS and we were using webpki so it was impossible to add a custom CA. Enabling native roots addresses this possibility. I saw some of our current e2e tests against our custom JWKS in s3 were taking a while to fetch. I've added a timeout + retries to address this.	2024-11-07 16:24:38 +00:00
Arpad Müller	75aa19aa2d	Don't attach is_archived to debug output (#9679 ) We are in branches where we know its value already.	2024-11-07 16:13:50 +00:00
Alex Chi Z.	a8d9939ea9	fix(pageserver): reduce aux compaction threshold (#9647 ) ref https://github.com/neondatabase/neon/issues/9441 The metrics from LR publisher testing project: ~300KB aux key deltas per 256MB files. Therefore, I think we can do compaction more aggressively as these deltas are small and compaction can reduce layer download latency. We also have a read path perf fix https://github.com/neondatabase/neon/pull/9631 but I'd still combine the read path fix with the reduction of the compaction threshold. ## Summary of changes * reduce metadata compaction threshold * use num of L1 delta layers as an indicator for metadata compaction * dump more logs Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-07 10:38:15 -05:00
Erik Grinaker	f18aa04b90	safekeeper: use `set_len()` to zero out segments (#9665 ) ## Problem When we create a new segment, we zero it out in order to avoid changing the length and fsyncing metadata on every write. However, we zeroed it out by writing 8 KB zero-pages, and Tokio file writes have non-trivial overhead. ## Summary of changes Zero out the segment using [`File::set_len()`](https://docs.rs/tokio/latest/i686-unknown-linux-gnu/tokio/fs/struct.File.html#method.set_len) instead. This will typically (depending on the filesystem) just write a sparse file and omit the 16 MB of data entirely. This improves WAL append throughput for large messages by over 400% with fsync disabled, and 100% with fsync enabled.	2024-11-07 15:09:57 +00:00
Erik Grinaker	01265b7bc6	safekeeper: add basic WAL ingestion benchmarks (#9531 ) ## Problem We don't have any benchmarks for Safekeeper WAL ingestion. ## Summary of changes Add some basic benchmarks for WAL ingestion, specifically for `SafeKeeper::process_msg()` (single append) and `WalAcceptor` (pipelined batch ingestion). Also add some baseline file write benchmarks.	2024-11-07 13:24:03 +00:00
Arseny Sher	f54f0e8e2d	Fix direct reading from WAL buffers. (#9639 ) Fix direct reading from WAL buffers. Pointer wasn't advanced which resulted in sending corrupted WAL if part of read used WAL buffers and part read from the file. Also move it to neon_walreader so that e.g. replication could also make use of it. ref https://github.com/neondatabase/cloud/issues/19567	2024-11-07 11:29:52 +00:00
Erik Grinaker	d6aa26a533	postgres_ffi: make `WalGenerator` generic over record generator (#9614 ) ## Problem Benchmarks need more control over the WAL generated by `WalGenerator`. In particular, they need to vary the size of logical messages. ## Summary of changes * Make `WalGenerator` generic over `RecordGenerator`, which constructs WAL records. * Add `LogicalMessageGenerator` which emits logical messages, with a configurable payload. * Minor tweaks and code reorganization. There are no changes to the core logic or emitted WAL.	2024-11-07 10:38:39 +00:00
Cheng Chen	e1d0b73824	chore(compute): Bump pg_mooncake to the latest version	2024-11-06 22:41:18 -06:00
Arpad Müller	011c0a175f	Support copying layers in detach_ancestor from before shard splits (#9669 ) We need to use the shard associated with the layer file, not the shard associated with our current tenant shard ID. Due to shard splits, the shard IDs can refer to older files. close https://github.com/neondatabase/neon/issues/9667	2024-11-07 01:53:58 +01:00
Alex Chi Z.	2a95a51a0d	refactor(pageserver): better pageservice command parsing (#9597 ) close https://github.com/neondatabase/neon/issues/9460 ## Summary of changes A full rewrite of pagestream cmdline parsing to make it more robust and readable. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-11-06 20:41:01 +00:00
Yuchen Liang	11fc1a4c12	fix(test): use layer map dump in `test_readonly_node_gc` to validate layers protected by leases (#9551 ) Fixes #9518. ## Problem After removing the assertion `layers_removed == 0` in #9506, we could miss breakage if we solely rely on the successful execution of the `SELECT` query to check if the lease is properly protecting layers. Details listed in #9518. Also, in integration tests, we sometimes run into the race condition where a getpage request comes before the lease gets renewed (item 2 of #8817), even if compute_ctl sends a lease renewal as soon as it sees a `/configure` API call that updates the `pageserver_connstr`. In this case, we would observe a getpage request error stating that we `tried to request a page version that was garbage collected` (as seen in [Allure Report](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8613/11550393107/index.html#suites/3ccffb1d100105b98aed3dc19b717917/d1a1ba47bc180493)). ## Summary of changes - Use layer map dump to verify that the lease protects what it claims: Record all historical layers that have `start_lsn <= lease_lsn` before and after running timeline gc. This is the same check as `ad79f42460/pageserver/src/tenant/timeline.rs (L5025-L5027)` The set recorded after GC should contain every layer in the set recorded before GC. - Wait until the log contains another successful lease request before running the `SELECT` query after GC. We argued in #8817 that the bad request can only exist within a short period after migration/restart, and our test shows that as long as a lease renewal is done before the first getpage request sent after reconfiguration, we will not have a bad request. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-11-06 20:18:21 +00:00
Tristan Partin	93123f2623	Rename compute_backpressure_throttling_ms to compute_backpressure_throttling_seconds This is in line with the Prometheus guidance[0]. We also haven't started using this metric, so renaming is essentially free. Link: https://prometheus.io/docs/practices/naming/ [0] Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-06 13:28:23 -06:00
Alex Chi Z.	1d3559d4bc	feat(pageserver): add fast path for sparse keyspace read (#9631 ) In https://github.com/neondatabase/neon/issues/9441, the tenant has a lot of aux keys spread in multiple aux files. The perf tool shows that a significant amount of time is spent on remove_overlapping_keys. For sparse keyspaces, we don't need to report missing key errors anyways, and it's very likely that we will need to read all layers intersecting with the key range. Therefore, this patch adds a new fast path for sparse keyspace reads that we do not track `unmapped_keyspace` in a fine-grained way. We only modify it when we find an image layer. In debug mode, it was ~5min to read the aux files for a dump of the tenant, and now it's only 8s, that's a 60x speedup. ## Summary of changes * Do not add sparse keys into `keys_done` so that remove_overlapping does nothing. * Allow `ValueReconstructSituation::Complete` to be updated again in `ValuesReconstructState::update_key` for sparse keyspaces. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-11-06 18:17:02 +00:00
Conrad Ludgate	73bdc9a2d0	[proxy]: minor changes to endpoint-cache handling (#9666 ) I think I meant to make these changes over 6 months ago. alas, better late than never. 1. should_reject doesn't eagerly intern the endpoint string 2. Rate limiter uses a std Mutex instead of a tokio Mutex. 3. Recently I introduced a `-local-proxy` endpoint suffix. I forgot to add this to normalize. 4. Random but a small cleanup making the ControlPlaneEvent deser directly to the interned strings.	2024-11-06 17:40:40 +00:00
John Spray	d182ff294c	storcon: respect tenant scheduling policy in drain/fill (#9657 ) ## Problem Pinning a tenant by setting Pause scheduling policy doesn't work because drain/fill code moves the tenant around during deploys. Closes: https://github.com/neondatabase/neon/issues/9612 ## Summary of changes - In drain, only move a tenant if it is in Active or Essential mode - In fill, only move a tenant if it is in Active mode. The asymmetry is a bit annoying, but it faithfully respects the purposes of the modes: Essential is meant to endeavor to keep the tenant available, which means it needs to be drained but doesn't need to be migrated during fills.	2024-11-06 15:14:43 +00:00
Vlad Lazar	4dfa0c221b	pageserver: ingest pre-serialized batches of values (#9579 ) ## Problem https://github.com/neondatabase/neon/pull/9524 split the decoding and interpretation step from ingestion. The output of the first phase is a `wal_decoder::models::InterpretedWalRecord`. Before this patch set that struct contained a list of `Value` instances. We wish to lift the decoding and interpretation step to the safekeeper, but it would be nice if the safekeeper gave us a batch containing the raw data instead of actual values. ## Summary of changes Main goal here is to make `InterpretedWalRecord` hold a raw buffer which contains pre-serialized Values. For this we do: 1. Add a `SerializedValueBatch` type. This is `inmemory_layer::SerializedBatch` with some extra functionality for extension, observing values for shard 0 and tests. 2. Replace `inmemory_layer::SerializedBatch` with `SerializedValueBatch` 3. Make `DatadirModification` maintain a `SerializedValueBatch`. ### `DatadirModification` changes `DatadirModification` now maintains a `SerializedValueBatch` and extends it as new WAL records come in (to avoid flushing to disk on every record). In turn, this cascaded into a number of modifications to `DatadirModification`: 1. Replace `pending_data_pages` and `pending_zero_data_pages` with `pending_data_batch`. 2. Removal of `pending_zero_data_pages` and its cousin `on_wal_record_end` 3. Rename `pending_bytes` to `pending_metadata_bytes` since this is what it tracks now. 4. Adapting of various utility methods like `len`, `approx_pending_bytes` and `has_dirty_data_pages`. Removal of `pending_zero_data_pages` and the optimisation associated with it ((1) and (2)) deserves more detail. Previously all zero data pages went through `pending_zero_data_pages`. We wrote zero data pages when filling gaps caused by relation extension (case A) and when handling special wal records (case B). 
If it happened that the same WAL record contained a non zero write for an entry in `pending_zero_data_pages` we skipped the zero write. Case A: We handle this differently now. When ingesting the `SerializedValueBatch` associated with one PG WAL record, we identify the gaps and fill them in one go. Essentially, we move from a per-key process (gaps were filled after each new key) to a per-record process. Hence, the optimisation is not required anymore. Case B: When the handling of a special record needs to zero out a key, it just adds that to the current batch. I inspected the code, and I don't think the optimisation kicked in here.	2024-11-06 14:10:32 +00:00
Folke Behrens	bdd492b1d8	proxy: Replace "web(auth)" with "console redirect" everywhere (#9655 )	2024-11-06 11:03:38 +00:00
Folke Behrens	5d8284c7fe	proxy: Read cplane JWT with clap arg (#9654 )	2024-11-06 10:27:55 +00:00
Folke Behrens	ebc43efebc	proxy: Refactor cplane types (#9643 ) The overall idea of the PR is to rename a few types to make their purpose clearer, reduce abstraction where not needed, and move types to better-suited modules.	2024-11-05 23:03:53 +01:00
Folke Behrens	754d2950a3	proxy: Revert ControlPlaneEvent back to struct (#9649 ) Due to neondatabase/cloud#19815 we need to be more tolerant when reading events.	2024-11-05 21:32:33 +00:00
Conrad Ludgate	fcde40d600	[proxy] use the proxy protocol v2 command to silence some logs (#9620 ) The PROXY Protocol V2 offers a "command" concept. It can be of two different values. "Local" and "Proxy". The spec suggests that "Local" be used for health-checks. We can thus use this to silence logging for such health checks such as those from NLB. This additionally refactors the flow to be a bit more type-safe, self documenting and using zerocopy deser.	2024-11-05 17:23:00 +00:00
Erik Grinaker	babfeb70ba	safekeeper: don't allocate send buffers on stack (#9644 ) ## Problem While experimenting with `MAX_SEND_SIZE` for benchmarking, I saw stack overflows when increasing it to 1 MB. Turns out a few buffers of this size are stack-allocated rather than heap-allocated. Even at the default 128 KB size, that's a bit large to allocate on the stack. ## Summary of changes Heap-allocate buffers of size `MAX_SEND_SIZE`.	2024-11-05 17:05:30 +00:00
Ivan Efremov	2f1a56c8f9	proxy: Unify local and remote conn pool client structures (#9604 ) Unify client, EndpointConnPool and DbUserConnPool for remote and local conn. - Use new ClientDataEnum for additional client data. - Add ClientInnerCommon client structure. - Remove Client and EndpointConnPool code from local_conn_pool.rs	2024-11-05 17:33:41 +02:00
John Spray	e30f5fb922	scrubber: remove AWS region assumption, tolerate negative max_project_size (#9636 ) ## Problem First issues noticed when trying to run scrubber find-garbage on Azure: - Azure staging contains projects with -1 set for max_project_size: apparently the control plane treats this as a signed field. - Scrubber code assumed that listing projects should filter to aws-$REGION. This is no longer needed (per comment in the code) because we now hit region-local APIs. This PR doesn't make it work all the way (`init_remote` still assumes S3), but these are necessary precursors. ## Summary of changes - Change max_project_size from unsigned to signed - Remove region filtering in favor of simply using the right region's API (which we already do)	2024-11-05 13:32:50 +00:00
Arpad Müller	70ae8c16da	Construct models::TenantConfig only once (#9630 ) Since 5f83c9290b482dc90006c400dfc68e85a17af785/#1504 we've had duplication in construction of models::TenantConfig, where both constructs contained the same code. This PR removes one of the two locations to avoid the duplication.	2024-11-05 13:02:49 +00:00
Erik Grinaker	8840f3858c	pageserver: return 503 during tenant shutdown (#9635 ) ## Problem Tenant operations may return `409 Conflict` if the tenant is shutting down. This status code is not retried by the control plane, causing user-facing errors during pageserver restarts. Operations should instead return `503 Service Unavailable`, which may be retried for idempotent operations. ## Summary of changes Convert `GetActiveTenantError::WillNotBecomeActive(TenantState::Stopping)` to `ApiError::ShuttingDown` rather than `ApiError::Conflict`. This error is returned by `Tenant::wait_to_become_active` in most (all?) tenant/timeline-related HTTP routes.	2024-11-05 13:16:55 +01:00
Tristan Partin	1e16221f82	Update psycopg2 to latest version for complete PG 17 support Update the types to match. Changes the cursor import to match the C bindings[0]. Link: https://github.com/python/typeshed/issues/12578 [0] Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-04 18:21:59 -06:00
Tristan Partin	34812a6aab	Improve some typing related to performance testing for LR Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-04 15:52:01 -06:00
Arpad Müller	ee68bbf6f5	Add tenant config option to allow timeline_offloading (#9598 ) Allow us to enable timeline offloading for single tenants without having to enable it for the entire pageserver. Part of #8088.	2024-11-04 21:01:18 +01:00
Folke Behrens	1085fe57d3	proxy: Rewrite ControlPlaneEvent as enum (#9627 )	2024-11-04 20:19:26 +01:00
Folke Behrens	59879985b4	proxy: Wrap JWT errors in separate AuthError variant (#9625 ) * Also rename `AuthFailed` variant to `PasswordFailed`. * Before this, all JWT errors ended up in `AuthError::AuthFailed()`, which expects a username and also causes cache invalidation.	2024-11-04 19:56:40 +01:00
Conrad Ludgate	81d1bb1941	quieten aws_config logs (#9626 ) logs during aws authentication are soooo noisy in staging 🙃	2024-11-04 17:28:10 +00:00
Arpad Müller	8e4161eb94	Merge pull request #9617 from neondatabase/rc/2024-11-04 Storage & Compute release 2024-11-04	2024-11-04 17:50:29 +01:00
Christian Schwarz	06113e94e6	fix(test_regress): always use storcon virtual pageserver API to set tenant config (#9622 ) Problem ------- Tests that directly call the Pageserver Management API to set tenant config are flaky if the Pageserver is managed by Storcon because Storcon is the source of truth and may (theoretically) reconcile a tenant at any time. Solution -------- Switch all users of `set_tenant_config`/`patch_tenant_config_client_side` to use the `env.storage_controller.pageserver_api()` Future Work ----------- Prevent regressions from creeping in. And generally clean up tenant configuration. Maybe we can avoid the Pageserver having a default tenant config at all and put the default into Storcon instead? * => https://github.com/neondatabase/neon/issues/9621 Refs ---- fixes https://github.com/neondatabase/neon/issues/9522	2024-11-04 17:42:08 +01:00
Erik Grinaker	0d5a512825	safekeeper: add walreceiver metrics (#9450 ) ## Problem We don't have any observability for Safekeeper WAL receiver queues. ## Summary of changes Adds a few WAL receiver metrics: * `safekeeper_wal_receivers`: gauge of currently connected WAL receivers. * `safekeeper_wal_receiver_queue_depth`: histogram of queue depths per receiver, sampled every 5 seconds. * `safekeeper_wal_receiver_queue_depth_total`: gauge of total queued messages across all receivers. * `safekeeper_wal_receiver_queue_size_total`: gauge of total queued message sizes across all receivers. There are already metrics for ingested WAL volume: `written_wal_bytes` counter per timeline, and `safekeeper_write_wal_bytes` per-request histogram.	2024-11-04 15:22:46 +00:00
Conrad Ludgate	8ad1dbce72	[proxy]: parse proxy protocol TLVs with aws/azure support (#9610 ) AWS/azure private link shares extra information in the "TLV" values of the proxy protocol v2 header. This code doesn't action on it, but it parses it as appropriate.	2024-11-04 14:04:56 +00:00
Conrad Ludgate	3dcdbcc34d	remove aws-lc-rs dep and fix storage_broker tls (#9613 ) It seems the ecosystem is not so keen on moving to aws-lc-rs as its build setup is more complicated than ring (requiring cmake). Eventually I expect the ecosystem should pivot to https://github.com/ctz/graviola/tree/main/rustls-graviola as it stabilises (it has a very simple build step and license), but for now let's try not to have a headache of juggling two crypto libs. I also noticed that tonic will just fail with tls without a default provider, so I added some defensive code for that.	2024-11-04 13:29:13 +00:00
Matthias van de Meent	d5de63c6b8	Fix a time zone issue in a PG17 test case (#9618 ) The commit was cherry-picked and thus shouldn't cause issues once we merge the release tag for PostgreSQL 17.1	2024-11-04 12:10:32 +00:00
John Spray	4534f5cdc6	pageserver: make local timeline deletion infallible (#9594 ) ## Problem In https://github.com/neondatabase/neon/pull/9589, timeline offload code is modified to return an explicit error type rather than propagating anyhow::Error. One of the 'Other' cases there is I/O errors from local timeline deletion, which shouldn't need to exist, because our policy is not to try and continue running if the local disk gives us errors. ## Summary of changes - Make `delete_local_timeline_directory` infallible and use `.fatal_err(` on I/O errors --------- Co-authored-by: Erik Grinaker <erik@neon.tech>	2024-11-04 09:11:52 +00:00
Erik Grinaker	0058eb09df	test_runner/performance: add sharded ingest benchmark (#9591 ) Adds a Python benchmark for sharded ingestion. This ingests 7 GB of WAL (100M rows) into a Safekeeper and fans out to 10 shards running on 10 different pageservers. The ingest volume and duration is recorded.	2024-11-02 16:42:10 +00:00
Konstantin Knizhnik	8ac523d2ee	Do not assign page LSN to new (uninitialized) page in ClearVisibilityMapFlags redo handler (#9287 ) ## Problem https://neondb.slack.com/archives/C04DGM6SMTM/p1727872045252899 See https://github.com/neondatabase/neon/issues/9240 ## Summary of changes Add `!page_is_new` check before assigning page lsn. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-11-01 20:31:29 +02:00
John Spray	3c16bd6e0b	storcon: skip non-active projects in chaos injection (#9606 ) ## Problem We may sometimes use scheduling modes like `Pause` to pin a tenant in its current location for operational reasons. It is undesirable for the chaos task to make any changes to such projects. ## Summary of changes - Add a check for scheduling mode - Add a log line when we do choose to do a chaos action for a tenant: this will help us understand which operations originate from the chaos task.	2024-11-01 16:47:20 +00:00
Erik Grinaker	123816e99a	safekeeper: log slow WalAcceptor sends (#9564 ) ## Problem We don't have any observability into full WalAcceptor queues per timeline. ## Summary of changes Logs a message when a WalAcceptor send has blocked for 5 seconds, and another message when the send completes. This implies that the log frequency is at most once every 5 seconds per timeline, so we don't need further throttling.	2024-11-01 13:47:03 +01:00
Peter Bendel	8b3bcf71ee	revert higher token expiration (#9605 ) ## Problem The IAM role associated with our github action runner supports a max token expiration which is lower than the value we tried. ## Summary of changes Since we believe we have understood the performance regression (by ensuring availability zone affinity of compute and pageserver), the job should again run in under 5 hours, so we revert this change instead of increasing the max session token expiration in the IAM role, which would reduce our security.	2024-11-01 12:46:02 +01:00
Erik Grinaker	4c2c8d6708	test_runner: fix `tenant_get_shards` with one pageserver (#9603 ) ## Problem `tenant_get_shards()` does not work with a sharded tenant on 1 pageserver, as it assumes an unsharded tenant in this case. This special case appears to have been added to handle e.g. `test_emergency_mode`, where the storage controller is stopped. This breaks e.g. the sharded ingest benchmark in #9591 when run with a single shard. ## Summary of changes Correctly look up shards even with a single pageserver, but add a special case that assumes an unsharded tenant if the storage controller is stopped and the caller provides an explicit pageserver, in order to accommodate `test_emergency_mode`.	2024-11-01 11:25:04 +00:00
Conrad Ludgate	2d1366c8ee	fix pre-commit hook with python stubs (#9602 ) fix #9601	2024-11-01 11:22:38 +00:00
Vlad Lazar	e589c2e5ec	storage_controller: allow deployment infra to use infra token (#9596 ) ## Problem We wish for the deployment orchestrator to use infra scoped tokens, but storcon endpoints it's using require admin scoped tokens. ## Summary of Changes Switch over all endpoints that are used by the deployment orchestrator to use an infra scoped token. This causes no breakage during mixed version scenarios because admin scoped tokens allow access to all endpoints. The deployment orchestrator can cut over to the infra token after this commit touches down in prod. Once this commit is released we should also update the tests code to use infra scoped tokens where appropriate. Currently it would fail on the [compat tests](`9761b6a64e/test_runner/regress/test_storage_controller.py (L69-L71)`).	2024-10-31 18:29:16 +00:00
Conrad Ludgate	9761b6a64e	update pg_session_jwt to use pgrx 0.12 for pg17 (#9595 ) Updates the extension to use pgrx 0.12. No changes to the extensions have been made, the only difference is the pgrx version.	2024-10-31 15:50:41 +00:00
Conrad Ludgate	897cffb9d8	auth_broker: fix local_proxy conn count (#9593 ) our current metrics for http pool opened connections is always negative :D oops	2024-10-31 14:57:55 +00:00
John Spray	552088ac16	pageserver: fix spurious error logs in timeline lifecycle (#9589 ) ## Problem The final part of https://github.com/neondatabase/neon/issues/9543 will be a chaos test that creates/deletes/archives/offloads timelines while restarting pageservers and migrating tenants. Developing that test showed up a few places where we log errors during normal shutdown. ## Summary of changes - UninitializedTimeline's drop should log at info severity: this is a normal code path when some part of timeline creation encounters a cancellation `?` path. - When offloading and finding a `RemoteTimelineClient` in a non-initialized state, this is not an error and should not be logged as such. - The `offload_timeline` function returned an anyhow error, so callers couldn't gracefully pick out cancellation errors from real errors: update this to have a structured error type and use it throughout.	2024-10-31 14:44:59 +00:00
Peter Bendel	51fda118f6	increase lifetime of AWS session token to 12 hours (#9590 ) ## Problem clickbench regression causes clickbench to run >9 hours and the AWS session token is expired before the run completes ## Summary of changes extend lifetime of session token for this job to 12 hours	2024-10-31 13:34:50 +00:00
Anastasia Lubennikova	e96398a552	Add support of extensions for v17 (part 4) (#9568 ) - pg_jsonschema 0.3.3 - pg_graphql 1.5.9 - rum 65e0a752 - pg_tiktoken a5bc447e update support of extensions for v14-v16: - pg_jsonschema 0.3.1 -> 0.3.3 - pg_graphql 1.5.7 -> 1.5.9 - rum 6ab37053 -> 65e0a752 - pg_tiktoken e64e55aa -> a5bc447e	2024-10-31 15:05:24 +02:00
Erik Grinaker	f9d8256d55	pageserver: don't return option from `DeletionQueue::new` (#9588 ) `DeletionQueue::new()` always returns deletion workers, so the returned `Option` is redundant.	2024-10-31 10:51:58 +00:00
Vlad Lazar	411c3aa0d6	pageserver: lift decoding and interpreting of wal into wal_decoder (#9524 ) ## Problem Decoding and ingestion are still coupled in `pageserver::WalIngest`. ## Summary of changes A new type is added to `wal_decoder::models`, InterpretedWalRecord. This type contains everything that the pageserver requires in order to ingest a WAL record. The highlights are the `metadata_record` which is an optional special record type to be handled and `blocks` which stores key, value pairs to be persisted to storage. This type is produced by `wal_decoder::models::InterpretedWalRecord::from_bytes` from a raw PG wal record. The rest of this commit separates decoding and interpretation of the PG WAL record from its application in `WalIngest::ingest_record`. Related: https://github.com/neondatabase/neon/issues/9335 Epic: https://github.com/neondatabase/neon/issues/9329	2024-10-31 10:47:43 +00:00
Arpad Müller	65b69392ea	Disallow offloaded children during timeline deletion (#9582 ) If we delete a timeline that has children, those children will have their data corrupted. Therefore, extend the already existing safety check to offloaded timelines as well. Part of #8088	2024-10-30 19:37:09 +01:00
Alex Chi Z.	8d70f88b37	refactor(pageserver): use JSON field encoding for consumption metrics cache (#9470 ) In https://github.com/neondatabase/neon/issues/9032, I would like to eventually add a `generation` field to the consumption metrics cache. The current encoding is not backward compatible and it is hard to add another field into the cache. Therefore, this patch refactors the format to store "field -> value", and it's easier to maintain backward/forward compatibility with this new format. ## Summary of changes * Add `NewRawMetric` as the new format. * Add upgrade path. When opening the disk cache, the codepath first inspects the `version` field, and decides how to decode. * Refactor metrics generation code and tests. * Add tests on upgrade / compatibility with the old format. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-30 18:13:11 +00:00
Arpad Müller	bcfe013094	Don't keep around the timeline's remote_client (#9583 ) Constructing a remote client is no big deal. Yes, it means an extra download from S3 but it's not that expensive. This simplifies code paths and scenarios to test. This unifies timelines that have been recently offloaded with timelines that have been offloaded in an earlier invocation of the process. Part of #8088	2024-10-30 18:44:29 +01:00
Arpad Müller	d0a02f3649	Disallow archived timelines to be detached or reparented (#9578 ) Disallow a request for timeline ancestor detach if either the to be detached timeline, or any of the to be reparented timelines are offloaded or archived. In theory we could support timelines that are archived but not offloaded, but archived timelines are at the risk of being offloaded, so we treat them like offloaded timelines. As for offloaded timelines, any code to "support" them would amount to unoffloading them, at which point we can just demand to have the timelines be unarchived. Part of #8088	2024-10-30 17:04:57 +01:00
Tristan Partin	8af9412eb2	Collect compute backpressure throttling time This will tell us how much time the compute has spent throttled if pageserver/safekeeper cannot keep up with WAL generation. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-30 09:58:29 -05:00
Anastasia Lubennikova	e369c58a3c	Merge pull request #9577 from neondatabase/compute-hotfix-2024-10-30 Compute hotfix release 2024-10-30	2024-10-30 12:25:46 +00:00
Erik Grinaker	96e35e11a6	postgres_ffi: add WAL generator for tests/benchmarks (#9503 ) ## Problem We don't have a convenient way to generate WAL records for benchmarks and tests. ## Summary of changes Adds a WAL generator, exposed as an iterator. It currently only generates logical messages (noops), but will be extended to write actual table rows later. Some existing code for WAL generation has been replaced with this generator, to reduce duplication.	2024-10-30 14:46:39 +03:00
Alexey Kondratov	745061ddf8	chore(compute): Bump pg_mooncake to the latest version (#9576 ) ## Problem There were some critical breaking changes made in the upstream since Oct 29th morning. ## Summary of changes Point it to the topmost commit in the `neon` branch at the time of writing this https://github.com/Mooncake-Labs/pg_mooncake/commits/neon/ `c495cd17d6`	2024-10-30 11:07:02 +01:00
Tristan Partin	0c828c57e2	Remove non-gzipped basebackup code path In July of 2023, Bojan and Chi authored `92aee7e07f`. Our in production pageservers are most definitely at a version where they all support gzipped basebackups.	2024-10-29 23:03:45 -05:00
John Spray	8e2e9f0fed	pageserver: generation-aware storage for TenantManifest (#9555 ) ## Problem When tenant manifest objects are written without a generation suffix, concurrently attached pageservers may stamp on each other's writes of the manifest and cause undefined behavior. Closes: #9543 ## Summary of changes - Use download_generation_object helper when reading manifests, to search for the most recent generation - Use Tenant::generation as the generation suffix when writing manifests.	2024-10-29 23:24:04 +01:00
Alexey Kondratov	237d6ffc02	chore(compute): Bump pg_mooncake to the latest version The topmost commit in the `neon` branch at the time of writing this https://github.com/Mooncake-Labs/pg_mooncake/commits/neon/ `568b5a82b5`	2024-10-29 23:12:30 +01:00
Tristan Partin	b77b9bdc9f	Add tests for sql-exporter metrics Should help us keep non-working metrics from hitting staging or production. Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Fixes: https://github.com/neondatabase/neon/issues/8569 Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-29 15:13:06 -05:00
Alex Chi Z.	81f9aba005	fix(pagectl): layer parsing and image layer dump (#9571 ) This patch contains various improvements for the pagectl tool. ## Summary of changes * Rewrite layer name parsing: LayerName now supports all variants we use now. * Drop pagectl's own layer parsing function, use LayerName in the pageserver crate. * Support image layer dumping in the layer dump command using ImageLayer::dump, drop the original implementation. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-29 15:16:23 -04:00
Anastasia Lubennikova	93f7f1d10f	Merge pull request #9573 from neondatabase/releases/2024-10-29-compute-only-2 Compute release 2024-10-29	2024-10-29 18:53:03 +00:00
Alex Chi Z.	88ff8a7803	feat(pageserver): support partial gc-compaction for lowest retain lsn (#9134 ) part of https://github.com/neondatabase/neon/issues/8921, https://github.com/neondatabase/neon/issues/9114 ## Summary of changes We start the partial compaction implementation with the image layer partial generation. The partial compaction API now takes a key range. We will only generate images for that key range for now, and remove layers fully included in the key range after compaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-10-29 18:25:32 +00:00
Konstantin Knizhnik	0c075fab3a	Add --replica parameter to basebackup (#9553 ) ## Problem See https://github.com/neondatabase/neon/pull/9458 This PR separates PS related changes in #9458 from compute_ctl changes to enforce that PS is deployed before compute. ## Summary of changes This PR adds handling of the `--replica` parameter of basebackup to the pageserver. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-10-29 18:40:10 +02:00
Anastasia Lubennikova	80e1630042	Use pg_mooncake from our fork. (#9565 ) Switch to main repo once https://github.com/Mooncake-Labs/pg_mooncake/pull/3 is merged	2024-10-29 15:57:52 +00:00
Jakub Kołodziejczak	57499640c5	proxy: more granular http status codes for sql-over-http errors (#9549 ) closes #9532	2024-10-29 15:44:45 +00:00
Anastasia Lubennikova	793ad50b7d	fix allow_unstable_extensions GUC - make it USERSET (#9563 ) fix message wording	2024-10-29 14:25:23 +00:00
John Spray	7a1331eee5	pageserver: make concurrent offloaded timeline operations safe wrt manifest uploads (#9557 ) ## Problem Uploads of the tenant manifest could race between different tasks, resulting in unexpected results in remote storage. Closes: https://github.com/neondatabase/neon/issues/9556 ## Summary of changes - Create a central function for uploads that takes a tokio::sync::Mutex - Store the latest upload in that Mutex, so that when there is lots of concurrency (e.g. archive 20 timelines at once) we can coalesce their manifest writes somewhat.	2024-10-29 13:54:48 +00:00
John Spray	4ef74215e1	pageserver: refactor generation-aware loading code into generic (#9545 ) ## Problem Indices used to be the only kind of object where we had to search across generations to find the most recent one. As of https://github.com/neondatabase/neon/issues/9543, manifests will need the same treatment. ## Summary of changes - Refactor download_index_part to a generic download_generation_object function, which will be usable for downloading manifest objects as well.	2024-10-29 13:00:03 +00:00
Conrad Ludgate	d4cbc8cfeb	[auth_broker]: regress test (#9541 ) python based regression test setup for auth_broker. This uses a http mock for cplane as well as the JWKs url. complications: 1. We cannot just use local_proxy binary, as that requires the pg_session_jwt extension which we don't have available in the current test suite 2. We cannot use just any old http mock for local_proxy, as auth_broker requires http2 to local_proxy. As such, I used the h2 library to implement an echo server - copied from the examples in the h2 docs.	2024-10-29 11:39:09 +00:00
Conrad Ludgate	47c35f67c3	[proxy]: fix JWT handling for AWS cognito. (#9536 ) In the base64 payload of an aws cognito jwt, I saw the following: ``` "iss":"https:\/\/cognito-idp.us-west-2.amazonaws.com\/us-west-2_redacted" ``` issuers are supposed to be URLs, and URLs are always valid un-escaped JSON. However, `\/` is a valid escape character so what AWS is doing is technically correct... sigh... This PR refactors the test suite and adds a new regression test for cognito.	2024-10-29 11:01:09 +00:00
Peter Bendel	45b558f480	temporarily increase timeout for clickbench benchmark until regression is resolved (#9554 ) ## Problem The clickbench job in the benchmarking workflow has a performance regression causing it to hit the maximum job run timeout. Suspected root cause: Project has been migrated from single pageserver to storage controller managed project on Oct 14th. Since then the regression shows. ## Summary of changes Increase timeout of pytest to 12 hours. Increase job timeout to 12 hours	2024-10-29 10:53:28 +00:00
Arpad Müller	a73402e646	Offloaded timeline deletion (#9519 ) As pointed out in https://github.com/neondatabase/neon/pull/9489#discussion_r1814699683 , we currently don't support deletion for offloaded timelines after the timeline has been loaded from the manifest instead of having been offloaded. This is because the upload queue hasn't been initialized yet. This PR thus initializes the timeline and shuts it down immediately. Part of #8088	2024-10-29 10:41:53 +00:00
Vlad Lazar	07b974480c	pageserver: move things around to prepare for decoding logic (#9504 ) ## Problem We wish to have high level WAL decoding logic in `wal_decoder::decoder` module. ## Summary of Changes For this we need the `Value` and `NeonWalRecord` types accessible there, so: 1. Move `Value` and `NeonWalRecord` to `pageserver::value` and `pageserver::record` respectively. 2. Get rid of `pageserver::repository` (follow up from (1)) 3. Move PG specific WAL record types to `postgres_ffi::walrecord`. In theory they could live in `wal_decoder`, but it would create a circular dependency between `wal_decoder` and `postgres_ffi`. Long term it makes sense for those types to be PG version specific, so that will work out nicely. 4. Move higher level WAL record types (to be ingested by pageserver) into `wal_decoder::models` Related: https://github.com/neondatabase/neon/issues/9335 Epic: https://github.com/neondatabase/neon/issues/9329	2024-10-29 10:00:34 +00:00
Arpad Müller	62f5d484d9	Assert the tenant to be active in `unoffload_timeline` (#9539 ) Currently, all callers of `unoffload_timeline` ensure that the tenant the unoffload operation is called on is active. We rely on it being active as we activate the timeline below and don't want to race with the activation code of the tenant (in the worst case, activating a timeline twice). Therefore, add this assertion. Part of #8088	2024-10-29 00:36:05 +00:00
Tristan Partin	4df3987054	Get role name when not a C string We will only have a C string if the specified role is a string. Otherwise, we need to resolve references to public, current_role, current_user, and session_user. Fixes: https://github.com/neondatabase/cloud/issues/19323 Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-28 18:21:45 -05:00
Konstantin Knizhnik	0624565617	Create the notion of unstable extensions As a DBaaS provider, Neon needs to provide a stable platform for customers to build applications upon. At the same time however, we also need to enable customers to use the latest and greatest technology, so they can prototype their work, and we can solicit feedback. If all extensions are treated the same in terms of stability, it is hard to meet that goal. There are now two new GUCs created by the Neon extension: neon.allow_unstable_extensions: This is a session GUC which allows a session to install and load unstable extensions. neon.unstable_extensions: This is a comma-separated list of extension names. We can check if a CREATE EXTENSION statement is attempting to install an unstable extension, and if so, deny the request if neon.allow_unstable_extensions is not set to true. Signed-off-by: Tristan Partin <tristan@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-10-28 17:47:15 -05:00
George MacKerron	7d5f6b6a52	Build `pgrag` extensions x3 (#8486 ) Build the pgrag extensions (rag, rag_bge_small_en_v15, and rag_jina_reranker_v1_tiny_en) as part of the compute node Dockerfile. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-10-28 20:06:36 +00:00
Alex Chi Z.	f7c61e856f	fix(pageserver): bump tokio-epoll-uring (#9546 ) Includes https://github.com/neondatabase/tokio-epoll-uring/pull/58 that fixes the clippy error. ## Summary of changes Update the version of tokio-epoll-uring Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-28 20:03:02 +00:00
Alex Chi Z.	57c21aff9f	refactor(pageserver): remove aux v1 configs (#9494 ) ## Problem Part of https://github.com/neondatabase/neon/issues/8623 ## Summary of changes Removed all aux-v1 config processing code. Note that we persisted it into the index part file, so we cannot really remove the field from index part. I also kept the config item within the tenant config, but we will not read it any more. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-28 19:51:14 +00:00
Erik Grinaker	248558dee8	safekeeper: refactor `WalAcceptor` to be event-driven (#9462 ) ## Problem The `WalAcceptor` main loop currently uses two nested loops to consume inbound messages. This makes it hard to slot in periodic events like metrics collection. It also duplicates the event processing code, and assumes all messages in steady state are AppendRequests (other message types may be dropped if following an AppendRequest). ## Summary of changes Refactor the `WalAcceptor` loop to be event driven.	2024-10-28 17:18:37 +00:00
Sergey Melnikov	3bad52543f	We don't have legacy proxies anymore (#9544 ) We don't have legacy scram proxies anymore: cc: https://github.com/neondatabase/cloud/issues/9745	2024-10-28 16:42:35 +00:00
Tristan Partin	3d64a7ddcd	Add pg_mooncake to compute-node.Dockerfile Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-28 11:23:30 -05:00
Conrad Ludgate	25f1e5cfeb	[proxy] demote warnings and remove dead-argument (#9512 ) fixes https://github.com/neondatabase/cloud/issues/19000	2024-10-28 15:02:20 +00:00
Rahul Patil	8dd555d396	ci(proxy): Update GH action flag on proxy deployment (#9535 ) ## Problem Based on a recent proxy deployment issue, we deployed another proxy version (proxy-scram), which was not needed when deploying a specific proxy type. We have a [PR](https://github.com/neondatabase/infra/pull/2142) to update the infra branch, and we need to update the CI in this repo that triggers proxy deployment. ## Summary of changes - Update proxy deployment flag	2024-10-28 13:17:09 +01:00
Arthur Petukhovsky	01b6843e12	Route pgbouncer logs to virtio-serial (#9488 ) virtio-serial is much more performant than /dev/console emulation, therefore, is much more suitable for the verbose logs inside vm. This commit changes routing for pgbouncer logs, since we've recently noticed it can emit large volumes of logs. Manually tested on staging by pinning a compute image to my test project. Should help with https://github.com/neondatabase/cloud/issues/19072	2024-10-28 12:09:47 +00:00
John Spray	93987b5a4a	tests: add test_storage_controller_onboard_detached (#9431 ) ## Problem We haven't historically taken this API route where we would onboard a tenant to the controller in detached state. It worked, but we didn't have test coverage. ## Summary of changes - Add a test that onboards a tenant to the storage controller in Detached mode, and checks that deleting it without attaching it works as expected.	2024-10-28 11:11:12 +00:00
John Spray	33baca07b6	storcon: add an API to cancel ongoing reconciler (#9520 ) ## Problem If something goes wrong with a live migration, we currently only have awkward ways to interrupt that: - Restart the storage controller - Ask it to do some other modification/migration on the shard, which we don't really want. ## Summary of changes - Add a new `/cancel` control API, and storcon_cli wrapper for it, which fires the Reconciler's cancellation token. This is just for on-call use and we do not expect it to be used by any other services.	2024-10-28 09:26:01 +00:00
John Spray	923974d4da	safekeeper: don't un-evict timelines during snapshot API handler (#9428 ) ## Problem When we use pull_timeline API on an evicted timeline, it gets downloaded to serve the snapshot API request. That means that to evacuate all the timelines from a node, the node needs enough disk space to download partial segments from all timelines, which may not be physically the case. Closes: #8833 ## Summary of changes - Add a "try" variant of acquiring a residence guard, that returns None if the timeline is offloaded - During snapshot API handler, take a different code path if the timeline isn't resident, where we just read the checkpoint and don't try to read any segments.	2024-10-28 08:47:12 +00:00
Arpad Müller	e7277885b3	Don't consider archived timelines for synthetic size calculation (#9497 ) Archived timelines should not count towards synthetic size. Closes #9384. Part of #8088.	2024-10-26 13:27:57 +00:00
dependabot[bot]	80262e724f	build(deps): bump werkzeug from 3.0.3 to 3.0.6 (#9527 )	2024-10-26 08:24:15 +01:00
Yuchen Liang	cf8646da19	Merge pull request #9528 from neondatabase/rc/2024-10-25 Storage & Compute release 2024-10-25	2024-10-25 16:49:34 -04:00
Yuchen Liang	46e9a472d7	Merge branch 'release' into rc/2024-10-25	2024-10-25 16:41:06 -04:00
Yuchen Liang	85b954f449	pageserver: add tokio-epoll-uring slots waiters queue depth metrics (#9482 ) In complement to https://github.com/neondatabase/tokio-epoll-uring/pull/56. ## Problem We want to make tokio-epoll-uring slots waiters queue depth observable via Prometheus. ## Summary of changes - Add `pageserver_tokio_epoll_uring_slots_submission_queue_depth` metrics as a `Histogram`. - Each thread-local tokio-epoll-uring system is given a `LocalHistogram` to observe the metrics. - Keep a list of `Arc<ThreadLocalMetrics>` used on-demand to flush data to the shared histogram. - Extend `Collector::collect` to report `pageserver_tokio_epoll_uring_slots_submission_queue_depth`. Signed-off-by: Yuchen Liang <yuchen@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-10-25 21:30:57 +01:00
Arpad Müller	76328ada05	Fix unoffload_timeline races with creation (#9525 ) This PR does two things: 1. Obtain a `TimelineCreateGuard` object in `unoffload_timeline`. This prevents two unoffload tasks from racing with each other. While they already obtain locks for `timelines` and `offloaded_timelines`, they aren't sufficient, as we have already constructed an entire timeline at that point. We shouldn't ever have two `Timeline` objects in the same process at the same time. 2. Don't allow timeline creations for timelines that have been offloaded. Obviously they already exist, so we should not allow creation. The previous logic only looked at the timelines list. Part of #8088	2024-10-25 20:06:27 +00:00
Erik Grinaker	b54b632c6a	safekeeper: don't pass conf into storage constructors (#9523 ) ## Problem The storage components take an entire `SafekeeperConf` during construction, but only actually use the `no_sync` field. This makes it hard to understand the storage inputs (which fields do they actually care about?), and is also inconvenient for tests and benchmarks that need to set up a lot of unnecessary boilerplate. ## Summary of changes * Don't take the entire config, but pass in the `no_sync` field explicitly. * Take the timeline dir instead of `ttid` as an input, since it's the only thing it cares about. * Fix a couple of tests to not leak tempdirs. * Various minor tweaks.	2024-10-25 18:19:52 +01:00
Erik Grinaker	9909551f47	safekeeper: fix version in `TimelinePersistentState::empty()` (#9521 ) ## Problem The Postgres version in `TimelinePersistentState::empty()` is incorrect: the major version should be multiplied by 10000. ## Summary of changes Multiply the version by 10000.	2024-10-25 16:22:35 +01:00
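For reference, the arithmetic behind the fix above: PostgreSQL 10 and later encode the numeric server version as the major version times 10000 plus the minor version. A minimal sketch follows; the function name is hypothetical and not taken from the safekeeper code.

```python
def pg_version_num(major: int, minor: int = 0) -> int:
    """Numeric PostgreSQL version for major >= 10: major * 10000 + minor."""
    return major * 10000 + minor
```

For example, major version 17 with no minor component gives 170000, and 16.4 gives 160004.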
Arseny Sher	700b102b0f	safekeeper: retry eviction. (#9485 ) Without this, the manager may sleep forever after an eviction failure, without retrying.	2024-10-25 17:48:29 +03:00
Conrad Ludgate	dbadb0f9bb	proxy: propagate session IDs (#9509 ) fixes #9367 by sending session IDs to local_proxy, and also returns session IDs to the client for easier debugging.	2024-10-25 14:34:19 +00:00
John Spray	8297f7a181	pageserver: fix N^2 I/O when processing relation drops in transaction abort (#9507 ) ## Problem We have some known N^2 behaviors when it comes to large relation counts, due to the monolithic encoding and full rewrites of RelDirectory each time a relation is added. Ordinarily our backpressure mechanisms give "slow but steady" performance when creating/dropping/truncating relations. However, in the case of a transaction abort, it is possible for a single WAL record to drop an unbounded number of relations. This results in an unavailable compute, as when it sends one of these records, it can stall the pageserver's ingest for many minutes, even though the compute only sent a small amount of WAL. Closes https://github.com/neondatabase/neon/issues/9505 ## Summary of changes - Rewrite relation-dropping code to do one read/modify/write cycle of RelDirectory, instead of doing it separately for each relation in a loop. - Add a test for the bug scenario encountered: `test_tx_abort_with_many_relations` The test has ~40s runtime on my workstation. About 1 second of that is the part where we wait for ingest to catch up after a rollback, the rest is the slowness of creating and truncating a large number of relations. --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-10-25 15:09:02 +01:00
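The difference between the two strategies in the commit above can be illustrated with a toy model. This is a sketch under stated assumptions: `CountingStore` is a hypothetical stand-in for a store that holds one monolithic directory value and counts full rewrites, and it does not reflect the pageserver's real types.

```python
class CountingStore:
    """Hypothetical store holding one monolithic relation directory."""

    def __init__(self, rels):
        self._encoded = frozenset(rels)
        self.writes = 0

    def read_dir(self):
        # Decoding yields a fresh mutable copy of the whole directory.
        return set(self._encoded)

    def write_dir(self, rels):
        # Every write re-encodes the entire directory.
        self.writes += 1
        self._encoded = frozenset(rels)


def drop_one_by_one(store, to_drop):
    # One read/modify/write cycle per relation: N full rewrites.
    for rel in to_drop:
        directory = store.read_dir()
        directory.discard(rel)
        store.write_dir(directory)


def drop_batched(store, to_drop):
    # A single read/modify/write cycle covering every dropped relation.
    directory = store.read_dir()
    directory.difference_update(to_drop)
    store.write_dir(directory)
```

Dropping 500 of 1000 relations costs 500 full rewrites in the first variant and one in the second, with the same final directory contents.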
Christian Schwarz	2090e928d1	refactor(timeline creation): idempotency checking (#9501 ) # Context In the PGDATA import code (https://github.com/neondatabase/neon/pull/9218) I add a third way to create timelines, namely, by importing from a copy of a vanilla PGDATA directory in object storage. For idempotency, I'm using the PGDATA object storage location specification, which is stored in the IndexPart for the entire lifespan of the timeline. When loading the timeline from remote storage, that value gets stored inside `struct Timeline` and timeline creation compares the creation argument with that value to determine idempotency of the request. # Changes This PR refactors the existing idempotency handling of Timeline bootstrap and branching such that we simply compare the `CreateTimelineIdempotency` struct, using the derive-generated `PartialEq` implementation. Also, by spelling idempotency out in the type names, I find it adds a lot of clarity. The pathway to idempotency via requester-provided idempotency key also becomes very straight-forward, if we ever want to do this in the future. # Refs * platform context: https://github.com/neondatabase/neon/pull/9218 * product context: https://github.com/neondatabase/cloud/issues/17507 * stacks on top of https://github.com/neondatabase/neon/pull/9366	2024-10-25 14:44:20 +01:00
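The idea of deciding idempotency by structural equality can be sketched as follows. The commit above relies on Rust's derive-generated `PartialEq`; this Python analogue uses dataclass equality instead, and every class and field name here is hypothetical rather than taken from the pageserver.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bootstrap:
    # Hypothetical field: parameters that define a bootstrap request.
    pg_version: int


@dataclass(frozen=True)
class Branch:
    # Hypothetical fields: parameters that define a branching request.
    ancestor_timeline_id: str
    ancestor_start_lsn: Optional[int]


def check_retry(existing, requested) -> str:
    """Compare the stored creation parameters with those of a new request."""
    # Dataclass equality compares the type and all fields, which is the
    # analogue of a derived PartialEq on an enum of creation variants.
    return "idempotent" if existing == requested else "conflict"
```

A retried request with identical parameters is treated as idempotent, while the same timeline ID with different parameters, or a different creation variant, is a conflict.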
Tristan Partin	05eff3a67e	Move logical replication slot monitor neon.c is getting crowded and the logical replication slot monitor is a good candidate for reorganization. It is very self-contained, and being in a separate file will make it that much easier to find. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-25 08:41:44 -05:00
Arseny Sher	c6cf5e7c0f	Make test_pageserver_lsn_wait_error_safekeeper_stop less aggressive. (#9517 ) Previously it inserted ~150MiB of WAL while expecting page fetching to work in 1s (wait_lsn_timeout=1s). It failed in CI in debug builds. Instead, just directly wait for the wanted condition, i.e. needed safekeepers are reported in pageserver timed out waiting for WAL error message. Also set NEON_COMPUTE_TESTING_BASEBACKUP_RETRIES to 1 in this test and neighbour one, it reduces execution time from 2.5m to ~10s.	2024-10-25 14:13:46 +01:00
Christian Schwarz	e0c7f1ce15	remote_storage(local_fs): return correct file sizes (#9511 ) ## Problem `local_fs` doesn't return file sizes, which I need in PGDATA import (#9218) ## Solution Include file sizes in the result. I would have liked to add a unit test, and started doing that in * https://github.com/neondatabase/neon/pull/9510 by extending the common object storage tests (`libs/remote_storage/tests/common/tests.rs`) to check for sizes as well. But it turns out that localfs is not even covered by the common object storage tests and upon closer inspection, it seems that this area needs more attention. => punt the effort into https://github.com/neondatabase/neon/pull/9510	2024-10-25 12:20:53 +00:00
Christian Schwarz	6f5c262684	pageserver: add testing API to scan layers for disposable keys (#9393 ) This PR adds a pageserver mgmt API to scan a layer file for disposable keys. It hooks it up to the sharding compaction test, demonstrating that we're not filtering out all disposable keys. This is extracted from PGDATA import (https://github.com/neondatabase/neon/pull/9218) where I do the filtering of layer files based on `is_key_disposable`.	2024-10-25 14:16:45 +02:00
Jakub Kołodziejczak	9768f09f6b	proxy: don't follow redirects for user provided JWKS urls + set custom user agent (#9514 ) partially fixes https://github.com/neondatabase/cloud/issues/19249 ref https://docs.rs/reqwest/latest/reqwest/redirect/index.html > By default, a Client will automatically handle HTTP redirects, having a maximum redirect chain of 10 hops. To customize this behavior, a redirect::Policy can be used with a ClientBuilder.	2024-10-25 14:04:41 +02:00
Yuchen Liang	db900ae9d0	fix(test): remove too strict layers_removed==0 check in test_readonly_node_gc (#9506 ) Fixes #9098 ## Problem `test_readonly_node_gc` is flaky. As shown in [Allure Report](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9469/11444519440/index.html#suites/3ccffb1d100105b98aed3dc19b717917/2c02073738fa2b39), we would get an `AssertionError: No layers should be removed, old layers are guarded by leases.` after the test restarts or reconfigures pageservers. During the investigation, we found that the removed layers have an LSN (`0/1563088`) greater than the LSN (`0x1562000`) protected by the lease. For instance: Layers removed: `000000067F00000005000034540100000000-000000067F00000005000040050100000000__0000000001563088-00000001` (shard 0002), `000000068000000000000017E20000000001-010000000100000001000000000000000001__0000000001563088-00000001` (shard 0002). Lsn Lease Granted: `handle_make_lsn_lease{lsn=0/1562000 shard_id=0002 shard_id=0002}: lease created, valid until 2024-10-21`. This means that these layers are not guarded by the leases: they are in the "future", not visible to the static endpoint. ## Summary of changes - Remove the assertion layers_removed == 0 after triggering timeline GC while holding the lease. Instead rely on the successful execution of the `SELECT` query to test lease validity. - Improve test logging Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-10-25 12:50:47 +01:00
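The LSN comparison in the commit above can be checked mechanically. PostgreSQL prints an LSN as two hexadecimal halves separated by a slash, with the high 32 bits first. A minimal sketch, with a hypothetical helper name:

```python
def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL-style 'hi/lo' hexadecimal LSN into an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)
```

With this helper, `parse_lsn("0/1563088") > parse_lsn("0/1562000")` holds, which matches the observation that the removed layers lie beyond the leased LSN.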
Arpad Müller	4d9036bf1f	Support offloaded timelines during shard split (#9489 ) Before, we didn't copy over the `index-part.json` of offloaded timelines to the new shard's location, resulting in the new shard not knowing the timeline even exists. In #9444, we copy over the manifest, but we also need to do this for `index-part.json`. As the operations to do are mostly the same between offloaded and non-offloaded timelines, we can iterate over all of them in the same loop, after the introduction of a `TimelineOrOffloadedArcRef` type to generalize over the two cases. This is analogous to the deletion code added in #8907. The added test also ensures that the sharded archival config endpoint works, something that has not yet been ensured by tests. Part of #8088	2024-10-25 12:32:46 +02:00
Vlad Lazar	b3bedda6fd	pageserver/walingest: log on gappy rel extend (#9502 ) ## Problem https://github.com/neondatabase/neon/pull/9492 added a metric to track the total count of block gaps filled on rel extend. More context is needed to understand when this happens. The current theory is that it may only happen on pg 14 and pg 15 since they do not WAL log relation extends. ## Summary of Changes A rate limited log is added.	2024-10-25 11:15:53 +01:00
Christian Schwarz	b782b11b33	refactor(timeline creation): represent bootstrap vs branch using enum (#9366 ) # Problem Timeline creation can either be bootstrap or branch. The distinction is made based on whether the `ancestor_` fields are present or not. In the PGDATA import code (https://github.com/neondatabase/neon/pull/9218), I add a third variant to timeline creation. # Solution The above pushed me to refactor the code in Pageserver to distinguish the different creation requests through enum variants. There is no externally observable effect from this change. On the implementation level, a notable change is that the acquisition of the `TimelineCreationGuard` happens later than before. This is necessary so that we have everything in place to construct the `CreateTimelineIdempotency`. Notably, this moves the acquisition of the creation guard _after_ the acquisition of the `gc_cs` lock in the case of branching. This might appear as if we're at risk of holding `gc_cs` longer than before this PR, but, even before this PR, we were holding `gc_cs` until after the `wait_completion()` that makes the timeline creation durable in S3 returns. I don't see any deadlock risk with reversing the lock acquisition order. As a drive-by change, I found that the `create_timeline()` function in `neon_local` is unused, so I removed it. # Refs platform context: https://github.com/neondatabase/neon/pull/9218 * product context: https://github.com/neondatabase/cloud/issues/17507 * next PR stacked atop this one: https://github.com/neondatabase/neon/pull/9501	2024-10-25 10:04:27 +00:00
Vlad Lazar	5069123b6d	pageserver: refactor ingest inplace to decouple decoding and handling (#9472 ) ## Problem WAL ingest couples decoding of special records with their handling (updates to the storage engine mostly). This is a roadblock for our plan to move WAL filtering (and implicitly decoding) to safekeepers since they cannot do writes to the storage engine. ## Summary of changes This PR decouples the decoding of the special WAL records from their application. The changes are done in place and I've done my best to refrain from refactorings and attempted to preserve the original code as much as possible. Related: https://github.com/neondatabase/neon/issues/9335 Epic: https://github.com/neondatabase/neon/issues/9329	2024-10-24 17:12:47 +01:00
Alex Chi Z.	fb0406e9d2	refactor(pageserver): refactor split writers using batch layer writer (#9493 ) part of https://github.com/neondatabase/neon/issues/9114, https://github.com/neondatabase/neon/issues/8836, https://github.com/neondatabase/neon/issues/8362 The split layer writer code can be used in a more general way: the caller puts unfinished writers into the batch layer writer and lets the batch layer writer ensure the atomicity of the layers produced. ## Summary of changes * Add batch layer writer, which atomically finishes the layers. `BatchLayerWriter::finish` is simply a copy-paste from previous split layer writers. * Refactor split writers to use the batch layer writer. * The current split writer tests cover all code paths of the batch layer writer. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-24 10:49:54 -04:00
Alexander Bayandin	b8a311131e	CI: remove `git config --add safe.directory` hack (#9391 ) ## Problem We have `git config --global --add safe.directory ...` leftovers from the past, but `actions/checkout` does it by default (since v3.0.2, we use v4) ## Summary of changes - Remove `git config --global --add safe.directory ...` hack	2024-10-24 15:49:26 +01:00
John Spray	d589498c6f	storcon: respect Reconciler::cancel during await_lsn (#9486 ) ## Problem When a pageserver is misbehaving (e.g. we hit an ingest bug or something is pathologically slow), the storage controller could get stuck in the part of live migration that waits for LSNs to catch up. This is a problem, because it can prevent us migrating the troublesome tenant to another pageserver. Closes: https://github.com/neondatabase/cloud/issues/19169 ## Summary of changes - Respect Reconciler::cancel during await_lsn.	2024-10-24 15:23:09 +01:00
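The pattern of respecting a cancellation signal while waiting for an LSN to catch up can be sketched with `asyncio`. This is an illustration only: the storage controller is written in Rust and uses a cancellation token, whereas the names `await_lsn`, `get_lsn` and `poll_interval` below are hypothetical and an `asyncio.Event` stands in for the token.

```python
import asyncio


async def await_lsn(get_lsn, target, cancel: asyncio.Event, poll_interval: float = 0.01) -> bool:
    """Wait until get_lsn() reaches target; return False if cancelled first."""
    while get_lsn() < target:
        try:
            # Sleep for one poll interval, but wake up early on cancellation.
            await asyncio.wait_for(cancel.wait(), timeout=poll_interval)
            return False  # the cancellation event fired
        except asyncio.TimeoutError:
            pass  # interval elapsed without cancellation: poll again
    return True
```

The key point is that the wait is never a bare sleep: each pause races against the cancellation signal, so a stuck pageserver cannot block the caller indefinitely.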
Christian Schwarz	6f34f97573	refactor(pageserver(load_remote_timeline)) remove dead code handling absence of IndexPart (#9408 ) The code is dead at runtime since we're nowadays always running with remote storage and treat it as the source of truth during attach. Clean it up as a preliminary to https://github.com/neondatabase/neon/pull/9218. Related: https://github.com/neondatabase/neon/pull/9366	2024-10-24 09:00:22 +01:00
Tristan Partin	b86432c29e	Fix buggy sizeof A sizeof on a pointer on a 64 bit machine is 8 bytes whereas Entry::old_name is a 64 byte array of characters. There was most likely no fallout since the string would start with NUL bytes, but best to fix nonetheless. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-23 21:52:22 -06:00
Vlad Lazar	ac1205c14c	pageserver: add metric for number of zeroed pages on rel extend (#9492 ) ## Problem Filling the gap in with zeroes is annoying for sharded ingest. We are not sure it even happens in reality. ## Summary of Changes Add one global counter which tracks how many such gap blocks we filled on relation extends. We can add more metrics once we understand the scope.	2024-10-23 19:58:28 +01:00
John Spray	e3ff87ce3b	tests: avoid using background_process when invoking pg_ctl (#9469 ) ## Problem Occasionally, we get failures to start the storage controller's db with errors like: ``` aborting due to panic at /__w/neon/neon/control_plane/src/background_process.rs:349:67: claim pid file: lock file Caused by: file is already locked ``` e.g. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9428/11380574562/index.html#/testresult/1c68d413ea9ecd4a This is happening in a stop,start cycle during a test. Presumably the pidfile from the startup background process is still held at the point we stop, because we let pg_ctl keep running in the background. ## Summary of changes - Refactor pg_ctl invocations into a helper - In the controller's `start` function, use pg_ctl & a wait loop for pg_isready, instead of using background_process --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-10-23 16:29:55 +00:00
Tristan Partin	0595320c87	Protect call to pg_current_wal_lsn() in retained_wal query We can't call pg_current_wal_lsn() if we are a standby instance (read replica). Any attempt to call this function while in recovery results in: ERROR: recovery is in progress Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-23 09:55:00 -06:00
Folke Behrens	92d5e0e87a	proxy: clear lib.rs of code items (#9479 ) We keep lib.rs for crate configs, lint configs and re-exports for the binaries.	2024-10-23 08:21:28 +02:00
Arpad Müller	3a3bd34a28	Rename IndexPart::{from_s3_bytes,to_s3_bytes} (#9481 ) We support multiple storage backends now, so remove the `_s3_` from the name. Analogous to the names adopted for tenant manifests added in #9444.	2024-10-23 00:34:24 +02:00
Alex Chi Z.	64949a37a9	fix(pageserver): make delta split layer writer finish atomic (#9048 ) similar to https://github.com/neondatabase/neon/pull/8841, we make the delta layer writer atomic when finishing the layers. ## Summary of changes * `put_value` not taking discard fn anymore * `finish` decides what layers to keep --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-22 22:06:21 +00:00
Arpad Müller	6f8fcdf9ea	Timeline offloading persistence (#9444 ) Persist timeline offloaded state to S3. Right now, as of #8907, at each restart of the pageserver, all offloaded state is lost, so we load the full timeline again. As it starts with an empty local directory, we might potentially download some files again, leading to downloads that are ultimately wasteful. This patch adds support for persisting the offloaded state, allowing us to never load offloaded timelines in the first place. The persistence feature is facilitated via a new file in S3 that is tenant-global, which contains a list of all offloaded timelines. It is updated each time we offload or unoffload a timeline, and otherwise never touched. This choice means that tenants where no offloading is happening will not immediately get a manifest, keeping the change very minimal at the start. We leave generation support for future work. It is important to support generations, as in the worst case, the manifest might be overwritten by an older generation after a timeline has been unoffloaded (and unarchived), so the next pageserver process instantiation might wrongly believe that some timeline is still offloaded even though it should be active. Part of #9386, #8088	2024-10-22 20:52:30 +00:00
Tristan Partin	fcb55a2aa2	Fix copy-paste error in checkpoints_timed metric Importing the wrong metric. Sigh... Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-22 14:34:26 -06:00
a-masterov	f36cf3f885	Fix local errors for the tests with the versions mix (#9477 ) ## Problem If the environment variables `COMPATIBILITY_NEON_BIN` or `COMPATIBILITY_POSTGRES_DISTRIB_DIR` are not set (this is usual during a local run), the tests with the versions mix cannot run. ## Summary of changes If these variables are not set turn off the version mix. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-10-22 21:58:55 +02:00
John Spray	8dca188974	storage controller: add metrics for tenant shard, node count (#9475 ) ## Problem Previously, figuring out how many tenant shards were managed by a storage controller was typically done by peeking at the database or calling into the API. A metric makes it easier to monitor, as unexpectedly increasing shard counts can be indicative of problems elsewhere in the system. ## Summary of changes - Add metrics `storage_controller_pageserver_nodes` (updated on node CRUD operations from Service) and `storage_controller_tenant_shards` (updated RAII-style from TenantShard)	2024-10-22 19:43:02 +01:00
Tristan Partin	b7fa93f6b7	Use make's builtin RM variable At least as far as removing individual files goes, this is the best pattern for removing. I can't say the same for removing directories, but I went ahead and changed those to `$(RM) -r` anyway. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-22 09:14:29 -06:00
Alexey Kondratov	c4e5693145	Merge pull request #9476 from neondatabase/tristan957/auth Compute release 2024-10-22	2024-10-22 12:07:19 +02:00
Arseny Sher	1e8e04bb2c	safekeeper: refactor timeline initialization (#9362 ) Always do timeline init through atomic rename of temp directory. Add GlobalTimelines::load_temp_timeline which does this, and use it from both pull_timeline and basic timeline creation. Fixes a collection of issues: - previously timeline creation didn't really flush cfile to disk due to the 'nothing to do if state didn't change' check; - even if it did, without tmp dir it is possible to lose the cfile but leave timeline dir in place, making it look corrupted; - tenant directory creation fsync was missing in timeline creation; - pull_timeline is now protected from running concurrently with both itself and timeline creation; - the global timelines map now has a special CreationInProgress entry type which prevents anyone from getting access to a timeline while it is being created (previously one could get access to it, but it was locked during creation, which is valid but confusing if creation failed). fixes #8927	2024-10-22 07:11:36 +01:00
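The "initialize in a temp directory, then rename atomically" technique from the commit above can be sketched as follows. This assumes a POSIX filesystem (directory fsync is not available on Windows); the function names and the `control` file name used in the usage note are hypothetical and do not come from the safekeeper code.

```python
import os
import tempfile


def _fsync_dir(path: str) -> None:
    # Flush directory metadata (new entries, renames) to disk. POSIX only.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def create_dir_atomically(parent: str, name: str, files: dict) -> str:
    """Create parent/name with the given files so it appears all at once."""
    final_path = os.path.join(parent, name)
    # Build the full contents in a temporary sibling directory first.
    tmp_path = tempfile.mkdtemp(prefix=name + ".tmp.", dir=parent)
    for fname, data in files.items():
        with open(os.path.join(tmp_path, fname), "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    _fsync_dir(tmp_path)
    # The rename publishes the directory in one step: readers see either
    # nothing or the complete directory, never a half-written one.
    os.rename(tmp_path, final_path)
    _fsync_dir(parent)
    return final_path
```

A call such as `create_dir_atomically(tenant_dir, timeline_id, {"control": state_bytes})` leaves no window in which the timeline directory exists without its control file, which is the corruption-like state the commit message describes.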
David Gomes	2b3cc87a2a	chore(compute): bumps pg_session_jwt to latest version (#9474 )	2024-10-21 18:17:38 -06:00
David Gomes	94369af782	chore(compute): bumps pg_session_jwt to latest version (#9474 )	2024-10-21 23:39:30 +00:00
Arpad Müller	34b6bd416a	offloaded timeline list API (#9461 ) Add a way to list the offloaded timelines. Before, one had to look at logs to figure out if a timeline has been offloaded or not, or use the non-presence of a certain timeline in the list of normal timelines. Now, one can list them directly. Part of #8088	2024-10-21 16:33:05 +01:00
Yuchen Liang	49d5e56c08	pageserver: use direct IO for delta and image layer reads (#9326 ) Part of #8130 ## Problem Pageserver previously went through the kernel page cache for all IOs. The kernel page cache makes a lightly loaded pageserver appear deceptively fast. Using direct IO would offer predictable latencies for our virtual file IO operations. In particular for reads, the data pages also have an extremely low temporal locality because the most frequently accessed pages are cached on the compute side. ## Summary of changes This PR enables pageserver to use direct IO for delta layer and image layer reads. We can ship them separately because these layers are write-once, read-many, so we will not be mixing buffered IO with direct IO. - implement `IoBufferMut`, a buffer type with aligned allocation (currently set to 512). - use `IoBufferMut` at all places we are doing reads on image + delta layers. - leverage Rust type system and use `IoBufAlignedMut` marker trait to guarantee that the input buffers for the IO operations are aligned. - page cache allocation is also made aligned. _* in-memory layer reads and the write path will be shipped separately._ ## Testing Integration test suite run with O_DIRECT enabled: https://github.com/neondatabase/neon/pull/9350 ## Performance We evaluated performance based on the `get-page-at-latest-lsn` benchmark. The results demonstrate a decrease in the number of IOps, no significant change in the latency mean, and a slight improvement in the p99.9 and p99.99 latencies. [Benchmark](https://www.notion.so/neondatabase/Benchmark-O_DIRECT-for-image-and-delta-layers-2024-10-01-112f189e00478092a195ea5a0137e706?pvs=4) ## Rollout We will add `virtual_file_io_mode=direct` region by region to enable direct IO on image + delta layers. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-10-21 11:01:25 -04:00
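Direct IO generally requires offsets and lengths to be multiples of an alignment, which is why the commit above introduces an aligned buffer type. The arithmetic involved can be sketched as below; the helper names are hypothetical, and 512 is simply the alignment value quoted in the commit message.

```python
ALIGNMENT = 512  # alignment value quoted in the commit message


def align_up(n: int, alignment: int = ALIGNMENT) -> int:
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def aligned_read_range(offset: int, length: int, alignment: int = ALIGNMENT):
    """Widen [offset, offset + length) to aligned boundaries.

    Returns (aligned_offset, aligned_length); the caller then slices the
    requested bytes out of the larger aligned read.
    """
    start = offset - offset % alignment
    end = align_up(offset + length, alignment)
    return start, end - start
```

For example, a 100-byte read at offset 700 widens to a 512-byte read at offset 512, and the requested bytes sit at positions 188 to 288 of that buffer.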
Alex Chi Z.	aca81f5fa4	fix(pageserver): make image split layer writer finish atomic (#8841 ) Part of https://github.com/neondatabase/neon/issues/8836 ## Summary of changes This pull request makes the image layer split writer atomic when finishing the layers. All the produced layers either finish at the same time, or are discarded at the same time. Note that this does not guarantee atomicity across a crash, but any leftovers will be cleaned up on pageserver restart. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-10-21 15:59:48 +01:00
Ivan Efremov	2dcac94194	proxy: Use common error interface for error handling with cplane (#9454 ) - Remove obsolete error handles. - Use one source of truth for cplane errors. #18468	2024-10-21 17:20:09 +03:00
Ivan Efremov	ababa50cce	Use '-f' for make clean in Makefile compute (#9464 ) Use '-f' instead of '--force' because it is impossible to clean the targets on MacOS	2024-10-21 16:20:39 +03:00
Alexander Bayandin	163beaf9ad	CI: use build-tools on Debian 12 whenever we use Neon artifact (#9463 ) ## Problem ``` + /tmp/neon/pg_install/v16/bin/psql '***' -c 'SELECT version()' /tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /tmp/neon/pg_install/v16/bin/psql) /tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/neon/pg_install/v16/bin/psql) /tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5) /tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5) /tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5) ``` ## Summary of changes - Use `build-tools:pinned-bookworm` whenever we download Neon artefact	2024-10-21 12:14:19 +01:00
Alexander Bayandin	5b37485c99	Rename dockerfiles from `Dockerfile.<something>` to `<something>.Dockerfile` (#9446 ) ## Problem Our dockerfiles, for some historical reason, have unconventional names `Dockerfile.<something>`, and some tools (like GitHub UI) fail to highlight the syntax in them. > Some projects may need distinct Dockerfiles for specific purposes. A common convention is to name these `<something>.Dockerfile` From: https://docs.docker.com/build/concepts/dockerfile/#filename ## Summary of changes - Rename `Dockerfile.build-tools` -> `build-tools.Dockerfile` - Rename `compute/Dockerfile.compute-node` -> `compute/compute-node.Dockerfile`	2024-10-21 09:51:12 +01:00
Folke Behrens	ed958da38a	proxy: Make tests fail fast when test proxy exited early (#9432 ) This currently happens when proxy is not compiled with feature `testing`. Also fix an adjacent function.	2024-10-21 08:29:23 +00:00
Alexey Kondratov	fe1b181fb1	Merge pull request #9459 from neondatabase/compute-rc-2024-10-20 Compute release 2024-10-20	2024-10-20 16:12:37 +02:00
Conrad Ludgate	cc25ef7342	bump pg-session-jwt version (#9455 ) forgot to bump this before	2024-10-20 14:42:50 +02:00
Arpad Müller	71d09c78d4	Accept basebackup <tenant> <timeline> --gzip requests (#9456 ) In #9453, we want to remove the non-gzipped basebackup code in the computes, and always request gzipped basebackups. However, right now the pageserver's page service only accepts basebackup requests in the following formats: * `basebackup <tenant_id> <timeline_id>`, lsn is determined by the pageserver as the most recent one (`timeline.get_last_record_rlsn()`) * `basebackup <tenant_id> <timeline_id> <lsn>` * `basebackup <tenant_id> <timeline_id> <lsn> --gzip` We add a fourth case, `basebackup <tenant_id> <timeline_id> --gzip` to allow gzipping the request for the latest lsn as well.	2024-10-19 00:23:49 +02:00
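The four request shapes listed above can be told apart with a small parser. This is only an illustrative sketch in Python, not the pageserver's actual page service code; `BasebackupRequest` and `parse_basebackup` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BasebackupRequest:
    tenant_id: str
    timeline_id: str
    lsn: Optional[str]  # None means "let the pageserver pick the latest LSN"
    gzip: bool


def parse_basebackup(command: str) -> BasebackupRequest:
    """Parse the four `basebackup` request shapes from the commit message."""
    parts = command.split()
    if len(parts) < 3 or parts[0] != "basebackup":
        raise ValueError(f"not a basebackup command: {command!r}")
    tenant_id, timeline_id, rest = parts[1], parts[2], parts[3:]
    # A trailing `--gzip` is allowed with or without an explicit LSN.
    gzip = bool(rest) and rest[-1] == "--gzip"
    if gzip:
        rest = rest[:-1]
    if len(rest) > 1:
        raise ValueError(f"unexpected arguments: {rest!r}")
    lsn = rest[0] if rest else None
    return BasebackupRequest(tenant_id, timeline_id, lsn, gzip)
```

With this sketch, `basebackup <tenant_id> <timeline_id> --gzip` yields `lsn=None, gzip=True`, which is the fourth case the commit adds.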
Anastasia Lubennikova	7f080da9d8	Merge pull request #9451 from neondatabase/releases/2024-10-17-compute-kq-only Releases/2024 10 17 compute kq only	2024-10-18 16:19:33 +01:00
Tristan Partin	62a334871f	Take the collector name as argument when generating sql_exporter configs In neon_collector_autoscaling.jsonnet, the collector name is hardcoded to neon_collector_autoscaling. This issue manifests itself such that sql_exporter would not find the collector configuration. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-18 09:36:29 -05:00
Vlad Lazar	e162ab8b53	storcon: handle ongoing deletions gracefully (#9449 ) ## Problem Pageserver returns 409 (Conflict) if any of the shards are already deleting the timeline. This resulted in an error being propagated out of the HTTP handler and to the client. It's an expected scenario so we should handle it nicely. This caused failures in `test_storage_controller_smoke` [here](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9435/11390431900/index.html#suites/8fc5d1648d2225380766afde7c428d81/86eee4b002d6572d). ## Summary of Changes Instead of returning an error on 409s, we now bubble the status code up and let the HTTP handler code retry until it gets a 404 or times out.	2024-10-18 15:33:04 +01:00
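The retry behaviour described above (keep retrying on 409 until a 404 arrives or a timeout expires) might look roughly like the following Python sketch. It is not the storage controller's Rust code; `delete_until_gone` and the `send_delete` callback are hypothetical, and treating every status other than 404 and 409 as a hard error is an assumption made for brevity.

```python
import time
from typing import Callable


def delete_until_gone(
    send_delete: Callable[[], int],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
) -> bool:
    """Return True once the deletion reports 404, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = send_delete()
        if status == 404:
            return True  # the timeline is gone on every shard
        if status != 409:
            # Assumption: any other status is a hard error in this sketch.
            raise RuntimeError(f"unexpected status {status}")
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

The 409 is treated as "deletion already in progress", so the caller simply waits instead of surfacing an error to the client.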
Conrad Ludgate	5cbdec9c79	[local_proxy]: install pg_session_jwt extension on demand (#9370 ) Follow up on #9344. We want to install the extension automatically. We didn't want to couple the extension into compute_ctl so instead local_proxy is the one to issue requests specific to the extension. depends on #9344 and #9395	2024-10-18 14:41:21 +01:00
Vlad Lazar	ec6d3422a5	pageserver: disconnect when asking client to reconnect (#9390 ) ## Problem Consider the following sequence of events: 1. Shard location gets downgraded to secondary while there's a libpq connection in pagestream mode from the compute 2. There's no active tenant, so we return `QueryError::Reconnect` from `PageServerHandler::handle_get_page_at_lsn_request`. 3. Error bubbles up to `PostgresBackendIO::process_message`, bailing us out of pagestream mode. 4. We instruct the client to reconnect, but continue serving the libpq connection. The client isn't yet aware of the request to reconnect and believes it is still in pagestream mode. Pageserver fails to deserialize get page requests wrapped in `CopyData` since it's not in pagestream mode. ## Summary of Changes When we wish to instruct the client to reconnect, also disconnect from the server side after flushing the error. Closes https://github.com/neondatabase/cloud/issues/17336	2024-10-18 13:38:59 +01:00
Arseny Sher	fecff15f18	walproposer: immediately exit if sync-safekeepers collected 0/0. (#9442 ) Otherwise term history starting with 0/0 is streamed to safekeepers. ref https://github.com/neondatabase/neon/issues/9434	2024-10-18 15:31:50 +03:00
Jere Vaara	3532ae76ef	compute_ctl: Add endpoint that allows extensions to be installed (#9344 ) Adds endpoint to install extensions: POST `/extensions` ``` {"extension":"pg_sessions_jwt","database":"neondb","version":"1.0.0"} ``` Will be used by `local-proxy`. Example, for the JWT authentication to work the database needs to have the pg_session_jwt extension and also to enable JWT to work in RLS policies. --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2024-10-18 15:07:36 +03:00
Folke Behrens	15fecffe6b	Update ruff to much newer version (#9433 ) Includes a multidict patch release to fix build with newer cpython.	2024-10-18 12:42:41 +02:00
Arseny Sher	98fee7a97d	Increase shared_buffers in test_subscriber_synchronous_commit. (#9427 ) Might make the test less flaky.	2024-10-18 13:31:14 +03:00
John Spray	b7173b1ef0	storcon: fix case where we might fail to send compute notifications after two opposite migrations (#9435 ) ## Problem If we migrate A->B, then B->A, and the notification of A->B fails, then we might have retained state that makes us think "A" is the last state we sent to the compute hook, whereas when we migrate B->A we should really be sending a fresh notification in case our earlier failed notification has actually mutated the remote compute config. Closes: #9417 ## Summary of changes - Add a reproducer for the bug (`test_storage_controller_compute_hook_revert`) - Refactor compute hook code to represent remote state with `ComputeRemoteState` which stores a boolean for whether the compute has fully applied the change as well as the request that the compute accepted. - The actual bug fix: after sending a compute notification, if we got a 423 response then update our ComputeRemoteState to reflect that we have mutated the remote state. This way, when we later try and notify for our historic location, we will properly see that as a change and send the notification. Co-authored-by: Vlad Lazar <vlad@neon.tech>	2024-10-18 11:29:23 +01:00
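A toy Python model of the idea behind `ComputeRemoteState` may help in following the A->B, B->A scenario. Only the struct's two fields and the 423 handling come from the commit message; the function names, the use of plain strings for requests, and the handling of a 200 response are assumptions of this sketch, which does not mirror the real Rust implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComputeRemoteState:
    request: str   # last request the compute accepted (a plain string here)
    applied: bool  # whether the compute has fully applied that request


def needs_notify(state: Optional[ComputeRemoteState], request: str) -> bool:
    """Notify unless the compute has fully applied exactly this request."""
    return state is None or not state.applied or state.request != request


def after_response(
    state: Optional[ComputeRemoteState], request: str, status: int
) -> Optional[ComputeRemoteState]:
    if status == 200:  # assumption: 200 means accepted and fully applied
        return ComputeRemoteState(request, applied=True)
    if status == 423:  # accepted, but the remote config may only be partly mutated
        return ComputeRemoteState(request, applied=False)
    return state  # assumption: other failures leave the remembered state untouched
```

In this model, a 423 during A->B records B as the remembered request, so the later B->A migration is seen as a change and triggers a fresh notification.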
Jere Vaara	24654b8eee	compute_ctl: Add endpoint that allows setting role grants (#9395 ) This PR introduces a `/grants` endpoint which allows setting specific `privileges` to certain `role` for a certain `schema`. Related to #9344 Together these endpoints will be used to configure JWT extension and set correct usage to its schema to specific roles that will need them. --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2024-10-18 11:25:45 +01:00
Conrad Ludgate	b8304f90d6	2024 oct new clippy lints (#9448 ) Fixes new lints from `cargo +nightly clippy` (`clippy 0.1.83 (798fb83f 2024-10-16)`)	2024-10-18 10:27:50 +01:00
Conrad Ludgate	d762ad0883	update rustls (#9396 ) The forever ongoing effort of juggling multiple versions of rustls :3 now with new crypto library aws-lc. Because of dependencies, it is currently impossible to not have both ring and aws-lc in the dep tree, therefore our only options are not updating rustls or having both crypto backends enabled... According to benchmarks run by the rustls maintainer, aws-lc is faster than ring in some cases too <https://jbp.io/graviola/>, so it's not without its upsides,	2024-10-17 20:45:37 +01:00
Arpad Müller	928d98b6dc	Update Rust to 1.82.0 and mold to 2.34.0 (#9445 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. [Release notes](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1820-2024-10-17). Also update mold. [release notes for 2.34.0](https://github.com/rui314/mold/releases/tag/v2.34.0), [release notes for 2.34.1](https://github.com/rui314/mold/releases/tag/v2.34.1). Prior update was in #8939.	2024-10-17 21:25:51 +02:00
John Spray	24398bf060	pageserver: detect & warn on loading an old index which is probably the result of a bad generation (#9383 ) ## Problem The pageserver generally trusts the storage controller/control plane to give it valid generations. However, sometimes it should be obvious that a generation is bad, and for defense in depth we should detect that on the pageserver. This PR is part 1 of 2: 1. in this PR we detect and warn on such situations, but do not block starting up the tenant. Once we have confidence that the check is not firing unexpectedly in the field 2. part 2 of 2 will introduce a condition that refuses to start a tenant in this situation, and a test for that (maybe, if we can figure out how to spoof an ancient mtime) Related: #6951 ## Summary of changes - When loading an index older than 2 weeks, log an INFO message noting that we will check for other indices - When loading an index older than 2 weeks _and_ a newer-generation index exists, log a warning.	2024-10-17 19:02:24 +01:00
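The two logging rules in the summary above can be expressed as a small decision function. This is a hedged Python sketch of the rule as stated in the commit message, not the pageserver's code; `classify_index_load` and its parameters are hypothetical, and generations are modelled as plain integers.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

OLD_INDEX_THRESHOLD = timedelta(weeks=2)  # "older than 2 weeks" per the commit message


def classify_index_load(
    index_mtime: datetime,
    loaded_generation: int,
    other_generations: Iterable[int],
    now: datetime,
) -> str:
    """Return "ok", "info" or "warn" for the index that is about to be loaded."""
    if now - index_mtime <= OLD_INDEX_THRESHOLD:
        return "ok"  # recent index, nothing to report
    if any(g > loaded_generation for g in other_generations):
        return "warn"  # old index and a newer-generation index exists
    return "info"  # old index, but no newer generation was found
```

Note that neither outcome blocks tenant startup, matching part 1 of the plan described in the commit.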
Alex Chi Z.	63b3491c1b	refactor(pageserver): remove aux v1 code path (#9424 ) Part of the aux v1 retirement https://github.com/neondatabase/neon/issues/8623 ## Summary of changes Remove write/read path for aux v1, but keeping the config item and the index part field for now. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-17 17:22:44 +01:00
Anastasia Lubennikova	858867c627	Add logging of installed_extensions (#9438 ) Simple PR to log installed_extensions statistics in the following format: ``` 2024-10-17T13:53:02.860595Z INFO [NEON_EXT_STAT] {"extensions":[{"extname":"plpgsql","versions":["1.0"],"n_databases":2},{"extname":"neon","versions":["1.5"],"n_databases":1}]} ```	2024-10-17 16:35:19 +01:00
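A log consumer could pull the JSON payload out of such a line as sketched below. The `[NEON_EXT_STAT]` marker and the payload shape are taken from the example in the commit message; `parse_ext_stat` is a hypothetical helper, and the assumption is that the JSON object is the only thing following the marker on the line.

```python
import json
from typing import Optional

MARKER = "[NEON_EXT_STAT]"


def parse_ext_stat(line: str) -> Optional[list]:
    """Return the `extensions` list from a NEON_EXT_STAT log line, else None."""
    _, sep, payload = line.partition(MARKER)
    if not sep:
        return None  # the marker is absent, so this is some other log line
    # Assumption: everything after the marker is a single JSON object.
    return json.loads(payload)["extensions"]
```

Each returned entry carries `extname`, `versions` and `n_databases`, as in the sample line.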
Erik Grinaker	299cde899b	safekeeper: flush WAL on compute disconnect (#9436 ) ## Problem In #9259, we found that the `check_safekeepers_synced` fast path could result in a lower basebackup LSN than the `flush_lsn` reported by Safekeepers in `VoteResponse`, causing the compute to panic once on startup. This would happen if the Safekeeper had unflushed WAL records due to a compute disconnect. The `TIMELINE_STATUS` query would report a `flush_lsn` below these unflushed records, while `VoteResponse` would flush the WAL and report the advanced `flush_lsn`. See https://github.com/neondatabase/neon/issues/9259#issuecomment-2410849032. ## Summary of changes Flush the WAL if the compute disconnects during WAL processing.	2024-10-17 17:19:18 +02:00
Erik Grinaker	4c9835f4a3	storage_controller: delete stale shards when deleting tenant (#9333 ) ## Problem Tenant deletion only removes the current shards from remote storage. Any stale parent shards (before splits) will be left behind. These shards are kept since child shards may reference data from the parent until new image layers are generated. ## Summary of changes * Document a special case for pageserver tenant deletion that deletes all shards in remote storage when given an unsharded tenant ID, as well as any unsharded tenant data. * Pass an unsharded tenant ID to delete all remote storage under the tenant ID prefix. * Split out `RemoteStorage::delete_prefix()` to delete a bucket prefix, with additional test coverage. * Add a `delimiter` argument to `assert_prefix_empty()` to support partial prefix matches (i.e. all shards starting with a given tenant ID).	2024-10-17 14:34:51 +00:00
Alex Chi Z.	f3a3eefd26	feat(pageserver): do space check before gc-compaction (#9250 ) part of https://github.com/neondatabase/neon/issues/9114 ## Summary of changes gc-compaction may take a lot of disk space, and if it does, the caller should do a partial gc-compaction. This patch adds space check for the compaction job. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-17 10:29:53 -04:00
Ivan Efremov	a7c05686cc	test_runner: Update the README.md to build neon with 'testing' (#9437 ) Without having the '--features testing' in the cargo build the proxy won't start causing tests to fail.	2024-10-17 17:20:42 +03:00
Anastasia Lubennikova	8b47938140	Add support of extensions for v17 (part 3) (#9430 ) - pgvector 0.7.4 update support of extensions for v14-v16: - pgvector 0.7.2 -> 0.7.4	2024-10-17 13:37:21 +01:00
Arpad Müller	35e7d91bc9	Add config variable for timeline offloading (#9421 ) Adds a configuration variable for timeline offloading support. The added pageserver-global config option controls whether the pageserver automatically offloads timelines during compaction. Therefore, already offloaded timelines are not affected by this, nor is the manual testing endpoint. This allows the rollout of timeline offloading to be driven by the storage team. Part of #8088	2024-10-17 12:07:58 +00:00
Ivan Efremov	22d8834474	proxy: move the connection pools to separate file (#9398 ) First PR for #9284 Start unification of the client and connection pool interfaces: - Exclude the 'global_connections_count' out from the get_conn_entry() - Move remote connection pools to the conn_pool_lib as a reference - Unify clients among all the conn pools	2024-10-17 13:38:24 +03:00
John Spray	db68e82235	storage_scrubber: fixes to garbage commands (#9409 ) ## Problem While running `find-garbage` and `purge-garbage`, I encountered two things that needed updating: - Console API may omit `user_id` since org accounts were added - When we cut over to using GenericRemoteStorage, the object listings we do during purge did not get proper retry handling, so could easily fail on usual S3 errors, and make the whole process drop out. ...and one bug: - We had a `.unwrap` which expects that after finding an object in a tenant path, a listing in that path will always return objects. This is not true, because a pageserver might be deleting the path at the same time as we scan it. ## Summary of changes - When listing objects during purge, use backoff::retry - Make `user_id` an `Option` - Handle the case where a tenant's objects go away during find-garbage.	2024-10-17 10:06:02 +01:00
Konstantin Knizhnik	934dbb61f5	Check access_count in lfc_evict (#9407 ) ## Problem See https://neondb.slack.com/archives/C033A2WE6BZ/p1729007738526309?thread_ts=1722942856.987979&cid=C033A2WE6BZ When a replica receives a WAL record whose target page is not present in shared buffers, we evict this page from the LFC. If all pages from the LFC chunk are evicted, then the chunk is moved to the beginning of the LRU list to force its reuse. Unfortunately, access_count is not checked, and if the entry is being accessed at this moment then this operation can cause LRU list corruption. ## Summary of changes Check `access_count` in `lfc_evict` ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-10-17 08:04:57 +03:00
Christian Schwarz	67d5d98b19	readme: fix build instructions for debian 12 (#9371 ) We need libprotobuf-dev for some of the `/usr/include/google/protobuf/...*.proto` referenced by our protobuf decls.	2024-10-16 21:47:53 +02:00
Tristan Partin	e0fa6bcf1a	Fix some sql_exporter metrics for PG 17 Checkpointer related statistics moved from pg_stat_bgwriter to pg_stat_checkpointer, so we need to adjust our queries accordingly. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-16 14:46:33 -05:00
Tristan Partin	409a286eaa	Fix typo in sql_exporter generator Bad copy-paste seemingly. This manifested itself as a failure to start for the sql_exporter, and was just dying on loop in staging. A future PR will have E2E testing of sql_exporter. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-16 13:08:40 -05:00
Arpad Müller	0551cfb6a7	Fix beta clippy warnings (#9419 ) ``` warning: first doc comment paragraph is too long --> compute_tools/src/installed_extensions.rs:35:1 \| 35 \| / /// Connect to every database (see list_dbs above) and get the list of installed extensions. 36 \| \| /// Same extension can be installed in multiple databases with different versions, 37 \| \| /// we only keep the highest and lowest version across all databases. \| \|_ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#too_long_first_doc_paragraph = note: `#[warn(clippy::too_long_first_doc_paragraph)]` on by default help: add an empty line \| 35 ~ /// Connect to every database (see list_dbs above) and get the list of installed extensions. 36 + /// \| ```	2024-10-16 19:04:56 +01:00
Folke Behrens	ed694732e7	proxy: merge AuthError and AuthErrorImpl (#9418 ) Since GetAuthInfoError now boxes the ControlPlaneError message the variant is not big anymore and AuthError is 32 bytes.	2024-10-16 19:10:49 +02:00
Alex Chi Z.	8a114e3aed	refactor(pageserver): upgrade remote_storage to use hyper1 (#9405 ) part of https://github.com/neondatabase/neon/issues/9255 ## Summary of changes Upgrade remote_storage crate to use hyper1. Hyper0 is used when providing the streaming HTTP body to the s3 SDK, and it is refactored to use hyper1. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-16 16:19:45 +01:00
Arpad Müller	55b246085e	Activate timelines during unoffload (#9399 ) The current code has forgotten to activate timelines during unoffload, leading to inability to receive the basebackup, due to the timeline still being in loading state. ``` stderr: command failed: compute startup failed: failed to get basebackup@0/0 from pageserver postgresql://no_user@localhost:15014 Caused by: 0: db error: ERROR: Not found: Timeline 508546c79b2b16a84ab609fdf966e0d3/bfc18c24c4b837ecae5dbb5216c80fce is not active, state: Loading 1: ERROR: Not found: Timeline 508546c79b2b16a84ab609fdf966e0d3/bfc18c24c4b837ecae5dbb5216c80fce is not active, state: Loading ``` Therefore, also activate the timeline during unoffloading. Part of #8088	2024-10-16 16:47:17 +02:00
Anastasia Lubennikova	9668601f46	Add support of extensions for v17 (part 2) (#9389 ) - plv8 3.2.3 - HypoPG 1.4.1 - pgtap 1.3.3 - timescaledb 2.17.0 - pg_hint_plan 17_1_7_0 - rdkit Release_2024_09_1 - pg_uuidv7 1.6.0 - wal2json 2.6 - pg_ivm 1.9 - pg_partman 5.1.0 update support of extensions for v14-v16: - HypoPG 1.4.0 -> 1.4.1 - pgtap 1.2.0 -> 1.3.3 - plpgsql_check 2.5.3 -> 2.7.11 - pg_uuidv7 1.0.1 -> 1.6.0 - wal2json 2.5 -> 2.6 - pg_ivm 1.7 -> 1.9 - pg_partman 5.0.1 -> 5.1.0	2024-10-16 15:29:23 +01:00
Arpad Müller	3140c14d60	Remove allow(clippy::unknown_lints) (#9416 ) the lint stabilized in 1.80.	2024-10-16 16:28:55 +02:00
John Spray	d6281cbe65	tests: stabilize test_timelines_parallel_endpoints (#9413 ) ## Problem This test would get failures like `command failed: Found no timeline id for branch name 'branch_8'` It's because neon_local is being invoked concurrently for branch creation, which is unsafe (they'll step on each others' JSON writes) Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9410/11363051979/index.html#testresult/5ddc56c640f5422b/retries ## Summary of changes - Don't do branch creation concurrently with endpoint creation via neon_local	2024-10-16 15:27:46 +01:00
Vlad Lazar	d490ad23e0	storcon: use the same trace fields for reconciler and results (#9410 ) ## Problem The reconciler uses `seq`, but processing of results uses `sequence`. Order is different too. It makes it annoying to read logs. ## Summary of Changes Use the same tracing fields in both	2024-10-16 14:04:17 +01:00
Folke Behrens	f14e45f0ce	proxy: format imports with nightly rustfmt (#9414 ) ```shell cargo +nightly fmt -p proxy -- -l --config imports_granularity=Module,group_imports=StdExternalCrate,reorder_imports=true ``` These rust-analyzer settings for VSCode should help retain this style: ```json "rust-analyzer.imports.group.enable": true, "rust-analyzer.imports.prefix": "crate", "rust-analyzer.imports.merge.glob": false, "rust-analyzer.imports.granularity.group": "module", "rust-analyzer.imports.granularity.enforce": true, ```	2024-10-16 15:01:56 +02:00
John Spray	89a65a9e5a	pageserver: improve handling of archival_config calls during Timeline shutdown (#9415 ) ## Problem In test `test_timeline_offloading`, we see failures like: ``` PageserverApiException: queue is in state Stopped ``` Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/main/11356917668/index.html#testresult/ff0e348a78a974ee/retries ## Summary of changes - Amend code paths that handle errors from RemoteTimelineClient to check for cancellation and emit the Cancelled error variant in these cases (will give clients a 503 to retry) - Remove the implicit `#[from]` for the Other error case, to make it harder to add code that accidentally squashes errors into this (500-equivalent) error variant. This would be neater if we made RemoteTimelineClient return a structured error instead of anyhow::Error, but that's a bigger refactor. I'm not sure if the test really intends to hit this path, but the error handling fix makes sense either way.	2024-10-16 13:39:58 +01:00
Cihan Demirci	bc6b8cee01	don't trigger workflows in two repos (#9340 ) https://github.com/neondatabase/cloud/issues/16723	2024-10-16 10:43:48 +01:00
Tristan Partin	061ea0de7a	Add jsonnetfmt targets This should make it a little bit easier for people wanting to check if their files are formatted correctly. Has the added bonus of making the CI check simpler as well. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-15 20:01:13 -05:00
Tristan Partin	be5d6a69dc	Fix jsonnet_files wildcard Just a typo in a path. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-15 16:30:31 -05:00
Matthias van de Meent	18f4e5f10c	Add newly added metrics from neondatabase/neon#9116 to exports (#9402 ) They weren't added in that PR, but should be available immediately on rollout as the neon extension already defaults to 1.5.	2024-10-15 23:13:31 +02:00
Alex Chi Z.	f1eb703256	fix(pageserver): use a buffer for basebackup; add aux basebackup metrics log (#9401 ) Our replication bench project is stuck because it is too slow to generate basebackup and it caused compute to disconnect. https://neondb.slack.com/archives/C03438W3FLZ/p1728330685012419 The compute timeout for waiting for basebackup is 10m (is it true?). Generating basebackup directly on pageserver takes ~3min. Therefore, I suspect it's because too much round-trip time is wasted writing the 10000+ snapshot aux files. Also, it is possible that the basebackup process takes so long retrieving all aux files that it does not write anything over the wire protocol, causing a read timeout. Basebackup size is 800KB gzipped for that project and was 55MB tar before compression. ## Summary of changes * Potentially fix the issue by placing a write buffer for basebackup. * Log how many aux files we read + the time spent on it. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-15 16:35:21 -04:00
Tristan Partin	cf7a596a15	Generate sql_exporter config files with Jsonnet There are quite a few benefits to this approach: - Reduce config duplication - The two sql_exporter configs were super similar with just a few differences - Pull SQL queries into standalone files - That means we could run a SQL formatter on the file in the future - It also means access to syntax highlighting - In the future, run different queries for different PG versions - This is relevant because right now, we have queries that are failing on PG 17 due to catalog updates Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-15 11:18:38 -05:00
Konstantin Knizhnik	614c3aef72	Remove redundant code (#9373 ) ## Problem There is a double update of the relsize cache in `put_rel_truncation`. Also, `page_server_request` contains a check that the fork is MAIN_FORKNUM, which 1. is incorrect (because VM/FSM pages are sharded in the same way as MAIN fork pages) and 2. is redundant because `page_server_request` is never called for `get page` requests, so the first part of the OR condition is always true. ## Summary of changes Remove redundant code ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-10-15 17:18:52 +03:00
Folke Behrens	fb74c21e8c	proxy: Migrate jwt module away from anyhow (#9361 )	2024-10-15 15:24:56 +02:00
Conrad Ludgate	d92d36a315	[local_proxy] update api for pg_session_jwt (#9359 ) pg_session_jwt now: 1. Sets the JWK in a PGU_BACKEND session guc, no longer in the init() function. 2. JWK no longer needs the kid.	2024-10-15 12:13:57 +00:00
Arpad Müller	ec4cc30de9	Shut down timelines during offload and add offload tests (#9289 ) Add a test for timeline offloading, and subsequent unoffloading. Also adds a manual endpoint, and issues a proper timeline shutdown during offloading which prevents a pageserver hang at shutdown. Part of #8088.	2024-10-15 09:46:51 +00:00
John Spray	73c6626b38	pageserver: stabilize & refine controller scale test (#8971 ) ## Problem We were seeing timeouts on migrations in this test. The test unfortunately tends to saturate local storage, which is shared between the pageservers and the control plane database, which makes the test kind of unrealistic. We will also want to increase the scale of this test, so it's worth fixing that. ## Summary of changes - Instead of randomly creating timelines at the same time as the other background operations, explicitly identify a subset of tenants which will have timelines, and create them at the start. This avoids pageservers putting a lot of load on the test node during the main body of the test. - Adjust the tenants created to create some number of 8 shard tenants and the rest 1 shard tenants, instead of just creating a lot of 2 shard tenants. - Use archival_config to exercise tenant-mutating operations, instead of using timeline creation for this. - Adjust reconcile_until_idle calls to avoid waiting 5 seconds between calls, which causes timeouts with large shard count tenants. - Fix a pageserver bug where calls to archival_config during activation get 404	2024-10-15 09:31:18 +01:00
Alexander Bayandin	0fc4ada3ca	Switch CI, Storage and Proxy to Debian 12 (Bookworm) (#9170 ) ## Problem This PR switches CI and Storage to Debian 12 (Bookworm) based images. ## Summary of changes - Add Debian codename (`bookworm`/`bullseye`) to most of docker tags, create un-codenamed images to be used by default - `vm-compute-node-image`: create a separate spec for `bookworm` (we don't need to build cgroups in the future) - `neon-image`: Switch to `bookworm`-based `build-tools` image - Storage components and Proxy use it - CI: run lints and tests on `bookworm`-based `build-tools` image	2024-10-14 21:12:43 +01:00
Matthias van de Meent	dab96a6eb1	Add more timing histogram and gauge metrics to the Neon extension (#9116 ) We now also track: - Number of PS IOs in-flight - Number of pages cached by smgr prefetch implementation - IO timing histograms for LFC reads and writes, per IO issued ## Problem There's little insight into the timing metrics of LFC, and what the prefetch state of each backend is. This changes that, by measuring (and subsequently exposing) these data points. ## Summary of changes - Extract IOHistogram as separate type, rather than a collection of fields on NeonMetrics - others, see items above. Part of https://github.com/neondatabase/neon/issues/8926	2024-10-14 20:30:21 +02:00
Arpad Müller	f54e3e9147	Also consider offloaded timelines for obtaining retain_lsn (#9308 ) Also consider offloaded timelines for obtaining `retain_lsn`. This is required for correctness for all timelines that have not been flattened yet: otherwise we GC data that might still be required for reading. This somewhat counteracts the original purpose of timeline offloading of not having to iterate over offloaded timelines, but sadly it's required. In the future, we can improve the way the offloaded timelines are stored. We also make the `retain_lsn` optional so that in the future, when we implement flattening, we can make it None. This also applies to full timeline objects by the way, where it would probably make most sense to add a bool flag whether the timeline is successfully flattened, and if it is, one can exclude it from `retain_lsn` as well. Also, track whether a timeline was offloaded or not in `retain_lsn` so that the `retain_lsn` can be excluded from visibility and size calculation. Part of #8088	2024-10-14 17:54:03 +02:00
Vlad Lazar	f4f7ea247c	tests: make size comparisons more lenient (#9388 ) The empirically determined threshold doesn't hold for PG 17. Bump the limit to stabilise ci.	2024-10-14 16:50:12 +01:00
Vlad Lazar	ec94acdf03	Merge pull request #9372 from neondatabase/rc/2024-10-14 Storage & Compute release 2024-10-14	2024-10-14 14:25:09 +01:00
Arpad Müller	d92ff578c4	Add test for fixed storage broker issue (#9311 ) Adds a test for the (now fixed) storage broker limit issue, see #9268 for the description and #9299 for the fix. Also fix a race condition with endpoint creation/starts running in parallel, leading to file not found errors.	2024-10-14 14:34:57 +02:00
Alexander Bayandin	31b7703fa8	CI(build-build-tools): fix unexpected cancellations (#9357 ) ## Problem When `Dockerfile.build-tools` gets changed, several PRs catch up with it and some might get unexpectedly cancelled workflows because of GitHub's concurrency model for workflows. See the comment in the code for more details. It should be possible to revert it after https://github.com/orgs/community/discussions/41518 (I don't expect it anytime soon, but I subscribed) ## Summary of changes - Do not queue `build-build-tools-image` workflows in the concurrency group	2024-10-14 11:51:01 +01:00
Konstantin Knizhnik	d056ae9be5	Ignore pg_dynshmem file when comparing directories (#9374 ) ## Problem On macOS, the `pg_dynshmem` file is created in PGDATADIR, which causes a mismatch in directory comparison ## Summary of changes Add this file to the ignore list. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-10-14 13:45:20 +03:00
Conrad Ludgate	cb9ab7463c	proxy: split out the console-redirect backend flow (#9270 ) removes the ConsoleRedirect backend from the main auth::Backends enum, copy-paste the existing crate::proxy::task_main structure to use the ConsoleRedirectBackend exclusively. This makes the logic a bit simpler at the cost of some fairly trivial code duplication.	2024-10-14 12:25:55 +02:00
Conrad Ludgate	ab5bbb445b	proxy: refactor auth backends (#9271 ) preliminary for #9270 The auth::Backend didn't need to be in the mega ProxyConfig object, so I split it off and passed it manually in the few places it was necessary. I've also refined some of the uses of config I saw while doing this small refactor. I've also followed the trend and made the console redirect backend its own struct, same as LocalBackend and ControlPlaneBackend.	2024-10-11 20:14:52 +01:00
Alexander Bayandin	5ef805e12c	CI(run-python-test-set): allow to skip missing compatibility snapshot (#9365 ) ## Problem Action `run-python-test-set` fails if it is not used for `regress_tests` on release PR, because it expects `test_compatibility.py::test_create_snapshot` to generate a snapshot, and the test exists only in `regress_tests` suite. For example, in https://github.com/neondatabase/neon/pull/9291 [`test-postgres-client-libs`](https://github.com/neondatabase/neon/actions/runs/11209615321/job/31155111544) job failed. ## Summary of changes - Add `skip-if-does-not-exist` input to `.github/actions/upload` action (the same way we do for `.github/actions/download`) - Set `skip-if-does-not-exist=true` for "Upload compatibility snapshot" step in `run-python-test-set` action	2024-10-11 16:58:41 +01:00
a-masterov	091a175a3e	Test versions mismatch (#9167 ) ## Problem We ran into incompatibilities between components of different versions. This should be detected automatically to prevent production bugs. ## Summary of changes A test for this situation was implemented. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-10-11 15:29:54 +02:00
Fedor Dikarev	326cd80f0d	ci: gh-workflow-stats-action v0.1.4: remove debug output and proper pagination (#9356 ) ## Problem In the previous version, pagination didn't work, so we collected information only for the first 30 jobs in a WorkflowRun	2024-10-11 14:46:45 +02:00
Folke Behrens	6baf1aae33	proxy: Demote some errors to warnings in logs (#9354 )	2024-10-11 11:29:08 +02:00
John Spray	184935619e	tests: stabilize test_storage_controller_heartbeats (#9347 ) ## Problem This could fail with `reconciliation in progress` if running on a slow test node such that background reconciliation happens at the same time as we call consistency_check. Example: https://neon-github-public-dev.s3.amazonaws.com/reports/main/11258171952/index.html#/testresult/54889c9469afb232 ## Summary of changes - Call reconcile_until_idle before calling consistency check once, rather than calling consistency check until it passes	2024-10-11 09:41:08 +01:00
Ivan Efremov	b2ecbf3e80	Introduce "quota" ErrorKind (#9300 ) ## Problem Fixes #8340 ## Summary of changes Introduced ErrorKind::quota to handle quota-related errors	2024-10-11 10:45:55 +03:00
Tristan Partin	53147b51f9	Use valid type hints for Python 3.9 I have no idea how this made it past the linters. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-10 13:00:25 -05:00
Tristan Partin	006d9dfb6b	Add compute_config_dir fixture Allows easy access to various compute config files. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-10 12:43:40 -05:00
Tristan Partin	1f7904c917	Enable cargo caching in check-codestyle-rust This job takes an extraordinary amount of time for what I understand it to do. The obvious win is caching dependencies. Rory disabled caching in `cd5732d9d8`. I assume this was to get gen3 runners up and running. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-10 12:40:30 -05:00
John Spray	07c714343f	tests: allow a log warning in test_cli_start_stop_multi (#9320 ) ## Problem This test restarts services in an undefined order (whatever neon_local does), which means we should be tolerant of warnings that come from restarting the storage controller while a pageserver is running. We can see failures with warnings from dropped requests, e.g. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9307/11229000712/index.html#/testresult/d33d5cb206331e28 ``` WARN request{method=GET path=/v1/location_config request_id=b7dbda15-6efb-4610-8b19-a3772b65455f}: request was dropped before completing\n') ``` ## Summary of changes - allow-list the `request was dropped before completing` message on pageservers before restarting services	2024-10-10 17:06:42 +01:00
Tristan Partin	264c34dfb7	Move path-related fixtures into their own module (#9304 ) neon_fixtures.py has grown into quite a beast. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-10 10:26:23 -05:00
Erik Grinaker	9dd80b9b4c	storage_scrubber: fix faulty assertion when no timelines (#9345 ) When there are no timelines in remote storage, the storage scrubber would incorrectly trip an assertion with "Must be set if results are present", referring to the last processed tenant ID. When there are no timelines we don't expect there to be a tenant ID either. The assertion was introduced in `37aa6fd`. Only apply the assertion when any timelines are present.	2024-10-10 09:09:53 -04:00
Erik Grinaker	c2623ffef4	CODEOWNERS: assign `storage_scrubber` to storage (#9346 )	2024-10-10 12:40:35 +01:00
John Spray	426b1c5f08	storage controller: use 'infra' JWT scope for node registration (#9343 ) ## Problem Storage controller `/control` API mostly requires admin tokens, for interactive use by engineers. But for endpoints used by scripts, we should not require admin tokens. Discussion at https://neondb.slack.com/archives/C033RQ5SPDH/p1728550081788989?thread_ts=1728548232.265019&cid=C033RQ5SPDH ## Summary of changes - Introduce the 'infra' JWT scope, which was not previously used in the neon repo - For pageserver & safekeeper node registrations, require infra scope instead of admin Note that admin will still work, as the controller auth checks permit admin tokens for all endpoints irrespective of what scope they require.	2024-10-10 12:26:43 +01:00
Conrad Ludgate	306094a87d	add local-proxy suffix to wake-compute requests, respect the returned port (#9298 ) https://github.com/neondatabase/cloud/issues/18349 Use the `-local-proxy` suffix to make sure we get the 10432 local_proxy port back from cplane.	2024-10-09 22:43:35 +01:00
Tristan Partin	d3464584a6	Improve some typing in test_runner Fixes some types, adds some types, and adds some override annotations. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-09 15:42:22 -05:00
Tristan Partin	878135fe9c	Move PgBenchInitResult.EXTRACTORS to a private module constant This seems to paper over a behavioral difference in Python 3.9 and Python 3.12 with how dataclasses work with mutable variables. On Python 3.12, I get the following error: ValueError: mutable default <class 'dict'> for field EXTRACTORS is not allowed: use default_factory This obviously doesn't occur in our testing environment. When I do what the error tells me, EXTRACTORS doesn't seem to exist as an attribute on the class in at least Python 3.9. The solution provided in this commit seems like the least amount of friction to keep the wheels turning. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-09 14:02:09 -05:00
Conrad Ludgate	75434060a5	local_proxy: integrate with pg_session_jwt extension (#9086 )	2024-10-09 18:24:10 +01:00
Anastasia Lubennikova	721803a0e7	Add partial support of extensions for v17: (#9322 ) - PostGIS 3.5.0 - pgrouting 3.6.2 - h3 4.1.3 - unit 7.9 - pgjwt version (f3d82fd) - pg_hashids 1.2.1 - ip4r 2.4.2 - prefix 1.2.10 - postgresql-hll 2.18 - pg_roaringbitmap 0.5.4 - pg-semver 0.40.0 update support of extensions for v14-v16: - unit 7.7 -> 7.9 - pgjwt 9742dab -> f3d82fd --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-10-09 17:07:59 +01:00
Fedor Dikarev	108a211917	added workflow Report Workflow Stats (#9330 ) ## Summary of changes CI: Collect stats for Github Workflows Runs	2024-10-09 17:27:41 +02:00
Heikki Linnakangas	72ef0e0fa1	tests: Remove redundant log lines when stopping storage nodes (#9317 ) The neon_cli functions print the command that gets executed, which contains the same information. Before: 2024-10-07 22:32:28.884 INFO [neon_fixtures.py:3927] Stopping safekeeper 1 2024-10-07 22:32:28.884 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 1" 2024-10-07 22:32:28.989 INFO [neon_fixtures.py:3927] Stopping safekeeper 2 2024-10-07 22:32:28.989 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 2" 2024-10-07 22:32:29.93 INFO [neon_fixtures.py:3927] Stopping safekeeper 3 2024-10-07 22:32:29.94 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 3" 2024-10-07 22:32:29.251 INFO [neon_cli.py:450] Stopping pageserver with ['pageserver', 'stop', '--id=1'] 2024-10-07 22:32:29.251 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local pageserver stop --id=1" After: 2024-10-07 22:32:28.884 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 1" 2024-10-07 22:32:28.989 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 2" 2024-10-07 22:32:29.94 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local safekeeper stop 3" 2024-10-07 22:32:29.251 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local pageserver stop --id=1"	2024-10-09 15:51:34 +03:00
Heikki Linnakangas	eb23d355a9	tests: Use ThreadedMotoServer python class to launch mock S3 server (#9313 ) This is simpler than using subprocess. One difference is in how moto's log output is now collected. Previously, moto's logs went to stderr, and were collected and printed at the end of the test by pytest, like this: 2024-10-07T22:45:12.3705222Z ----------------------------- Captured stderr call ----------------------------- 2024-10-07T22:45:12.3705577Z 127.0.0.1 - - [07/Oct/2024 22:35:14] "PUT /pageserver-test-deletion-queue-2e6efa8245ec92a37a07004569c29eb7 HTTP/1.1" 200 - 2024-10-07T22:45:12.3706181Z 127.0.0.1 - - [07/Oct/2024 22:35:15] "GET /pageserver-test-deletion-queue-2e6efa8245ec92a37a07004569c29eb7/?list-type=2&delimiter=/&prefix=/tenants/43da25eac0f41412696dd31b94dbb83c/timelines/ HTTP/1.1" 200 - 2024-10-07T22:45:12.3706894Z 127.0.0.1 - - [07/Oct/2024 22:35:16] "PUT /pageserver-test-deletion-queue-2e6efa8245ec92a37a07004569c29eb7//tenants/43da25eac0f41412696dd31b94dbb83c/timelines/eabba5f0c1c72c8656d3ef1d85b98c1d/initdb.tar.zst?x-id=PutObject HTTP/1.1" 200 - Note the timestamps: the timestamp at the beginning of the line is the time that the stderr was dumped, i.e. the end of the test, which makes those timestamps rather useless. The timestamp in the middle of the line is when the operation actually happened, but it has only 1 s granularity. With this change, moto's log lines are printed in the "live log call" section, as they happen, which makes the timestamps more useful: 2024-10-08 12:12:31.129 INFO [_internal.py:97] 127.0.0.1 - - [08/Oct/2024 12:12:31] "GET /pageserver-test-deletion-queue-e24e7525d437e1874d8a52030dcabb4f/?list-type=2&delimiter=/&prefix=/tenants/7b6a16b1460eda5204083fba78bc360f/timelines/ HTTP/1.1" 200 - 2024-10-08 12:12:32.612 INFO [_internal.py:97] 127.0.0.1 - - [08/Oct/2024 12:12:32] "PUT /pageserver-test-deletion-queue-e24e7525d437e1874d8a52030dcabb4f//tenants/7b6a16b1460eda5204083fba78bc360f/timelines/7ab4c2b67fa8c712cada207675139877/initdb.tar.zst?x-id=PutObject HTTP/1.1" 200 -	2024-10-09 15:34:51 +03:00
Yuchen Liang	bee04b8a69	pageserver: add direct io config to virtual file (#9214 ) ## Problem We need a way to incrementally switch to direct IO. During the rollout we might want to switch to O_DIRECT on image and delta layer read path first before others. ## Summary of changes - Revisited and simplified direct io config in `PageserverConf`. - We could add a fallback mode for open, but for read there isn't a reasonable alternative (without creating another buffered virtual file). - Added a wrapper around `VirtualFile`; the current implementation becomes `VirtualFileInner` - Use `open_v2`, `create_v2`, `open_with_options_v2` when we want to use the IO mode specified in PS config. - Once we onboard all IO through VirtualFile using this new API, we will delete the old code path. - Make io mode live configurable for benchmarking. - Only guaranteed for files opened after the config change, so do it before the experiment. As an example, we are using `open_v2` with `virtual_file::IoMode::Direct` in https://github.com/neondatabase/neon/pull/9169 We also remove `io_buffer_alignment` config in `a04cfd754b` and use it as a compile time constant. This way we don't have to carry the alignment around or make frequent calls to retrieve this information from the static variable. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-10-09 08:33:07 -04:00
Anastasia Lubennikova	63e7fab990	Add /installed_extensions endpoint to collect statistics about extension usage. (#8917 ) It returns a list of installed extensions in the format: ```json { "extensions": [ { "extname": "extension_name", "versions": ["1.0", "1.1"], "n_databases": 5 } ] } ``` --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-10-09 13:32:13 +01:00
Arseny Sher	a181392738	safekeeper: add evicted_timelines gauge. (#9318 ) showing total number of evicted timelines.	2024-10-09 14:40:30 +03:00
Alexander Bayandin	fc7397122c	test_runner: fix path to tpc-h queries (#9327 ) ## Problem The path to TPC-H queries was incorrectly changed in #9306. This path is used for `test_tpch` parameterization, so all perf tests started to fail: ``` ==================================== ERRORS ==================================== __________ ERROR collecting test_runner/performance/test_perf_olap.py __________ test_runner/performance/test_perf_olap.py:205: in <module> @pytest.mark.parametrize("query", tpch_queuies()) test_runner/performance/test_perf_olap.py:196: in tpch_queuies assert queries_dir.exists(), f"TPC-H queries dir not found: {queries_dir}" E AssertionError: TPC-H queries dir not found: /__w/neon/neon/test_runner/performance/performance/tpc-h/queries E assert False E + where False = <bound method Path.exists of PosixPath('/__w/neon/neon/test_runner/performance/performance/tpc-h/queries')>() E + where <bound method Path.exists of PosixPath('/__w/neon/neon/test_runner/performance/performance/tpc-h/queries')> = PosixPath('/__w/neon/neon/test_runner/performance/performance/tpc-h/queries').exists ``` ## Summary of changes - Fix the path to tpc-h queries	2024-10-09 12:11:06 +01:00
Vlad Lazar	cc599e23c1	storcon: make observed state updates more granular (#9276 ) ## Problem Previously, observed state updates from the reconciler may have clobbered inline changes made to the observed state by other code paths. ## Summary of changes Model observed state changes from reconcilers as deltas. This means that we only update what has changed. Handling for node going off-line concurrently during the reconcile is also added: set observed state to None in such cases to respect the convention. Closes https://github.com/neondatabase/neon/issues/9124	2024-10-09 11:53:29 +01:00
Folke Behrens	54d1185789	proxy: Unalias hyper1 and replace one use of hyper0 in test (#9324 ) Leaves one final use of hyper0 in proxy for the health service, which requires some coordinated effort with other services.	2024-10-09 12:44:17 +02:00
Heikki Linnakangas	8a138db8b7	tests: Reduce noise from logging renamed files (#9315 ) Instead of printing the full absolute path for every file, print just the filenames. Before: 2024-10-08 13:19:39.98 INFO [test_pageserver_generations.py:669] Found file /home/heikki/git-sandbox/neon/test_output/test_upgrade_generationless_local_file_paths[debug-pg16]/repo/pageserver_1/tenants/0c04a8df7691a367ad0bb1cc1373ba4d/timelines/f41022551e5f96ce8dbefb9b5d35ab45/000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0-v1-00000001 2024-10-08 13:19:39.99 INFO [test_pageserver_generations.py:673] Renamed /home/heikki/git-sandbox/neon/test_output/test_upgrade_generationless_local_file_paths[debug-pg16]/repo/pageserver_1/tenants/0c04a8df7691a367ad0bb1cc1373ba4d/timelines/f41022551e5f96ce8dbefb9b5d35ab45/000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0-v1-00000001 -> /home/heikki/git-sandbox/neon/test_output/test_upgrade_generationless_local_file_paths[debug-pg16]/repo/pageserver_1/tenants/0c04a8df7691a367ad0bb1cc1373ba4d/timelines/f41022551e5f96ce8dbefb9b5d35ab45/000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0 After: 2024-10-08 13:24:39.726 INFO [test_pageserver_generations.py:667] Renaming files in /home/heikki/git-sandbox/neon/test_output/test_upgrade_generationless_local_file_paths[debug-pg16]/repo/pageserver_1/tenants/3439538816c520adecc541cc8b1de21c/timelines/6a7be8ee707b355de48dd91b326d6ae1 2024-10-08 13:24:39.728 INFO [test_pageserver_generations.py:673] Renamed 000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0-v1-00000001 -> 000000067F0000000100000A8D0100000000-000000067F0000000100000AC10000000002__00000000014F16F0	2024-10-09 10:55:56 +01:00
Erik Grinaker	211970f0e0	remote_storage: add `DownloadOpts::byte_(start\|end)` (#9293 ) `download_byte_range()` is basically a copy of `download()` with an additional option passed to the backend SDKs. This can cause these code paths to diverge, and prevents combining various options. This patch adds `DownloadOpts::byte_(start\|end)` and moves byte range handling into `download()`.	2024-10-09 10:29:06 +01:00
Heikki Linnakangas	f87f5a383e	tests: Remove redundant log lines when starting an endpoint (#9316 ) The "Starting postgres endpoint <name>" message is not needed, because the neon_cli.py prints the neon_local command line used to start the endpoint. That contains the same information. The "Postgres startup took XX seconds" message is not very useful because no one pays attention to those in the python test logs when things are going smoothly, and if you do wonder about the startup speed, the same information and more can be found in the compute log. Before: 2024-10-07 22:32:27.794 INFO [neon_fixtures.py:3492] Starting postgres endpoint ep-1 2024-10-07 22:32:27.794 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local endpoint start --safekeepers 1 ep-1" 2024-10-07 22:32:27.901 INFO [neon_fixtures.py:3690] Postgres startup took 0.11398935317993164 seconds After: 2024-10-07 22:32:27.794 INFO [neon_cli.py:73] Running command "/tmp/neon/bin/neon_local endpoint start --safekeepers 1 ep-1"	2024-10-09 09:58:50 +01:00
Arpad Müller	e8ae37652b	Add timeline offload mechanism (#8907 ) Implements an initial mechanism for offloading of archived timelines. Offloading is implemented as specified in the RFC. For now, there is no persistence, so a restart of the pageserver will retrigger downloads until the timeline is offloaded again. We trigger offloading in the compaction loop because we need the signal for whether compaction is done and everything has been uploaded or not. Part of #8088	2024-10-09 01:33:39 +02:00
Tristan Partin	5bd8e2363a	Enable all pyupgrade checks in ruff This will help to keep us from using deprecated Python features going forward. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-08 14:32:26 -05:00
Vlad Lazar	618680c299	storcon: apply all node status changes before handling transitions (#9281 ) ## Problem When a node goes offline, we trigger reconciles to migrate shards away from it. If multiple nodes go offline at the same time, we handled them in sequence. Hence, we might migrate shards from the first offline node to the second offline node and increase the unavailability period. ## Summary of changes Refactor heartbeat delta handling to: 1. Update in memory state for all nodes first 2. Handle availability transitions one by one (we have full picture for each node after (1)) Closes https://github.com/neondatabase/neon/issues/9126	2024-10-08 17:55:25 +01:00
Alexander Bayandin	baf27ba6a3	Fix compiler warnings on macOS (#9319 ) ## Problem On macOS: ``` /Users/runner/work/neon/neon//pgxn/neon/file_cache.c:623:19: error: variable 'has_remaining_pages' is used uninitialized whenever 'for' loop exits because its condition is false [-Werror,-Wsometimes-uninitialized] ``` ## Summary of changes - Initialise `has_remaining_pages` with `false`	2024-10-08 17:34:35 +01:00
Tristan Partin	16417d919d	Remove get_self_dir() It didn't serve much value, and was only used twice. Path(__file__).parent is a pretty easy invocation to use. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-08 08:57:11 -05:00
Heikki Linnakangas	18b97150b2	Remove non-existent entries from .dockerignore (#9209 )	2024-10-08 14:55:24 +03:00
Heikki Linnakangas	17c59ed786	Don't override CFLAGS when building neon extension If you override CFLAGS, you also override any flags that PostgreSQL configure script had picked. That includes many options that enable extra compiler warnings, like '-Wall', '-Wmissing-prototypes', and so forth. The override was added in commit `171385ac14`, but the intention of that was to be more strict, by enabling '-Werror', not less strict. The proper way of setting '-Werror', as documented in the docs and mentioned in PR #2405, is to set COPT='-Werror', but leave CFLAGS alone. All the compiler warnings with the standard PostgreSQL flags have now been fixed, so we can do this without adding noise. Part of the cleanup issue #9217.	2024-10-07 23:49:33 +03:00
Heikki Linnakangas	d7b960c9b5	Silence compiler warning about using variable uninitialized It's not a bug, the variable is initialized when it's used, but the compiler isn't smart enough to see that through all the conditions. Part of the cleanup issue #9217.	2024-10-07 23:49:31 +03:00
Heikki Linnakangas	2ff6d2b6b5	Silence compiler warning about variable only used in assertions Part of the cleanup issue #9217.	2024-10-07 23:49:29 +03:00
Heikki Linnakangas	30f7fbc88d	Add pg_attribute_printf to WalProposerLibLog, per gcc's suggestion /pgxn/neon/walproposer_compat.c:192:9: warning: function ‘WalProposerLibLog’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format] 192 \| vsnprintf(buf, sizeof(buf), fmt, args); \| ^~~~~~~~~	2024-10-07 23:49:27 +03:00
Heikki Linnakangas	09f2000f91	Silence warnings about shadowed local variables Part of the cleanup issue #9217.	2024-10-07 23:49:24 +03:00
Heikki Linnakangas	e553ca9e4f	Silence warnings about mixed declarations and code The warning: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] It's PostgreSQL project style to stick to the old C90 style. (Alternatively, we could disable it for our extension.) Part of the cleanup issue #9217.	2024-10-07 23:49:22 +03:00
Heikki Linnakangas	0a80dbce83	neon_write() function is not used on v17 ifdef it out on v17, to silence compiler warning. Part of the cleanup issue #9217.	2024-10-07 23:49:20 +03:00
Heikki Linnakangas	e763256448	Fix warnings about missing function prototypes Prototypes for neon_writev(), neon_readv(), and neon_registersync() were missing. But instead of adding the missing prototypes, mark all the smgr functions 'static'. Part of the cleanup issue #9217.	2024-10-07 23:49:18 +03:00
Heikki Linnakangas	129d4480bb	Move "/* fallthrough */" comments so that GCC recognizes them This silences warnings about implicit fallthroughs. Part of the cleanup issue #9217.	2024-10-07 23:49:16 +03:00
Heikki Linnakangas	776df963ba	Fix function prototypes Silences these compiler warnings: /pgxn/neon_walredo/walredoproc.c:452:1: warning: ‘CreateFakeSharedMemoryAndSemaphores’ was used with no prototype before its definition [-Wmissing-prototypes] 452 \| CreateFakeSharedMemoryAndSemaphores() \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /pgxn/neon/walproposer_pg.c:541:1: warning: no previous prototype for ‘GetWalpropShmemState’ [-Wmissing-prototypes] 541 \| GetWalpropShmemState() \| ^~~~~~~~~~~~~~~~~~~~ Part of the cleanup issue #9217.	2024-10-07 23:49:13 +03:00
Heikki Linnakangas	11dc5feb36	Remove unused static function In v16 merge, we copied much of heap RMGR, to distinguish vanilla Postgres heap records from records generated with neon patches, with the additional CID fields. This function is only used by the HEAP_TRUNCATE records, however, which we didn't need to copy. Part of the cleanup issue #9217.	2024-10-07 23:49:11 +03:00
Heikki Linnakangas	dbbe57a837	Remove unused local vars and a prototype for non-existent function Per compiler warnings. Part of the cleanup issue #9217.	2024-10-07 23:49:09 +03:00
Em Sharnoff	cc29def544	vm-monitor: Ignore LFC in postgres cgroup memory threshold (#8668 ) In short: Currently we reserve 75% of memory to the LFC, meaning that we scale up to keep postgres using less than 25% of the compute's memory. This means that for certain memory-heavy workloads, we end up scaling much higher than is actually needed; in the worst case, up to 4x, although in practice it tends not to be quite so bad. Part of neondatabase/autoscaling#1030.	2024-10-07 21:25:34 +01:00
Arpad Müller	912d47ec02	storage_broker: update hyper and tonic again (#9299 ) Update hyper and tonic again in the storage broker, this time with a fix for the issue that made us revert the update last time. The first commit is a revert of #9268, the second a fix for the issue. fixes #9231.	2024-10-07 21:12:13 +02:00
Tristan Partin	6eba29c732	Improve logging on changes in a compute's status I'm trying to debug a situation with the LR benchmark publisher not being in the correct state. This should aid in debugging, while just being generally useful. PR: https://github.com/neondatabase/neon/pull/9265 Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-07 13:19:48 -04:00
Heikki Linnakangas	99d4c1877b	Replace BUFFERTAGS_EQUAL compatibility macro with new-style function (#9294 ) In PostgreSQL v16, BUFFERTAGS_EQUAL was replaced with a static inline function, BufferTagsEqual. Let's use the new name going forward, and have backwards-compatibility glue to allow using the new name on v14 and v15, rather than the other way round. This also makes BufferTagsEqual consistent with InitBufferTag, for which we were already using the new name.	2024-10-07 19:49:27 +03:00
Jere Vaara	2272dc8a48	feat(compute_tools): Create JWKS Postgres roles without attributes (#9031 ) Requires https://github.com/neondatabase/neon/pull/9086 first to have `local_proxy_config`. This logic can still be reviewed implementation wise. Create JWT Auth functionality related roles without attributes and `neon_superuser` group. Read the JWT related roles from `local_proxy_config` `JWKS` settings and handle them differently than other console created roles.	2024-10-07 19:37:32 +03:00
Arseny Sher	2613769ca7	Merge pull request #9291 from neondatabase/rc/2024-10-07 Storage & Compute release 2024-10-07	2024-10-07 18:20:22 +03:00
Heikki Linnakangas	323bd018cd	Make sure BufferTag padding bytes are cleared in hash keys (#9292 ) The prefetch-queue hash table uses a BufferTag struct as the hash key, and it's hashed using hash_bytes(). It's important that all the padding bytes in the key are cleared, because hash_bytes() will include them. I was getting compiler warnings like this on v14 and v15, when compiling with -Warray-bounds: In function ‘prfh_lookup_hash_internal’, inlined from ‘prfh_lookup’ at pg_install/v14/include/postgresql/server/lib/simplehash.h:821:9, inlined from ‘neon_read_at_lsnv’ at pgxn/neon/pagestore_smgr.c:2789:11, inlined from ‘neon_read_at_lsn’ at pgxn/neon/pagestore_smgr.c:2904:2: pg_install/v14/include/postgresql/server/storage/relfilenode.h:90:43: warning: array subscript ‘PrefetchRequest[0]’ is partly outside array bounds of ‘BufferTag[1]’ {aka ‘struct buftag[1]’} [-Warray-bounds] 89 \| ((node1).relNode == (node2).relNode && \ \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 90 \| (node1).dbNode == (node2).dbNode && \ \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~ 91 \| (node1).spcNode == (node2).spcNode) \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ pg_install/v14/include/postgresql/server/storage/buf_internals.h:116:9: note: in expansion of macro ‘RelFileNodeEquals’ 116 \| RelFileNodeEquals((a).rnode, (b).rnode) && \ \| ^~~~~~~~~~~~~~~~~ pgxn/neon/neon_pgversioncompat.h:25:31: note: in expansion of macro ‘BUFFERTAGS_EQUAL’ 25 \| #define BufferTagsEqual(a, b) BUFFERTAGS_EQUAL((a), (b)) \| ^~~~~~~~~~~~~~~~ pgxn/neon/pagestore_smgr.c:220:34: note: in expansion of macro ‘BufferTagsEqual’ 220 \| #define SH_EQUAL(tb, a, b) (BufferTagsEqual(&(a)->buftag, &(b)->buftag)) \| ^~~~~~~~~~~~~~~ pg_install/v14/include/postgresql/server/lib/simplehash.h:280:77: note: in expansion of macro ‘SH_EQUAL’ 280 \| #define SH_COMPARE_KEYS(tb, ahash, akey, b) (ahash == SH_GET_HASH(tb, b) && SH_EQUAL(tb, b->SH_KEY, akey)) \| ^~~~~~~~ pg_install/v14/include/postgresql/server/lib/simplehash.h:799:21: note: in expansion of macro ‘SH_COMPARE_KEYS’ 799 \| if (SH_COMPARE_KEYS(tb, hash, key, entry)) \| ^~~~~~~~~~~~~~~ pgxn/neon/pagestore_smgr.c: In function ‘neon_read_at_lsn’: pgxn/neon/pagestore_smgr.c:2742:25: note: object ‘buftag’ of size 20 2742 \| BufferTag buftag = {0}; \| ^~~~~~ This commit silences those warnings, although it's not clear to me why the compiler complained like that in the first place. I found the issue with padding bytes while looking into those warnings, but that was coincidental, I don't think the padding bytes explain the warnings as such. In v16, the BUFFERTAGS_EQUAL macro was replaced with a static inline function, and that also silences the compiler warning. Not clear to me why.	2024-10-07 18:04:04 +03:00
Folke Behrens	ad267d849f	proxy: Move module base files into module directory (#9297 )	2024-10-07 16:25:34 +02:00
Conrad Ludgate	8cd7b5bf54	proxy: rename console -> control_plane, rename web -> console_redirect (#9266 ) rename console -> control_plane rename web -> console_redirect I think these names are a little more representative.	2024-10-07 14:09:54 +01:00
Konstantin Knizhnik	47c3c9a413	Fix update of statistics for LFC/prefetch (#9272 ) ## Problem See #9199 ## Summary of changes Fix update of hits/misses for LFC and prefetch introduced in `78938d1b59` Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-10-07 12:21:16 +03:00
Arseny Sher	eae4470bb6	safekeeper: remove local WAL files ignoring peer_horizon_lsn. (#8900 ) If a peer safekeeper needs a garbage-collected segment, it will now be fetched from S3 using on-demand WAL download. Reduces the danger of running out of disk space when a safekeeper fails.	2024-10-04 19:07:39 +03:00
Ivan Efremov	2d248aea6f	proxy: exclude triple logging of connect compute errors (#9277 ) Fixes (#9020) - Use the compute::COULD_NOT_CONNECT for connection error message; - Eliminate logging for one connection attempt; - Typo fix.	2024-10-04 18:21:39 +03:00
Conrad Ludgate	6c05f89f7d	proxy: add local-proxy to compute image (#8823 ) 1. Adds local-proxy to compute image and vm spec 2. Updates local-proxy config processing, writing PID to a file eagerly 3. Updates compute-ctl to understand local proxy compute spec and to send SIGHUP to local-proxy over that pid. closes https://github.com/neondatabase/cloud/issues/16867	2024-10-04 14:52:01 +00:00
Arseny Sher	db53f98725	neon walsender_hooks: take basebackup LSN directly. (#9263 ) NeonWALReader needs to know LSN before which WAL is not available locally, that is, basebackup LSN. Previously it was taken from WalpropShmemState, but that's racy, as walproposer sets it there only after successful election. Get it directly with GetRedoStartLsn. Should fix flakiness of test_ondemand_wal_download_in_replication_slot_funcs etc. ref #9201	2024-10-04 14:56:15 +01:00
Erik Grinaker	04a6222418	remote_storage: add `head_object` integration test (#9274 )	2024-10-04 12:40:41 +01:00
Vlad Lazar	dcf7af5a16	storcon: do timeline creation on all attached location (#9237 ) ## Problem Creation of timelines during a reconciliation can lead to unavailability if the user attempts to start a compute before the storage controller has notified cplane of the cut-over. ## Summary of changes Create timelines on all currently attached locations. For the latest location, we still look at the database (as previously). With this change we also look into the observed state to find other attached locations. Related https://github.com/neondatabase/neon/issues/9144	2024-10-04 11:56:43 +01:00
Erik Grinaker	37158d0424	pageserver: use conditional GET for secondary tenant heatmaps (#9236 ) ## Problem Secondary tenant heatmaps were always downloaded, even when they hadn't changed. This can be avoided by using a conditional GET request passing the `ETag` of the previous heatmap. ## Summary of changes The `ETag` was already plumbed down into the heatmap downloader, and just needed further plumbing into the remote storage backends. * Add a `DownloadOpts` struct and pass it to `RemoteStorage::download()`. * Add an optional `DownloadOpts::etag` field, which uses a conditional GET and returns `DownloadError::Unmodified` on match.	2024-10-04 12:29:48 +02:00
Erik Grinaker	60fb840e1f	Cargo.toml: enable `sso` for `aws-config` (#9261 ) ## Problem The S3 tests couldn't use SSO authentication for local tests against S3. ## Summary of changes Enable the `sso` feature of `aws-config`. Also run `cargo hakari generate` which made some updates to `workspace_hack`.	2024-10-04 11:27:06 +01:00
Heikki Linnakangas	52232dd85c	tests: Add a comment explaining the rules of NeonLocalCli wrappers (#9195 )	2024-10-03 22:03:29 +03:00
Heikki Linnakangas	8ef0c38b23	tests: Rename NeonLocalCli functions to match the 'neon_local' commands (#9195 ) This makes it more clear that the functions in NeonLocalCli are just typed wrappers around the corresponding 'neon_local' commands.	2024-10-03 22:03:27 +03:00
Heikki Linnakangas	56bb1ac458	tests: Move NeonCli and friends to separate file (#9195 ) In passing, rename it to NeonLocalCli, to reflect that the binary is called 'neon_local'. Add wrapper for the 'timeline_import' command, eliminating the last raw call to the raw_cli() function from tests, except for a few in test_neon_cli.py which are about testing 'neon_local' itself. All the other calls are now made through the strongly-typed wrapper functions.	2024-10-03 22:03:25 +03:00
Heikki Linnakangas	19db9e9aad	tests: Replace direct calls to neon_cli with wrappers in NeonEnv (#9195 ) Add wrappers for a few commands that didn't have them before. Move the logic to generate tenant and timeline IDs from NeonCli to the callers, so that NeonCli is more purely just a type-safe wrapper around 'neon_local'.	2024-10-03 22:03:22 +03:00
David Gomes	4e9b32c442	chore: makes some onboarding document improvements (#9216 ) * I had to install `m4` in order to be able to run locally * The docs/docker.md was missing a pointer to where the compute node code is (Was originally on #8888 but I am pulling this out)	2024-10-03 20:58:30 +02:00
David Gomes	2fac0b7fac	chore: remove unnecessary comments in compute/Dockerfile.compute-node (#9253 ) See [this comment](https://github.com/neondatabase/neon/pull/8888#discussion_r1783130082).	2024-10-03 18:26:41 +00:00
Arpad Müller	e3d6ecaeee	Revert hyper and tonic updates (#9268 )	2024-10-03 19:21:22 +01:00
Arseny Sher	d785fcb5ff	safekeeper: fix panic in debug_dump. (#9097 ) Panic was triggered only when dump selected no timelines. sentry report: https://neondatabase.sentry.io/issues/5832368589/	2024-10-03 19:22:22 +03:00
Vlad Lazar	552fa2b972	pageserver: tweak oversized key read path warning (#9221 ) ## Problem `Oversized vectored read [...]` logs are spewing in prod because we have a few keys that are unexpectedly large: * reldir/relblock - these are unbounded, so it's known technical debt * slru block - they can be a bit bigger than 128KiB due to storage format overhead ## Summary of changes * Bump threshold to 130KiB * Don't warn on oversized reldir and dbdir keys Closes https://github.com/neondatabase/neon/issues/8967	2024-10-03 16:40:35 +01:00
Arpad Müller	9d93dd4807	Rename hyper 1.0 to hyper and hyper 0.14 to hyper0 (#9254 ) Follow-up of #9234 to give hyper 1.0 the version-free name, and the legacy version of hyper the one with the version number inside. As we move away from hyper 0.14, we can remove the `hyper0` name piece by piece. Part of #9255	2024-10-03 16:33:43 +02:00
Heikki Linnakangas	53b6e1a01c	vm-monitor: Upgrade axum from 0.6 to 0.7 (#9257 ) Because: - it's nice to be up-to-date, - we already had axum 0.7 in our dependency tree, so this avoids having to compile two versions, and - removes one of the remaining dependencies on hyper version 0 Also bumps the 'tokio-tungstenite' dependency, to avoid having two versions in the dependency tree.	2024-10-03 16:49:39 +03:00
Anastasia Lubennikova	a33e1d12fb	Merge pull request #9249 from neondatabase/releases/2024-10-02-compute-only Compute release 2024-10-02 (2)	2024-10-03 10:15:52 +01:00
Joonas Koivunen	dbef1b064c	chore: smaller layer changes (#9247 ) Address minor technical debt in Layer inspired by #9224: - layer usage as arg same as in spans - avoid one Weak::upgrade	2024-10-03 09:38:45 +01:00
Heikki Linnakangas	6a9e2d657c	Remove unnecessary dependencies from postgis-build image (#9211 ) The apt install stage before this commit: 0 upgraded, 391 newly installed, 0 to remove and 9 not upgraded. Need to get 261 MB of archives. after: 0 upgraded, 367 newly installed, 0 to remove and 9 not upgraded. Need to get 220 MB of archives.	2024-10-03 10:05:23 +03:00
Arpad Müller	2d8f6d7906	Suppress wal lag timeout warnings right after tenant attachment (#9232 ) As seen in https://github.com/neondatabase/cloud/issues/17335, during releases we can have ingest lags that are above the limits for warnings. However, such lags are part of normal pageserver startup. Therefore, calculate a certain cooldown timestamp until which we accept lags up to a certain size. The heuristic is chosen to grow the later we get to fully load the tenant, and we also add 60 seconds as a grace period after that term.	2024-10-03 02:33:09 +01:00
Arpad Müller	1b176fe74a	Use hyper 1.0 and tonic 0.12 in storage broker (#9234 ) Fixes #9231 . Upgrade hyper to 1.4.0 and use hyper 1.4 instead of 0.14 in the storage broker, together with tonic 0.12. The two upgrades go hand in hand. Thanks to the broker being independent from other components, we can upgrade its hyper version without touching the other components, which makes things easier.	2024-10-03 00:48:12 +02:00
Heikki Linnakangas	1dec93f129	Add compute_tools/ to the list of paths that trigger an E2E run on a PR (#9251 ) compute_ctl is an important part of the interfaces between the control plane and the compute, so it seems important to E2E test any changes there.	2024-10-03 00:31:19 +03:00
Alexander Bayandin	16002f5e45	test_runner: bump `requests` and `psycopg2-binary` (#9248 ) ## Problem ``` Warning: The file chosen for install of requests 2.32.0 (requests-2.32.0-py3-none-any.whl) is yanked. Reason for being yanked: Yanked due to conflicts with CVE-2024-35195 mitigation ``` ## Summary of changes - Update `requests` to fix the warning - Update `psycopg2-binary`	2024-10-02 21:26:45 +01:00
dotdister	09d4bad1be	Change parentheses to clarify conditions in walproposer (#9180 ) Some parentheses in conditional expressions in walproposer are redundant, and others are needed for clarity. ## Summary of changes Change some parentheses to clarify conditions in walproposer. Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-10-02 14:49:52 -04:00
Heikki Linnakangas	d20448986c	Fix metric name of the 'getpage_wait_seconds_bucket' metric (#9242 ) Per convention, histogram buckets have the '_bucket' suffix. I got that wrong in commit `0d500bbd5b`. Fixes https://github.com/neondatabase/neon/issues/9241	2024-10-02 20:05:14 +03:00
John Spray	d54624153d	tests: sync_after_each_test -> sync_between_tests (#9239 ) ## Problem We are seeing frequent pageserver startup timeouts while it calls syncfs(). There is an existing fixture that syncs _after_ tests, but not before the first one. We hypothesize that some failures are happening on the first test in a job. ## Summary of changes - extend the existing sync_after_each_test to be a sync between all tests, including sync'ing before running the first test. That should remove any ambiguity about whether the sync is happening on the correct node. This is an alternative to https://github.com/neondatabase/neon/pull/8957 -- I didn't realize until I saw Alexander's comment on that PR that we have an existing hook that syncs filesystems and can be extended.	2024-10-02 17:44:25 +01:00
Alex Chi Z.	700885471f	fix(test): only test num of L1 layers in compaction smoke test (#9186 ) close https://github.com/neondatabase/neon/issues/9160 For whatever reason, pg17's WAL pattern seems different from others, which triggers some flaky behavior within the compaction smoke test. ## Summary of changes * Run L0 compaction before proceeding with the read benchmark. * So that we can ensure the num of L0 layers is 0 and test the compaction behavior only with L1 layers. We have a threshold for triggering L0 compaction. In some cases, the test case did not produce enough L0 layers to do a L0 compaction, therefore leaving the layer map with 3+ L0 layers above the L1 layers. This increases the average read depth for the timeline. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-02 17:42:35 +01:00
Vlad Lazar	38a8dcab9f	storcon: add metric for long running reconciles (#9207 ) ## Problem We don't have an alert for long running reconciles. Stuck reconciles are problematic as we've seen in a recent incident. ## Summary of changes Add a new metric `storage_controller_reconcile_long_running_total` with labels: `{tenant_id, shard_number, seq}`. The metric is removed after the long running reconcile finishes. These events should be rare, so we won't break the bank on cardinality. Related https://github.com/neondatabase/neon/issues/9150	2024-10-02 17:25:11 +01:00
Vlad Lazar	8dbfda98d4	storcon: ignore deleted timelines on new location catch-up (#9244 ) ## Problem If a timeline was deleted right before waiting for LSNs to catch up before the cut-over, then we would wait forever. ## Summary of changes Fix the issue and add a test for timeline deletions mid migration. Related https://github.com/neondatabase/neon/issues/9144	2024-10-02 17:23:26 +01:00
John Spray	f875e107aa	pageserver: tweak logging of "became visible" for layers (#9224 ) ## Problem Recent change to avoid the "became visible" log messages from certain tasks missed a task: the logical size calculation that happens as a child of synthetic size calculation. Related: https://github.com/neondatabase/neon/issues/9058 ## Summary of changes - Add OnDemandLogicalSize to the list of permitted tasks for reads making a covered layer visible - Tweak the log message to use layer name instead of key: this is more terse, and easier to use when debugging, as one can search for it elsewhere to see when the layer was written/downloaded etc.	2024-10-02 13:21:04 +01:00
Folke Behrens	1e90e792d6	proxy: Add timeout to webauth confirmation wait (#9227 ) ```shell $ cargo run -p proxy --bin proxy -- --auth-backend=web --webauth-confirmation-timeout=5s ``` ``` $ psql -h localhost -p 4432 NOTICE: Welcome to Neon! Authenticate by visiting within 5s: http://localhost:3000/psql_session/e946900c8a9bc6e9 psql: error: connection to server at "localhost" (::1), port 4432 failed: Connection refused Is the server running on that host and accepting TCP/IP connections? connection to server at "localhost" (127.0.0.1), port 4432 failed: ERROR: Disconnected due to inactivity after 5s. ```	2024-10-02 12:10:56 +02:00
Matthias van de Meent	ea32f1d0a3	Expose more granular wait event data to the user (#9163 ) In PG17, there is this newfangled custom wait events system. This commit adds that feature to Neon, so that users can see what their backends may be waiting for when a PostgreSQL backend is playing the waiting game in Neon code.	2024-10-02 11:12:50 +02:00
Heikki Linnakangas	2e3b7862d0	Fix compute metrics collector config (#9235 )	2024-10-02 09:44:00 +01:00
Arpad Müller	387e569259	Update aws SDK crates (#9233 ) This updates the aws SDK crates to their newest released versions.	2024-10-02 08:00:08 +02:00
Alex Chi Z.	31f12f6426	fix: ignore tonic to resolve advisories (#9230 ) check-rust-style fails because the tonic version is too old. This does not seem to be an easy fix, so ignore it in the deny list. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-01 19:26:54 -04:00
Anastasia Lubennikova	5cabf32dae	Merge pull request #9228 from neondatabase/releases/2024-10-01-compute-only Compute release 2024-10-02	2024-10-01 21:36:14 +01:00
Heikki Linnakangas	8861e8a323	Fix the size of the perf counters shared memory array (#9226 ) MaxBackends doesn't include auxiliary processes. Whenever an aux process made IO operations that updated the counters, it would scribble over shared memory beyond the end of the array. The relsize cache hash table comes after the array, so the symptom was an error about hash table corruption in the relsize cache hash.	2024-10-01 20:07:51 +01:00
Arseny Sher	62e22dfd85	Backpressure: reset ps display after it is done. (#8980 ) Previously we set the 'backpressure throttling' status, but overwrote current one and never reset it back.	2024-10-01 20:55:05 +03:00
Arseny Sher	17672c88ff	tests: wait walreceiver on sks to be gone on 'immediate' ep restart. (#9099 ) When endpoint is stopped in immediate mode and started again there is a chance of old connection delivering some WAL to safekeepers after second start checked need for sync-safekeepers and thus grabbed basebackup LSN. It makes basebackup unusable, so compute panics. Avoid flakiness by waiting for walreceivers on safekeepers to be gone in such cases. A better way would be to bump term on safekeepers if sync-safekeepers is skipped, but it needs more infrastructure. ref https://github.com/neondatabase/neon/issues/9079	2024-10-01 20:54:00 +03:00
Matthias van de Meent	6efdb1d0f3	Fix small memory accounting bug in libpagestore (#9223 ) Found while searching for other issues in shared memory. The bug should be benign, in that it over-allocates memory for this struct, but doesn't allow for out-of-bounds writes.	2024-10-01 17:37:59 +01:00
Erik Grinaker	325de52e73	pageserver: remove `TenantConfOpt::TryFrom<toml_edit::Item>` (#9219 ) Following #7656, `TenantConfOpt::TryFrom<toml_edit::Item>` appears to be dead code. This patch removes `TenantConfOpt::TryFrom<toml_edit::Item>`. The code does appear to be dead, since the TOML config is deserialized into `TenantConfig` (via `LocationConfig`) and then converted into `TenantConfOpt`. This was verified by adding a panic to `try_from()` and running the pageserver unit tests as well as a local end-to-end cluster (including creating a new tenant and restarting the pageserver). This did not fail, so this is not used on the common happy path at least. No explicit `try_from` or `try_into` calls were found either. Resolves #8918.	2024-10-01 16:35:18 +01:00
Anastasia Lubennikova	ce73db9316	Fix post_apply_config() (#9220 ) Bring back post_apply_config() step that was accidentally removed in `78938d1`	2024-10-01 16:28:58 +01:00
Shinya Kato	b675997f48	safekeeper: Fix a log message of HTTP worker (#9213 ) ## Problem There is a wrong log message. ## Summary of changes Fixed the log message.	2024-10-01 17:16:53 +02:00
Alex Chi Z.	49f99eb729	docs: add aux file v2 RFC (#9115 ) aux v2 migration is near the end and I rewrote the RFC based on what I proposed (several months before...) and what I actually implemented. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-01 15:56:54 +01:00
Heikki Linnakangas	0d500bbd5b	Add new compute metrics to sql exporter (#9190 ) These are the perf counters added in commit `263dfba6ee`. Note: This relies on 'neon' extension version 1.5. The default was bumped to 1.5 in commit `d696c41807`. --------- Co-authored-by: Matthias van de Meent <matthias@neon.tech>	2024-10-01 17:38:19 +03:00
Heikki Linnakangas	1b8b50755c	Use debian packages for cmake again (#9212 ) On bookworm, 'cmake' is new enough that we can just use it. On bullseye, we can get a new-enough package from backports. By including 'cmake' in the build-deps stage, we don't need to install it separately in all the later build stages that need it. See https://github.com/neondatabase/neon/pull/2699, where we switched to downloading and building a specific version.	2024-10-01 15:09:09 +03:00
Conrad Ludgate	4391b25d01	proxy: ignore typ and use jwt.alg rather than jwk.alg (#9215 ) Microsoft exposes JWKs without the alg header. It's only included on the tokens. Not a problem. Also noticed that wrt the `typ` header: > It will typically not be used by applications when it is already known that the object is a JWT. This parameter is ignored by JWT implementations; any processing of this parameter is performed by the JWT application. Since we know we are expecting JWTs only, I've followed the guidance and removed the validation.	2024-10-01 10:36:49 +01:00
John Spray	40b10b878a	storage_scrubber: retry on index deletion failures (#9204 ) ## Problem In automated tests running on AWS S3, we frequently see scrubber failures when it can't delete an index. `location_conf_churn`: https://neon-github-public-dev.s3.amazonaws.com/reports/main/11076221056/index.html#/testresult/f89b1916b6a693e2 `scrubber_physical_gc`: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9178/11074269153/index.html#/testresult/9885ed5aa0fe38b6 ## Summary of changes Wrap index deletion in a backoff::retry --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-10-01 10:34:39 +01:00
David Gomes	d6c6b0a509	feat(compute): adds pg_session_jwt extension to compute image (#8888 ) ## Problem We need the [pg_session_jwt](https://github.com/neondatabase/pg_session_jwt/) extension in the compute image. This PR adds it. ## Summary of changes I added the `pg_session_jwt` extension in a very similar way to how the pggraphql and pgtiktoken extensions were added (since they're all written with pgrx). Then I tested this. ``` $ cd docker-compose/ $ PG_VERSION=16 TAG=10667533475 docker-compose up --build -d $ psql postgresql://cloud_admin:cloud_admin@localhost:55433/postgres cloud_admin@postgres=# create extension pg_session_jwt; CREATE EXTENSION Time: 43.048 ms cloud_admin@postgres=# \df auth.*; List of functions ┌────────┬──────────────────┬──────────────────┬─────────────────────┬──────┐ │ Schema │ Name │ Result data type │ Argument data types │ Type │ ├────────┼──────────────────┼──────────────────┼─────────────────────┼──────┤ │ auth │ get │ jsonb │ s text │ func │ │ auth │ init │ void │ kid bigint, s jsonb │ func │ │ auth │ jwt_session_init │ void │ s text │ func │ │ auth │ user_id │ text │ │ func │ └────────┴──────────────────┴──────────────────┴─────────────────────┴──────┘ (4 rows) cloud_admin@postgres=# select auth.init(cast('1' as bigint), to_jsonb(TEXT '{ "kty": "EC", "kid": "571683be-33cf-4e67-bccc-8905c0ebb862", "crv": "P-521", "alg": "ES512", "x": "AM_GsnQvKML2yXdn_OsN8PdgO1Sf9XMXih5vQMKLmJkp-Iz_FFWJUt6uyR_qp4brr8Ji2kjGJgN4cQJpg2kskH7V", "y": "AZg-salw24lCmsBP-BCBa5jT6INkTwLtCOC7o0BIxDVvmIEH1-PQAJVYVJPTFvPMi_PLa0QlOm-ufJYkynwa2Mau" }')); ERROR: called `Result::unwrap()` on an `Err` value: Error("invalid type: string \"{ \\\"kty\\\": \\\"EC\\\", \\\"kid\\\": \\\"571683be-33cf-4e67-bccc-8905c0ebb862\\\", \\\"crv\\\": \\\"P-521\\\", \\\"alg\\\": \\\"ES512\\\", \\\"x\\\": \\\"AM_GsnQvKML2yXdn_OsN8PdgO1Sf9XMXih5vQMKLmJkp-Iz_FFWJUt6uyR_qp4brr8Ji2kjGJgN4cQJpg2kskH7V\\\", \\\"y\\\": 
\\\"AZg-salw24lCmsBP-BCBa5jT6INkTwLtCOC7o0BIxDVvmIEH1-PQAJVYVJPTFvPMi_PLa0QlOm-ufJYkynwa2Mau\\\" }\", expected struct JwkEcKey", line: 0, column: 0) Time: 6.991 ms ``` ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Move the download location to a proper URL	2024-10-01 10:29:56 +01:00
John Spray	d515727e94	tests: make test_multi_attach more stable (#9202 ) ## Problem `test_multi_attach` is sometimes failing with `invalid compute status for configuration request: Configuration`. This is likely a result of the test attempting to reconfigure the compute at the same time as the storage controller is doing so. This test was originally written before the storage controller existed, and is not expecting anything else to be reconfiguring computes at the same time. ## Summary of changes - Configure the tenant into scheduling policy `Stop` in the storage controller at the start of the test, so that it won't try to do anything to the tenant while the test is running.	2024-10-01 10:15:18 +01:00
Folke Behrens	2e508b1ff9	Upgrade OpenTelemetry and other tracing crates (#9200 ) * tracing-utils now returns a `Layer` impl. Removes the need for crates to import OTel crates. * Drop the /v1/traces URI check. Verified that the code does the right thing. * Leave a TODO to hook in an error handler for OTel to log errors to when it assumes the regular pipeline cannot be used/is broken.	2024-10-01 11:02:54 +02:00
John Spray	651ae44569	storage controller: drop out of blocking compute notification loop if migration origin becomes unavailable (#9147 ) ## Problem The live migration code waits forever for the compute notification hook, on the basis that until it succeeds, the compute is probably using the old location and we shouldn't detach it. However, if a pageserver stops or restarts in the background, then this original location might no longer be available, so there is no point waiting. Waiting is also actively harmful, because it prevents other reconciliations happening for the tenant shard, such as during an upgrade where a stuck "drain" migration might prevent the later "fill" migration from moving the shard back to its original location. ## Summary of changes - Refactor the notification wait loop into a function - Add checks during the loop, for the origin node's cancellation token and an explicit HTTP request to the origin node to confirm the shard is still attached there. Closes: https://github.com/neondatabase/neon/issues/8901	2024-10-01 07:57:22 +00:00
Heikki Linnakangas	65bda19051	Remove unnecessary dev package from compute image (#9210 ) libcurl4-openssl-dev is needed to build pgxn/, but libcurl4 is enough at runtime.	2024-10-01 01:07:43 +03:00
Conrad Ludgate	94a5ca2817	proxy: auth broker (#8855 ) Opens http2 connection to local-proxy and forwards requests over with all headers and body closes https://github.com/neondatabase/cloud/issues/16039	2024-09-30 20:43:45 +01:00
Arthur Petukhovsky	c07cea80bd	Bump vm-builder v0.29.3 -> v0.35.0 (#9208 ) We haven't updated it for a while. Now I need the update to add quotas support to compute images (https://github.com/neondatabase/cloud/issues/13127). Previous update: https://github.com/neondatabase/neon/pull/7849	2024-09-30 19:18:42 +01:00
Conrad Ludgate	a2e2362ee9	add proxy-protocol header disable option (#9203 ) resolves https://github.com/neondatabase/cloud/issues/18026	2024-09-30 18:11:50 +00:00
Heikki Linnakangas	0a567acdb9	tests: Move comment to more appropriate place There is no 'pg_bin' in NeonEnv.	2024-09-30 17:56:43 +03:00
Heikki Linnakangas	69ea2776e9	tests: Remove creation of extra timelines in some tests neon_cli.create_tenant() creates a new tenant and a timeline on the tenant, with name "main". In most tests, there's no need to create another timeline on the same tenant. There are some more tests that do that, but in the remaining cases, I wasn't 100% sure whether the presence of extra root timelines affects what the tests test, so I left them alone.	2024-09-30 17:56:40 +03:00
Heikki Linnakangas	4dc9cb7cf9	tests: Remove some spurious list_timelines calls These calls seem really out of place. We know what the initial tenant and branch are in these tests, just like in all other tests.	2024-09-30 17:56:37 +03:00
John Spray	7424e7269c	tests: longer timeout in `test_delete_timeline_client_hangup` (#9161 ) ## Problem This test waits for a request to finish, and then expects deletion to complete almost immediately. The request completes, but it's a 202, the timeline is still deleting in the background: we need to be more patient. ## Summary of changes - Adjust iterations from 2 to 10 when waiting for deletion	2024-09-30 15:46:07 +01:00
a-masterov	5dc68e4e6a	test_compatibility: fix the regexes detecting the version (#9205 ) ## Problem The Neon components, built locally and by the GitHub workflow have slightly different version prefixes (git: vs git-env:) This does not allow running tests against local builds correctly. ## Summary of changes The regular expressions were changed to work with both prefixes.	2024-09-30 16:37:14 +02:00
John Spray	d3490dbfea	Merge pull request #9196 from neondatabase/rc/2024-09-30 Storage & Compute release 2024-09-30	2024-09-30 10:04:42 +01:00
John Spray	7cfd116856	pageserver: refactor immediate_gc into TenantManager (#9183 ) ## Problem Legacy functions that were called as `mgr::` and relied on the static TENANTS, see #5796 ## Summary of changes - Move the last stray function (immediate_gc) into TenantManager Closes: https://github.com/neondatabase/neon/issues/5796	2024-09-30 09:27:28 +01:00
Heikki Linnakangas	d696c41807	Bump default neon extension version to 1.5 (#9188 ) Commit `263dfba6ee` introduced neon extension version 1.5, which included some new functions and views for metrics. It didn't bump the default neon extension number yet, so that we could still safely roll back to the old binary if necessary. This bumps the default version.	2024-09-30 09:20:52 +03:00
Alexander Bayandin	3c72192065	CI(benchmarking): fix setting LD_LIBRARY_PATH (#9191 ) ## Problem `pgbench-pgvector` job from Nightly Benchmarks fails with the error: ``` /__w/_temp/f45bc2eb-4c4c-4f0a-8030-99079303fa65.sh: line 17: LD_LIBRARY_PATH: unbound variable ``` ## Summary of changes - Fix `LD_LIBRARY_PATH: unbound variable` error in benchmarks	2024-09-29 22:27:53 +00:00
Alexander Bayandin	d2d9921761	CI(benchmarking): fix Nightly Benchmarks (#9178 ) ## Problem Nightly Benchmarks have been broken for some time due to various reasons, this PR fixes it ## Summary of changes - Pull `build-tools` image from dockerhub for `benchmarking` workflow - Use `aws-actions/configure-aws-credentials` to upload/download artifacts from S3 - Fix Postgres 16 installation (for pgbench)	2024-09-28 02:44:22 +01:00
Arthur Petukhovsky	ba498a630a	Set disk quotas on bind in compute_ctl (#8936 ) Part of https://github.com/neondatabase/cloud/issues/13127. Resolves #9153 What changed in this PR: 1. Adds `ComputeSpec.disk_quota_bytes: Option<u64>` 2. Adds new arg to compute_ctl: `--set-disk-quota-for-fs <mountpoint>` 3. Implements running `/neonvm/bin/set-disk-quota` with the right value if both cmdline arg AND field in the spec are specified 4. Patches `/etc/sudoers.d` to allow `compute_ctl` to set quota with sudo This PR is very similar to the swap support added earlier, you can take a look at it as prior art: #7434 In theory, it can be implemented outside of compute_ctl when we will have a separate neonvm daemon, but we are not there yet. Current implementation is the simplest possible to unblock computes with larger disks. All code related to usage of `/neonvm/bin/set-disk-quota` is located in `disk_quota.rs`. We need to call this script with the following arguments: `/neonvm/bin/set-disk-quota {size_kb} {mountpoint}`. Quotas are set on the filesystem level, so we need to provide path to the directory that filesystem was mounted to. I tested this change locally with https://github.com/neondatabase/cloud/pull/17270. It should be safe to merge, because this feature is gated by both cmdline arg and field in the spec. If control-plane doesn't set values in both places, compute_ctl won't be affected by this change.	2024-09-27 20:52:22 +01:00
Heikki Linnakangas	e989a5e4a2	neon_local: Use clap derive macros to parse the CLI args (#9103 ) This is easier to work with.	2024-09-27 22:08:46 +03:00
Alex Chi Z.	cde1654d7b	fix(pageserver): abort process if fsync fails (#9108 ) close https://github.com/neondatabase/neon/issues/8140 The original issue is rather vague on what we should do. After discussion w/ @problame we decided to narrow down the problems we want to solve in that issue. * read path -- do not panic for now. * write path -- panic only on write errors (i.e., device error, fsync error), but not on no-space for now. The guideline is that if the pageserver behavior could lead to violation of persistent constraints (i.e., return an operation as successful but not actually persisting things), we should panic. Fsync is the place where both of us agree that we should panic, because if fsync fails, the kernel will mark dirty pages as clean, and the next fsync will not necessarily return false. This would make the storage client assume the operation is successful. ## Summary of changes Make fsync panic on fatal errors. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-27 19:58:50 +01:00
Heikki Linnakangas	cf6a776fcf	tests: Reduce the # of iterations in safekeeper::test_random_schedules (#9182 ) To make it faster. On my laptop, it takes about 30 s before this commit. In the arm64 debug variant in CI, it takes about 120 s. Reduce it by a factor of 4.	2024-09-27 16:25:35 +00:00
Matthias van de Meent	5c5871111a	WalProposer: Read WAL directly from WAL buffers in PG17 (#9171 ) This reduces the overhead of the WalProposer when it is not being throttled by SK WAL acceptance rate	2024-09-27 17:47:05 +02:00
Yuchen Liang	d56c4e7a38	pageserver: remove AdjacentVectoredReadBuilder and bump minimum io_buffer_alignment to 512 (#9175 ) Part of #8130 ## Problem After deploying https://github.com/neondatabase/infra/pull/1927, we shipped `io_buffer_alignment=512` to all prod regions. The `AdjacentVectoredReadBuilder` code path is no longer taken and we are running pageserver unit tests 6 times in the CI. Removing it would reduce the test duration by 30-60s. ## Summary of changes - Remove `AdjacentVectoredReadBuilder` code. - Bump the minimum `io_buffer_alignment` requirement to at least 512 bytes. - Use default `io_buffer_alignment` for Rust unit tests. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-09-27 16:41:42 +01:00
Conrad Ludgate	43b2445d0b	proxy: add jwks endpoint to control plane and mock providers (#9165 )	2024-09-27 16:08:43 +01:00
Yuchen Liang	42ef08db47	fix(pageserver): LSN lease edge cases around restarts/migrations (#9055 ) Part of #7497, closes #8817. ## Problem See #8817. ## Summary of changes compute_ctl - Renew lsn lease as soon as `/configure` updates pageserver_connstr, use `state_changed` Condvar for synchronization. pageserver As mentioned in https://github.com/neondatabase/neon/issues/8817#issuecomment-2315768076, we still want some permanent error reported if a lease cannot be granted. By considering attachment mode and the added `lsn_lease_deadline` when processing lease requests, we can also bound the case of bad requests to a very short period after migration/restart. - Refactor https://github.com/neondatabase/neon/pull/9024 and move `lsn_lease_deadline` to `AttachedTenantConf` so timeline can easily access it. - Have separate HTTP `init_lsn_lease` and libpq `renew_lsn_lease` API. - Always do LSN verification for the initial HTTP lease request. - LSN verification for the renewal is still done when tenants are not in `AttachedSingle` and we have passed the `lsn_lease_deadline`, which gives plenty of time for compute to renew the lease. neon_local - add and call `timeline_init_lsn_lease` mgmt_api at static endpoint start. The initial lsn lease http request is sent when we run `cargo neon endpoint start <static endpoint>`. ## Testing - Extend `test_readonly_node_gc` to do pageserver restarts and migration. ## Future Work - The control plane should make the initial lease request through HTTP when creating a static endpoint. This is currently only done in `neon_local`. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-09-27 09:56:52 -04:00
Tristan Partin	fc962c9605	Use long options when calling initdb Verbosity in this case is good when reading the code. Short options are better when operating in an interactive shell. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-27 08:22:16 -05:00
Heikki Linnakangas	357fa070a3	Add gdb to build-tools (#9125 ) So that compute_ctl can use it to print backtrace on core dumps See issue #2800.	2024-09-27 15:36:24 +03:00
Heikki Linnakangas	02cdd37b56	Dump backtrace if a core dump is called just "core" (#9125 ) I hope this lets us capture backtraces in CI. At least it makes it work on my laptop, which is valuable even if we need to do more for CI. See issue #2800.	2024-09-27 15:36:24 +03:00
Vlad Lazar	fa354a65ab	libs: improve logging on PG connection errors (#9130 ) ## Problem We get some unexpected errors, but don't know who they're happening for. ## Summary of change Add tenant id and peer address to PG connection error logs. Related https://github.com/neondatabase/cloud/issues/17336	2024-09-27 12:36:43 +01:00
Arseny Sher	40f7930a7d	safekeeper: skip syncfs on start if --no-sync is specified. (#9166 ) https://neondb.slack.com/archives/C059ZC138NR/p1727350911890989?thread_ts=1727350211.370869&cid=C059ZC138NR	2024-09-27 09:59:38 +03:00
Conrad Ludgate	ec07a1ecc9	proxy: make local-proxy config by signal with PID, refine JWKS apis with role caching (#9164 )	2024-09-26 19:01:48 +01:00
Arseny Sher	c4cdfe66ac	Fix flakiness of test_timeline_copy. Timeline might not be initialized when timeline_start_lsn is queried. Spotted by CI.	2024-09-26 19:01:45 +03:00
Alex Chi Z.	42e19e952f	fix(pageserver): categorize client error in basebackup metrics (#9110 ) We separated client error from basebackup error log lines in https://github.com/neondatabase/neon/pull/7523, but we didn't do anything for the metrics. In this patch, we fixed it. ref https://github.com/neondatabase/neon/issues/8970 ## Summary of changes We use the same criteria as in `log_query_error` producing an info line (instead of error) for the metrics. We added a `client_error` category for the basebackup query time metrics. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-26 11:38:19 -04:00
John Spray	3d255d601b	pageserver: rename control plane client & chunk validation requests (#8997 ) ## Problem - In https://github.com/neondatabase/neon/pull/8784, the validate controller API is modified to check generations directly in the database. It batches tenants into separate queries to avoid generating a huge statement, but - While updating this, I realized that "control_plane_client" is a kind of confusing name for the client code now that it primarily talks to the storage controller (the case of talking to the control plane will go away in a few months). ## Summary of changes - Big rename to "ControllerUpcallClient" -- this reflects the storage controller's api naming, where the paths used by the pageserver are in `/upcall/` - When sending validate requests, break them up into chunks so that we avoid possible edge cases of generating any HTTP requests that require database I/O across many thousands of tenants. This PR mixes a functional change with a refactor, but the commits are cleanly separated -- only the last commit is a functional change. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-09-26 16:06:34 +01:00
Arthur Petukhovsky	80e974d05b	fix(compute_ctl): race condition in configurator (#9162 ) There was a tricky race condition in compute_ctl that sometimes made the configurator skip updates. It causes a deadlock because: - control-plane cannot configure compute, because it's in ConfigurationPending state - compute_ctl doesn't do any reconfiguration because `configurator_main_loop` missed the notification for it Full sequence that reproduces the issue: 1. `start_compute` finishes work and changes status `self.set_status(ComputeStatus::Running);` 2. configurator received update about `Running` state and dropped the mutex lock in the iteration 3. `/configure` request was triggered at the same time as step 1, and got the mutex lock 4. same `/configure` request set the spec and updated the state to `ConfigurationPending`, also sent a notification 5. next iteration in configurator got the mutex lock, but missed the notification There are more details in this slack thread: https://neondb.slack.com/archives/C03438W3FLZ/p1727281028478689?thread_ts=1727261220.483799&cid=C03438W3FLZ --------- Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>	2024-09-26 15:42:17 +01:00
Alexander Bayandin	7fdf1ab5b6	CI: run compatibility tests on Postgres 17 (#9145 ) ## Problem The latest storage release has generated artifacts for Postgres 17, so we can enable compatibility tests for this version ## Summary of changes - Unskip `test_backward_compatibility` / `test_forward_compatibility` on Postgres 17	2024-09-26 15:17:01 +01:00
Arpad Müller	7bae78186b	Forbid creation of child timelines of archived timeline (#9122 ) We don't want to allow any new child timelines of archived timelines. If you want any new child timelines, you should first un-archive the timeline. Part of #8088	2024-09-26 02:05:25 +02:00
Anastasia Lubennikova	2b9fb47e64	Merge pull request #9151 from neondatabase/releases/2024-09-25-compute-only-2 Compute release 2024-09-25	2024-09-25 23:37:55 +01:00
Heikki Linnakangas	7e560dd00e	chore: Silence clippy warning with nightly (#9157 ) The warning: warning: first doc comment paragraph is too long --> pageserver/src/tenant/checks.rs:7:1 \| 7 \| / /// Checks whether a layer map is valid (i.e., is a valid result of the current compaction algorithm if no... 8 \| \| /// The function checks if we can split the LSN range of a delta layer only at the LSNs of the delta layer... 9 \| \| /// 10 \| \| /// ```plain \| \|_ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#too_long_first_doc_paragraph = note: `#[warn(clippy::too_long_first_doc_paragraph)]` on by default help: add an empty line \| 7 ~ /// Checks whether a layer map is valid (i.e., is a valid result of the current compaction algorithm if nothing goes wrong). 8 + /// \| Fix by applying the suggestion.	2024-09-25 21:29:16 +00:00
Tristan Partin	684e924211	Fix compute_logical_snapshot_files for v14 The function, pg_ls_logicalsnapdir(), was added in version 15. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-25 16:25:17 -05:00
Tristan Partin	8ace9ea25f	Format long single DATA line in pgxn/Makefile This should be a little more readable. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-25 16:25:17 -05:00
Alex Chi Z.	6a4f49b08b	fix(pageserver): passthrough partition cancel error (#9154 ) close https://github.com/neondatabase/neon/issues/9142 ## Summary of changes passthrough CollectKeyspaceError::Cancelled to CompactionError::ShuttingDown Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-25 21:35:33 +01:00
Alexander Bayandin	c6e89445e2	CI(promote-images): fix prod ECR auth (#9146 ) A cherry-pick from the previous release (#9131) ## Problem Login to prod ECR doesn't work anymore: ``` Retrieving registries data through * SDK... * ECR detected with eu-central-1 region Error: The security token included in the request is invalid. ``` ## Summary of changes - Fix login to prod ECR by using `aws-actions/configure-aws-credentials`	2024-09-25 18:22:39 +01:00
Vlad Lazar	04f32b9526	tests: remove patching up of az id column (#8968 ) This was required since the compat tests used a snapshot generated from a version of neon local which didn't contain the availability_zone_id column.	2024-09-25 17:22:32 +01:00
Heikki Linnakangas	6f2333f52b	CI: Leave out unnecessary build files from binary artifact (#9135 ) The pg_install/build directory contains .o files and such intermediate results from the build, which are not needed in the final tarball. Except for src/test/regress/regress.so and a few other .so files in that directory; keep those. This reduces the size of the neon-Linux-X64-release-artifact.tar.zst artifact from about 1.5 GB to 700 MB. (I attempted this a long time ago already, by moving the build/ directory out of pg_install altogether, see PR #2127. But I never got around to finishing that work.) Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-09-25 19:07:20 +03:00
Yuchen Liang	d447f49bc3	fix(pageserver): handle lsn lease requests for unnormalized lsns (#9137 ) Fixes https://github.com/neondatabase/neon/issues/9098. ## Problem See https://github.com/neondatabase/neon/issues/9098#issuecomment-2372484969. ### Related A similar problem happened with branch creation, which was discussed [here](https://github.com/neondatabase/neon/pull/2143#issuecomment-1199969052) and fixed by https://github.com/neondatabase/neon/pull/2529. ## Summary of changes - Normalize the LSN on the pageserver side upon an LSN lease request, and store the normalized LSN. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-09-25 14:57:38 +00:00
Vlad Lazar	c5972389aa	storcon: include timeline ID in LSN waiting logs (#9141 ) ## Problem Hard to tell which timeline is holding the migration. ## Summary of Changes Add timeline id to log.	2024-09-25 15:54:41 +01:00
Matthias van de Meent	c4f5736d5a	Build images for PG17 using Debian 12 "Bookworm" (#9132 ) This increases the support window of the OS used for PG17 by 2 years compared to the previous usage of Debian 11 "Bullseye".	2024-09-25 17:50:05 +03:00
Alexey Kondratov	518f598e2d	docs(rfc): Independent compute release flow (#8881 ) Related to https://github.com/neondatabase/cloud/issues/11698	2024-09-25 16:24:09 +02:00
John Spray	4b711caf5e	storage controller: make proxying of GETs to pageservers more robust (#9065 ) ## Problem These commits are split off from https://github.com/neondatabase/neon/pull/8971/commits where I was fixing this to make a better scale test pass -- Vlad also independently recognized these issues with cloudbench in https://github.com/neondatabase/neon/issues/9062. 1. The storage controller proxies GET requests to pageservers based on their intent, not the ground truth of where they're really attached. 2. Proxied requests can race with scheduling to tenants, resulting in 404 responses if the request hits the wrong pageserver. Closes: https://github.com/neondatabase/neon/issues/9062 ## Summary of changes 1. If a shard has a running reconciler, then use the database generation_pageserver to decide who to proxy the request to 2. If such a request gets a 404 response and its scheduled node has changed since the request was dispatched.	2024-09-25 13:56:39 +00:00
Vlad Lazar	2cf47b1477	storcon: do az aware scheduling (#9083 ) ## Problem Storage controller didn't previously consider AZ locality between compute and pageservers when scheduling nodes. Control plane has this feature, and, since we are migrating tenants away from it, we need feature parity to avoid perf degradations. ## Summary of changes The change itself is fairly simple: 1. Thread az info into the scheduler 2. Add an extra member to the scheduling scores Step (2) deserves some more discussion. Let's break it down by the shard type being scheduled: Attached Shards We wish for attached shards of a tenant to end up in the preferred AZ of the tenant since that is where the compute is likely to be. The AZ member for `NodeAttachmentSchedulingScore` has been placed below the affinity score (so it's got the second biggest weight for picking the node). The rationale for going below the affinity score is to avoid having all shards of a single tenant placed on the same node in 2 node regions, since that would mean that one tenant can drive the general workload of an entire pageserver. I'm not 100% sure this is the right decision, so open to discussing hoisting the AZ up to first place. Secondary Shards We wish for secondary shards of a tenant to be scheduled in a different AZ from the preferred one for HA purposes. The AZ member for `NodeSecondarySchedulingScore` has been placed first, so nodes in different AZs from the preferred one will always be considered first. On small clusters, this can mean that all the secondaries of a tenant are scheduled to the same pageserver, but secondaries don't use up as many resources as the attached location, so IMO the argument made for attached shards doesn't hold. Related: https://github.com/neondatabase/neon/issues/8848	2024-09-25 14:31:04 +01:00
Folke Behrens	7dcfcccf7c	Re-export git-version from utils and remove as direct dep (#9138 )	2024-09-25 14:38:35 +02:00
Vlad Lazar	a26cc29d92	storcon: add tags to scheduler logs (#9127 ) We log something at info level each time we schedule a shard to a non-secondary location. Might as well have context for it.	2024-09-25 10:16:06 +01:00
Alex Chi Z.	5f2f31e879	fix(test): storage scrubber should only log to stdout with info (#9067 ) As @koivunej mentioned in the storage channel, for regress test, we don't need to create a log file for the scrubber, and we should reduce noisy logs. ## Summary of changes * Disable log file creation for storage scrubber * Only log at info level --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-24 22:33:03 +00:00
Damian972	938b163b42	chore(docker-compose): fix typo in readme (#9133 ) Typo in the readme inside docker-compose folder ## Summary of changes - Update the readme	2024-09-24 18:05:23 -04:00
Heikki Linnakangas	5cbf5b45ae	Remove TenantState::Loading (#9118 ) The last real use was removed in commit `de90bf4663`. It was still used in a few unit tests, but they can use Attaching too.	2024-09-24 20:58:54 +00:00
Heikki Linnakangas	af5c54ed14	test: Make test_lfc_resize more robust (#9117 ) 1. Increase statement_timeout. It defaults to 120 s, which is not quite enough on slow or busy systems with debug build. On my laptop, the index creation takes about 100 s. On buildfarm, we've seen failures, e.g: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9084/10997888708/index.html#suites/821f97908a487f1d7d3a2a4dd1571e99/db1834bddfe8c5b9/ 2. Keep twiddling the LFC size through the whole test. Before, we would do it for the first 10 seconds, but that only covers a small part of the pgbench initialization phase. Change the loop so that the pgbench run time determines how long the test runs, and we keep changing the LFC for the whole time. In passing, also fix bogus test description, copy-pasted from a completely unrelated test.	2024-09-24 23:38:16 +03:00
Alexander Bayandin	523cf71721	Fix compiler warnings on macOS (#9128 ) ## Problem Compilation of neon extension on macOS produces a warning ``` pgxn/neon/neon_perf_counters.c:50:1: error: non-void function does not return a value [-Werror,-Wreturn-type] ``` ## Summary of changes - Change the return type of `NeonPerfCountersShmemInit` to void	2024-09-24 18:11:31 +00:00
Arpad Müller	c47f355ec1	Catch Cancelled and don't print a warning for it (#9121 ) In the `imitate_synthetic_size_calculation_worker` function, we might obtain the `Cancelled` error variant instead of hitting the cancellation token based path. Therefore, catch `Cancelled` and handle it analogously to the cancellation case. Fixes #8886.	2024-09-24 17:28:56 +00:00
Yuchen Liang	4f67b0225b	pageserver: handle decompression outside vectored `read_blobs` (#8942 ) Part of #8130. ## Problem Currently, decompression is performed within the `read_blobs` implementation and the decompressed blob will be appended to the end of the `BytesMut` buffer. We will lose this flexibility of extending the buffer when we switch to using our own dio-aligned buffer (WIP in https://github.com/neondatabase/neon/pull/8730). To facilitate the adoption of aligned buffer, we need to refactor the code to perform decompression outside `read_blobs`. ## Summary of changes - `VectoredBlobReader::read_blobs` will return `VectoredBlob` without performing decompression and appending decompressed blob. It becomes the caller's responsibility to decompress the buffer. - Added a new `BufView` type that functions as `Cow<Bytes, &[u8]>`. - Perform decompression within `VectoredBlob::read` so that people don't have to explicitly think about compression when using the reader interface. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-09-24 16:41:38 +00:00
Alexander Bayandin	7474790c80	CI(promote-images): fix prod ECR auth (#9131 ) ## Problem Login to prod ECR doesn't work anymore: ``` Retrieving registries data through * SDK... * ECR detected with eu-central-1 region Error: The security token included in the request is invalid. ``` Ref https://github.com/neondatabase/neon/actions/runs/11015238522/job/30592994281 Tested https://github.com/neondatabase/neon/actions/runs/11017690614/job/30596213259#step:5:18 (on https://github.com/neondatabase/neon/commit/aae6182ff) ## Summary of changes - Fix login to prod ECR by using `aws-actions/configure-aws-credentials`	2024-09-24 18:34:56 +02:00
Heikki Linnakangas	2f7cecaf6a	test: Poll pageserver availability more aggressively at test startup Even with the 100 ms interval, on my laptop the pageserver always becomes available on second attempt, so this saves about 900 ms at every test startup.	2024-09-24 17:16:43 +03:00
Heikki Linnakangas	589594c2e1	test: Skip fsync when initdb'ing the storage controller db After initdb, we configure it with "fsync=off" anyway.	2024-09-24 17:16:43 +03:00
Arpad Müller	db1e3ff9f4	Merge pull request #9095 from neondatabase/rc/2024-09-23 Storage & Compute release 2024-09-23	2024-09-24 15:51:27 +02:00
Heikki Linnakangas	70fe007519	test: Make test_hot_standby_feedback more forgiving of slow initialization (#9113 ) Don't start waiting for the index to appear in the secondary until it has been created in the primary. Before, if the "pgbench -i" step took more than 60 s, we would give up. There was a flaky test failure along those lines at: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9105/10997477941/index.html#suites/950eff205b552e248417890b8b8f189e/73cf4b5648fa6f74/ Hopefully, this avoids such failures in the future.	2024-09-24 16:41:59 +03:00
a-masterov	b224a5a377	Move the patch to compute (#9120 ) ## Problem All the other patches were moved to the compute directory, and only one was left in the patches subdirectory in the root directory. ## Summary of changes The patch was moved to the compute directory as others	2024-09-24 15:13:18 +02:00
Christian Schwarz	a65d437930	chore(#9077 ): cleanups & code dedup (#9082 ) Punted from https://github.com/neondatabase/neon/pull/9077	2024-09-24 13:05:07 +00:00
Matthias van de Meent	fc67f8dc60	Update PostgreSQL 17 from 17rc1 to 17.0 (#9119 ) The PostgreSQL 17 vendor module is now based on postgres/postgres @ d7ec59a63d745ba74fba0e280bbf85dc6d1caa3e, presumably the final code change before the V17 tag.	2024-09-24 14:15:52 +02:00
Folke Behrens	2b65a2b53e	proxy: check if IP is allowed during webauth flow (#9101 ) neondatabase/cloud#12018	2024-09-24 11:52:25 +02:00
Vlad Lazar	9490360df4	storcon: improve initial shard scheduling (#9081 ) ## Problem Scheduling on tenant creation uses different heuristics compared to the scheduling done during background optimizations. This results in scenarios where shards are created and then immediately migrated by the optimizer. ## Summary of changes 1. Make scheduler aware of the type of the shard it is scheduling (attached vs secondary). We wish to have different heuristics. 2. For attached shards, include the attached shard count from the context in the node score calculation. This brings initial shard scheduling in line with what the optimization passes do. 3. Add a test for (2). This looks like a bigger change than required, but the refactoring serves as the basis for az-aware shard scheduling where we also need to make the distinction between attached and secondary shards. Closes https://github.com/neondatabase/neon/issues/8969	2024-09-24 09:03:41 +00:00
a-masterov	91d947654e	Add regression tests for a cloud-based Neon instance (#8681 ) ## Problem We need to be able to run the regression tests against a cloud-based Neon staging instance to prepare the migration to the arm architecture. ## Summary of changes Some tests were modified to work on the cloud instance (i.e. added passwords, server-side copy changed to client-side, etc) --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-09-24 09:44:45 +02:00
Yuchen Liang	37aa6fd953	scrubber: retry when missing index key in the listing (#8873 ) Part of #8128, fixes #8872. ## Problem See #8872. ## Summary of changes - Retry `list_timeline_blobs` another time if - there are layer file keys listed but not index. - failed to download index. - Instrument code with `analyze-tenant` and `analyze-timeline` span. - Remove `initdb_archive` check, it could have been deleted. - Return with exit code 1 on fatal error if `--exit-code` parameter is set. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-09-23 21:58:12 +00:00
Heikki Linnakangas	3ad567290c	Move metric exporter and pgbouncer config files Instead of adding them to the VM image late in the build process, when putting together the final VM image, include them in the earlier compute image already. That makes it more convenient to edit the files, and to test them.	2024-09-24 00:35:52 +03:00
Heikki Linnakangas	3a110e45ed	Move files related to building compute image into compute/ dir Seems nice to keep all these together. This also provides a nice place for a README file to describe the compute image build process. For now, it briefly describes the contents of the directory, but can be expanded.	2024-09-24 00:35:52 +03:00
Heikki Linnakangas	e7e6319e20	Fix compiler warnings with nightly rustc about elided lifetimes having names (#9105 ) The warnings: warning: elided lifetime has a name --> pageserver/src/metrics.rs:1386:29 \| 1382 \| pub(crate) fn start_timer<'c: 'a, 'a>( \| -- lifetime `'a` declared here ... 1386 \| ) -> Option<impl Drop + '_> { \| ^^ this elided lifetime gets resolved as `'a` \| = note: `#[warn(elided_named_lifetimes)]` on by default warning: elided lifetime has a name --> pageserver/src/metrics.rs:1537:46 \| 1534 \| pub(crate) fn start_recording<'c: 'a, 'a>( \| -- lifetime `'a` declared here ... 1537 \| ) -> BasebackupQueryTimeOngoingRecording<'_, '_> { \| ^^ this elided lifetime gets resolved as `'a` warning: elided lifetime has a name --> pageserver/src/metrics.rs:1537:50 \| 1534 \| pub(crate) fn start_recording<'c: 'a, 'a>( \| -- lifetime `'a` declared here ... 1537 \| ) -> BasebackupQueryTimeOngoingRecording<'_, '_> { \| ^^ this elided lifetime gets resolved as `'a` warning: elided lifetime has a name --> pageserver/src/tenant.rs:3630:25 \| 3622 \| async fn prepare_new_timeline<'a>( \| -- lifetime `'a` declared here ... 3630 \| ) -> anyhow::Result<UninitializedTimeline> { \| ^^^^^^^^^^^^^^^^^^^^^ this elided lifetime gets resolved as `'a`	2024-09-23 23:31:32 +02:00
Matthias van de Meent	d865881d59	NOAI (#9084 ) We can't FlushOneBuffer when we're in redo-only mode on PageServer, so make execution of that function conditional on us not running in pageserver walredo mode.	2024-09-23 21:16:42 +00:00
Konstantin Knizhnik	1c5d6e59a0	Maintain number of used pages for LFC (#9088 ) ## Problem An LFC cache entry is a chunk (right now the chunk size is 1 MB). LFC statistics show the number of chunks, but not the number of used pages. The autoscaling team wants to know how sparse the LFC is: https://neondb.slack.com/archives/C04DGM6SMTM/p1726782793595969 It is possible to obtain it from the view with `select count(*) from local_cache`, but that is an expensive operation, enumerating all entries in the LFC under lock. ## Summary of changes This PR added "file_cache_used_pages" to `neon_lfc_stats` view: ``` select * from neon_lfc_stats; lfc_key \| lfc_value -----------------------+----------- file_cache_misses \| 3139029 file_cache_hits \| 4098394 file_cache_used \| 1024 file_cache_writes \| 3173728 file_cache_size \| 1024 file_cache_used_pages \| 25689 (6 rows) ``` Please notice that this PR doesn't change neon extension API, so no need to create new version of Neon extension. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-09-23 22:05:32 +03:00
Heikki Linnakangas	263dfba6ee	Add views for metrics about pageserver requests (#9008 ) The metrics include a histogram of how long we need to wait for a GetPage request, number of reconnects, and number of requests among other things. The metrics are not yet exported anywhere, but you can query them manually. Note: This does not bump the default version of the 'neon' extension. We will do that later, as a separate PR. The reason is that this allows us to roll back the compute image smoothly, if necessary. Once the image that includes the new extension .so file with the new functions has been rolled out, and we're confident that we don't need to roll back the image anymore, we can change default extension version and actually start using the new functions and views. This is what the view looks like: ``` postgres=# select * from neon_perf_counters ; metric \| bucket_le \| value ---------------------------------------+-----------+---------- getpage_wait_seconds_count \| \| 300 getpage_wait_seconds_sum \| \| 0.048506 getpage_wait_seconds_bucket \| 2e-05 \| 0 getpage_wait_seconds_bucket \| 3e-05 \| 0 getpage_wait_seconds_bucket \| 6e-05 \| 71 getpage_wait_seconds_bucket \| 0.0001 \| 124 getpage_wait_seconds_bucket \| 0.0002 \| 248 getpage_wait_seconds_bucket \| 0.0003 \| 279 getpage_wait_seconds_bucket \| 0.0006 \| 297 getpage_wait_seconds_bucket \| 0.001 \| 298 getpage_wait_seconds_bucket \| 0.002 \| 298 getpage_wait_seconds_bucket \| 0.003 \| 298 getpage_wait_seconds_bucket \| 0.006 \| 300 getpage_wait_seconds_bucket \| 0.01 \| 300 getpage_wait_seconds_bucket \| 0.02 \| 300 getpage_wait_seconds_bucket \| 0.03 \| 300 getpage_wait_seconds_bucket \| 0.06 \| 300 getpage_wait_seconds_bucket \| 0.1 \| 300 getpage_wait_seconds_bucket \| 0.2 \| 300 getpage_wait_seconds_bucket \| 0.3 \| 300 getpage_wait_seconds_bucket \| 0.6 \| 300 getpage_wait_seconds_bucket \| 1 \| 300 getpage_wait_seconds_bucket \| 2 \| 300 getpage_wait_seconds_bucket \| 3 \| 300 getpage_wait_seconds_bucket \| 6 \| 300 getpage_wait_seconds_bucket \| 10 \| 300 getpage_wait_seconds_bucket \| 20 \| 300 getpage_wait_seconds_bucket \| 30 \| 300 getpage_wait_seconds_bucket \| 60 \| 300 getpage_wait_seconds_bucket \| 100 \| 300 getpage_wait_seconds_bucket \| Infinity \| 300 getpage_prefetch_requests_total \| \| 69 getpage_sync_requests_total \| \| 231 getpage_prefetch_misses_total \| \| 0 getpage_prefetch_discards_total \| \| 0 pageserver_requests_sent_total \| \| 323 pageserver_requests_disconnects_total \| \| 0 pageserver_send_flushes_total \| \| 323 file_cache_hits_total \| \| 0 (39 rows) ```	2024-09-23 21:28:50 +03:00
Heikki Linnakangas	df3996265f	test: Downgrade info message on removing empty directories (#9093 ) It was pretty noisy. It changed from debug to info level in commit `78938d1b59`, but I believe that was not on purpose.	2024-09-23 20:10:22 +02:00
Alex Chi Z.	29699529df	feat(pageserver): filter keys with gc-compaction (#9004 ) Part of https://github.com/neondatabase/neon/issues/8002 Close https://github.com/neondatabase/neon/issues/8920 Legacy compaction (as well as gc-compaction) relies on the GC process to remove unused layer files, but this relies on many factors (i.e., key partition) to ensure data in a dropped table can be eventually removed. In gc-compaction, we consider the keyspace information when doing the compaction process. If a key is not in the keyspace, we will skip that key and not include it in the final output. However, this is not easy to implement because gc-compaction considers branch points (i.e., retain_lsns) and the retained keyspaces could change across different LSNs. Therefore, for now, we only remove aux v1 keys in the compaction process. ## Summary of changes * Add `FilterIterator` to filter out keys. * Integrate `FilterIterator` with gc-compaction. * Add `collect_gc_compaction_keyspace` for a spec of keyspaces that can be retained during the gc-compaction process. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-23 16:30:44 +00:00
Nikita Kalyanov	f446e08fb8	change HTTP method to comply with spec (#9100 ) There is a discrepancy with the spec, which has PUT	2024-09-23 15:53:06 +02:00
Christian Schwarz	4d5add9ca0	compact_level0_phase1: remove final traces of value access mode config (#8935 ) refs https://github.com/neondatabase/neon/issues/8184 stacked atop https://github.com/neondatabase/neon/pull/8934 This PR changes from ignoring the config field to rejecting configs that contain it. PR https://github.com/neondatabase/infra/pull/1903 removes the field usage from `pageserver.toml`. It rolls into prod sooner or in the same release as this PR.	2024-09-23 15:05:22 +02:00
Christian Schwarz	59b4c2eaf9	walredo: add a ping method (#8952 ) Not used in production, but in benchmarks, to demonstrate minimal RTT. (It would be nice to not have to copy the 8KiB of zeroes, but, that would require larger protocol changes). Found this useful in investigation https://github.com/neondatabase/neon/pull/8952.	2024-09-23 10:19:37 +00:00
Vlad Lazar	5432155b0d	storcon: update compute hook state on detach (#9045 ) ## Problem Previously, the storage controller may send compute notifications containing stale pageservers (i.e. pageserver serving the shard was detached). This happened because detaches did not update the compute hook state. ## Summary of Changes Update compute hook state on shard detach. Fixes #8928	2024-09-23 10:05:02 +01:00
Heikki Linnakangas	e16e82749f	Remove unused crates from workspace Cargo.toml These were not referenced in any of the other Cargo.toml files in the workspace. They were not being built because of that, so there was little harm in having them listed, but let's be tidy.	2024-09-23 00:37:41 +03:00
Heikki Linnakangas	9f653893b9	Update a few dependencies, removing some indirect dependencies cargo update ciborium iana-time-zone lazy_static schannel uuid cargo update hyper@0.14 cargo update --precise 2.9.7 ureq It might be worthwhile to just update all our dependencies at some point, but this is aimed at pruning the dependency tree, to make the build a little faster. That's also why I didn't update ureq to the latest version: that would've added a dependency to yet another version of rustls.	2024-09-23 00:37:41 +03:00
Heikki Linnakangas	913af44219	Update "memoffset" crate To eliminate one version of it from our dependency tree.	2024-09-23 00:37:41 +03:00
Heikki Linnakangas	ecd615ab6d	Update "hostname" crate We were already building v0.4.0 as an indirect dependency, so this avoids having to build two different versions of it.	2024-09-23 00:37:41 +03:00
Heikki Linnakangas	c9b2ec9ff1	Check submodule forward progress (#8949 ) We frequently mess up our submodule references. This adds one safeguard: it checks that the submodule references are only updated "forwards", not to some older commit, or a commit that's not a descendant of the previous one. As next step, I'm thinking that we should automate things so that when you merge a PR to the 'neon' repository that updates the submodule references, the REL_*_STABLE_neon branches are automatically updated to match the submodule references. That way, you never need to manually merge PRs in the postgres repository, it's all triggered from commits in the 'neon' repository. But that's not included here.	2024-09-22 21:46:53 +03:00
Arpad Müller	a3800dcb0c	Move load_timeline_metadata into separate function (#9080 ) Moves the per-timeline code to load timeline metadata into a new dedicated function called `load_timeline_metadata`. The old `load_timeline_metadata` becomes `load_timelines_metadata`. Split out of #8907 Part of #8088	2024-09-21 12:36:41 +00:00
Heikki Linnakangas	9a32aa828d	Fix init of WAL page header at startup (#8914 ) If the primary is started at an LSN within the first page of a 16 MB WAL segment, the "long XLOG page header" at the beginning of the segment was not initialized correctly. That has gone unnoticed, because under normal circumstances, nothing looks at the page header. The WAL that is streamed to the safekeepers starts at the new record's LSN, not at the beginning of the page, so that bogus page header didn't propagate elsewhere, and a primary server doesn't normally read the WAL it has written. Which is good because the contents of the page would be bogus anyway, as it wouldn't contain any of the records before the LSN where the new record is written. Except that in the following cases a primary does read its own WAL: 1. When there are two-phase transactions in prepared state at checkpoint. The checkpointer reads the two-phase state from the XLOG_XACT_PREPARE record, and writes it to a file in pg_twophase/. 2. Logical decoding reads the WAL starting from the replication slot's restart LSN. This PR fixes the problem with two-phase transactions. For that, it's sufficient to initialize the page header correctly. The checkpointer only needs to read XLOG_XACT_PREPARE records that were generated after the server startup, so it's still OK that older WAL is missing / bogus. I have not investigated if we have a problem with logical decoding, however. Let's deal with that separately. Special thanks to @Lzjing-1997, who independently found the same bug and opened a PR to fix it, although I did not use that PR.	2024-09-21 04:00:38 +03:00
Christian Schwarz	ec0550e8ce	Merge pull request #9085 from neondatabase/releases/2024-09-20-hotfix storage hotfix release 2024-09-20 This storage hotfix release adds valuable metrics to pageserver. We will only deploy this hotfix manually to a dedicated pageserver that is currently empty. Context https://neondb.slack.com/archives/C07MU9ES6NP/p1726827244185729 Created using ``` git switch -c releases/2024-09-20-hotfix git reset --hard origin/release git merge `ec5dce04eb` ```	2024-09-20 21:09:43 +02:00
Christian Schwarz	126cbd2e8b	Merge commit 'ec5dce04ebfa51b727dfc9bc04ebb1e68aef6434' into releases/2024-09-20-hotfix	2024-09-20 18:51:08 +00:00
Anastasia Lubennikova	f03f7b3868	Bump vendor/postgres to include extension path fix (#9076 ) This is a prerequisite for https://github.com/neondatabase/neon/pull/8681	2024-09-20 20:24:40 +03:00
Christian Schwarz	ec5dce04eb	pageserver: throttling: per-tenant metrics + more metrics to help understand throttle queue depth (#9077 )	2024-09-20 16:48:26 +00:00
John Spray	6014f15157	pageserver: suppress noisy "layer became visible" logs (#9064 ) ## Problem When layer visibility was added, an info log was included for the situation where actual access to a layer disagrees with the visibility calculation. This situation is safe, but I was interested in seeing when it happens. The log is pretty high volume, so this PR refines it to fire less often. ## Summary of changes - For cases where accessing non-visible layers is normal, don't log at all. - Extend a unit test to increase confidence that the updates to visibility on access are working as expected - During compaction, only call the visibility calculation routine if some image layers were created: previously, frequent calls resulted in the visibility of layers getting reset every time we passed through create_image_layers.	2024-09-20 16:07:09 +00:00
Conrad Ludgate	e675a21346	utils: leaky bucket should only report throttled if the notify queue is blocked on sleep (#9072 ) ## Problem Seems that PS might be too eager in reporting throttled tasks ## Summary of changes Introduce a sleep counter. If the sleep counter increases, then the acquire task was throttled.	2024-09-20 16:09:39 +01:00
Alex Chi Z.	6b93230270	fix(pageserver): receive body error now 500 (#9052 ) close https://github.com/neondatabase/neon/issues/8903 In https://github.com/neondatabase/neon/issues/8903 we observed JSON decoding error to have the following error message in the log: ``` Error processing HTTP request: Resource temporarily unavailable: 3956 (pageserver-6.ap-southeast-1.aws.neon.tech) error receiving body: error decoding response body ``` This is hard to understand. In this patch, we make the error message more reasonable. ## Summary of changes * receive body error is now an internal server error, pass through the `reqwest::Error` (only decoding error) as `anyhow::Error`. * instead of formatting the error using `to_string`, we use the alternative `anyhow::Error` formatting, so that it prints out the cause of the error (i.e., what exactly cannot serde decode). I would expect to see something like `error receiving body: error decoding response body: XXX field not found` after this patch, though I didn't set up a testing environment to observe the exact behavior. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-20 10:37:28 -04:00
Heikki Linnakangas	797aa4ffaa	Skip running clippy in --release mode. (#9073 ) It's pretty expensive to run, and there is very little difference between debug and release builds that could lead to different clippy warnings. This is extracted from PR #8912. That PR wandered off into various improvements we could make, but we seem to have consensus on this part at least.	2024-09-20 17:22:58 +03:00
Christian Schwarz	c45b56e0bb	pageserver: add counters for started smgr/getpage requests (#9069 ) After this PR ``` curl localhost:9898/metrics \| grep smgr_ \| grep start ``` ``` pageserver_smgr_query_started_count{shard_id="0000",smgr_query_type="get_page_at_lsn",tenant_id="...",timeline_id="..."} 0 pageserver_smgr_query_started_global_count{smgr_query_type="get_db_size"} 0 pageserver_smgr_query_started_global_count{smgr_query_type="get_page_at_lsn"} 0 pageserver_smgr_query_started_global_count{smgr_query_type="get_rel_exists"} 0 pageserver_smgr_query_started_global_count{smgr_query_type="get_rel_size"} 0 pageserver_smgr_query_started_global_count{smgr_query_type="get_slru_segment"} 0 ``` We instantiate the per-tenant counter only for `get_page_at_lsn`.	2024-09-20 14:55:50 +01:00
Alexander Bayandin	3104f0f250	Safekeeper: fix OpenAPI spec (#9066 ) ## Problem Safekeeper's OpenAPI spec is incorrect: ``` Semantic error at paths./v1/tenant/{tenant_id}/timeline/{timeline_id}.get.responses.404.content.application/json.schema.$ref $refs must reference a valid location in the document Jump to line 126 ``` Checked on https://editor.swagger.io ## Summary of changes - Add `NotFoundError` - Add `description` and `license` fields to make Cloud OpenAPI spec linter happy	2024-09-20 12:00:05 +01:00
Arseny Sher	f2c08195f0	Bump vendor/postgres. Includes PRs: - ERROR out instead of segfaulting when walsender slots are full. - logical worker: respond to publisher even under dense stream.	2024-09-20 12:38:42 +03:00
Alex Chi Z.	d0cbfda15c	refactor(pageserver): check layer map valid in one place (#9051 ) We have 3 places where we implement layer map checks. ## Summary of changes Now we have a single check function being called in all places. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-19 20:29:28 +00:00
Yuchen Liang	1708743e78	pageserver: wait for lsn lease duration after transition into AttachedSingle (#9024 ) Part of #7497, closes https://github.com/neondatabase/neon/issues/8890. ## Problem Since leases are in-memory objects, we need to take special care of them after pageserver restarts and while doing a live migration. The approach we took for pageserver restart is to wait for at least lease duration before doing first GC. We want to do the same for live migration. Since we do not do any GC when a tenant is in `AttachedStale` or `AttachedMulti` mode, only the transition from `AttachedMulti` to `AttachedSingle` requires this treatment. ## Summary of changes - Added `lsn_lease_deadline` field in `GcBlock::reasons`: the tenant is temporarily blocked from GC until we reach the deadline. This information does not persist to S3. - In `GcBlock::start`, skip the GC iteration if we are blocked by the lsn lease deadline. - In `TenantManager::upsert_location`, set the lsn_lease_deadline to `Instant::now() + lsn_lease_length` so the granted leases have a chance to be renewed before we run GC for the first time after transitioning from AttachedMulti to AttachedSingle. Signed-off-by: Yuchen Liang <yuchen@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-09-19 17:27:10 +01:00
Conrad Ludgate	0a1ca7670c	proxy: remove auth info from http conn info & fixup jwt api trait (#9047 ) misc changes split out from #8855 - allow cloning the request context in a read-only fashion for background tasks - propagate endpoint and request context through the jwk cache - only allow password based auth for md5 during testing - remove auth info from conn info	2024-09-19 15:09:30 +00:00
Alex Chi Z.	ff9f065c43	impr(pageserver): log image layer creation (#9050 ) https://github.com/neondatabase/neon/pull/9028 changed the image layer creation log into trace level. However, I personally find logging image layer creation useful when reading the logs -- it makes it clear that the image layer creation is happening and gives a clear idea of the progress. Therefore, I propose to continue logging them for create_image_layers set of functions. ## Summary of changes * Add info logging for all image layers created in legacy compaction. * Add info logging for all layers creation in testing functions. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-19 10:43:12 -04:00
Vlad Lazar	21eeafaaa5	pageserver: simple fix for vectored read image layer skip (#9026 ) ## Problem Different keyspaces may require different floor LSNs in vectored delta layer visits. This patch adds support for such cases. ## Summary of changes Different keyspaces wishing to read the same layer might require different stop lsns (or lsn floor). The start LSN of the read (or the lsn ceil) will always be the same. With this observation, we fix skipping of image layers by indexing the fringe by layer id plus lsn floor. This is very simple, but means that we can visit delta layers twice in certain cases. Still, I think it's very unlikely for any extra merging to have taken place in this case, so perhaps it makes sense to go with the simpler patch. Fixes https://github.com/neondatabase/neon/issues/9012 Alternative to https://github.com/neondatabase/neon/pull/9025	2024-09-19 14:51:00 +01:00
Arseny Sher	32a0e759bd	safekeeper: add wal_last_modified to debug_dump. Adds an option to debug_dump to include the highest modified time among all WAL segments. In passing, replace some str with OsStr to have fewer unwraps.	2024-09-19 16:17:25 +03:00
Heikki Linnakangas	7c489092b7	Remove unused duplicate DEFAULT_INGEST_BATCH_SIZE constant This constant in 'tenant_conf_defaults' was unused, but there's another constant with the same name in the global 'defaults'. I wish the setting was configurable per-tenant, but it isn't, so let's remove the confusing duplicate.	2024-09-19 15:41:35 +03:00
Heikki Linnakangas	06d55a3b12	Clean up concurrent logical size calc semaphore initialization The DEFAULT_CONCURRENT_TENANT_SIZE_LOGICAL_SIZE_QUERIES constant was unused, because we had just hardcoded it to 1 where the constant should've been used. Remove the ConfigurableSemaphore::Default implementation, since it was unused.	2024-09-19 15:41:35 +03:00
Heikki Linnakangas	5c68e6a172	Remove unused constant The code that used it was removed in commit `b9d2c7bdd5`	2024-09-19 15:41:35 +03:00
Heikki Linnakangas	2753abc0d8	Remove leftover enums for configuring vectored get implementation The settings were removed in commit b9d2c7b.	2024-09-19 15:41:35 +03:00
Heikki Linnakangas	a523548ed1	Remove unused cleanup_remaining_timeline_fs_traces function There's some more code that still checks for uninit and delete markers, see callers of is_delete_mark and is_uninit_mark, and github issue #5718. But these functions were outright dead.	2024-09-19 11:57:10 +03:00
Heikki Linnakangas	2d4e5af18b	Remove unused code for parsing a postgresql.conf file	2024-09-19 11:57:10 +03:00
Heikki Linnakangas	5da2340e74	Remove misc dead code in control_plane/	2024-09-19 11:57:10 +03:00
Heikki Linnakangas	7b34c2d7af	Remove misc dead code in libs/	2024-09-19 11:57:10 +03:00
Heikki Linnakangas	15ae1fc3df	Remove a few postgres constants that were not used Dead code is generally useless, but with Postgres constants in particular, I'm also worried that if they're not used anywhere, we might fail to update them at a Postgres version update, and get very confused later when they have wrong values.	2024-09-19 11:57:10 +03:00
Heikki Linnakangas	728b79b9dd	Remove some unnecessary derives	2024-09-19 11:57:10 +03:00
Alex Chi Z.	9d1c6f23d3	fix(storage-scrubber): log version after initialize the logger (#9049 ) When I checked the log in Grafana I couldn't find the scrubber version. Then I realized that it should be logged after the logger gets initialized. ## Summary of changes Log after initializing the logger for the scrubber. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-18 14:13:57 -04:00
Christian Schwarz	035a49a6b2	`neon_local start`: parallel startup to break cyclic dependency (#8950 ) (Found this useful during investigation https://github.com/neondatabase/cloud/issues/16886.) Problem ------- Before this PR, `neon_local` sequentially does the following: 1. launch storcon process 2. wait for storcon to signal readiness [here](`75310fe441/control_plane/src/storage_controller.rs (L804-L808)`) 3. start pageserver 4. wait for pageserver to become ready [here](`c43e664ff5/control_plane/src/pageserver.rs (L343-L346)`) 5. etc The problem is that storcon's readiness waits for the [`startup_reconcile`](`cbcd4058ed/storage_controller/src/service.rs (L520-L523)`) to complete. But pageservers aren't started at this point. So, worst case we wait for `STARTUP_RECONCILE_TIMEOUT/2`, i.e., 15s. This is more than the 10s default timeout allowed by neon_local. So, the result is that `neon_local start` fails to start storcon and stops everything. Solution -------- In this PR I choose the radical solution to start everything in parallel. It junks up the output because we do stuff like `print!(".")` to indicate progress. We should just abandon that. And switch to `utils::logging` + `tracing` with separate spans for each component. I can do that in this PR or we leave it as a follow-up. Alternatives Considered ----------------------- The Pageserver's `/v1/status` or in fact any endpoint of the mgmt API will not `accept()` on the mgmt API socket until after the `re-attach` call to storcon returned success. So, it's insufficient to change the startup order to start Pageservers first. We cannot easily change Pageserver startup order because `init_tenant_mgr` must complete before we start serving the mgmt API. Otherwise tenant detach calls et al can race with `init_tenant_mgr`. We'd have to add a "loading" state to tenant mgr and make all API endpoints except `/v1/status` wait for _that_ to complete. Related ------- - https://github.com/neondatabase/neon/pull/6475	2024-09-18 18:17:55 +02:00
Folke Behrens	794bd4b866	proxy: mock cplane usable without allowed-ips table (#9046 )	2024-09-18 17:14:53 +02:00
Alexander Bayandin	ac6a1151ae	test_postgres_version: reenable version check for prereleased versions	2024-09-18 14:51:59 +01:00
Tristan Partin	2f37f0384c	Add v17 to revisions.json Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-18 14:51:59 +01:00
Alexander Bayandin	e161a2fa42	CI(deploy): fix deploy to staging and prod (#9030 ) ## Problem It turns out the previous approach (with `skip_if` input) doesn't work (from https://github.com/neondatabase/neon/pull/9017). Revert it and use more straightforward if-conditions ## Summary of changes - Revert `efbe8db7f1` - Add if-condition to `promote-compatibility-data` job and relevant comments	2024-09-18 14:26:47 +01:00
Folke Behrens	c5cd8577ff	proxy: make sql-over-http max request/response sizes configurable (#9029 )	2024-09-18 13:58:51 +02:00
Heikki Linnakangas	3454ef7507	Refactor ImageLayerWriter to avoid passing a Timeline to finish() (#9028 ) Commit `ca5390a89d` made a similar change to DeltaLayerWriter. We bumped into this with Stas with our hackathon project, to create a standalone program to create image layers directly from a Postgres data directory. It needs to create image layers without having a Timeline and other pageserver machinery. This downgrades the "created image layer {}" message from INFO to TRACE level. TRACE is used for the corresponding message on delta layer creation too. The path logged in the message is now the temporary path, before the file is renamed to its final name. Again commit `ca5390a89d` made the same change for the message on delta layer creation.	2024-09-18 13:16:51 +03:00
Christian Schwarz	135e7e4306	add `neon_local` subcommand for the broker & use that from regression tests (#8948 ) There's currently no way to just start/stop broker from `neon_local`. This PR * adds a sub-command * uses that sub-command from the test suite instead of the pre-existing Python `subprocess` based approach. Found this useful during investigation https://github.com/neondatabase/cloud/issues/16886.	2024-09-18 09:10:27 +02:00
Christian Schwarz	3cd2a3f931	refactor(walredo): process launch & kill-on-error machinery (#8951 ) Immediate benefit: easier to spot what's going on. Later benefit: use the extracted method in PR - https://github.com/neondatabase/neon/pull/8952 which adds a `ping` command to walredo. Found this useful during investigation https://github.com/neondatabase/cloud/issues/16886.	2024-09-17 19:16:33 +00:00
Alexander Bayandin	d78f5ce6da	CI: don't fetch the whole git history if it's not required (#9021 ) ## Problem We do use `actions/checkout` with `fetch-depth: 0` when it's not required ## Summary of changes - Remove unneeded `fetch-depth: 0` - Add a comment if `fetch-depth: 0` is required	2024-09-17 18:40:05 +01:00
Arpad Müller	a1b71b73fe	Rename some S3 usages to "remote storage" in exposed messages (#8999 ) In exposed messages like log messages we mentioned "S3", which is not entirely accurate as we support Azure blob storage now as well.	2024-09-17 19:15:01 +02:00
Tristan Partin	6138eb50e9	Fix test code related to migrations We added another migration in `5876c441ab`, but didn't bump this value. This had no effect, but best to fix it anyway. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-17 15:56:05 +01:00
Heikki Linnakangas	d211f00f05	Remove unnecessary dependencies (#9000 ) Found by "cargo machete"	2024-09-17 17:55:45 +03:00
Alexander Bayandin	cd4276fd65	CI: fix release pipeline (#9017 ) ## Problem We've got 2 non-blocking failures on the release pipeline: - `promote-compatibility-data` job got skipped _presumably_ because one of the dependencies of `deploy` job (`push-to-acr-dev`) got skipped (https://github.com/neondatabase/neon/pull/8940) - `coverage-report` job fails because we don't build debug artifacts in the release branch (https://github.com/neondatabase/neon/pull/8561) ## Summary of changes - Always run `push-to-acr-dev` / `push-to-acr-prod` jobs, but add `skip_if` parameter to the reusable workflow, which can skip the job internally, without skipping externally - Do not run `coverage-report` on release branches	2024-09-17 10:17:48 +01:00
Vlad Lazar	b719d58863	storcon: forward requests from stepped down instance to the current leader (#8954 ) ## Problem It turns out that we can't rely on external orchestration to promptly route traffic to the new leader. This is downtime inducing. Forwarding provides a safe way out. ## Safety We forward when: 1. Request is not one of ["/control/v1/step_down", "/status", "/ready", "/metrics"] 2. Current instance is in [`LeadershipStatus::SteppedDown`] state 3. There is a leader in the database to forward to 4. Leader from step (3) is not the current instance If a storcon instance is persisted in the database, then we know that it is the current leader. There's one exception: time between handling step-down request and the new leader updating the database. Let's treat the happy case first. The stepped down node does not produce any side effects, since all request handling happens on the leader. As for the edge case, we are guaranteed to always have a maximum of two running instances. Hence, if we are in the edge case scenario the leader persisted in the database is the stepped down instance that received the request. Condition (4) above covers this scenario. ## Summary of changes * Conversion utilities for reqwest <-> hyper. I'm not happy with these, but I don't see a better way. Open to suggestions. * Add request forwarding logic * Update each request handler. Again, not happy with this. If anyone knows a nice way to wrap the handlers, lmk. Me and Joonas tried :/ * Update each handler to maybe forward * Tweak tests to showcase new behaviour	2024-09-17 09:25:42 +01:00
Heikki Linnakangas	2db840d8b8	Move a few test functions related to auth tokens to separate file (#9018 ) For readability. neon_fixtures.py is huge.	2024-09-17 06:53:18 +03:00
Heikki Linnakangas	4295ff0f07	Mark a couple of test fixtures as session-scoped (#9018 ) pg_distrib_dir doesn't include the Postgres version and only depends on env variables which cannot change during a test run, so it can be marked as session-scoped. Similarly, the platform cannot change during a test run.	2024-09-17 06:53:18 +03:00
Heikki Linnakangas	c6f56b8462	Remove redundant get_dir_size() function (#9018 ) There was another copy of it in utils.py. The only difference is that the version in utils.py tolerates files that are concurrently removed. That seems fine for the few callers in neon_fixtures.py too.	2024-09-17 06:53:18 +03:00
Heikki Linnakangas	fec9321fc0	Use Path type in a few more places in neon_fixtures.py (#9018 ) This is in preparation of replacing neon_fixtures.get_dir_size with neon_fixtures.utils.get_dir_size() in next commit.	2024-09-17 06:53:18 +03:00
Heikki Linnakangas	3a52e356c1	Remove unused function (#9018 )	2024-09-17 06:53:18 +03:00
Tristan Partin	5e16c7bb0b	Generate pgbench data on the server for most tests This should generally be faster when running tests, especially those that run with higher scales. Ignoring test_lfc_resize since it seems like we are hitting a query timeout for some reason that I have yet to investigate. A little bit of improvement is better than none. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-16 23:37:36 +01:00
Heikki Linnakangas	2bbb4d3e1c	Remove misc unused code (#9014 )	2024-09-16 18:45:19 +00:00
Matthias van de Meent	c8bedca582	Fix PG17's extension modifications (#9010 ) This also reduces the GRANT statements to one per created _reset function	2024-09-16 17:06:31 +01:00
Tristan Partin	5876c441ab	Grant access to pg_show_replication_origin_status for neon_superuser Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-16 16:38:55 +01:00
Joonas Koivunen	6ceaca96e5	Merge pull request #9005 from neondatabase/rc/2024-09-16 Storage & Compute release 2024-09-16	2024-09-16 15:35:22 +03:00
Alexander Bayandin	b2c83db54d	CI(gather-rust-build-stats): set PQ_LIB_DIR to Postgres 17 (#9001 ) ## Problem `gather-rust-build-stats` extra CI job fails with ``` "PQ_LIB_DIR" doesn't exist in the configured path: "/__w/neon/neon/pg_install/v16/lib" ``` ## Summary of changes - Use the path to Postgres 17 for the `gather-rust-build-stats` job. The job uses Postgres built by `make walproposer-lib`	2024-09-16 12:44:26 +01:00
Matthias van de Meent	0a8c5e1214	Fix broken image for PG17 (#8998 ) Most extensions are not required to run Neon-based PostgreSQL, but the Neon extension is _quite_ critical, so let's make sure we include it. ## Problem Staging doesn't have working compute images for PG17 ## Summary of changes Disable some PG17 filters so that we get the critical components into the PG17 image	2024-09-13 15:10:52 +01:00
Matthias van de Meent	78938d1b59	[compute/postgres] feature: PostgreSQL 17 (#8573 ) This adds preliminary PG17 support to Neon, based on RC1 / 2024-09-04 `07b828e9d4` NOTICE: The data produced by the included version of the PostgreSQL fork may not be compatible with the future full release of PostgreSQL 17 due to expected or unexpected future changes in magic numbers and internals. DO NOT EXPECT DATA IN V17-TENANTS TO BE COMPATIBLE WITH THE 17.0 RELEASE! Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-09-12 23:18:41 +01:00
Christian Schwarz	2f0b3e7ae2	Merge pull request #8959 from neondatabase/rc/2024-09-07 Storage release 2024-09-07	2024-09-07 15:09:13 +02:00
Alex Chi Z.	b5d41eaff4	Merge pull request #8883 from neondatabase/rc/2024-09-02 Storage & Compute release 2024-09-02	2024-09-02 23:15:52 +08:00
Anastasia Lubennikova	aa8c5d1ee9	Merge pull request #8858 from neondatabase/releases/2024-08-28-compute-only Compute release 2024-08-28	2024-08-28 20:00:51 +01:00
Christian Schwarz	4355dba46c	Merge pull request #8827 from neondatabase/rc/2024-08-26 Storage & Compute release 2024-08-26	2024-08-26 12:10:03 +02:00
Arseny Sher	cdd8014692	Merge pull request #8751 from neondatabase/rc/2024-08-19 Storage & Compute release 2024-08-19	2024-08-21 06:34:17 +03:00
Arseny Sher	c9491a5acb	Merge pull request #8765 from neondatabase/rc/2024-08-12-fixed Merge main into release with merge commit. This is a no-op PR which will incorporate into release branch last commits from main under their original SHA to prevent merge conflicts when doing release.	2024-08-21 06:31:39 +03:00
John Spray	5090281b4a	Merge pull request #8688 from neondatabase/rc/2024-08-12 Storage & Compute release 2024-08-12	2024-08-12 13:12:10 +01:00
dependabot[bot]	d69f79c7eb	chore(deps): bump aiohttp from 3.9.4 to 3.10.2 (#8684 )	2024-08-12 09:17:55 +01:00
Arpad Müller	c7c58eeab8	Also pass HOME env var in access_env_vars (#8685 ) Noticed this while debugging a test failure in #8673 which only occurs with real S3 instead of mock S3: if you authenticate to S3 via `AWS_PROFILE`, then it requires the `HOME` env var to be set so that it can read inside the `~/.aws` directory. The scrubber abstraction `StorageScrubber::scrubber_cli` in `neon_fixtures.py` would otherwise not work. My earlier PR #6556 has done similar things for the `neon_local` wrapper. You can try: ``` aws sso login --profile dev export ENABLE_REAL_S3_REMOTE_STORAGE=y REMOTE_STORAGE_S3_BUCKET=neon-github-ci-tests REMOTE_STORAGE_S3_REGION=eu-central-1 AWS_PROFILE=dev RUST_BACKTRACE=1 BUILD_TYPE=debug DEFAULT_PG_VERSION=16 ./scripts/pytest -vv --tb=short -k test_scrubber_tenant_snapshot ``` before and after this patch: this patch fixes it.	2024-08-12 09:17:55 +01:00
John Spray	66f86f184b	Update docs/SUMMARY.md (#8665 ) ## Problem This page had many dead links, and was confusing for folks looking for documentation about our product. Closes: https://github.com/neondatabase/neon/issues/8535 ## Summary of changes - Add a link to the product docs up top - Remove dead/placeholder links	2024-08-12 09:17:55 +01:00
Alexander Bayandin	642aa1e160	Dockerfiles: remove cachepot (#8666 ) ## Problem We install and try to use `cachepot`. But it is not configured correctly and doesn't work (after https://github.com/neondatabase/neon/pull/2290) ## Summary of changes - Remove `cachepot`	2024-08-12 09:17:55 +01:00
Vlad Lazar	494023f5df	storcon: skip draining shard if its secondary is lagging too much (#8644 ) ## Problem Migrations of tenant shards with cold secondaries are holding up drains during production deployments. ## Summary of changes If a secondary location is lagging by more than 256MiB (configurable, but that's the default), then skip cutting it over to the secondary as part of the node drain.	2024-08-12 09:17:55 +01:00
John Spray	e9a378d1aa	pageserver: don't treat NotInitialized::Stopped as unexpected (#8675 ) ## Problem This type of error can happen during shutdown & was triggering a circuit breaker alert. ## Summary of changes - Map NotInitialized::Stopped to CompactionError::ShuttingDown, so that we may handle it cleanly	2024-08-12 09:17:55 +01:00
Alexander Bayandin	cbba8e3390	CI(pin-build-tools-image): fix permissions for Azure login (#8671 ) ## Problem Azure login fails in `pin-build-tools-image` workflow because the job doesn't have the required permissions. ``` Error: Please make sure to give write permissions to id-token in the workflow. Error: Login failed with Error: Error message: Unable to get ACTIONS_ID_TOKEN_REQUEST_URL env variable. Double check if the 'auth-type' is correct. Refer to https://github.com/Azure/login#readme for more information. ``` ## Summary of changes - Add `id-token: write` permission to `pin-build-tools-image` - Add an input to force image tagging - Unify pushing to Docker Hub with other registries - Split the job into two to have less if's	2024-08-12 09:17:55 +01:00
Alex Chi Z.	f8c0da43b5	fix(neon): disable create tablespace stmt (#8657 ) part of https://github.com/neondatabase/neon/issues/8653 Disable create tablespace stmt. It turns out it requires much less effort to do the regress test mode flag than patching the test cases, and given that we might need to support tablespaces in the future, I decided to add a new flag `regress_test_mode` to change the behavior of create tablespace. Tested manually that without setting regress_test_mode, create tablespace will be rejected. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-08-12 09:17:55 +01:00
Conrad Ludgate	9dfed93f70	Revert "proxy: update tokio-postgres to allow arbitrary config params (#8076 )" (#8654 ) This reverts #8076 - which was already reverted from the release branch since forever (it would have been a breaking change to release for all users who currently set TimeZone options). It's causing conflicts now so we should revert it here as well.	2024-08-12 09:17:55 +01:00
Peter Bendel	a8eebdb072	Run a subset of benchmarking job steps on GitHub action runners in Azure - closer to the system under test (#8651 ) ## Problem Latency from one cloud provider to another one is higher than within the same cloud provider. Some of our benchmarks are latency sensitive - we run a pgbench or psql in the github action runner and the system under test is running in Neon (database project). For realistic perf tps and latency results we need to compare apples to apples and run the database client in the same "latency distance" for all tests. ## Summary of changes Move job steps that test Neon databases deployed on Azure into Azure action runners. - bench strategy variant using azure database - pgvector strategy variant using azure database - pgbench-compare strategy variants using azure database ## Test run https://github.com/neondatabase/neon/actions/runs/10314848502	2024-08-12 09:17:55 +01:00
Alexander Bayandin	af8c865903	Dockerfiles: fix LegacyKeyValueFormat & JSONArgsRecommended (#8664 ) ## Problem CI complains in all PRs: ``` "ENV key=value" should be used instead of legacy "ENV key value" format ``` https://docs.docker.com/reference/build-checks/legacy-key-value-format/ See - https://github.com/neondatabase/neon/pull/8644/files ("Unchanged files with check annotations" section) - https://github.com/neondatabase/neon/actions/runs/10304090562?pr=8644 ("Annotations" section) ## Summary of changes - Use `ENV key=value` instead of `ENV key value` in all Dockerfiles	2024-08-12 09:17:55 +01:00
Alexander Bayandin	c725a3e4b1	CI(build-tools): update Rust, Python, Mold (#8667 ) ## Problem - Rust 1.80.1 has been released: https://blog.rust-lang.org/2024/08/08/Rust-1.80.1.html - Python 3.9.19 has been released: https://www.python.org/downloads/release/python-3919/ - Mold 2.33.0 has been released: https://github.com/rui314/mold/releases/tag/v2.33.0 - Unpinned `cargo-deny` in `build-tools` got updated to the latest version and doesn't work anymore with the current config file ## Summary of changes - Bump Rust to 1.80.1 - Bump Python to 3.9.19 - Bump Mold to 2.33.0 - Pin `cargo-deny`, `cargo-hack`, `cargo-hakari`, `cargo-nextest`, `rustfilt` versions - Update `deny.toml` to the latest format, see https://github.com/EmbarkStudios/cargo-deny/pull/611	2024-08-12 09:17:55 +01:00
John Spray	857ad70b71	tests: don't require kafka client for regular tests (#8662 ) ## Problem We're adding more third party dependencies to support more diverse + realistic test cases in `test_runner/logical_repl`. I ❤️ these tests, they are a good thing. The slight glitch is that python packaging is hard, and some third party python packages have issues. For example the current kafka dependency doesn't work on latest python. We can mitigate that by only importing these more specialized dependencies in the tests that use them. ## Summary of changes - Move the `kafka` import into a test body, so that folks running the regular `test_runner/regress` tests don't have to have a working kafka client package.	2024-08-12 09:17:55 +01:00
John Spray	56077caaf9	pageserver: remove paranoia double-calculation of retain_lsns (#8617 ) ## Problem This code was to mitigate risk in https://github.com/neondatabase/neon/pull/8427 As expected, we did not hit this code path - the new continuous updates of gc_info are working fine, we can remove this code now. ## Summary of changes - Remove block that double-checks retain_lsns	2024-08-12 09:17:55 +01:00
Joonas Koivunen	552832b819	fix: stop leaking BackgroundPurges (#8650 ) Avoid "leaking" the completions of BackgroundPurges by: 1. switching it to TaskTracker for the provided close+wait 2. no longer using tokio::fs::remove_dir_all, which will consume two units of memory instead of one blocking task Additionally, use a more graceful shutdown in tests that actually do some background cleanup.	2024-08-12 09:17:55 +01:00
Joonas Koivunen	48ae1214c5	fix(test): do not fail test for filesystem race (#8643 ) evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8632/10287641784/index.html#suites/0e58fb04d9998963e98e45fe1880af7d/c7a46335515142b/	2024-08-12 09:17:55 +01:00
Konstantin Knizhnik	2a210d4c58	Use synchronous commit for logical replication worker (#8645 ) ## Problem See https://neondb.slack.com/archives/C03QLRH7PPD/p1723038557449239?thread_ts=1722868375.476789&cid=C03QLRH7PPD Logical replication subscriptions use `synchronous_commit=off` by default, which causes problems with safekeeper ## Summary of changes Set `synchronous_commit=on` for the logical replication subscription in test_subscriber_restart.py --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-08-12 09:17:54 +01:00
John Spray	acaacd4680	pageserver: make bench_ingest build (but panic) on macOS (#8641 ) ## Problem Some developers build on MacOS, which doesn't have io_uring. ## Summary of changes - Add `io_engine_for_bench`, which on linux will give io_uring or panic if it's unavailable, and on MacOS will always panic. We do not want to run such benchmarks with StdFs: the results aren't interesting, and will actively waste the time of any developers who start investigating performance before they realize they're using a known-slow I/O backend. Why not just conditionally compile this benchmark on linux only? Because even on linux, I still want it to refuse to run if it can't get io_uring.	2024-08-12 09:17:54 +01:00
Yuchen Liang	77bb6c4cc4	feat(pageserver): add direct io pageserver config (#8622 ) Part of #8130, [RFC: Direct IO For Pageserver](https://github.com/neondatabase/neon/blob/problame/direct-io-rfc/docs/rfcs/034-direct-io-for-pageserver.md) ## Description Add pageserver config for evaluating/enabling direct I/O. - Disabled: current default, uses buffered io as is. - Evaluate: still uses buffered io, but could do alignment checking and perf simulation (pad latency by direct io RW to a fake file). - Enabled: uses direct io, behavior on alignment error is configurable. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-08-12 09:17:54 +01:00
Cihan Demirci	e082226a32	cicd: push build-tools image to ACR as well (#8638 ) https://github.com/neondatabase/cloud/issues/15899	2024-08-12 09:17:54 +01:00
Joonas Koivunen	40e3c913bb	refactor(timeline_detach_ancestor): replace ordered reparented with a hashset (#8629 ) Earlier I was thinking we'd need a (ancestor_lsn, timeline_id) ordered list of reparented. Turns out we did not need it at all. Replace it with an unordered hashset. Additionally refactor the reparented direct children query out, it will later be used from more places. Split off from #8430. Cc: #6994	2024-08-12 09:17:54 +01:00
Alex Chi Z.	658d763915	fix(pageserver): dump the key when it's invalid (#8633 ) We see an assertion error in staging. Dump the key to guess where it was from, and then we can fix it. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-12 09:17:54 +01:00
Joonas Koivunen	c0776b8724	fix: EphemeralFiles can outlive their Timeline via `enum LayerManager` (#8229 ) Ephemeral files cleanup on drop but did not delay shutdown, leading to problems with restarting the tenant. The solution is as proposed: - make ephemeral files carry the gate guard to delay `Timeline::gate` closing - flush in-memory layers and strong references to those on `Timeline::shutdown` The above are realized by making LayerManager an `enum` with `Open` and `Closed` variants, and fail requests to modify `LayerMap`. Additionally: - fix too eager anyhow conversions in compaction - unify how we freeze layers and handle errors - optimize likely_resident_layers to read LayerFileManager hashmap values instead of bouncing through LayerMap Fixes: #7830	2024-08-12 09:17:54 +01:00
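The `Open`/`Closed` split described in #8229 can be sketched as follows. This is a minimal illustration, not the pageserver's code: the `layers` field, the `Shutdown` error type, and the `insert`/`close`/`len` methods are all hypothetical, and it assumes that a closed manager still allows reads.

```rust
use std::collections::BTreeSet;

/// Hypothetical error returned once the manager has been closed.
#[derive(Debug, PartialEq)]
pub struct Shutdown;

/// Sketch of a manager that rejects modifications after shutdown.
/// The real `LayerManager` wraps a `LayerMap`; a set of names stands in here.
pub enum LayerManager {
    Open { layers: BTreeSet<String> },
    Closed { layers: BTreeSet<String> },
}

impl LayerManager {
    pub fn new() -> Self {
        LayerManager::Open { layers: BTreeSet::new() }
    }

    /// Modifications succeed only while the manager is open.
    pub fn insert(&mut self, name: &str) -> Result<(), Shutdown> {
        match self {
            LayerManager::Open { layers } => {
                layers.insert(name.to_string());
                Ok(())
            }
            LayerManager::Closed { .. } => Err(Shutdown),
        }
    }

    /// Transition to `Closed`, keeping the existing layers readable.
    pub fn close(&mut self) {
        if let LayerManager::Open { layers } = self {
            let layers = std::mem::take(layers);
            *self = LayerManager::Closed { layers };
        }
    }

    /// Reads work in both states (an assumption of this sketch).
    pub fn len(&self) -> usize {
        match self {
            LayerManager::Open { layers } | LayerManager::Closed { layers } => layers.len(),
        }
    }
}
```

Encoding the state in the type means every modifying method has to handle the closed case explicitly, rather than relying on a separate flag that callers might forget to check.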
Conrad Ludgate	1f73dfb842	proxy: random changes (#8602 ) ## Problem 1. Hard to correlate startup parameters with the endpoint that provided them. 2. Some configurations are not needed in the `ProxyConfig` struct. ## Summary of changes Because of some borrow checker fun, I needed to switch to an interior-mutability implementation of our `RequestMonitoring` context system. Using https://docs.rs/try-lock/latest/try_lock/ as a cheap lock for such a use-case (needed to be thread safe). Removed the lock of each startup message, instead just logging only the startup params in a successful handshake. Also removed some values from `ProxyConfig` and kept them as arguments (needed for local-proxy config).	2024-08-12 09:17:54 +01:00
Arpad Müller	38f184bc91	Add missing colon to ArchivalConfigRequest specification (#8627 ) Add a missing colon to the API specification of `ArchivalConfigRequest`. The `state` field is required. Pointed out by Gleb.	2024-08-12 09:17:54 +01:00
Arpad Müller	c75e6fbc46	Lower level for timeline cancellations during gc (#8626 ) Timeline cancellation running in parallel with gc yields error log lines like: ``` Gc failed 1 times, retrying in 2s: TimelineCancelled ``` They are completely harmless though and normal to occur. Therefore, only print those messages at an info level. Still print them at all so that we know what is going on if we focus on a single timeline.	2024-08-12 09:17:54 +01:00
Arpad Müller	9a3bc5556a	storage broker: only print one line for version and build tag in init (#8624 ) This makes it more consistent with pageserver and safekeeper. Also, it is easier to collect the two values into one data point.	2024-08-12 09:17:54 +01:00
Yuchen Liang	22790fc907	scrubber: clean up `scan_metadata` before prod (#8565 ) Part of #8128. ## Problem Currently, scrubber `scan_metadata` command will return with an error code if the metadata on remote storage is corrupted with fatal errors. To safely deploy this command in a cronjob, we want to differentiate between failures while running scrubber command and the erroneous metadata. At the same time, we also want our regression tests to catch corrupted metadata using the scrubber command. ## Summary of changes - Return with error code only when the scrubber command fails - Use explicit checks on errors and warnings to determine metadata health in regression tests. Resolve conflict with `tenant-snapshot` command (after shard split): [`test_scrubber_tenant_snapshot`](https://github.com/neondatabase/neon/blob/yuchen/scrubber-scan-cleanup-before-prod/test_runner/regress/test_storage_scrubber.py#L23) failed before applying `422a8443dd` - When taking a snapshot, the old `index_part.json` in the unsharded tenant directory is not kept. - The current `list_timeline_blobs` implementation considers a missing `index_part.json` a parse error. - During the scan, we are only analyzing shards with the highest shard count, so we will not get a parse error, but we do need to add the layers to the tenant object listing, otherwise we will get an "index is referencing a layer that is not in remote storage" error. - Action: Add s3_layers from `list_timeline_blobs` regardless of parsing error Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-08-12 09:17:54 +01:00
John Spray	ba4e5b51a0	pageserver: add `bench_ingest` (#7409 ) ## Problem We lack a rust bench for the inmemory layer and delta layer write paths: it is useful to benchmark these components independent of postgres & WAL decoding. Related: https://github.com/neondatabase/neon/issues/8452 ## Summary of changes - Refactor DeltaLayerWriter to avoid carrying a Timeline, so that it can be cleanly tested + benched without a Tenant/Timeline test harness. It only needed the Timeline for building `Layer`, so this can be done in a separate step. - Add `bench_ingest`, which exercises a variety of workload "shapes" (big values, small values, sequential keys, random keys) - Include a small uncontroversial optimization: in `freeze`, only exhaustively walk values to assert ordering relative to end_lsn in debug mode. These benches are limited by drive performance on a lot of machines, but still useful as a local tool for iterating on CPU/memory improvements around this code path. Anecdotal measurements on Hetzner AX102 (Ryzen 7950xd): ``` ingest-small-values/ingest 128MB/100b seq time: [1.1160 s 1.1230 s 1.1289 s] thrpt: [113.38 MiB/s 113.98 MiB/s 114.70 MiB/s] Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) low mild Benchmarking ingest-small-values/ingest 128MB/100b rand: Warming up for 3.0000 s Warning: Unable to complete 10 samples in 10.0s. You may wish to increase target time to 18.9s. ingest-small-values/ingest 128MB/100b rand time: [1.9001 s 1.9056 s 1.9110 s] thrpt: [66.982 MiB/s 67.171 MiB/s 67.365 MiB/s] Benchmarking ingest-small-values/ingest 128MB/100b rand-1024keys: Warming up for 3.0000 s Warning: Unable to complete 10 samples in 10.0s. You may wish to increase target time to 11.0s. 
ingest-small-values/ingest 128MB/100b rand-1024keys time: [1.0715 s 1.0828 s 1.0937 s] thrpt: [117.04 MiB/s 118.21 MiB/s 119.46 MiB/s] ingest-small-values/ingest 128MB/100b seq, no delta time: [425.49 ms 429.07 ms 432.04 ms] thrpt: [296.27 MiB/s 298.32 MiB/s 300.83 MiB/s] Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) low mild ingest-big-values/ingest 128MB/8k seq time: [373.03 ms 375.84 ms 379.17 ms] thrpt: [337.58 MiB/s 340.57 MiB/s 343.13 MiB/s] Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) high mild ingest-big-values/ingest 128MB/8k seq, no delta time: [81.534 ms 82.811 ms 83.364 ms] thrpt: [1.4994 GiB/s 1.5095 GiB/s 1.5331 GiB/s] Found 1 outliers among 10 measurements (10.00%) ```	2024-08-12 09:17:54 +01:00
John Spray	6519f875b9	pageserver: use layer visibility when composing heatmap (#8616 ) ## Problem Sometimes, a layer is Covered but hasn't yet been evicted from local disk (e.g. shortly after image layer generation). It is not a good use of resources to download these to a secondary location, as there's a good chance they will never be read. This follows the previous change that added layer visibility: - #8511 Part of epic: - https://github.com/neondatabase/neon/issues/8398 ## Summary of changes - When generating heatmaps, only include Visible layers - Update test_secondary_downloads to filter to visible layers when listing layers from an attached location	2024-08-12 09:17:54 +01:00
John Spray	ea7be4152a	pageserver: fixes for layer visibility metric (#8603 ) ## Problem In staging, we could see that occasionally tenants were wrapping their pageserver_visible_physical_size metric past zero to 2^64. This is harmless right now, but will matter more later when we start using visible size in things like the /utilization endpoint. ## Summary of changes - Add debug asserts that detect this case. `test_gc_of_remote_layers` works as a reproducer for this issue once the asserts are added. - Tighten up the interface around access_stats so that only Layer can mutate it. - In Layer, wrap calls to `record_access` in code that will update the visible size statistic if the access implicitly marks the layer visible (this was what caused the bug) - In LayerManager::rewrite_layers, use the proper set_visibility layer function instead of directly using access_stats (this is an additional path where metrics could go bad.) - Removed unused instances of LayerAccessStats in DeltaLayer and ImageLayer which I noticed while reviewing the code paths that call record_access.	2024-08-12 09:17:54 +01:00
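The wrap "past zero to 2^64" described in #8603 is what an unsigned 64-bit counter does when more is subtracted than was ever added. A minimal sketch of the failure and of a debug-assert guard, using hypothetical helper names (`sub_wrapping`, `sub_checked`) rather than the actual metric code:

```rust
/// Naive update: wraps around when `delta` exceeds the current value,
/// which is how a size counter can jump from near zero to near 2^64.
pub fn sub_wrapping(visible: u64, delta: u64) -> u64 {
    visible.wrapping_sub(delta)
}

/// Guarded update: panics in debug builds if the subtraction would
/// underflow, and saturates at zero in release builds.
pub fn sub_checked(visible: u64, delta: u64) -> u64 {
    debug_assert!(delta <= visible, "visible size would underflow");
    visible.saturating_sub(delta)
}
```

Saturating in release builds is a choice made for this sketch; the commit itself only states that debug asserts were added to detect the case.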
John Spray	8d8e428d4c	tests: improve stability of `test_storage_controller_many_tenants` (#8607 ) ## Problem The controller scale test does random migrations. These mutate secondary locations, and therefore can cause secondary optimizations to happen in the background, violating the test's expectation that consistency_check will work as there are no reconciliations running. Example: https://neon-github-public-dev.s3.amazonaws.com/reports/main/10247161379/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/6316beacd3fb3060/ ## Summary of changes - Only migrate to existing secondary locations, not randomly picked nodes, so that we can do a fast reconcile_until_idle (otherwise reconcile_until_idle takes a long time to create new secondary locations). - Do a reconcile_until_idle before consistency_check.	2024-08-12 09:17:54 +01:00
a-masterov	0be952fb89	enable rum test (#8380 ) ## Problem We need to test the rum extension automatically as a part of the GitHub workflow ## Summary of changes rum test is enabled	2024-08-12 09:17:54 +01:00
a-masterov	13e794a35c	Add a test using Debezium as a client for the logical replication (#8568 ) ## Problem We need to test the logical replication with some external consumers. ## Summary of changes A test of the logical replication with Debezium as a consumer was added. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-08-12 09:17:54 +01:00
Arseny Sher	bd276839ad	Add package-mode=false to poetry. We don't use it for packaging, and 'poetry install' will soon error otherwise. Also remove name and version fields as these are not required for non-packaging mode.	2024-08-12 09:17:54 +01:00
Arpad Müller	44d9975799	storage_scrubber: migrate scan_safekeeper_metadata to remote_storage (#8595 ) Migrates the safekeeper-specific parts of `ScanMetadata` to GenericRemoteStorage, making it Azure-ready. Part of https://github.com/neondatabase/neon/issues/7547	2024-08-12 09:17:54 +01:00
Joonas Koivunen	814b090250	chore: bump index part version (#8611 ) #8600 missed the hunk changing index_part.json informative version. Include it in this PR, in addition add more non-warning index_part.json versions to scrubber.	2024-08-12 09:17:54 +01:00
Vlad Lazar	608c3cedbf	pageserver: remove legacy read path (#8601 ) ## Problem We have been maintaining two read paths (legacy and vectored) for a while now. The legacy read-path was only used for cross validation in some tests. ## Summary of changes * Tweak all tests that were using the legacy read path to use the vectored read path instead * Remove the read path dispatching based on the pageserver configs * Remove the legacy read path code We will be able to remove the single blob io code in `pageserver/src/tenant/blob_io.rs` when https://github.com/neondatabase/neon/issues/7386 is complete. Closes https://github.com/neondatabase/neon/issues/8005	2024-08-12 09:17:54 +01:00
Joonas Koivunen	b2bc5795be	feat: persistent gc blocking (#8600 ) Currently, we do not have facilities to persistently block GC on a tenant for whatever reason. We could do a tenant configuration update, but that is risky for generation numbers and would also be transient. Introduce a `gc_block` facility in the tenant, which manages per timeline blocking reasons. Additionally, add HTTP endpoints for enabling/disabling manual gc blocking for a specific timeline. For debugging, individual tenant status now includes a similar string representation logged when GC is skipped. Cc: #6994	2024-08-12 09:17:54 +01:00
Joonas Koivunen	c89ee814e1	fix: make Timeline::set_disk_consistent_lsn use fetch_max (#8311 ) now it is safe to use from multiple callers, as we have two callers.	2024-08-12 09:17:54 +01:00
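The reason `fetch_max` makes the setter safe for multiple callers (#8311) is that the compare-and-update happens as one atomic step, so a stale caller can never move the value backwards. A minimal sketch with a hypothetical wrapper type; the real field and its LSN type are not shown in this log:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the timeline's disk-consistent LSN,
/// stored as a plain u64 for illustration.
pub struct DiskConsistentLsn(AtomicU64);

impl DiskConsistentLsn {
    pub fn new(initial: u64) -> Self {
        Self(AtomicU64::new(initial))
    }

    /// Advance to `candidate` only if it is larger than the stored value.
    /// Returns the value that was stored before the call.
    pub fn advance(&self, candidate: u64) -> u64 {
        self.0.fetch_max(candidate, Ordering::AcqRel)
    }

    pub fn load(&self) -> u64 {
        self.0.load(Ordering::Acquire)
    }
}
```

With a separate load followed by a store, two concurrent callers could interleave so that the smaller value is written last; `fetch_max` rules that out.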
Alex Chi Z.	83afea3edb	feat(pageserver): support dry-run for gc-compaction, add statistics (#8557 ) Add dry-run mode that does not produce any image layer + delta layer. I will use this code to do some experiments and see how much space we can reclaim for tenants on staging. Part of https://github.com/neondatabase/neon/issues/8002 * Add dry-run mode that runs the full compaction process without updating the layer map. (We never call finish on the writers and the files will be removed before exiting the function). * Add compaction statistics and print them at the end of compaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-12 09:17:54 +01:00
Alexander Bayandin	3b4b9c1d0b	CI(benchmarking): set pub/sub projects for LR tests (#8483 ) ## Problem > Currently, long-running LR tests recreate endpoints every night. We'd like to have a long-running buildup of history to exercise the pageserver in this case (instead of "unit-testing" the same behavior every night). Closes #8317 ## Summary of changes - Update Postgres version for replication tests - Set `BENCHMARK_PROJECT_ID_PUB`/`BENCHMARK_PROJECT_ID_SUB` env vars to projects that were created for this purpose --------- Co-authored-by: Sasha Krassovsky <krassovskysasha@gmail.com>	2024-08-12 09:17:54 +01:00
Joonas Koivunen	e1339ac915	fix: allow awaiting logical size for root timelines (#8604 ) Currently if `GET /v1/tenant/x/timeline/y?force-await-initial-logical-size=true` is requested for a root timeline created within the current pageserver session, the request handler panics hitting the debug assertion. These timelines will always have an accurate (at initdb import) calculated logical size. Fix is to never attempt prioritizing timeline size calculation if we already have an exact value. Split off from #8528.	2024-08-12 09:17:54 +01:00
Alexander Bayandin	6564afb822	CI(trigger-e2e-tests): fix deadlock with Build and Test workflow (#8606 ) ## Problem In some cases, a deadlock between `build-and-test` and `trigger-e2e-tests` workflows can happen: ``` Build and Test Canceling since a deadlock for concurrency group 'Build and Test-8600/merge-anysha' was detected between 'top level workflow' and 'trigger-e2e-tests' ``` I don't understand the reason completely, probably `${{ github.workflow }}` got evaluated to the same value and somehow caused the issue. We don't need to limit concurrency for `trigger-e2e-tests` workflow. See https://neondb.slack.com/archives/C059ZC138NR/p1722869486708179?thread_ts=1722869027.960029&cid=C059ZC138NR	2024-08-12 09:17:54 +01:00
Alexander Bayandin	274c2c40b9	CI(trigger-e2e-tests): wait for promote-images job from the last commit (#8592 ) ## Problem We don't trigger e2e tests for draft PRs, but we do trigger them once a PR is in the "Ready for review" state. Sometimes, a PR can be marked as "Ready for review" before we finish image building. In such cases, triggering e2e tests fails. ## Summary of changes - Make `trigger-e2e-tests` job poll status of `promote-images` job from the build-and-test workflow for the last commit, and trigger only if the status is `success` - Remove explicit image checking from the workflow - Add `concurrency` for `trigger-e2e-tests` workflow to make it possible to cancel jobs in progress (if PR moves from "Draft" to "Ready for review" several times in a row)	2024-08-12 09:17:54 +01:00
Konstantin Knizhnik	afdbe0a7d0	Update Postgres versions to use smgrexists() instead of access() to check if Oid is used (#8597 ) ## Problem PR #7992 was merged without the corresponding changes in Postgres submodules, which is why test_oid_overflow.py is failing now. ## Summary of changes Bump Postgres versions Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-08-12 09:17:54 +01:00
Alex Chi Z.	5945eadd42	feat(pageserver): support split delta layers (#8599 ) part of https://github.com/neondatabase/neon/issues/8002 Similar to https://github.com/neondatabase/neon/pull/8574, we add auto-split support for delta layers. Tests are reused from image layer split writers. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-12 09:17:54 +01:00
dotdister	b76ab45cbe	safekeeper: remove unused partial_backup_enabled option (#8547 ) ## Problem There is an unused safekeeper option `partial_backup_enabled`. `partial_backup_enabled` was implemented in #6530, but the feature was made always enabled in #8022. If you intended to keep this option for a specific reason, I will close this PR. ## Summary of changes I removed an unused safekeeper option `partial_backup_enabled`.	2024-08-12 09:17:54 +01:00
Arpad Müller	7b7d77c817	Merge pull request #8642 from neondatabase/arpad/release-ram-hot-fix Storage release 2024-08-07	2024-08-07 20:00:43 +02:00
Joonas Koivunen	7ec831c956	fix: drain completed page_service connections (#8632 ) We've noticed increased memory usage with the latest release. Drain the joinset of `page_service` connection handlers to avoid leaking them until shutdown. An alternative would be to use a TaskTracker. TaskTracker was not discussed in original PR #8339 review, so not hot fixing it in here either.	2024-08-07 19:17:40 +02:00
Arpad Müller	1a36516d75	Merge pull request #8598 from neondatabase/rc/2024-08-05 Storage & Compute release 2024-08-05	2024-08-05 14:21:20 +02:00
Alex Chi Z.	fde8aa103e	feat(pageserver): support auto split layers based on size (#8574 ) part of https://github.com/neondatabase/neon/issues/8002 ## Summary of changes Add a `SplitImageWriter` that automatically splits image layers based on the estimated target image layer size. This does not consider compression and we might need a better metric. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-08-05 08:56:00 +02:00
Alex Chi Z.	8624aabc98	fix(pageserver): deadlock in gc-compaction (#8590 ) We need both the compaction and gc locks for gc-compaction. The lock order should be the same everywhere, otherwise there could be a deadlock where A waits for B and B waits for A. We also had a double-lock issue. The compaction lock gets acquired in the outer `compact` function. Note that the unit tests directly call `compact_with_gc`, and therefore do not trigger the issue. ## Summary of changes Ensure all places acquire the compact lock and then the gc lock. Remove an extra compact lock acquisition. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-05 08:55:59 +02:00
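The lock-ordering rule from #8590 (always compaction first, then gc) can be enforced by funnelling every caller through one helper. The sketch below uses `std::sync::Mutex` and hypothetical names (`TimelineLocks`, `with_both`); the pageserver's own locks are async and structured differently.

```rust
use std::sync::Mutex;

/// Hypothetical pair of locks guarding compaction and gc.
pub struct TimelineLocks {
    compaction: Mutex<()>,
    gc: Mutex<()>,
}

impl TimelineLocks {
    pub fn new() -> Self {
        Self { compaction: Mutex::new(()), gc: Mutex::new(()) }
    }

    /// Run `f` while holding both locks. Because every caller acquires
    /// them here, the order is always compaction, then gc, so two
    /// threads can never each hold one lock while waiting for the other.
    pub fn with_both<R>(&self, f: impl FnOnce() -> R) -> R {
        let _compaction = self.compaction.lock().unwrap();
        let _gc = self.gc.lock().unwrap();
        f()
    }
}
```

The closure must not call `with_both` again on the same locks: `std::sync::Mutex` is not reentrant, which mirrors the double-lock issue the commit also removes.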
John Spray	3a10bf8c82	tests: add test_historic_storage_formats (#8423 ) ## Problem Currently, our backward compatibility tests only look one release back. That means, for example, that when we switch on image layer compression by default, we'll test reading of uncompressed layers for one release, and then stop doing it. When we make an index_part.json format change, we'll test against the old format for a week, then stop (unless we write separate unit tests for each old format). The reality in the field is that data in old formats will continue to exist for weeks/months/years. When we make major format changes, we should retain examples of the old format data, and continuously verify that the latest code can still read them. This test uses contents from a new path in the public S3 bucket, `compatibility-data-snapshots/`. It is populated by hand. The first important artifact is one from before we switch on compression, so that we will keep testing reads of uncompressed data. We will generate more artifacts ahead of other key changes, like when we update remote storage format for archival timelines. Closes: https://github.com/neondatabase/cloud/issues/15576	2024-08-05 08:55:59 +02:00
Arthur Petukhovsky	1758c10dec	Improve safekeepers eviction rate limiting (#8456 ) This commit tries to fix regular load spikes on staging, caused by too many eviction and partial upload operations running at the same time. Usually it was happening after restart; for partial backup the load was delayed. - Add a semaphore for evictions (2 permits by default) - Rename `resident_since` to `evict_not_before` and smooth out the curve by using random duration - Use random duration in partial uploads as well related to https://github.com/neondatabase/neon/issues/6338 some discussion in https://neondb.slack.com/archives/C033RQ5SPDH/p1720601531744029	2024-08-05 08:55:59 +02:00
Arpad Müller	7eb3d6bb2d	Wait for completion of the upload queue in flush_frozen_layer (#8550 ) Makes `flush_frozen_layer` add a barrier to the upload queue and makes it wait for that barrier to be reached until it lets the flushing be completed. This gives us backpressure and ensures that writes can't build up in an unbounded fashion. Fixes #7317	2024-08-05 08:55:59 +02:00
John Spray	3833e30d44	storage_controller: start adding chaos hooks (#7946 ) Chaos injection bridges the gap between automated testing (where we do lots of different things with small, short-lived tenants), and staging (where we do many fewer things, but with larger, long-lived tenants). This PR adds a first type of chaos which isn't really very chaotic: it's live migration of tenants between healthy pageservers. This nevertheless provides continuous checks that things like clean, prompt shutdown of tenants works for realistically deployed pageservers with realistically large tenants.	2024-08-05 08:55:59 +02:00
John Spray	4631179320	pageserver: refine how we delete timelines after shard split (#8436 ) ## Problem Previously, when we did a timeline deletion, shards would delete layers that belong to an ancestor. That is not a correctness issue, because when we delete a timeline, we're always deleting it from all shards, and destroying data for that timeline is clearly fine. However, there exists a race where one shard might start doing this deletion while another shard has not yet received the deletion request, and might try to access an ancestral layer. This creates ambiguity over the "all layers referenced by my index should always exist" invariant, which is important to detecting and reporting corruption. Now that we have a GC mode for clearing up ancestral layers, we can rely on that to clean up such layers, and avoid deleting them right away. This makes things easier to reason about: there are now no cases where a shard will delete a layer that belongs to a ShardIndex other than itself. ## Summary of changes - Modify behavior of RemoteTimelineClient::delete_all - Add `test_scrubber_physical_gc_timeline_deletion` to exercise this case - Tweak AWS SDK config in the scrubber to enable retries. Motivated by seeing the test for this feature encounter some transient "service error" S3 errors (which probably have nothing to do with the changes in this PR)	2024-08-05 08:55:59 +02:00
Alexander Bayandin	4eea3ce705	test_runner: don't create artifacts if Allure is not enabled (#8580 ) ## Problem `allure_attach_from_dir` method might create `tar.zst` archives even if `--alluredir` is not set (i.e. Allure results collection is disabled) ## Summary of changes - Don't run `allure_attach_from_dir` if `--alluredir` is not set	2024-08-05 08:55:59 +02:00
Alex Chi Z.	a9bcabe503	fix(pageserver): skip existing layers for btm-gc-compaction (#8498 ) part of https://github.com/neondatabase/neon/issues/8002 Due to the limitation of the current layer map implementation, we cannot directly replace a layer. It's interpreted as an insert and a deletion, and there will be a "file exists" error when renaming the newly-created layer to replace the old layer. We work around that by changing the end key of the image layer. A long-term fix would involve a refactor around the layer file naming. For delta layers, we simply skip layers with the same key range produced, though it is possible to add an extra key as an alternative solution. * The image layer range for the layers generated from gc-compaction will be Key::MIN..(Key::MAX-1), to avoid being recognized as an L0 delta layer. * Skip existing layers if it turns out that we need to generate a layer with the same persistent key in the same generation. Note that it is possible that the newly-generated layer has different content from the existing layer. For example, when the user drops a retain_lsn, the compaction could have combined or dropped some records, therefore creating a smaller layer than the existing one. We discard the "optimized" layer for now because we cannot deal with such rewrites within the same generation. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-08-05 08:55:59 +02:00
Alex Chi Z.	7a2625b803	storage-scrubber: log version on start (#8571 ) Helps us better identify which version of storage scrubber is running. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-05 08:55:59 +02:00
John Spray	f51dc6a44e	pageserver: add layer visibility calculation (#8511 ) ## Problem We recently added a "visibility" state to layers, but nothing initializes it. Part of: - #8398 ## Summary of changes - Add a dependency on `range-set-blaze`, which is used as a fast incrementally updated alternative to KeySpace. We could also use this to replace the internals of KeySpaceRandomAccum if we wanted to. Writing a type that does this kind of "BtreeMap & merge overlapping entries" thing isn't super complicated, but no reason to write this ourselves when there's a third party impl available. - Add a function to layermap to calculate visibilities for each layer - Add a function to Timeline to call into layermap and then apply these visibilities to the Layer objects. - Invoke the calculation during startup, after image layer creations, and when removing branches. Branch removal and image layer creation are the two ways that a layer can go from Visible to Covered. - Add unit test & benchmark for the visibility calculation - Expose `pageserver_visible_physical_size` metric, which should always be <= `pageserver_remote_physical_size`. - This metric will feed into the /v1/utilization endpoint later: the visible size indicates how much space we would like to use on this pageserver for this tenant. - When `pageserver_visible_physical_size` is greater than `pageserver_resident_physical_size`, this is a sign that the tenant has long-idle branches, which result in layers that are visible in principle, but not used in practice. This does not keep visibility hints up to date in all cases: particularly, when creating a child timeline, any previously covered layers will not get marked Visible until they are accessed. Updates after image layer creation could be implemented as more of a special case, but this would require more new code: the existing depth calculation code doesn't maintain+yield the list of deltas that would be covered by an image layer. 
## Performance This operation is done rarely (at startup and at timeline deletion), so needs to be efficient but not ultra-fast. There is a new `visibility` bench that measures runtime for a synthetic 100k layers case (`sequential`) and a real layer map (`real_map`) with ~26k layers. The benchmark shows runtimes of single digit milliseconds (on a ryzen 7950). This confirms that the runtime shouldn't be a problem at startup (as we already incur S3-level latencies there), but that it's slow enough that we definitely shouldn't call it more often than necessary, and it may be worthwhile to optimize further later (things like: when removing a branch, only bother scanning layers below the branchpoint) ``` visibility/sequential time: [4.5087 ms 4.5894 ms 4.6775 ms] change: [+2.0826% +3.9097% +5.8995%] (p = 0.00 < 0.05) Performance has regressed. Found 24 outliers among 100 measurements (24.00%) 2 (2.00%) high mild 22 (22.00%) high severe min: 0/1696070, max: 93/1C0887F0 visibility/real_map time: [7.0796 ms 7.0832 ms 7.0871 ms] change: [+0.3900% +0.4505% +0.5164%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe min: 0/1696070, max: 93/1C0887F0 visibility/real_map_many_branches time: [4.5285 ms 4.5355 ms 4.5434 ms] change: [-1.0012% -0.8004% -0.5969%] (p = 0.00 < 0.05) Change within noise threshold. ```	2024-08-05 08:55:59 +02:00
Arpad Müller	a22361b57b	Reduce linux-raw-sys duplication (#8577 ) Before, we had four versions of linux-raw-sys in our dependency graph: ``` linux-raw-sys@0.1.4 linux-raw-sys@0.3.8 linux-raw-sys@0.4.13 linux-raw-sys@0.6.4 ``` now it's only two: ``` linux-raw-sys@0.4.13 linux-raw-sys@0.6.4 ``` The changes in this PR are minimal. In order to get to its state one only has to update procfs in Cargo.toml to 0.16 and do `cargo update -p tempfile -p is-terminal -p prometheus`.	2024-08-05 08:55:59 +02:00
Christian Schwarz	1e6a1ac9fa	pageserver: shutdown all walredo managers 8s into shutdown (#8572 ) # Motivation The working theory for hung systemd during PS deploy (https://github.com/neondatabase/cloud/issues/11387) is that leftover walredo processes trigger a race condition. In https://github.com/neondatabase/neon/pull/8150 I arranged that a clean Tenant shutdown does actually kill its walredo processes. But many prod machines don't manage to shut down all their tenants until the 10s systemd timeout hits and, presumably, triggers the race condition in systemd / the Linux kernel that causes the frozen systemd # Solution This PR bolts on a rather ugly mechanism to shut down tenant managers out of order 8s after we've received the SIGTERM from systemd. # Changes - add a global registry of `Weak<WalRedoManager>` - add a special thread spawned during `shutdown_pageserver` that sleeps for 8s, then shuts down all redo managers in the registry and prevents new redo managers from being created - propagate the new failure mode of tenant spawning throughout the code base - make sure shut down tenant manager results in PageReconstructError::Cancelled so that if Timeline::get calls come in after the shutdown, they do the right thing	2024-08-05 08:55:59 +02:00
Alex Chi Z.	02e8fd0b52	test(pageserver): add test_gc_feedback_with_snapshots (#8474 ) Should be working after https://github.com/neondatabase/neon/pull/8328 gets merged. Part of https://github.com/neondatabase/neon/issues/8002 Adds a new perf benchmark case that ensures garbage can be collected with branches --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-05 08:55:59 +02:00
Alexander Bayandin	8adc4031d0	CI(create-test-report): fix missing benchmark results in Allure report (#8540 ) ## Problem In https://github.com/neondatabase/neon/pull/8241 I've accidentally removed `create-test-report` dependency on `benchmarks` job ## Summary of changes - Run `create-test-report` after `benchmarks` job	2024-08-05 08:55:59 +02:00
Arpad Müller	46379cd3f2	storage_scrubber: migrate FindGarbage to remote_storage (#8548 ) Uses the newly added APIs from #8541 named `stream_tenants_generic` and `stream_objects_with_retries` and extends them with `list_objects_with_retries_generic` and `stream_tenant_timelines_generic` to migrate the `find-garbage` command of the scrubber to `GenericRemoteStorage`. Part of https://github.com/neondatabase/neon/issues/7547	2024-08-05 08:55:59 +02:00
John Spray	b3a76d9601	controller: simplify reconciler generation increment logic (#8560 ) ## Problem This code was confusing, untested and covered: - an impossible case, where intent state is AttachedStale (we never do this) - a rare edge case (going from AttachedMulti to Attached), which we were not testing, and in any case the pageserver internally does the same Tenant reset in this transition as it would do if we incremented generation. Closes: https://github.com/neondatabase/neon/issues/8367 ## Summary of changes - Simplify the logic to only skip incrementing the generation if the location already has the expected generation and the exact same mode.	2024-08-05 08:55:59 +02:00
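The simplified rule in the summary above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the storage controller's actual code: the `Mode` enum, the `ObservedLocation` struct, and the function name `should_increment_generation` are all hypothetical.

```rust
/// Hypothetical attachment modes, named after the ones mentioned in the commit message.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Mode {
    AttachedSingle,
    AttachedMulti,
    AttachedStale,
}

/// Hypothetical view of what the controller last observed on a pageserver.
struct ObservedLocation {
    generation: Option<u32>,
    mode: Mode,
}

/// Skip the increment only if the location already has the expected
/// generation AND exactly the same mode; in every other case, increment.
fn should_increment_generation(
    observed: Option<&ObservedLocation>,
    wanted_generation: u32,
    wanted_mode: Mode,
) -> bool {
    match observed {
        Some(loc) => !(loc.generation == Some(wanted_generation) && loc.mode == wanted_mode),
        // Nothing observed yet: we cannot prove the location is up to date.
        None => true,
    }
}
```

Under this rule, a transition such as AttachedMulti to AttachedSingle with an unchanged generation number still increments, which matches the commit's point that the rare edge case no longer gets special handling.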
Cihan Demirci	6c1bbe8434	cicd: change Azure storage details [2/2] (#8562 ) Change Azure storage configuration to point to updated variables/secrets. Also update subscription id variable.	2024-08-05 08:55:59 +02:00
Tristan Partin	a006f7656e	Fix negative replication delay metric In some cases, we can get a negative metric for replication_delay_bytes. My best guess from all the research I've done is that we evaluate pg_last_wal_receive_lsn() before pg_last_wal_replay_lsn(), and that by the time everything is said and done, the replay LSN has advanced past the receive LSN. In this case, our lag can effectively be modeled as 0 due to the speed of the WAL reception and replay.	2024-08-05 08:55:59 +02:00
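The "model the lag as 0" idea from the commit above can be sketched like this. It is only an illustration: the function name `replication_delay_bytes` and the representation of LSNs as plain `u64` byte positions are assumptions, and the actual fix may well live in the SQL query rather than in Rust.

```rust
/// Hypothetical helper: replication lag in bytes, given the last received
/// and last replayed LSNs as plain byte positions.
///
/// If the two values are sampled at slightly different times, replay may
/// already have advanced past the sampled receive position. A saturating
/// subtraction reports 0 in that case instead of a negative number.
fn replication_delay_bytes(receive_lsn: u64, replay_lsn: u64) -> u64 {
    receive_lsn.saturating_sub(replay_lsn)
}
```

For example, `replication_delay_bytes(100, 40)` yields 60, while `replication_delay_bytes(40, 100)` yields 0.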
Christian Schwarz	31122adee3	refactor(page_service): Timeline gate guard holding + cancellation + shutdown (#8339 ) Since the introduction of sharding, the protocol handling loop in `handle_pagerequests` cannot know anymore which concrete `Tenant`/`Timeline` object any of the incoming `PagestreamFeMessage` resolves to. In fact, one message might resolve to one `Tenant`/`Timeline` while the next one may resolve to another one. To avoid going to tenant manager, we added the `shard_timelines` which acted as an ever-growing cache that held timeline gate guards open for the lifetime of the connection. The consequence of holding the gate guards open was that we had to be sensitive to every cached `Timeline::cancel` on each interaction with the network connection, so that Timeline shutdown would not have to wait for network connection interaction. We can do better than that, meaning more efficiency & better abstraction. I proposed a sketch for it in * https://github.com/neondatabase/neon/pull/8286 and this PR implements an evolution of that sketch. The main idea is that `mod page_service` shall be solely concerned with the following: 1. receiving requests by speaking the protocol / pagestream subprotocol 2. dispatching the request to a corresponding method on the correct shard/`Timeline` object 3. sending response by speaking the protocol / pagestream subprotocol. The cancellation sensitivity responsibilities are clear cut: * while in `page_service` code, sensitivity to page_service cancellation is sufficient * while in `Timeline` code, sensitivity to `Timeline::cancel` is sufficient To enforce these responsibilities, we introduce the notion of a `timeline::handle::Handle` to a `Timeline` object that is checked out from a `timeline::handle::Cache` for each request. The `Handle` derefs to `Timeline` and is supposed to be used for a single async method invocation on `Timeline`. See the lengthy doc comment in `mod handle` for details of the design.	2024-08-05 08:55:59 +02:00
Alex Chi Z.	311cc71b08	feat(pageserver): support btm-gc-compaction for child branches (#8519 ) part of https://github.com/neondatabase/neon/issues/8002 For child branches, we will pull the image of the modified keys from the parent into the child branch, which creates a full history for generating key retention. If there are not enough delta keys, the image won't be written eventually, and we will only keep the deltas inside the child branch. In the future, we could avoid the wasteful work of pulling the image from the parent if we knew the number of deltas in advance (currently we always pull the image for all modified keys in the child branch) --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-05 08:55:59 +02:00
Alexander Bayandin	0356fc426b	CI(regress-tests): run less regression tests (#8561 ) ## Problem We run regression tests on `release` & `debug` builds for each of the three supported Postgres versions (6 in total). With upcoming ARM support and Postgres 17, the number of jobs will jump to 16, which is a lot. See the internal discussion here: https://neondb.slack.com/archives/C033A2WE6BZ/p1722365908404329 ## Summary of changes - Run `regress-tests` job in debug builds only with the latest Postgres version - Do not do `debug` builds on release branches	2024-08-05 08:55:59 +02:00
Christian Schwarz	35738ca37f	compaction_level0_phase1: bypass PS PageCache for data blocks (#8543 ) part of https://github.com/neondatabase/neon/issues/8184 # Problem We want to bypass PS PageCache for all data block reads, but `compact_level0_phase1` currently uses `ValueRef::load` to load the WAL records from delta layers. Internally, that maps to `FileBlockReader:read_blk` which hits the PageCache [here](`e78341e1c2/pageserver/src/tenant/block_io.rs (L229-L236)`). # Solution This PR adds a mode for `compact_level0_phase1` that uses the `MergeIterator` for reading the `Value`s from the delta layer files. `MergeIterator` is a streaming k-merge that uses vectored blob_io under the hood, which bypasses the PS PageCache for data blocks. Other notable changes: * change the `DiskBtreeReader::into_stream` to buffer the node, instead of holding a `PageCache` `PageReadGuard`. * Without this, we run out of page cache slots in `test_pageserver_compaction_smoke`. * Generally, `PageReadGuard`s aren't supposed to be held across await points, so, this is a general bugfix. # Testing / Validation / Performance `MergeIterator` has not yet been used in production; it's being developed as part of * https://github.com/neondatabase/neon/issues/8002 Therefore, this PR adds a validation mode that compares the existing approach's value iterator with the new approach's stream output, item by item. If they're not identical, we log a warning / fail the unit/regression test. To avoid flooding the logs, we apply a global rate limit of once per 10 seconds. In any case, we use the existing approach's value. 
Expected performance impact that will be monitored in staging / nightly benchmarks / eventually pre-prod: * with validation: * increased CPU usage * ~doubled VirtualFile read bytes/second metric * no change in disk IO usage because the kernel page cache will likely have the pages buffered on the second read * without validation: * slightly higher DRAM usage because each iterator participating in the k-merge has a dedicated buffer (as opposed to before, where compactions would rely on the PS PageCache as a shared evicting buffer) * less disk IO if previously there were repeat PageCache misses (likely case on a busy production Pageserver) * lower CPU usage: PageCache out of the picture, fewer syscalls are made (vectored blob io batches reads) # Rollout The new code is used with validation mode enabled-by-default. This gets us validation everywhere by default, specifically in - Rust unit tests - Python tests - Nightly pagebench (shouldn't really matter) - Staging Before the next release, I'll merge the following aws.git PR that configures prod to continue using the existing behavior: * https://github.com/neondatabase/aws/pull/1663 # Interactions With Other Features This work & rollout should complete before Direct IO is enabled because Direct IO would double the IOPS & latency for each compaction read (#8240). # Future Work The streaming k-merge's memory usage is proportional to the amount of memory per participating layer. But `compact_level0_phase1` still loads all keys into memory for `all_keys_iter`. Thus, it continues to have active memory usage proportional to the number of keys involved in the compaction. Future work should replace `all_keys_iter` with a streaming keys iterator. This PR has a draft in its first commit, which I later reverted because it's not necessary to achieve the goal of this PR / issue #8184.	2024-08-05 08:55:59 +02:00
Cihan Demirci	fa24d27d38	cicd: change Azure storage details [1/2] (#8553 ) Change Azure storage configuration to point to new variables/secrets. They have the `_NEW` suffix in order not to disrupt any tests while we complete the switch.	2024-08-05 08:55:59 +02:00
Christian Schwarz	fb6c1e9390	cleanup(compact_level0_phase1): some commentary and wrapping into block expressions (#8544 ) Byproduct of scouting done for https://github.com/neondatabase/neon/issues/8184 refs https://github.com/neondatabase/neon/issues/8184	2024-08-05 08:55:59 +02:00
Yuchen Liang	d1d4631c8f	feat(scrubber): post `scan_metadata` results to storage controller (#8502 ) Part of #8128, followup to #8480. closes #8421. Enable scrubber to optionally post metadata scan health results to storage controller. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-08-05 08:55:59 +02:00
Yuchen Liang	b87a1384f0	feat(storcon): store scrubber metadata scan result (#8480 ) Part of #8128, followed by #8502. ## Problem Currently we lack a mechanism to alert on unhealthy `scan_metadata` status if we start running this scrubber command as part of a cronjob. With the storage controller client introduced to storage scrubber in #8196, it is viable to set up alerts by storing health status in the storage controller database. We intentionally do not store the full output in the database, as the JSON blobs could make the table really huge. Instead, we store only a health status and a timestamp recording the last time the metadata health status was posted for a tenant shard. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-08-05 08:55:59 +02:00
Anton Chaporgin	5702e1cb46	[neon/acr] impr: push to ACR while building images (#8545 ) This tests the ability to push into ACR using OIDC. Proved it worked by running slightly modified YAML. In `promote-images` we push the following images `neon compute-tools {vm-,}compute-node-{v14,v15,v16}` into `neoneastus2`. https://github.com/neondatabase/cloud/issues/14640	2024-08-05 08:55:59 +02:00
Alexander Bayandin	5be3e09082	CI(benchmarking): make neonvm default provisioner (#8538 ) ## Problem We don't allow regular end-users to use `k8s-pod` provisioner, but we still use it in nightly benchmarks ## Summary of changes - Remove `provisioner` input from `neon-create-project` action, use `k8s-neonvm` as a default provisioner - Change `neon-` platform prefix to `neonvm-` - Remove `neon-captest-freetier` and `neon-captest-new` as we already have their `neonvm` counterparts	2024-08-05 08:55:59 +02:00
Arpad Müller	cd3f4b3a53	scrubber: add remote_storage based listing APIs and use them in find-large-objects (#8541 ) Add two new functions `stream_objects_with_retries` and `stream_tenants_generic` and use them in the `find-large-objects` subcommand, migrating it to `remote_storage`. Also adds the `size` field to the `ListingObject` struct. Part of #7547	2024-08-05 08:55:59 +02:00
Arpad Müller	57f22178d7	Add metrics for input data considered and taken for compression (#8522 ) If compression is enabled, we currently try compressing each image larger than a specific size and if the compressed version is smaller, we write that one, otherwise we use the uncompressed image. However, this might sometimes be a wasteful process, if there is a substantial number of images that don't compress well. The compression metrics added in #8420 `pageserver_compression_image_in_bytes_total` and `pageserver_compression_image_out_bytes_total` are well designed for answering the question how space efficient the total compression process is end-to-end, which helps one to decide whether to enable it or not. To answer the question of how much waste there is in terms of trial compression, so CPU time, we add two metrics: * one about the images that have been trial-compressed (considered), and * one about the images where the compressed image has actually been written (chosen). There are different ways of weighting them, like for example one could look at the count, or the compressed data. But the main contributor to compression CPU usage is amount of data processed, so we weight the images by their uncompressed size. In other words, the two metrics are: * `pageserver_compression_image_in_bytes_considered` * `pageserver_compression_image_in_bytes_chosen` Part of #5431	2024-08-05 08:55:59 +02:00
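A rough sketch of how the two counters described above relate to each other. This is an assumption-laden illustration, not the pageserver's metrics code: the `CompressionCounters` struct, its field names, and the `wasted_fraction` helper are hypothetical, and real Prometheus counters are replaced by plain integers.

```rust
/// Hypothetical stand-ins for the two metrics, both weighted by the
/// UNCOMPRESSED size of each image, as the commit message describes.
#[derive(Default)]
struct CompressionCounters {
    in_bytes_considered: u64,
    in_bytes_chosen: u64,
}

impl CompressionCounters {
    /// Record one trial compression and report whether the compressed
    /// version would be written (it is chosen only if strictly smaller).
    fn record_trial(&mut self, uncompressed_len: u64, compressed_len: u64) -> bool {
        self.in_bytes_considered += uncompressed_len;
        let chosen = compressed_len < uncompressed_len;
        if chosen {
            self.in_bytes_chosen += uncompressed_len;
        }
        chosen
    }

    /// Fraction of trial-compressed input bytes whose result was discarded.
    fn wasted_fraction(&self) -> f64 {
        if self.in_bytes_considered == 0 {
            return 0.0;
        }
        1.0 - self.in_bytes_chosen as f64 / self.in_bytes_considered as f64
    }
}
```

Weighting by uncompressed size means the ratio of the two counters approximates the share of compression CPU work that produced a written result.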
John Spray	3f05758d09	scrubber: enable cleaning up garbage tenants from known deletion bugs, add object age safety check (#8461 ) ## Problem Old storage buckets can contain a lot of tenants that aren't known to the control plane at all, because they belonged to test jobs that get their control plane state cleaned up shortly after running. In general, it's somewhat unsafe to purge these, as it's hard to distinguish "control plane doesn't know about this, so it's garbage" from "control plane said it didn't know about this, which is a bug in the scrubber, control plane, or API URL configured". However, the most common case is that we see only a small husk of a tenant in S3 from a specific old behavior of the software, for example: - We had a bug where heatmaps weren't deleted on tenant delete - When WAL DR was first deployed, we didn't delete initdb.tar.zst on tenant deletion ## Summary of changes - Add a KnownBug variant for the garbage reason - Include such cases in the "safe" deletion mode (`--mode=deleted`) - Add code that inspects tenants missing in control plane to identify cases of known bugs (this is kind of slow, but should go away once we've cleaned all these up) - Add an additional `-min-age` safety check similar to physical GC, where even if everything indicates objects aren't needed, we won't delete something that has been modified too recently. --------- Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-08-05 08:55:59 +02:00
Christian Schwarz	010203a49e	l0_flush: use mode=direct by default => coverage in automated tests (#8534 ) Testing in staging and pre-prod has been [going well](https://github.com/neondatabase/neon/issues/7418#issuecomment-2255474917). This PR enables mode=direct by default, thereby providing additional coverage in the automated tests: - Rust tests - Integration tests - Nightly pagebench (likely irrelevant because it's read-only) Production deployments continue to use `mode=page-cache` for the time being: https://github.com/neondatabase/aws/pull/1655 refs https://github.com/neondatabase/neon/issues/7418	2024-08-05 08:55:59 +02:00
John Spray	7c40266c82	pageserver: fix return code from secondary_download_handler (#8508 ) ## Problem The secondary download HTTP API is meant to return 200 if the download is complete, and 202 if it is still in progress. In #8198 the download implementation was changed to drop out with success early if it over-runs a time budget, which resulted in 200 responses for incomplete downloads. This breaks storcon_cli's "tenant-warmup" command, which uses the OK status to indicate download complete. ## Summary of changes - Only return 200 if we get an Ok() _and_ the progress stats indicate the download is complete.	2024-08-05 08:55:59 +02:00
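The status-code rule from the summary above could look roughly like this. The `DownloadProgress` struct, its fields, and the mapping of errors to 500 are assumptions made for the sketch; the real handler's types and error handling are not shown in the commit message.

```rust
/// Hypothetical progress summary returned by a secondary download attempt.
struct DownloadProgress {
    layers_downloaded: u64,
    layers_total: u64,
}

impl DownloadProgress {
    fn is_complete(&self) -> bool {
        self.layers_downloaded == self.layers_total
    }
}

/// Map the download result to an HTTP status code:
/// 200 only if the call succeeded AND the progress says we are done,
/// 202 if the call succeeded but stopped early (e.g. time budget exceeded).
/// Errors are mapped to 500 here purely as a simplification.
fn secondary_download_status(result: &Result<DownloadProgress, String>) -> u16 {
    match result {
        Ok(progress) if progress.is_complete() => 200,
        Ok(_) => 202,
        Err(_) => 500,
    }
}
```

A caller that polls until it sees 200, as the "tenant-warmup" command is described as doing, then only stops once the download has actually finished.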
Joonas Koivunen	7b3f94c1f0	test: deflake test_duplicate_creation (#8536 ) By including comparison of `remote_consistent_lsn_visible` we risk flakiness coming from outside of timeline creation. Mask out the `remote_consistent_lsn_visible` for the comparison. Evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8489/10142336315/index.html#suites/ffbb7f9930a77115316b58ff32b7c719/89ff0270bf58577a	2024-08-05 08:55:59 +02:00
a-masterov	d8205248e2	Add a test for clickhouse as a logical replication consumer (#8408 ) ## Problem We need to test logical replication with 3rd-party tools regularly. ## Summary of changes Added a test using ClickHouse as a client Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-08-05 08:55:59 +02:00
Arpad Müller	a4d3e0c747	Adopt list_streaming in tenant deletion (#8504 ) Uses the Stream based `list_streaming` function added by #8457 in tenant deletion, as suggested in https://github.com/neondatabase/neon/pull/7932#issuecomment-2150480180 . We don't have to worry about retries, as the function is wrapped inside an outer retry block. If there is a retryable error either during the listing or during deletion, we just do a fresh start. Also adds `+ Send` bounds as they are required by the `delete_tenant_remote` function.	2024-08-05 08:55:59 +02:00
Joonas Koivunen	df0748289b	Merge pull request #8533 from neondatabase/rc/2024-07-29 Storage & Compute release 2024-07-29	2024-07-29 19:14:29 +03:00
Joonas Koivunen	407bf968c1	Merge remote-tracking branch 'origin/release' into rc/2024-07-29	2024-07-29 15:15:04 +00:00
Christian Schwarz	e0a5bb17ed	pageserver: fail if `id` is present in pageserver.toml (#8489 ) Overall plan: https://www.notion.so/neondatabase/Rollout-Plan-simplified-pageserver-initialization-f935ae02b225444e8a41130b7d34e4ea?pvs=4 --- `identity.toml` is the authoritative place for `id` as of https://github.com/neondatabase/neon/pull/7766 refs https://github.com/neondatabase/neon/issues/7736	2024-07-29 15:08:15 +00:00
Stas Kelvich	6026cbfb63	Merge pull request #8530 from neondatabase/releases/2024-07-26-compute-only-sk Compute release 2024-06-26	2024-07-26 17:32:22 +01:00
Em Sharnoff	3a0ee16ed5	Fix sql-exporter-autoscaling for pg < 16 (#8523 ) The lfc_approximate_working_set_size_windows query was failing on pg14 and pg15 with pq: subquery in FROM must have an alias Because aliases in that position became optional only in pg16. Some context here: https://neondb.slack.com/archives/C04DGM6SMTM/p1721970322601679?thread_ts=1721921122.528849	2024-07-26 16:35:16 +01:00
Stas Kelvich	dbcfc01471	Merge pull request #8514 from neondatabase/releases/2024-07-25-compute-only Compute release 2024-07-25	2024-07-25 22:42:17 +01:00
Anastasia Lubennikova	8bf597c4d7	Update pgrx to v 0.11.3 (#8515 ) update pg_jsonschema extension to v 0.3.1 update pg_graphql extension to v1.5.7 update pgx_ulid extension to v0.1.5 update pg_tiktoken extension, patch Cargo.toml to use new pgrx	2024-07-25 13:22:53 -07:00
Em Sharnoff	138ae15a91	vm-image: Expose new LFC working set size metrics (#8298 ) In general, replace: * 'lfc_approximate_working_set_size' with * 'lfc_approximate_working_set_size_windows' For the "main" metrics that are actually scraped and used internally, the old one is just marked as deprecated. For the "autoscaling" metrics, we're not currently using the old one, so we can get away with just replacing it. Also, for the user-visible metrics we'll only store & expose a few different time windows, to avoid making the UI overly busy or bloating our internal metrics storage. But for the autoscaling-related scraper, we aren't storing the metrics, and it's useful to be able to programmatically operate on the trendline of how WSS increases (or doesn't!) with window size. So there, we can just output datapoints for each minute. Part of neondatabase/autoscaling#872 See also https://www.notion.so/neondatabase/cca38138fadd45eaa753d81b859490c6	2024-07-25 16:34:29 +01:00
Konstantin Knizhnik	59eeadabe9	Change default version of Neon extension to 1.4	2024-07-25 16:33:49 +01:00
Christian Schwarz	daf8edd986	Merge pull request #8468 from neondatabase/rc/2024-07-23-manual Storage release 2024-07-23 We did not deploy yesterday's * https://github.com/neondatabase/neon/pull/8451 because of CICD troubles with pre-prod. Also, it was missing * https://github.com/neondatabase/neon/pull/7766 which is low-risk and unblocks more cleanup work that would otherwise have to wait until after next week's release. So, this PR cherry-picks #7766 and creates a new storage release. Compute will release separately later this week. Back pointer to Slack thread: https://neondb.slack.com/archives/C03H1K0PGKH/p1721650191019099	2024-07-24 12:02:14 +02:00
Vlad Lazar	a1272b6ed8	pageserver: use identity file as node id authority and remove init command and config-override flags (#7766 ) Ansible will soon write the node id to `identity.toml` in the work dir for new pageservers. On the pageserver side, we read the node id from the identity file if it is present and use that as the source of truth. If the identity file is missing, cannot be read, or does not deserialise, start-up is aborted. This PR also removes the `--init` mode and the `--config-override` flag from the `pageserver` binary. The neon_local is already not using these flags anymore. Ansible still uses them until the linked change is merged & deployed, so, this PR has to land simultaneously or after the Ansible change due to that. Related Ansible change: https://github.com/neondatabase/aws/pull/1322 Cplane change to remove config-override usages: https://github.com/neondatabase/cloud/pull/13417 Closes: https://github.com/neondatabase/neon/issues/7736 Overall plan: https://www.notion.so/neondatabase/Rollout-Plan-simplified-pageserver-initialization-f935ae02b225444e8a41130b7d34e4ea?pvs=4 Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-23 12:55:46 +02:00
Christian Schwarz	28ee7cdede	Merge pull request #8451 from neondatabase/rc/2024-07-22 ## Storage & Compute release 2024-07-22 This PR has so many commits because the release branch diverged from `main`. Details https://neondb.slack.com/archives/C033A2WE6BZ/p1721650938949059?thread_ts=1721308848.034069&cid=C033A2WE6BZ The commits that are truly new since the last storage release are the `main` commits, which I cherry-picked using this command ``` git cherry-pick 8a8b83df27383a07bb7dbba519325c15d2f46357..4e547e6 ```	2024-07-22 19:17:01 +02:00
Christian Schwarz	7b63092958	Merge commit '4e547e6' into rc/2024-07-22 See https://neondb.slack.com/archives/C033A2WE6BZ/p1721650938949059?thread_ts=1721308848.034069&cid=C033A2WE6BZ	2024-07-22 14:40:55 +02:00
Arpad Müller	31bfeaf934	Use DefaultCredentialsChain AWS authentication in remote_storage (#8440 ) PR #8299 has switched the storage scrubber to use `DefaultCredentialsChain`. Now we do this for `remote_storage`, as it allows us to use `remote_storage` from inside kubernetes. Most of the diff is due to `GenericRemoteStorage::from_config` becoming `async fn`.	2024-07-22 14:36:56 +02:00
Arpad Müller	21b3a191bf	Add archival_config endpoint to pageserver (#8414 ) This adds an archival_config endpoint to the pageserver. Currently it has no effect, and always "works", but later the intent is that it will make a timeline archived/unarchived. - [x] add yml spec - [x] add endpoint handler Part of https://github.com/neondatabase/neon/issues/8088	2024-07-22 14:36:56 +02:00
Shinya Kato	f7f9b4aaec	Fix openapi specification (#8273 ) ## Problem There are some swagger errors in `pageserver/src/http/openapi_spec.yml` ``` Error 431 15000 Object includes not allowed fields Error 569 3100401 should always have a 'required' Error 569 15000 Object includes not allowed fields Error 1111 10037 properties members must be schemas ``` ## Summary of changes Fixed the above errors.	2024-07-22 14:36:56 +02:00
John Spray	bba062e262	tests: longer timeouts in test_timeline_deletion_with_files_stuck_in_upload_queue (#8438 ) ## Problem This test had two locations with 2 second timeouts, which is rather low when we run on a highly contended test machine running lots of tests in parallel. It usually passes, but today I've seen both of these locations time out on separate PRs. Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8432/10007868041/index.html#suites/837740b64a53e769572c4ed7b7a7eeeb/6c6a092be083d27c ## Summary of changes - Change 2 second timeouts to 20 second timeouts	2024-07-22 14:36:56 +02:00
Shinya Kato	067363fe95	safekeeper: remove unused safekeeper runtimes (#8433 ) There are unused safekeeper runtimes `WAL_REMOVER_RUNTIME` and `METRICS_SHIFTER_RUNTIME`. `WAL_REMOVER_RUNTIME` was implemented in [#4119](https://github.com/neondatabase/neon/pull/4119) and removed in [#7887](https://github.com/neondatabase/neon/pull/7887). `METRICS_SHIFTER_RUNTIME` was also implemented in [#4119](https://github.com/neondatabase/neon/pull/4119) but has never been used. I removed unused safekeeper runtimes `WAL_REMOVER_RUNTIME` and `METRICS_SHIFTER_RUNTIME`.	2024-07-22 14:36:56 +02:00
John Spray	affe408433	storage scrubber: GC ancestor shard layers (#8196 ) ## Problem After a shard split, the pageserver leaves the ancestor shard's content in place. It may be referenced by child shards, but eventually child shards will de-reference most ancestor layers as they write their own data and do GC. We would like to eventually clean up those ancestor layers to reclaim space. ## Summary of changes - Extend the physical GC command with `--mode=full`, which includes cleaning up unreferenced ancestor shard layers - Add test `test_scrubber_physical_gc_ancestors` - Remove colored log output: in testing this is irritating ANSI code spam in logs, and in interactive use doesn't add much. - Refactor storage controller API client code out of storcon_client into a `storage_controller/client` crate - During physical GC of ancestors, call into the storage controller to check that the latest shards seen in S3 reflect the latest state of the tenant, and there is no shard split in progress.	2024-07-22 14:36:56 +02:00
Christian Schwarz	9b883e4651	pageserver: remove obsolete cached_metric_collection_interval (#8370 ) We're removing the usage of this long-meaningless config field in https://github.com/neondatabase/aws/pull/1599 Once that PR has been deployed to staging and prod, we can merge this PR.	2024-07-22 14:36:56 +02:00
Peter Bendel	b98b301d56	Bodobolero/fix root permissions (#8429 ) ## Problem My prior PR https://github.com/neondatabase/neon/pull/8422 caused leftovers in the GitHub action runner work directory with root permission. As an example see here https://github.com/neondatabase/neon/actions/runs/10001857641/job/27646237324#step:3:37 As a workaround, we install vanilla postgres as non-root using deb packages in the /home/nonroot user directory ## Summary of changes - since we cannot use root we install the deb pkgs directly and create symbolic links for psql, pgbench and libs in expected places - continue jobs on aws even if azure jobs fail (because this region is currently unreliable)	2024-07-22 14:36:56 +02:00
Arpad Müller	ed7ee73cba	Enable zstd in tests (#8368 ) Successor of #8288 , just enable zstd in tests. Also adds a test that creates easily compressible data. Part of #5431 --------- Co-authored-by: John Spray <john@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-07-22 14:36:56 +02:00
Arthur Petukhovsky	fceace835b	Change log level for GuardDrop error (#8305 ) The error means that the manager exited earlier than `ResidenceGuard`, which is not unexpected with the current deletion implementation. This commit changes the log level to reduce noise.	2024-07-22 14:36:56 +02:00
Peter Bendel	1b508a6082	Temporarily use vanilla pgbench and psql (client) for running pgvector benchmark (#8422 ) ## Problem https://github.com/neondatabase/neon/issues/8275 is not yet fixed Periodic benchmarking fails with SIGABRT in pgvector step, see https://github.com/neondatabase/neon/actions/runs/9967453263/job/27541159738#step:7:393 ## Summary of changes Instead of using pgbench and psql from Neon artifacts, download vanilla postgres binaries into the container and use those to run the client side of the test.	2024-07-22 14:36:56 +02:00
Alex Chi Z.	f87b031876	pageserver: integrate k-merge with bottom-most compaction (#8415 ) Use the k-merge iterator in the compaction process to reduce memory footprint. part of https://github.com/neondatabase/neon/issues/8002 ## Summary of changes * refactor the bottom-most compaction code to use k-merge iterator * add Send bound on some structs as it is used across the await points --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-07-22 14:36:56 +02:00
Arthur Petukhovsky	9f1ba2c4bf	Fix partial upload bug with invalid remote state (#8383 ) We have an issue where some partially uploaded segments can actually be missing in remote storage. I found this issue when I was looking at the logs in staging, and it can be triggered by failed uploads: 1. Code tries to upload `SEG_TERM_LSN_LSN_sk5.partial`, but receives error from S3 2. The failed attempt is saved to `segments` vec 3. After some time, the code tries to upload `SEG_TERM_LSN_LSN_sk5.partial` again 4. This time the upload is successful and code calls `gc()` to delete previous uploads 5. Since new object and old object share the same name, uploaded data gets deleted from remote storage This commit fixes the issue by patching `gc()` not to delete objects with the same name as currently uploaded. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-07-22 14:36:56 +02:00
John Spray	9868bb3346	tests: turn on safekeeper eviction by default (#8352 ) ## Problem Ahead of enabling eviction in the field, where it will become the normal/default mode, let's enable it by default throughout our tests in case any issues become visible there. ## Summary of changes - Make default `extra_opts` for safekeepers enable offload & deletion - Set low timeouts in `extra_opts` so that tests running for tens of seconds have a chance to hit some of these background operations.	2024-07-22 14:36:56 +02:00
John Spray	27da0e9cf5	tests: increase test_pg_regress and test_isolation timeouts (#8418 ) ## Problem These tests time out ~1 in 50 runs when in debug mode. There is no indication of a real issue: they're just wrappers that have large numbers of individual tests contained within one pytest case. ## Summary of changes - Bump pg_regress timeout from 600 to 900s - Bump test_isolation timeout from 300s (default) to 600s In future it would be nice to break out these tests to run individual cases (or batches thereof) as separate tests, rather than this monolith.	2024-07-22 14:36:56 +02:00
John Spray	de9bf2af6c	tests: fix metrics check in test_s3_eviction (#8419 ) ## Problem This test would occasionally fail its metric check. This could happen in the rare case that the nodes had all been restarted before their most recent eviction. The metric check was added in https://github.com/neondatabase/neon/pull/8348 ## Summary of changes - Check metrics before each restart, accumulate into a bool that we assert on at the end of the test	2024-07-22 14:36:56 +02:00
Christian Schwarz	3d2c2ce139	NeonEnv.from_repo_dir: use storage_controller_db instead of `attachments.json` (#8382 ) When `NeonEnv.from_repo_dir` was introduced, storage controller stored its state exclusively `attachments.json`. Since then, it has moved to using Postgres, which stores its state in `storage_controller_db`. But `NeonEnv.from_repo_dir` wasn't adjusted to do this. This PR rectifies the situation. Context for this is failures in `test_pageserver_characterize_throughput_with_n_tenants` CF: https://neondb.slack.com/archives/C033RQ5SPDH/p1721035799502239?thread_ts=1720901332.293769&cid=C033RQ5SPDH Notably, `from_repo_dir` is also used by the backwards- and forwards-compatibility. Thus, the changes in this PR affect those tests as well. However, it turns out that the compatibility snapshot already contains the `storage_controller_db`. Thus, it should just work and in fact we can remove hacks like `fixup_storage_controller`. Follow-ups created as part of this work: * https://github.com/neondatabase/neon/issues/8399 * https://github.com/neondatabase/neon/issues/8400	2024-07-22 14:36:56 +02:00
dotdister	82a2081d61	Fix comment in Control Plane (#8406 ) ## Problem There is something wrong in the comments of `control_plane/src/broker.rs` and `control_plane/src/pageserver.rs` ## Summary of changes Fixed the comments about component names and their data paths in `control_plane/src/broker.rs` and `control_plane/src/pageserver.rs`.	2024-07-22 14:36:56 +02:00
Joonas Koivunen	ff174a88c0	test: allow requests to any pageserver get cancelled (#8413 ) Fix flakiness in `test_sharded_timeline_detach_ancestor`, which does not reproduce on a fast enough runner, by allowing cancelled requests before completing on all pageservers. It was only allowed on half of the pageservers. Failure evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8352/9972357040/index.html#suites/a1c2be32556270764423c495fad75d47/7cca3e3d94fe12f2	2024-07-22 14:36:56 +02:00
John Spray	ef3ebfaf67	pageserver: layer count & size metrics (#8410 ) ## Problem We lack insight into: - How much of a tenant's physical size is image vs. delta layers - Average sizes of image vs. delta layers - Total layer counts per timeline, indicating size of index_part object As well as general observability love, this is motivated by https://github.com/neondatabase/neon/issues/6738, where we need to define some sensible thresholds for storage amplification, and using total physical size may not work well (if someone does a lot of DROPs then it's legitimate for the physical-synthetic ratio to be huge), but the ratio between image layer size and delta layer size may be a better indicator of whether we're generating unreasonable quantities of image layers. ## Summary of changes - Add pageserver_layer_bytes and pageserver_layer_count metrics, labelled by timeline and `kind` (delta or image) - Add & subtract these with LayerInner's lifetime. I'm intentionally avoiding using a generic metric RAII guard object, to avoid bloating LayerInner: it already has all the information it needs to update metric on new+drop.	2024-07-22 14:36:56 +02:00
Yuchen Liang	ae1af558b4	docs: update storage controller db name in doc (#8411 ) The db name was renamed to storage_controller from attachment_service. Doc was stale.	2024-07-22 14:36:56 +02:00
John Spray	c150ad4ee2	tests: add test_compaction_l0_memory (#8403 ) This test reproduces the case of a writer creating a deep stack of L0 layers. It uses realistic layer sizes and writes several gigabytes of data, therefore runs as a performance test although it is validating memory footprint rather than performance per se. It acts as a regression test for two recent fixes: - https://github.com/neondatabase/neon/pull/8401 - https://github.com/neondatabase/neon/pull/8391 In future it will demonstrate the larger improvement of using a k-merge iterator for L0 compaction (#8184) This test can be extended to enforce limits on the memory consumption of other housekeeping steps, by restarting the pageserver and then running other things to do the same "how much did RSS increase" measurement.	2024-07-22 14:36:56 +02:00
Alex Chi Z.	a98ccd185b	test(pageserver): more k-merge tests on duplicated keys (#8404 ) Existing tenants and some selection of layers might produce duplicated keys. Add tests to ensure the k-merge iterator handles it correctly. We also enforced ordering of the k-merge iterator to put images before deltas. part of https://github.com/neondatabase/neon/issues/8002 --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-07-22 14:36:56 +02:00
Peter Bendel	9f796ebba9	Bodobolero/pgbench compare azure (#8409 ) ## Problem We want to run performance tests on all supported cloud providers. We want to run most tests on the postgres version which is default for new projects in production, currently (July 24) this is postgres version 16 ## Summary of changes - change default postgres version for some (performance) tests to 16 (which is our default for new projects in prod anyhow) - add azure region to pgbench_compare jobs - add azure region to pgvector benchmarking jobs - re-used project `weathered-snowflake-88107345` was prepared with 1 million embeddings running on 7 minCU 7 maxCU in azure region to compare with AWS region (pgvector indexing and hnsw queries) - see job pgbench-pgvector - Note we now have 11 environment combinations where we run pgbench-compare and 5 are for k8s-pod (deprecated) which we can remove in the future once auto-scaling team approves. ## Logs A current run with the changes from this pull request is running here https://github.com/neondatabase/neon/actions/runs/9972096222 Note that we currently expect some failures due to - https://github.com/neondatabase/neon/issues/8275 - instability of projects on azure region	2024-07-22 14:36:56 +02:00
John Spray	d51ca338c4	docs/rfcs: timeline ancestor detach API (#6888 ) ## Problem When a tenant creates a new timeline that they will treat as their 'main' history, it is awkward to permanently retain an 'old main' timeline as its ancestor. Currently this is necessary because it is forbidden to delete a timeline which has descendants. ## Summary of changes A new pageserver API is proposed to 'adopt' data from a parent timeline into one of its children, such that the link between ancestor and child can be severed, leaving the parent in a state where it may then be deleted. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-07-22 14:36:56 +02:00
John Spray	07e78102bf	pageserver: reduce size of delta layer ValueRef (#8401 ) ## Problem ValueRef is an unnecessarily large structure, because it carries a cursor. L0 compaction currently instantiates gigabytes of these under some circumstances. ## Summary of changes - Carry a ref to the parent layer instead of a cursor, and construct a cursor on demand. This reduces RSS high watermark during L0 compaction by about 20%.	2024-07-22 14:36:56 +02:00
John Spray	b21e131d11	pageserver: exclude un-read layers from short residence statistic (#8396 ) ## Problem The `evictions_with_low_residence_duration` is used as an indicator of cache thrashing. However, there are situations where it is quite legitimate to only have a short residence during compaction, where a delta is downloaded, used to generate an image layer, and then discarded. This can lead to false positive alerts. ## Summary of changes - Only track low residence duration for layers that have been accessed at least once (compaction doesn't count as an access). This will give us a metric that indicates thrashing on layers that the _user_ is using, rather than those we're downloading for housekeeping purposes. Once we add "layer visibility" as an explicit property of layers, this can also be used as a cleaner condition (residence of non-visible layers should never be alertable)	2024-07-22 14:36:56 +02:00
Alex Chi Z.	abe3b4e005	fix(pageserver): limit num of delta layers for l0 compaction (#8391 ) ## Problem close https://github.com/neondatabase/neon/issues/8389 ## Summary of changes A quick mitigation for tenants with fast writes. We compact at most 60 delta layers at a time, expecting a memory footprint of 15GB. We will pick the oldest 60 L0 layers. This should be a relatively safe change so no test is added. Question is whether to make this parameter configurable via tenant config. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: John Spray <john@neon.tech>	2024-07-22 14:36:56 +02:00
Tristan Partin	18e7c2b7a1	Add some typing to Endpoint.respec()	2024-07-22 14:36:56 +02:00
Tristan Partin	ad5d784fb7	Hide import behind TYPE_CHECKING	2024-07-22 14:36:56 +02:00
Tristan Partin	85d47637ee	Run each migration in its own transaction Previously, every migration was run in the same transaction. This is preparatory work for fixing CVE-2024-4317.	2024-07-22 14:36:56 +02:00
Tristan Partin	7e818ee390	Rename compute migrations to start at 1 This matches what we put into the neon_migration.migration_id table.	2024-07-22 14:36:56 +02:00
John Spray	bff505426e	pageserver: clean up GcCutoffs names (#8379 ) - `horizon` is a confusing term, it's not at all obvious that this means space-based retention limit, rather than the total GC history limit. Rename to `GcCutoffs::space`. - `pitr` is less confusing, but still an unnecessary level of indirection from what we really mean: a time-based condition. The fact that we use that time-history for Point In Time Recovery doesn't mean we have to refer to time as "pitr" everywhere. Rename to `GcCutoffs::time`.	2024-07-22 14:36:56 +02:00
dependabot[bot]	bf7de92dc2	build(deps): bump setuptools from 65.5.1 to 70.0.0 (#8387 ) Bumps [setuptools](https://github.com/pypa/setuptools) from 65.5.1 to 70.0.0. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: a-masterov <72613290+a-masterov@users.noreply.github.com>	2024-07-22 14:36:56 +02:00
Arpad Müller	9dc71f5a88	Avoid the storage controller in test_tenant_creation_fails (#8392 ) As described in #8385, the likely source for flakiness in test_tenant_creation_fails is the following sequence of events: 1. test instructs the storage controller to create the tenant 2. storage controller adds the tenant and persists it to the database. issues a creation request 3. the pageserver restarts with the failpoint disabled 4. storage controller's background reconciliation still wants to create the tenant 5. pageserver gets new request to create the tenant from background reconciliation This commit just avoids the storage controller entirely. It has its own set of issues, as the re-attach request will obviously not include the tenant, but it's still useful to test for non-existence of the tenant. The generation is also not optional any more during tenant attachment. If you omit it, the pageserver yields an error. We change the signature of `tenant_attach` to reflect that. Alternative to #8385 Fixes #8266	2024-07-22 14:36:56 +02:00
Anastasia Lubennikova	2ede9d7a25	Compute: add compatibility patch for rum Fixes #8251	2024-07-22 14:36:56 +02:00
John Spray	ea5460843c	pageserver: un-Arc Timeline::layers (#8386 ) ## Problem This structure was in an Arc<> unnecessarily, making it harder to reason about its lifetime (i.e. it was superficially possible for LayerManager to outlive timeline, even though no code used it that way) ## Summary of changes - Remove the Arc<>	2024-07-22 14:36:56 +02:00
Arpad Müller	5b16624bcc	Allow the new clippy::doc_lazy_continuation lint (#8388 ) The `doc_lazy_continuation` lint of clippy is still unknown on latest rust stable. Fixes fall-out from #8151.	2024-07-22 14:36:56 +02:00
Sasha Krassovsky	349373cb11	Allow reusing projects between runs of logical replication benchmarks (#8393 )	2024-07-22 14:36:56 +02:00
Joonas Koivunen	957f99cad5	feat(timeline_detach_ancestor): success idempotency (#8354 ) Right now timeline detach ancestor reports an error (409, "no ancestor") on a new attempt after successful completion. This makes it troublesome for storage controller retries. Fix it to respond with `200 OK` as if the operation had just completed quickly. Additionally, the returned timeline identifiers in the 200 OK response are now ordered so that responses between different nodes for error comparison are done by the storage controller added in #8353. Design-wise, this PR introduces a new strategy for accessing the latest uploaded IndexPart: `RemoteTimelineClient::initialized_upload_queue(&self) -> Result<UploadQueueAccessor<'_>, NotInitialized>`. It should be a more scalable way to query the latest uploaded `IndexPart` than to add a query method for each question directly on `RemoteTimelineClient`. GC blocking will need to be introduced to make the operation fully idempotent. However, it is idempotent for the cases demonstrated by tests. Cc: #6994	2024-07-22 14:36:56 +02:00
John Spray	2a3a136474	pageserver: use PITR GC cutoffs as authoritative (#8365 ) ## Problem Pageserver GC uses a size-based condition (GC "horizon" in addition to time-based "PITR"). Eventually we plan to retire the size-based condition: https://github.com/neondatabase/neon/issues/6374 Currently, we always apply the more conservative of the two, meaning that tenants always retain at least 64MB of history (default horizon), even after a very long time has passed. This is particularly acute in cases where someone has dropped tables/databases, and then leaves a database idle: the horizon can prevent GCing very large quantities of historical data (we already account for this in synthetic size by ignoring gc horizon). We're not entirely removing GC horizon right now because we don't want to 100% rely on standby_horizon for robustness of physical replication, but we can tweak our logic to avoid retaining that 64MB LSN length indefinitely. ## Summary of changes - Rework `Timeline::find_gc_cutoffs`, with new logic: - If there is no PITR set, then use `DEFAULT_PITR_INTERVAL` (1 week) to calculate a time threshold. Retain either the horizon or up to that threshold, whichever requires less data. - When there is a PITR set, and we have unambiguously resolved the timestamp to an LSN, then ignore the GC horizon entirely. For typical PITRs (1 day, 1 week), this will still easily retain enough data to avoid stressing read only replicas. The key property we end up with, whether a PITR is set or not, is that after enough time has passed, our GC cutoff on an idle timeline will catch up with the last_record_lsn. Using `DEFAULT_PITR_INTERVAL` is a bit of an arbitrary hack, but this feels like it isn't really worth the noise of exposing in TenantConfig. We could just make it a different named constant though. The end-end state will be that there is no gc_horizon at all, and that tenants with pitr_interval=0 would truly retain no history, so this constant would go away.	2024-07-22 14:36:56 +02:00
Joonas Koivunen	cfaf30f5e8	feat(storcon): timeline detach ancestor passthrough (#8353 ) Currently storage controller does not support forwarding timeline detach ancestor requests to pageservers. Add support for forwarding `PUT .../:tenant_id/timelines/:timeline_id/detach_ancestor`. Implement the support mostly as is, because the timeline detach ancestor will be made (mostly) idempotent in future PR. Cc: #6994	2024-07-22 14:36:56 +02:00
Christian Schwarz	72c2d0812e	remove page_service `show <tenant_id>` (#8372 ) This operation isn't used in practice, so let's remove it. Context: in https://github.com/neondatabase/neon/pull/8339	2024-07-22 14:36:56 +02:00
Arseny Sher	537ecf45f8	Fix test_timeline_copy flakiness. fixes https://github.com/neondatabase/neon/issues/8355	2024-07-22 14:31:12 +02:00
Luca Bruno	1637a6ee05	proxy/http: switch to typed_json (#8377 ) ## Summary of changes This switches JSON rendering logic to `typed_json` in order to reduce the number of allocations in the HTTP responder path. Followup from https://github.com/neondatabase/neon/pull/8319#issuecomment-2216991760. --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2024-07-22 14:30:53 +02:00
Alex Chi Z	d74fb7b879	Merge pull request #8374 from neondatabase/rc/2024-07-15 Storage & Compute release 2024-07-15	2024-07-15 11:02:18 -04:00
Konstantin Knizhnik	7973c3e941	Add neon.running_xacts_overflow_policy to make it possible for RO replica to startup without primary even in case running xacts overflow (#8323 ) ## Problem Right now, if there are too many running xacts to be restored from CLOG at replica startup, the replica does not try to restore them and waits for a non-overflown running-xacts WAL record from the primary. But if the primary is not active, then the replica will not start at all. Too many running xacts can be caused by transactions with a large number of subtransactions. But right now it can also be caused by two reasons: - Lack of shutdown checkpoint which updates `oldestRunningXid` (because of immediate shutdown) - nextXid alignment on 1024 boundary (which causes losing ~1k XIDs on each restart) Both problems are somehow addressed now. But we have existing customers with "sparse" CLOG and lack of checkpoints. To be able to start RO replicas for such customers I suggest adding a GUC which allows the replica to start even in case of subxacts overflow. ## Summary of changes Add `neon.running_xacts_overflow_policy` with the following values: - ignore: restore from CLOG last N XIDs and accept connections - skip: do not restore any XIDs from CLOG but still accept connections - wait: wait for a non-overflown running xacts record from primary node ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-15 09:34:35 -04:00
Vlad Lazar	085bbaf5f8	tests: allow list breaching min resident size in statvfs test (#8358 ) ## Problem This test would sometimes violate the min resident size during disk eviction and fail due to the generated warning log. Disk usage candidate collection only takes into account active tenants. However, the statvfs call takes into account the entire tenants directory, which includes tenants which haven't become active yet. After re-starting the pageserver, disk usage eviction may kick in before both tenants have become active. Hence, the logic will try to satisfy the disk usage requirements by evicting everything belonging to the active tenant, and hence violating the tenant minimum resident size. ## Summary of changes Allow the warning	2024-07-15 09:28:35 -04:00
Alex Chi Z	85b5219861	fix(pageserver): unique test harness name for merge_in_between (#8366 ) As title, there should be a way to detect duplicated harness names in the future :( Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-07-15 09:28:35 -04:00
Conrad Ludgate	7472c69954	Fix nightly warnings 2024 june (#8151 ) ## Problem new clippy warnings on nightly. ## Summary of changes broken up each commit by warning type. 1. Remove some unnecessary refs. 2. In edition 2024, inference will default to `!` and not `()`. 3. Clippy complains about doc comment indentation 4. Fix `Trait + ?Sized` where `Trait: Sized`. 5. diesel_derives triggering `non_local_definitions`	2024-07-15 09:28:35 -04:00
John Spray	3f8819827c	pageserver: circuit breaker on compaction (#8359 ) ## Problem We already back off on compaction retries, but the impact of a failing compaction can be so great that backing off up to 300s isn't enough. The impact is consuming a lot of I/O+CPU in the case of image layer generation for large tenants, and potentially also leaking disk space. Compaction failures are extremely rare and almost always indicate a bug, frequently a bug that will not let compaction proceed until it is fixed. Related: https://github.com/neondatabase/neon/issues/6738 ## Summary of changes - Introduce a CircuitBreaker type - Add a circuit breaker for compaction, with a policy that after 5 failures, compaction will not be attempted again for 24 hours. - Add metrics that we can alert on: any >0 value for `pageserver_circuit_breaker_broken_total` should generate an alert. - Add a test that checks this works as intended. Couple notes to reviewers: - Circuit breakers are intrinsically a defense-in-depth measure: this is not the solution to any underlying issues, it is just a general mitigation for "unknown unknowns" that might be encountered in future. - This PR isn't primarily about writing a perfect CircuitBreaker type: the one in this PR is meant to be just enough to mitigate issues in compaction, and make it easy to monitor/alert on these failures. We can refine this type in future as/when we want to use it elsewhere.	2024-07-15 09:28:35 -04:00
Japin Li	c440756410	Remove fs2 dependency (#8350 ) The fs2 dependency is not needed anymore after commit `d42700280`.	2024-07-15 09:28:35 -04:00
Arpad Müller	0e600eb921	Implement decompression for vectored reads (#8302 ) Implement decompression of images for vectored reads. This doesn't implement support for still treating blobs as uncompressed with the bits we reserved for compression, as we have removed that functionality in #8300 anyways. Part of #5431	2024-07-15 09:28:35 -04:00
Arpad Müller	a1df835e28	Pass configured compression param to image generation (#8363 ) We need to pass on the configured compression param during image layer generation. This was an oversight of #8106, and the likely cause why #8288 didn't bring any interesting regressions. Part of https://github.com/neondatabase/neon/issues/5431	2024-07-15 09:28:35 -04:00
Sasha Krassovsky	119ddf6ccf	Grant execute on snapshot functions to neon_superuser (#8346 ) ## Problem I need `neon_superuser` to be allowed to create snapshots for replication tests ## Summary of changes Adds a migration that grants these functions to neon_superuser	2024-07-15 09:28:35 -04:00
Joonas Koivunen	90f447b79d	test: limit `test_layer_download_timeouted` to MOCK_S3 (#8331 ) Requests against REAL_S3 on CI can consistently take longer than 1s; testing the short timeouts against it made no sense in hindsight, as MOCK_S3 works just as well. evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8229/9857994025/index.html#suites/b97efae3a617afb71cb8142f5afa5224/6828a50921660a32	2024-07-15 09:28:35 -04:00
Alex Chi Z	7dd71f4126	feat(pageserver): rewrite streaming vectored read planner (#8242 ) Rewrite streaming vectored read planner to be a separate struct. The API is designed to produce batches around `max_read_size` instead of exactly less than that so that `handle_XX` returns one batch a time. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-07-15 09:28:35 -04:00
Arseny Sher	8532d72276	Fix memory context of NeonWALReader allocation. Allocating it in short living context is wrong because it is reused during backend lifetime.	2024-07-15 09:28:35 -04:00
John Spray	d3ff47f572	storage controller: add node deletion API (#8226 ) ## Problem In anticipation of later adding a really nice drain+delete API, I initially only added an intentionally basic `/drop` API that is just about usable for deleting nodes in a pinch, but requires some ugly storage controller restarts to persuade it to restart secondaries. ## Summary of changes I started making a few tiny fixes, and ended up writing the delete API... - Quality of life nit: ordering of node + tenant listings in storcon_cli - Papercut: Fix the attach_hook using the wrong operation type for reporting slow locks - Make Service::spawn tolerate `generation_pageserver` columns that point to nonexistent node IDs. I started out thinking of this as a general resilience thing, but when implementing the delete API I realized it was actually a legitimate end state after the delete API is called (as that API doesn't wait for all reconciles to succeed). - Add a `DELETE` API for nodes, which does not gracefully drain, but does reschedule everything. This becomes safe to use when the system is in any state, but will incur availability gaps for any tenants that weren't already live-migrated away. If tenants have already been drained, this becomes a totally clean + safe way to decom a node. - Add a test and a storcon_cli wrapper for it This is meant to be a robust initial API that lets us remove nodes without doing ugly things like restarting the storage controller -- it's not quite a totally graceful node-draining routine yet. There's more work in https://github.com/neondatabase/neon/issues/8333 to get to our end-end state.	2024-07-15 09:28:35 -04:00
John Spray	8cc768254f	safekeeper: eviction metrics (#8348 ) ## Problem Follow up to https://github.com/neondatabase/neon/pull/8335, to improve observability of how many evict/restores we are doing. ## Summary of changes - Add `safekeeper_eviction_events_started_total` and `safekeeper_eviction_events_completed_total`, with a "kind" label of evict or restore. This gives us rates, and also ability to calculate how many are in progress. - Generalize SafekeeperMetrics test type to use the same helpers as pageserver, and enable querying any metric. - Read the new metrics at the end of the eviction test.	2024-07-15 09:28:35 -04:00
Vlad Lazar	5c80743c9c	storage_controller: fix ReconcilerWaiter::get_status (#8341 ) ## Problem SeqWait::would_wait_for returns Ok in the case when we would not wait for the sequence number and Err otherwise. ReconcilerWaiter::get_status uses it the wrong way around. This can cause the storage controller to go into a busy loop and make it look unavailable to the k8s controller. ## Summary of changes Use `SeqWait::would_wait_for` correctly.	2024-07-15 09:28:35 -04:00
Christian Schwarz	5bba3e3c75	pageserver: remove `trace_read_requests` (#8338 ) `trace_read_requests` is a per `Tenant`-object option. But the `handle_pagerequests` loop doesn't know which `Tenant` object (i.e., which shard) the request is for. The remaining use of the `Tenant` object is to check `tenant.cancel`. That check is incorrect [if the pageserver hosts multiple shards](https://github.com/neondatabase/neon/issues/7427#issuecomment-2220577518). I'll fix that in a future PR where I completely eliminate the holding of `Tenant/Timeline` objects across requests. See [my code RFC](https://github.com/neondatabase/neon/pull/8286) for the high level idea. Note that we can always bring the tracing functionality if we need it. But since it's actually about logging the `page_service` wire bytes, it should be a `page_service`-level config option, not per-Tenant. And for enabling tracing on a single connection, we can implement a `set pageserver_trace_connection;` option.	2024-07-15 09:28:35 -04:00
Peter Bendel	6caf702417	Run Performance bench on more platforms (#8312 ) ## Problem https://github.com/neondatabase/cloud/issues/14721 ## Summary of changes add one more platform to benchmarking job `57535c039c/.github/workflows/benchmarking.yml (L57C3-L126)` Run with pg 16, provisioner k8-neonvm by default on the new platform. Adjust some test cases to - not depend on database client <-> database server latency by pushing loops into server side pl/pgSQL functions - increase statement and test timeouts First successful run of these job steps https://github.com/neondatabase/neon/actions/runs/9869817756/job/27254280428	2024-07-15 09:28:35 -04:00
John Spray	32f668f5e7	rfcs: add RFC for timeline archival (#8221 ) A design for a cheap low-resource state for idle timelines: - #8088	2024-07-15 09:28:35 -04:00
Stas Kelvich	a91f9d5832	Enable core dumps for postgres (#8272 ) Set core rlimit to unlimited in compute_ctl, so that all child processes inherit it. We could also set rlimit in relevant startup script, but that way we would depend on external setup and might inadvertently disable it again (core dumping worked in pods, but not in VMs with inittab-based startup).	2024-07-15 09:28:35 -04:00
John Spray	547acde6cd	safekeeper: add eviction_min_resident to stop evictions thrashing (#8335 ) ## Problem - The condition for eviction is not time-based: it is possible for a timeline to be restored in response to a client, that client times out, and then as soon as the timeline is restored it is immediately evicted again. - There is no delay on eviction at startup of the safekeeper, so when it starts up and sees many idle timelines, it does many evictions which will likely be immediately restored when someone uses the timeline. ## Summary of changes - Add `eviction_min_resident` parameter, and use it in `ready_for_eviction` to avoid evictions if the timeline has been resident for less than this period. - This also implicitly delays evictions at startup for `eviction_min_resident` - Set this to a very low number for the existing eviction test, which expects immediate eviction. The default period is 15 minutes. The general reasoning for that is that in the worst case where we thrash ~10k timelines on one safekeeper, downloading 16MB for each one, we should set a period that would not overwhelm the node's bandwidth.	2024-07-15 09:28:35 -04:00
Alex Chi Z	bea6532881	feat(pageserver): add k-merge layer iterator with lazy loading (#8053 ) Part of https://github.com/neondatabase/neon/issues/8002. This pull request adds a k-merge iterator for bottom-most compaction. ## Summary of changes * Added back lsn_range / key_range in delta layer inner. This was removed due to https://github.com/neondatabase/neon/pull/8050, but added back because iterators need that information to process lazy loading. * Added lazy-loading k-merge iterator. * Added iterator wrapper as a unified iterator type for image+delta iterator. The current status and test should cover the use case for L0 compaction so that the L0 compaction process can bypass page cache and have a fixed amount of memory usage. The next step is to integrate this with the new bottom-most compaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-15 09:28:35 -04:00
Arpad Müller	8e2fe6b22e	Remove ImageCompressionAlgorithm::DisabledNoDecompress (#8300 ) Removes the `ImageCompressionAlgorithm::DisabledNoDecompress` variant. We now assume any blob with the specific bits set is actually a compressed blob. The `ImageCompressionAlgorithm::Disabled` variant still remains and is the new default. Reverts large parts of #8238 , as originally intended in that PR. Part of #5431	2024-07-15 09:28:35 -04:00
dependabot[bot]	4d75e1ef81	build(deps-dev): bump zipp from 3.8.1 to 3.19.1 Bumps [zipp](https://github.com/jaraco/zipp) from 3.8.1 to 3.19.1. - [Release notes](https://github.com/jaraco/zipp/releases) - [Changelog](https://github.com/jaraco/zipp/blob/main/NEWS.rst) - [Commits](https://github.com/jaraco/zipp/compare/v3.8.1...v3.19.1) --- updated-dependencies: - dependency-name: zipp dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2024-07-15 09:28:35 -04:00
Conrad Ludgate	4c7c00268c	proxy: remove some trace logs (#8334 )	2024-07-15 09:28:35 -04:00
John Spray	f28abb953d	tests: stabilize test_sharding_split_compaction (#8318 ) ## Problem This test incorrectly assumed that a post-split compaction would only drop content. This was easily destabilized by any changes to image generation rules. ## Summary of changes - Before split, do a full image layer generation pass, to guarantee that post-split compaction should only drop data, never create it. - Fix the force_image_layer_creation mode of compaction that we use from tests like this: previously it would try and generate image layers even if one already existed with the same layer key, which caused compaction to fail.	2024-07-15 09:28:35 -04:00
Conrad Ludgate	4df39d7304	proxy: pg17 fixes (#8321 ) ## Problem #7809 - we do not support sslnegotiation=direct #7810 - we do not support negotiating down the protocol extensions. ## Summary of changes 1. Same as postgres, check the first startup packet byte for tls header `0x16`, and check the ALPN. 2. Tell clients using protocol >3.0 to downgrade	2024-07-15 09:28:35 -04:00
Christian Schwarz	bfc7338246	pageserver: move `page_service`'s `import basebackup` / `import wal` to mgmt API (#8292 ) I want to fix bugs in `page_service` ([issue](https://github.com/neondatabase/neon/issues/7427)) and the `import basebackup` / `import wal` stand in the way / make the refactoring more complicated. We don't use these methods anyway in practice, but, there have been some objections to removing the functionality completely. So, this PR preserves the existing functionality but moves it into the HTTP management API. Note that I don't try to fix existing bugs in the code, specifically not fixing * it only ever worked correctly for unsharded tenants * it doesn't clean up on error All errors are mapped to `ApiError::InternalServerError`.	2024-07-15 09:28:35 -04:00
Christian Schwarz	35dac6e6c8	fix(l0_flush): drops permit before fsync, potential cause for OOMs (#8327 ) ## Problem Slack thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1720511577862519 We're seeing OOMs in staging on a pageserver that has l0_flush.mode=Direct enabled. There's a strong correlation between jumps in `maxrss_kb` and `pageserver_timeline_ephemeral_bytes`, so, it's quite likely that l0_flush.mode=Direct is the culprit. Notably, the expected max memory usage on that staging server by the l0_flush.mode=Direct is ~2GiB but we're seeing as much as 24GiB max RSS before the OOM kill. One hypothesis is that we're dropping the semaphore permit before all the dirtied pages have been flushed to disk. (The flushing to disk likely happens in the fsync inside the `.finish()` call, because we're using ext4 in data=ordered mode). ## Summary of changes Hold the permit until after we're done with `.finish()`.	2024-07-15 09:28:35 -04:00
Christian Schwarz	e619e8703e	refactor: postgres_backend: replace abstract shutdown_watcher with CancellationToken (#8295 ) Preliminary refactoring while working on https://github.com/neondatabase/neon/issues/7427 and specifically https://github.com/neondatabase/neon/pull/8286	2024-07-15 09:28:35 -04:00
Tristan Partin	6fd35bfe32	Add an application_name to more Neon connections Helps identify connections in the logs.	2024-07-15 09:28:35 -04:00
Tristan Partin	547a431b0d	Refactor how migrations are run Just a small improvement I noticed while looking at fixing CVE-2024-4317 in Neon.	2024-07-15 09:28:35 -04:00
Alex Chi Z	f8c01c6341	fix(storage-scrubber): use default AWS authentication (#8299 ) part of https://github.com/neondatabase/cloud/issues/14024 close https://github.com/neondatabase/neon/issues/7665 Things running in k8s container use this authentication: https://docs.aws.amazon.com/sdkref/latest/guide/feature-container-credentials.html while we did not configure the client to use it. This pull request simply uses the default s3 client credential chain for storage scrubber. It might break compatibility with minio. ## Summary of changes * Use default AWS credential provider chain. * Improvements for s3 errors, we now have detailed errors and correct backtrace on last trial of the operation. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-07-15 09:28:35 -04:00
Conrad Ludgate	1145700f87	chore: fix nightly build (#8142 ) ## Problem `cargo +nightly check` fails ## Summary of changes Updates `measured`, `time`, and `crc32c`. * `measured`: updated to fix https://github.com/rust-lang/rust/issues/125763. * `time`: updated to fix https://github.com/rust-lang/rust/issues/125319 * `crc32c`: updated to remove some nightly feature detection with a removed nightly feature	2024-07-15 09:28:35 -04:00
Alex Chi Z	44339f5b70	chore(storage-scrubber): allow disable file logging (#8297 ) part of https://github.com/neondatabase/cloud/issues/14024, k8s does not always have a volume available for logging, and I'm running into weird permission errors... While I could spend time figuring out how to create temp directories for logging, I think it would be better to just disable file logging as k8s containers are ephemeral and we cannot retrieve anything on the fs after the container gets removed. ## Summary of changes `PAGESERVER_DISABLE_FILE_LOGGING=1` -> file logging disabled Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-07-15 09:28:35 -04:00
Luca BRUNO	7b4a9c1d82	proxy/http: avoid spurious vector reallocations This tweaks the rows-to-JSON rendering logic in order to avoid allocating 0-sized temporary vectors and later growing them to insert elements. As the exact size is known in advance, both vectors can be built with an exact capacity upfront. This will avoid further vector growing/reallocation in the rendering hotpath. Signed-off-by: Luca BRUNO <lucab@lucabruno.net>	2024-07-15 09:28:35 -04:00
Alexander Bayandin	3b2fc27de4	CI(promote-compatibility-data): take into account commit sha (#8283 ) ## Problem In https://github.com/neondatabase/neon/pull/8161, we changed the path to Neon artefacts by adding commit sha to it, but we missed adding these changes to `promote-compatibility-data` job that we use for backward/forward- compatibility testing. ## Summary of changes - Add commit sha to `promote-compatibility-data`	2024-07-15 09:28:35 -04:00
Yuchen Liang	0b6492e7d3	tests: increase approx size equal threshold to avoid `test_lsn_lease_size` flakiness (#8282 ) ## Summary of changes Increase the `assert_size_approx_equal` threshold to avoid flakiness of `test_lsn_lease_size`. Still needs more investigation to fully resolve #8293. - Also set `autovacuum=off` for the endpoint we are running in the test. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-07-15 09:28:35 -04:00
John Spray	7cfaecbeb6	tests: stabilize test_timeline_size_quota_on_startup (#8255 ) ## Problem `test_timeline_size_quota_on_startup` assumed that writing data beyond the size limit would always be blocked. This is not so: the limit is only enforced if feedback makes it back from the pageserver to the safekeeper + compute. Closes: https://github.com/neondatabase/neon/issues/6562 ## Summary of changes - Modify the test to wait for the pageserver to catch up. The size limit was never actually being enforced robustly; the original version of this test was just writing much more than 30MB and about 98% of the time getting lucky such that the feedback happened to arrive before the test's for loop was done. - If the test fails, log the logical size as seen by the pageserver.	2024-07-15 09:28:35 -04:00
Alex Chi Z	472acae615	fix(pageserver): write to both v1+v2 for aux tenant import (#8316 ) close https://github.com/neondatabase/neon/issues/8202 ref https://github.com/neondatabase/neon/pull/6560 For tenant imports, we now write the aux files into both v1+v2 storage, so that the test case can pick either one for testing. Given the API is only used for testing, this looks like a safe change. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-07-15 09:28:35 -04:00
John Spray	108bf56e44	tests: use smaller layers in test_pg_regress (#8232 ) ## Problem Debug-mode runs of test_pg_regress are rather slow since https://github.com/neondatabase/neon/pull/8105, and occasionally exceed their 600s timeout. ## Summary of changes - Use 8MiB layer files, avoiding large ephemeral layers On a hetzner AX102, this takes the runtime from 230s to 190s. Which hopefully will be enough to get the runtime on github runners more reliably below its 600s timeout. This has the side benefit of exercising more of the pageserver stack (including compaction) under a workload that exercises a more diverse set of postgres functionality than most of our tests.	2024-07-15 09:28:35 -04:00
Alexey Kondratov	e83a499ab4	compute_ctl: Use 'fast' shutdown for Postgres termination (#8289 ) ## Problem We currently use 'immediate' mode in the most commonly used shutdown path, when the control plane calls a `compute_ctl` API to terminate Postgres inside compute without waiting for the actual pod / VM termination. Yet, 'immediate' shutdown doesn't create a shutdown checkpoint, and ROs have a hard time figuring out the list of running xacts during the next start. ## Summary of changes Use 'fast' mode, which creates a shutdown checkpoint that is important for ROs to get a list of running xacts faster instead of going through the CLOG. On the control plane side, we poll this `compute_ctl` termination API for 10s, which should be enough as we don't really write any data at checkpoint time. If it times out, we switch to the slow k8s-based termination anyway. See https://www.postgresql.org/docs/current/server-shutdown.html for the list of modes and signals. The default VM shutdown hook already uses `fast` mode, see [1] [1] `c9fd8d7693/vm-image-spec.yaml (L30-L31)` Related to #6211	2024-07-15 09:28:35 -04:00
Yuchen Liang	ebf3bfadde	refactor: move part of sharding API from `pageserver_api` to `utils` (#8254 ) ## Problem LSN Leases introduced in #8084 is a new API that is made shard-aware from day 1. To support ephemeral endpoint in #7994 without linking Postgres C API against `compute_ctl`, part of the sharding needs to reside in `utils`. ## Summary of changes - Create a new `shard` module in utils crate. - Move more interface related part of tenant sharding API to utils and re-export them in pageserver_api. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-07-15 09:28:35 -04:00
John Spray	ab06240fae	pageserver: respect has_relmap_file in collect_keyspace (#8276 ) ## Problem Rarely, a dbdir entry can exist with no `relmap_file_key` data. This causes compaction to fail, because it assumes that if the database exists, then so does the relmap file. Basebackup already handled this using a boolean to record whether such a key exists, but `collect_keyspace` didn't. ## Summary of changes - Respect the flag for whether a relfilemap exists in collect_keyspace - The reproducer for this issue will merge separately in https://github.com/neondatabase/neon/pull/8232	2024-07-15 09:28:35 -04:00
Tristan Partin	cec216c5c0	Add long running replication tests These tests will help verify that replication, both physical and logical, works as expected in Neon. Co-authored-by: Sasha Krassovsky <sasha@neon.tech>	2024-07-15 09:28:35 -04:00
Tristan Partin	930201e033	Add PgBin.run_nonblocking() Allows a process to run without blocking program execution, which can be useful for certain test scenarios. Co-authored-by: Sasha Krassovsky <sasha@neon.tech>	2024-07-15 09:28:35 -04:00
Tristan Partin	8328580dc2	Log PG environment variables when a PgBin runs Useful for debugging situations like connecting to databases. Co-authored-by: Sasha Krassovsky <sasha@neon.tech>	2024-07-15 09:28:35 -04:00
Tristan Partin	8d9b632f2a	Add Neon HTTP API test fixture This is a Python binding to the Neon HTTP API. It isn't complete, but can be extended as necessary. Co-authored-by: Sasha Krassovsky <sasha@neon.tech>	2024-07-15 09:28:35 -04:00
Tristan Partin	55d37c77b9	Hide import behind TYPE_CHECKING No need to import it if we aren't type checking anything.	2024-07-15 09:28:35 -04:00
John Spray	0948fb6bf1	pageserver: switch to jemalloc (#8307 ) ## Problem - Resident memory on long running pageserver processes tends to climb: memory fragmentation is suspected. - Total resident memory may be a limiting factor for running on smaller nodes. ## Summary of changes - As a low-energy experiment, switch the pageserver to use jemalloc (not a net-new dependency, proxy already uses it) - Decide at end of week whether to revert before next release.	2024-07-15 09:28:35 -04:00
Alex Chi Z	285c6d2974	fix(pageserver): ensure sparse keyspace is ordered (#8285 ) ## Problem Sparse keyspaces were constructed with ranges out of order: this didn't break things obviously, but meant that users of KeySpace functions that assume ordering would assert out. Closes https://github.com/neondatabase/neon/issues/8277 ## Summary of changes make sure the sparse keyspace has ordered keyspace parts	2024-07-15 09:28:35 -04:00
Vlad Lazar	a5491463e1	Merge pull request #8304 from neondatabase/rc/2024-07-08 Storage & Compute release 2024-07-08	2024-07-08 20:25:54 +01:00
dependabot[bot]	a58827f952	build(deps): bump certifi from 2023.7.22 to 2024.7.4 (#8301 )	2024-07-08 17:22:36 +01:00
Arpad Müller	36b790f282	Add concurrency to the find-large-objects scrubber subcommand (#8291 ) The find-large-objects scrubber subcommand is quite fast if you run it in an environment with low latency to the S3 bucket (say an EC2 instance in the same region). However, the higher the latency gets, the slower the command becomes. Therefore, add a concurrency param and make it parallelized. This doesn't change that general relationship, but at least lets us do multiple requests in parallel and therefore hopefully faster. Running with concurrency of 64 (default): ``` 2024-07-05T17:30:22.882959Z INFO lazy_load_identity [...] [...] 2024-07-05T17:30:28.289853Z INFO Scanned 500 shards. [...] ``` With concurrency of 1, simulating state before this PR: ``` 2024-07-05T17:31:43.375153Z INFO lazy_load_identity [...] [...] 2024-07-05T17:33:51.987092Z INFO Scanned 500 shards. [...] ``` In other words, to list 500 shards, speed is increased from 2:08 minutes to 6 seconds. Follow-up of #8257, part of #5431	2024-07-08 17:22:36 +01:00
Arpad Müller	3ef7748e6b	Improve parsing of `ImageCompressionAlgorithm` (#8281 ) Improve parsing of the `ImageCompressionAlgorithm` enum to allow level customization like `zstd(1)`, as strum only takes `Default::default()`, i.e. `None` as the level. Part of #5431	2024-07-08 17:22:36 +01:00
Christian Schwarz	f3310143e4	pageserver_live_connections: track as counter pair (#8227 ) Generally counter pairs are preferred over gauges. In this case, I found myself asking what the typical rate of accepted page_service connections on a pageserver is, and I couldn't answer it with the gauge metric. There are a few dashboards using this metric: https://github.com/search?q=repo%3Aneondatabase%2Fgrafana-dashboard-export%20pageserver_live_connections&type=code I'll convert them to use the new metric once this PR reaches prod. refs https://github.com/neondatabase/neon/issues/7427	2024-07-08 17:22:36 +01:00
Konstantin Knizhnik	05b4169644	Increase timeout for waiting subscriber caught-up (#8118 ) ## Problem test_subscriber_restart has quite a large failure rate: https://neonprod.grafana.net/d/fddp4rvg7k2dcf/regression-test-failures?orgId=1&var-test_name=test_subscriber_restart&var-max_count=100&var-restrict=false It can be caused by too small a timeout (5 seconds) to wait until changes are propagated. Related to #8097 ## Summary of changes Increase timeout to 30 seconds. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-08 17:22:36 +01:00
Alexander Bayandin	d1495755e7	SELECT 💣(); (#8270 ) ## Problem We want to be able to test how our infrastructure reacts to segfaults in Postgres (for example, we collect cores, and get some required logs/metrics, etc) ## Summary of changes - Add `trigger_segfault` function to `neon_test_utils` to trigger a segfault in Postgres - Add `trigger_panic` function to `neon_test_utils` to trigger SIGABRT (by using `elog(PANIC, ...)`) - Fix cleanup logic in regression tests for the case where the endpoint crashed	2024-07-08 17:22:36 +01:00
Vlad Lazar	c8dd78c6c8	pageserver: add time based image layer creation check (#8247 ) ## Problem Assume a timeline with the following workload: very slow ingest of updates to a small number of keys that fit within the same partition (as decided by `KeySpace::partition`). These tenants will create small L0 layers due to time-based rolling, and, consequently, the L1 layers will also be small. Currently, by default, we need to ingest 512 MiB of WAL before checking if an image layer is required. This scheme works fine under the assumption that L1s are roughly of checkpoint distance size, but as the first paragraph explained, that's not the case for all workloads. ## Summary of changes Check if new image layers are required at least once every checkpoint timeout interval.	2024-07-08 17:22:36 +01:00
John Spray	b44ee3950a	safekeeper: add separate `tombstones` map for deleted timelines (#8253 ) ## Problem Safekeepers left running for a long time use a lot of memory (up to the point of OOMing, on small nodes) for deleted timelines, because the `Timeline` struct is kept alive as a guard against recreating deleted timelines. Closes: https://github.com/neondatabase/neon/issues/6810 ## Summary of changes - Create separate tombstones that just record a ttid and when the timeline was deleted. - Add a periodic housekeeping task that cleans up tombstones older than a hardcoded TTL (24h) I think this also makes https://github.com/neondatabase/neon/pull/6766 un-needed, as the tombstone is also checked during deletion. I considered making the overall timeline map use an enum type containing active or deleted, but having a separate map of tombstones avoids bloating that map, so that calls like `get()` can still go straight to a timeline without having to walk a hashmap that also contains tombstones.	2024-07-08 17:22:36 +01:00
John Spray	64334f497d	tests: make location_conf_churn more robust (#8271 ) ## Problem This test directly manages locations on pageservers and configuration of an endpoint. However, it did not switch off the parts of the storage controller that attempt to do the same: occasionally, the test would fail in a strange way such as a compute failing to accept a reconfiguration request. ## Summary of changes - Wire up the storage controller's compute notification hook to a no-op handler - Configure the tenant's scheduling policy to Stop.	2024-07-08 17:22:35 +01:00
Peter Bendel	5ffcb688cc	correct error handling for periodic pagebench runner status (#8274 ) ## Problem The following periodic pagebench run failed but was still shown as successful: https://github.com/neondatabase/neon/actions/runs/9798909458/job/27058179993#step:9:47 ## Summary of changes If the EC2 test runner reports a failure, fail the job step and thus the workflow. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-07-08 17:22:35 +01:00
John Spray	32fc2dd683	tests: extend allow list in deletion test (#8268 ) ## Problem `1ea5d8b132` tolerated this as an error message, but it can show up in logs as well. Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8201/9780147712/index.html#testresult/263422f5f5f292ea/retries ## Summary of changes - Tolerate "failed to delete 1 objects" in pageserver logs, this occurs occasionally when injected failures exhaust deletion's retries.	2024-07-08 17:22:35 +01:00
Peter Bendel	d35ddfbab7	add checkout depth1 to workflow to access local github actions like generate allure report (#8259 ) ## Problem job step to create allure report fails https://github.com/neondatabase/neon/actions/runs/9781886710/job/27006997416#step:11:1 ## Summary of changes Shallow checkout of sources to get access to local github action needed in the job step ## Example run example run with this change https://github.com/neondatabase/neon/actions/runs/9790647724 do not merge this PR until the job is clean --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-07-08 17:22:35 +01:00
Konstantin Knizhnik	3ee82a9895	implement rolling hyper-log-log algorithm (#8068 ) ## Problem See #7466 ## Summary of changes Implement the algorithm described in https://hal.science/hal-00465313/document A new GUC is added: `neon.wss_max_duration`, which specifies the size of the sliding window (in seconds). The default value is 1 hour. It is possible to request an estimate of the working set size (within this window) using the new function `approximate_working_set_size_seconds`. The old function `approximate_working_set_size` is preserved for backward compatibility, but its scope is also limited by `neon.wss_max_duration`. The version of the Neon extension is changed to 1.4 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Matthias van de Meent <matthias@neon.tech>	2024-07-08 17:22:35 +01:00
Arpad Müller	e770aeee92	Flatten compression algorithm setting (#8265 ) This flattens the compression algorithm setting, removing the `Option<_>` wrapping layer and making handling of the setting easier. It also adds a specific setting for disabled compression with the continued ability to read compressed data, giving us the option to more easily back out of a compression rollout, should the need arise, which was one of the limitations of #8238. Implements my suggestion from https://github.com/neondatabase/neon/pull/8238#issuecomment-2206181594 , inspired by Christian's review in https://github.com/neondatabase/neon/pull/8238#pullrequestreview-2156460268 . Part of #5431	2024-07-08 17:22:35 +01:00
Yuchen Liang	32828cddd6	feat(pageserver): integrate lsn lease into synthetic size (#8220 ) Part of #7497, closes #8071. (accidentally closed #8208, reopened here) ## Problem After the changes in #8084, we need synthetic size to also account for leased LSNs so that users do not get free retention by running a small ephemeral endpoint for a long time. ## Summary of changes This PR integrates LSN leases into the synthetic size calculation. We model leases as read-only branches started at the leased LSN (except it does not have a timeline id). Other changes: - Add new unit tests testing whether a lease behaves like a read-only branch. - Change `/size_debug` response to include lease point in the SVG visualization. - Fix `/lsn_lease` HTTP API to do proper parsing for POST. Signed-off-by: Yuchen Liang <yuchen@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-08 17:22:35 +01:00
Arpad Müller	bd2046e1ab	Add find-large-objects subcommand to scrubber (#8257 ) Adds a find-large-objects subcommand to the scrubber to allow listing layer objects larger than a specific size. To be used like: ``` AWS_PROFILE=dev REGION=us-east-2 BUCKET=neon-dev-storage-us-east-2 cargo run -p storage_scrubber -- find-large-objects --min-size 250000000 --ignore-deltas ``` Part of #5431	2024-07-08 17:22:35 +01:00
John Spray	7e2a3d2728	pageserver: downgrade stale generation messages to INFO (#8256 ) ## Problem When generations were new, these messages were an important way of noticing if something unexpected was going on. We found some real issues when investigating tests that unexpectedly tripped them. As time has gone on, this code has become pretty battle-tested, and as we do more live migrations etc, it's fairly normal to see the occasional message from a node with a stale generation. At this point the cognitive load on developers to selectively allow-list these logs outweighs the benefit of having them at warn severity. Closes: https://github.com/neondatabase/neon/issues/8080 ## Summary of changes - Downgrade "Dropped remote consistent LSN updates" and "Dropping stale deletions" messages to INFO - Remove all the allow-list entries for these logs.	2024-07-08 17:22:35 +01:00
Alexander Bayandin	0e4832308d	CI(pg-clients): unify workflow with build-and-test (#8160 ) ## Problem `pg-clients` workflow looks different from the main `build-and-test` workflow for historical reasons (it was my very first task at Neon, and back then I wasn't really familiar with the rest of the CI pipelines). This PR unifies `pg-clients` workflow with `build-and-test` ## Summary of changes - Rename `pg_clients.yml` to `pg-clients.yml` - Run the workflow on changes in relevant files - Create Allure report for tests - Send slack notifications to `#on-call-qa-staging-stream` channel (instead of `#on-call-staging-stream`) - Update Client libraries once we're here	2024-07-08 17:22:35 +01:00
Arpad Müller	0a63bc4818	Use bool param for round_trip_test_compressed (#8252 ) As per @koivunej 's request in https://github.com/neondatabase/neon/pull/8238#discussion_r1663892091 , use a runtime param instead of monomorphizing the function based on the value. Part of https://github.com/neondatabase/neon/issues/5431	2024-07-08 17:22:35 +01:00
Vlad Lazar	2897dcc9aa	pageserver: increase rate limit duration for layer visit log (#8263 ) ## Problem I'd like to keep this in the tree since it might be useful in prod as well. It's a bit too noisy as is and missing the lsn. ## Summary of changes Add an lsn field and increase the rate limit duration.	2024-07-08 17:22:35 +01:00
Alexander Bayandin	1d0ec50ddb	CI(build-and-test): add conclusion job (#8246 ) ## Problem Currently, if you need to rename a job and the job is listed in [branch protection rules](https://github.com/neondatabase/neon/settings/branch_protection_rules), the PR won't be allowed to merge. ## Summary of changes - Add `conclusion` job that fails if any of its dependencies don't finish successfully	2024-07-08 17:22:35 +01:00
Conrad Ludgate	a86b43fcd7	proxy: cache certain non-retriable console errors for a short time (#8201 ) ## Problem If there's a quota error, it makes sense to cache it for a short window of time. Many clients do not handle database connection errors gracefully, so just spam retry 🤡 ## Summary of changes Updates the node_info cache to support storing console errors. Store console errors if they cannot be retried (using our own heuristic. should only trigger for quota exceeded errors).	2024-07-08 17:22:35 +01:00
Vlad Lazar	b917868ada	tests: perform graceful rolling restarts in storcon scale test (#8173 ) ## Problem Scale test doesn't exercise drain & fill. ## Summary of changes Make scale test exercise drain & fill	2024-07-08 17:22:35 +01:00
John Spray	7b7d16f52e	pageserver: add supplementary branch usage stats (#8131 ) ## Problem The metrics we have today aren't convenient for planning around the impact of timeline archival on costs. Closes: https://github.com/neondatabase/neon/issues/8108 ## Summary of changes - Add metric `pageserver_archive_size`, which indicates the logical bytes of data which we would expect to write into an archived branch. - Add metric `pageserver_pitr_history_size`, which indicates the distance between last_record_lsn and the PITR cutoff. These metrics are somewhat temporary: when we implement #8088 and associated consumption metric changes, these will reach a final form. For now, an "archived" branch is just any branch outside of its parent's PITR window: later, archival will become an explicit state (which will _usually_ correspond to falling outside the parent's PITR window). The overall volume of timeline metrics is something to watch, but we are removing many more in https://github.com/neondatabase/neon/pull/8245 than this PR is adding.	2024-07-08 17:22:35 +01:00
Alex Chi Z	fee4169b6b	fix(pageserver): ensure test creates valid layer map (#8191 ) I'd like to add some constraints to the layer map we generate in tests. (1) is the layer map that the current compaction algorithm will produce. It has the property that, for every delta layer, all delta layers that overlap with it on the LSN axis have the same LSN range. (2) is the layer map that cannot be produced with the legacy compaction algorithm. (3) is the layer map that will be produced by the future tiered-compaction algorithm. The current validator does not allow that but we can modify the algorithm to allow it in the future. ## Summary of changes Add a validator to check if the layer map is valid and refactor the test cases to include delta layer start/end LSN. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-08 17:22:35 +01:00
Christian Schwarz	47e06a2cc6	page_service: stop exposing `get_last_record_rlsn` (#8244 ) Compute doesn't use it, let's eliminate it. Ref to Slack thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1719920261995529	2024-07-08 17:22:35 +01:00
Japin Li	c4423c0623	Fix outdated comment (#8149 ) Commit `97b48c23f` changes the log wait timeout from 1 second to 100 milliseconds but forgets to update the comment.	2024-07-08 17:22:35 +01:00
John Spray	a11cf03123	pageserver: reduce ops tracked at per-timeline detail (#8245 ) ## Problem We record detailed histograms for all page_service op types, which mostly aren't very interesting, but make our prometheus scrapes huge. Closes: #8223 ## Summary of changes - Only track GetPageAtLsn histograms on a per-timeline granularity. For all other operation types, rely on existing node-wide histograms.	2024-07-08 17:22:35 +01:00
Peter Bendel	08b33adfee	add pagebench test cases for periodic pagebench on dedicated hardware (#8233 ) we want to run some specific pagebench test cases on dedicated hardware to get reproducible results run1: 1 client per tenant => characterize throughput with n tenants. - 500 tenants - scale 13 (200 MB database) - 1 hour duration - ca 380 GB layer snapshot files run2.singleclient: 1 client per tenant => characterize latencies run2.manyclient: N clients per tenant => characterize throughput scalability within one tenant. - 1 tenant with 1 client for latencies - 1 tenant with 64 clients because typically for a high number of connections we recommend the connection pooler which by default uses 64 connections (for scalability) - scale 136 (2048 MB database) - 20 minutes each	2024-07-08 17:22:35 +01:00
Arpad Müller	4fb50144dd	Only support compressed reads if the compression setting is present (#8238 ) PR #8106 was created with the assumption that no blob is larger than `256 MiB`. Due to #7852 we have checking for writes of blobs larger than that limit, but we didn't have checking for reads of such large blobs: in theory, we could be reading these blobs every day but we just don't happen to write the blobs for some reason. Therefore, we now add a warning for reads of such large blobs as well. To make deploying compression less dangerous, we therefore only assume a blob is compressed if the compression setting is present in the config. This also means that we can't back out of compression once we enabled it. Part of https://github.com/neondatabase/neon/issues/5431	2024-07-08 17:22:35 +01:00
John Spray	c500137ca9	pageserver: don't try to flush if shutdown during attach (#8235 ) ## Problem test_location_conf_churn fails on log errors when it tries to shutdown a pageserver immediately after starting a tenant attach, like this: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8224/9761000525/index.html#/testresult/15fb6beca5c7327c ``` shutdown:shutdown{tenant_id=35f5c55eb34e7e5e12288c5d8ab8b909 shard_id=0000}:timeline_shutdown{timeline_id=30936747043353a98661735ad09cbbfe shutdown_mode=FreezeAndFlush}: failed to freeze and flush: cannot flush frozen layers when flush_loop is not running, state is Exited\n') ``` This is happening because Tenant::shutdown fires its cancellation token early if the tenant is not fully attached by the time shutdown is called, so the flush loop is shutdown by the time we try and flush. ## Summary of changes - In the early-cancellation case, also set the shutdown mode to Hard to skip trying to do a flush that will fail.	2024-07-08 17:22:35 +01:00
Alexander Bayandin	252c4acec9	CI: update docker/* actions to latest versions (#7694 ) ## Problem GitHub Actions complain that we use actions that depend on deprecated Node 16: ``` Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: docker/setup-buildx-action@v2 ``` But also, the latest `docker/setup-buildx-action` fails with the following error: ``` /nvme/actions-runner/_work/_actions/docker/setup-buildx-action/v3/webpack:/docker-setup-buildx/node_modules/@actions/cache/lib/cache.js:175 throw new Error(`Path Validation Error: Path(s) specified in the action for caching do(es) not exist, hence no cache is being saved.`); ^ Error: Path Validation Error: Path(s) specified in the action for caching do(es) not exist, hence no cache is being saved. at Object.rejected (/nvme/actions-runner/_work/_actions/docker/setup-buildx-action/v3/webpack:/docker-setup-buildx/node_modules/@actions/cache/lib/cache.js:175:1) at Generator.next (<anonymous>) at fulfilled (/nvme/actions-runner/_work/_actions/docker/setup-buildx-action/v3/webpack:/docker-setup-buildx/node_modules/@actions/cache/lib/cache.js:29:1) ``` We can work around this by setting `cache-binary: false` for `uses: docker/setup-buildx-action@v3` ## Summary of changes - Update `docker/setup-buildx-action` from `v2` to `v3`, set `cache-binary: false` - Update `docker/login-action` from `v2` to `v3` - Update `docker/build-push-action` from `v4`/`v5` to `v6`	2024-07-08 17:22:35 +01:00
Heikki Linnakangas	db70c175e6	Simplify test_wal_page_boundary_start test (#8214 ) All the code to ensure the WAL record lands at a page boundary was unnecessary for reproducing the original problem. In fact, it's a pretty basic test that checks that outbound replication (= neon as publisher) still works after restarting the endpoint. It just used to be very broken before commit `5ceccdc7de`, which also added this test. To verify that: 1. Check out commit `f3af5f4660` (because the next commit, `7dd58e1449`, fixed the same bug in a different way, making it infeasible to revert the bug fix in an easy way) 2. Revert the bug fix from commit `5ceccdc7de` with this: ``` diff --git a/pgxn/neon/walproposer_pg.c b/pgxn/neon/walproposer_pg.c index 7debb6325..9f03bbd99 100644 --- a/pgxn/neon/walproposer_pg.c +++ b/pgxn/neon/walproposer_pg.c @@ -1437,8 +1437,10 @@ XLogWalPropWrite(WalProposer wp, char buf, Size nbytes, XLogRecPtr recptr) * * https://github.com/neondatabase/neon/issues/5749 */ +#if 0 if (!wp->config->syncSafekeepers) XLogUpdateWalBuffers(buf, recptr, nbytes); +#endif while (nbytes > 0) { ``` 3. Run the test_wal_page_boundary_start regression test. It fails, as expected 4. Apply this commit to the test, and run it again. It still fails, with the same error mentioned in issue #5749: ``` PG:2024-06-30 20:49:08.805 GMT [1248196] STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names '"pub1"') PG:2024-06-30 21:37:52.567 GMT [1467972] LOG: starting logical decoding for slot "sub1" PG:2024-06-30 21:37:52.567 GMT [1467972] DETAIL: Streaming transactions committing after 0/1532330, reading WAL from 0/1531C78. 
PG:2024-06-30 21:37:52.567 GMT [1467972] STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names '"pub1"') PG:2024-06-30 21:37:52.567 GMT [1467972] LOG: logical decoding found consistent point at 0/1531C78 PG:2024-06-30 21:37:52.567 GMT [1467972] DETAIL: There are no running transactions. PG:2024-06-30 21:37:52.567 GMT [1467972] STATEMENT: START_REPLICATION SLOT "sub1" LOGICAL 0/0 (proto_version '4', origin 'any', publication_names '"pub1"') PG:2024-06-30 21:37:52.568 GMT [1467972] ERROR: could not find record while sending logically-decoded data: invalid contrecord length 312 (expected 6) at 0/1533FD8 ```	2024-07-08 17:22:35 +01:00
Alex Chi Z	ed3b4a58b4	docker: add storage_scrubber into the docker image (#8239 ) ## Problem We will run this tool in the k8s cluster. To make it accessible from k8s, we need to package it into the docker image. part of https://github.com/neondatabase/cloud/issues/14024	2024-07-08 17:22:35 +01:00
Konstantin Knizhnik	2863d1df63	Add test for proper handling of connection failure to avoid 'cannot wait on socket event without a socket' error (#8231 ) ## Problem See https://github.com/neondatabase/cloud/issues/14289 and PR #8210 ## Summary of changes Add test for problems fixed in #8210 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-08 17:22:35 +01:00
Alex Chi Z	320b24eab3	fix(pageserver): comments about metadata key range (#8236 ) Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-07-08 17:22:35 +01:00
John Spray	13a8a5b09b	tense of errors (#8234 ) I forgot a commit when merging https://github.com/neondatabase/neon/pull/8177	2024-07-08 17:22:35 +01:00
Alexander Bayandin	64ccdf65e0	CI(benchmarking): move psql queries to actions/run-python-test-set (#8230 ) ## Problem Some of the Nightly benchmarks fail with the error ``` + /tmp/neon/pg_install/v14/bin/pgbench --version /tmp/neon/pg_install/v14/bin/pgbench: error while loading shared libraries: libpq.so.5: cannot open shared object file: No such file or directory ``` Originally, we added the `pgbench --version` call to check that `pgbench` is installed and to fail earlier if it's not. The failure happens because we don't have `LD_LIBRARY_PATH` set for every job, and it also affects the `psql` command. We can move it to `actions/run-python-test-set` so as not to duplicate code (as it already has `LD_LIBRARY_PATH` set). ## Summary of changes - Remove `pgbench --version` call - Move `psql` commands to common `actions/run-python-test-set`	2024-07-08 17:22:35 +01:00
Christian Schwarz	1ae6aa09dd	L0 flush: opt-in mechanism to bypass PageCache reads and writes (#8190 ) part of https://github.com/neondatabase/neon/issues/7418 # Motivation (reproducing #7418) When we do an `InMemoryLayer::write_to_disk`, there is a tremendous amount of random read I/O, as deltas from the ephemeral file (written in LSN order) are written out to the delta layer in key order. In benchmarks (https://github.com/neondatabase/neon/pull/7409) we can see that this delta layer writing phase is substantially more expensive than the initial ingest of data, and that within the delta layer write a significant amount of the CPU time is spent traversing the page cache. # High-Level Changes Add a new mode for L0 flush that works as follows: * Read the full ephemeral file into memory -- layers are much smaller than total memory, so this is affordable * Do all the random reads directly from this in-memory buffer instead of using blob IO/page cache/disk reads. * Add a semaphore to limit how many timelines may concurrently do this (limit peak memory). * Make the semaphore configurable via PS config. # Implementation Details The new `BlobReaderRef::Slice` is a temporary hack until we can ditch `blob_io` for `InMemoryLayer` => Plan for this is laid out in https://github.com/neondatabase/neon/issues/8183 # Correctness The correctness of this change is quite obvious to me: we do what we did before (`blob_io`) but read from memory instead of going to disk. The highest bug potential is in doing owned-buffers IO. I refactored the API a bit in preliminary PR https://github.com/neondatabase/neon/pull/8186 to make it less error-prone, but still, careful review is requested. # Performance I manually measured single-client ingest performance from `pgbench -i ...`. 
Full report: https://neondatabase.notion.site/2024-06-28-benchmarking-l0-flush-performance-e98cff3807f94cb38f2054d8c818fe84?pvs=4 tl;dr: * no speed improvements during ingest, but * significantly lower pressure on PS PageCache (eviction rate drops to 1/3) * (that's why I'm working on this) * noticeable but modestly lower CPU time This is good enough for merging this PR because the changes require opt-in. We'll do more testing in staging & pre-prod. # Stability / Monitoring memory consumption: there's no _hard_ limit on max `InMemoryLayer` size (aka "checkpoint distance"), hence there's no hard limit on the memory allocation we do for flushing. In practice, we a) [log a warning](`23827c6b0d/pageserver/src/tenant/timeline.rs (L5741-L5743)`) when we flush oversized layers, so we'd know which tenant is to blame and b) if we were to put a hard limit in place, we would have to decide what to do if there is an InMemoryLayer that exceeds the limit. It seems like a better option to guarantee a max size for frozen layer, dependent on `checkpoint_distance`. Then limit concurrency based on that. metrics: we do have the [flush_time_histo](`23827c6b0d/pageserver/src/tenant/timeline.rs (L3725-L3726)`), but that includes the wait time for the semaphore. We could add a separate metric for the time spent after acquiring the semaphore, so one can infer the wait time. Seems unnecessary at this point, though.	2024-07-08 17:22:35 +01:00
Arpad Müller	aeb68e51df	Add support for reading and writing compressed blobs (#8106 ) Add support for reading and writing zstd-compressed blobs for use in image layer generation, but maybe one day useful also for delta layers. The reading of them is unconditional while the writing is controlled by the `image_compression` config variable allowing for experiments. For the on-disk format, we re-use some of the bitpatterns we currently keep reserved for blobs larger than 256 MiB. This assumes that we have never ever written any such large blobs to image layers. After the preparation in #7852, we now are unable to read blobs with a size larger than 256 MiB (or write them). A non-goal of this PR is to come up with good heuristics of when to compress a bitpattern. This is left for future work. Parts of the PR were inspired by #7091. cc #7879 Part of #5431	2024-07-08 17:22:35 +01:00
Vlad Lazar	c3e5223a5d	pageserver: rate limit log for loads of layers visited (#8228 ) ## Problem At high percentiles we see more than 800 layers being visited by the read path. We need the tenant/timeline to investigate. ## Summary of changes Add a rate limited log line when the average number of layers visited per key is in the last specified histogram bucket. I plan to use this to identify tenants in us-east-2 staging that exhibit this behaviour. Will revert before next week's release.	2024-07-08 17:22:35 +01:00
Christian Schwarz	daaa3211a4	fix: noisy logging when download gets cancelled during shutdown (#8224 ) Before this PR, during timeline shutdown, we'd occasionally see log lines like this one: ``` 2024-06-26T18:28:11.063402Z INFO initial_size_calculation{tenant_id=$TENANT,shard_id=0000 timeline_id=$TIMELINE}:logical_size_calculation_task:get_or_maybe_download{layer=000000000000000000000000000000000000-000000067F0001A3950001C1630100000000__0000000D88265898}: layer file download failed, and caller has been cancelled: Cancelled, shutting down Stack backtrace: 0: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/result.rs:1964:27 pageserver::tenant::remote_timeline_client::RemoteTimelineClient::download_layer_file::{{closure}} at /home/nonroot/pageserver/src/tenant/remote_timeline_client.rs:531:13 pageserver::tenant::storage_layer::layer::LayerInner::download_and_init::{{closure}} at /home/nonroot/pageserver/src/tenant/storage_layer/layer.rs:1136:14 pageserver::tenant::storage_layer::layer::LayerInner::download_init_and_wait::{{closure}}::{{closure}} at /home/nonroot/pageserver/src/tenant/storage_layer/layer.rs:1082:74 ``` We can eliminate the anyhow backtrace with no loss of information because the conversion to anyhow::Error happens in exactly one place. refs #7427	2024-07-08 17:22:35 +01:00
John Spray	7ff9989dd5	pageserver: simpler, stricter config error handling (#8177 ) ## Problem Tenant attachment has error paths for failures to write local configuration, but these types of local storage I/O errors should be considered fatal for the process. Related thread on an earlier PR that touched this code: https://github.com/neondatabase/neon/pull/7947#discussion_r1655134114 ## Summary of changes - Make errors writing tenant config fatal (abort process) - When reading tenant config, make all I/O errors except ENOENT fatal - Replace use of bare anyhow errors with `LoadConfigError`	2024-07-08 17:22:35 +01:00
Christian Schwarz	ed3b97604c	remote_storage config: move handling of empty inline table `{}` to callers (#8193 ) Before this PR, `RemoteStorageConfig::from_toml` would support deserializing an empty `{}` TOML inline table to a `None`, otherwise try `Some()`. We can instead let * in proxy: let clap derive handle the Option * in PS & SK: assume that if the field is specified, it must be a valid RemoteStorageConfig (This PR started with a much simpler goal of factoring out the `deserialize_item` function because I need that in another PR).	2024-07-08 17:22:35 +01:00
Konstantin Knizhnik	47c50ec460	Check status of connection after PQconnectStartParams (#8210 ) ## Problem See https://github.com/neondatabase/cloud/issues/14289 ## Summary of changes Check connection status after calling PQconnectStartParams ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-08 17:22:35 +01:00
Vlad Lazar	8c0ec2f681	docs: Graceful storage controller cluster restarts RFC (#7704 ) RFC for "Graceful Restarts of Storage Controller Managed Clusters". Related https://github.com/neondatabase/neon/issues/7387	2024-07-08 17:22:35 +01:00
Heikki Linnakangas	588bda98e7	tests: Make neon_xlogflush() flush all WAL, if you omit the LSN arg (#8215 ) This makes it much more convenient to use in the common case that you want to flush all the WAL. (Passing pg_current_wal_insert_lsn() as the argument doesn't work for the same reasons as explained in the comments: we need to back off to the beginning of a page if the previous record ended at page boundary.) I plan to use this to fix the issue that Arseny Sher called out at https://github.com/neondatabase/neon/pull/7288#discussion_r1660063852	2024-07-08 17:22:35 +01:00
Alexander Bayandin	504ca7720f	CI(gather-rust-build-stats): fix build with libpq (#8219 ) ## Problem I've missed setting `PQ_LIB_DIR` in https://github.com/neondatabase/neon/pull/8206 in `gather-rust-build-stats` job and it fails now: ``` = note: /usr/bin/ld: cannot find -lpq collect2: error: ld returned 1 exit status error: could not compile `storage_controller` (bin "storage_controller") due to 1 previous error ``` https://github.com/neondatabase/neon/actions/runs/9743960062/job/26888597735 ## Summary of changes - Set `PQ_LIB_DIR` for `gather-rust-build-stats` job	2024-07-08 17:22:35 +01:00
Alex Chi Z	cf4ea92aad	fix(pageserver): include aux file in basebackup only once (#8207 ) Extracted from https://github.com/neondatabase/neon/pull/6560, currently we include multiple copies of aux files in the basebackup. ## Summary of changes Fix the loop. Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-08 17:22:35 +01:00
Alexander Bayandin	325294bced	CI(build-tools): Remove libpq from build image (#8206 ) ## Problem We use `build-tools` image as a base image to build other images, and it has a pretty old `libpq-dev` installed (v13; it wasn't that old until I removed system Postgres 14 from `build-tools` image in https://github.com/neondatabase/neon/pull/6540) ## Summary of changes - Remove `libpq-dev` from `build-tools` image - Set `LD_LIBRARY_PATH` for tests (for different Postgres binaries that we use, like psql and pgbench) - Set `PQ_LIB_DIR` to build Storage Controller - Set `LD_LIBRARY_PATH`/`DYLD_LIBRARY_PATH` in the Storage Controller where it calls Postgres binaries	2024-07-08 17:22:35 +01:00
John Spray	86c8ba2563	pageserver: add metric `pageserver_secondary_resident_physical_size` (#8204 ) ## Problem We lack visibility of how much local disk space is used by secondary tenant locations Close: https://github.com/neondatabase/neon/issues/8181 ## Summary of changes - Add `pageserver_secondary_resident_physical_size`, tagged by tenant - Register & de-register label sets from SecondaryTenant - Add+use wrappers in SecondaryDetail that update metrics when adding+removing layers/timelines	2024-07-08 17:22:35 +01:00
Arseny Sher	feeb2dc6fa	Merge pull request #8217 from neondatabase/rc/2024-07-01 Storage & Compute release 2024-07-01	2024-07-04 20:22:51 +03:00
Heikki Linnakangas	57f476ff5a	Restore running xacts from CLOG on replica startup (#7288 ) We have one pretty serious MVCC visibility bug with hot standby replicas. We incorrectly treat any transactions that are in progress in the primary, when the standby is started, as aborted. That can break MVCC for queries running concurrently in the standby. It can also lead to hint bits being set incorrectly, and that damage can last until the replica is restarted. The fundamental bug was that we treated any replica start as starting from a shut down server. The fix for that is straightforward: we need to set 'wasShutdown = false' in InitWalRecovery() (see changes in the postgres repo). However, that introduces a new problem: with wasShutdown = false, the standby will not open up for queries until it receives a running-xacts WAL record from the primary. That's correct, and that's how Postgres hot standby always works. But it's a problem for Neon, because: * It changes the historical behavior for existing users. Currently, the standby immediately opens up for queries, so if they now need to wait, we can break existing use cases that were working fine (assuming you don't hit the MVCC issues). * The problem is much worse for Neon than it is for standalone PostgreSQL, because in Neon, we can start a replica from an arbitrary LSN. In standalone PostgreSQL, the replica always starts WAL replay from a checkpoint record, and the primary arranges things so that there is always a running-xacts record soon after each checkpoint record. You can still hit this issue with PostgreSQL if you have a transaction with lots of subtransactions running in the primary, but it's pretty rare in practice. To mitigate that, we introduce another way to collect the running-xacts information at startup, without waiting for the running-xacts WAL record: We can scan the CLOG for XIDs that haven't been marked as committed or aborted. 
It has limitations with subtransactions too, but should mitigate the problem for most users. See https://github.com/neondatabase/neon/issues/7236. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-04 18:58:34 +03:00
Heikki Linnakangas	7ee2bebdb7	tests: Make neon_xlogflush() flush all WAL, if you omit the LSN arg This makes it much more convenient to use in the common case that you want to flush all the WAL. (Passing pg_current_wal_insert_lsn() as the argument doesn't work for the same reasons as explained in the comments: we need to back off to the beginning of a page if the previous record ended at page boundary.) I plan to use this to fix the issue that Arseny Sher called out at https://github.com/neondatabase/neon/pull/7288#discussion_r1660063852	2024-07-04 18:58:28 +03:00
Heikki Linnakangas	be598f1bf4	tests: remove a leftover 'running' flag (#8216 ) The 'running' boolean was replaced with a semaphore in commit `f0e2bb79b2`, but this initialization was missed. Remove it so that if a test tries to access it, you get an error rather than always claiming that the endpoint is not running. Spotted by Arseny at https://github.com/neondatabase/neon/pull/7288#discussion_r1660068657	2024-07-04 18:58:20 +03:00
John Spray	939b5954a5	Merge pull request #8138 from neondatabase/rc/2024-06-24 Storage & Compute release 2024-06-24	2024-06-24 10:57:45 +01:00
Arpad Müller	371020fe6a	Merge pull request #8069 from neondatabase/rc/2024-06-17 Release 2024-06-17	2024-06-17 15:29:35 +02:00
Christian Schwarz	f45818abed	Merge pull request #7999 from neondatabase/rc/2024-06-10 Release 2024-06-10	2024-06-10 19:08:03 +02:00
Christian Schwarz	0384267d58	Revert "Include openssl and ICU statically linked" (#8003 ) Reverts neondatabase/neon#7956 Rationale: compute incompatibilities Slack thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1718011276665839?thread_ts=1718008160.431869&cid=C033RQ5SPDH Relevant quotes from @hlinnaka > If we go through with the current release candidate, but the compute is pinned, people who create new projects will get that warning, which is silly. To them, it looks like the ICU version was downgraded, because initdb was run with newer version. > We should upgrade the ICU version eventually. And when we do that, users with old projects that use ICU will start to see that warning. I think that's acceptable, as long as we do homework, notify users, and communicate that properly. > When do that, we should to try to upgrade the storage and compute versions at roughly the same time.	2024-06-10 14:35:50 +02:00
Arseny Sher	62b3bd968a	Merge pull request #7936 from neondatabase/rc/2024-06-03 Release 2024-06-03	2024-06-04 05:41:36 +03:00
Anastasia Lubennikova	e3e3bc3542	Merge pull request #7920 from neondatabase/compute-only-may-31 Compute release 2024-05-31	2024-05-31 12:47:05 +01:00
Konstantin Knizhnik	be014a2222	Do not produce error if gin page is not restored in redo (#7876 ) ## Problem See https://github.com/neondatabase/cloud/issues/10845 ## Summary of changes Do not report error if GIN page is not restored ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-31 09:21:40 +01:00
Joonas Koivunen	2e1fe71cc0	Merge pull request #7888 from neondatabase/rc/2024-05-27 Release 2024-05-27	2024-05-27 20:30:48 +03:00
Konstantin Knizhnik	068c158ca5	Fix connect to PS on MacOS/X (#7885 ) ## Problem After [`0e4f182680`], which introduced async connect, Neon is not able to connect to the page server. ## Summary of changes Perform sync commit on MacOS/X ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-05-27 13:09:44 +00:00
Sasha Krassovsky	b16e4f689f	Merge pull request #7869 from neondatabase/rc/2024-05-23 Metrics hotfix release	2024-05-23 14:05:30 -07:00
Sasha Krassovsky	dbff725a0c	Remove apostrophe (#7868 ) ## Problem ## Summary of changes ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-05-23 13:47:16 -07:00
Andreas Scherbaum	7fa4628434	Merge pull request #7837 from neondatabase/rc/2024-05-22 Compute-Only Release 2024-05-22	2024-05-22 19:34:39 +02:00
Arthur Petukhovsky	fc538a38b9	Merge pull request #7807 from neondatabase/rc/2024-05-20 Release 2024-05-20	2024-05-20 12:16:00 +01:00
Vlad Lazar	c2e7cb324f	Merge pull request #7735 from neondatabase/vlad/release-2024-05-13 Handmade Release 2024-05-13	2024-05-13 16:27:38 +01:00
Vlad Lazar	101043122e	Revert protocol version upgrade (#7727 ) ## Problem "John pointed out that the switch to protocol version 2 made test_gc_aggressive test flaky: https://github.com/neondatabase/neon/issues/7692. I tracked it down, and that is indeed an issue. Conditions for hitting the issue: The problem occurs in the primary GC horizon is set to a very low value, e.g. 0. If the primary is actively writing WAL, and GC runs in the pageserver at the same time that the primary sends a GetPage request, it's possible that the GC advances the GC horizon past the GetPage request's LSN. I'm working on a fix here: https://github.com/neondatabase/neon/pull/7708." - Heikki ## Summary of changes Use protocol version 1 as default.	2024-05-13 14:17:36 +01:00
Christian Schwarz	c4d7d59825	Merge pull request #7615 from neondatabase/rc/2024-05-06 Release 2024-05-06	2024-05-07 09:41:02 +02:00
Arpad Müller	0de1e1d664	Merge pull request #7530 from neondatabase/rc/2024-04-29 Release 2024-04-29	2024-04-29 15:09:58 +02:00
Joonas Koivunen	271598b77f	Merge pull request #7447 from neondatabase/rc/2024-04-22 Release 2024-04-22	2024-04-22 16:10:03 +03:00
John Spray	459bc479dc	pageserver: fix unlogged relations with sharding (#7454 ) ## Problem - #7451 INIT_FORKNUM blocks must be stored on shard 0 to enable including them in basebackup. This issue can be missed in simple tests because creating an unlogged table isn't sufficient -- to repro I had to create an _index_ on an unlogged table (then restart the endpoint). Closes: #7451 ## Summary of changes - Add a reproducer for the issue. - Tweak the condition for `key_is_shard0` to include anything that isn't a normal relation block _and_ any normal relation block whose forknum is INIT_FORKNUM. - To enable existing databases to recover from the issue, add a special case that omits relations if they were stored on the wrong INITFORK. This enables postgres to start and the user to drop the table and recreate it.	2024-04-22 11:55:24 +00:00
Christian Schwarz	c213373a59	Merge pull request #7378 from neondatabase/rc/2024-04-15 Release 2024-04-15	2024-04-15 15:48:14 +03:00
Em Sharnoff	e0addc100d	Merge pull request #7356 from neondatabase/rc/2024-04-11-#7348 Release 2024-04-11 (cherry-pick #7348 only) See here for more: https://neondb.slack.com/archives/C04DGM6SMTM/p1712776981582679	2024-04-11 09:46:34 -07:00
Em Sharnoff	0519138b04	compute_ctl: Auto-set dynamic_shared_memory_type (#7348 ) Part of neondatabase/cloud#12047. The basic idea is that for our VMs, we want to enable swap and disable Linux memory overcommit. Alongside these, we should set postgres' dynamic_shared_memory_type to mmap, but we want to avoid setting it to mmap if swap is not enabled. Implementing this in the control plane would be fiddly, but it's relatively straightforward to add to compute_ctl.	2024-04-10 13:13:08 -07:00
Vlad Lazar	5da39b469c	Merge pull request #7338 from neondatabase/rc/2024-04-08 Release 2024-04-08	2024-04-08 13:10:24 +01:00
Arseny Sher	82027e22dd	Merge pull request #7284 from neondatabase/rc/2024-04-01 Release 2024-04-01	2024-04-02 18:15:28 +03:00
Alex Chi Z	c431e2f1c5	Merge pull request #7263 from neondatabase/rc/2024-03-27 Release 2024-03-27 - compute only release	2024-03-27 14:52:38 -04:00
John Spray	4e5724d9c3	Merge pull request #7248 from neondatabase/rc/2024-03-26 Release 2024-03-26	2024-03-26 15:17:00 +00:00
John Spray	0d3e499059	Merge pull request #7219 from neondatabase/rc/2024-03-25 Release 2024-03-25	2024-03-25 12:28:09 +00:00
Arpad Müller	7b860b837c	Merge pull request #7154 from neondatabase/rc/2024-03-18 Release 2024-03-18	2024-03-19 12:07:14 +01:00
Christian Schwarz	41fc96e20f	fixup(#7160 / tokio_epoll_uring_ext): double-panic caused by info! in thread-local's drop() (#7164 ) Manual testing of the changes in #7160 revealed that, if the thread-local destructor ever runs (it apparently doesn't in our test suite runs, otherwise #7160 would not have auto-merged), we can encounter an `abort()` due to a double-panic in the tracing code. This github comment here contains the stack trace: https://github.com/neondatabase/neon/pull/7160#issuecomment-2003778176 This PR reverts #7160 and uses an atomic counter to identify the thread-local in log messages, instead of the memory address of the thread local, which may be re-used.	2024-03-18 16:28:17 +01:00
Christian Schwarz	fb2b1ce57b	fixup(#7141 / tokio_epoll_uring_ext): high frequency log message The PR #7141 added log message ``` ThreadLocalState is being dropped and id might be re-used in the future ``` which was supposed to be emitted when the thread-local is destroyed. Instead, it was emitted on _each_ call to `thread_local_system()`, i.e., on each tokio-epoll-uring operation.	2024-03-18 13:01:17 +01:00
Joonas Koivunen	464717451b	build: make procfs linux only dependency (#7156 ) the dependency refuses to build on macos so builds on `main` are broken right now, including the `release` PR.	2024-03-18 09:32:49 +00:00
Joonas Koivunen	c6ed86d3d0	Merge pull request #7081 from neondatabase/rc/2024-03-11 Release 2024-03-11	2024-03-11 14:41:39 +02:00
Roman Zaynetdinov	f0a9017008	Export db size, deadlocks and changed row metrics (#7050 ) ## Problem We want to report metrics for the oldest user database.	2024-03-11 11:55:06 +00:00
Christian Schwarz	bb7949ba00	Merge pull request #6993 from neondatabase/rc/2024-03-04 Release 2024-03-04	2024-03-04 13:08:44 +01:00
Arthur Petukhovsky	1df0f69664	Merge pull request #6973 from neondatabase/rc/2024-02-29-manual Release 2024-02-29	2024-02-29 17:26:33 +00:00
Vlad Lazar	970066a914	libs: fix expired token in auth decode test (#6963 ) The test token expired earlier today (1709200879). I regenerated the token, but without an expiration date this time.	2024-02-29 17:23:25 +00:00
Arthur Petukhovsky	1ebd3897c0	Merge pull request #6956 from neondatabase/rc/2024-02-28 Release 2024-02-28	2024-02-29 16:39:52 +00:00
Arthur Petukhovsky	6460beffcd	Merge pull request #6901 from neondatabase/rc/2024-02-26 Release 2024-02-26	2024-02-26 17:08:19 +00:00
John Spray	6f7f8958db	pageserver: only write out legacy tenant config if no generation (#6891 ) ## Problem Previously we always wrote out both legacy and modern tenant config files. The legacy write enabled rollbacks, but we are long past the point where that is needed. We still need the legacy format for situations where someone is running tenants without generations (that will be yanked as well eventually), but we can avoid writing it out at all if we do have a generation number set. We implicitly also avoid writing the legacy config if our mode is Secondary (secondary mode is newer than generations). ## Summary of changes - Make writing legacy tenant config conditional on there being no generation number set.	2024-02-26 10:25:25 +00:00
Christian Schwarz	936a00e077	pageserver: remove two obsolete/unused per-timeline metrics (#6893 ) over-compensating the addition of a new per-timeline metric in https://github.com/neondatabase/neon/pull/6834 part of https://github.com/neondatabase/neon/issues/6737	2024-02-26 09:16:24 +00:00
Nikita Kalyanov	96a4e8de66	Add /terminate API (#6745 ) (#6853 ) this is to speed up suspends, see https://github.com/neondatabase/cloud/issues/10284 Cherry-pick to release branch to build new compute images	2024-02-22 11:51:19 +02:00
Arseny Sher	01180666b0	Merge pull request #6803 from neondatabase/releases/2024-02-19 Release 2024-02-19	2024-02-19 16:38:35 +04:00
Conrad Ludgate	6c94269c32	Merge pull request #6758 from neondatabase/release-proxy-2024-02-14 2024-02-14 Proxy Release	2024-02-15 09:45:08 +00:00
Anna Khanova	edc691647d	Proxy: remove fail fast logic to connect to compute (#6759 ) ## Problem Flaky tests ## Summary of changes Remove failfast logic	2024-02-15 07:42:12 +00:00
Conrad Ludgate	855d7b4781	hold cancel session (#6750 ) ## Problem In a recent refactor, we accidentally dropped the cancel session early ## Summary of changes Hold the cancel session during proxy passthrough	2024-02-14 14:57:22 +00:00
Anna Khanova	c49c9707ce	Proxy: send cancel notifications to all instances (#6719 ) ## Problem If a cancel request ends up on the wrong proxy instance, it has no effect. ## Summary of changes Send redis notifications to all proxy pods about the cancel request. Related issue: https://github.com/neondatabase/neon/issues/5839, https://github.com/neondatabase/cloud/issues/10262	2024-02-14 14:57:22 +00:00
Anna Khanova	2227540a0d	Proxy refactor auth+connect (#6708 ) ## Problem Not really a problem, just refactoring. ## Summary of changes Separate authenticate from wake compute. Do not call wake compute a second time if we managed to connect to postgres or if we did not get it from the cache.	2024-02-14 14:57:22 +00:00
Conrad Ludgate	f1347f2417	proxy: add more http logging (#6726 ) ## Problem hard to see where time is taken during HTTP flow. ## Summary of changes add a lot more for query state. add a conn_id field to the sql-over-http span	2024-02-14 14:57:22 +00:00
Conrad Ludgate	30b295b017	proxy: some more parquet data (#6711 ) ## Summary of changes add auth_method and database to the parquet logs	2024-02-14 14:57:22 +00:00
Anna Khanova	1cef395266	Proxy: copy bidirectional fork (#6720 ) ## Problem `tokio::io::copy_bidirectional` doesn't close the connection once one of the sides closes it. It's not really suitable for the postgres protocol. ## Summary of changes Fork `copy_bidirectional` and initiate a shutdown for both connections. --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2024-02-14 14:57:22 +00:00
John Spray	78d160f76d	Merge pull request #6721 from neondatabase/releases/2024-02-12 Release 2024-02-12	2024-02-12 09:35:30 +00:00
Vlad Lazar	b9238059d6	Merge pull request #6617 from neondatabase/releases/2024-02-05 Release 2024-02-05	2024-02-05 12:50:38 +00:00
Arpad Müller	d0cb4b88c8	Don't preserve temp files on creation errors of delta layers (#6612 ) There is currently no cleanup done after a delta layer creation error, so delta layers can accumulate. The problem gets worse as the operation gets retried and delta layers accumulate on the disk. Therefore, delete them from disk (if something has been written to disk).	2024-02-05 09:58:18 +00:00
John Spray	1ec3e39d4e	Merge pull request #6504 from neondatabase/releases/2024-01-29 Release 2024-01-29	2024-01-29 10:05:01 +00:00
John Spray	a1a74eef2c	Merge pull request #6420 from neondatabase/releases/2024-01-22 Release 2024-01-22	2024-01-22 17:24:11 +00:00
John Spray	90e689adda	pageserver: mark tenant broken when cancelling attach (#6430 ) ## Problem When a tenant is in Attaching state, and waiting for the `concurrent_tenant_warmup` semaphore, it also listens for the tenant cancellation token. When that token fires, Tenant::attach drops out. Meanwhile, Tenant::set_stopping waits forever for the tenant to exit Attaching state. Fixes: https://github.com/neondatabase/neon/issues/6423 ## Summary of changes - In the absence of a valid state for the tenant, it is set to Broken in this path. A more elegant solution will require more refactoring, beyond this minimal fix. (cherry picked from commit `93572a3e99`)	2024-01-22 16:20:57 +00:00
Christian Schwarz	f0b2d4b053	fixup(#6037 ): actually fix the issue, #6388 failed to do so (#6429 ) Before this patch, the select! still returned immediately if `futs` was empty. Must have tested a stale build in my manual testing of #6388. (cherry picked from commit `15c0df4de7`)	2024-01-22 15:23:12 +00:00
Anna Khanova	299d9474c9	Proxy: fix gc (#6426 ) ## Problem Gc currently doesn't work properly. ## Summary of changes Change statement on running gc.	2024-01-22 14:39:09 +01:00
Conrad Ludgate	7234208b36	bump shlex (#6421 ) ## Problem https://rustsec.org/advisories/RUSTSEC-2024-0006 ## Summary of changes `cargo update -p shlex` (cherry picked from commit `5559b16953`)	2024-01-22 09:49:33 +00:00
Christian Schwarz	93450f11f5	Merge pull request #6354 from neondatabase/releases/2024-01-15 Release 2024-01-15 NB: the previous release PR https://github.com/neondatabase/neon/pull/6286 was accidentally merged by merge-by-squash instead of merge-by-merge-commit. See https://github.com/neondatabase/neon/pull/6354#issuecomment-1891706321 for more context.	2024-01-15 14:30:25 +01:00
Christian Schwarz	2f0f9edf33	Merge remote-tracking branch 'origin/release' into releases/2024-01-15	2024-01-15 09:36:42 +00:00
Christian Schwarz	d424f2b7c8	empty commit so we can produce a merge commit	2024-01-15 09:36:22 +00:00
Christian Schwarz	21315e80bc	Merge branch 'releases/2024-01-08--not-squashed' into releases/2024-01-15	2024-01-15 09:31:07 +00:00
vipvap	483b66d383	Merge branch 'release' into releases/2024-01-08 (not-squashed merge of #6286 ) Release PR https://github.com/neondatabase/neon/pull/6286 got accidentally merged-by-squash instead of merge-by-merge-commit. This commit shows how things would look if 6286 had been merged-by-squash. ``` git reset --hard `9f1327772` git merge --no-ff `5c0264b591` ``` Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-01-15 09:28:08 +00:00
vipvap	aa72a22661	Release 2024-01-08 (#6286 ) Release 2024-01-08	2024-01-08 09:26:27 +00:00
Shany Pozin	5c0264b591	Merge branch 'release' into releases/2024-01-08	2024-01-08 09:34:06 +02:00
Arseny Sher	9f13277729	Merge pull request #6242 from neondatabase/releases/2024-01-02 Release 2024-01-02	2024-01-02 12:04:43 +04:00
Arseny Sher	54aa319805	Don't split WAL record across two XLogData's when sending from safekeepers. As protocol demands. Not following this makes standby complain about corrupted WAL in various ways. https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719 closes https://github.com/neondatabase/cloud/issues/9057	2024-01-02 10:54:00 +04:00
Arseny Sher	4a227484bf	Add large insertion and slow WAL sending to test_hot_standby. To exercise MAX_SEND_SIZE sending from safekeeper; we've had a bug with WAL records torn across several XLogData messages. Add failpoint to safekeeper to slow down sending. Also check for corrupted WAL complaints in standby log. Make the test a bit simpler in passing, e.g. we don't need explicit commits as autocommit is enabled by default. https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719 https://github.com/neondatabase/cloud/issues/9057	2024-01-02 10:54:00 +04:00
Arseny Sher	2f83f85291	Add failpoint support to safekeeper. Just a copy paste from pageserver.	2024-01-02 10:54:00 +04:00
Arseny Sher	d6cfcb0d93	Move failpoint support code to utils. To enable them in safekeeper as well.	2024-01-02 10:54:00 +04:00
Arseny Sher	392843ad2a	Fix safekeeper START_REPLICATION (term=n). It was giving WAL only up to commit_lsn instead of flush_lsn, so recovery of uncommitted WAL since `cdb08f03` hung. Add test for this.	2024-01-02 10:54:00 +04:00
Arseny Sher	bd4dae8f4a	compute_ctl: kill postgres and sync-safekeepers on exit. Otherwise they are left orphaned when compute_ctl is terminated with a signal. It was invisible most of the time because normally neon_local or k8s kills postgres directly and then compute_ctl finishes gracefully. However, in some tests compute_ctl gets stuck waiting for sync-safekeepers which intentionally never ends because safekeepers are offline, and we want to stop compute_ctl without leaving orphans behind. This is quite a rough approach which doesn't wait for children termination. A better way would be to convert compute_ctl to async which would make waiting easy.	2024-01-02 10:54:00 +04:00
Shany Pozin	b05fe53cfd	Merge pull request #6240 from neondatabase/releases/2024-01-01 Release 2024-01-01	2024-01-01 11:07:30 +02:00
Christian Schwarz	c13a2f0df1	Merge pull request #6192 from neondatabase/releases/2023-12-19 Release 2023-12-19 We need to do a config change that requires restarting the pageservers. Slip in two metrics-related commits that didn't make this week's regular release.	2023-12-19 14:52:47 +01:00
Christian Schwarz	39be366fc5	higher resolution histograms for getpage@lsn (#6177 ) part of https://github.com/neondatabase/cloud/issues/7811	2023-12-19 13:46:59 +00:00
Christian Schwarz	6eda0a3158	[PRE-MERGE] fix metric `pageserver_initial_logical_size_start_calculation` (This is a pre-merge cherry-pick of https://github.com/neondatabase/neon/pull/6191) It wasn't being incremented. Fixup of commit `1c88824ed0` Author: Christian Schwarz <christian@neon.tech> Date: Fri Dec 1 12:52:59 2023 +0100 initial logical size calculation: add a bunch of metrics (#5995)	2023-12-19 13:46:55 +00:00
Shany Pozin	306c7a1813	Merge pull request #6173 from neondatabase/sasha_release_bypassrls_replication Grant BYPASSRLS and REPLICATION explicitly to neon_superuser roles	2023-12-18 22:16:36 +02:00
Sasha Krassovsky	80be423a58	Grant BYPASSRLS and REPLICATION explicitly to neon_superuser roles	2023-12-18 10:22:36 -08:00
Shany Pozin	5dcfef82f2	Merge pull request #6163 from neondatabase/releases/2023-12-18 Release 2023-12-18-2	2023-12-18 15:34:17 +02:00
Christian Schwarz	e67b8f69c0	[PRE-MERGE] pageserver: Reduce tracing overhead in timeline::get #6115 Pre-merge `git merge --squash` of https://github.com/neondatabase/neon/pull/6115 Lowering the tracing level in get_value_reconstruct_data and get_or_maybe_download from info to debug reduces the overhead of span creation in non-debug environments.	2023-12-18 13:39:48 +01:00
Shany Pozin	e546872ab4	Merge pull request #6158 from neondatabase/releases/2023-12-18 Release 2023-12-18	2023-12-18 14:24:34 +02:00
John Spray	322ea1cf7c	pageserver: on-demand activation cleanups (#6157 ) ## Problem #6112 added some logs and metrics: clean these up a bit: - Avoid counting startup completions for tenants launched after startup - exclude no-op cases from timing histograms - remove a rogue log message	2023-12-18 11:14:19 +00:00
Vadim Kharitonov	3633742de9	Merge pull request #6121 from neondatabase/releases/2023-12-13 Release 2023-12-13	2023-12-13 12:39:43 +01:00
Joonas Koivunen	079d3a37ba	Merge remote-tracking branch 'origin/release' into releases/2023-12-13 this handles the hotfix introduced conflict.	2023-12-13 10:07:19 +00:00
Vadim Kharitonov	a46e77b476	Merge pull request #6090 from neondatabase/releases/2023-12-11 Release 2023-12-11	2023-12-12 12:10:35 +01:00
Tristan Partin	a92702b01e	Add submodule paths as safe directories as a precaution The check-codestyle-rust-arm job requires this for some reason, so let's just add them everywhere we do this workaround.	2023-12-11 22:00:35 +00:00
Tristan Partin	8ff3253f20	Fix git ownership issue in check-codestyle-rust-arm We have this workaround for other jobs. Looks like this one was forgotten about.	2023-12-11 22:00:35 +00:00
Joonas Koivunen	04b82c92a7	fix: accidental return Ok (#6106 ) Error indicating request cancellation OR timeline shutdown was deemed as a reason to exit the background worker that calculated synthetic size. Fix it to only be considered for avoiding logging of such errors. This conflicted on tenant_shard_id having already replaced tenant_id on `main`.	2023-12-11 21:41:36 +00:00
Vadim Kharitonov	e5bf423e68	Merge branch 'release' into releases/2023-12-11	2023-12-11 11:55:48 +01:00
Vadim Kharitonov	60af392e45	Merge pull request #6057 from neondatabase/vk/patch_timescale_for_production Revert timescaledb for pg14 and pg15 (#6056)	2023-12-06 16:21:16 +01:00
Vadim Kharitonov	661fc41e71	Revert timescaledb for pg14 and pg15 (#6056 ) ``` could not start the compute node: compute is in state "failed": db error: ERROR: could not access file "$libdir/timescaledb-2.10.1": No such file or directory Caused by: ERROR: could not access file "$libdir/timescaledb-2.10.1": No such file or directory ```	2023-12-06 16:14:07 +01:00
Shany Pozin	702c488f32	Merge pull request #6022 from neondatabase/releases/2023-12-04 Release 2023-12-04	2023-12-05 17:03:28 +02:00
Sasha Krassovsky	45c5122754	Remove trusted from wal2json	2023-12-04 12:36:19 -08:00
Shany Pozin	558394f710	fix merge	2023-12-04 11:41:27 +02:00
Shany Pozin	73b0898608	Merge branch 'release' into releases/2023-12-04	2023-12-04 11:36:26 +02:00
Joonas Koivunen	e65be4c2dc	Merge pull request #6013 from neondatabase/releases/2023-12-01-hotfix fix: use create_new instead of create for mutex file	2023-12-01 15:35:56 +02:00
Joonas Koivunen	40087b8164	fix: use create_new instead of create for mutex file	2023-12-01 12:54:49 +00:00
Shany Pozin	c762b59483	Merge pull request #5986 from neondatabase/Release-11-30-hotfix Notify safekeeper readiness with systemd.	2023-11-30 10:01:05 +02:00
Arseny Sher	5d71601ca9	Notify safekeeper readiness with systemd. To avoid downtime during deploy, as in busy regions initial load can currently take ~30s.	2023-11-30 08:23:31 +03:00
Shany Pozin	a113c3e433	Merge pull request #5945 from neondatabase/release-2023-11-28-hotfix Release 2023 11 28 hotfix	2023-11-28 08:14:59 +02:00
Anastasia Lubennikova	e81fc598f4	Update neon extension relocatable for existing installations (#5943 )	2023-11-28 00:12:39 +00:00
Anastasia Lubennikova	48b845fa76	Make neon extension relocatable to allow SET SCHEMA (#5942 )	2023-11-28 00:12:32 +00:00
Shany Pozin	27096858dc	Merge pull request #5922 from neondatabase/releases/2023-11-27 Release 2023-11-27	2023-11-27 09:58:51 +02:00
Shany Pozin	4430d0ae7d	Merge pull request #5876 from neondatabase/releases/2023-11-17 Release 2023-11-17	2023-11-20 09:11:58 +02:00
Joonas Koivunen	6e183aa0de	Merge branch 'main' into releases/2023-11-17	2023-11-19 15:25:47 +00:00
Vadim Kharitonov	fd6d0b7635	Merge branch 'release' into releases/2023-11-17	2023-11-17 10:51:45 +01:00
Vadim Kharitonov	3710c32aae	Merge pull request #5778 from neondatabase/releases/2023-11-03 Release 2023-11-03	2023-11-03 16:06:58 +01:00
Vadim Kharitonov	be83bee49d	Merge branch 'release' into releases/2023-11-03	2023-11-03 11:18:15 +01:00
Alexander Bayandin	cf28e5922a	Merge pull request #5685 from neondatabase/releases/2023-10-26 Release 2023-10-26	2023-10-27 10:42:12 +01:00
Em Sharnoff	7d384d6953	Bump vm-builder v0.18.2 -> v0.18.4 (#5666 ) Only applicable change was neondatabase/autoscaling#584, setting pgbouncer auth_dbname=postgres to stop superuser connections from preventing databases from being dropped.	2023-10-26 20:15:45 +01:00
Em Sharnoff	4b3b37b912	Bump vm-builder v0.18.1 -> v0.18.2 (#5646 ) Only applicable change was neondatabase/autoscaling#571, removing the postgres_exporter flags `--auto-discover-databases` and `--exclude-databases=...`	2023-10-26 20:15:29 +01:00
Shany Pozin	1d8d200f4d	Merge pull request #5668 from neondatabase/sp/aux_files_cherry_pick Cherry pick: Ignore missed AUX_FILES_KEY when generating image layer (#5660)	2023-10-26 10:08:16 +03:00
Konstantin Knizhnik	0d80d6ce18	Ignore missed AUX_FILES_KEY when generating image layer (#5660 ) ## Problem Logical replication requires the new AUX_FILES_KEY, which is definitely absent in existing databases. We do not have a function to check whether a key exists in our KV storage, so I have to handle the error in the `list_aux_files` method. But this key is also included in the key space range and accessed by the `create_image_layer` method. ## Summary of changes Check if AUX_FILES_KEY exists before including it in keyspace. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Shany Pozin <shany@neon.tech> Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-10-26 09:30:28 +03:00
Shany Pozin	f653ee039f	Merge pull request #5638 from neondatabase/releases/2023-10-24 Release 2023-10-24	2023-10-24 12:10:52 +03:00
Em Sharnoff	e614a95853	Merge pull request #5610 from neondatabase/sharnoff/rc-2023-10-20-vm-monitor-fixes Release 2023-10-20: vm-monitor memory.high throttling fixes	2023-10-20 00:11:06 -07:00
Em Sharnoff	850db4cc13	vm-monitor: Deny not fail downscale if no memory stats yet (#5606 ) Fixes an issue we observed on staging that happens when the autoscaler-agent attempts to immediately downscale the VM after binding, which is typical for pooled computes. The issue was occurring because the autoscaler-agent was requesting downscaling before the vm-monitor had gathered sufficient cgroup memory stats to be confident in approving it. When the vm-monitor returned an internal error instead of denying downscaling, the autoscaler-agent retried the connection and immediately hit the same issue (in part because cgroup stats are collected per-connection, rather than globally).	2023-10-19 21:56:55 -07:00
Em Sharnoff	8a316b1277	vm-monitor: Log full error on message handling failure (#5604 ) There's currently an issue with the vm-monitor on staging that's not really feasible to debug because the current display impl gives no context to the errors (just says "failed to downscale"). Logging the full error should help. For communications with the autoscaler-agent, it's ok to only provide the outermost cause, because we can cross-reference with the VM logs. At some point in the future, we may want to change that.	2023-10-19 21:56:50 -07:00
Em Sharnoff	4d13bae449	vm-monitor: Switch from memory.high to polling memory.stat (#5524 ) tl;dr it's really hard to avoid throttling from memory.high, and it counts tmpfs & page cache usage, so it's also hard to make sense of. In the interest of fixing things quickly with something that should be good enough, this PR switches to instead periodically fetch memory statistics from the cgroup's memory.stat and use that data to determine if and when we should upscale. This PR fixes #5444, which has a lot more detail on the difficulties we've hit with memory.high. This PR also supersedes #5488.	2023-10-19 21:56:36 -07:00
Vadim Kharitonov	49377abd98	Merge pull request #5577 from neondatabase/releases/2023-10-17 Release 2023-10-17	2023-10-17 12:21:20 +02:00
Christian Schwarz	a6b2f4e54e	limit imitate accesses concurrency, using same semaphore as compactions (#5578 ) Before this PR, when we restarted pageserver, we'd see a rush of `$number_of_tenants` concurrent eviction tasks starting to do imitate accesses building up in the period of `[init_order allows activations, $random_access_delay + EvictionPolicyLayerAccessThreshold::period]`. We simply cannot handle that degree of concurrent IO. We already solved the problem for compactions by adding a semaphore. So, this PR shares that semaphore for use by evictions. Part of https://github.com/neondatabase/neon/issues/5479 Which is again part of https://github.com/neondatabase/neon/issues/4743 Risks / Changes In System Behavior ================================== * we don't do evictions as timely as we currently do * we log a bunch of warnings about eviction taking too long * imitate accesses and compactions compete for the same concurrency limit, so, they'll slow each other down through this shared semaphore Changes ======= - Move the `CONCURRENT_COMPACTIONS` semaphore into `tasks.rs` - Rename it to `CONCURRENT_BACKGROUND_TASKS` - Use it also for the eviction imitate accesses: - Imitate accesses are both per-TIMELINE and per-TENANT - The per-TENANT is done through coalescing all the per-TIMELINE tasks via a tokio mutex `eviction_task_tenant_state`. - We acquire the CONCURRENT_BACKGROUND_TASKS permit early, at the beginning of the eviction iteration, much before the imitate accesses start (and they may not even start at all in the given iteration, as they happen only every $threshold). - Acquiring early is sub-optimal because when the per-timeline tasks coalesce on the `eviction_task_tenant_state` mutex, they are already holding a CONCURRENT_BACKGROUND_TASKS permit. - It's also unfair because tenants with many timelines win the CONCURRENT_BACKGROUND_TASKS more often. 
- I don't think there's another way though, without refactoring more of the imitate accesses logic, e.g, making it all per-tenant. - Add metrics for queue depth behind the semaphore. I found these very useful to understand what work is queued in the system. - The metrics are tagged by the new `BackgroundLoopKind`. - On a green slate, I would have used `TaskKind`, but we already had pre-existing labels whose names didn't map exactly to task kind. Also the task kind is kind of a lower-level detail, so, I think it's fine to have a separate enum to identify background work kinds. Future Work =========== I guess I could move the eviction tasks from a ticker to "sleep for $period". The benefit would be that the semaphore automatically "smears" the eviction task scheduling over time, so, we only have the rush on restart but a smeared-out rush afterward. The downside is that this perverts the meaning of "$period", as we'd actually not run the eviction at a fixed period. It also means the "took too long" warning & metric becomes meaningless. Then again, that is already the case for the compaction and gc tasks, which do sleep for `$period` instead of using a ticker. (cherry picked from commit `9256788273`)	2023-10-17 12:16:26 +02:00
Shany Pozin	face60d50b	Merge pull request #5526 from neondatabase/releases/2023-10-11 Release 2023-10-11	2023-10-11 11:16:39 +03:00
Shany Pozin	9768aa27f2	Merge pull request #5516 from neondatabase/releases/2023-10-10 Release 2023-10-10	2023-10-10 14:16:47 +03:00
Shany Pozin	96b2e575e1	Merge pull request #5445 from neondatabase/releases/2023-10-03 Release 2023-10-03	2023-10-04 13:53:37 +03:00
Alexander Bayandin	7222777784	Update checksums for pg_jsonschema & pg_graphql (#5455 ) ## Problem Folks have re-tagged releases for `pg_jsonschema` and `pg_graphql` (to increase timeouts on their CI). For us, these are no-op changes, but unfortunately, this will cause our builds to fail due to checksum mismatches (this might not strike right away because of the build cache). - `8ba7c7be9d` - `aa7509370a` ## Summary of changes - `pg_jsonschema` update checksum - `pg_graphql` update checksum	2023-10-03 18:44:30 +01:00
Em Sharnoff	5469fdede0	Merge pull request #5422 from neondatabase/sharnoff/rc-2023-09-28-fix-restart-on-postmaster-SIGKILL Release 2023-09-28: Fix (lack of) restart on neonvm postmaster SIGKILL	2023-09-28 10:48:51 -07:00
MMeent	72aa6b9fdd	Fix neon_zeroextend's WAL logging (#5387 ) When you log more than a few blocks, you need to reserve the space in advance. We didn't do that, so we got errors. Now we do that, and shouldn't get errors.	2023-09-28 09:37:28 -07:00
Em Sharnoff	ae0634b7be	Bump vm-builder v0.17.11 -> v0.17.12 (#5407 ) Only relevant change is neondatabase/autoscaling#534 - refer there for more details.	2023-09-28 09:28:04 -07:00
Shany Pozin	70711f32fa	Merge pull request #5375 from neondatabase/releases/2023-09-26 Release 2023-09-26	2023-09-26 15:19:45 +03:00
Vadim Kharitonov	52a88af0aa	Merge pull request #5336 from neondatabase/releases/2023-09-19 Release 2023-09-19	2023-09-19 11:16:43 +02:00
Alexander Bayandin	b7a43bf817	Merge branch 'release' into releases/2023-09-19	2023-09-19 09:07:20 +01:00
Alexander Bayandin	dce91b33a4	Merge pull request #5318 from neondatabase/releases/2023-09-15-1 Postgres 14/15: Use previous extensions versions	2023-09-15 16:30:44 +01:00
Alexander Bayandin	23ee4f3050	Revert plv8 only	2023-09-15 15:45:23 +01:00
Alexander Bayandin	46857e8282	Postgres 14/15: Use previous extensions versions	2023-09-15 15:27:00 +01:00
Alexander Bayandin	368ab0ce54	Merge pull request #5313 from neondatabase/releases/2023-09-15 Release 2023-09-15	2023-09-15 10:39:56 +01:00
Konstantin Knizhnik	a5987eebfd	References to old and new blocks were mixed in xlog_heap_update handler (#5312 ) ## Problem See https://neondb.slack.com/archives/C05L7D1JAUS/p1694614585955029 https://www.notion.so/neondatabase/Duplicate-key-issue-651627ce843c45188fbdcb2d30fd2178 ## Summary of changes Swap old/new block references --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-09-15 10:11:41 +01:00
Alexander Bayandin	6686ede30f	Update checksum for pg_hint_plan (#5309 ) ## Problem The checksum for `pg_hint_plan` doesn't match: ``` sha256sum: WARNING: 1 computed checksum did NOT match ``` Ref https://github.com/neondatabase/neon/actions/runs/6185715461/job/16793609251?pr=5307 It seems that the release was retagged yesterday: https://github.com/ossc-db/pg_hint_plan/releases/tag/REL16_1_6_0 I don't see any malicious changes from 15_1.5.1: https://github.com/ossc-db/pg_hint_plan/compare/REL15_1_5_1...REL16_1_6_0, so it should be ok to update. ## Summary of changes - Update checksum for `pg_hint_plan` 16_1.6.0	2023-09-15 09:54:42 +01:00
Em Sharnoff	373c7057cc	vm-monitor: Fix cgroup throttling (#5303 ) I believe this (not actual IO problems) is the cause of the "disk speed issue" that we've had for VMs recently. See e.g.: 1. https://neondb.slack.com/archives/C03H1K0PGKH/p1694287808046179?thread_ts=1694271790.580099&cid=C03H1K0PGKH 2. https://neondb.slack.com/archives/C03H1K0PGKH/p1694511932560659 The vm-informant (and now, the vm-monitor, its replacement) is supposed to gradually increase the `neon-postgres` cgroup's memory.high value, because otherwise the kernel will throttle all the processes in the cgroup. This PR fixes a bug with the vm-monitor's implementation of this behavior. --- Other references, for the vm-informant's implementation: - Original issue: neondatabase/autoscaling#44 - Original PR: neondatabase/autoscaling#223	2023-09-15 09:54:42 +01:00
Shany Pozin	7d6ec16166	Merge pull request #5296 from neondatabase/releases/2023-09-13 Release 2023-09-13	2023-09-13 13:49:14 +03:00
Shany Pozin	0e6fdc8a58	Merge pull request #5283 from neondatabase/releases/2023-09-12 Release 2023-09-12	2023-09-12 14:56:47 +03:00
Christian Schwarz	521438a5c6	fix deadlock around TENANTS (#5285 ) The sequence that can lead to a deadlock: 1. DELETE request gets all the way to `tenant.shutdown(progress, false).await.is_err() ` , while holding TENANTS.read() 2. POST request for tenant creation comes in, calls `tenant_map_insert`, it does `let mut guard = TENANTS.write().await;` 3. Something that `tenant.shutdown()` needs to wait for needs a `TENANTS.read().await`. The only case identified in exhaustive manual scanning of the code base is this one: Imitate size access does `get_tenant().await`, which does `TENANTS.read().await` under the hood. In the above case (1) waits for (3), (3)'s read-lock request is queued behind (2)'s write-lock, and (2) waits for (1). Deadlock. I made a reproducer/proof-that-above-hypothesis-holds in https://github.com/neondatabase/neon/pull/5281 , but, it's not ready for merge yet and we want the fix _now_. fixes https://github.com/neondatabase/neon/issues/5284	2023-09-12 14:13:13 +03:00
Vadim Kharitonov	07d7874bc8	Merge pull request #5202 from neondatabase/releases/2023-09-05 Release 2023-09-05	2023-09-05 12:16:06 +02:00
Anastasia Lubennikova	1804111a02	Merge pull request #5161 from neondatabase/rc-2023-08-31 Release 2023-08-31	2023-08-31 16:53:17 +03:00
Arthur Petukhovsky	cd0178efed	Merge pull request #5150 from neondatabase/release-sk-fix-active-timeline Release 2023-08-30	2023-08-30 11:43:39 +02:00
Shany Pozin	333574be57	Merge pull request #5133 from neondatabase/releases/2023-08-29 Release 2023-08-29	2023-08-29 14:02:58 +03:00
Alexander Bayandin	79a799a143	Merge branch 'release' into releases/2023-08-29	2023-08-29 11:17:57 +01:00
Conrad Ludgate	9da06af6c9	Merge pull request #5113 from neondatabase/release-http-connection-fix Release 2023-08-25	2023-08-25 17:21:35 +01:00
Conrad Ludgate	ce1753d036	proxy: dont return connection pending (#5107 ) ## Problem We were returning Pending when a connection had a notice/notification (introduced recently in #5020). When returning pending, the runtime assumes you will call `cx.waker().wake()` in order to continue processing. We weren't doing that, so the connection task would get stuck ## Summary of changes Don't return pending. Loop instead	2023-08-25 16:42:30 +01:00
Alek Westover	67db8432b4	Fix cargo deny errors (#5068 ) ## Problem cargo deny lint broken Links to the CVEs: [rustsec.org/advisories/RUSTSEC-2023-0052](https://rustsec.org/advisories/RUSTSEC-2023-0052) [rustsec.org/advisories/RUSTSEC-2023-0053](https://rustsec.org/advisories/RUSTSEC-2023-0053) One is fixed, the other one isn't so we allow it (for now), to unbreak CI. Then later we'll try to get rid of webpki in favour of the rustls fork. ## Summary of changes ``` +ignore = ["RUSTSEC-2023-0052"] ```	2023-08-25 16:42:30 +01:00
Vadim Kharitonov	4e2e44e524	Enable neon-pool-opt-in (#5062 )	2023-08-22 09:06:14 +01:00
Vadim Kharitonov	ed786104f3	Merge pull request #5060 from neondatabase/releases/2023-08-22 Release 2023-08-22	2023-08-22 09:41:02 +02:00
Stas Kelvich	84b74f2bd1	Merge pull request #4997 from neondatabase/sk/proxy-release-23-07-15 Fix lint	2023-08-15 18:54:20 +03:00
Arthur Petukhovsky	fec2ad6283	Fix lint	2023-08-15 18:49:02 +03:00
Stas Kelvich	98eebd4682	Merge pull request #4996 from neondatabase/sk/proxy_release Disable neon-pool-opt-in	2023-08-15 18:37:50 +03:00
Arthur Petukhovsky	2f74287c9b	Disable neon-pool-opt-in	2023-08-15 18:34:17 +03:00
Shany Pozin	aee1bf95e3	Merge pull request #4990 from neondatabase/releases/2023-08-15 Release 2023-08-15	2023-08-15 15:34:38 +03:00
Shany Pozin	b9de9d75ff	Merge branch 'release' into releases/2023-08-15	2023-08-15 14:35:00 +03:00
Stas Kelvich	7943b709e6	Merge pull request #4940 from neondatabase/sk/release-23-05-25-proxy-fixup Release: proxy retry fixup	2023-08-09 13:53:19 +03:00
Conrad Ludgate	d7d066d493	proxy: delay auth on retry (#4929 ) ## Problem When an endpoint is shutting down, it can take a few seconds. Currently when starting a new compute, this causes an "endpoint is in transition" error. We need to add delays before retrying to ensure that we allow time for the endpoint to shutdown properly. ## Summary of changes Adds a delay before retrying in auth. connect_to_compute already has this delay	2023-08-09 12:54:24 +03:00
Felix Prasanna	e78ac22107	release fix: revert vm builder bump from 0.13.1 -> 0.15.0-alpha1 (#4932 ) This reverts commit `682dfb3a31`. hotfix for a CLI arg issue in the monitor	2023-08-08 21:08:46 +03:00
Vadim Kharitonov	76a8f2bb44	Merge pull request #4923 from neondatabase/releases/2023-08-08 Release 2023-08-08	2023-08-08 11:44:38 +02:00
Vadim Kharitonov	8d59a8581f	Merge branch 'release' into releases/2023-08-08	2023-08-08 10:54:34 +02:00
Vadim Kharitonov	b1ddd01289	Define NEON_SMGR to make it possible for extensions to use Neon SMG API (#4889 ) Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-08-03 16:28:31 +03:00
Alexander Bayandin	6eae4fc9aa	Release 2023-08-02: update pg_embedding (#4877 ) Cherry-picking `ca4d71a954` from `main` into the `release` Co-authored-by: Vadim Kharitonov <vadim2404@users.noreply.github.com>	2023-08-03 08:48:09 +02:00
Christian Schwarz	765455bca2	Merge pull request #4861 from neondatabase/releases/2023-08-01--2-fix-pipeline ci: fix upload-postgres-extensions-to-s3 job	2023-08-01 13:22:07 +02:00
Christian Schwarz	4204960942	ci: fix upload-postgres-extensions-to-s3 job commit commit `5f8fd640bf` Author: Alek Westover <alek.westover@gmail.com> Date: Wed Jul 26 08:24:03 2023 -0400 Upload Test Remote Extensions (#4792) switched to using the release tag instead of `latest`, but, the `promote-images` job only uploads `latest` to the prod ECR. The switch to using release tag was good in principle, but, reverting that part to make the release pipeline work. Note that a proper fix should abandon use of `:latest` tag at all: currently, if a `main` pipeline runs concurrently with a `release` pipeline, the `release` pipeline may end up using the `main` pipeline's images.	2023-08-01 12:01:45 +02:00
Christian Schwarz	67345d66ea	Merge pull request #4858 from neondatabase/releases/2023-08-01 Release 2023-08-01	2023-08-01 10:44:01 +02:00
Shany Pozin	2266ee5971	Merge pull request #4803 from neondatabase/releases/2023-07-25 Release 2023-07-25	2023-07-25 14:21:07 +03:00
Shany Pozin	b58445d855	Merge pull request #4746 from neondatabase/releases/2023-07-18 Release 2023-07-18	2023-07-18 14:45:39 +03:00
Conrad Ludgate	36050e7f3d	Merge branch 'release' into releases/2023-07-18	2023-07-18 12:00:09 +01:00
Alexander Bayandin	33360ed96d	Merge pull request #4705 from neondatabase/release-2023-07-12 Release 2023-07-12 (only proxy)	2023-07-12 19:44:36 +01:00
Conrad Ludgate	39a28d1108	proxy wake_compute loop (#4675 ) ## Problem If we fail to wake up the compute node, a subsequent connect attempt will definitely fail. However, kubernetes won't fail the connection immediately, instead it hangs until we timeout (10s). ## Summary of changes Refactor the loop to allow fast retries of compute_wake and to skip a connect attempt.	2023-07-12 18:40:11 +01:00
Conrad Ludgate	efa6aa134f	allow repeated IO errors from compute node (#4624 ) ## Problem #4598 compute nodes are not accessible some time after wake up due to kubernetes DNS not being fully propagated. ## Summary of changes Update connect retry mechanism to support handling IO errors and sleeping for 100ms ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-07-12 18:40:06 +01:00
Alexander Bayandin	2c724e56e2	Merge pull request #4646 from neondatabase/releases/2023-07-06-hotfix Release 2023-07-06 (add pg_embedding extension only)	2023-07-06 12:19:52 +01:00
Alexander Bayandin	feff887c6f	Compile `pg_embedding` extension (#4634 ) ``` CREATE EXTENSION embedding; CREATE TABLE t (val real[]); INSERT INTO t (val) VALUES ('{0,0,0}'), ('{1,2,3}'), ('{1,1,1}'), (NULL); CREATE INDEX ON t USING hnsw (val) WITH (maxelements = 10, dims=3, m=3); INSERT INTO t (val) VALUES (array[1,2,4]); SELECT * FROM t ORDER BY val <-> array[3,3,3]; val --------- {1,2,3} {1,2,4} {1,1,1} {0,0,0} (5 rows) ```	2023-07-06 09:39:41 +01:00
Vadim Kharitonov	353d915fcf	Merge pull request #4633 from neondatabase/releases/2023-07-05 Release 2023-07-05	2023-07-05 15:10:47 +02:00
Vadim Kharitonov	2e38098cbc	Merge branch 'release' into releases/2023-07-05	2023-07-05 12:41:48 +02:00
Vadim Kharitonov	a6fe5ea1ac	Merge pull request #4571 from neondatabase/releases/2023-06-27 Release 2023-06-27	2023-06-27 12:55:33 +02:00
Vadim Kharitonov	05b0aed0c1	Merge branch 'release' into releases/2023-06-27	2023-06-27 12:22:12 +02:00
Alex Chi Z	cd1705357d	Merge pull request #4561 from neondatabase/releases/2023-06-23-hotfix Release 2023-06-23 (pageserver-only)	2023-06-23 15:38:50 -04:00
Christian Schwarz	6bc7561290	don't use MGMT_REQUEST_RUNTIME for consumption metrics synthetic size worker The consumption metrics synthetic size worker does logical size calculation. Logical size calculation currently does synchronous disk IO. This blocks the MGMT_REQUEST_RUNTIME's executor threads, starving other futures. While there's work on the way to move the synchronous disk IO into spawn_blocking, the quickfix here is to use the BACKGROUND_RUNTIME instead of MGMT_REQUEST_RUNTIME. Actually it's not just a quickfix. We simply shouldn't be blocking MGMT_REQUEST_RUNTIME executor threads on CPU or sync disk IO. That work isn't done yet, as many of the mgmt tasks still _do_ disk IO. But it's not as intensive as the logical size calculations that we're fixing here. While we're at it, fix disk-usage-based eviction in a similar way. It wasn't the culprit here, according to prod logs, but it can theoretically be a little CPU-intensive. More context, including graphs from Prod: https://neondb.slack.com/archives/C03F5SM1N02/p1687541681336949 (cherry picked from commit `d6e35222ea`)	2023-06-23 20:54:07 +02:00
Christian Schwarz	fbd3ac14b5	Merge pull request #4544 from neondatabase/releases/2023-06-21-hotfix Release 2023-06-21 (fixup for post-merge failed 2023-06-20)	2023-06-21 16:54:34 +03:00
Christian Schwarz	e437787c8f	cargo update -p openssl (#4542 ) To unblock release https://github.com/neondatabase/neon/pull/4536#issuecomment-1600678054 Context: https://rustsec.org/advisories/RUSTSEC-2023-0044	2023-06-21 15:52:56 +03:00
Christian Schwarz	3460dbf90b	Merge pull request #4536 from neondatabase/releases/2023-06-20 Release 2023-06-20 (actually 2023-06-21)	2023-06-21 14:19:14 +03:00
Vadim Kharitonov	6b89d99677	Merge pull request #4521 from neondatabase/release_2023-06-15 Release 2023 06 15	2023-06-15 17:40:01 +02:00
Vadim Kharitonov	6cc8ea86e4	Merge branch 'main' into release_2023-06-15	2023-06-15 16:50:44 +02:00
Shany Pozin	e62a492d6f	Merge pull request #4486 from neondatabase/releases/2023-06-13 Release 2023-06-13	2023-06-13 15:21:35 +03:00
Alexey Kondratov	a475cdf642	[compute_ctl] Fix logging if catalog updates are skipped (#4480 ) Otherwise, it wasn't clear from the log when Postgres started up completely if catalog updates were skipped. Follow-up for `4936ab6`	2023-06-13 13:37:24 +02:00
Stas Kelvich	7002c79a47	Merge pull request #4447 from neondatabase/release_proxy_08-06-2023 Release proxy 08 06 2023	2023-06-08 21:02:54 +03:00
Vadim Kharitonov	ee6cf357b4	Merge pull request #4427 from neondatabase/releases/2023-06-06 Release 2023-06-06	2023-06-06 14:42:21 +02:00
Vadim Kharitonov	e5c2086b5f	Merge branch 'release' into releases/2023-06-06	2023-06-06 12:33:56 +02:00
Shany Pozin	5f1208296a	Merge pull request #4395 from neondatabase/releases/2023-06-01 Release 2023-06-01	2023-06-01 10:58:00 +03:00
Stas Kelvich	88e8e473cd	Merge pull request #4345 from neondatabase/release-23-05-25-proxy Release 23-05-25, take 3	2023-05-25 19:40:43 +03:00
Stas Kelvich	b0a77844f6	Add SQL-over-HTTP endpoint to Proxy This commit introduces an SQL-over-HTTP endpoint in the proxy, with a JSON response structure resembling that of the node-postgres driver. This method, using HTTP POST, achieves smaller amortized latencies in edge setups due to fewer round trips and an enhanced open connection reuse by the v8 engine. This update involves several intricacies: 1. SQL injection protection: We employed the extended query protocol, modifying the rust-postgres driver to send queries in one roundtrip using a text protocol rather than binary, bypassing potential issues like those identified in https://github.com/sfackler/rust-postgres/issues/1030. 2. Postgres type compatibility: As not all postgres types have binary representations (e.g., acl's in pg_class), we adjusted rust-postgres to respond with text protocol, simplifying serialization and fixing queries with text-only types in response. 3. Data type conversion: Considering JSON supports fewer data types than Postgres, we perform conversions where possible, passing all other types as strings. Key conversions include: - postgres int2, int4, float4, float8 -> json number (NaN and Inf remain text) - postgres bool, null, text -> json bool, null, string - postgres array -> json array - postgres json and jsonb -> json object 4. Alignment with node-postgres: To facilitate integration with js libraries, we've matched the response structure of node-postgres, returning command tags and column oids. Command tag capturing was added to the rust-postgres functionality as part of this change.	2023-05-25 17:59:17 +03:00
Vadim Kharitonov	1baf464307	Merge pull request #4309 from neondatabase/releases/2023-05-23 Release 2023-05-23	2023-05-24 11:56:54 +02:00
Alexander Bayandin	e9b8e81cea	Merge branch 'release' into releases/2023-05-23	2023-05-23 12:54:08 +01:00
Alexander Bayandin	85d6194aa4	Fix regress-tests job for Postgres 15 on release branch (#4254 ) ## Problem Compatibility tests don't support Postgres 15 yet, but we're still trying to upload compatibility snapshot (which we do not collect). Ref https://github.com/neondatabase/neon/actions/runs/4991394158/jobs/8940369368#step:4:38129 ## Summary of changes Add `pg_version` parameter to `run-python-test-set` actions and do not upload compatibility snapshot for Postgres 15	2023-05-16 17:19:12 +01:00
Vadim Kharitonov	333a7a68ef	Merge pull request #4245 from neondatabase/releases/2023-05-16 Release 2023-05-16	2023-05-16 13:38:40 +02:00
Vadim Kharitonov	6aa4e41bee	Merge branch 'release' into releases/2023-05-16	2023-05-16 12:48:23 +02:00
Joonas Koivunen	840183e51f	try: higher page_service timeouts to isolate an issue	2023-05-11 16:24:53 +03:00
Shany Pozin	cbccc94b03	Merge pull request #4184 from neondatabase/releases/2023-05-09 Release 2023-05-09	2023-05-09 15:30:36 +03:00
Stas Kelvich	fce227df22	Merge pull request #4163 from neondatabase/main Release 23-05-05	2023-05-05 15:56:23 +03:00
Stas Kelvich	bd787e800f	Merge pull request #4133 from neondatabase/main Release 23-04-01	2023-05-01 18:52:46 +03:00
Shany Pozin	4a7704b4a3	Merge pull request #4131 from neondatabase/sp/hotfix_adding_sks_us_west Hotfix: Adding 4 new pageservers and two sets of safekeepers to us west 2	2023-05-01 15:17:38 +03:00
Shany Pozin	ff1119da66	Add 2 new sets of safekeepers to us-west2	2023-05-01 14:35:31 +03:00
Shany Pozin	4c3ba1627b	Add 4 new Pageservers for retool launch	2023-05-01 14:34:38 +03:00
Vadim Kharitonov	1407174fb2	Merge pull request #4110 from neondatabase/vk/release_2023-04-28 Release 2023 04 28	2023-04-28 17:43:16 +02:00
Vadim Kharitonov	ec9dcb1889	Merge branch 'release' into vk/release_2023-04-28	2023-04-28 16:32:26 +02:00
Joonas Koivunen	d11d781afc	revert: "Add check for duplicates of generated image layers" (#4104 ) This reverts commit `732acc5`. Reverted PR: #3869 As noted in PR #4094, we do in fact try to insert duplicates to the layer map, if L0->L1 compaction is interrupted. We do not have a proper fix for that right now, and we are in a hurry to make a release to production, so revert the changes related to this to the state that we have in production currently. We know that we have a bug here, but better to live with the bug that we've had in production for a long time, than rush a fix to production without testing it in staging first. Cc: #4094, #4088	2023-04-28 16:31:35 +02:00
Anastasia Lubennikova	4e44565b71	Merge pull request #4000 from neondatabase/releases/2023-04-11 Release 2023-04-11	2023-04-11 17:47:41 +03:00
Stas Kelvich	4ed51ad33b	Add more proxy cnames	2023-04-11 15:59:35 +03:00
Arseny Sher	1c1ebe5537	Merge pull request #3946 from neondatabase/releases/2023-04-04 Release 2023-04-04	2023-04-04 14:38:40 +04:00
Christian Schwarz	c19cb7f386	Merge pull request #3935 from neondatabase/releases/2023-04-03 Release 2023-04-03	2023-04-03 16:19:49 +02:00
Vadim Kharitonov	4b97d31b16	Merge pull request #3896 from neondatabase/releases/2023-03-28 Release 2023-03-28	2023-03-28 17:58:06 +04:00
Shany Pozin	923ade3dd7	Merge pull request #3855 from neondatabase/releases/2023-03-21 Release 2023-03-21	2023-03-21 13:12:32 +02:00
Arseny Sher	b04e711975	Merge pull request #3825 from neondatabase/release-2023-03-15 Release 2023.03.15	2023-03-15 15:38:00 +03:00
Arseny Sher	afd0a6b39a	Forward framed read buf contents to compute before proxy pass. Otherwise they get lost. Normally buffer is empty before proxy pass, but this is not the case with pipeline mode of our npm driver; fixes connection hangup introduced by `b80fe41af3` for it. fixes https://github.com/neondatabase/neon/issues/3822	2023-03-15 15:36:06 +04:00
Lassi Pölönen	99752286d8	Use RollingUpdate strategy also for legacy proxy (#3814 ) ## Describe your changes We have previously changed the neon-proxy to use RollingUpdate. This should be enabled in legacy proxy too in order to avoid breaking connections for the clients and allow for example backups to run even during deployment. (https://github.com/neondatabase/neon/pull/3683) ## Issue ticket number and link https://github.com/neondatabase/neon/issues/3333	2023-03-15 15:35:51 +04:00
Arseny Sher	15df93363c	Merge pull request #3804 from neondatabase/release-2023-03-13 Release 2023.03.13	2023-03-13 20:25:40 +03:00
Vadim Kharitonov	bc0ab741af	Merge pull request #3758 from neondatabase/releases/2023-03-07 Release 2023-03-07	2023-03-07 12:38:47 +01:00
Christian Schwarz	51d9dfeaa3	Merge pull request #3743 from neondatabase/releases/2023-03-03 Release 2023-03-03	2023-03-03 19:20:21 +01:00
Shany Pozin	f63cb18155	Merge pull request #3713 from neondatabase/releases/2023-02-28 Release 2023-02-28	2023-02-28 12:52:24 +02:00
Arseny Sher	0de603d88e	Merge pull request #3707 from neondatabase/release-2023-02-24 Release 2023-02-24 Hotfix for UNLOGGED tables. Contains #3706 Also contains rebase on 14.7 and 15.2 #3581	2023-02-25 00:32:11 +04:00
Heikki Linnakangas	240913912a	Fix UNLOGGED tables. Instead of trying to create missing files on the way, send init fork contents as main fork from pageserver during basebackup. Add test for that. Call put_rel_drop for init forks; previously they weren't removed. Bump vendor/postgres to revert previous approach on Postgres side. Co-authored-by: Arseny Sher <sher-ars@yandex.ru> ref https://github.com/neondatabase/postgres/pull/264 ref https://github.com/neondatabase/postgres/pull/259 ref https://github.com/neondatabase/neon/issues/1222	2023-02-24 23:54:53 +04:00
MMeent	91a4ea0de2	Update vendored PostgreSQL versions to 14.7 and 15.2 (#3581 ) ## Describe your changes Rebase vendored PostgreSQL onto 14.7 and 15.2 ## Issue ticket number and link #3579 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [x] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [x] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ``` The version of PostgreSQL that we use is updated to 14.7 for PostgreSQL 14 and 15.2 for PostgreSQL 15. ```	2023-02-24 23:54:42 +04:00
Arseny Sher	8608704f49	Merge pull request #3691 from neondatabase/release-2023-02-23 Release 2023-02-23 Hotfix for the unlogged tables with indexes issue. neondatabase/postgres#259 neondatabase/postgres#262	2023-02-23 13:39:33 +04:00
Arseny Sher	efef68ce99	Bump vendor/postgres to include hotfix for unlogged tables with indexes. https://github.com/neondatabase/postgres/pull/259 https://github.com/neondatabase/postgres/pull/262	2023-02-23 08:49:43 +04:00
Joonas Koivunen	8daefd24da	Merge pull request #3679 from neondatabase/releases/2023-02-22 Releases/2023-02-22	2023-02-22 15:56:55 +02:00
Arthur Petukhovsky	46cc8b7982	Remove safekeeper-1.ap-southeast-1.aws.neon.tech (#3671 ) We migrated all timelines to `safekeeper-3.ap-southeast-1.aws.neon.tech`, now old instance can be removed.	2023-02-22 15:07:57 +02:00
Sergey Melnikov	38cd90dd0c	Add -v to ansible invocations (#3670 ) To get more debug output on failures	2023-02-22 15:07:57 +02:00
Joonas Koivunen	a51b269f15	fix: hold permit until GetObject eof (#3663 ) previously we applied the ratelimiting only up to receiving the headers from s3, or somewhere near it. the commit adds an adapter which carries the permit until the AsyncRead has been disposed. fixes #3662.	2023-02-22 15:07:57 +02:00
Joonas Koivunen	43bf6d0a0f	calculate_logical_size: no longer use spawn_blocking (#3664 ) Calculation of logical size is now async because of layer downloads, so we shouldn't use spawn_blocking for it. Use of `spawn_blocking` exhausted resources which are needed by `tokio::io::copy` when copying from a stream to a file, which led to a deadlock. Fixes: #3657	2023-02-22 15:07:57 +02:00
Joonas Koivunen	15273a9b66	chore: ignore all compaction inactive tenant errors (#3665 ) these are happening in tests because of #3655 but they sure took some time to appear. makes the `Compaction failed, retrying in 2s: Cannot run compaction iteration on inactive tenant` into a globally allowed error, because it has been seen failing on different test cases.	2023-02-22 15:07:57 +02:00
Joonas Koivunen	78aca668d0	fix: log download failed error (#3661 ) Fixes #3659	2023-02-22 15:07:57 +02:00
Vadim Kharitonov	acbf4148ea	Merge pull request #3656 from neondatabase/releases/2023-02-21 Release 2023-02-21	2023-02-21 16:03:48 +01:00
Vadim Kharitonov	6508540561	Merge branch 'release' into releases/2023-02-21	2023-02-21 15:31:16 +01:00
Arthur Petukhovsky	a41b5244a8	Add new safekeeper to ap-southeast-1 prod (#3645 ) (#3646 ) To trigger deployment of #3645 to production.	2023-02-20 15:22:49 +00:00
Shany Pozin	2b3189be95	Merge pull request #3600 from neondatabase/releases/2023-02-14 Release 2023-02-14	2023-02-15 13:31:30 +02:00
Vadim Kharitonov	248563c595	Merge pull request #3553 from neondatabase/releases/2023-02-07 Release 2023-02-07	2023-02-07 14:07:44 +01:00
Vadim Kharitonov	14cd6ca933	Merge branch 'release' into releases/2023-02-07	2023-02-07 12:11:56 +01:00
Vadim Kharitonov	eb36403e71	Release 2023 01 31 (#3497 ) Co-authored-by: Kirill Bulatov <kirill@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com> Co-authored-by: Christian Schwarz <christian@neon.tech> Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com> Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru> Co-authored-by: Shany Pozin <shany@neon.tech> Co-authored-by: Sergey Melnikov <sergey@neon.tech> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Rory de Zoete <33318916+zoete@users.noreply.github.com> Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box> Co-authored-by: Lassi Pölönen <lassi.polonen@iki.fi>	2023-01-31 15:06:35 +02:00
Anastasia Lubennikova	3c6f779698	Merge pull request #3411 from neondatabase/release_2023_01_23 Fix Release 2023 01 23	2023-01-23 20:10:03 +02:00
Joonas Koivunen	f67f0c1c11	More tenant size fixes (#3410 ) Small changes, but hopefully this will help with the panic detected in staging, for which we cannot get the debugging information right now (end-of-branch before branch-point).	2023-01-23 17:46:13 +02:00
Shany Pozin	edb02d3299	Adding pageserver3 to staging (#3403 )	2023-01-23 17:46:13 +02:00
Konstantin Knizhnik	664a69e65b	Fix slru_segment_key_range function: segno was assigned to incorrect Key field (#3354 )	2023-01-23 17:46:13 +02:00
Anastasia Lubennikova	478322ebf9	Fix tenant size orphans (#3377 ) Before only the timelines which have passed the `gc_horizon` were processed which failed with orphans at the tree_sort phase. Example input in added `test_branched_empty_timeline_size` test case. The PR changes iteration to happen through all timelines, and in addition to that, any learned branch points will be calculated as they would have been in the original implementation if the ancestor branch had been over the `gc_horizon`. This also changes how tenants where all timelines are below `gc_horizon` are handled. Previously tenant_size 0 was returned, but now they will have approximately `initdb_lsn` worth of tenant_size. The PR also adds several new tenant size tests that describe various corner cases of branching structure and `gc_horizon` setting. They are currently disabled to not consume time during CI. Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2023-01-23 17:46:13 +02:00
Joonas Koivunen	802f174072	fix: dont stop pageserver if we fail to calculate synthetic size	2023-01-23 17:46:13 +02:00
Alexey Kondratov	47f9890bae	[compute_ctl] Make role deletion spec processing idempotent (#3380 ) Previously, we were trying to re-assign owned objects of the already deleted role. This was causing a crash loop in the case when compute was restarted with a spec that includes delta operation for role deletion. To avoid such cases, check that role is still present before calling `reassign_owned_objects`. Resolves neondatabase/cloud#3553	2023-01-23 17:46:13 +02:00
Christian Schwarz	262265daad	Revert "Use actual temporary dir for pageserver unit tests" This reverts commit `826e89b9ce`. The problem with that commit was that it deletes the TempDir while there are still EphemeralFile instances open. At first I thought this could be fixed by simply adding Handle::current().block_on(task_mgr::shutdown(None, Some(tenant_id), None)) to TenantHarness::drop, but it turned out to be insufficient. So, reverting the commit until we find a proper solution. refs https://github.com/neondatabase/neon/issues/3385	2023-01-23 17:46:13 +02:00
bojanserafimov	300da5b872	Improve layer map docstrings (#3382 )	2023-01-23 17:46:13 +02:00
Heikki Linnakangas	7b22b5c433	Switch to 'tracing' for logging, restructure code to make use of spans. Refactors Compute::prepare_and_run. It's split into subroutines differently, to make it easier to attach tracing spans to the different stages. The high-level logic for waiting for Postgres to exit is moved to the caller. Replace 'env_logger' with 'tracing', and add `#instrument` directives to different stages of the startup process. This is a fairly mechanical change, except for the changes in 'spec.rs'. 'spec.rs' contained some complicated formatting, where parts of log messages were printed directly to stdout with `print`s. That was a bit messed up because the log normally goes to stderr, but those lines were printed to stdout. In our docker images, stderr and stdout both go to the same place so you wouldn't notice, but I don't think it was intentional. This changes the log format to the default 'tracing_subscriber::format' format. It's different from the Postgres log format, however, and because both compute_tools and Postgres print to the same log, it's now a mix of two different formats. I'm not sure how the Grafana log parsing pipeline can handle that. If it's a problem, we can build custom formatter to change the compute_tools log format to be the same as Postgres's, like it was before this commit, or we can change the Postgres log format to match tracing_formatter's, or we can start printing compute_tool's log output to a different destination than Postgres	2023-01-23 17:46:12 +02:00
Kirill Bulatov	ffca97bc1e	Enable logs in unit tests	2023-01-23 17:46:12 +02:00
Kirill Bulatov	cb356f3259	Use actual temporary dir for pageserver unit tests	2023-01-23 17:46:12 +02:00
Vadim Kharitonov	c85374295f	Change SENTRY_ENVIRONMENT from "development" to "staging"	2023-01-23 17:46:12 +02:00
Anastasia Lubennikova	4992160677	Fix metric_collection_endpoint for prod. It was incorrectly set to staging url	2023-01-23 17:46:12 +02:00
Heikki Linnakangas	bd535b3371	If an error happens while checking for core dumps, don't panic. If we panic, we skip the 30s wait in 'main', and don't give the console a chance to observe the error. Which is not nice. Spotted by @ololobus at https://github.com/neondatabase/neon/pull/3352#discussion_r1072806981	2023-01-23 17:46:12 +02:00
Kirill Bulatov	d90c5a03af	Add more io::Error context when fail to operate on a path (#3254 ) I have a test failure that shows ``` Caused by: 0: Failed to reconstruct a page image: 1: Directory not empty (os error 39) ``` but does not really show where exactly that happens. https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3227/release/3823785365/index.html#categories/c0057473fc9ec8fb70876fd29a171ce8/7088dab272f2c7b7/?attachment=60fe6ed2add4d82d The PR aims to add more context in debugging that issue.	2023-01-23 17:46:12 +02:00
Anastasia Lubennikova	2d02cc9079	Merge pull request #3365 from neondatabase/main Release 2023-01-17	2023-01-17 16:41:34 +02:00
Christian Schwarz	49ad94b99f	Merge pull request #3301 from neondatabase/release-2023-01-10 Release 2023-01-10	2023-01-10 16:42:26 +01:00
Christian Schwarz	948a217398	Merge commit '95bf19b85a06b27a7fc3118dee03d48648efab15' into release-2023-01-10 Conflicts: .github/helm-values/neon-stress.proxy-scram.yaml .github/helm-values/neon-stress.proxy.yaml .github/helm-values/staging.proxy-scram.yaml .github/helm-values/staging.proxy.yaml All of the above were deleted in `main` after we hotfixed them in `release`. Deleting them here storage_broker/src/bin/storage_broker.rs Hotfix toned down logging, but `main` has since implemented a proper fix. Taken `main`'s side, see https://neondb.slack.com/archives/C033RQ5SPDH/p1673354385387479?thread_ts=1673354306.474729&cid=C033RQ5SPDH closes https://github.com/neondatabase/neon/issues/3287	2023-01-10 15:40:14 +01:00
Dmitry Rodionov	125381eae7	Merge pull request #3236 from neondatabase/dkr/retrofit-sk4-sk4-change Move zenith-1-sk-3 to zenith-1-sk-4 (#3164)	2022-12-30 14:13:50 +03:00
Arthur Petukhovsky	cd01bbc715	Move zenith-1-sk-3 to zenith-1-sk-4 (#3164 )	2022-12-30 12:32:52 +02:00
Dmitry Rodionov	d8b5e3b88d	Merge pull request #3229 from neondatabase/dkr/add-pageserver-for-release add pageserver to new region see https://github.com/neondatabase/aws/pull/116 decrease log volume for pageserver	2022-12-30 12:34:04 +03:00
Dmitry Rodionov	06d25f2186	switch to debug from info to produce less noise	2022-12-29 17:48:47 +02:00
Dmitry Rodionov	f759b561f3	add pageserver to new region see https://github.com/neondatabase/aws/pull/116	2022-12-29 17:17:35 +02:00
Sergey Melnikov	ece0555600	Push proxy metrics to Victoria Metrics (#3106 )	2022-12-16 14:44:49 +02:00
Joonas Koivunen	73ea0a0b01	fix(remote_storage): use cached credentials (#3128 ) IMDSv2 has limits, and if we query it on every s3 interaction we are going to go over those limits. Changes the s3_bucket client configuration to use: - ChainCredentialsProvider to handle env variables or imds usage - LazyCachingCredentialsProvider to actually cache any credentials Related: https://github.com/awslabs/aws-sdk-rust/issues/629 Possibly related: https://github.com/neondatabase/neon/issues/3118	2022-12-16 14:44:49 +02:00
Arseny Sher	d8f6d6fd6f	Merge pull request #3126 from neondatabase/broker-lb-release Deploy broker with L4 LB in new env.	2022-12-16 01:25:28 +03:00
Arseny Sher	d24de169a7	Deploy broker with L4 LB in new env. Seems to be fixing issue with missing keepalives.	2022-12-16 01:45:32 +04:00
Arseny Sher	0816168296	Hotfix: terminate subscription if channel is full. Might help as a hotfix, but need to understand root better.	2022-12-15 12:23:56 +03:00
Dmitry Rodionov	277b44d57a	Merge pull request #3102 from neondatabase/main Hotfix. See commits for details	2022-12-14 19:38:43 +03:00
MMeent	68c2c3880e	Merge pull request #3038 from neondatabase/main Release 22-12-14	2022-12-14 14:35:47 +01:00
Arthur Petukhovsky	49da498f65	Merge pull request #2833 from neondatabase/main Release 2022-11-16	2022-11-17 08:44:10 +01:00
Stas Kelvich	2c76ba3dd7	Merge pull request #2718 from neondatabase/main-rc-22-10-28 Release 22-10-28	2022-10-28 20:33:56 +03:00
Arseny Sher	dbe3dc69ad	Merge branch 'main' into main-rc-22-10-28 Release 22-10-28.	2022-10-28 19:10:11 +04:00
Arseny Sher	8e5bb3ed49	Enable etcd compaction in neon_local.	2022-10-27 12:53:20 +03:00
Stas Kelvich	ab0be7b8da	Avoid debian-testing packages in compute Dockerfiles plv8 can only be built with a fairly new gold linker version. We used to install it via binutils packages from testing, but it also updates libc and that causes troubles in the resulting image as different extensions were built against different libc versions. We could either use libc from debian-testing everywhere or refrain from using testing packages and install necessary programs manually. This patch uses the latter approach: gold for plv8 and cmake for h3 are installed manually. In passing, declare h3_postgis as a safe extension (previous omission).	2022-10-27 12:53:20 +03:00
bojanserafimov	b4c55f5d24	Move pagestream api to libs/pageserver_api (#2698 )	2022-10-27 12:53:20 +03:00
mikecaat	ede70d833c	Add a docker-compose example file (#1943 ) (#2666 ) Co-authored-by: Masahiro Ikeda <masahiro.ikeda.us@hco.ntt.co.jp>	2022-10-27 12:53:20 +03:00
Sergey Melnikov	70c3d18bb0	Do not release to new staging proxies on release (#2685 )	2022-10-27 12:53:20 +03:00
bojanserafimov	7a491f52c4	Add draw_timeline binary (#2688 )	2022-10-27 12:53:20 +03:00
Alexander Bayandin	323c4ecb4f	Add data format backward compatibility tests (#2626 )	2022-10-27 12:53:20 +03:00
Anastasia Lubennikova	3d2466607e	Merge pull request #2692 from neondatabase/main-rc Release 2022-10-25	2022-10-25 18:18:58 +03:00
Anastasia Lubennikova	ed478b39f4	Merge branch 'release' into main-rc	2022-10-25 17:06:33 +03:00
Stas Kelvich	91585a558d	Merge pull request #2678 from neondatabase/stas/hotfix_schema Hotfix to disable grant create on public schema	2022-10-22 02:54:31 +03:00
Stas Kelvich	93467eae1f	Hotfix to disable grant create on public schema `GRANT CREATE ON SCHEMA public` fails if there is no schema `public`. Disable it in release for now and make a better fix later (it is needed for v15 support).	2022-10-22 02:26:28 +03:00
Stas Kelvich	f3aac81d19	Merge pull request #2668 from neondatabase/main Release 2022-10-21	2022-10-21 15:21:42 +03:00
Stas Kelvich	979ad60c19	Merge pull request #2581 from neondatabase/main Release 2022-10-07	2022-10-07 16:50:55 +03:00
Stas Kelvich	9316cb1b1f	Merge pull request #2573 from neondatabase/main Release 2022-10-06	2022-10-07 11:07:06 +03:00
Anastasia Lubennikova	e7939a527a	Merge pull request #2377 from neondatabase/main Release 2022-09-01	2022-09-01 20:20:44 +03:00
Arthur Petukhovsky	36d26665e1	Merge pull request #2299 from neondatabase/main * Check for entire range during sasl validation (#2281) * Gen2 GH runner (#2128) * Re-add rustup override * Try s3 bucket * Set git version * Use v4 cache key to prevent problems * Switch to v5 for key * Add second rustup fix * Rebase * Add kaniko steps * Fix typo and set compress level * Disable global run default * Specify shell for step * Change approach with kaniko * Try less verbose shell spec * Add submodule pull * Add promote step * Adjust dependency chain * Try default swap again * Use env * Don't override aws key * Make kaniko build conditional * Specify runs on * Try without dependency link * Try soft fail * Use image with git * Try passing to next step * Fix duplicate * Try other approach * Try other approach * Fix typo * Try other syntax * Set env * Adjust setup * Try step 1 * Add link * Try global env * Fix mistake * Debug * Try other syntax * Try other approach * Change order * Move output one step down * Put output up one level * Try other syntax * Skip build * Try output * Re-enable build * Try other syntax * Skip middle step * Update check * Try first step of dockerhub push * Update needs dependency * Try explicit dir * Add missing package * Try other approach * Try other approach * Specify region * Use with * Try other approach * Add debug * Try other approach * Set region * Follow AWS example * Try github approach * Skip Qemu * Try stdin * Missing steps * Add missing close * Add echo debug * Try v2 endpoint * Use v1 endpoint * Try without quotes * Revert * Try crane * Add debug * Split steps * Fix duplicate * Add shell step * Conform to options * Add verbose flag * Try single step * Try workaround * First request fails hunch * Try bullseye image * Try other approach * Adjust verbose level * Try previous step * Add more debug * Remove debug step * Remove rogue indent * Try with larger image * Add build tag step * Update workflow for testing * Add tag step for test * 
Remove unused * Update dependency chain * Add ownership fix * Use matrix for promote * Force update * Force build * Remove unused * Add new image * Add missing argument * Update dockerfile copy * Update Dockerfile * Update clone * Update dockerfile * Go to correct folder * Use correct format * Update dockerfile * Remove cd * Debug find where we are * Add debug on first step * Changedir to postgres * Set workdir * Use v1 approach * Use other dependency * Try other approach * Try other approach * Update dockerfile * Update approach * Update dockerfile * Update approach * Update dockerfile * Update dockerfile * Add workspace hack * Update Dockerfile * Update Dockerfile * Update Dockerfile * Change last step * Cleanup pull in prep for review * Force build images * Add condition for latest tagging * Use pinned version * Try without name value * Remove more names * Shorten names * Add kaniko comments * Pin kaniko * Pin crane and ecr helper * Up one level * Switch to pinned tag for rust image * Force update for test Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@b04468bf-cdf4-41eb-9c94-aff4ca55e4bf.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@4795e9ee-4f32-401f-85f3-f316263b62b8.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@2f8bc4e5-4ec2-4ea2-adb1-65d863c4a558.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@27565b2b-72d5-4742-9898-a26c9033e6f9.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@ecc96c26-c6c4-4664-be6e-34f7c3f89a3c.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@7caff3a5-bf03-4202-bd0e-f1a93c86bdae.fritz.box> * Add missing step output, revert one deploy step (#2285) * Add missing step output, revert one deploy step * Conform to syntax * Update approach * Add missing value * Add missing needs Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box> * Error for fatal not git repo (#2286) Co-authored-by: Rory 
de Zoete <rdezoete@RorysMacStudio.fritz.box> * Use main, not branch for ref check (#2288) * Use main, not branch for ref check * Add more debug * Count main, not head * Try new approach * Conform to syntax * Update approach * Get full history * Skip checkout * Cleanup debug * Remove more debug Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box> * Fix docker zombie process issue (#2289) * Fix docker zombie process issue * Init everywhere Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box> * Fix 1.63 clippy lints (#2282) * split out timeline metrics, track layer map loading and size calculation * reset rust cache for clippy run to avoid an ICE additionally remove trailing whitespaces * Rename pg_control_ffi.h to bindgen_deps.h, for clarity. The pg_control_ffi.h name implies that it only includes stuff related to pg_control.h. That's mostly true currently, but really the point of the file is to include everything that we need to generate Rust definitions from. * Make local mypy behave like CI mypy (#2291) * Fix flaky pageserver restarts in tests (#2261) * Remove extra type aliases (#2280) * Update cachepot endpoint (#2290) * Update cachepot endpoint * Update dockerfile & remove env * Update image building process * Cannot use metadata endpoint for this * Update workflow * Conform to kaniko syntax * Update syntax * Update approach * Update dockerfiles * Force update * Update dockerfiles * Update dockerfile * Cleanup dockerfiles * Update s3 test location * Revert s3 experiment * Add more debug * Specify aws region * Remove debug, add prefix * Remove one more debug Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box> * workflows/benchmarking: increase timeout (#2294) * Rework `init` in pageserver CLI (#2272) * Do not create initial tenant and timeline (adjust Python tests for that) * Rework config handling during init, add --update-config to manage local config updates * Fix: Always build images (#2296) * Always build images * 
Remove unused Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box> * Move auto-generated 'bindings' to a separate inner module. Re-export only things that are used by other modules. In the future, I'm imagining that we run bindgen twice, for Postgres v14 and v15. The two sets of bindings would go into separate 'bindings_v14' and 'bindings_v15' modules. Rearrange postgres_ffi modules. Move function, to avoid Postgres version dependency in timelines.rs Move function to generate a logical-message WAL record to postgres_ffi. * fix cargo test * Fix walreceiver and safekeeper bugs (#2295) - There was an issue with zero commit_lsn `reason: LaggingWal { current_commit_lsn: 0/0, new_commit_lsn: 1/6FD90D38, threshold: 10485760 } }`. The problem was in `send_wal.rs`, where we initialized `end_pos = Lsn(0)` and in some cases sent it to the pageserver. - IDENTIFY_SYSTEM previously returned `flush_lsn` as a physical end of WAL. Now it returns `flush_lsn` (as it was) to walproposer and `commit_lsn` to everyone else including pageserver. - There was an issue with backoff where connection was cancelled right after initialization: `connected!` -> `safekeeper_handle_db: Connection cancelled` -> `Backoff: waiting 3 seconds`. The problem was in sleeping before establishing the connection. This is fixed by reworking retry logic. - There was an issue with getting `NoKeepAlives` reason in a loop. The issue is probably the same as the previous. - There was an issue with filtering safekeepers based on retry attempts, which could filter some safekeepers indefinetely. This is fixed by using retry cooldown duration instead of retry attempts. - Some `send_wal.rs` connections failed with errors without context. This is fixed by adding a timeline to safekeepers errors. 
New retry logic works like this: - Every candidate has a `next_retry_at` timestamp and is not considered for connection until that moment - When walreceiver connection is closed, we update `next_retry_at` using exponential backoff, increasing the cooldown on every disconnect. - When `last_record_lsn` was advanced using the WAL from the safekeeper, we reset the retry cooldown and exponential backoff, allowing walreceiver to reconnect to the same safekeeper instantly. * on safekeeper registration pass availability zone param (#2292) Co-authored-by: Kirill Bulatov <kirill@neon.tech> Co-authored-by: Rory de Zoete <33318916+zoete@users.noreply.github.com> Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@b04468bf-cdf4-41eb-9c94-aff4ca55e4bf.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@4795e9ee-4f32-401f-85f3-f316263b62b8.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@2f8bc4e5-4ec2-4ea2-adb1-65d863c4a558.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@27565b2b-72d5-4742-9898-a26c9033e6f9.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@ecc96c26-c6c4-4664-be6e-34f7c3f89a3c.fritz.box> Co-authored-by: Rory de Zoete <rdezoete@7caff3a5-bf03-4202-bd0e-f1a93c86bdae.fritz.box> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com> Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Anton Galitsyn <agalitsyn@users.noreply.github.com>	2022-08-18 15:32:33 +03:00
Arthur Petukhovsky	873347f977	Merge pull request #2275 from neondatabase/main * github/workflows: Fix git dubious ownership (#2223) * Move relation size cache from WalIngest to DatadirTimeline (#2094) * Move relation sie cache to layered timeline * Fix obtaining current LSN for relation size cache * Resolve merge conflicts * Resolve merge conflicts * Reestore 'lsn' field in DatadirModification * adjust DatadirModification lsn in ingest_record * Fix formatting * Pass lsn to get_relsize * Fix merge conflict * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * refactor: replace lazy-static with once-cell (#2195) - Replacing all the occurrences of lazy-static with `once-cell::sync::Lazy` - fixes #1147 Signed-off-by: Ankur Srivastava <best.ankur@gmail.com> * Add more buckets to pageserver latency metrics (#2225) * ignore record property warning to fix benchmarks * increase statement timeout * use event so it fires only if workload thread successfully finished * remove debug log * increase timeout to pass test with real s3 * avoid duplicate parameter, increase timeout * Major migration script (#2073) This script can be used to migrate a tenant across breaking storage versions, or (in the future) upgrading postgres versions. See the comment at the top for an overview. Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> * Fix etcd typos * Fix links to safekeeper protocol docs. (#2188) safekeeper/README_PROTO.md was moved to docs/safekeeper-protocol.md in commit `0b14fdb078`, as part of reorganizing the docs into 'mdbook' format. Fixes issue #1475. Thanks to @banks for spotting the outdated references. In addition to fixing the above issue, this patch also fixes other broken links as a result of `0b14fdb078`. 
See https://github.com/neondatabase/neon/pull/2188#pullrequestreview-1055918480. Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Thang Pham <thang@neon.tech> * Update CONTRIBUTING.md * Update CONTRIBUTING.md * support node id and remote storage params in docker_entrypoint.sh * Safe truncate (#2218) * Move relation sie cache to layered timeline * Fix obtaining current LSN for relation size cache * Resolve merge conflicts * Resolve merge conflicts * Reestore 'lsn' field in DatadirModification * adjust DatadirModification lsn in ingest_record * Fix formatting * Pass lsn to get_relsize * Fix merge conflict * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Update pageserver/src/pgdatadir_mapping.rs Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Check if relation exists before trying to truncat it refer #1932 * Add test reporducing FSM truncate problem Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> * Fix exponential backoff values * Update back `vendor/postgres` back; it was changed accidentally. (#2251) Commit `4227cfc96e` accidentally reverted vendor/postgres to an older version. Update it back. * Add pageserver checkpoint_timeout option. To flush inmemory layer eventually when no new data arrives, which helps safekeepers to suspend activity (stop pushing to the broker). Default 10m should be ok. * Share exponential backoff code and fix logic for delete task failure (#2252) * Fix bug when import large (>1GB) relations (#2172) Resolves #2097 - use timeline modification's `lsn` and timeline's `last_record_lsn` to determine the corresponding LSN to query data in `DatadirModification::get` - update `test_import_from_pageserver`. Split the test into 2 variants: `small` and `multisegment`. + `small` is the old test + `multisegment` is to simulate #2097 by using a larger number of inserted rows to create multiple segment files of a relation. 
`multisegment` is configured to only run with a `release` build * Fix timeline physical size flaky tests (#2244) Resolves #2212. - use `wait_for_last_flush_lsn` in `test_timeline_physical_size_` tests ## Context Need to wait for the pageserver to catch up with the compute's last flush LSN because during the timeline physical size API call, it's possible that there are running `LayerFlushThread` threads. These threads flush new layers into disk and hence update the physical size. This results in a mismatch between the physical size reported by the API and the actual physical size on disk. ### Note The `LayerFlushThread` threads are processed concurrently, so it's possible that the above error still persists even with this patch. However, making the tests wait to finish processing all the WALs (not flushing) before calculating the physical size should help reduce the "flakiness" significantly postgres_ffi/waldecoder: validate more header fields * postgres_ffi/waldecoder: remove unused startlsn * postgres_ffi/waldecoder: introduce explicit `enum State` Previously it was emulated with a combination of nullable fields. This change should make the logic more readable. * disable `test_import_from_pageserver_multisegment` (#2258) This test failed consistently on `main` now. It's better to temporarily disable it to avoid blocking others' PRs while investigating the root cause for the test failure. See: #2255, #2256 * get_binaries uses DOCKER_TAG taken from docker image build step (#2260) * [proxy] Rework wire format of the password hack and some errors (#2236) The new format has a few benefits: it's shorter, simpler and human-readable as well. We don't use base64 anymore, since url encoding got us covered. We also show a better error in case we couldn't parse the payload; the users should know it's all about passing the correct project name. 
* test_runner/pg_clients: collect docker logs (#2259) * get_binaries script fix (#2263) * get_binaries uses DOCKER_TAG taken from docker image build step * remove docker tag discovery at all and fix get_binaries for version variable * Better storage sync logs (#2268) * Find end of WAL on safekeepers using WalStreamDecoder. We could make it inside wal_storage.rs, but taking into account that - wal_storage.rs reading is async - we don't need s3 here - error handling is different; error during decoding is normal I decided to put it separately. Test cargo test test_find_end_of_wal_last_crossing_segment prepared earlier by @yeputons passes now. Fixes https://github.com/neondatabase/neon/issues/544 https://github.com/neondatabase/cloud/issues/2004 Supersedes https://github.com/neondatabase/neon/pull/2066 * Improve walreceiver logic (#2253) This patch makes walreceiver logic more complicated, but it should work better in most cases. Added `test_wal_lagging` to test scenarios where alive safekeepers can lag behind other alive safekeepers. - There was a bug which looks like `etcd_info.timeline.commit_lsn > Some(self.local_timeline.get_last_record_lsn())` filtered all safekeepers in some strange cases. I removed this filter, it should probably help with #2237 - Now walreceiver_connection reports status, including commit_lsn. This allows keeping safekeeper connection even when etcd is down. - Safekeeper connection now fails if pageserver doesn't receive safekeeper messages for some time. Usually safekeeper sends messages at least once per second. - `LaggingWal` check now uses `commit_lsn` directly from safekeeper. This fixes the issue with often reconnects, when compute generates WAL really fast. - `NoWalTimeout` is rewritten to trigger only when we know about the new WAL and the connected safekeeper doesn't stream any WAL. This allows setting a small `lagging_wal_timeout` because it will trigger only when we observe that the connected safekeeper has stuck. 
* increase timeout in wait_for_upload to avoid spurious failures when testing with real s3 * Bump vendor/postgres to include XLP_FIRST_IS_CONTRECORD fix. (#2274) * Set up a workflow to run pgbench against captest (#2077) Signed-off-by: Ankur Srivastava <best.ankur@gmail.com> Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru> Co-authored-by: Heikki Linnakangas <heikki@zenith.tech> Co-authored-by: Ankur Srivastava <ansrivas@users.noreply.github.com> Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Kirill Bulatov <kirill@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Thang Pham <thang@neon.tech> Co-authored-by: Stas Kelvich <stas.kelvich@gmail.com> Co-authored-by: Arseny Sher <sher-ars@yandex.ru> Co-authored-by: Egor Suvorov <egor@neon.tech> Co-authored-by: Andrey Taranik <andrey@cicd.team> Co-authored-by: Dmitry Ivanov <ivadmi5@gmail.com>	2022-08-15 21:30:45 +03:00
Arthur Petukhovsky	e814ac16f9	Merge pull request #2219 from neondatabase/main Release 2022-08-04	2022-08-04 20:06:34 +03:00
Heikki Linnakangas	ad3055d386	Merge pull request #2203 from neondatabase/release-uuid-ossp Deploy new storage and compute version to production Release 2022-08-02	2022-08-02 15:08:14 +03:00
Heikki Linnakangas	94e03eb452	Merge remote-tracking branch 'origin/main' into 'release' Release 2022-08-01	2022-08-02 12:43:49 +03:00
Sergey Melnikov	380f26ef79	Merge pull request #2170 from neondatabase/main (Release 2022-07-28) Release 2022-07-28	2022-07-28 14:16:52 +03:00
Arthur Petukhovsky	3c5b7f59d7	Merge pull request #2119 from neondatabase/main Release 2022-07-19	2022-07-19 11:58:48 +03:00
Arthur Petukhovsky	fee89f80b5	Merge pull request #2115 from neondatabase/main-2022-07-18 Release 2022-07-18	2022-07-18 19:21:11 +03:00
Arthur Petukhovsky	41cce8eaf1	Merge remote-tracking branch 'origin/release' into main-2022-07-18	2022-07-18 18:21:20 +03:00
Alexey Kondratov	f88fe0218d	Merge pull request #1842 from neondatabase/release-deploy-hotfix [HOTFIX] Release deploy fix This PR uses this branch neondatabase/postgres#171 and several required commits from the main to use only locally built compute-tools. This should allow us to rollout safekeepers sync issue fix on prod	2022-06-01 11:04:30 +03:00
Alexey Kondratov	cc856eca85	Install missing openssl packages in the Github Actions workflow	2022-05-31 21:31:31 +02:00
Alexey Kondratov	cf350c6002	Use :local compute-tools tag to build compute-node image	2022-05-31 21:31:16 +02:00
Arseny Sher	0ce6b6a0a3	Merge pull request #1836 from neondatabase/release-hotfix-basebackup-lsn-page-boundary Bump vendor/postgres to hotfix basebackup LSN comparison.	2022-05-31 16:54:03 +04:00
Arseny Sher	73f247d537	Bump vendor/postgres to hotfix basebackup LSN comparison.	2022-05-31 16:00:50 +04:00
Andrey Taranik	960be82183	Merge pull request #1792 from neondatabase/main Release 2022-05-25 (second)	2022-05-25 16:37:57 +03:00
Andrey Taranik	806e5a6c19	Merge pull request #1787 from neondatabase/main Release 2022-05-25	2022-05-25 13:34:11 +03:00
Alexey Kondratov	8d5df07cce	Merge pull request #1385 from zenithdb/main Release main 2022-03-22	2022-03-22 05:04:34 -05:00
Andrey Taranik	df7a9d1407	release fix 2022-03-16 (#1375 )	2022-03-17 00:43:28 +03:00

1325 changed files with 151670 additions and 46211 deletions

									
										10

.cargo/config.toml
									
												
				@@ -3,6 +3,16 @@

 # by the RUSTDOCFLAGS env var in CI.
 rustdocflags = ["-Arustdoc::private_intra_doc_links"]
+# Enable frame pointers. This may have a minor performance overhead, but makes it easier and more
+# efficient to obtain stack traces (and thus CPU/heap profiles). It may also avoid seg faults that
+# we've seen with libunwind-based profiling. See also:
+#
+# * <https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html>
+# * <https://github.com/rust-lang/rust/pull/122646>
+#
+# NB: the RUSTFLAGS envvar will replace this. Make sure to update e.g. Dockerfile as well.
+rustflags = ["-Cforce-frame-pointers=yes"]
 [alias]
 build_testing = ["build", "--features", "testing"]
 neon = ["run", "--bin", "neon_local"]
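The NB comment in this hunk matters in practice: when the `RUSTFLAGS` environment variable is set, Cargo uses it instead of `build.rustflags` from the config file, so the frame-pointer flag has to be restated wherever a script sets `RUSTFLAGS`. A minimal sketch of one way to do that follows. The helper name `with_frame_pointers` and the `-D warnings` value are assumptions for illustration and are not part of this diff.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the repository): append the frame-pointer flag
# to a RUSTFLAGS value unless it is already present.
with_frame_pointers() {
  local flags="${1:-}"
  case " ${flags} " in
    *" -Cforce-frame-pointers=yes "*) echo "${flags}" ;;
    *) echo "${flags:+${flags} }-Cforce-frame-pointers=yes" ;;
  esac
}

# Illustrative use: a job that sets its own RUSTFLAGS ("-D warnings" is an
# assumed example value) restates the flag, since the env var replaces the
# rustflags entry in .cargo/config.toml.
RUSTFLAGS="$(with_frame_pointers "-D warnings")"
export RUSTFLAGS
echo "${RUSTFLAGS}"
```

The same idea applies to a Dockerfile `ENV RUSTFLAGS=...` line, which is presumably why the comment mentions it.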

									
										3

.config/hakari.toml
									
												
				@@ -46,6 +46,9 @@ workspace-members = [

     "utils",
     "wal_craft",
     "walproposer",
+    "postgres-protocol2",
+    "postgres-types2",
+    "tokio-postgres2",
 ]
 # Write out exact versions rather than a semver range. (Defaults to false.)

8

.dockerignore


@@ -5,26 +5,24 @@
 !Cargo.toml
 !Makefile
 !rust-toolchain.toml
 !scripts/combine_control_files.py
 !scripts/ninstall.sh
 !vm-cgconfig.conf
 !docker-compose/run-tests.sh
 # Directories
 !.cargo/
 !.config/
 !compute/
 !compute_tools/
 !control_plane/
 !docker-compose/ext-src
 !libs/
 !neon_local/
 !pageserver/
 !patches/
 !pgxn/
 !proxy/
 !storage_scrubber/
 !safekeeper/
 !storage_broker/
 !storage_controller/
 !trace/
 !vendor/postgres-*/
 !workspace_hack/
 !build_tools/patches

1

.github/ISSUE_TEMPLATE/bug-template.md vendored


@@ -3,6 +3,7 @@ name: Bug Template
 about: Used for describing bugs
 title: ''
 labels: t/bug
+type: Bug
 assignees: ''
 ---

1

.github/ISSUE_TEMPLATE/epic-template.md vendored


@@ -4,6 +4,7 @@ about: A set of related tasks contributing towards specific outcome, comprising
   more than 1 week of work.
 title: 'Epic: '
 labels: t/Epic
+type: Epic
 assignees: ''
 ---

									
										21

.github/PULL_REQUEST_TEMPLATE/release-pr.md
									
										vendored
									
											
				@@ -1,21 +0,0 @@

-## Release 202Y-MM-DD
-**NB: this PR must be merged only by 'Create a merge commit'!**
-### Checklist when preparing for release
-- [ ] Read or refresh [the release flow guide](https://www.notion.so/neondatabase/Release-general-flow-61f2e39fd45d4d14a70c7749604bd70b)
-- [ ] Ask in the [cloud Slack channel](https://neondb.slack.com/archives/C033A2WE6BZ) that you are going to rollout the release. Any blockers?
-- [ ] Does this release contain any db migrations? Destructive ones? What is the rollback plan?
-<!-- List everything that should be done **before** release, any issues / setting changes / etc -->
-### Checklist after release
-- [ ] Make sure instructions from PRs included in this release and labeled `manual_release_instructions` are executed (either by you or by people who wrote them).
-- [ ] Based on the merged commits write release notes and open a PR into `website` repo ([example](https://github.com/neondatabase/website/pull/219/files))
-- [ ] Check [#dev-production-stream](https://neondb.slack.com/archives/C03F5SM1N02) Slack channel
-- [ ] Check [stuck projects page](https://console.neon.tech/admin/projects?sort=last_active&order=desc&stuck=true)
-- [ ] Check [recent operation failures](https://console.neon.tech/admin/operations?action=create_timeline%2Cstart_compute%2Cstop_compute%2Csuspend_compute%2Capply_config%2Cdelete_timeline%2Cdelete_tenant%2Ccreate_branch%2Ccheck_availability&sort=updated_at&order=desc&had_retries=some)
-- [ ] Check [cloud SLO dashboard](https://neonprod.grafana.net/d/_oWcBMJ7k/cloud-slos?orgId=1)
-- [ ] Check [compute startup metrics dashboard](https://neonprod.grafana.net/d/5OkYJEmVz/compute-startup-time)
-<!-- List everything that should be done **after** release, any admin UI configuration / Grafana dashboard / alert changes / setting changes / etc -->

									
										20

.github/actionlint.yml
									
										vendored
									
												
				@@ -4,9 +4,11 @@ self-hosted-runner:

				    - large

				    - large-arm64

				    - small

				    - small-metal

				    - small-arm64

				    - us-east-2

				config-variables:

				  - AWS_ECR_REGION

				  - AZURE_DEV_CLIENT_ID

				  - AZURE_DEV_REGISTRY_NAME

				  - AZURE_DEV_SUBSCRIPTION_ID

				@@ -14,9 +16,25 @@ config-variables:

				  - AZURE_PROD_REGISTRY_NAME

				  - AZURE_PROD_SUBSCRIPTION_ID

				  - AZURE_TENANT_ID

				  - BENCHMARK_INGEST_TARGET_PROJECTID

				  - BENCHMARK_LARGE_OLTP_PROJECTID

				  - BENCHMARK_PROJECT_ID_PUB

				  - BENCHMARK_PROJECT_ID_SUB

				  - DEV_AWS_OIDC_ROLE_ARN

				  - DEV_AWS_OIDC_ROLE_MANAGE_BENCHMARK_EC2_VMS_ARN

				  - HETZNER_CACHE_BUCKET

				  - HETZNER_CACHE_ENDPOINT

				  - HETZNER_CACHE_REGION

				  - NEON_DEV_AWS_ACCOUNT_ID

				  - NEON_PROD_AWS_ACCOUNT_ID

				  - PGREGRESS_PG16_PROJECT_ID

				  - PGREGRESS_PG17_PROJECT_ID

				  - REMOTE_STORAGE_AZURE_CONTAINER

				  - REMOTE_STORAGE_AZURE_REGION

				  - SLACK_CICD_CHANNEL_ID

				  - SLACK_ON_CALL_DEVPROD_STREAM

				  - SLACK_ON_CALL_QA_STAGING_STREAM

				  - SLACK_ON_CALL_STORAGE_STAGING_STREAM

				  - SLACK_RUST_CHANNEL_ID

				  - SLACK_STORAGE_CHANNEL_ID

				  - SLACK_UPCOMING_RELEASE_CHANNEL_ID

				  - DEV_AWS_OIDC_ROLE_ARN

									
										29

.github/actions/allure-report-generate/action.yml
									
										vendored
									
												
				@@ -7,6 +7,9 @@ inputs:

     type: boolean
     required: false
     default: false
+  aws-oicd-role-arn:
+    description: 'OIDC role arn to interract with S3'
+    required: true
 outputs:
   base-url:

				@@ -35,11 +38,14 @@ runs:

     #
     - name: Set variables
       shell: bash -euxo pipefail {0}
+      env:
+        PR_NUMBER: ${{ github.event.pull_request.number }}
+        BUCKET: neon-github-public-dev
       run: |
-        PR_NUMBER=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)
-        if [ "${PR_NUMBER}" != "null" ]; then
+        if [ -n "${PR_NUMBER}" ]; then
           BRANCH_OR_PR=pr-${PR_NUMBER}
-        elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || [ "${GITHUB_REF_NAME}" = "release-proxy" ]; then
+        elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || \
+             [ "${GITHUB_REF_NAME}" = "release-proxy" ] || [ "${GITHUB_REF_NAME}" = "release-compute" ]; then
           # Shortcut for special branches
           BRANCH_OR_PR=${GITHUB_REF_NAME}
         else

				@@ -55,8 +61,6 @@ runs:

         echo "LOCK_FILE=${LOCK_FILE}"       >> $GITHUB_ENV
         echo "WORKDIR=${WORKDIR}"           >> $GITHUB_ENV
         echo "BUCKET=${BUCKET}"             >> $GITHUB_ENV
-      env:
-        BUCKET: neon-github-public-dev
     # TODO: We can replace with a special docker image with Java and Allure pre-installed
     - uses: actions/setup-java@v4

				@@ -76,8 +80,15 @@ runs:

           rm -f ${ALLURE_ZIP}
         fi
       env:
-        ALLURE_VERSION: 2.27.0
-        ALLURE_ZIP_SHA256: b071858fb2fa542c65d8f152c5c40d26267b2dfb74df1f1608a589ecca38e777
+        ALLURE_VERSION: 2.32.2
+        ALLURE_ZIP_SHA256: 3f28885e2118f6317c92f667eaddcc6491400af1fb9773c1f3797a5fa5174953
+    - uses: aws-actions/configure-aws-credentials@v4
+      if: ${{ !cancelled() }}
+      with:
+        aws-region: eu-central-1
+        role-to-assume: ${{ inputs.aws-oicd-role-arn }}
+        role-duration-seconds: 3600 # 1 hour should be more than enough to upload report
     # Potentially we could have several running build for the same key (for example, for the main branch), so we use improvised lock for this
     - name: Acquire lock

				@@ -183,7 +194,7 @@ runs:

       uses: actions/cache@v4
       with:
         path: ~/.cache/pypoetry/virtualenvs
-        key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-${{ hashFiles('poetry.lock') }}
+        key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-bookworm-${{ hashFiles('poetry.lock') }}
     - name: Store Allure test stat in the DB (new)
       if: ${{ !cancelled() && inputs.store-test-results-into-db == 'true' }}

				@@ -221,6 +232,8 @@ runs:

         REPORT_URL: ${{ steps.generate-report.outputs.report-url }}
         COMMIT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
       with:
+        # Retry script for 5XX server errors: https://github.com/actions/github-script#retries
+        retries: 5
         script: |
           const { REPORT_URL, COMMIT_SHA } = process.env

									
										21

.github/actions/allure-report-store/action.yml
									
										vendored
									
												
				@@ -8,6 +8,9 @@ inputs:

   unique-key:
     description: 'string to distinguish different results in the same run'
     required: true
+  aws-oicd-role-arn:
+    description: 'OIDC role arn to interract with S3'
+    required: true
 runs:
   using: "composite"

				@@ -15,11 +18,14 @@ runs:

   steps:
     - name: Set variables
       shell: bash -euxo pipefail {0}
+      env:
+        PR_NUMBER: ${{ github.event.pull_request.number }}
+        REPORT_DIR: ${{ inputs.report-dir }}
       run: |
-        PR_NUMBER=$(jq --raw-output .pull_request.number "$GITHUB_EVENT_PATH" || true)
-        if [ "${PR_NUMBER}" != "null" ]; then
+        if [ -n "${PR_NUMBER}" ]; then
           BRANCH_OR_PR=pr-${PR_NUMBER}
-        elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || [ "${GITHUB_REF_NAME}" = "release-proxy" ]; then
+        elif [ "${GITHUB_REF_NAME}" = "main" ] || [ "${GITHUB_REF_NAME}" = "release" ] || \
+             [ "${GITHUB_REF_NAME}" = "release-proxy" ] || [ "${GITHUB_REF_NAME}" = "release-compute" ]; then
           # Shortcut for special branches
           BRANCH_OR_PR=${GITHUB_REF_NAME}
         else

				@@ -28,8 +34,13 @@ runs:

         echo "BRANCH_OR_PR=${BRANCH_OR_PR}" >> $GITHUB_ENV
         echo "REPORT_DIR=${REPORT_DIR}"     >> $GITHUB_ENV
-      env:
-        REPORT_DIR: ${{ inputs.report-dir }}
+    - uses: aws-actions/configure-aws-credentials@v4
+      if: ${{ !cancelled() }}
+      with:
+        aws-region: eu-central-1
+        role-to-assume: ${{ inputs.aws-oicd-role-arn }}
+        role-duration-seconds: 3600 # 1 hour should be more than enough to upload report
     - name: Upload test results
       shell: bash -euxo pipefail {0}

									
										9

.github/actions/download/action.yml
									
												
				@@ -15,10 +15,19 @@ inputs:

				  prefix:

				    description: "S3 prefix. Default is '${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}'"

				    required: false

				  aws-oicd-role-arn:

				    description: 'OIDC role arn to interact with S3'

				    required: true

				runs:

				  using: "composite"

				  steps:

				    - uses: aws-actions/configure-aws-credentials@v4

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ inputs.aws-oicd-role-arn }}

				        role-duration-seconds: 3600

				    - name: Download artifact

				      id: download-artifact

				      shell: bash -euxo pipefail {0}

									
										12

.github/actions/neon-branch-create/action.yml
									
												
				@@ -84,7 +84,13 @@ runs:

				          --header "Authorization: Bearer ${API_KEY}"

				          )

				        role_name=$(echo $roles | jq --raw-output '.roles[] | select(.protected == false) | .name')

				        role_name=$(echo "$roles" | jq --raw-output '

				          (.roles | map(select(.protected == false))) as $roles |

				          if any($roles[]; .name == "neondb_owner")

				          then "neondb_owner"

				          else $roles[0].name

				          end

				        ')

				        echo "role_name=${role_name}" >> $GITHUB_OUTPUT

				      env:

				        API_HOST: ${{ inputs.api_host }}

				@@ -107,13 +113,13 @@ runs:

				            )

				          if [ -z "${reset_password}" ]; then

				            sleep 1

				            sleep $i

				            continue

				          fi

				          password=$(echo $reset_password | jq --raw-output '.role.password')

				          if [ "${password}" == "null" ]; then

				            sleep 1

				            sleep $i # increasing backoff

				            continue

				          fi
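
The new jq filter in the first hunk above prefers the `neondb_owner` role among the unprotected roles and otherwise falls back to the first unprotected one. A minimal Python sketch of the same selection, assuming the roles response has the shape the filter implies (`roles[].name`, `roles[].protected`); the function name `pick_role_name` is hypothetical:

```python
from __future__ import annotations

from typing import Any, Optional


def pick_role_name(roles_response: dict[str, Any]) -> Optional[str]:
    """Mirror the jq filter: prefer "neondb_owner", else the first unprotected role."""
    # Keep only roles whose "protected" field is exactly false, as the jq select does.
    unprotected = [
        role for role in roles_response.get("roles", []) if role.get("protected") is False
    ]
    if any(role.get("name") == "neondb_owner" for role in unprotected):
        return "neondb_owner"
    # jq yields null for $roles[0].name on an empty list; None plays that part here.
    return unprotected[0]["name"] if unprotected else None
```

Unlike the old one-line filter, which printed every unprotected role name, this form always returns a single name, so `role_name` cannot end up holding several lines.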

									
										74

.github/actions/neon-project-create/action.yml
									
												
				@@ -17,6 +17,38 @@ inputs:

				  compute_units:

				    description: '[Min, Max] compute units'

				    default: '[1, 1]'

				  # settings below only needed if you want the project to be sharded from the beginning

				  shard_split_project:

				    description: 'by default new projects are not shard-split initially, but only when shard-split threshold is reached, specify true to explicitly shard-split initially'

				    required: false

				    default: 'false'

				  disable_sharding:

				    description: 'by default new projects use storage controller default policy to shard-split when shard-split threshold is reached, specify true to explicitly disable sharding'

				    required: false

				    default: 'false'

				  admin_api_key:

				    description: 'Admin API Key needed for shard-splitting. Must be specified if shard_split_project is true'

				    required: false

				  shard_count:

				    description: 'Number of shards to split the project into, only applies if shard_split_project is true'

				    required: false

				    default: '8'

				  stripe_size:

				    description: 'Stripe size, optional, in 8kiB pages.  e.g. set 2048 for 16MB stripes. Default is 128 MiB, only applies if shard_split_project is true'

				    required: false

				    default: '32768'

				  psql_path:

				    description: 'Path to psql binary - it is the caller''s responsibility to provision the psql binary'

				    required: false

				    default: '/tmp/neon/pg_install/v16/bin/psql'

				  libpq_lib_path:

				    description: 'Path to directory containing libpq library - it is the caller''s responsibility to provision the libpq library'

				    required: false

				    default: '/tmp/neon/pg_install/v16/lib'

				  project_settings:

				    description: 'A JSON object with project settings'

				    required: false

				    default: '{}'

				outputs:

				  dsn:

				@@ -48,7 +80,7 @@ runs:

				              \"provisioner\": \"k8s-neonvm\",

				              \"autoscaling_limit_min_cu\": ${MIN_CU},

				              \"autoscaling_limit_max_cu\": ${MAX_CU},

				              \"settings\": { }

				              \"settings\": ${PROJECT_SETTINGS}

				            }

				          }")

				@@ -63,6 +95,38 @@ runs:

				        echo "project_id=${project_id}" >> $GITHUB_OUTPUT

				        echo "Project ${project_id} has been created"

				        if [ "${SHARD_SPLIT_PROJECT}" = "true" ]; then

				          # determine tenant ID

				          TENANT_ID=`${PSQL} ${dsn} -t -A -c "SHOW neon.tenant_id"`

				          echo "Splitting project ${project_id} with tenant_id ${TENANT_ID} into $((SHARD_COUNT)) shards with stripe size $((STRIPE_SIZE))"

				          echo "Sending PUT request to https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/storage/proxy/control/v1/tenant/${TENANT_ID}/shard_split"

				          echo "with body {\"new_shard_count\": $((SHARD_COUNT)), \"new_stripe_size\": $((STRIPE_SIZE))}"

				          # we need an ADMIN API KEY to invoke storage controller API for shard splitting (bash -u above checks that the variable is set)

				          curl -X PUT \

				            "https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/storage/proxy/control/v1/tenant/${TENANT_ID}/shard_split" \

				            -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer ${ADMIN_API_KEY}" \

				            -d "{\"new_shard_count\": $SHARD_COUNT, \"new_stripe_size\": $STRIPE_SIZE}"

				        fi

				        if [ "${DISABLE_SHARDING}" = "true" ]; then

				          # determine tenant ID

				          TENANT_ID=`${PSQL} ${dsn} -t -A -c "SHOW neon.tenant_id"`

				          echo "Explicitly disabling shard-splitting for project ${project_id} with tenant_id ${TENANT_ID}"

				          echo "Sending PUT request to https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/storage/proxy/control/v1/tenant/${TENANT_ID}/policy"

				          echo "with body {\"scheduling\": \"Essential\"}"

				          # we need an ADMIN API KEY to invoke storage controller API for shard splitting (bash -u above checks that the variable is set)

				          curl -X PUT \

				            "https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/storage/proxy/control/v1/tenant/${TENANT_ID}/policy" \

				            -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer ${ADMIN_API_KEY}" \

				            -d "{\"scheduling\": \"Essential\"}"

				        fi

				      env:

				        API_HOST: ${{ inputs.api_host }}

				        API_KEY: ${{ inputs.api_key }}

				@@ -70,3 +134,11 @@ runs:

				        POSTGRES_VERSION: ${{ inputs.postgres_version }}

				        MIN_CU: ${{ fromJSON(inputs.compute_units)[0] }}

				        MAX_CU: ${{ fromJSON(inputs.compute_units)[1] }}

				        SHARD_SPLIT_PROJECT: ${{ inputs.shard_split_project }}

				        DISABLE_SHARDING: ${{ inputs.disable_sharding }}

				        ADMIN_API_KEY: ${{ inputs.admin_api_key }}

				        SHARD_COUNT: ${{ inputs.shard_count }}

				        STRIPE_SIZE: ${{ inputs.stripe_size }}

				        PSQL: ${{ inputs.psql_path }}

				        LD_LIBRARY_PATH: ${{ inputs.libpq_lib_path }}

				        PROJECT_SETTINGS: ${{ inputs.project_settings }}
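
The `stripe_size` input is expressed in 8 KiB pages, so its byte size follows from a single multiplication. A small Python sketch of that conversion and of the shard-split request body, assuming only what the input descriptions and the `curl` call above state; the helper names are hypothetical:

```python
import json

PAGE_SIZE_KIB = 8  # assumption: taken from the input description ("in 8kiB pages")


def stripe_size_mib(pages: int) -> float:
    """Convert a stripe size given in pages to MiB."""
    return pages * PAGE_SIZE_KIB / 1024


def shard_split_body(shard_count: int, stripe_size: int) -> str:
    """Build the JSON body that the action sends to the shard_split endpoint."""
    return json.dumps({"new_shard_count": shard_count, "new_stripe_size": stripe_size})


print(stripe_size_mib(2048))        # 16.0, matching the "2048 for 16MB stripes" example
print(stripe_size_mib(32768))       # 256.0 for the declared default of '32768'
print(shard_split_body(8, 32768))
```

Under this 8 KiB assumption the default of 32768 pages works out to 256 MiB, which differs from the "128 MiB" quoted in the input description.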

									
										49

.github/actions/run-python-test-set/action.yml
									
												
				@@ -36,18 +36,26 @@ inputs:

				    description: 'Region name for real s3 tests'

				    required: false

				    default: ''

				  rerun_flaky:

				    description: 'Whether to rerun flaky tests'

				  rerun_failed:

				    description: 'Whether to rerun failed tests'

				    required: false

				    default: 'false'

				  pg_version:

				    description: 'Postgres version to use for tests'

				    required: false

				    default: 'v16'

				  sanitizers:

				    description: 'enabled or disabled'

				    required: false

				    default: 'disabled'

				    type: string

				  benchmark_durations:

				    description: 'benchmark durations JSON'

				    required: false

				    default: '{}'

				  aws-oicd-role-arn:

				    description: 'OIDC role arn to interact with S3'

				    required: true

				runs:

				  using: "composite"

				@@ -56,8 +64,9 @@ runs:

				      if: inputs.build_type != 'remote'

				      uses: ./.github/actions/download

				      with:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}-artifact

				        name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}${{ inputs.sanitizers == 'enabled' && '-sanitized' || '' }}-artifact

				        path: /tmp/neon

				        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}

				    - name: Download Neon binaries for the previous release

				      if: inputs.build_type != 'remote'

				@@ -66,6 +75,7 @@ runs:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build_type }}-artifact

				        path: /tmp/neon-previous

				        prefix: latest

				        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}

				    - name: Download compatibility snapshot

				      if: inputs.build_type != 'remote'

				@@ -77,6 +87,7 @@ runs:

				        # The lack of compatibility snapshot (for example, for the new Postgres version)

				        # shouldn't fail the whole job. Only relevant test should fail.

				        skip-if-does-not-exist: true

				        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}

				    - name: Checkout

				      if: inputs.needs_postgres_source == 'true'

				@@ -88,7 +99,7 @@ runs:

				      uses: actions/cache@v4

				      with:

				        path: ~/.cache/pypoetry/virtualenvs

				        key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-${{ hashFiles('poetry.lock') }}

				        key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-bookworm-${{ hashFiles('poetry.lock') }}

				    - name: Install Python deps

				      shell: bash -euxo pipefail {0}

				@@ -104,8 +115,9 @@ runs:

				        COMPATIBILITY_SNAPSHOT_DIR: /tmp/compatibility_snapshot_pg${{ inputs.pg_version }}

				        ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE: contains(github.event.pull_request.labels.*.name, 'backward compatibility breakage')

				        ALLOW_FORWARD_COMPATIBILITY_BREAKAGE: contains(github.event.pull_request.labels.*.name, 'forward compatibility breakage')

				        RERUN_FLAKY: ${{ inputs.rerun_flaky }}

				        RERUN_FAILED: ${{ inputs.rerun_failed }}

				        PG_VERSION: ${{ inputs.pg_version }}

				        SANITIZERS: ${{ inputs.sanitizers }}

				      shell: bash -euxo pipefail {0}

				      run: |

				        # PLATFORM will be embedded in the perf test report

				@@ -115,6 +127,8 @@ runs:

				        export DEFAULT_PG_VERSION=${PG_VERSION#v}

				        export LD_LIBRARY_PATH=${POSTGRES_DISTRIB_DIR}/v${DEFAULT_PG_VERSION}/lib

				        export BENCHMARK_CONNSTR=${BENCHMARK_CONNSTR:-}

				        export ASAN_OPTIONS=detect_leaks=0:detect_stack_use_after_return=0:abort_on_error=1:strict_string_checks=1:check_initialization_order=1:strict_init_order=1

				        export UBSAN_OPTIONS=abort_on_error=1:print_stacktrace=1

				        if [ "${BUILD_TYPE}" = "remote" ]; then

				          export REMOTE_ENV=1

				@@ -150,15 +164,8 @@ runs:

				          EXTRA_PARAMS="--out-dir $PERF_REPORT_DIR $EXTRA_PARAMS"

				        fi

				        if [ "${RERUN_FLAKY}" == "true" ]; then

				          mkdir -p $TEST_OUTPUT

				          poetry run ./scripts/flaky_tests.py "${TEST_RESULT_CONNSTR}" \

				                                              --days 7 \

				                                              --output "$TEST_OUTPUT/flaky.json" \

				                                              --pg-version "${DEFAULT_PG_VERSION}" \

				                                              --build-type "${BUILD_TYPE}"

				          EXTRA_PARAMS="--flaky-tests-json $TEST_OUTPUT/flaky.json $EXTRA_PARAMS"

				        if [ "${RERUN_FAILED}" == "true" ]; then

				          EXTRA_PARAMS="--reruns 2 $EXTRA_PARAMS"

				        fi

				        # We use pytest-split plugin to run benchmarks in parallel on different CI runners

				@@ -218,10 +225,22 @@ runs:

				        name: compatibility-snapshot-${{ runner.arch }}-${{ inputs.build_type }}-pg${{ inputs.pg_version }}

				        # Directory is created by test_compatibility.py::test_create_snapshot, keep the path in sync with the test

				        path: /tmp/test_output/compatibility_snapshot_pg${{ inputs.pg_version }}/

				        # The lack of compatibility snapshot shouldn't fail the job

				        # (for example if we didn't run the test for non build-and-test workflow)

				        skip-if-does-not-exist: true

				        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}

				    - uses: aws-actions/configure-aws-credentials@v4

				      if: ${{ !cancelled() }}

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ inputs.aws-oicd-role-arn }}

				        role-duration-seconds: 3600 # 1 hour should be more than enough to upload report

				    - name: Upload test results

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-store

				      with:

				        report-dir: /tmp/test_output/allure/results

				        unique-key: ${{ inputs.build_type }}-${{ inputs.pg_version }}

				        unique-key: ${{ inputs.build_type }}-${{ inputs.pg_version }}-${{ runner.arch }}

				        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}
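
The artifact name in the download step above uses the GitHub expression idiom `cond && 'value' || ''` to append `-sanitized` only when sanitizers are enabled. A rough Python equivalent, with a hypothetical function name and example values that are assumptions rather than values taken from a real run:

```python
def artifact_name(os_name: str, arch: str, build_type: str, sanitizers: str = "disabled") -> str:
    """Compose the artifact name the way the workflow expression does."""
    # The suffix is added only for the exact string 'enabled', as in the expression.
    suffix = "-sanitized" if sanitizers == "enabled" else ""
    return f"neon-{os_name}-{arch}-{build_type}{suffix}-artifact"


print(artifact_name("Linux", "X64", "release"))
print(artifact_name("Linux", "X64", "release", sanitizers="enabled"))
```

Note that the "previous release" download step keeps the unsuffixed name, so only the current build is fetched in its sanitized variant.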

									
										2

.github/actions/save-coverage-data/action.yml
									
												
				@@ -14,9 +14,11 @@ runs:

				        name: coverage-data-artifact

				        path: /tmp/coverage

				        skip-if-does-not-exist: true # skip if there's no previous coverage to download

				        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}

				    - name: Upload coverage data

				      uses: ./.github/actions/upload

				      with:

				        name: coverage-data-artifact

				        path: /tmp/coverage

				        aws-oicd-role-arn: ${{ inputs.aws-oicd-role-arn }}

									
										36

.github/actions/set-docker-config-dir/action.yml
									
											
				@@ -1,36 +0,0 @@

				name: "Set custom docker config directory"

				description: "Create a directory for docker config and set DOCKER_CONFIG"

				# Use custom DOCKER_CONFIG directory to avoid conflicts with default settings

				runs:

				  using: "composite"

				  steps:

				  - name: Show warning on GitHub-hosted runners

				    if: runner.environment == 'github-hosted'

				    shell: bash -euo pipefail {0}

				    run: |

				      # Using the following environment variables to find a path to the workflow file

				      # ${GITHUB_WORKFLOW_REF} - octocat/hello-world/.github/workflows/my-workflow.yml@refs/heads/my_branch

				      # ${GITHUB_REPOSITORY}   - octocat/hello-world

				      # ${GITHUB_REF}          - refs/heads/my_branch

				      # From https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/variables

				      filename_with_ref=${GITHUB_WORKFLOW_REF#"$GITHUB_REPOSITORY/"}

				      filename=${filename_with_ref%"@$GITHUB_REF"}

				      # https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#setting-a-warning-message

				      title='Unnecessary usage of `.github/actions/set-docker-config-dir`'

				      message='No need to use `.github/actions/set-docker-config-dir` action on GitHub-hosted runners'

				      echo "::warning file=${filename},title=${title}::${message}"

				  - uses: pyTooling/Actions/with-post-step@74afc5a42a17a046c90c68cb5cfa627e5c6c5b6b # v1.0.7

				    env:

				      DOCKER_CONFIG: .docker-custom-${{ github.run_id }}-${{ github.run_attempt }}

				    with:

				      main: |

				        mkdir -p "${DOCKER_CONFIG}"

				        echo DOCKER_CONFIG=${DOCKER_CONFIG} | tee -a $GITHUB_ENV

				      post: |

				        if [ -d "${DOCKER_CONFIG}" ]; then

				          rm -r "${DOCKER_CONFIG}"

				        fi

									
										29

.github/actions/upload/action.yml
									
												
				@@ -7,18 +7,28 @@ inputs:

				  path:

				    description: "A directory or file to upload"

				    required: true

				  skip-if-does-not-exist:

				    description: "Allow to skip if path doesn't exist, fail otherwise"

				    default: false

				    required: false

				  prefix:

				    description: "S3 prefix. Default is '${GITHUB_SHA}/${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}'"

				    required: false

				  aws-oicd-role-arn:

				    description: "the OIDC role arn for aws auth"

				    required: false

				    default: ""

				runs:

				  using: "composite"

				  steps:

				    - name: Prepare artifact

				      id: prepare-artifact

				      shell: bash -euxo pipefail {0}

				      env:

				        SOURCE: ${{ inputs.path }}

				        ARCHIVE: /tmp/uploads/${{ inputs.name }}.tar.zst

				        SKIP_IF_DOES_NOT_EXIST: ${{ inputs.skip-if-does-not-exist }}

				      run: |

				        mkdir -p $(dirname $ARCHIVE)

				@@ -33,14 +43,29 @@ runs:

				        elif [ -f ${SOURCE} ]; then

				          time tar -cf ${ARCHIVE} --zstd ${SOURCE}

				        elif ! ls ${SOURCE} > /dev/null 2>&1; then

				          echo >&2 "${SOURCE} does not exist"

				          exit 2

				          if [ "${SKIP_IF_DOES_NOT_EXIST}" = "true" ]; then

				            echo 'SKIPPED=true' >> $GITHUB_OUTPUT

				            exit 0

				          else

				            echo >&2 "${SOURCE} does not exist"

				            exit 2

				          fi

				        else

				          echo >&2 "${SOURCE} is neither a directory nor a file, do not know how to handle it"

				          exit 3

				        fi

				        echo 'SKIPPED=false' >> $GITHUB_OUTPUT

				    - name: Configure AWS credentials

				      uses: aws-actions/configure-aws-credentials@v4

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ inputs.aws-oicd-role-arn }}

				        role-duration-seconds: 3600

				    - name: Upload artifact

				      if: ${{ steps.prepare-artifact.outputs.SKIPPED == 'false' }}

				      shell: bash -euxo pipefail {0}

				      env:

				        SOURCE: ${{ inputs.path }}

									
										13

.github/file-filters.yaml
									
										Normal file
												
				@@ -0,0 +1,13 @@

				rust_code: ['**/*.rs', '**/Cargo.toml', '**/Cargo.lock']

				rust_dependencies: ['**/Cargo.lock']

				v14: ['vendor/postgres-v14/**', 'Makefile', 'pgxn/**']

				v15: ['vendor/postgres-v15/**', 'Makefile', 'pgxn/**']

				v16: ['vendor/postgres-v16/**', 'Makefile', 'pgxn/**']

				v17: ['vendor/postgres-v17/**', 'Makefile', 'pgxn/**']

				rebuild_neon_extra:

				    - .github/workflows/neon_extra_builds.yml

				rebuild_macos:

				    - .github/workflows/build-macos.yml

									
										11

.github/pull_request_template.md
									
												
				@@ -1,14 +1,3 @@

				## Problem

				## Summary of changes

				## Checklist before requesting a review

				- [ ] I have performed a self-review of my code.

				- [ ] If it is a core feature, I have added thorough tests.

				- [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?

				- [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

				## Checklist before merging

				- [ ] Do not forget to reformat commit message to not include the above checklist

									
										69

.github/scripts/generate_image_maps.py
									
										Normal file
												
				@@ -0,0 +1,69 @@

				import itertools

				import json

				import os

				import sys

				source_tag = os.getenv("SOURCE_TAG")

				target_tag = os.getenv("TARGET_TAG")

				branch = os.getenv("BRANCH")

				dev_acr = os.getenv("DEV_ACR")

				prod_acr = os.getenv("PROD_ACR")

				dev_aws = os.getenv("DEV_AWS")

				prod_aws = os.getenv("PROD_AWS")

				aws_region = os.getenv("AWS_REGION")

				components = {

				    "neon": ["neon"],

				    "compute": [

				        "compute-node-v14",

				        "compute-node-v15",

				        "compute-node-v16",

				        "compute-node-v17",

				        "vm-compute-node-v14",

				        "vm-compute-node-v15",

				        "vm-compute-node-v16",

				        "vm-compute-node-v17",

				    ],

				}

				registries = {

				    "dev": [

				        "docker.io/neondatabase",

				        "ghcr.io/neondatabase",

				        f"{dev_aws}.dkr.ecr.{aws_region}.amazonaws.com",

				        f"{dev_acr}.azurecr.io/neondatabase",

				    ],

				    "prod": [

				        f"{prod_aws}.dkr.ecr.{aws_region}.amazonaws.com",

				        f"{prod_acr}.azurecr.io/neondatabase",

				    ],

				}

				release_branches = ["release", "release-proxy", "release-compute"]

				outputs: dict[str, dict[str, list[str]]] = {}

				target_tags = (

				    [target_tag, "latest"]

				    if branch == "main"

				    else [target_tag, "released"]

				    if branch in release_branches

				    else [target_tag]

				)

				target_stages = ["dev", "prod"] if branch in release_branches else ["dev"]

				for component_name, component_images in components.items():

				    for stage in target_stages:

				        outputs[f"{component_name}-{stage}"] = {

				            f"ghcr.io/neondatabase/{component_image}:{source_tag}": [

				                f"{registry}/{component_image}:{tag}"

				                for registry, tag in itertools.product(registries[stage], target_tags)

				                if not (registry == "ghcr.io/neondatabase" and tag == source_tag)

				            ]

				            for component_image in component_images

				        }

				with open(os.getenv("GITHUB_OUTPUT", "/dev/null"), "a") as f:

				    for key, value in outputs.items():

				        f.write(f"{key}={json.dumps(value)}\n")

				        print(f"Image map for {key}:\n{json.dumps(value, indent=2)}\n\n", file=sys.stderr)
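
The branch-dependent part of the script above is the choice of target tags and target stages. The sketch below isolates just that decision so it can be checked by hand; the function name `plan` and the sample tag values are hypothetical:

```python
RELEASE_BRANCHES = ["release", "release-proxy", "release-compute"]


def plan(branch: str, target_tag: str) -> tuple[list[str], list[str]]:
    """Return (target_tags, target_stages) using the same rules as the script."""
    if branch == "main":
        tags = [target_tag, "latest"]
    elif branch in RELEASE_BRANCHES:
        tags = [target_tag, "released"]
    else:
        tags = [target_tag]
    # Only release branches are pushed to the prod registries.
    stages = ["dev", "prod"] if branch in RELEASE_BRANCHES else ["dev"]
    return tags, stages


for branch in ("main", "release-compute", "some-feature"):
    print(branch, plan(branch, "1234"))
```

Every (registry, tag) pair for the chosen stage is then produced with `itertools.product`, skipping only the pair that would re-push the source image to `ghcr.io/neondatabase` under its own source tag.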

									
										110

.github/scripts/lint-release-pr.sh
									
										Executable file
												
				@@ -0,0 +1,110 @@

				#!/usr/bin/env bash

				set -euo pipefail

				DOCS_URL="https://docs.neon.build/overview/repositories/neon.html"

				message() {

				  if [[ -n "${GITHUB_PR_NUMBER:-}" ]]; then

				    gh pr comment --repo "${GITHUB_REPOSITORY}" "${GITHUB_PR_NUMBER}" --edit-last --body "$1" \

				      || gh pr comment --repo "${GITHUB_REPOSITORY}" "${GITHUB_PR_NUMBER}" --body "$1"

				  fi

				  echo "$1"

				}

				report_error() {

				  message "❌ $1

				  For more details, see the documentation: ${DOCS_URL}"

				  exit 1

				}

				case "$RELEASE_BRANCH" in

				  "release") COMPONENT="Storage" ;;

				  "release-proxy") COMPONENT="Proxy" ;;

				  "release-compute") COMPONENT="Compute" ;;

				  *)

				    report_error "Unknown release branch: ${RELEASE_BRANCH}"

				    ;;

				esac

				# Identify main and release branches

				MAIN_BRANCH="origin/main"

				REMOTE_RELEASE_BRANCH="origin/${RELEASE_BRANCH}"

				# Find merge base

				MERGE_BASE=$(git merge-base "${MAIN_BRANCH}" "${REMOTE_RELEASE_BRANCH}")

				echo "Merge base of ${MAIN_BRANCH} and ${RELEASE_BRANCH}: ${MERGE_BASE}"

				# Get the HEAD commit (last commit in PR, expected to be the merge commit)

				LAST_COMMIT=$(git rev-parse HEAD)

				MERGE_COMMIT_MESSAGE=$(git log -1 --format=%s "${LAST_COMMIT}")

				EXPECTED_MESSAGE_REGEX="^$COMPONENT release [0-9]{4}-[0-9]{2}-[0-9]{2}$"

				if ! [[ "${MERGE_COMMIT_MESSAGE}" =~ ${EXPECTED_MESSAGE_REGEX} ]]; then

				  report_error "Merge commit message does not match expected pattern: '<component> release YYYY-MM-DD'

				  Expected component: ${COMPONENT}

				  Found: '${MERGE_COMMIT_MESSAGE}'"

				fi

				echo "✅ Merge commit message is correctly formatted: '${MERGE_COMMIT_MESSAGE}'"

				LAST_COMMIT_PARENTS=$(git cat-file -p "${LAST_COMMIT}" | jq -sR '[capture("parent (?<parent>[0-9a-f]{40})"; "g") | .parent]')

				if [[ "$(echo "${LAST_COMMIT_PARENTS}" | jq 'length')" -ne 2 ]]; then

				  report_error "Last commit must be a merge commit with exactly two parents"

				fi

				EXPECTED_RELEASE_HEAD=$(git rev-parse "${REMOTE_RELEASE_BRANCH}")

				if echo "${LAST_COMMIT_PARENTS}" | jq -e --arg rel "${EXPECTED_RELEASE_HEAD}" 'index($rel) != null' > /dev/null; then

				  LINEAR_HEAD=$(echo "${LAST_COMMIT_PARENTS}" | jq -r '[.[] | select(. != $rel)][0]' --arg rel "${EXPECTED_RELEASE_HEAD}")

				else

				  report_error "Last commit must merge the release branch (${RELEASE_BRANCH})"

				fi

				echo "✅ Last commit correctly merges the previous commit and the release branch"

				echo "Top commit of linear history: ${LINEAR_HEAD}"

				MERGE_COMMIT_TREE=$(git rev-parse "${LAST_COMMIT}^{tree}")

				LINEAR_HEAD_TREE=$(git rev-parse "${LINEAR_HEAD}^{tree}")

				if [[ "${MERGE_COMMIT_TREE}" != "${LINEAR_HEAD_TREE}" ]]; then

				  report_error "Tree of merge commit (${MERGE_COMMIT_TREE}) does not match tree of linear history head (${LINEAR_HEAD_TREE})

				  This indicates that the merge of ${RELEASE_BRANCH} into this branch was not performed using the merge strategy 'ours'"

				fi

				echo "✅ Merge commit tree matches the linear history head"

				EXPECTED_PREVIOUS_COMMIT="${LINEAR_HEAD}"

				# Now traverse down the history, ensuring each commit has exactly one parent

				CURRENT_COMMIT="${EXPECTED_PREVIOUS_COMMIT}"

				while [[ "${CURRENT_COMMIT}" != "${MERGE_BASE}" && "${CURRENT_COMMIT}" != "${EXPECTED_RELEASE_HEAD}" ]]; do

				  CURRENT_COMMIT_PARENTS=$(git cat-file -p "${CURRENT_COMMIT}" | jq -sR '[capture("parent (?<parent>[0-9a-f]{40})"; "g") | .parent]')

				  if [[ "$(echo "${CURRENT_COMMIT_PARENTS}" | jq 'length')" -ne 1 ]]; then

				    report_error "Commit ${CURRENT_COMMIT} must have exactly one parent"

				  fi

				  NEXT_COMMIT=$(echo "${CURRENT_COMMIT_PARENTS}" | jq -r '.[0]')

				  if [[ "${NEXT_COMMIT}" == "${MERGE_BASE}" ]]; then

				    echo "✅ Reached merge base (${MERGE_BASE})"

				    PR_BASE="${MERGE_BASE}"

				  elif [[ "${NEXT_COMMIT}" == "${EXPECTED_RELEASE_HEAD}" ]]; then

				    echo "✅ Reached release branch (${EXPECTED_RELEASE_HEAD})"

				    PR_BASE="${EXPECTED_RELEASE_HEAD}"

				  elif [[ -z "${NEXT_COMMIT}" ]]; then

				    report_error "Unexpected end of commit history before reaching merge base"

				  fi

				  # Move to the next commit in the chain

				  CURRENT_COMMIT="${NEXT_COMMIT}"

				done

				echo "✅ All commits are properly ordered and linear"

				echo "✅ Release PR structure is valid"

				echo

				message "Commits that are part of this release:

				$(git log --oneline "${PR_BASE}..${LINEAR_HEAD}")"
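
The first check in the script above validates the merge commit subject against `<component> release YYYY-MM-DD`. A minimal Python sketch of that check alone, reusing the branch-to-component mapping from the `case` statement; the function name is hypothetical, and an unknown branch raises `KeyError` here instead of calling `report_error`:

```python
import re

# Mapping taken from the script's case statement.
COMPONENTS = {"release": "Storage", "release-proxy": "Proxy", "release-compute": "Compute"}


def is_valid_release_message(release_branch: str, subject: str) -> bool:
    """Check a commit subject against '<Component> release YYYY-MM-DD'."""
    component = COMPONENTS[release_branch]
    pattern = rf"{re.escape(component)} release [0-9]{{4}}-[0-9]{{2}}-[0-9]{{2}}"
    # fullmatch anchors both ends, like ^...$ in the bash regex.
    return re.fullmatch(pattern, subject) is not None


print(is_valid_release_message("release", "Storage release 2025-04-04"))        # True
print(is_valid_release_message("release-proxy", "Storage release 2025-04-04"))  # False
```

As in the script, the pattern checks only the shape of the date, not whether it is a real calendar date.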

31

.github/scripts/previous-releases.jq Normal file

@@ -0,0 +1,31 @@
 # Expects response from https://docs.github.com/en/rest/releases/releases?apiVersion=2022-11-28#list-releases as input,
 # with tag names `release` for storage, `release-compute` for compute and `release-proxy` for proxy releases.
 # Extract only the `tag_name` field from each release object
 [ .[].tag_name ]
 # Transform each tag name into a structured object using regex capture
 | reduce map(
     capture("^(?<full>release(-(?<component>proxy|compute))?-(?<version>\\d+))$")
     | {
         component: (.component // "storage"),  # Default to "storage" if no component is specified
         version: (.version | tonumber),        # Convert the version number to an integer
         full: .full                            # Store the full tag name for final output
       }
   )[] as $entry  # Loop over the transformed list
 # Accumulate the latest (highest-numbered) version for each component
 ({};
  .[$entry.component] |= (if . == null or $entry.version > .version then $entry else . end))
 # Ensure that each component exists, or fail
 | (["storage", "compute", "proxy"] - (keys)) as $missing
 | if ($missing | length) > 0 then
     "Error: Found no release for \($missing | join(", "))!\n" | halt_error(1)
   else . end
 # Convert the resulting object into an array of formatted strings
 | to_entries
 | map("\(.key)=\(.value.full)")
 # Output each string separately
 | .[]
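
The jq program above reduces a list of release tags to the highest-numbered tag per component and fails if any component has none. The same reduction can be sketched in Python, assuming the input is already a plain list of tag names; the function name and sample tags are hypothetical:

```python
import re

# Same shape as the jq capture: release[-proxy|-compute]-<number>
TAG_RE = re.compile(r"release(?:-(?P<component>proxy|compute))?-(?P<version>[0-9]+)")


def latest_releases(tag_names: list[str]) -> dict[str, str]:
    """Return the highest-numbered release tag for storage, compute and proxy."""
    latest: dict[str, tuple[int, str]] = {}
    for tag in tag_names:
        match = TAG_RE.fullmatch(tag)
        if match is None:
            continue  # non-matching tags are dropped, as jq's capture emits nothing
        component = match.group("component") or "storage"
        version = int(match.group("version"))
        if component not in latest or version > latest[component][0]:
            latest[component] = (version, tag)
    missing = [c for c in ("storage", "compute", "proxy") if c not in latest]
    if missing:
        raise ValueError(f"Found no release for {', '.join(missing)}!")
    return {component: tag for component, (_, tag) in sorted(latest.items())}


tags = ["release-100", "release-99", "release-proxy-5", "release-compute-7", "v1.0"]
for component, tag in latest_releases(tags).items():
    print(f"{component}={tag}")
```

The versions are compared as integers, so `release-100` correctly outranks `release-99`, which a string comparison would get wrong.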

									
										45

.github/scripts/push_with_image_map.py
									
										Normal file
												
				@@ -0,0 +1,45 @@

				import json

				import os

				import subprocess

				RED = "\033[91m"

				RESET = "\033[0m"

				image_map = os.getenv("IMAGE_MAP")

				if not image_map:

				    raise ValueError("IMAGE_MAP environment variable is not set")

				try:

				    parsed_image_map: dict[str, list[str]] = json.loads(image_map)

				except json.JSONDecodeError as e:

				    raise ValueError("Failed to parse IMAGE_MAP as JSON") from e

				failures = []

				pending = [(source, target) for source, targets in parsed_image_map.items() for target in targets]

				while len(pending) > 0:

				    if len(failures) > 10:

				        print("Error: more than 10 failures!")

				        for failure in failures:

				            print(f'"{failure[0]}" failed with the following output:')

				            print(failure[1])

				        raise RuntimeError("Retry limit reached.")

				    source, target = pending.pop(0)

				    cmd = ["docker", "buildx", "imagetools", "create", "-t", target, source]

				    print(f"Running: {' '.join(cmd)}")

				    result = subprocess.run(cmd, text=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

				    if result.returncode != 0:

				        failures.append((" ".join(cmd), result.stdout, target))

				        pending.append((source, target))

				        print(

				            f"{RED}[RETRY]{RESET} Push failed for {target}. Retrying... (failure count: {len(failures)})"

				        )

				        print(result.stdout)

				if len(failures) > 0 and (github_output := os.getenv("GITHUB_OUTPUT")):

				    failed_targets = [target for _, _, target in failures]

				    with open(github_output, "a") as f:

				        f.write(f"push_failures={json.dumps(failed_targets)}\n")

									
										60

.github/workflows/_benchmarking_preparation.yml
									
												
				@@ -3,19 +3,26 @@ name: Prepare benchmarking databases by restoring dumps

				on:

				  workflow_call:

				    # no inputs needed

				defaults:

				  run:

				    shell: bash -euxo pipefail {0}

				permissions:

				  contents: read

				jobs:

				  setup-databases:

				    permissions:

				      contents: write

				      statuses: write

				      id-token: write # aws-actions/configure-aws-credentials

				    strategy:

				      fail-fast: false

				      matrix:

				        platform: [ aws-rds-postgres, aws-aurora-serverless-v2-postgres, neon ] 

				        platform: [ aws-rds-postgres, aws-aurora-serverless-v2-postgres, neon, neon_pg17 ]

				        database: [ clickbench, tpch, userexample ]

				    env:

				      LD_LIBRARY_PATH: /tmp/neon/pg_install/v16/lib

				      PLATFORM: ${{ matrix.platform }}

				@@ -23,22 +30,33 @@ jobs:

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned

				      image: ghcr.io/neondatabase/build-tools:pinned-bookworm

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    steps:

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - name: Set up Connection String

				      id: set-up-prep-connstr

				      run: |

				        case "${PLATFORM}" in

				          neon)

				            CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CONNSTR }} 

				            CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CONNSTR }}

				            ;;

				          neon_pg17)

				            CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CONNSTR_PG17 }}

				            ;;

				          aws-rds-postgres)

				            CONNSTR=${{ secrets.BENCHMARK_RDS_POSTGRES_CONNSTR }} 

				            CONNSTR=${{ secrets.BENCHMARK_RDS_POSTGRES_CONNSTR }}

				            ;;

				          aws-aurora-serverless-v2-postgres)

				            CONNSTR=${{ secrets.BENCHMARK_RDS_AURORA_CONNSTR }} 

				            CONNSTR=${{ secrets.BENCHMARK_RDS_AURORA_CONNSTR }}

				            ;;

				          *)

				            echo >&2 "Unknown PLATFORM=${PLATFORM}"

				@@ -46,9 +64,16 @@ jobs:

				            ;;

				        esac

				        echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT  

				        echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT

				    - uses: actions/checkout@v4

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				    - name: Configure AWS credentials

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        role-duration-seconds: 18000 # 5 hours

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				@@ -56,24 +81,25 @@ jobs:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    # we create a table that has one row for each database that we want to restore with the status whether the restore is done    

				    # we create a table that has one row for each database that we want to restore with the status whether the restore is done

				    - name: Create benchmark_restore_status table if it does not exist

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-prep-connstr.outputs.connstr }}

				        DATABASE_NAME: ${{ matrix.database }}

				      # to avoid a race condition of multiple jobs trying to create the table at the same time, 

				      # to avoid a race condition of multiple jobs trying to create the table at the same time,

				      # we use an advisory lock

				      run: |

				        ${PG_BINARIES}/psql "${{ env.BENCHMARK_CONNSTR }}" -c "

				        SELECT pg_advisory_lock(4711);  

				        SELECT pg_advisory_lock(4711);

				        CREATE TABLE IF NOT EXISTS benchmark_restore_status (

				        databasename text primary key,

				        restore_done boolean

				        );

				        SELECT pg_advisory_unlock(4711);

				        "

				    - name: Check if restore is already done

				      id: check-restore-done

				      env:

				@@ -107,7 +133,7 @@ jobs:

				        DATABASE_NAME: ${{ matrix.database }}

				      run: |

				        mkdir -p /tmp/dumps

				        aws s3 cp s3://neon-github-dev/performance/pgdumps/$DATABASE_NAME/$DATABASE_NAME.pg_dump /tmp/dumps/ 

				        aws s3 cp s3://neon-github-dev/performance/pgdumps/$DATABASE_NAME/$DATABASE_NAME.pg_dump /tmp/dumps/

				    - name: Replace database name in connection string

				      if: steps.check-restore-done.outputs.skip != 'true'

				@@ -126,17 +152,17 @@ jobs:

				        else

				          new_connstr="${base_connstr}/${DATABASE_NAME}"

				        fi

				        echo "database_connstr=${new_connstr}" >> $GITHUB_OUTPUT  

				        echo "database_connstr=${new_connstr}" >> $GITHUB_OUTPUT

				    - name: Restore dump

				      if: steps.check-restore-done.outputs.skip != 'true'

				      env:

				        DATABASE_NAME: ${{ matrix.database }}

				        DATABASE_CONNSTR: ${{ steps.replace-dbname.outputs.database_connstr }}

				        # the following works only with larger computes: 

				        # the following works only with larger computes:

				        # PGOPTIONS: "-c maintenance_work_mem=8388608 -c max_parallel_maintenance_workers=7"

				        # we add the || true because:

				        # the dumps were created with Neon and contain neon extensions that are not 

				        # the dumps were created with Neon and contain neon extensions that are not

				        # available in RDS, so we will always report an error, but we can ignore it

				      run: |

				        ${PG_BINARIES}/pg_restore --clean --if-exists --no-owner --jobs=4 \

									
.github/workflows/_build-and-test-locally.yml
				@@ -19,10 +19,15 @@ on:

				        description: 'debug or release'

				        required: true

				        type: string

				      pg-versions:

				        description: 'a json array of postgres versions to run regression tests on'

				      test-cfg:

				        description: 'a json object of postgres versions and lfc states to run regression tests on'

				        required: true

				        type: string

				      sanitizers:

				        description: 'enabled or disabled'

				        required: false

				        default: 'disabled'

				        type: string

				defaults:

				  run:

				@@ -31,17 +36,21 @@ defaults:

				env:

				  RUST_BACKTRACE: 1

				  COPT: '-Werror'

				  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}

				  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}

				permissions:

				  contents: read

				jobs:

				  build-neon:

				    runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', inputs.arch == 'arm64' && 'large-arm64' || 'large')) }}

				    runs-on: ${{ fromJSON(format('["self-hosted", "{0}"]', inputs.arch == 'arm64' && 'large-arm64' || 'large')) }}

				    permissions:

				      id-token: write # aws-actions/configure-aws-credentials

				      contents: read

				    container:

				      image: ${{ inputs.build-tools-image }}

				      credentials:

				        username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				        password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      # Raise locked memory limit for tokio-epoll-uring.

				      # On 5.10 LTS kernels < 5.10.162 (and generally mainline kernels < 5.12),

				      # io_uring will account the memory of the CQ and SQ as locked.

				@@ -53,21 +62,12 @@ jobs:

				      BUILD_TAG: ${{ inputs.build-tag }}

				    steps:

				      - name: Fix git ownership

				        run: |

				          # Workaround for `fatal: detected dubious ownership in repository at ...`

				          #

				          # Use both ${{ github.workspace }} and ${GITHUB_WORKSPACE} because they're different on host and in containers

				          #   Ref https://github.com/actions/checkout/issues/785

				          #

				          git config --global --add safe.directory ${{ github.workspace }}

				          git config --global --add safe.directory ${GITHUB_WORKSPACE}

				          for r in 14 15 16; do

				            git config --global --add safe.directory "${{ github.workspace }}/vendor/postgres-v$r"

				            git config --global --add safe.directory "${GITHUB_WORKSPACE}/vendor/postgres-v$r"

				          done

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          submodules: true

				@@ -83,6 +83,10 @@ jobs:

				        id: pg_v16_rev

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v16) >> $GITHUB_OUTPUT

				      - name: Set pg 17 revision for caching

				        id: pg_v17_rev

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v17) >> $GITHUB_OUTPUT

				      # Set some environment variables used by all the steps.

				      #

				      # CARGO_FLAGS is extra options to pass to "cargo build", "cargo test" etc.

				@@ -96,6 +100,7 @@ jobs:

				      - name: Set env variables

				        env:

				          ARCH: ${{ inputs.arch }}

				          SANITIZERS: ${{ inputs.sanitizers }}

				        run: |

				          CARGO_FEATURES="--features testing"

				          if [[ $BUILD_TYPE == "debug" && $ARCH == 'x64' ]]; then

				@@ -108,8 +113,14 @@ jobs:

				            cov_prefix=""

				            CARGO_FLAGS="--locked --release"

				          fi

				          if [[ $SANITIZERS == 'enabled' ]]; then

				            make_vars="WITH_SANITIZERS=yes"

				          else

				            make_vars=""

				          fi

				          {

				            echo "cov_prefix=${cov_prefix}"

				            echo "make_vars=${make_vars}"

				            echo "CARGO_FEATURES=${CARGO_FEATURES}"

				            echo "CARGO_FLAGS=${CARGO_FLAGS}"

				            echo "CARGO_HOME=${GITHUB_WORKSPACE}/.cargo"

				@@ -117,54 +128,87 @@ jobs:

				      - name: Cache postgres v14 build

				        id: cache_pg_14

				        uses: actions/cache@v4

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: pg_install/v14

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile', 'Dockerfile.build-tools') }}

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-bookworm-${{ hashFiles('Makefile', 'build-tools.Dockerfile') }}

				      - name: Cache postgres v15 build

				        id: cache_pg_15

				        uses: actions/cache@v4

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: pg_install/v15

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile', 'Dockerfile.build-tools') }}

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-bookworm-${{ hashFiles('Makefile', 'build-tools.Dockerfile') }}

				      - name: Cache postgres v16 build

				        id: cache_pg_16

				        uses: actions/cache@v4

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: pg_install/v16

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-${{ hashFiles('Makefile', 'Dockerfile.build-tools') }}

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-bookworm-${{ hashFiles('Makefile', 'build-tools.Dockerfile') }}

				      - name: Cache postgres v17 build

				        id: cache_pg_17

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: pg_install/v17

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v17_rev.outputs.pg_rev }}-bookworm-${{ hashFiles('Makefile', 'build-tools.Dockerfile') }}

				      - name: Build postgres v14

				        if: steps.cache_pg_14.outputs.cache-hit != 'true'

				        run: mold -run make postgres-v14 -j$(nproc)

				        run: mold -run make ${make_vars} postgres-v14 -j$(nproc)

				      - name: Build postgres v15

				        if: steps.cache_pg_15.outputs.cache-hit != 'true'

				        run: mold -run make postgres-v15 -j$(nproc)

				        run: mold -run make ${make_vars} postgres-v15 -j$(nproc)

				      - name: Build postgres v16

				        if: steps.cache_pg_16.outputs.cache-hit != 'true'

				        run: mold -run make postgres-v16 -j$(nproc)

				        run: mold -run make ${make_vars} postgres-v16 -j$(nproc)

				      - name: Build postgres v17

				        if: steps.cache_pg_17.outputs.cache-hit != 'true'

				        run: mold -run make ${make_vars} postgres-v17 -j$(nproc)

				      - name: Build neon extensions

				        run: mold -run make neon-pg-ext -j$(nproc)

				        run: mold -run make ${make_vars} neon-pg-ext -j$(nproc)

				      - name: Build walproposer-lib

				        run: mold -run make walproposer-lib -j$(nproc)

				        run: mold -run make ${make_vars} walproposer-lib -j$(nproc)

				      - name: Run cargo build

				        env:

				          WITH_TESTS: ${{ inputs.sanitizers != 'enabled' && '--tests' || '' }}

				        run: |

				          PQ_LIB_DIR=$(pwd)/pg_install/v16/lib

				          export PQ_LIB_DIR

				          ${cov_prefix} mold -run cargo build $CARGO_FLAGS $CARGO_FEATURES --bins --tests

				          export ASAN_OPTIONS=detect_leaks=0

				          ${cov_prefix} mold -run cargo build $CARGO_FLAGS $CARGO_FEATURES --bins ${WITH_TESTS}

				      # Do install *before* running rust tests because they might recompile the

				      # binaries with different features/flags.

				      - name: Install rust binaries

				        env:

				          ARCH: ${{ inputs.arch }}

				          SANITIZERS: ${{ inputs.sanitizers }}

				        run: |

				          # Install target binaries

				          mkdir -p /tmp/neon/bin/

				@@ -179,7 +223,7 @@ jobs:

				          done

				          # Install test executables and write list of all binaries (for code coverage)

				          if [[ $BUILD_TYPE == "debug" && $ARCH == 'x64' ]]; then

				          if [[ $BUILD_TYPE == "debug" && $ARCH == 'x64' && $SANITIZERS != 'enabled' ]]; then

				            # Keep bloated coverage data files away from the rest of the artifact

				            mkdir -p /tmp/coverage/

				@@ -204,13 +248,19 @@ jobs:

				            done

				          fi

				      - name: Configure AWS credentials

				        uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				        with:

				          aws-region: eu-central-1

				          role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				          role-duration-seconds: 18000 # 5 hours

				      - name: Run rust tests

				        if: ${{ inputs.sanitizers != 'enabled' }}

				        env:

				          NEXTEST_RETRIES: 3

				        run: |

				          PQ_LIB_DIR=$(pwd)/pg_install/v16/lib

				          export PQ_LIB_DIR

				          LD_LIBRARY_PATH=$(pwd)/pg_install/v16/lib

				          LD_LIBRARY_PATH=$(pwd)/pg_install/v17/lib

				          export LD_LIBRARY_PATH

				          #nextest does not yet support running doctests

				@@ -220,9 +270,12 @@ jobs:

				          ${cov_prefix} cargo nextest run $CARGO_FLAGS $CARGO_FEATURES -E '!package(pageserver)'

				          # run pageserver tests with different settings

				          for io_engine in std-fs tokio-epoll-uring ; do

				            for io_buffer_alignment in 0 1 512 ; do

				              NEON_PAGESERVER_UNIT_TEST_VIRTUAL_FILE_IOENGINE=$io_engine NEON_PAGESERVER_UNIT_TEST_IO_BUFFER_ALIGNMENT=$io_buffer_alignment ${cov_prefix} cargo nextest run $CARGO_FLAGS $CARGO_FEATURES  -E 'package(pageserver)'

				          for get_vectored_concurrent_io in sequential sidecar-task; do

				            for io_engine in std-fs tokio-epoll-uring ; do

				              NEON_PAGESERVER_UNIT_TEST_GET_VECTORED_CONCURRENT_IO=$get_vectored_concurrent_io \

				                NEON_PAGESERVER_UNIT_TEST_VIRTUAL_FILE_IOENGINE=$io_engine \

				                ${cov_prefix} \

				                cargo nextest run $CARGO_FLAGS $CARGO_FEATURES  -E 'package(pageserver)'

				            done

				          done

				@@ -242,13 +295,43 @@ jobs:

				          ${cov_prefix} cargo nextest run $CARGO_FLAGS $CARGO_FEATURES -E 'package(remote_storage)' -E 'test(test_real_azure)'

				      - name: Install postgres binaries

				        run: cp -a pg_install /tmp/neon/pg_install

				        run: |

				          # Use tar to copy files matching the pattern, preserving the paths in the destination

				          tar c \

				            pg_install/v* \

				            pg_install/build/*/src/test/regress/*.so \

				            pg_install/build/*/src/test/regress/pg_regress \

				            pg_install/build/*/src/test/isolation/isolationtester \

				            pg_install/build/*/src/test/isolation/pg_isolation_regress \

				            | tar  x -C /tmp/neon

				      - name: Upload Neon artifact

				        uses: ./.github/actions/upload

				        with:

				          name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-artifact

				          name: neon-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}${{ inputs.sanitizers == 'enabled' && '-sanitized' || '' }}-artifact

				          path: /tmp/neon

				          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      - name: Check diesel schema

				        if: inputs.build-type == 'release' && inputs.arch == 'x64'

				        env:

				          DATABASE_URL: postgresql://localhost:1235/storage_controller

				          POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				        run: |

				          export ASAN_OPTIONS=detect_leaks=0

				          /tmp/neon/bin/neon_local init

				          /tmp/neon/bin/neon_local storage_controller start

				          diesel print-schema > storage_controller/src/schema.rs

				          if [ -n "$(git diff storage_controller/src/schema.rs)" ]; then

				            echo >&2 "Uncommitted changes in diesel schema"

				            git diff .

				            exit 1

				          fi

				          /tmp/neon/bin/neon_local storage_controller stop

				      # XXX: keep this after the binaries.list is formed, so the coverage can properly work later

				      - name: Merge and upload coverage data

				@@ -258,27 +341,36 @@ jobs:

				  regress-tests:

				    # Don't run regression tests on debug arm64 builds

				    if: inputs.build-type != 'debug' || inputs.arch != 'arm64'

				    permissions:

				      id-token: write # aws-actions/configure-aws-credentials

				      contents: read

				      statuses: write

				    needs: [ build-neon ]

				    runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', inputs.arch == 'arm64' && 'large-arm64' || 'large')) }}

				    runs-on: ${{ fromJSON(format('["self-hosted", "{0}"]', inputs.arch == 'arm64' && 'large-arm64' || 'large')) }}

				    container:

				      image: ${{ inputs.build-tools-image }}

				      credentials:

				        username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				        password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      # for changed limits, see comments on `options:` earlier in this file

				      options: --init --shm-size=512mb --ulimit memlock=67108864:67108864

				    strategy:

				      fail-fast: false

				      matrix:

				        pg_version: ${{ fromJson(inputs.pg-versions) }}

				      matrix: ${{ fromJSON(format('{{"include":{0}}}', inputs.test-cfg)) }}

				    steps:

				      - uses: actions/checkout@v4

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          submodules: true

				      - name: Pytest regression tests

				        continue-on-error: ${{ matrix.lfc_state == 'with-lfc' && inputs.build-type == 'debug' }}

				        uses: ./.github/actions/run-python-test-set

				        timeout-minutes: 60

				        timeout-minutes: ${{ inputs.sanitizers != 'enabled' && 75 || 180 }}

				        with:

				          build_type: ${{ inputs.build-type }}

				          test_selection: regress
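The `matrix:` expression in this hunk wraps the `test-cfg` input in an `include` object before parsing it. Python's `str.format` uses the same `{{` / `}}` escaping, so the transformation can be reproduced as a rough sketch. The sample value below is an assumption: the keys `pg_version` and `lfc_state` and the value `with-lfc` appear in the workflow, while `without-lfc` and the version strings are placeholders.

```python
import json


def build_matrix(test_cfg: str) -> dict:
    """Mimic fromJSON(format('{{"include":{0}}}', inputs.test-cfg))."""
    wrapped = '{{"include":{0}}}'.format(test_cfg)
    return json.loads(wrapped)


# Hypothetical test-cfg input: a JSON array with one object per job.
sample = (
    '[{"pg_version": "v16", "lfc_state": "without-lfc"},'
    ' {"pg_version": "v17", "lfc_state": "with-lfc"}]'
)

matrix = build_matrix(sample)
for entry in matrix["include"]:
    # Mirrors USE_LFC: matrix.lfc_state == 'with-lfc' && 'true' || 'false'
    use_lfc = "true" if entry["lfc_state"] == "with-lfc" else "false"
    print(entry["pg_version"], use_lfc)
```

Each element of `include` becomes one job, which is how a single input can carry both the Postgres version and the LFC state.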

				@@ -286,13 +378,21 @@ jobs:

				          run_with_real_s3: true

				          real_s3_bucket: neon-github-ci-tests

				          real_s3_region: eu-central-1

				          rerun_flaky: true

				          rerun_failed: true

				          pg_version: ${{ matrix.pg_version }}

				          sanitizers: ${{ inputs.sanitizers }}

				          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				          # `--session-timeout` is equal to (timeout-minutes - 10 minutes) * 60 seconds.

				          # Attempt to stop tests gracefully to generate test reports

				          # until they are forcibly stopped by the stricter `timeout-minutes` limit.

				          extra_params: --session-timeout=${{ inputs.sanitizers != 'enabled' && 3000 || 10200 }}

				        env:

				          TEST_RESULT_CONNSTR: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}

				          CHECK_ONDISK_DATA_COMPATIBILITY: nonempty

				          BUILD_TAG: ${{ inputs.build-tag }}

				          PAGESERVER_VIRTUAL_FILE_IO_ENGINE: tokio-epoll-uring

				          PAGESERVER_GET_VECTORED_CONCURRENT_IO: sidecar-task

				          USE_LFC: ${{ matrix.lfc_state == 'with-lfc' && 'true' || 'false' }}

				      # Temporarily disable this step until we figure out why it's so flaky

				      # Ref https://github.com/neondatabase/neon/issues/4540
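The comment on `--session-timeout` in this hunk gives a formula, (timeout-minutes - 10 minutes) * 60 seconds. A quick check of that formula against the values in the diff is sketched below; the function name is illustrative only.

```python
GRACE_MINUTES = 10  # head start before the hard `timeout-minutes` limit, per the comment


def session_timeout_seconds(timeout_minutes: int) -> int:
    """Apply the formula from the workflow comment."""
    return (timeout_minutes - GRACE_MINUTES) * 60


# (label, timeout-minutes, --session-timeout value passed in the workflow)
configured = [
    ("sanitizers disabled", 75, 3000),
    ("sanitizers enabled", 180, 10200),
]

for label, minutes, passed in configured:
    derived = session_timeout_seconds(minutes)
    status = "matches" if derived == passed else "differs"
    print(f"{label}: formula gives {derived}s, workflow passes {passed}s ({status})")
```

By this arithmetic the sanitizer case agrees with the formula (10200 seconds), while the default case passes 3000 seconds, which corresponds to the earlier 60-minute limit and not to 75 minutes (3900 seconds). Whether that gap is intentional cannot be determined from the diff alone.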

									
.github/workflows/_check-codestyle-python.yml
				@@ -0,0 +1,55 @@

				name: Check Codestyle Python

				on:

				  workflow_call:

				    inputs:

				      build-tools-image:

				        description: 'build-tools image'

				        required: true

				        type: string

				defaults:

				  run:

				    shell: bash -euxo pipefail {0}

				permissions:

				  contents: read

				jobs:

				  check-codestyle-python:

				    runs-on: [ self-hosted, small ]

				    permissions:

				      packages: read

				    container:

				      image: ${{ inputs.build-tools-image }}

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      - name: Cache poetry deps

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: ~/.cache/pypoetry/virtualenvs

				          key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-bookworm-${{ hashFiles('poetry.lock') }}

				      - run: ./scripts/pysync

				      - run: poetry run ruff check .

				      - run: poetry run ruff format --check .

				      - run: poetry run mypy .

									
.github/workflows/_check-codestyle-rust.yml
				@@ -0,0 +1,102 @@

				name: Check Codestyle Rust

				on:

				  workflow_call:

				    inputs:

				      build-tools-image:

				        description: "build-tools image"

				        required: true

				        type: string

				      archs:

				        description: "Json array of architectures to run on"

				        type: string

				defaults:

				  run:

				    shell: bash -euxo pipefail {0}

				# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.

				permissions: {}

				jobs:

				  check-codestyle-rust:

				    strategy:

				      matrix:

				        arch: ${{ fromJSON(inputs.archs) }}

				    runs-on: ${{ fromJSON(format('["self-hosted", "{0}"]', matrix.arch == 'arm64' && 'small-arm64' || 'small')) }}

				    permissions:

				      packages: read

				    container:

				      image: ${{ inputs.build-tools-image }}

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - name: Checkout

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          submodules: true

				      - name: Cache cargo deps

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: |

				            ~/.cargo/registry

				            !~/.cargo/registry/src

				            ~/.cargo/git

				            target

				          key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust

				      # Some of our rust modules use FFI and need those to be checked

				      - name: Get postgres headers

				        run: make postgres-headers -j$(nproc)

				      # cargo hack runs the given cargo subcommand (clippy in this case) for all feature combinations.

				      # This will catch compiler & clippy warnings in all feature combinations.

				      # TODO: use cargo hack for build and test as well, but that's quite expensive.

				      # NB: keep clippy args in sync with ./run_clippy.sh

				      #

				      # The only difference between "clippy --debug" and "clippy --release" is that in --release mode,

				      # #[cfg(debug_assertions)] blocks are not built. It's not worth building everything a second

				      # time just for that, so skip "clippy --release".

				      - run: |

				          CLIPPY_COMMON_ARGS="$( source .neon_clippy_args; echo "$CLIPPY_COMMON_ARGS")"

				          if [ "$CLIPPY_COMMON_ARGS" = "" ]; then

				            echo "No clippy args found in .neon_clippy_args"

				            exit 1

				          fi

				          echo "CLIPPY_COMMON_ARGS=${CLIPPY_COMMON_ARGS}" >> $GITHUB_ENV

				      - name: Run cargo clippy (debug)

				        run: cargo hack --features default --ignore-unknown-features --feature-powerset clippy $CLIPPY_COMMON_ARGS

				      - name: Check documentation generation

				        run: cargo doc --workspace --no-deps --document-private-items

				        env:

				          RUSTDOCFLAGS: "-Dwarnings -Arustdoc::private_intra_doc_links"

				      # Use `${{ !cancelled() }}` to run quick tests after the longer clippy run

				      - name: Check formatting

				        if: ${{ !cancelled() }}

				        run: cargo fmt --all -- --check

				      # https://github.com/facebookincubator/cargo-guppy/tree/bec4e0eb29dcd1faac70b1b5360267fc02bf830e/tools/cargo-hakari#2-keep-the-workspace-hack-up-to-date-in-ci

				      - name: Check rust dependencies

				        if: ${{ !cancelled() }}

				        run: |

				          cargo hakari generate --diff  # workspace-hack Cargo.toml is up-to-date

				          cargo hakari manage-deps --dry-run  # all workspace crates depend on workspace-hack
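The workflow comments describe `cargo hack --feature-powerset` as running clippy for all feature combinations. To give a feel for why the TODO calls this expensive, here is a small sketch that enumerates a powerset. The feature names are hypothetical, and the real tool applies its own rules for which combinations it actually runs, which this sketch does not model.

```python
from itertools import combinations


def feature_powerset(features):
    """Return every subset of `features`, from the empty set to the full set."""
    items = sorted(features)
    return [
        list(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    ]


# Hypothetical feature names, used only to illustrate the growth.
for combo in feature_powerset(["testing", "tracing"]):
    flag = f"--features {','.join(combo)}" if combo else "(no features)"
    print(flag)
```

With n features the plain powerset has 2**n members, so each additional feature roughly doubles the number of clippy invocations.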

									
.github/workflows/_create-release-pr.yml
				@@ -0,0 +1,100 @@

				name: Create Release PR

				on:

				  workflow_call:

				    inputs:

				      component-name:

				        description: 'Component name'

				        required: true

				        type: string

				      source-branch:

				        description: 'Source branch'

				        required: true

				        type: string

				    secrets:

				      ci-access-token:

				        description: 'CI access token'

				        required: true

				defaults:

				  run:

				    shell: bash -euo pipefail {0}

				permissions:

				  contents: read

				jobs:

				  create-release-branch:

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: write # for `git push`

				    steps:

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      with:

				        ref: ${{ inputs.source-branch }}

				        fetch-depth: 0

				    - name: Set variables

				      id: vars

				      env:

				        COMPONENT_NAME: ${{ inputs.component-name }}

				        RELEASE_BRANCH: >-

				          ${{

				            false

				            || inputs.component-name == 'Storage' && 'release'

				            || inputs.component-name == 'Proxy' && 'release-proxy'

				            || inputs.component-name == 'Compute' && 'release-compute'

				          }}

				      run: |

				        today=$(date +'%Y-%m-%d')

				        echo "title=${COMPONENT_NAME} release ${today}" | tee -a ${GITHUB_OUTPUT}

				        echo "rc-branch=rc/${RELEASE_BRANCH}/${today}"  | tee -a ${GITHUB_OUTPUT}

				        echo "release-branch=${RELEASE_BRANCH}"         | tee -a ${GITHUB_OUTPUT}

				    - name: Configure git

				      run: |

				        git config user.name "github-actions[bot]"

				        git config user.email "41898282+github-actions[bot]@users.noreply.github.com"

				    - name: Create RC branch

				      env:

				        RELEASE_BRANCH: ${{ steps.vars.outputs.release-branch }}

				        RC_BRANCH: ${{ steps.vars.outputs.rc-branch }}

				        TITLE: ${{ steps.vars.outputs.title }}

				      run: |

				        git switch -c "${RC_BRANCH}"

				        # Manually create a merge commit on the current branch, keeping the

				        # tree and setting the parents to the current HEAD and the HEAD of the

				        # release branch. This commit is what we'll fast-forward the release

				        # branch to when merging the release PR.

				        # For details on why, look at

				        # https://docs.neon.build/overview/repositories/neon.html#background-on-commit-history-of-release-prs

				        current_tree=$(git rev-parse 'HEAD^{tree}')

				        release_head=$(git rev-parse "origin/${RELEASE_BRANCH}")

				        current_head=$(git rev-parse HEAD)

				        merge_commit=$(git commit-tree -p "${current_head}" -p "${release_head}" -m "${TITLE}" "${current_tree}")

				        # Fast-forward the current branch to the newly created merge_commit

				        git merge --ff-only ${merge_commit}

				        git push origin "${RC_BRANCH}"

				    - name: Create a PR into ${{ steps.vars.outputs.release-branch }}

				      env:

				        GH_TOKEN: ${{ secrets.ci-access-token }}

				        RC_BRANCH: ${{ steps.vars.outputs.rc-branch }}

				        RELEASE_BRANCH: ${{ steps.vars.outputs.release-branch }}

				        TITLE: ${{ steps.vars.outputs.title }}

				      run: |

				        gh pr create --title "${TITLE}" \

				                     --body "" \

				                     --head "${RC_BRANCH}" \

				                     --base "${RELEASE_BRANCH}"
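The `Create RC branch` step above builds its merge commit by hand with `git commit-tree` instead of running `git merge`. The following is a minimal sketch of that technique in a throwaway repository. The branch names, file names, and commit identity are hypothetical, and a local `release` branch stands in for `origin/${RELEASE_BRANCH}`.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "${repo}"
git init -q
git config user.name "example"
git config user.email "example@example.invalid"
# Pin the initial branch name so the sketch does not depend on init.defaultBranch.
git symbolic-ref HEAD refs/heads/main

echo "v1" > file.txt
git add file.txt
git commit -q -m "initial"

# A local `release` branch with a commit that `main` does not have.
git checkout -q -b release
echo "hotfix" > hotfix.txt
git add hotfix.txt
git commit -q -m "release-only commit"

# New work on `main`, then the RC branch cut from it.
git checkout -q main
echo "v2" > file.txt
git commit -q -am "new work on main"
git checkout -q -b "rc/release/2025-04-04"

# Same sequence as the workflow step: keep the current tree, set two parents.
current_tree=$(git rev-parse 'HEAD^{tree}')
current_head=$(git rev-parse HEAD)
release_head=$(git rev-parse release)
merge_commit=$(git commit-tree -p "${current_head}" -p "${release_head}" -m "Example release" "${current_tree}")
git merge -q --ff-only "${merge_commit}"

# Count the parents of the resulting commit.
parent_count=$(git show -s --format=%P HEAD | wc -w | tr -d ' ')
echo "parents: ${parent_count}"
```

Because the tree is taken from the RC branch head, release-only files such as the hypothetical `hotfix.txt` do not appear in the result. The `release` head is still an ancestor of the new commit, which is what allows the release branch to fast-forward to it later.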

									
										169

.github/workflows/_meta.yml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,169 @@

				name: Generate run metadata

				on:

				  workflow_call:

				    inputs:

				      github-event-name:

				        type: string

				        required: true

				      github-event-json:

				        type: string

				        required: true

				    outputs:

				      build-tag:

				        description: "Tag for the current workflow run"

				        value: ${{ jobs.tags.outputs.build-tag }}

				      release-tag:

				        description: "Tag for the release if this is an RC PR run"

				        value: ${{ jobs.tags.outputs.release-tag }}

				      previous-storage-release:

				        description: "Tag of the last storage release"

				        value: ${{ jobs.tags.outputs.storage }}

				      previous-proxy-release:

				        description: "Tag of the last proxy release"

				        value: ${{ jobs.tags.outputs.proxy }}

				      previous-compute-release:

				        description: "Tag of the last compute release"

				        value: ${{ jobs.tags.outputs.compute }}

				      run-kind:

				        description: "The kind of run we're currently in. Will be one of `push-main`, `storage-release`, `compute-release`, `proxy-release`, `storage-rc-pr`, `compute-rc-pr`,  `proxy-rc-pr`, `pr`, or `workflow-dispatch`"

				        value: ${{ jobs.tags.outputs.run-kind }}

				      release-pr-run-id:

				        description: "Only available if `run-kind in [storage-release, proxy-release, compute-release]`. Contains the run ID of the `Build and Test` workflow, assuming one with the current commit can be found."

				        value: ${{ jobs.tags.outputs.release-pr-run-id }}

				      sha:

				        description: "github.event.pull_request.head.sha on release PRs, github.sha otherwise"

				        value: ${{ jobs.tags.outputs.sha }}

				permissions: {}

				defaults:

				  run:

				    shell: bash -euo pipefail {0}

				jobs:

				  tags:

				    runs-on: ubuntu-22.04

				    outputs:

				      build-tag: ${{ steps.build-tag.outputs.build-tag }}

				      release-tag: ${{ steps.build-tag.outputs.release-tag }}

				      compute: ${{ steps.previous-releases.outputs.compute }}

				      proxy: ${{ steps.previous-releases.outputs.proxy }}

				      storage: ${{ steps.previous-releases.outputs.storage }}

				      run-kind: ${{ steps.run-kind.outputs.run-kind }}

				      release-pr-run-id: ${{ steps.release-pr-run-id.outputs.release-pr-run-id }}

				      sha: ${{ steps.sha.outputs.sha }}

				    permissions:

				      contents: read

				    steps:

				      # Need `fetch-depth: 0` to count the number of commits in the branch

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - name: Get run kind

				        id: run-kind

				        env:

				          RUN_KIND: >-

				            ${{

				              false

				              || (inputs.github-event-name == 'push'         && github.ref_name == 'main')            && 'push-main'

				              || (inputs.github-event-name == 'push'         && github.ref_name == 'release')         && 'storage-release'

				              || (inputs.github-event-name == 'push'         && github.ref_name == 'release-compute') && 'compute-release'

				              || (inputs.github-event-name == 'push'         && github.ref_name == 'release-proxy')   && 'proxy-release'

				              || (inputs.github-event-name == 'pull_request' && github.base_ref == 'release')         && 'storage-rc-pr'

				              || (inputs.github-event-name == 'pull_request' && github.base_ref == 'release-compute') && 'compute-rc-pr'

				              || (inputs.github-event-name == 'pull_request' && github.base_ref == 'release-proxy')   && 'proxy-rc-pr'

				              || (inputs.github-event-name == 'pull_request')                                         && 'pr'

				              || (inputs.github-event-name == 'workflow_dispatch')                                    && 'workflow-dispatch'

				              || 'unknown'

				            }}

				        run: |

				          echo "run-kind=$RUN_KIND" | tee -a $GITHUB_OUTPUT

				      - name: Get the right SHA

				        id: sha

				        env:

				          SHA: >

				            ${{

				              contains(fromJSON('["storage-rc-pr", "proxy-rc-pr", "compute-rc-pr"]'), steps.run-kind.outputs.run-kind)

				              && fromJSON(inputs.github-event-json).pull_request.head.sha

				              || github.sha

				            }}

				        run: |

				          echo "sha=$SHA" | tee -a $GITHUB_OUTPUT

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				          ref: ${{ steps.sha.outputs.sha }}

				      - name: Get build tag

				        id: build-tag

				        env:

				          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				          CURRENT_BRANCH: ${{ github.head_ref || github.ref_name }}

				          CURRENT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}

				          RUN_KIND: ${{ steps.run-kind.outputs.run-kind }}

				        run: |

				          case $RUN_KIND in

				          push-main)

				            echo "build-tag=$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT

				            ;;

				          storage-release)

				            echo "build-tag=release-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT

				            ;;

				          proxy-release)

				            echo "build-tag=release-proxy-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT

				            ;;

				          compute-release)

				            echo "build-tag=release-compute-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT

				            ;;

				          pr|storage-rc-pr|compute-rc-pr|proxy-rc-pr)

				            BUILD_AND_TEST_RUN_ID=$(gh api --paginate \

				              -H "Accept: application/vnd.github+json" \

				              -H "X-GitHub-Api-Version: 2022-11-28" \

				              "/repos/${GITHUB_REPOSITORY}/actions/runs?head_sha=${CURRENT_SHA}&branch=${CURRENT_BRANCH}" \

				              | jq '[.workflow_runs[] | select(.name == "Build and Test")][0].id // ("Error: No matching workflow run found." | halt_error(1))')

				            echo "build-tag=$BUILD_AND_TEST_RUN_ID" | tee -a $GITHUB_OUTPUT

				            case $RUN_KIND in

				            storage-rc-pr)

				              echo "release-tag=release-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT

				              ;;

				            proxy-rc-pr)

				              echo "release-tag=release-proxy-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT

				              ;;

				            compute-rc-pr)

				              echo "release-tag=release-compute-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT

				              ;;

				            esac

				            ;;

				          workflow-dispatch)

				            echo "build-tag=$GITHUB_RUN_ID" | tee -a $GITHUB_OUTPUT

				            ;;

				          *)

				            echo "Unexpected RUN_KIND ('${RUN_KIND}'), failing to assign build-tag!"

				            exit 1

				          esac

				      - name: Get the previous release-tags

				        id: previous-releases

				        env:

				          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				        run: |

				          gh api --paginate \

				            -H "Accept: application/vnd.github+json" \

				            -H "X-GitHub-Api-Version: 2022-11-28" \

				            "/repos/${GITHUB_REPOSITORY}/releases" \

				          | jq -f .github/scripts/previous-releases.jq -r \

				          | tee -a "${GITHUB_OUTPUT}"

				      - name: Get the release PR run ID

				        id: release-pr-run-id

				        if: ${{ contains(fromJSON('["storage-release", "compute-release", "proxy-release"]'), steps.run-kind.outputs.run-kind) }}

				        env:

				          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				          CURRENT_SHA: ${{ github.sha }}

				        run: |

				          RELEASE_PR_RUN_ID=$(gh api "/repos/${GITHUB_REPOSITORY}/actions/runs?head_sha=$CURRENT_SHA" | jq '[.workflow_runs[] | select(.name == "Build and Test") | select(.head_branch | test("^rc/release(-(proxy|compute))?/[0-9]{4}-[0-9]{2}-[0-9]{2}$"; "s"))] | first | .id // ("Failed to find Build and Test run from  RC PR!" | halt_error(1))')

				          echo "release-pr-run-id=$RELEASE_PR_RUN_ID" | tee -a $GITHUB_OUTPUT
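The `RUN_KIND` expression and the RC branch pattern in this workflow can be hard to read inline. The sketch below restates both as plain shell functions so the mapping can be tried locally. The function names `run_kind` and `is_rc_branch` are hypothetical and are not part of the workflow. The regular expression is copied from the `Get the release PR run ID` step, but is evaluated with `grep -E` here rather than `jq`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: mirrors the RUN_KIND expression.
# Arguments: event name, ref name, base ref (only set for pull requests).
run_kind() {
  event_name="$1"
  ref_name="$2"
  base_ref="${3:-}"
  case "${event_name}" in
    push)
      case "${ref_name}" in
        main)            echo "push-main" ;;
        release)         echo "storage-release" ;;
        release-compute) echo "compute-release" ;;
        release-proxy)   echo "proxy-release" ;;
        *)               echo "unknown" ;;
      esac
      ;;
    pull_request)
      case "${base_ref}" in
        release)         echo "storage-rc-pr" ;;
        release-compute) echo "compute-rc-pr" ;;
        release-proxy)   echo "proxy-rc-pr" ;;
        *)               echo "pr" ;;
      esac
      ;;
    workflow_dispatch) echo "workflow-dispatch" ;;
    *)                 echo "unknown" ;;
  esac
}

# Hypothetical helper: succeeds if the branch name looks like an RC branch.
is_rc_branch() {
  pattern='^rc/release(-(proxy|compute))?/[0-9]{4}-[0-9]{2}-[0-9]{2}$'
  printf '%s\n' "$1" | grep -Eq "${pattern}"
}

run_kind push release
run_kind pull_request feature-branch release-proxy
is_rc_branch "rc/release-compute/2025-04-04" && echo "rc branch"
```

A push to any branch other than the four listed ones resolves to `unknown`, which the `Get build tag` step then rejects in its `*)` case.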

									
										56

.github/workflows/_push-to-acr.yml
									
										vendored
									
											
				@@ -1,56 +0,0 @@

				name: Push images to ACR

				on:

				  workflow_call:

				    inputs:

				      client_id:

				        description: Client ID of Azure managed identity or Entra app

				        required: true

				        type: string

				      image_tag:

				        description: Tag for the container image

				        required: true

				        type: string

				      images:

				        description: Images to push

				        required: true

				        type: string

				      registry_name:

				        description: Name of the container registry

				        required: true

				        type: string

				      subscription_id:

				        description: Azure subscription ID

				        required: true

				        type: string

				      tenant_id:

				        description: Azure tenant ID

				        required: true

				        type: string

				jobs:

				  push-to-acr:

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: read  # This is required for actions/checkout

				      id-token: write # This is required for Azure Login to work.

				    steps:

				      - name: Azure login

				        uses: azure/login@6c251865b4e6290e7b78be643ea2d005bc51f69a  # @v2.1.1

				        with:

				          client-id: ${{ inputs.client_id }}

				          subscription-id: ${{ inputs.subscription_id }}

				          tenant-id: ${{ inputs.tenant_id }}

				      - name: Login to ACR

				        run: |

				          az acr login --name=${{ inputs.registry_name }}

				      - name: Copy docker images to ACR ${{ inputs.registry_name }}

				        run: |

				          images='${{ inputs.images }}'

				          for image in ${images}; do

				            docker buildx imagetools create \

				              -t ${{ inputs.registry_name }}.azurecr.io/neondatabase/${image}:${{ inputs.image_tag }} \

				                                        neondatabase/${image}:${{ inputs.image_tag }}

				          done

									
										128

.github/workflows/_push-to-container-registry.yml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,128 @@

				name: Push images to Container Registry

				on:

				  workflow_call:

				    inputs:

				      # Example: {"docker.io/neondatabase/neon:13196061314":["${{ vars.NEON_DEV_AWS_ACCOUNT_ID }}.dkr.ecr.${{ vars.AWS_ECR_REGION }}.amazonaws.com/neon:13196061314","neoneastus2.azurecr.io/neondatabase/neon:13196061314"]}

				      image-map:

				        description: JSON map of images, mapping from a source image to an array of target images that should be pushed.

				        required: true

				        type: string

				      aws-region:

				        description: AWS region to log in to. Required when pushing to ECR.

				        required: false

				        type: string

				      aws-account-id:

				        description: AWS account ID to log in to for pushing to ECR. Required when pushing to ECR.

				        required: false

				        type: string

				      aws-role-to-assume:

				        description: AWS role to assume for pushing to ECR. Required when pushing to ECR.

				        required: false

				        type: string

				      azure-client-id:

				        description: Client ID of Azure managed identity or Entra app. Required when pushing to ACR.

				        required: false

				        type: string

				      azure-subscription-id:

				        description: Azure subscription ID. Required when pushing to ACR.

				        required: false

				        type: string

				      azure-tenant-id:

				        description: Azure tenant ID. Required when pushing to ACR.

				        required: false

				        type: string

				      acr-registry-name:

				        description: ACR registry name. Required when pushing to ACR.

				        required: false

				        type: string

				permissions: {}

				defaults:

				  run:

				    shell: bash -euo pipefail {0}

				jobs:

				  push-to-container-registry:

				    runs-on: ubuntu-22.04

				    permissions:

				      id-token: write  # Required for aws/azure login

				      packages: write  # required for pushing to GHCR

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          sparse-checkout: .github/scripts/push_with_image_map.py

				          sparse-checkout-cone-mode: false

				      - name: Print image-map

				        run: echo '${{ inputs.image-map }}' | jq

				      - name: Configure AWS credentials

				        if: contains(inputs.image-map, 'amazonaws.com/')

				        uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				        with:

				          aws-region: "${{ inputs.aws-region }}"

				          role-to-assume: "arn:aws:iam::${{ inputs.aws-account-id }}:role/${{ inputs.aws-role-to-assume }}"

				          role-duration-seconds: 3600

				      - name: Login to ECR

				        if: contains(inputs.image-map, 'amazonaws.com/')

				        uses: aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076 # v2.0.1

				        with:

				          registries: "${{ inputs.aws-account-id }}"

				      - name: Configure Azure credentials

				        if: contains(inputs.image-map, 'azurecr.io/')

				        uses: azure/login@6c251865b4e6290e7b78be643ea2d005bc51f69a  # @v2.1.1

				        with:

				          client-id: ${{ inputs.azure-client-id }}

				          subscription-id: ${{ inputs.azure-subscription-id }}

				          tenant-id: ${{ inputs.azure-tenant-id }}

				      - name: Login to ACR

				        if: contains(inputs.image-map, 'azurecr.io/')

				        run: |

				          az acr login --name=${{ inputs.acr-registry-name }}

				      - name: Login to GHCR

				        if: contains(inputs.image-map, 'ghcr.io/')

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: Log in to Docker Hub

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				          password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				      - name: Copy docker images to target registries

				        id: push

				        run: python3 .github/scripts/push_with_image_map.py

				        env:

				          IMAGE_MAP: ${{ inputs.image-map }}

				      - name: Notify Slack if container image pushing fails

				        if: steps.push.outputs.push_failures || failure()

				        uses: slackapi/slack-github-action@485a9d42d3a73031f12ec201c457e2162c45d02d # v2.0.0

				        with:

				          method: chat.postMessage

				          token: ${{ secrets.SLACK_BOT_TOKEN }}

				          payload: |

				            channel: ${{ vars.SLACK_ON_CALL_DEVPROD_STREAM }}

				            text: >

				              *Container image pushing ${{

				                steps.push.outcome == 'failure' && 'failed completely' || 'succeeded with some retries'

				              }}* in

				              <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>

				              ${{ steps.push.outputs.push_failures && format(

				                '*Failed targets:*\n• {0}', join(fromJson(steps.push.outputs.push_failures), '\n• ')

				              ) || '' }}
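The login steps in this workflow are gated by substring checks on the raw `image-map` string, while the Docker Hub login has no `if:` and always runs. The sketch below restates that gating in plain shell. The function name `logins_for` and the short labels it prints are hypothetical. Note one assumed difference: shell `case` matching is case-sensitive, whereas the `contains()` expression function in GitHub Actions is documented as not case-sensitive.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: prints which registry logins the workflow would perform
# for a given image-map JSON string, based only on substring matches.
logins_for() {
  image_map="$1"
  logins="dockerhub"  # the Docker Hub login step is unconditional
  case "${image_map}" in *amazonaws.com/*) logins="${logins} ecr" ;; esac
  case "${image_map}" in *azurecr.io/*)    logins="${logins} acr" ;; esac
  case "${image_map}" in *ghcr.io/*)       logins="${logins} ghcr" ;; esac
  echo "${logins}"
}

# Example map with one source image and two targets (values are illustrative).
example='{"docker.io/neondatabase/neon:1":["neoneastus2.azurecr.io/neondatabase/neon:1","ghcr.io/neondatabase/neon:1"]}'
logins_for "${example}"
```

Because the checks look at the whole string, a registry host that appears only as a source image triggers its login as well, not just one that appears as a target.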

									
										11

.github/workflows/actionlint.yml
									
										vendored
									
												
				@@ -26,14 +26,19 @@ jobs:

				    needs: [ check-permissions ]

				    runs-on: ubuntu-22.04

				    steps:

				      - uses: actions/checkout@v4

				      - uses: reviewdog/action-actionlint@v1

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      - uses: reviewdog/action-actionlint@a5524e1c19e62881d79c1f1b9b6f09f16356e281 # v1.65.2

				        env:

				          # SC2046 - Quote this to prevent word splitting. - https://www.shellcheck.net/wiki/SC2046

				          # SC2086 - Double quote to prevent globbing and word splitting. - https://www.shellcheck.net/wiki/SC2086

				          SHELLCHECK_OPTS: --exclude=SC2046,SC2086

				        with:

				          fail_on_error: true

				          fail_level: error

				          filter_mode: nofilter

				          level: error
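The `SHELLCHECK_OPTS` setting above excludes SC2046 and SC2086, both of which warn about unquoted expansions. As a reminder of what those checks guard against, here is a small sketch of word splitting. The helper `count_args` and the sample value are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: prints how many arguments it received.
count_args() {
  echo "$#"
}

value="a b  c"

# Unquoted: the shell splits the value on whitespace into separate words.
unquoted=$(count_args $value)

# Quoted: the value is passed as a single argument.
quoted=$(count_args "$value")

echo "unquoted=${unquoted} quoted=${quoted}"
```

With these checks excluded, quoting in workflow `run:` blocks is left to reviewers, so values that may contain spaces or glob characters still deserve explicit quotes.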

									
										29

.github/workflows/approved-for-ci-run.yml
									
										vendored
									
												
				@@ -47,6 +47,11 @@ jobs:

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - run: gh pr --repo "${GITHUB_REPOSITORY}" edit "${PR_NUMBER}" --remove-label "approved-for-ci-run"

				  create-or-update-pr-for-ci-run:

				@@ -63,13 +68,18 @@ jobs:

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - run: gh pr --repo "${GITHUB_REPOSITORY}" edit "${PR_NUMBER}" --remove-label "approved-for-ci-run"

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: main

				          ref: ${{ github.event.pull_request.head.sha }}

				          token: ${{ secrets.CI_ACCESS_TOKEN }}

				      - name: Look for existing PR

				        id: get-pr

				        env:

				@@ -77,7 +87,7 @@ jobs:

				        run: |

				          ALREADY_CREATED="$(gh pr --repo ${GITHUB_REPOSITORY} list --head ${BRANCH} --base main --json number --jq '.[].number')"

				          echo "ALREADY_CREATED=${ALREADY_CREATED}" >> ${GITHUB_OUTPUT}

				      - name: Get changed labels

				        id: get-labels

				        if: steps.get-pr.outputs.ALREADY_CREATED != ''

				@@ -94,8 +104,6 @@ jobs:

				          echo "LABELS_TO_ADD=${LABELS_TO_ADD}" >> ${GITHUB_OUTPUT}

				          echo "LABELS_TO_REMOVE=${LABELS_TO_REMOVE}" >> ${GITHUB_OUTPUT}

				      - run: gh pr checkout "${PR_NUMBER}"

				      - run: git checkout -b "${BRANCH}"

				      - run: git push --force origin "${BRANCH}"

				@@ -103,7 +111,7 @@ jobs:

				      - name: Create a Pull Request for CI run (if required)

				        if: steps.get-pr.outputs.ALREADY_CREATED == ''

				        env: 

				        env:

				          GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				        run: |

				          cat << EOF > body.md

				@@ -140,7 +148,7 @@ jobs:

				      - run: git push --force origin "${BRANCH}"

				        if: steps.get-pr.outputs.ALREADY_CREATED != ''

				  cleanup:

				    # Close PRs and delete branches if the original PR is closed.

				@@ -155,6 +163,11 @@ jobs:

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - name: Close PR and delete `ci-run/pr-${{ env.PR_NUMBER }}` branch

				        run: |

				          CLOSED="$(gh pr --repo ${GITHUB_REPOSITORY} list --head ${BRANCH} --json 'closed' --jq '.[].closed')"

									
										485

.github/workflows/benchmarking.yml
									
										vendored
									
												
				@@ -12,7 +12,6 @@ on:

				    #          │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)

				    #          │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)

				    - cron:   '0 3 * * *' # run once a day, timezone is utc

				  workflow_dispatch: # adds ability to run this manually

				    inputs:

				      region_id:

				@@ -59,26 +58,28 @@ jobs:

				    permissions:

				      contents: write

				      statuses: write

				      id-token: write # Required for OIDC authentication in azure runners

				      id-token: write # aws-actions/configure-aws-credentials

				    strategy:

				      fail-fast: false

				      matrix:

				        include:

				          - DEFAULT_PG_VERSION: 16

				          - PG_VERSION: 16

				            PLATFORM: "neon-staging"

				            region_id: ${{ github.event.inputs.region_id || 'aws-us-east-2' }}

				            RUNNER: [ self-hosted, us-east-2, x64 ]

				            IMAGE: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned

				          - DEFAULT_PG_VERSION: 16

				          - PG_VERSION: 17

				            PLATFORM: "neon-staging"

				            region_id: ${{ github.event.inputs.region_id || 'aws-us-east-2' }}

				            RUNNER: [ self-hosted, us-east-2, x64 ]

				          - PG_VERSION: 16

				            PLATFORM: "azure-staging"

				            region_id: 'azure-eastus2'

				            RUNNER: [ self-hosted, eastus2, x64 ]

				            IMAGE: neondatabase/build-tools:pinned

				    env:

				      TEST_PG_BENCH_DURATIONS_MATRIX: "300"

				      TEST_PG_BENCH_SCALES_MATRIX: "10,100"

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      DEFAULT_PG_VERSION: ${{ matrix.DEFAULT_PG_VERSION }}

				      PG_VERSION: ${{ matrix.PG_VERSION }}

				      TEST_OUTPUT: /tmp/test_output

				      BUILD_TYPE: remote

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}

				@@ -86,14 +87,22 @@ jobs:

				    runs-on: ${{ matrix.RUNNER }}

				    container:

				      image: ${{ matrix.IMAGE }}

				      image: ghcr.io/neondatabase/build-tools:pinned-bookworm

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    steps:

				    - uses: actions/checkout@v4

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				    - name: Configure AWS credentials # necessary on Azure runners

				      uses: aws-actions/configure-aws-credentials@v4

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				@@ -105,13 +114,14 @@ jobs:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Create Neon Project

				      id: create-neon-project

				      uses: ./.github/actions/neon-project-create

				      with:

				        region_id: ${{ matrix.region_id }}

				        postgres_version: ${{ env.DEFAULT_PG_VERSION }}

				        postgres_version: ${{ env.PG_VERSION }}

				        api_key: ${{ secrets.NEON_STAGING_API_KEY }}

				    - name: Run benchmark

				@@ -121,7 +131,8 @@ jobs:

				        test_selection: performance

				        run_in_parallel: false

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        pg_version: ${{ env.DEFAULT_PG_VERSION }}

				        pg_version: ${{ env.PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        # Set --sparse-ordering option of pytest-order plugin

				        # to ensure tests run in the order they appear in the file.

				        # It's important for test_perf_pgbench.py::test_pgbench_remote_* tests

				@@ -133,6 +144,10 @@ jobs:

				          --ignore test_runner/performance/test_perf_pgvector_queries.py

				          --ignore test_runner/performance/test_logical_replication.py

				          --ignore test_runner/performance/test_physical_replication.py

				          --ignore test_runner/performance/test_perf_ingest_using_pgcopydb.py

				          --ignore test_runner/performance/test_cumulative_statistics_persistence.py

				          --ignore test_runner/performance/test_perf_many_relations.py

				          --ignore test_runner/performance/test_perf_oltp_large_tenant.py

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				@@ -149,12 +164,14 @@ jobs:

				      id: create-allure-report

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				      with:

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				      uses: slackapi/slack-github-action@v1

				      uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1

				      with:

				        channel-id: "C033QLM5P7D" # dev-staging-stream

				        channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream

				        slack-message: |

				          Periodic perf testing: ${{ job.status }}

				          <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>

				@@ -162,8 +179,72 @@ jobs:

				      env:

				        SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

				  cumstats-test:

				    if: ${{ github.event.inputs.run_only_pgvector_tests == 'false' || github.event.inputs.run_only_pgvector_tests == null }}

				    permissions:

				      contents: write

				      statuses: write

				      id-token: write # aws-actions/configure-aws-credentials

				    env:

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      DEFAULT_PG_VERSION: 17

				      TEST_OUTPUT: /tmp/test_output

				      BUILD_TYPE: remote

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}

				      PLATFORM: "neon-staging"

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: ghcr.io/neondatabase/build-tools:pinned-bookworm

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    steps:

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				    - name: Configure AWS credentials

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        role-duration-seconds: 18000 # 5 hours

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				      with:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Verify that cumulative statistics are preserved

				      uses: ./.github/actions/run-python-test-set

				      with:

				        build_type: ${{ env.BUILD_TYPE }}

				        test_selection: performance/test_cumulative_statistics_persistence.py

				        run_in_parallel: false

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 3600

				        pg_version: ${{ env.DEFAULT_PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				        NEON_API_KEY: ${{ secrets.NEON_STAGING_API_KEY }}

				  replication-tests:

				    if: ${{ github.event.inputs.run_only_pgvector_tests == 'false' || github.event.inputs.run_only_pgvector_tests == null }}

				    permissions:

				      contents: write

				      statuses: write

				      id-token: write # aws-actions/configure-aws-credentials

				    env:

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      DEFAULT_PG_VERSION: 16

				@@ -174,12 +255,26 @@ jobs:

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned

				      image: ghcr.io/neondatabase/build-tools:pinned-bookworm

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    steps:

				    - uses: actions/checkout@v4

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				    - name: Configure AWS credentials

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        role-duration-seconds: 18000 # 5 hours

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				@@ -187,6 +282,7 @@ jobs:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Run Logical Replication benchmarks

				      uses: ./.github/actions/run-python-test-set

				@@ -197,6 +293,7 @@ jobs:

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 5400

				        pg_version: ${{ env.DEFAULT_PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				@@ -213,6 +310,7 @@ jobs:

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 5400

				        pg_version: ${{ env.DEFAULT_PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				@@ -224,12 +322,14 @@ jobs:

				      uses: ./.github/actions/allure-report-generate

				      with:

				        store-test-results-into-db: true

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}

				    # Post both success and failure to the Slack channel

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				      uses: slackapi/slack-github-action@v1

				      if: ${{ github.event.schedule && !cancelled() }}

				      uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1

				      with:

				        channel-id: "C06T9AMNDQQ" # on-call-compute-staging-stream

				        slack-message: |

				@@ -261,13 +361,18 @@ jobs:

				      tpch-compare-matrix: ${{ steps.tpch-compare-matrix.outputs.matrix }}

				    steps:

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - name: Generate matrix for pgbench benchmark

				      id: pgbench-compare-matrix

				      run: |

				        region_id_default=${{ env.DEFAULT_REGION_ID }}

				        runner_default='["self-hosted", "us-east-2", "x64"]'

				        runner_azure='["self-hosted", "eastus2", "x64"]'

				        image_default="369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned"

				        image_default="ghcr.io/neondatabase/build-tools:pinned-bookworm"

				        matrix='{

				          "pg_version" : [

				            16

				@@ -283,13 +388,18 @@ jobs:

				          "db_size": [ "10gb" ],

				          "runner": ['"$runner_default"'],

				          "image": [ "'"$image_default"'" ],

				          "include": [{ "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-freetier",       "db_size": "3gb" ,"runner": '"$runner_default"', "image": "'"$image_default"'" },

				                      { "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-new",            "db_size": "10gb","runner": '"$runner_default"', "image": "'"$image_default"'" },

				                      { "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-new",            "db_size": "50gb","runner": '"$runner_default"', "image": "'"$image_default"'" },

				                      { "pg_version": 16, "region_id": "azure-eastus2",          "platform": "neonvm-azure-captest-freetier", "db_size": "3gb" ,"runner": '"$runner_azure"',   "image": "neondatabase/build-tools:pinned" },

				                      { "pg_version": 16, "region_id": "azure-eastus2",          "platform": "neonvm-azure-captest-new",      "db_size": "10gb","runner": '"$runner_azure"',   "image": "neondatabase/build-tools:pinned" },

				                      { "pg_version": 16, "region_id": "azure-eastus2",          "platform": "neonvm-azure-captest-new",      "db_size": "50gb","runner": '"$runner_azure"',   "image": "neondatabase/build-tools:pinned" },

				                      { "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-sharding-reuse", "db_size": "50gb","runner": '"$runner_default"', "image": "'"$image_default"'" }]

				          "include": [{ "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-freetier",       "db_size": "3gb" ,"runner": '"$runner_default"', "image": "'"$image_default"'"                               },

				                      { "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-new",            "db_size": "10gb","runner": '"$runner_default"', "image": "'"$image_default"'"                               },

				                      { "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-new-many-tables","db_size": "10gb","runner": '"$runner_default"', "image": "'"$image_default"'"                               },

				                      { "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-new",            "db_size": "50gb","runner": '"$runner_default"', "image": "'"$image_default"'"                               },

				                      { "pg_version": 16, "region_id": "azure-eastus2",          "platform": "neonvm-azure-captest-freetier", "db_size": "3gb" ,"runner": '"$runner_azure"',   "image": "ghcr.io/neondatabase/build-tools:pinned-bookworm" },

				                      { "pg_version": 16, "region_id": "azure-eastus2",          "platform": "neonvm-azure-captest-new",      "db_size": "10gb","runner": '"$runner_azure"',   "image": "ghcr.io/neondatabase/build-tools:pinned-bookworm" },

				                      { "pg_version": 16, "region_id": "azure-eastus2",          "platform": "neonvm-azure-captest-new",      "db_size": "50gb","runner": '"$runner_azure"',   "image": "ghcr.io/neondatabase/build-tools:pinned-bookworm" },

				                      { "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-sharding-reuse", "db_size": "50gb","runner": '"$runner_default"', "image": "'"$image_default"'"                               },

				                      { "pg_version": 17, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-freetier",       "db_size": "3gb" ,"runner": '"$runner_default"', "image": "'"$image_default"'"                               },

				                      { "pg_version": 17, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-new",            "db_size": "10gb","runner": '"$runner_default"', "image": "'"$image_default"'"                               },

				                      { "pg_version": 17, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-new-many-tables","db_size": "10gb","runner": '"$runner_default"', "image": "'"$image_default"'"                               },

				                      { "pg_version": 17, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-new",            "db_size": "50gb","runner": '"$runner_default"', "image": "'"$image_default"'"                               }]

				        }'

				        if [ "$(date +%A)" = "Saturday" ] || [ "${RUN_AWS_RDS_AND_AURORA}" = "true" ]; then

				@@ -305,12 +415,15 @@ jobs:

				        matrix='{

				          "platform": [

				            "neonvm-captest-reuse"

				          ],

				          "pg_version" : [

				            16,17

				          ]

				        }'

				        if [ "$(date +%A)" = "Saturday" ] || [ "${RUN_AWS_RDS_AND_AURORA}" = "true" ]; then

				          matrix=$(echo "$matrix" | jq '.include += [{ "platform": "rds-postgres" },

				                                                     { "platform": "rds-aurora"   }]')

				          matrix=$(echo "$matrix" | jq '.include += [{ "pg_version": 16, "platform": "rds-postgres" },

				                                                     { "pg_version": 16, "platform": "rds-aurora"   }]')

				        fi

				        echo "matrix=$(echo "$matrix" | jq --compact-output '.')" >> $GITHUB_OUTPUT

				@@ -322,14 +435,14 @@ jobs:

				          "platform": [

				            "neonvm-captest-reuse"

				          ],

				          "scale": [

				            "10"

				          "pg_version" : [

				            16,17

				          ]

				        }'

				        if [ "$(date +%A)" = "Saturday" ] || [ "${RUN_AWS_RDS_AND_AURORA}" = "true" ]; then

				          matrix=$(echo "$matrix" | jq '.include += [{ "platform": "rds-postgres", "scale": "10" },

				                                                     { "platform": "rds-aurora",   "scale": "10" }]')

				          matrix=$(echo "$matrix" | jq '.include += [{ "pg_version": 16, "platform": "rds-postgres" },

				                                                     { "pg_version": 16, "platform": "rds-aurora"   }]')

				        fi

				        echo "matrix=$(echo "$matrix" | jq --compact-output '.')" >> $GITHUB_OUTPUT

				@@ -344,17 +457,17 @@ jobs:

				    permissions:

				      contents: write

				      statuses: write

				      id-token: write # Required for OIDC authentication in azure runners

				      id-token: write # aws-actions/configure-aws-credentials

				    strategy:

				      fail-fast: false

				      matrix: ${{fromJson(needs.generate-matrices.outputs.pgbench-compare-matrix)}}

				      matrix: ${{fromJSON(needs.generate-matrices.outputs.pgbench-compare-matrix)}}

				    env:

				      TEST_PG_BENCH_DURATIONS_MATRIX: "60m"

				      TEST_PG_BENCH_SCALES_MATRIX: ${{ matrix.db_size }}

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      DEFAULT_PG_VERSION: ${{ matrix.pg_version }}

				      PG_VERSION: ${{ matrix.pg_version }}

				      TEST_OUTPUT: /tmp/test_output

				      BUILD_TYPE: remote

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}

				@@ -363,16 +476,24 @@ jobs:

				    runs-on: ${{ matrix.runner }}

				    container:

				      image: ${{ matrix.image }}

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    # Increase timeout to 8h, default timeout is 6h

				    timeout-minutes: 480

				    steps:

				    - uses: actions/checkout@v4

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - name: Configure AWS credentials # necessary on Azure runners

				      uses: aws-actions/configure-aws-credentials@v4

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				    - name: Configure AWS credentials

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				@@ -384,14 +505,15 @@ jobs:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Create Neon Project

				      if: contains(fromJson('["neonvm-captest-new", "neonvm-captest-freetier", "neonvm-azure-captest-freetier", "neonvm-azure-captest-new"]'), matrix.platform)

				      if: contains(fromJSON('["neonvm-captest-new", "neonvm-captest-new-many-tables", "neonvm-captest-freetier", "neonvm-azure-captest-freetier", "neonvm-azure-captest-new"]'), matrix.platform)

				      id: create-neon-project

				      uses: ./.github/actions/neon-project-create

				      with:

				        region_id: ${{ matrix.region_id }}

				        postgres_version: ${{ env.DEFAULT_PG_VERSION }}

				        postgres_version: ${{ env.PG_VERSION }}

				        api_key: ${{ secrets.NEON_STAGING_API_KEY }}

				        compute_units: ${{ (contains(matrix.platform, 'captest-freetier') && '[0.25, 0.25]') || '[1, 1]' }}

				@@ -405,7 +527,7 @@ jobs:

				          neonvm-captest-sharding-reuse)

				            CONNSTR=${{ secrets.BENCHMARK_CAPTEST_SHARDING_CONNSTR }}

				            ;;

				          neonvm-captest-new | neonvm-captest-freetier | neonvm-azure-captest-new | neonvm-azure-captest-freetier)

				          neonvm-captest-new | neonvm-captest-new-many-tables | neonvm-captest-freetier | neonvm-azure-captest-new | neonvm-azure-captest-freetier)

				            CONNSTR=${{ steps.create-neon-project.outputs.dsn }}

				            ;;

				          rds-aurora)

				@@ -422,6 +544,26 @@ jobs:

				        echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT

				    # Compare Neon project OLTP throughput and latency at scale factor 10 GB

				    # without (neonvm-captest-new)

				    # and with (neonvm-captest-new-many-tables) many relations in the database.

				    - name: Create many relations before the run

				      if: contains(fromJSON('["neonvm-captest-new-many-tables"]'), matrix.platform)

				      uses: ./.github/actions/run-python-test-set

				      with:

				        build_type: ${{ env.BUILD_TYPE }}

				        test_selection: performance

				        run_in_parallel: false

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 21600 -k test_perf_many_relations

				        pg_version: ${{ env.PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				        TEST_NUM_RELATIONS: 10000

				    - name: Benchmark init

				      uses: ./.github/actions/run-python-test-set

				      with:

				@@ -430,7 +572,8 @@ jobs:

				        run_in_parallel: false

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_init

				        pg_version: ${{ env.DEFAULT_PG_VERSION }}

				        pg_version: ${{ env.PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				@@ -444,7 +587,8 @@ jobs:

				        run_in_parallel: false

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_simple_update

				        pg_version: ${{ env.DEFAULT_PG_VERSION }}

				        pg_version: ${{ env.PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				@@ -458,7 +602,8 @@ jobs:

				        run_in_parallel: false

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 21600 -k test_pgbench_remote_select_only

				        pg_version: ${{ env.DEFAULT_PG_VERSION }}

				        pg_version: ${{ env.PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				@@ -475,12 +620,14 @@ jobs:

				      id: create-allure-report

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				      with:

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				      uses: slackapi/slack-github-action@v1

				      uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1

				      with:

				        channel-id: "C033QLM5P7D" # dev-staging-stream

				        channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream

				        slack-message: |

				          Periodic perf testing on ${{ matrix.platform }}: ${{ job.status }}

				          <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>

				@@ -492,54 +639,62 @@ jobs:

				    permissions:

				      contents: write

				      statuses: write

				      id-token: write # Required for OIDC authentication in azure runners

				      id-token: write # aws-actions/configure-aws-credentials

				    strategy:

				      fail-fast: false

				      matrix:

				        include:

				          - PLATFORM: "neonvm-captest-pgvector"

				            RUNNER: [ self-hosted, us-east-2, x64 ]

				            IMAGE: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned

				            postgres_version: 16

				          - PLATFORM: "neonvm-captest-pgvector-pg17"

				            RUNNER: [ self-hosted, us-east-2, x64 ]

				            postgres_version: 17

				          - PLATFORM: "azure-captest-pgvector"

				            RUNNER: [ self-hosted, eastus2, x64 ]

				            IMAGE: neondatabase/build-tools:pinned

				            postgres_version: 16

				    env:

				      TEST_PG_BENCH_DURATIONS_MATRIX: "15m"

				      TEST_PG_BENCH_SCALES_MATRIX: "1"

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      DEFAULT_PG_VERSION: 16

				      PG_VERSION: ${{ matrix.postgres_version }}

				      TEST_OUTPUT: /tmp/test_output

				      BUILD_TYPE: remote

				      LD_LIBRARY_PATH: /home/nonroot/pg/usr/lib/x86_64-linux-gnu

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}

				      PLATFORM: ${{ matrix.PLATFORM }}

				    runs-on: ${{ matrix.RUNNER }}

				    container:

				      image: ${{ matrix.IMAGE }}

				      image: ghcr.io/neondatabase/build-tools:pinned-bookworm

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    steps:

				    - uses: actions/checkout@v4

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    # until https://github.com/neondatabase/neon/issues/8275 is fixed we temporarily install postgresql-16

				    # instead of using Neon artifacts containing pgbench

				    - name: Install postgresql-16 where pytest expects it

				      run: |

				        cd /home/nonroot

				        wget -q https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/libpq5_16.4-1.pgdg110%2B1_amd64.deb

				        wget -q https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/postgresql-client-16_16.4-1.pgdg110%2B1_amd64.deb

				        wget -q https://apt.postgresql.org/pub/repos/apt/pool/main/p/postgresql-16/postgresql-16_16.4-1.pgdg110%2B1_amd64.deb

				        dpkg -x libpq5_16.4-1.pgdg110+1_amd64.deb pg

				        dpkg -x postgresql-client-16_16.4-1.pgdg110+1_amd64.deb pg

				        dpkg -x postgresql-16_16.4-1.pgdg110+1_amd64.deb pg

				        mkdir -p /tmp/neon/pg_install/v16/bin

				        ln -s /home/nonroot/pg/usr/lib/postgresql/16/bin/pgbench /tmp/neon/pg_install/v16/bin/pgbench

				        ln -s /home/nonroot/pg/usr/lib/postgresql/16/bin/psql /tmp/neon/pg_install/v16/bin/psql

				        ln -s /home/nonroot/pg/usr/lib/x86_64-linux-gnu /tmp/neon/pg_install/v16/lib

				        /tmp/neon/pg_install/v16/bin/pgbench --version

				        /tmp/neon/pg_install/v16/bin/psql --version

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				    - name: Configure AWS credentials

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        role-duration-seconds: 18000 # 5 hours

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				      with:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Set up Connection String

				      id: set-up-connstr

				@@ -548,6 +703,9 @@ jobs:

				          neonvm-captest-pgvector)

				            CONNSTR=${{ secrets.BENCHMARK_PGVECTOR_CONNSTR }}

				            ;;

				          neonvm-captest-pgvector-pg17)

				            CONNSTR=${{ secrets.BENCHMARK_PGVECTOR_CONNSTR_PG17 }}

				            ;;

				          azure-captest-pgvector)

				            CONNSTR=${{ secrets.BENCHMARK_PGVECTOR_CONNSTR_AZURE }}

				            ;;

				@@ -559,13 +717,6 @@ jobs:

				        echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT

				    - name: Configure AWS credentials # necessary on Azure runners to read/write from/to S3

				      uses: aws-actions/configure-aws-credentials@v4

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        role-duration-seconds: 18000 # 5 hours

				    - name: Benchmark pgvector hnsw indexing

				      uses: ./.github/actions/run-python-test-set

				      with:

				@@ -574,7 +725,8 @@ jobs:

				        run_in_parallel: false

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 21600 -k test_pgvector_indexing

				        pg_version: ${{ env.DEFAULT_PG_VERSION }}

				        pg_version: ${{ env.PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				@@ -588,7 +740,8 @@ jobs:

				        run_in_parallel: false

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 21600

				        pg_version: ${{ env.DEFAULT_PG_VERSION }}

				        pg_version: ${{ env.PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				@@ -598,12 +751,14 @@ jobs:

				      id: create-allure-report

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				      with:

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				      uses: slackapi/slack-github-action@v1

				      uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1

				      with:

				        channel-id: "C033QLM5P7D" # dev-staging-stream

				        channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream

				        slack-message: |

				          Periodic perf testing on ${{ env.PLATFORM }}: ${{ job.status }}

				          <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>

				@@ -620,15 +775,19 @@ jobs:

				    # *_CLICKBENCH_CONNSTR: Genuine ClickBench DB with ~100M rows

				    # *_CLICKBENCH_10M_CONNSTR: DB with the first 10M rows of ClickBench DB

				    if: ${{ !cancelled() && (github.event.inputs.run_only_pgvector_tests == 'false' || github.event.inputs.run_only_pgvector_tests == null) }}

				    permissions:

				      contents: write

				      statuses: write

				      id-token: write # aws-actions/configure-aws-credentials

				    needs: [ generate-matrices, pgbench-compare, prepare_AWS_RDS_databases ]

				    strategy:

				      fail-fast: false

				      matrix: ${{ fromJson(needs.generate-matrices.outputs.olap-compare-matrix) }}

				      matrix: ${{ fromJSON(needs.generate-matrices.outputs.olap-compare-matrix) }}

				    env:

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      DEFAULT_PG_VERSION: 16

				      PG_VERSION: ${{ matrix.pg_version }}

				      TEST_OUTPUT: /tmp/test_output

				      TEST_OLAP_COLLECT_EXPLAIN: ${{ github.event.inputs.collect_olap_explain }}

				      TEST_OLAP_COLLECT_PG_STAT_STATEMENTS: ${{ github.event.inputs.collect_pg_stat_statements }}

				@@ -638,11 +797,30 @@ jobs:

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned

				      image: ghcr.io/neondatabase/build-tools:pinned-bookworm

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    # Increase timeout to 12h (default timeout is 6h):

				    # a regression in ClickBench causes it to run 2-3x longer

				    timeout-minutes: 720

				    steps:

				    - uses: actions/checkout@v4

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				    - name: Configure AWS credentials

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        role-duration-seconds: 18000 # 5 hours

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				@@ -650,13 +828,25 @@ jobs:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Set up Connection String

				      id: set-up-connstr

				      run: |

				        case "${PLATFORM}" in

				          neonvm-captest-reuse)

				            CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CLICKBENCH_10M_CONNSTR }}

				            case "${PG_VERSION}" in

				              16)

				                CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CLICKBENCH_10M_CONNSTR }}

				                ;;

				              17)

				                CONNSTR=${{ secrets.BENCHMARK_CAPTEST_CLICKBENCH_CONNSTR_PG17 }}

				                ;;

				              *)

				                echo >&2 "Unsupported PG_VERSION=${PG_VERSION} for PLATFORM=${PLATFORM}"

				                exit 1

				                ;;

				            esac

				            ;;

				          rds-aurora)

				            CONNSTR=${{ secrets.BENCHMARK_RDS_AURORA_CLICKBENCH_10M_CONNSTR }}

				@@ -679,8 +869,9 @@ jobs:

				        test_selection: performance/test_perf_olap.py

				        run_in_parallel: false

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 21600 -k test_clickbench

				        pg_version: ${{ env.DEFAULT_PG_VERSION }}

				        extra_params: -m remote_cluster --timeout 43200 -k test_clickbench

				        pg_version: ${{ env.PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				@@ -693,12 +884,14 @@ jobs:

				      id: create-allure-report

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				      with:

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				      uses: slackapi/slack-github-action@v1

				      uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1

				      with:

				        channel-id: "C033QLM5P7D" # dev-staging-stream

				        channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream

				        slack-message: |

				          Periodic OLAP perf testing on ${{ matrix.platform }}: ${{ job.status }}

				          <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>

				@@ -713,29 +906,47 @@ jobs:

				    # We might change it after https://github.com/neondatabase/neon/issues/2900.

				    #

				    # *_TPCH_S10_CONNSTR: DB generated with scale factor 10 (~10 GB)

				    if: ${{ !cancelled() && (github.event.inputs.run_only_pgvector_tests == 'false' || github.event.inputs.run_only_pgvector_tests == null) }}

				    # if: ${{ !cancelled() && (github.event.inputs.run_only_pgvector_tests == 'false' || github.event.inputs.run_only_pgvector_tests == null) }}

				    permissions:

				      contents: write

				      statuses: write

				      id-token: write # aws-actions/configure-aws-credentials

				    needs: [ generate-matrices, clickbench-compare, prepare_AWS_RDS_databases ]

				    strategy:

				      fail-fast: false

				      matrix: ${{ fromJson(needs.generate-matrices.outputs.tpch-compare-matrix) }}

				      matrix: ${{ fromJSON(needs.generate-matrices.outputs.tpch-compare-matrix) }}

				    env:

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      DEFAULT_PG_VERSION: 16

				      PG_VERSION: ${{ matrix.pg_version }}

				      TEST_OUTPUT: /tmp/test_output

				      BUILD_TYPE: remote

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}

				      PLATFORM: ${{ matrix.platform }}

				      TEST_OLAP_SCALE: ${{ matrix.scale }}

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned

				      image: ghcr.io/neondatabase/build-tools:pinned-bookworm

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    steps:

				    - uses: actions/checkout@v4

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				    - name: Configure AWS credentials

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        role-duration-seconds: 18000 # 5 hours

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				@@ -743,18 +954,30 @@ jobs:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Get Connstring Secret Name

				      run: |

				        case "${PLATFORM}" in

				          neonvm-captest-reuse)

				            ENV_PLATFORM=CAPTEST_TPCH

				            case "${PG_VERSION}" in

				              16)

				                CONNSTR_SECRET_NAME="BENCHMARK_CAPTEST_TPCH_S10_CONNSTR"

				                ;;

				              17)

				                CONNSTR_SECRET_NAME="BENCHMARK_CAPTEST_TPCH_CONNSTR_PG17"

				                ;;

				              *)

				                echo >&2 "Unsupported PG_VERSION=${PG_VERSION} for PLATFORM=${PLATFORM}"

				                exit 1

				                ;;

				            esac

				            ;;

				          rds-aurora)

				            ENV_PLATFORM=RDS_AURORA_TPCH

				            CONNSTR_SECRET_NAME="BENCHMARK_RDS_AURORA_TPCH_S10_CONNSTR"

				            ;;

				          rds-postgres)

				            ENV_PLATFORM=RDS_POSTGRES_TPCH

				            CONNSTR_SECRET_NAME="BENCHMARK_RDS_POSTGRES_TPCH_S10_CONNSTR"

				            ;;

				          *)

				            echo >&2 "Unknown PLATFORM=${PLATFORM}. Allowed only 'neonvm-captest-reuse', 'rds-aurora', or 'rds-postgres'"

				@@ -762,7 +985,6 @@ jobs:

				            ;;

				        esac

				        CONNSTR_SECRET_NAME="BENCHMARK_${ENV_PLATFORM}_S${TEST_OLAP_SCALE}_CONNSTR"

				        echo "CONNSTR_SECRET_NAME=${CONNSTR_SECRET_NAME}" >> $GITHUB_ENV

				    - name: Set up Connection String

				@@ -780,23 +1002,26 @@ jobs:

				        run_in_parallel: false

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 21600 -k test_tpch

				        pg_version: ${{ env.DEFAULT_PG_VERSION }}

				        pg_version: ${{ env.PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        TEST_OLAP_SCALE: ${{ matrix.scale }}

				        TEST_OLAP_SCALE: 10

				    - name: Create Allure report

				      id: create-allure-report

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				      with:

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				      uses: slackapi/slack-github-action@v1

				      uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1

				      with:

				        channel-id: "C033QLM5P7D" # dev-staging-stream

				        channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream

				        slack-message: |

				          Periodic TPC-H perf testing on ${{ matrix.platform }}: ${{ job.status }}

				          <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>

				@@ -805,16 +1030,20 @@ jobs:

				        SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

				  user-examples-compare:

				    if: ${{ !cancelled() && (github.event.inputs.run_only_pgvector_tests == 'false' || github.event.inputs.run_only_pgvector_tests == null) }}

				    # if: ${{ !cancelled() && (github.event.inputs.run_only_pgvector_tests == 'false' || github.event.inputs.run_only_pgvector_tests == null) }}

				    permissions:

				      contents: write

				      statuses: write

				      id-token: write # aws-actions/configure-aws-credentials

				    needs: [ generate-matrices, tpch-compare, prepare_AWS_RDS_databases ]

				    strategy:

				      fail-fast: false

				      matrix: ${{ fromJson(needs.generate-matrices.outputs.olap-compare-matrix) }}

				      matrix: ${{ fromJSON(needs.generate-matrices.outputs.olap-compare-matrix) }}

				    env:

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      DEFAULT_PG_VERSION: 16

				      PG_VERSION: ${{ matrix.pg_version }}

				      TEST_OUTPUT: /tmp/test_output

				      BUILD_TYPE: remote

				      SAVE_PERF_REPORT: ${{ github.event.inputs.save_perf_report || ( github.ref_name == 'main' ) }}

				@@ -822,11 +1051,26 @@ jobs:

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:pinned

				      image: ghcr.io/neondatabase/build-tools:pinned-bookworm

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    steps:

				    - uses: actions/checkout@v4

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				    - name: Configure AWS credentials

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        role-duration-seconds: 18000 # 5 hours

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				@@ -834,13 +1078,25 @@ jobs:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Set up Connection String

				      id: set-up-connstr

				      run: |

				        case "${PLATFORM}" in

				          neonvm-captest-reuse)

				            CONNSTR=${{ secrets.BENCHMARK_USER_EXAMPLE_CAPTEST_CONNSTR }}

				            case "${PG_VERSION}" in

				              16)

				                CONNSTR=${{ secrets.BENCHMARK_USER_EXAMPLE_CAPTEST_CONNSTR }}

				                ;;

				              17)

				                CONNSTR=${{ secrets.BENCHMARK_CAPTEST_USER_EXAMPLE_CONNSTR_PG17 }}

				                ;;

				              *)

				                echo >&2 "Unsupported PG_VERSION=${PG_VERSION} for PLATFORM=${PLATFORM}"

				                exit 1

				                ;;

				            esac

				            ;;

				          rds-aurora)

				            CONNSTR=${{ secrets.BENCHMARK_USER_EXAMPLE_RDS_AURORA_CONNSTR }}

				@@ -864,7 +1120,8 @@ jobs:

				        run_in_parallel: false

				        save_perf_report: ${{ env.SAVE_PERF_REPORT }}

				        extra_params: -m remote_cluster --timeout 21600 -k test_user_examples

				        pg_version: ${{ env.DEFAULT_PG_VERSION }}

				        pg_version: ${{ env.PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				@@ -874,12 +1131,14 @@ jobs:

				      id: create-allure-report

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				      with:

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				      uses: slackapi/slack-github-action@v1

				      uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1

				      with:

				        channel-id: "C033QLM5P7D" # dev-staging-stream

				        channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream

				        slack-message: |

				          Periodic TPC-H perf testing on ${{ matrix.platform }}: ${{ job.status }}

				          <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>
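Review note: the "Get Connstring Secret Name" hunk above seems to replace the computed secret name with an explicit lookup per platform and Postgres version. The +/- markers are not visible in this view, so the following is only a sketch of how the new side appears to read. The function name `connstr_secret_name` is hypothetical and is not part of the workflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the workflow): maps PLATFORM and PG_VERSION
# to the secret name, mirroring the case statement shown in the hunk.
connstr_secret_name() {
  local platform="$1" pg_version="$2"
  case "${platform}" in
    neonvm-captest-reuse)
      case "${pg_version}" in
        16) echo "BENCHMARK_CAPTEST_TPCH_S10_CONNSTR" ;;
        17) echo "BENCHMARK_CAPTEST_TPCH_CONNSTR_PG17" ;;
        *)
          echo >&2 "Unsupported PG_VERSION=${pg_version} for PLATFORM=${platform}"
          return 1
          ;;
      esac
      ;;
    rds-aurora)   echo "BENCHMARK_RDS_AURORA_TPCH_S10_CONNSTR" ;;
    rds-postgres) echo "BENCHMARK_RDS_POSTGRES_TPCH_S10_CONNSTR" ;;
    *)
      echo >&2 "Unknown PLATFORM=${platform}. Allowed only 'neonvm-captest-reuse', 'rds-aurora', or 'rds-postgres'"
      return 1
      ;;
  esac
}

# Example call: print the secret name for the PG17 capacity-test project.
connstr_secret_name neonvm-captest-reuse 17
```

Only the `neonvm-captest-reuse` platform branches on the Postgres version in this sketch; the RDS platforms keep a single secret each, as in the hunk.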

									
										178

.github/workflows/build-build-tools-image.yml
									
												
				@@ -3,32 +3,100 @@ name: Build build-tools image

				on:

				  workflow_call:

				    inputs:

				      image-tag:

				        description: "build-tools image tag"

				        required: true

				      archs:

				        description: "JSON array of architectures to build"

				        # Default values are set in `check-image` job, `set-variables` step

				        type: string

				        required: false

				      debians:

				        description: "JSON array of Debian versions to build"

				        # Default values are set in `check-image` job, `set-variables` step

				        type: string

				        required: false

				    outputs:

				      image-tag:

				        description: "build-tools tag"

				        value: ${{ inputs.image-tag }}

				        value: ${{ jobs.check-image.outputs.tag }}

				      image:

				        description: "build-tools image"

				        value: neondatabase/build-tools:${{ inputs.image-tag }}

				        value: ghcr.io/neondatabase/build-tools:${{ jobs.check-image.outputs.tag }}

				defaults:

				  run:

				    shell: bash -euo pipefail {0}

				concurrency:

				  group: build-build-tools-image-${{ inputs.image-tag }}

				  cancel-in-progress: false

				# The initial idea was to prevent the waste of resources by not re-building the `build-tools` image

				# for the same tag in parallel workflow runs, and queue them to be skipped once we have

				# the first image pushed to Docker registry, but GitHub's concurrency mechanism is not working as expected.

				# GitHub can't have more than 1 job in a queue and removes the previous one, which causes failures of the dependent jobs.

				#

				# Ref https://github.com/orgs/community/discussions/41518

				#

				# concurrency:

				#   group: build-build-tools-image-${{ inputs.image-tag }}

				#   cancel-in-progress: false

				# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.

				permissions: {}

				jobs:

				  check-image:

				    uses: ./.github/workflows/check-build-tools-image.yml

				    runs-on: ubuntu-22.04

				    outputs:

				      archs: ${{ steps.set-variables.outputs.archs }}

				      debians: ${{ steps.set-variables.outputs.debians }}

				      tag: ${{ steps.set-variables.outputs.image-tag }}

				      everything: ${{ steps.set-more-variables.outputs.everything }}

				      found: ${{ steps.set-more-variables.outputs.found }}

				    permissions:

				      packages: read

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      - uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: Set variables

				        id: set-variables

				        env:

				          ARCHS: ${{ inputs.archs || '["x64","arm64"]' }}

				          DEBIANS: ${{ inputs.debians || '["bullseye","bookworm"]' }}

				          IMAGE_TAG: |

				            ${{ hashFiles('build-tools.Dockerfile',

				                          '.github/workflows/build-build-tools-image.yml') }}

				        run: |

				          echo "archs=${ARCHS}"           | tee -a ${GITHUB_OUTPUT}

				          echo "debians=${DEBIANS}"       | tee -a ${GITHUB_OUTPUT}

				          echo "image-tag=${IMAGE_TAG}"   | tee -a ${GITHUB_OUTPUT}

				      - name: Set more variables

				        id: set-more-variables

				        env:

				          IMAGE_TAG: ${{ steps.set-variables.outputs.image-tag }}

				          EVERYTHING: |

				            ${{ contains(fromJSON(steps.set-variables.outputs.archs), 'x64') &&

				                contains(fromJSON(steps.set-variables.outputs.archs), 'arm64') &&

				                contains(fromJSON(steps.set-variables.outputs.debians), 'bullseye') &&

				                contains(fromJSON(steps.set-variables.outputs.debians), 'bookworm') }}

				        run: |

				          if docker manifest inspect ghcr.io/neondatabase/build-tools:${IMAGE_TAG}; then

				            found=true

				          else

				            found=false

				          fi

				          echo "everything=${EVERYTHING}" | tee -a ${GITHUB_OUTPUT}

				          echo "found=${found}"           | tee -a ${GITHUB_OUTPUT}

				  build-image:

				    needs: [ check-image ]

				@@ -36,68 +104,100 @@ jobs:

				    strategy:

				      matrix:

				        arch: [ x64, arm64 ]

				        arch: ${{ fromJSON(needs.check-image.outputs.archs) }}

				        debian: ${{ fromJSON(needs.check-image.outputs.debians) }}

				    runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', matrix.arch == 'arm64' && 'large-arm64' || 'large')) }}

				    permissions:

				      packages: write

				    env:

				      IMAGE_TAG: ${{ inputs.image-tag }}

				    runs-on: ${{ fromJSON(format('["self-hosted", "{0}"]', matrix.arch == 'arm64' && 'large-arm64' || 'large')) }}

				    steps:

				      - name: Check `input.tag` is correct

				        env:

				          INPUTS_IMAGE_TAG: ${{ inputs.image-tag }}

				          CHECK_IMAGE_TAG : ${{ needs.check-image.outputs.image-tag }}

				        run: |

				          if [ "${INPUTS_IMAGE_TAG}" != "${CHECK_IMAGE_TAG}" ]; then

				            echo "'inputs.image-tag' (${INPUTS_IMAGE_TAG}) does not match the tag of the latest build-tools image 'inputs.image-tag' (${CHECK_IMAGE_TAG})"

				            exit 1

				          fi

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      - uses: ./.github/actions/set-docker-config-dir

				      - uses: docker/setup-buildx-action@v3

				      - uses: neondatabase/dev-actions/set-docker-config-dir@6094485bf440001c94a94a3f9e221e81ff6b6193

				      - uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0

				        with:

				          cache-binary: false

				      - uses: docker/login-action@v3

				      - uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				          password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				      - uses: docker/login-action@v3

				      - uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: cache.neon.build

				          username: ${{ secrets.NEON_CI_DOCKERCACHE_USERNAME }}

				          password: ${{ secrets.NEON_CI_DOCKERCACHE_PASSWORD }}

				      - uses: docker/build-push-action@v6

				      - uses: docker/build-push-action@471d1dc4e07e5cdedd4c2171150001c434f0b7a4 # v6.15.0

				        with:

				          file: build-tools.Dockerfile

				          context: .

				          provenance: false

				          push: true

				          pull: true

				          file: Dockerfile.build-tools

				          cache-from: type=registry,ref=cache.neon.build/build-tools:cache-${{ matrix.arch }}

				          cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/build-tools:cache-{0},mode=max', matrix.arch) || '' }}

				          tags: neondatabase/build-tools:${{ inputs.image-tag }}-${{ matrix.arch }}

				          build-args: |

				            DEBIAN_VERSION=${{ matrix.debian }}

				          cache-from: type=registry,ref=cache.neon.build/build-tools:cache-${{ matrix.debian }}-${{ matrix.arch }}

				          cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/build-tools:cache-{0}-{1},mode=max', matrix.debian, matrix.arch) || '' }}

				          tags: |

				            ghcr.io/neondatabase/build-tools:${{ needs.check-image.outputs.tag }}-${{ matrix.debian }}-${{ matrix.arch }}

				  merge-images:

				    needs: [ build-image ]

				    needs: [ check-image, build-image ]

				    runs-on: ubuntu-22.04

				    env:

				      IMAGE_TAG: ${{ inputs.image-tag }}

				    permissions:

				      packages: write

				    steps:

				      - uses: docker/login-action@v3

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				          password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				      - uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: Create multi-arch image

				        env:

				          DEFAULT_DEBIAN_VERSION: bookworm

				          ARCHS: ${{ join(fromJSON(needs.check-image.outputs.archs), ' ') }}

				          DEBIANS: ${{ join(fromJSON(needs.check-image.outputs.debians), ' ') }}

				          EVERYTHING: ${{ needs.check-image.outputs.everything }}

				          IMAGE_TAG: ${{ needs.check-image.outputs.tag }}

				        run: |

				          docker buildx imagetools create -t neondatabase/build-tools:${IMAGE_TAG} \

				                                             neondatabase/build-tools:${IMAGE_TAG}-x64 \

				                                             neondatabase/build-tools:${IMAGE_TAG}-arm64

				          for debian in ${DEBIANS}; do

				            tags=("-t" "ghcr.io/neondatabase/build-tools:${IMAGE_TAG}-${debian}")

				            if [ "${EVERYTHING}" == "true" ] && [ "${debian}" == "${DEFAULT_DEBIAN_VERSION}" ]; then

				              tags+=("-t" "ghcr.io/neondatabase/build-tools:${IMAGE_TAG}")

				            fi

				            for arch in ${ARCHS}; do

				              tags+=("ghcr.io/neondatabase/build-tools:${IMAGE_TAG}-${debian}-${arch}")

				            done

				            docker buildx imagetools create "${tags[@]}"

				          done
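Review note: the "Create multi-arch image" step builds one manifest per Debian version and additionally applies the bare `${IMAGE_TAG}` tag only when every arch and Debian variant was built and the Debian version is the default. A minimal dry-run sketch of that tag assembly follows. The function `plan_manifests` and the sample tag `abc123` are hypothetical, and the sketch prints the commands instead of calling docker.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical dry-run helper (not in the workflow): prints the
# `docker buildx imagetools create` command line for each Debian version.
plan_manifests() {
  local image_tag="$1" everything="$2" default_debian="$3"
  local archs="$4" debians="$5"
  local debian arch
  for debian in ${debians}; do
    # Every Debian version gets its own "<tag>-<debian>" manifest.
    local args=("-t" "ghcr.io/neondatabase/build-tools:${image_tag}-${debian}")
    # The bare "<tag>" is added only for the default Debian version,
    # and only when the full arch/Debian set was built.
    if [ "${everything}" == "true" ] && [ "${debian}" == "${default_debian}" ]; then
      args+=("-t" "ghcr.io/neondatabase/build-tools:${image_tag}")
    fi
    # Source images: one per architecture.
    for arch in ${archs}; do
      args+=("ghcr.io/neondatabase/build-tools:${image_tag}-${debian}-${arch}")
    done
    echo "docker buildx imagetools create ${args[*]}"
  done
}

plan_manifests "abc123" "true" "bookworm" "x64 arm64" "bullseye bookworm"
```

With these sample inputs the sketch prints two commands, and only the bookworm one carries the second `-t` for the bare tag. Keeping the arguments in an array, as the workflow does, avoids word-splitting surprises when the list is passed to docker.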

									
										304

.github/workflows/build-macos.yml
									
										Normal file
												
				@@ -0,0 +1,304 @@

				name: Check neon with MacOS builds

				on:

				  workflow_call:

				    inputs:

				      pg_versions:

				        description: "Array of the pg versions to build for, for example: ['v14', 'v17']"

				        type: string

				        default: '[]'

				        required: false

				      rebuild_rust_code:

				        description: "Rebuild Rust code"

				        type: boolean

				        default: false

				        required: false

				      rebuild_everything:

				        description: "If true, rebuild for all versions"

				        type: boolean

				        default: false

				        required: false

				env:

				  RUST_BACKTRACE: 1

				  COPT: '-Werror'

				# TODO: move `check-*` and `files-changed` jobs to the "Caller" Workflow

				# We should care about that as GitHub has limitations:

				# - You can connect up to four levels of workflows

				# - You can call a maximum of 20 unique reusable workflows from a single workflow file.

				# https://docs.github.com/en/actions/sharing-automations/reusing-workflows#limitations

				permissions:

				  contents: read

				jobs:

				  build-pgxn:

				    if: |

				      (inputs.pg_versions != '[]' || inputs.rebuild_everything) && (

				        contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')  ||

				        contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||

				        github.ref_name == 'main'

				      )

				    timeout-minutes: 30

				    runs-on: macos-15

				    strategy:

				      matrix:

				        postgres-version: ${{ inputs.rebuild_everything && fromJSON('["v14", "v15", "v16", "v17"]') || fromJSON(inputs.pg_versions) }}

				    env:

				      # Use release build only, to have less debug info around

				      # Hence keeping target/ (and general cache size) smaller

				      BUILD_TYPE: release

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - name: Checkout main repo

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      - name: Set pg ${{ matrix.postgres-version }} for caching

				        id: pg_rev

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-${{ matrix.postgres-version }}) | tee -a "${GITHUB_OUTPUT}"

				      - name: Cache postgres ${{ matrix.postgres-version }} build

				        id: cache_pg

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: pg_install/${{ matrix.postgres-version }}

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ matrix.postgres-version }}-${{ steps.pg_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Checkout submodule vendor/postgres-${{ matrix.postgres-version }}

				        if: steps.cache_pg.outputs.cache-hit != 'true'

				        run: |

				          git submodule init vendor/postgres-${{ matrix.postgres-version }}

				          git submodule update --depth 1 --recursive

				      - name: Install build dependencies

				        if: steps.cache_pg.outputs.cache-hit != 'true'

				        run: |

				          brew install flex bison openssl protobuf icu4c

				      - name: Set extra env for macOS

				        if: steps.cache_pg.outputs.cache-hit != 'true'

				        run: |

				          echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV

				          echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV

				      - name: Build Postgres ${{ matrix.postgres-version }}

				        if: steps.cache_pg.outputs.cache-hit != 'true'

				        run: |

				          make postgres-${{ matrix.postgres-version }} -j$(sysctl -n hw.ncpu)

				      - name: Build Neon Pg Ext ${{ matrix.postgres-version }}

				        if: steps.cache_pg.outputs.cache-hit != 'true'

				        run: |

				          make "neon-pg-ext-${{ matrix.postgres-version }}" -j$(sysctl -n hw.ncpu)

				      - name: Get postgres headers ${{ matrix.postgres-version }}

				        if: steps.cache_pg.outputs.cache-hit != 'true'

				        run: |

				          make postgres-headers-${{ matrix.postgres-version }} -j$(sysctl -n hw.ncpu)

				  build-walproposer-lib:

				    if: |

				      (inputs.pg_versions != '[]' || inputs.rebuild_everything) && (

				        contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')  ||

				        contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||

				        github.ref_name == 'main'

				      )

				    timeout-minutes: 30

				    runs-on: macos-15

				    needs: [build-pgxn]

				    env:

				      # Use release build only, to have less debug info around

				      # Hence keeping target/ (and general cache size) smaller

				      BUILD_TYPE: release

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - name: Checkout main repo

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      - name: Set pg v17 for caching

				        id: pg_rev

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v17) | tee -a "${GITHUB_OUTPUT}"

				      - name: Cache postgres v17 build

				        id: cache_pg

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: pg_install/v17

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v17-${{ steps.pg_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Cache walproposer-lib

				        id: cache_walproposer_lib

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: pg_install/build/walproposer-lib

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-walproposer_lib-v17-${{ steps.pg_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Checkout submodule vendor/postgres-v17

				        if: steps.cache_walproposer_lib.outputs.cache-hit != 'true'

				        run: |

				          git submodule init vendor/postgres-v17

				          git submodule update --depth 1 --recursive

				      - name: Install build dependencies

				        if: steps.cache_walproposer_lib.outputs.cache-hit != 'true'

				        run: |

				          brew install flex bison openssl protobuf icu4c

				      - name: Set extra env for macOS

				        if: steps.cache_walproposer_lib.outputs.cache-hit != 'true'

				        run: |

				          echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV

				          echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV

				      - name: Build walproposer-lib (only for v17)

				        if: steps.cache_walproposer_lib.outputs.cache-hit != 'true'

				        run:

				          make walproposer-lib -j$(sysctl -n hw.ncpu)

				  cargo-build:

				    if: |

				      (inputs.pg_versions != '[]' || inputs.rebuild_rust_code || inputs.rebuild_everything) && (

				        contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')  ||

				        contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||

				        github.ref_name == 'main'

				      )

				    timeout-minutes: 30

				    runs-on: macos-15

				    needs: [build-pgxn, build-walproposer-lib]

				    env:

				      # Use release build only, to have less debug info around

				      # Hence keeping target/ (and general cache size) smaller

				      BUILD_TYPE: release

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - name: Checkout main repo

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          submodules: true

				      - name: Set pg v14 for caching

				        id: pg_rev_v14

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v14) | tee -a "${GITHUB_OUTPUT}"

				      - name: Set pg v15 for caching

				        id: pg_rev_v15

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v15) | tee -a "${GITHUB_OUTPUT}"

				      - name: Set pg v16 for caching

				        id: pg_rev_v16

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v16) | tee -a "${GITHUB_OUTPUT}"

				      - name: Set pg v17 for caching

				        id: pg_rev_v17

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v17) | tee -a "${GITHUB_OUTPUT}"

				      - name: Cache postgres v14 build

				        id: cache_pg

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: pg_install/v14

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v14-${{ steps.pg_rev_v14.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Cache postgres v15 build

				        id: cache_pg_v15

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: pg_install/v15

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v15-${{ steps.pg_rev_v15.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Cache postgres v16 build

				        id: cache_pg_v16

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: pg_install/v16

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v16-${{ steps.pg_rev_v16.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Cache postgres v17 build

				        id: cache_pg_v17

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: pg_install/v17

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-v17-${{ steps.pg_rev_v17.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Cache cargo deps (only for v17)

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: |

				            ~/.cargo/registry

				            !~/.cargo/registry/src

				            ~/.cargo/git

				            target

				          key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust

				      - name: Cache walproposer-lib

				        id: cache_walproposer_lib

				        uses: tespkg/actions-cache@b7bf5fcc2f98a52ac6080eb0fd282c2f752074b1  # v1.8.0

				        with:

				          endpoint: ${{ vars.HETZNER_CACHE_REGION }}.${{ vars.HETZNER_CACHE_ENDPOINT }}

				          bucket: ${{ vars.HETZNER_CACHE_BUCKET }}

				          accessKey: ${{ secrets.HETZNER_CACHE_ACCESS_KEY }}

				          secretKey: ${{ secrets.HETZNER_CACHE_SECRET_KEY }}

				          use-fallback: false

				          path: pg_install/build/walproposer-lib

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-walproposer_lib-v17-${{ steps.pg_rev_v17.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Install build dependencies

				        run: |

				          brew install flex bison openssl protobuf icu4c

				      - name: Set extra env for macOS

				        run: |

				          echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV

				          echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV

				      - name: Run cargo build (only for v17)

				        run: cargo build --all --release -j$(sysctl -n hw.ncpu)

				      - name: Check that no warnings are produced (only for v17)

				        run: ./run_clippy.sh

.github/workflows/build_and_test.yml (1401 lines changed)

File diff suppressed because it is too large.

									
.github/workflows/build_and_test_with_sanitizers.yml (new file, 144 lines changed)

				@@ -0,0 +1,144 @@

				name: Build and Test with Sanitizers

				on:

				  schedule:

				    # * is a special character in YAML so you have to quote this string

				    #          ┌───────────── minute (0 - 59)

				    #          │ ┌───────────── hour (0 - 23)

				    #          │ │ ┌───────────── day of the month (1 - 31)

				    #          │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)

				    #          │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)

				    - cron:   '0 1 * * *' # run once a day, timezone is utc

				  workflow_dispatch:

				defaults:

				  run:

				    shell: bash -euxo pipefail {0}

				concurrency:

				  # Allow only one workflow per any non-`main` branch.

				  group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}

				  cancel-in-progress: true

				env:

				  RUST_BACKTRACE: 1

				  COPT: '-Werror'

				jobs:

				  tag:

				    runs-on: [ self-hosted, small ]

				    container: ${{ vars.NEON_DEV_AWS_ACCOUNT_ID }}.dkr.ecr.${{ vars.AWS_ECR_REGION }}.amazonaws.com/base:pinned

				    outputs:

				      build-tag: ${{steps.build-tag.outputs.tag}}

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      # Need `fetch-depth: 0` to count the number of commits in the branch

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Get build tag

				        run: |

				          echo run:$GITHUB_RUN_ID

				          echo ref:$GITHUB_REF_NAME

				          echo rev:$(git rev-list --count HEAD)

				          if [[ "$GITHUB_REF_NAME" == "main" ]]; then

				            echo "tag=$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT

				          elif [[ "$GITHUB_REF_NAME" == "release" ]]; then

				            echo "tag=release-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT

				          elif [[ "$GITHUB_REF_NAME" == "release-proxy" ]]; then

				            echo "tag=release-proxy-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT

				          elif [[ "$GITHUB_REF_NAME" == "release-compute" ]]; then

				            echo "tag=release-compute-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT

				          else

				            echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not one of 'main', 'release', 'release-proxy', 'release-compute'"

				            echo "tag=$GITHUB_RUN_ID" >> $GITHUB_OUTPUT

				          fi

				        shell: bash

				        id: build-tag

				  build-build-tools-image:

				    uses: ./.github/workflows/build-build-tools-image.yml

				    secrets: inherit

				  build-and-test-locally:

				    needs: [ tag, build-build-tools-image ]

				    strategy:

				      fail-fast: false

				      matrix:

				        arch: [ x64, arm64 ]

				        build-type: [ release ]

				    uses: ./.github/workflows/_build-and-test-locally.yml

				    with:

				      arch: ${{ matrix.arch }}

				      build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm

				      build-tag: ${{ needs.tag.outputs.build-tag }}

				      build-type: ${{ matrix.build-type }}

				      test-cfg: '[{"pg_version":"v17"}]'

				      sanitizers: enabled

				    secrets: inherit

				  create-test-report:

				    needs: [ build-and-test-locally, build-build-tools-image ]

				    if: ${{ !cancelled() }}

				    permissions:

				      id-token: write # aws-actions/configure-aws-credentials

				      statuses: write

				      contents: write

				      pull-requests: write

				    outputs:

				      report-url: ${{ steps.create-allure-report.outputs.report-url }}

				    runs-on: [ self-hosted, small ]

				    container:

				      image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      - name: Create Allure report

				        if: ${{ !cancelled() }}

				        id: create-allure-report

				        uses: ./.github/actions/allure-report-generate

				        with:

				          store-test-results-into-db: true

				          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        env:

				          REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}

				      - uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1

				        if: ${{ !cancelled() }}

				        with:

				          # Retry script for 5XX server errors: https://github.com/actions/github-script#retries

				          retries: 5

				          script: |

				            const report = {

				              reportUrl:     "${{ steps.create-allure-report.outputs.report-url }}",

				              reportJsonUrl: "${{ steps.create-allure-report.outputs.report-json-url }}",

				            }

				            const coverage = {}

				            const script = require("./scripts/comment-test-report.js")

				            await script({

				              github,

				              context,

				              fetch,

				              report,

				              coverage,

				            })
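The `Get build tag` step in the workflow above maps the branch name to a tag. As a rough sketch, the same mapping could be written as a standalone Bash function. The function name `build_tag` and its three positional arguments are assumptions made for illustration; the workflow itself reads `GITHUB_REF_NAME` and `GITHUB_RUN_ID` and calls `git rev-list --count HEAD` directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the workflow): mirrors the branch-to-tag mapping
# of the "Get build tag" step.
# Arguments: <ref name> <commit count> <run id>
build_tag() {
  local ref_name="$1" commit_count="$2" run_id="$3"
  case "$ref_name" in
    main)
      # main uses the bare commit count
      echo "$commit_count" ;;
    release | release-proxy | release-compute)
      # release branches prefix the commit count with the branch name
      echo "${ref_name}-${commit_count}" ;;
    *)
      # any other branch falls back to the workflow run id
      echo "$run_id" ;;
  esac
}

build_tag main 12345 987            # prints 12345
build_tag release-proxy 12345 987   # prints release-proxy-12345
build_tag my-feature 12345 987      # prints 987
```

Collapsing the three release branches into one `case` arm keeps the prefix and the branch name in sync, at the cost of being less explicit than the `if`/`elif` chain in the workflow.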

									
.github/workflows/cargo-deny.yml (new file, 69 lines changed)

				@@ -0,0 +1,69 @@

				name: cargo deny checks

				on:

				  workflow_call:

				    inputs:

				      build-tools-image:

				        required: false

				        type: string

				  schedule:

				    - cron: '0 10 * * *'

				permissions:

				  contents: read

				jobs:

				  cargo-deny:

				    strategy:

				      matrix:

				        ref: >-

				          ${{

				            fromJSON(

				              github.event_name == 'schedule'

				                && '["main","release","release-proxy","release-compute"]'

				                || format('["{0}"]', github.sha)

				            )

				          }}

				    runs-on: [self-hosted, small]

				    permissions:

				      packages: read

				    container:

				      image: ${{ inputs.build-tools-image || 'ghcr.io/neondatabase/build-tools:pinned' }}

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - name: Checkout

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ matrix.ref }}

				      - name: Check rust licenses/bans/advisories/sources

				        env:

				          CARGO_DENY_TARGET: >-

				            ${{ github.event_name == 'schedule' && 'advisories' || 'all' }}

				        run: cargo deny check --hide-inclusion-graph $CARGO_DENY_TARGET

				      - name: Post to a Slack channel

				        if: ${{ github.event_name == 'schedule' && failure() }}

				        uses: slackapi/slack-github-action@485a9d42d3a73031f12ec201c457e2162c45d02d # v2.0.0

				        with:

				          method: chat.postMessage

				          token: ${{ secrets.SLACK_BOT_TOKEN }}

				          payload: |

				            channel: ${{ vars.SLACK_ON_CALL_DEVPROD_STREAM }}

				            text: |

				              Periodic cargo-deny on ${{ matrix.ref }}: ${{ job.status }}

				              <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>

				              Fixing the problem should be fairly straightforward from the logs. If not, <#${{ vars.SLACK_RUST_CHANNEL_ID }}> is there to help.

				              Pinging <!subteam^S0838JPSH32|@oncall-devprod>.

									
.github/workflows/check-build-tools-image.yml (deleted file, 51 lines changed)

				@@ -1,51 +0,0 @@

				name: Check build-tools image

				on:

				  workflow_call:

				    outputs:

				      image-tag:

				        description: "build-tools image tag"

				        value: ${{ jobs.check-image.outputs.tag }}

				      found:

				        description: "Whether the image is found in the registry"

				        value: ${{ jobs.check-image.outputs.found }}

				defaults:

				  run:

				    shell: bash -euo pipefail {0}

				# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.

				permissions: {}

				jobs:

				  check-image:

				    runs-on: ubuntu-22.04

				    outputs:

				      tag: ${{ steps.get-build-tools-tag.outputs.image-tag }}

				      found: ${{ steps.check-image.outputs.found }}

				    steps:

				      - uses: actions/checkout@v4

				      - name: Get build-tools image tag for the current commit

				        id: get-build-tools-tag

				        env:

				          IMAGE_TAG: |

				            ${{ hashFiles('Dockerfile.build-tools',

				                          '.github/workflows/check-build-tools-image.yml',

				                          '.github/workflows/build-build-tools-image.yml') }}

				        run: |

				          echo "image-tag=${IMAGE_TAG}" | tee -a $GITHUB_OUTPUT

				      - name: Check if such tag found in the registry

				        id: check-image

				        env:

				          IMAGE_TAG: ${{ steps.get-build-tools-tag.outputs.image-tag }}

				        run: |

				          if docker manifest inspect neondatabase/build-tools:${IMAGE_TAG}; then

				            found=true

				          else

				            found=false

				          fi

				          echo "found=${found}" | tee -a $GITHUB_OUTPUT

									
.github/workflows/check-permissions.yml (5 lines changed)

				@@ -18,6 +18,11 @@ jobs:

				  check-permissions:

				    runs-on: ubuntu-22.04

				    steps:

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@v2

				      with:

				        egress-policy: audit

				    - name: Disallow CI runs on PRs from forks

				      if: |

				        inputs.github-event-name  == 'pull_request' &&

									
.github/workflows/cleanup-caches-by-a-branch.yml (5 lines changed)

				@@ -11,6 +11,11 @@ jobs:

				  cleanup:

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@v2

				        with:

				          egress-policy: audit

				      - name: Cleanup

				        run: |

				          gh extension install actions/gh-actions-cache

									
.github/workflows/cloud-regress.yml (new file, 138 lines changed)

				@@ -0,0 +1,138 @@

				name: Cloud Regression Test

				on:

				  schedule:

				    # * is a special character in YAML so you have to quote this string

				    #          ┌───────────── minute (0 - 59)

				    #          │ ┌───────────── hour (0 - 23)

				    #          │ │ ┌───────────── day of the month (1 - 31)

				    #          │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)

				    #          │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)

				    - cron:  '45 1 * * *' # run once a day, timezone is utc

				  workflow_dispatch: # adds ability to run this manually

				defaults:

				  run:

				    shell: bash -euxo pipefail {0}

				concurrency:

				  # Allow only one workflow

				  group: ${{ github.workflow }}

				  cancel-in-progress: true

				permissions:

				  id-token: write # aws-actions/configure-aws-credentials

				  statuses: write

				  contents: write

				jobs:

				  regress:

				    env:

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      TEST_OUTPUT: /tmp/test_output

				      BUILD_TYPE: remote

				    strategy:

				      fail-fast: false

				      matrix:

				        pg-version: [16, 17]

				    runs-on: us-east-2

				    container:

				      image: ghcr.io/neondatabase/build-tools:pinned-bookworm

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          submodules: true

				      - name: Patch the test

				        env:

				          PG_VERSION: ${{matrix.pg-version}}

				        run: |

				          cd "vendor/postgres-v${PG_VERSION}"

				          patch -p1 < "../../compute/patches/cloud_regress_pg${PG_VERSION}.patch"

				      - name: Generate a random password

				        id: pwgen

				        run: |

				          set +x

				          DBPASS=$(dd if=/dev/random bs=48 count=1 2>/dev/null | base64)

				          echo "::add-mask::${DBPASS//\//}"

				          echo DBPASS="${DBPASS//\//}" >> "${GITHUB_OUTPUT}"

				      - name: Change tests according to the generated password

				        env:

				          DBPASS: ${{ steps.pwgen.outputs.DBPASS }}

				          PG_VERSION: ${{matrix.pg-version}}

				        run: |

				          cd vendor/postgres-v"${PG_VERSION}"/src/test/regress

				          for fname in sql/*.sql expected/*.out; do

				            sed -i.bak s/NEON_PASSWORD_PLACEHOLDER/"'${DBPASS}'"/ "${fname}"

				          done

				          for ph in $(grep NEON_MD5_PLACEHOLDER expected/password.out | awk '{print $3;}' | sort | uniq); do

				            USER=$(echo "${ph}" | cut -c 22-)

				            MD5=md5$(echo -n "${DBPASS}${USER}" | md5sum | awk '{print $1;}')

				            sed -i.bak "s/${ph}/${MD5}/" expected/password.out

				          done

				      - name: Download Neon artifact

				        uses: ./.github/actions/download

				        with:

				          name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				          path: /tmp/neon/

				          prefix: latest

				          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      - name: Create a new branch

				        id: create-branch

				        uses: ./.github/actions/neon-branch-create

				        with:

				          api_key: ${{ secrets.NEON_STAGING_API_KEY }}

				          project_id: ${{ vars[format('PGREGRESS_PG{0}_PROJECT_ID', matrix.pg-version)] }}

				      - name: Run the regression tests

				        uses: ./.github/actions/run-python-test-set

				        with:

				          build_type: ${{ env.BUILD_TYPE }}

				          test_selection: cloud_regress

				          pg_version: ${{matrix.pg-version}}

				          extra_params: -m remote_cluster

				          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        env:

				          BENCHMARK_CONNSTR: ${{steps.create-branch.outputs.dsn}}

				      - name: Delete branch

				        if: always()

				        uses: ./.github/actions/neon-branch-delete

				        with:

				          api_key: ${{ secrets.NEON_STAGING_API_KEY }}

				          project_id: ${{ vars[format('PGREGRESS_PG{0}_PROJECT_ID', matrix.pg-version)] }}

				          branch_id: ${{steps.create-branch.outputs.branch_id}}

				      - name: Create Allure report

				        id: create-allure-report

				        if: ${{ !cancelled() }}

				        uses: ./.github/actions/allure-report-generate

				        with:

				          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      - name: Post to a Slack channel

				        if: ${{ github.event.schedule && failure() }}

				        uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1

				        with:

				          channel-id: ${{ vars.SLACK_ON_CALL_QA_STAGING_STREAM }}

				          slack-message: |

				            Periodic pg_regress on staging: ${{ job.status }}

				            <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>

				            <${{ steps.create-allure-report.outputs.report-url }}|Allure report>

				        env:

				          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
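The step `Change tests according to the generated password` in the workflow above builds PostgreSQL-style MD5 password hashes: the literal prefix `md5` followed by the MD5 digest of the password concatenated with the role name. A minimal sketch of that computation as a reusable function follows. The name `pg_md5_hash` is hypothetical, and the sketch assumes GNU `md5sum` is available, as it is in the workflow's container.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the workflow): computes the "md5" + md5(password || role)
# string that the workflow substitutes into expected/password.out.
# Arguments: <password> <role name>
pg_md5_hash() {
  local password="$1" role="$2"
  # printf '%s' avoids the trailing newline, like `echo -n` in the workflow
  printf 'md5%s\n' "$(printf '%s' "${password}${role}" | md5sum | awk '{print $1}')"
}

# Example: password "ab" and role "c" hash the string "abc"
pg_md5_hash ab c   # prints md5900150983cd24fb0d6963f7d28e17f72
```

The digest covers only the concatenated string, so a different split of the same characters between password and role yields the same hash.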

									
.github/workflows/fast-forward.yml (new file, 41 lines changed)

				@@ -0,0 +1,41 @@

				name: Fast forward merge

				on:

				  pull_request:

				    types: [labeled]

				    branches:

				      - release

				      - release-proxy

				      - release-compute

				jobs:

				  fast-forward:

				    if: ${{ github.event.label.name == 'fast-forward' }}

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@v2

				        with:

				          egress-policy: audit

				      - name: Remove fast-forward label from PR

				        env:

				          GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				        run: |

				          gh pr edit ${{ github.event.pull_request.number }} --repo "${GITHUB_REPOSITORY}" --remove-label "fast-forward"

				      - name: Fast forwarding

				        uses: sequoia-pgp/fast-forward@ea7628bedcb0b0b96e94383ada458d812fca4979

				        # See https://docs.github.com/en/graphql/reference/enums#mergestatestatus

				        if: ${{ github.event.pull_request.mergeable_state  == 'clean' }}

				        with:

				          merge: true

				          comment: on-error

				          github_token: ${{ secrets.CI_ACCESS_TOKEN }}

				      - name: Comment if mergeable_state is not clean

				        if: ${{ github.event.pull_request.mergeable_state  != 'clean' }}

				        run: |

				          gh pr comment ${{ github.event.pull_request.number }} \

				            --repo "${GITHUB_REPOSITORY}" \

				            --body "Not trying to forward pull-request, because \`mergeable_state\` is \`${{ github.event.pull_request.mergeable_state }}\`, not \`clean\`."

									
.github/workflows/force-test-extensions-upgrade.yml (new file, 82 lines changed)

				@@ -0,0 +1,82 @@

				name: Force Test Upgrading of Extension

				on:

				  schedule:

				    # * is a special character in YAML so you have to quote this string

				    #          ┌───────────── minute (0 - 59)

				    #          │ ┌───────────── hour (0 - 23)

				    #          │ │ ┌───────────── day of the month (1 - 31)

				    #          │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)

				    #          │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)

				    - cron:  '45 2 * * *' # run once a day, timezone is utc

				  workflow_dispatch: # adds ability to run this manually

				defaults:

				  run:

				    shell: bash -euxo pipefail {0}

				concurrency:

				  # Allow only one workflow

				  group: ${{ github.workflow }}

				  cancel-in-progress: true

				permissions:

				  id-token: write # aws-actions/configure-aws-credentials

				  statuses: write

				  contents: read

				jobs:

				  regress:

				    strategy:

				      fail-fast: false

				      matrix:

				        pg-version: [16, 17]

				    runs-on: small

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          submodules: false

				      - name: Get the last compute release tag

				        id: get-last-compute-release-tag

				        env:

				          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				        run: |

				          tag=$(gh api -q '[.[].tag_name | select(startswith("release-compute"))][0]'\

				            -H "Accept: application/vnd.github+json" \

				            -H "X-GitHub-Api-Version: 2022-11-28" \

				            "/repos/${GITHUB_REPOSITORY}/releases")

				          echo tag=${tag} >> ${GITHUB_OUTPUT}

				      - name: Test extension upgrade

				        timeout-minutes: 60

				        env:

				          NEW_COMPUTE_TAG: latest

				          OLD_COMPUTE_TAG: ${{ steps.get-last-compute-release-tag.outputs.tag }}

				          TEST_EXTENSIONS_TAG: ${{ steps.get-last-compute-release-tag.outputs.tag }}

				          PG_VERSION: ${{ matrix.pg-version }}

				          FORCE_ALL_UPGRADE_TESTS: true

				        run: ./docker-compose/test_extensions_upgrade.sh

				      - name: Print logs and clean up

				        if: always()

				        run: |

				          docker compose --profile test-extensions -f ./docker-compose/docker-compose.yml logs || true

				          docker compose --profile test-extensions -f ./docker-compose/docker-compose.yml down

				      - name: Post to the Slack channel

				        if: ${{ github.event.schedule && failure() }}

				        uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1

				        with:

				          channel-id: ${{ vars.SLACK_ON_CALL_QA_STAGING_STREAM }}

				          slack-message: |

				            Test upgrading of extensions: ${{ job.status }}

				            <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>

				        env:

				          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

									
.github/workflows/ingest_benchmark.yml (new file, 200 lines changed)

				@@ -0,0 +1,200 @@

				name: benchmarking ingest

				on:

				  # uncomment to run on push for debugging your PR

				  # push:

				  #   branches: [ your branch ]

				  schedule:

				    # * is a special character in YAML so you have to quote this string

				    #          ┌───────────── minute (0 - 59)

				    #          │ ┌───────────── hour (0 - 23)

				    #          │ │ ┌───────────── day of the month (1 - 31)

				    #          │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)

				    #          │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)

				    - cron:   '0 9 * * *' # run once a day, timezone is utc

				  workflow_dispatch: # adds ability to run this manually

				defaults:

				  run:

				    shell: bash -euxo pipefail {0}

				concurrency:

				  # Allow only one workflow globally because we need dedicated resources which only exist once

				  group: ingest-bench-workflow

				  cancel-in-progress: true

				permissions:

				  contents: read

				jobs:

				  ingest:

				    strategy:

				      fail-fast: false # allow other variants to continue even if one fails

				      matrix:

				        include:

				          - target_project: new_empty_project_stripe_size_2048 

				            stripe_size: 2048 # 16 MiB

				            postgres_version: 16

				            disable_sharding: false

				          - target_project: new_empty_project_stripe_size_32768

				            stripe_size: 32768 # 256 MiB # note that this is different from null because using null will shard_split the project only if it reaches the threshold

				                               # while here it is sharded from the beginning with a shard size of 256 MiB

				            disable_sharding: false

				            postgres_version: 16

				          - target_project: new_empty_project

				            stripe_size: null # run with neon defaults which will shard split only when reaching the threshold

				            disable_sharding: false

				            postgres_version: 16

				          - target_project: new_empty_project

				            stripe_size: null # run with neon defaults which will shard split only when reaching the threshold

				            disable_sharding: false

				            postgres_version: 17

				          - target_project: large_existing_project

				            stripe_size: null # cannot re-shard or choose a different stripe size for an existing, already sharded project

				            disable_sharding: false

				            postgres_version: 16

				          - target_project: new_empty_project_unsharded

				            stripe_size: null # run with neon defaults which will shard split only when reaching the threshold

				            disable_sharding: true

				            postgres_version: 16

				      max-parallel: 1 # we want to run each stripe size sequentially to be able to compare the results

				    permissions:

				      contents: write

				      statuses: write

				      id-token: write # aws-actions/configure-aws-credentials

				    env:

				      PG_CONFIG: /tmp/neon/pg_install/v16/bin/pg_config

				      PSQL: /tmp/neon/pg_install/v16/bin/psql

				      PG_16_LIB_PATH: /tmp/neon/pg_install/v16/lib

				      PGCOPYDB: /pgcopydb/bin/pgcopydb

				      PGCOPYDB_LIB_PATH: /pgcopydb/lib

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: ghcr.io/neondatabase/build-tools:pinned-bookworm

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    timeout-minutes: 1440

				    steps:

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				    - name: Configure AWS credentials # necessary to download artefacts

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        role-duration-seconds: 18000 # 5 hours is currently max associated with IAM role

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				      with:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Create Neon Project

				      if: ${{ startsWith(matrix.target_project, 'new_empty_project') }}

				      id: create-neon-project-ingest-target

				      uses: ./.github/actions/neon-project-create

				      with:

				        region_id: aws-us-east-2

				        postgres_version: ${{ matrix.postgres_version }}

				        compute_units: '[7, 7]' # we want to test large compute here to avoid compute-side bottleneck

				        api_key: ${{ secrets.NEON_STAGING_API_KEY }}

				        shard_split_project: ${{ matrix.stripe_size != null && 'true' || 'false' }}

				        admin_api_key: ${{ secrets.NEON_STAGING_ADMIN_API_KEY }} 

				        shard_count: 8

				        stripe_size: ${{ matrix.stripe_size }}

				        disable_sharding: ${{ matrix.disable_sharding }} 

				    - name: Initialize Neon project

				      if: ${{ startsWith(matrix.target_project, 'new_empty_project') }}

				      env:

				          BENCHMARK_INGEST_TARGET_CONNSTR: ${{ steps.create-neon-project-ingest-target.outputs.dsn }}

				          NEW_PROJECT_ID: ${{ steps.create-neon-project-ingest-target.outputs.project_id }}

				      run: |

				        echo "Initializing Neon project with project_id: ${NEW_PROJECT_ID}"

				        export LD_LIBRARY_PATH=${PG_16_LIB_PATH}

				        ${PSQL} "${BENCHMARK_INGEST_TARGET_CONNSTR}" -c "CREATE EXTENSION IF NOT EXISTS neon; CREATE EXTENSION IF NOT EXISTS neon_utils;"

				        echo "BENCHMARK_INGEST_TARGET_CONNSTR=${BENCHMARK_INGEST_TARGET_CONNSTR}" >> $GITHUB_ENV

				    - name: Create Neon Branch for large tenant

				      if: ${{ matrix.target_project == 'large_existing_project' }}

				      id: create-neon-branch-ingest-target

				      uses: ./.github/actions/neon-branch-create

				      with:

				        project_id: ${{ vars.BENCHMARK_INGEST_TARGET_PROJECTID }}

				        api_key: ${{ secrets.NEON_STAGING_API_KEY }}

				    - name: Initialize Neon project

				      if: ${{ matrix.target_project == 'large_existing_project' }}

				      env:

				          BENCHMARK_INGEST_TARGET_CONNSTR: ${{ steps.create-neon-branch-ingest-target.outputs.dsn }}

				          NEW_BRANCH_ID: ${{ steps.create-neon-branch-ingest-target.outputs.branch_id }}

				      run: |

				        echo "Initializing Neon branch with branch_id: ${NEW_BRANCH_ID}"

				        export LD_LIBRARY_PATH=${PG_16_LIB_PATH}

				        # Extract the part before the database name

				        base_connstr="${BENCHMARK_INGEST_TARGET_CONNSTR%/*}"

				        # Extract the query parameters (if any) after the database name

				        query_params="${BENCHMARK_INGEST_TARGET_CONNSTR#*\?}"

				        # Reconstruct the new connection string

				        if [ "$query_params" != "$BENCHMARK_INGEST_TARGET_CONNSTR" ]; then

				          new_connstr="${base_connstr}/neondb?${query_params}"

				        else

				          new_connstr="${base_connstr}/neondb"

				        fi

				        ${PSQL} "${new_connstr}" -c "drop database ludicrous;"

				        ${PSQL} "${new_connstr}" -c "CREATE DATABASE ludicrous;"

				        if [ "$query_params" != "$BENCHMARK_INGEST_TARGET_CONNSTR" ]; then

				          BENCHMARK_INGEST_TARGET_CONNSTR="${base_connstr}/ludicrous?${query_params}"

				        else

				          BENCHMARK_INGEST_TARGET_CONNSTR="${base_connstr}/ludicrous"

				        fi

				        ${PSQL} "${BENCHMARK_INGEST_TARGET_CONNSTR}" -c "CREATE EXTENSION IF NOT EXISTS neon; CREATE EXTENSION IF NOT EXISTS neon_utils;"

				        echo "BENCHMARK_INGEST_TARGET_CONNSTR=${BENCHMARK_INGEST_TARGET_CONNSTR}" >> $GITHUB_ENV

				    - name: Invoke pgcopydb

				      uses: ./.github/actions/run-python-test-set

				      with:

				        build_type: remote

				        test_selection: performance/test_perf_ingest_using_pgcopydb.py

				        run_in_parallel: false

				        extra_params: -s -m remote_cluster --timeout 86400 -k test_ingest_performance_using_pgcopydb

				        pg_version: v${{ matrix.postgres_version }}

				        save_perf_report: true

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        BENCHMARK_INGEST_SOURCE_CONNSTR: ${{ secrets.BENCHMARK_INGEST_SOURCE_CONNSTR }}

				        TARGET_PROJECT_TYPE: ${{ matrix.target_project }}

				        # we report PLATFORM in zenbenchmark NeonBenchmarker perf database and want to distinguish between new project and large tenant

				        PLATFORM: "${{ matrix.target_project }}-us-east-2-staging"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				    - name: Show table sizes after ingest

				      run: |

				        export LD_LIBRARY_PATH=${PG_16_LIB_PATH}

				        ${PSQL} "${BENCHMARK_INGEST_TARGET_CONNSTR}" -c "\dt+"

				    - name: Delete Neon Project

				      if: ${{ always() && startsWith(matrix.target_project, 'new_empty_project') }}

				      uses: ./.github/actions/neon-project-delete

				      with:

				        project_id: ${{ steps.create-neon-project-ingest-target.outputs.project_id }}

				        api_key: ${{ secrets.NEON_STAGING_API_KEY }}

				    - name: Delete Neon Branch for large tenant

				      if: ${{ always() && matrix.target_project == 'large_existing_project' }}

				      uses: ./.github/actions/neon-branch-delete

				      with:

				        project_id: ${{ vars.BENCHMARK_INGEST_TARGET_PROJECTID }}

				        branch_id: ${{ steps.create-neon-branch-ingest-target.outputs.branch_id }}

				        api_key: ${{ secrets.NEON_STAGING_API_KEY }}
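The second "Initialize Neon project" step above swaps the database name in the connection string using bash parameter expansion. The following is a minimal sketch of that technique, not the workflow's own code: `swap_dbname` is a hypothetical helper and the URI is made up. It assumes the query string contains no `/`, since `${var%/*}` cuts at the last slash.

```shell
#!/usr/bin/env bash
# Sketch only: swap_dbname is a hypothetical helper, not part of the workflow.
# It replaces the database name in a libpq-style URI and keeps any query string.
swap_dbname() {
  local connstr="$1" new_db="$2"
  # Everything before the last "/": scheme, credentials and host.
  local base="${connstr%/*}"
  # Everything after the first "?"; stays equal to the input when there is no "?".
  local query="${connstr#*\?}"
  if [ "$query" != "$connstr" ]; then
    printf '%s/%s?%s\n' "$base" "$new_db" "$query"
  else
    printf '%s/%s\n' "$base" "$new_db"
  fi
}

# Example call with a made-up connection string.
swap_dbname 'postgres://user:secret@host.example/neondb?sslmode=require' ludicrous
```

Comparing `query` with the full input is how the step detects a missing `?`: when the pattern does not match, the expansion returns the string unchanged.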

									
										10

.github/workflows/label-for-external-users.yml
									
										vendored
									
												
				@@ -27,6 +27,11 @@ jobs:

				      is-member: ${{ steps.check-user.outputs.is-member }}

				    steps:

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@v2

				      with:

				        egress-policy: audit

				    - name: Check whether `${{ github.actor }}` is a member of `${{ github.repository_owner }}`

				      id: check-user

				      env:

				@@ -69,6 +74,11 @@ jobs:

				      issues: write        # for `gh issue edit`

				    steps:

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@v2

				      with:

				        egress-policy: audit

				    - name: Add `${{ env.LABEL }}` label

				      env:

				        GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

									
										194

.github/workflows/large_oltp_benchmark.yml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,194 @@

				name: large oltp benchmark

				on:

				  # uncomment to run on push for debugging your PR

				  #push:

				  #  branches: [ bodobolero/synthetic_oltp_workload ]

				  schedule:

				    # * is a special character in YAML so you have to quote this string

				    #          ┌───────────── minute (0 - 59)

				    #          │ ┌───────────── hour (0 - 23)

				    #          │ │  ┌───────────── day of the month (1 - 31)

				    #          │ │  │ ┌───────────── month (1 - 12 or JAN-DEC)

				    #          │ │  │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)

				    - cron:   '0 15 * * 0,2,4' # run on Sunday, Tuesday, Thursday at 3 PM UTC

				  workflow_dispatch: # adds ability to run this manually

				defaults:

				  run:

				    shell: bash -euxo pipefail {0}

				concurrency:

				  # Allow only one workflow globally because we need dedicated resources which only exist once

				  group: large-oltp-bench-workflow

				  cancel-in-progress: false

				permissions:

				  contents: read

				jobs:

				  oltp:

				    strategy:

				      fail-fast: false # allow other variants to continue even if one fails

				      matrix:

				        include:

				          - target: new_branch 

				            custom_scripts: insert_webhooks.sql@200 select_any_webhook_with_skew.sql@300 select_recent_webhook.sql@397 select_prefetch_webhook.sql@3 IUD_one_transaction.sql@100

				          - target: reuse_branch 

				            custom_scripts: insert_webhooks.sql@200 select_any_webhook_with_skew.sql@300 select_recent_webhook.sql@397 select_prefetch_webhook.sql@3 IUD_one_transaction.sql@100

				      max-parallel: 1 # we want to run each target sequentially to be able to compare the results

				    permissions:

				      contents: write

				      statuses: write

				      id-token: write # aws-actions/configure-aws-credentials

				    env:

				      TEST_PG_BENCH_DURATIONS_MATRIX: "1h" # todo update to > 1 h 

				      TEST_PGBENCH_CUSTOM_SCRIPTS: ${{ matrix.custom_scripts }}

				      POSTGRES_DISTRIB_DIR: /tmp/neon/pg_install

				      PG_VERSION: 16 # fixed by the pre-existing project

				      TEST_OUTPUT: /tmp/test_output

				      BUILD_TYPE: remote

				      PLATFORM: ${{ matrix.target }}

				    runs-on: [ self-hosted, us-east-2, x64 ]

				    container:

				      image: ghcr.io/neondatabase/build-tools:pinned-bookworm

				      credentials:

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    # Increase timeout to 2 days, default timeout is 6h - database maintenance can take a long time

				    # (normally 1h pgbench, 3h vacuum analyze, 3.5h re-index) x 2 = 15h, leave some buffer for regressions

				    # in one run vacuum didn't finish within 12 hours

				    timeout-minutes: 2880

				    steps:

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				    - name: Configure AWS credentials # necessary to download artefacts

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        role-duration-seconds: 18000 # 5 hours is currently max associated with IAM role

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				      with:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Create Neon Branch for large tenant

				      if: ${{ matrix.target == 'new_branch' }}

				      id: create-neon-branch-oltp-target

				      uses: ./.github/actions/neon-branch-create

				      with:

				          project_id: ${{ vars.BENCHMARK_LARGE_OLTP_PROJECTID }}

				          api_key: ${{ secrets.NEON_STAGING_API_KEY }}

				    - name: Set up Connection String

				      id: set-up-connstr

				      run: |

				        case "${{ matrix.target }}" in

				          new_branch)

				          CONNSTR=${{ steps.create-neon-branch-oltp-target.outputs.dsn }}

				          ;;

				          reuse_branch)

				          CONNSTR=${{ secrets.BENCHMARK_LARGE_OLTP_REUSE_CONNSTR }}

				          ;;

				          *)

				          echo >&2 "Unknown target=${{ matrix.target }}"

				          exit 1

				          ;;

				        esac

				        CONNSTR_WITHOUT_POOLER="${CONNSTR//-pooler/}"

				        echo "connstr=${CONNSTR}" >> $GITHUB_OUTPUT

				        echo "connstr_without_pooler=${CONNSTR_WITHOUT_POOLER}" >> $GITHUB_OUTPUT

				    - name: Delete rows from prior runs in reuse branch

				      if: ${{ matrix.target == 'reuse_branch' }}

				      env:

				          BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr_without_pooler }}

				          PG_CONFIG: /tmp/neon/pg_install/v16/bin/pg_config

				          PSQL: /tmp/neon/pg_install/v16/bin/psql

				          PG_16_LIB_PATH: /tmp/neon/pg_install/v16/lib

				      run: |

				        echo "$(date '+%Y-%m-%d %H:%M:%S') - Deleting rows in table webhook.incoming_webhooks from prior runs"

				        export LD_LIBRARY_PATH=${PG_16_LIB_PATH}

				        ${PSQL} "${BENCHMARK_CONNSTR}" -c "SET statement_timeout = 0; DELETE FROM webhook.incoming_webhooks WHERE created_at > '2025-02-27 23:59:59+00';"

				        echo "$(date '+%Y-%m-%d %H:%M:%S') - Finished deleting rows in table webhook.incoming_webhooks from prior runs"

				    - name: Benchmark pgbench with custom-scripts 

				      uses: ./.github/actions/run-python-test-set

				      with:

				        build_type: ${{ env.BUILD_TYPE }}

				        test_selection: performance

				        run_in_parallel: false

				        save_perf_report: true

				        extra_params: -m remote_cluster --timeout 7200 -k test_perf_oltp_large_tenant_pgbench

				        pg_version: ${{ env.PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr }}

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				    - name: Benchmark database maintenance

				      uses: ./.github/actions/run-python-test-set

				      with:

				        build_type: ${{ env.BUILD_TYPE }}

				        test_selection: performance

				        run_in_parallel: false

				        save_perf_report: true

				        extra_params: -m remote_cluster --timeout 172800 -k test_perf_oltp_large_tenant_maintenance

				        pg_version: ${{ env.PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.set-up-connstr.outputs.connstr_without_pooler }}

				        VIP_VAP_ACCESS_TOKEN: "${{ secrets.VIP_VAP_ACCESS_TOKEN }}"

				        PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"

				    - name: Delete Neon Branch for large tenant

				      if: ${{ always() && matrix.target == 'new_branch' }}

				      uses: ./.github/actions/neon-branch-delete

				      with:

				        project_id: ${{ vars.BENCHMARK_LARGE_OLTP_PROJECTID }}

				        branch_id: ${{ steps.create-neon-branch-oltp-target.outputs.branch_id }}

				        api_key: ${{ secrets.NEON_STAGING_API_KEY }}

				    - name: Configure AWS credentials # again because prior steps could have exceeded 5 hours

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        role-duration-seconds: 18000 # 5 hours

				    - name: Create Allure report

				      id: create-allure-report

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				      with:

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				      uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1

				      with:

				        channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream

				        slack-message: |

				          Periodic large oltp perf testing: ${{ job.status }}

				          <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|GitHub Run>

				          <${{ steps.create-allure-report.outputs.report-url }}|Allure report>

				      env:

				        SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
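The "Set up Connection String" step above derives a second connection string by deleting `-pooler` from the first one. The pgbench step then uses the original string, while the row-deletion and maintenance steps use the one without the pooler. A minimal sketch of that substitution follows, assuming a made-up host name; `without_pooler` is a hypothetical helper, not something the workflow defines.

```shell
#!/usr/bin/env bash
# Sketch only: without_pooler is a hypothetical helper and the host name is invented.
without_pooler() {
  local connstr="$1"
  # ${var//pattern/} deletes every occurrence of the pattern, not just the first one.
  printf '%s\n' "${connstr//-pooler/}"
}

# Example call with a made-up pooled endpoint.
without_pooler 'postgres://user@ep-example-123-pooler.us-east-2.aws.neon.tech/neondb'
```

Because the expansion is global, it would also strip `-pooler` from a password or database name that happened to contain it, so this approach relies on the substring appearing only in the host name.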

									
										32

.github/workflows/lint-release-pr.yml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,32 @@

				name: Lint Release PR

				on:

				  pull_request:

				    branches:

				      - release

				      - release-proxy

				      - release-compute

				permissions:

				  contents: read

				jobs:

				  lint-release-pr:

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - name: Checkout PR branch

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0  # Fetch full history for git operations

				          ref: ${{ github.event.pull_request.head.ref }}

				      - name: Run lint script

				        env:

				          RELEASE_BRANCH: ${{ github.base_ref }}

				        run: |

				          ./.github/scripts/lint-release-pr.sh

									
										181

.github/workflows/neon_extra_builds.yml
									
										vendored
									
												
				@@ -26,124 +26,77 @@ jobs:

				    with:

				      github-event-name: ${{ github.event_name}}

				  check-build-tools-image:

				    needs: [ check-permissions ]

				    uses: ./.github/workflows/check-build-tools-image.yml

				  build-build-tools-image:

				    needs: [ check-build-tools-image ]

				    needs: [ check-permissions ]

				    uses: ./.github/workflows/build-build-tools-image.yml

				    with:

				      image-tag: ${{ needs.check-build-tools-image.outputs.image-tag }}

				    secrets: inherit

				  files-changed:

				    name: Detect what files changed

				    runs-on: ubuntu-22.04

				    timeout-minutes: 3

				    outputs:

				      v17: ${{ steps.files_changed.outputs.v17 }}

				      postgres_changes: ${{ steps.postgres_changes.outputs.changes }}

				      rebuild_rust_code: ${{ steps.files_changed.outputs.rust_code }}

				      rebuild_everything: ${{ steps.files_changed.outputs.rebuild_neon_extra || steps.files_changed.outputs.rebuild_macos }}

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - name: Checkout

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          submodules: true

				      - name: Check for Postgres changes

				        uses: dorny/paths-filter@1441771bbfdd59dcd748680ee64ebd8faab1a242  #v3

				        id: files_changed

				        with:

				          token: ${{ github.token }}

				          filters: .github/file-filters.yaml

				          base: ${{ github.event_name != 'pull_request' && (github.event.merge_group.base_ref || github.ref_name) || '' }}

				          ref: ${{ github.event_name != 'pull_request' && (github.event.merge_group.head_ref || github.ref) || '' }}

				      - name: Filter out only v-string for build matrix

				        id: postgres_changes

				        run: |

				          v_strings_only_as_json_array=$(echo ${{ steps.files_changed.outputs.changes }} | jq '.[]|select(test("v\\d+"))' | jq --slurp -c)

				          echo "changes=${v_strings_only_as_json_array}" | tee -a "${GITHUB_OUTPUT}"

				  check-macos-build:

				    needs: [ check-permissions ]

				    needs: [ check-permissions, files-changed ]

				    if: |

				      contains(github.event.pull_request.labels.*.name, 'run-extra-build-macos')  ||

				      contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||

				      github.ref_name == 'main'

				    timeout-minutes: 90

				    runs-on: macos-14

				    env:

				      # Use release build only, to have less debug info around

				      # Hence keeping target/ (and general cache size) smaller

				      BUILD_TYPE: release

				    steps:

				      - name: Checkout

				        uses: actions/checkout@v4

				        with:

				          submodules: true

				      - name: Install macOS postgres dependencies

				        run: brew install flex bison openssl protobuf icu4c pkg-config

				      - name: Set pg 14 revision for caching

				        id: pg_v14_rev

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v14) >> $GITHUB_OUTPUT

				      - name: Set pg 15 revision for caching

				        id: pg_v15_rev

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v15) >> $GITHUB_OUTPUT

				      - name: Set pg 16 revision for caching

				        id: pg_v16_rev

				        run: echo pg_rev=$(git rev-parse HEAD:vendor/postgres-v16) >> $GITHUB_OUTPUT

				      - name: Cache postgres v14 build

				        id: cache_pg_14

				        uses: actions/cache@v4

				        with:

				          path: pg_install/v14

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Cache postgres v15 build

				        id: cache_pg_15

				        uses: actions/cache@v4

				        with:

				          path: pg_install/v15

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Cache postgres v16 build

				        id: cache_pg_16

				        uses: actions/cache@v4

				        with:

				          path: pg_install/v16

				          key: v1-${{ runner.os }}-${{ runner.arch }}-${{ env.BUILD_TYPE }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-${{ hashFiles('Makefile') }}

				      - name: Set extra env for macOS

				        run: |

				          echo 'LDFLAGS=-L/usr/local/opt/openssl@3/lib' >> $GITHUB_ENV

				          echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV

				      - name: Cache cargo deps

				        uses: actions/cache@v4

				        with:

				          path: |

				            ~/.cargo/registry

				            !~/.cargo/registry/src

				            ~/.cargo/git

				            target

				          key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust

				      - name: Build postgres v14

				        if: steps.cache_pg_14.outputs.cache-hit != 'true'

				        run: make postgres-v14 -j$(sysctl -n hw.ncpu)

				      - name: Build postgres v15

				        if: steps.cache_pg_15.outputs.cache-hit != 'true'

				        run: make postgres-v15 -j$(sysctl -n hw.ncpu)

				      - name: Build postgres v16

				        if: steps.cache_pg_16.outputs.cache-hit != 'true'

				        run: make postgres-v16 -j$(sysctl -n hw.ncpu)

				      - name: Build neon extensions

				        run: make neon-pg-ext -j$(sysctl -n hw.ncpu)

				      - name: Build walproposer-lib

				        run: make walproposer-lib -j$(sysctl -n hw.ncpu)

				      - name: Run cargo build

				        run: PQ_LIB_DIR=$(pwd)/pg_install/v16/lib cargo build --all --release

				      - name: Check that no warnings are produced

				        run: ./run_clippy.sh

				    uses: ./.github/workflows/build-macos.yml

				    with:

				      pg_versions: ${{ needs.files-changed.outputs.postgres_changes }}

				      rebuild_rust_code: ${{ fromJSON(needs.files-changed.outputs.rebuild_rust_code) }}

				      rebuild_everything: ${{ fromJSON(needs.files-changed.outputs.rebuild_everything) }}

				  gather-rust-build-stats:

				    needs: [ check-permissions, build-build-tools-image ]

				    needs: [ check-permissions, build-build-tools-image, files-changed ]

				    permissions:

				      id-token: write # aws-actions/configure-aws-credentials

				      statuses: write

				      contents: write

				    if: |

				      contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats') ||

				      contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||

				      github.ref_name == 'main'

				      (needs.files-changed.outputs.v17 == 'true' || needs.files-changed.outputs.rebuild_everything == 'true') && (

				        contains(github.event.pull_request.labels.*.name, 'run-extra-build-stats') ||

				        contains(github.event.pull_request.labels.*.name, 'run-extra-build-*') ||

				        github.ref_name == 'main'

				      )

				    runs-on: [ self-hosted, large ]

				    container:

				      image: ${{ needs.build-build-tools-image.outputs.image }}

				      image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm

				      credentials:

				        username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				        password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    env:

				@@ -153,8 +106,13 @@ jobs:

				      CARGO_INCREMENTAL: 0

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - name: Checkout

				        uses: actions/checkout@v4

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          submodules: true

				@@ -166,26 +124,33 @@ jobs:

				        run: make walproposer-lib -j$(nproc)

				      - name: Produce the build stats

				        run: PQ_LIB_DIR=$(pwd)/pg_install/v16/lib cargo build --all --release --timings -j$(nproc)

				        run: cargo build --all --release --timings -j$(nproc)

				      - name: Configure AWS credentials

				        uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				        with:

				          aws-region: eu-central-1

				          role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				          role-duration-seconds: 3600

				      - name: Upload the build stats

				        id: upload-stats

				        env:

				          BUCKET: neon-github-public-dev

				          SHA: ${{ github.event.pull_request.head.sha || github.sha }}

				          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}

				          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}

				        run: |

				          REPORT_URL=https://${BUCKET}.s3.amazonaws.com/build-stats/${SHA}/${GITHUB_RUN_ID}/cargo-timing.html

				          aws s3 cp --only-show-errors ./target/cargo-timings/cargo-timing.html "s3://${BUCKET}/build-stats/${SHA}/${GITHUB_RUN_ID}/"

				          echo "report-url=${REPORT_URL}" >> $GITHUB_OUTPUT

				      - name: Publish build stats report

				        uses: actions/github-script@v7

				        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1

				        env:

				          REPORT_URL: ${{ steps.upload-stats.outputs.report-url }}

				          SHA: ${{ github.event.pull_request.head.sha || github.sha }}

				        with:

				          # Retry script for 5XX server errors: https://github.com/actions/github-script#retries

				          retries: 5

				          script: |

				            const { REPORT_URL, SHA } = process.env
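The "Filter out only v-string for build matrix" step in the `files-changed` job above narrows a JSON array of matched filter names down to the Postgres version strings. The sketch below shows the same selection, assuming `jq` is on the PATH and using a made-up input array. It folds the workflow's two `jq` invocations into a single one, and `filter_v_strings` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: filter_v_strings is a hypothetical helper; the input array is invented.
# Requires jq on PATH.
filter_v_strings() {
  # Keep entries containing "v" followed by at least one digit,
  # then emit them as one compact JSON array.
  jq -c '[ .[] | select(test("v\\d+")) ]'
}

# Example call with a made-up list of filter names.
echo '["v14","v17","rust_code"]' | filter_v_strings
```

The regular expression is unanchored, so any filter name that contains `v` followed by a digit would pass; anchoring it as `^v\\d+$` would be stricter if such names ever appear.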

									
										69

.github/workflows/periodic_pagebench.yml
									
										vendored
									
												
				@@ -3,12 +3,12 @@ name: Periodic pagebench performance test on dedicated EC2 machine in eu-central

				on:

				  schedule:

				    # * is a special character in YAML so you have to quote this string

				    #          ┌───────────── minute (0 - 59)

				    #          │ ┌───────────── hour (0 - 23)

				    #          │ │ ┌───────────── day of the month (1 - 31)

				    #          │ │ │ ┌───────────── month (1 - 12 or JAN-DEC)

				    #          │ │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)

				    - cron:  '0 18 * * *' # Runs at 6 PM UTC every day

				    #        ┌───────────── minute (0 - 59)

				    #        │   ┌───────────── hour (0 - 23)

				    #        │   │ ┌───────────── day of the month (1 - 31)

				    #        │   │ │ ┌───────────── month (1 - 12 or JAN-DEC)

				    #        │   │ │ │ ┌───────────── day of the week (0 - 6 or SUN-SAT)

				    - cron: '0 */3 * * *' # Runs every 3 hours

				  workflow_dispatch: # Allows manual triggering of the workflow

				    inputs:

				      commit_hash:

				@@ -25,31 +25,49 @@ concurrency:

				  group: ${{ github.workflow }}

				  cancel-in-progress: false

				permissions:

				  contents: read

				jobs:

				  trigger_bench_on_ec2_machine_in_eu_central_1:

				    permissions:

				      id-token: write # aws-actions/configure-aws-credentials

				      statuses: write

				      contents: write

				      pull-requests: write

				    runs-on: [ self-hosted, small ]

				    container:

				      image: neondatabase/build-tools:pinned

				      image: ghcr.io/neondatabase/build-tools:pinned-bookworm

				      credentials:

				        username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				        password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init

				    timeout-minutes: 360  # Set the timeout to 6 hours

				    env:

				      API_KEY: ${{ secrets.PERIODIC_PAGEBENCH_EC2_RUNNER_API_KEY }}

				      RUN_ID: ${{ github.run_id }}

				      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_EC2_US_TEST_RUNNER_ACCESS_KEY_ID }}

				      AWS_SECRET_ACCESS_KEY : ${{ secrets.AWS_EC2_US_TEST_RUNNER_ACCESS_KEY_SECRET }}

				      AWS_DEFAULT_REGION : "eu-central-1"

				      AWS_INSTANCE_ID : "i-02a59a3bf86bc7e74"

				    steps:

				    # we don't need the neon source code because we run everything remotely

				    # however we still need the local github actions to run the allure step below

				    - uses: actions/checkout@v4

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				    - name: Show my own (github runner) external IP address - useful for IP allowlisting

				      run: curl https://ifconfig.me

				    - name: Assume AWS OIDC role that allows to manage (start/stop/describe... EC2 machine)

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_MANAGE_BENCHMARK_EC2_VMS_ARN }}

				        role-duration-seconds: 3600

				    - name: Start EC2 instance and wait for the instance to boot up

				      run: |

				        aws ec2 start-instances --instance-ids $AWS_INSTANCE_ID

				@@ -68,18 +86,20 @@ jobs:

				      run: |

				        if [ -z "$INPUT_COMMIT_HASH" ]; then

				          echo "COMMIT_HASH=$(curl -s https://api.github.com/repos/neondatabase/neon/commits/main | jq -r '.sha')" >> $GITHUB_ENV

				          echo "COMMIT_HASH_TYPE=latest" >> $GITHUB_ENV

				        else

				          echo "COMMIT_HASH=$INPUT_COMMIT_HASH" >> $GITHUB_ENV

				          echo "COMMIT_HASH_TYPE=manual" >> $GITHUB_ENV

				        fi

				    - name: Start Bench with run_id   

				    - name: Start Bench with run_id

				      run: |

				        curl -k -X 'POST' \

				        "${EC2_MACHINE_URL_US}/start_test/${GITHUB_RUN_ID}" \

				        -H 'accept: application/json' \

				        -H 'Content-Type: application/json' \

				        -H "Authorization: Bearer $API_KEY" \

				        -d "{\"neonRepoCommitHash\": \"${COMMIT_HASH}\"}"

				        -d "{\"neonRepoCommitHash\": \"${COMMIT_HASH}\", \"neonRepoCommitHashType\": \"${COMMIT_HASH_TYPE}\"}"

				    - name: Poll Test Status

				      id: poll_step

				@@ -116,7 +136,7 @@ jobs:

				        -H 'accept: application/gzip' \

				        -H "Authorization: Bearer $API_KEY" \

				        --output "test_log_${GITHUB_RUN_ID}.gz"

				    - name: Unzip Test Log and Print it into this job's log

				      if: always() && steps.poll_step.outputs.too_many_runs != 'true'

				      run: |

				@@ -124,23 +144,22 @@ jobs:

				        cat "test_log_${GITHUB_RUN_ID}"

				    - name: Create Allure report

				      env:

				        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}

				        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}

				      if: ${{ !cancelled() }}

				      uses: ./.github/actions/allure-report-generate

				      with:

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Post to a Slack channel

				      if: ${{ github.event.schedule && failure() }}

				      uses: slackapi/slack-github-action@v1

				      uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1

				      with:

				        channel-id: "C033QLM5P7D" # dev-staging-stream

				        channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream

				        slack-message: "Periodic pagebench testing on dedicated hardware: ${{ job.status }}\n${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"

				      env:

				        SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

				    - name: Cleanup Test Resources

				      if: always() 

				      if: always()

				      run: |

				        curl -k -X 'POST' \

				        "${EC2_MACHINE_URL_US}/cleanup_test/${GITHUB_RUN_ID}" \

				@@ -148,6 +167,14 @@ jobs:

				        -H "Authorization: Bearer $API_KEY" \

				        -d ''

				    - name: Assume AWS OIDC role that allows to manage (start/stop/describe... EC2 machine)

				      if: always() && steps.poll_step.outputs.too_many_runs != 'true'

				      uses: aws-actions/configure-aws-credentials@e3dd6a429d7300a6a4c196c26e071d42e0343502 # v4.0.2

				      with:

				        aws-region: eu-central-1

				        role-to-assume: ${{ vars.DEV_AWS_OIDC_ROLE_MANAGE_BENCHMARK_EC2_VMS_ARN }}

				        role-duration-seconds: 3600

				    - name: Stop EC2 instance and wait for the instance to be stopped

				      if: always() && steps.poll_step.outputs.too_many_runs != 'true'

				      run: |

									
										56

.github/workflows/pg-clients.yml
									
										vendored
									
				@@ -12,8 +12,8 @@ on:

				  pull_request:

				    paths:

				      - '.github/workflows/pg-clients.yml'

				      - 'test_runner/pg_clients/**'

				      - 'test_runner/logical_repl/**'

				      - 'test_runner/pg_clients/**/*.py'

				      - 'test_runner/logical_repl/**/*.py'

				      - 'poetry.lock'

				  workflow_dispatch:

				@@ -25,11 +25,13 @@ defaults:

				  run:

				    shell: bash -euxo pipefail {0}

				permissions:

				  id-token: write # aws-actions/configure-aws-credentials

				  statuses: write # require for posting a status update

				env:

				  DEFAULT_PG_VERSION: 16

				  PLATFORM: neon-captest-new

				  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_DEV }}

				  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY_DEV }}

				  AWS_DEFAULT_REGION: eu-central-1

				jobs:

				@@ -39,15 +41,9 @@ jobs:

				    with:

				      github-event-name: ${{ github.event_name }}

				  check-build-tools-image:

				    needs: [ check-permissions ]

				    uses: ./.github/workflows/check-build-tools-image.yml

				  build-build-tools-image:

				    needs: [ check-build-tools-image ]

				    needs: [ check-permissions ]

				    uses: ./.github/workflows/build-build-tools-image.yml

				    with:

				      image-tag: ${{ needs.check-build-tools-image.outputs.image-tag }}

				    secrets: inherit

				  test-logical-replication:

				@@ -55,10 +51,10 @@ jobs:

				    runs-on: ubuntu-22.04

				    container:

				      image: ${{ needs.build-build-tools-image.outputs.image }}

				      image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm

				      credentials:

				        username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				        password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init --user root

				    services:

				      clickhouse:

				@@ -92,7 +88,12 @@ jobs:

				        ports:

				          - 8083:8083

				    steps:

				      - uses: actions/checkout@v4

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      - name: Download Neon artifact

				        uses: ./.github/actions/download

				@@ -100,6 +101,7 @@ jobs:

				          name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				          path: /tmp/neon/

				          prefix: latest

				          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      - name: Create Neon Project

				        id: create-neon-project

				@@ -107,6 +109,8 @@ jobs:

				        with:

				          api_key: ${{ secrets.NEON_STAGING_API_KEY }}

				          postgres_version: ${{ env.DEFAULT_PG_VERSION }}

				          project_settings: >-

				            {"enable_logical_replication": true}

				      - name: Run tests

				        uses: ./.github/actions/run-python-test-set

				@@ -116,6 +120,7 @@ jobs:

				          run_in_parallel: false

				          extra_params: -m remote_cluster

				          pg_version: ${{ env.DEFAULT_PG_VERSION }}

				          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        env:

				          BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}

				@@ -132,12 +137,13 @@ jobs:

				        uses: ./.github/actions/allure-report-generate

				        with:

				          store-test-results-into-db: true

				          aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				        env:

				          REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}

				      - name: Post to a Slack channel

				        if: github.event.schedule && failure()

				        uses: slackapi/slack-github-action@v1

				        uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1

				        with:

				          channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream

				          slack-message: |

				@@ -150,14 +156,19 @@ jobs:

				    runs-on: ubuntu-22.04

				    container:

				      image: ${{ needs.build-build-tools-image.outputs.image }}

				      image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm

				      credentials:

				        username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				        password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				        username: ${{ github.actor }}

				        password: ${{ secrets.GITHUB_TOKEN }}

				      options: --init --user root

				    steps:

				    - uses: actions/checkout@v4

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				    - name: Download Neon artifact

				      uses: ./.github/actions/download

				@@ -165,6 +176,7 @@ jobs:

				        name: neon-${{ runner.os }}-${{ runner.arch }}-release-artifact

				        path: /tmp/neon/

				        prefix: latest

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				    - name: Create Neon Project

				      id: create-neon-project

				@@ -181,6 +193,7 @@ jobs:

				        run_in_parallel: false

				        extra_params: -m remote_cluster

				        pg_version: ${{ env.DEFAULT_PG_VERSION }}

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        BENCHMARK_CONNSTR: ${{ steps.create-neon-project.outputs.dsn }}

				@@ -197,12 +210,13 @@ jobs:

				      uses: ./.github/actions/allure-report-generate

				      with:

				        store-test-results-into-db: true

				        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

				      env:

				        REGRESS_TEST_RESULT_CONNSTR_NEW: ${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}

				    - name: Post to a Slack channel

				      if: github.event.schedule && failure()

				      uses: slackapi/slack-github-action@v1

				      uses: slackapi/slack-github-action@fcfb566f8b0aab22203f066d80ca1d7e4b5d05b3 # v1.27.1

				      with:

				        channel-id: "C06KHQVQ7U3" # on-call-qa-staging-stream

				        slack-message: |

									
										82

.github/workflows/pin-build-tools-image.yml
									
										vendored
									
				@@ -33,10 +33,6 @@ concurrency:

				# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.

				permissions: {}

				env:

				  FROM_TAG: ${{ inputs.from-tag }}

				  TO_TAG: pinned

				jobs:

				  check-manifests:

				    runs-on: ubuntu-22.04

				@@ -44,13 +40,21 @@ jobs:

				      skip: ${{ steps.check-manifests.outputs.skip }}

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@v2

				        with:

				          egress-policy: audit

				      - name: Check if we really need to pin the image

				        id: check-manifests

				        env:

				          FROM_TAG: ${{ inputs.from-tag }}

				          TO_TAG: pinned

				        run: |

				          docker manifest inspect neondatabase/build-tools:${FROM_TAG} > ${FROM_TAG}.json

				          docker manifest inspect neondatabase/build-tools:${TO_TAG}   > ${TO_TAG}.json

				          docker manifest inspect "ghcr.io/neondatabase/build-tools:${FROM_TAG}" > "${FROM_TAG}.json"

				          docker manifest inspect "ghcr.io/neondatabase/build-tools:${TO_TAG}"   > "${TO_TAG}.json"

				          if diff ${FROM_TAG}.json ${TO_TAG}.json; then

				          if diff "${FROM_TAG}.json" "${TO_TAG}.json"; then

				            skip=true

				          else

				            skip=false

				@@ -64,38 +68,36 @@ jobs:

				    # use format(..) to catch both inputs.force = true AND inputs.force = 'true'

				    if: needs.check-manifests.outputs.skip == 'false' || format('{0}', inputs.force) == 'true'

				    runs-on: ubuntu-22.04

				    permissions:

				      id-token: write # for `azure/login`

				      id-token: write  # Required for aws/azure login

				      packages: write  # required for pushing to GHCR

				    steps:

				      - uses: docker/login-action@v3

				        with:

				          username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}

				          password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

				      - uses: docker/login-action@v3

				        with:

				          registry: 369495373322.dkr.ecr.eu-central-1.amazonaws.com

				          username: ${{ secrets.AWS_ACCESS_KEY_DEV }}

				          password: ${{ secrets.AWS_SECRET_KEY_DEV }}

				      - name: Azure login

				        uses: azure/login@6c251865b4e6290e7b78be643ea2d005bc51f69a  # @v2.1.1

				        with:

				          client-id: ${{ secrets.AZURE_DEV_CLIENT_ID }}

				          tenant-id: ${{ secrets.AZURE_TENANT_ID }}

				          subscription-id: ${{ secrets.AZURE_DEV_SUBSCRIPTION_ID }}

				      - name: Login to ACR

				        run: |

				          az acr login --name=neoneastus2

				      - name: Tag build-tools with `${{ env.TO_TAG }}` in Docker Hub, ECR, and ACR

				        run: |

				          docker buildx imagetools create -t 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:${TO_TAG} \

				                                          -t neoneastus2.azurecr.io/neondatabase/build-tools:${TO_TAG} \

				                                          -t neondatabase/build-tools:${TO_TAG} \

				                                             neondatabase/build-tools:${FROM_TAG}

				    uses: ./.github/workflows/_push-to-container-registry.yml

				    with:

				      image-map: |

				        {

				          "ghcr.io/neondatabase/build-tools:${{ inputs.from-tag }}-bullseye": [

				            "docker.io/neondatabase/build-tools:pinned-bullseye",

				            "ghcr.io/neondatabase/build-tools:pinned-bullseye",

				            "${{ vars.NEON_DEV_AWS_ACCOUNT_ID }}.dkr.ecr.${{ vars.AWS_ECR_REGION }}.amazonaws.com/build-tools:pinned-bullseye",

				            "${{ vars.AZURE_DEV_REGISTRY_NAME }}.azurecr.io/neondatabase/build-tools:pinned-bullseye"

				          ],

				          "ghcr.io/neondatabase/build-tools:${{ inputs.from-tag }}-bookworm": [

				            "docker.io/neondatabase/build-tools:pinned-bookworm",

				            "docker.io/neondatabase/build-tools:pinned",

				            "ghcr.io/neondatabase/build-tools:pinned-bookworm",

				            "ghcr.io/neondatabase/build-tools:pinned",

				            "${{ vars.NEON_DEV_AWS_ACCOUNT_ID }}.dkr.ecr.${{ vars.AWS_ECR_REGION }}.amazonaws.com/build-tools:pinned-bookworm",

				            "${{ vars.NEON_DEV_AWS_ACCOUNT_ID }}.dkr.ecr.${{ vars.AWS_ECR_REGION }}.amazonaws.com/build-tools:pinned",

				            "${{ vars.AZURE_DEV_REGISTRY_NAME }}.azurecr.io/neondatabase/build-tools:pinned-bookworm",

				            "${{ vars.AZURE_DEV_REGISTRY_NAME }}.azurecr.io/neondatabase/build-tools:pinned"

				          ]

				        }

				      aws-region: ${{ vars.AWS_ECR_REGION }}

				      aws-account-id: "${{ vars.NEON_DEV_AWS_ACCOUNT_ID }}"

				      aws-role-to-assume: "gha-oidc-neon-admin"

				      azure-client-id: ${{ vars.AZURE_DEV_CLIENT_ID }}

				      azure-subscription-id: ${{ vars.AZURE_DEV_SUBSCRIPTION_ID }}

				      azure-tenant-id: ${{ vars.AZURE_TENANT_ID }}

				      acr-registry-name: ${{ vars.AZURE_DEV_REGISTRY_NAME }}

				    secrets: inherit

									
										177

.github/workflows/pre-merge-checks.yml
									
										vendored
									
										Normal file
									
				@@ -0,0 +1,177 @@

				name: Pre-merge checks

				on:

				  pull_request:

				    paths:

				      - .github/workflows/_check-codestyle-python.yml

				      - .github/workflows/_check-codestyle-rust.yml

				      - .github/workflows/build-build-tools-image.yml

				      - .github/workflows/pre-merge-checks.yml

				  merge_group:

				defaults:

				  run:

				    shell: bash -euxo pipefail {0}

				# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.

				permissions: {}

				jobs:

				  meta:

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: read

				    outputs:

				      python-changed: ${{ steps.python-src.outputs.any_changed }}

				      rust-changed: ${{ steps.rust-src.outputs.any_changed }}

				      branch: ${{ steps.group-metadata.outputs.branch }}

				      pr-number: ${{ steps.group-metadata.outputs.pr-number }}

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      - uses: step-security/changed-files@3dbe17c78367e7d60f00d78ae6781a35be47b4a1 # v45.0.1

				        id: python-src

				        with:

				          files: |

				            .github/workflows/_check-codestyle-python.yml

				            .github/workflows/build-build-tools-image.yml

				            .github/workflows/pre-merge-checks.yml

				            **/**.py

				            poetry.lock

				            pyproject.toml

				      - uses: step-security/changed-files@3dbe17c78367e7d60f00d78ae6781a35be47b4a1 # v45.0.1

				        id: rust-src

				        with:

				          files: |

				            .github/workflows/_check-codestyle-rust.yml

				            .github/workflows/build-build-tools-image.yml

				            .github/workflows/pre-merge-checks.yml

				            **/**.rs

				            **/Cargo.toml

				            Cargo.toml

				            Cargo.lock

				      - name: PRINT ALL CHANGED FILES FOR DEBUG PURPOSES

				        env:

				          PYTHON_CHANGED_FILES: ${{ steps.python-src.outputs.all_changed_files }}

				          RUST_CHANGED_FILES: ${{ steps.rust-src.outputs.all_changed_files }}

				        run: |

				          echo "${PYTHON_CHANGED_FILES}"

				          echo "${RUST_CHANGED_FILES}"

				      - name: Merge group metadata

				        if: ${{ github.event_name == 'merge_group' }}

				        id: group-metadata

				        env:

				          MERGE_QUEUE_REF: ${{ github.event.merge_group.head_ref }}

				        run: |

				          echo $MERGE_QUEUE_REF | jq -Rr 'capture("refs/heads/gh-readonly-queue/(?<branch>.*)/pr-(?<pr_number>[0-9]+)-[0-9a-f]{40}") | ["branch=" + .branch, "pr-number=" + .pr_number] | .[]' | tee -a "${GITHUB_OUTPUT}"

				  build-build-tools-image:

				    if: |

				      false

				      || needs.meta.outputs.python-changed == 'true'

				      || needs.meta.outputs.rust-changed == 'true'

				    needs: [ meta ]

				    permissions:

				      contents: read

				      packages: write

				    uses: ./.github/workflows/build-build-tools-image.yml

				    with:

				      # Build only one combination to save time

				      archs: '["x64"]'

				      debians: '["bookworm"]'

				    secrets: inherit

				  check-codestyle-python:

				    if: needs.meta.outputs.python-changed == 'true'

				    needs: [ meta, build-build-tools-image ]

				    permissions:

				      contents: read

				      packages: read

				    uses: ./.github/workflows/_check-codestyle-python.yml

				    with:

				      # `-bookworm-x64` suffix should match the combination in `build-build-tools-image`

				      build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm-x64

				    secrets: inherit

				  check-codestyle-rust:

				    if: needs.meta.outputs.rust-changed == 'true'

				    needs: [ meta, build-build-tools-image ]

				    permissions:

				      contents: read

				      packages: read

				    uses: ./.github/workflows/_check-codestyle-rust.yml

				    with:

				      # `-bookworm-x64` suffix should match the combination in `build-build-tools-image`

				      build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm-x64

				      archs: '["x64"]'

				    secrets: inherit

				  # To get items from the merge queue merged into main we need to satisfy "Status checks that are required".

				  # Currently we require 2 jobs (checks with exact name):

				  # - conclusion

				  # - neon-cloud-e2e

				  conclusion:

				    # Do not run job on Pull Requests as it interferes with the `conclusion` job from the `build_and_test` workflow

				    if: always() && github.event_name == 'merge_group'

				    permissions:

				      statuses: write # for `github.repos.createCommitStatus(...)`

				      contents: write

				    needs:

				      - meta

				      - check-codestyle-python

				      - check-codestyle-rust

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - name: Create fake `neon-cloud-e2e` check

				        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1

				        with:

				          # Retry script for 5XX server errors: https://github.com/actions/github-script#retries

				          retries: 5

				          script: |

				            const { repo, owner } = context.repo;

				            const targetUrl = `${context.serverUrl}/${owner}/${repo}/actions/runs/${context.runId}`;

				            await github.rest.repos.createCommitStatus({

				              owner: owner,

				              repo: repo,

				              sha: context.sha,

				              context: `neon-cloud-e2e`,

				              state: `success`,

				              target_url: targetUrl,

				              description: `fake check for merge queue`,

				            });

				      - name: Fail the job if any of the dependencies do not succeed or skipped

				        run: exit 1

				        if: |

				          false

				          || (github.event_name == 'merge_group' && needs.meta.outputs.branch != 'main')

				          || (needs.check-codestyle-python.result == 'skipped' && needs.meta.outputs.python-changed == 'true')

				          || (needs.check-codestyle-rust.result   == 'skipped' && needs.meta.outputs.rust-changed   == 'true')

				          || contains(needs.*.result, 'failure')

				          || contains(needs.*.result, 'cancelled')

				      - name: Add fast-forward label to PR to trigger fast-forward merge

				        if: >-

				          ${{

				            always()

				            && github.event_name == 'merge_group'

				            && contains(fromJSON('["release", "release-proxy", "release-compute"]'), needs.meta.outputs.branch)

				          }}

				        env:

				          GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				        run: >-

				          gh pr edit ${{ needs.meta.outputs.pr-number }} --repo "${GITHUB_REPOSITORY}" --add-label "fast-forward"
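The `Merge group metadata` step above splits the merge-queue ref into a branch and a PR number with a `jq` `capture(...)` expression. The sketch below does the same split in plain bash, assuming a ref of the form `refs/heads/gh-readonly-queue/<branch>/pr-<number>-<40-hex-sha>`. The function name `parse_merge_queue_ref` and the sample ref are hypothetical, and the `^`/`$` anchors are an addition that the original `jq` pattern does not have.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the workflow): split a merge-queue ref
# into the "branch=..." and "pr-number=..." lines that the workflow step
# appends to GITHUB_OUTPUT.
parse_merge_queue_ref() {
  local ref="$1"
  # Same shape as the jq capture(...) pattern, anchored here as an assumption.
  local re='^refs/heads/gh-readonly-queue/(.*)/pr-([0-9]+)-[0-9a-f]{40}$'
  if [[ "$ref" =~ $re ]]; then
    echo "branch=${BASH_REMATCH[1]}"
    echo "pr-number=${BASH_REMATCH[2]}"
  else
    echo "not a merge-queue ref: $ref" >&2
    return 1
  fi
}

# Sample ref with a made-up 40-character SHA.
parse_merge_queue_ref \
  "refs/heads/gh-readonly-queue/main/pr-11398-0123456789abcdef0123456789abcdef01234567"
```

The `conclusion` job then compares the extracted branch against `main` and the release branches, so a ref that does not match should fail loudly rather than yield empty outputs.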

									
										46

.github/workflows/regenerate-pg-setting.yml
									
										vendored
									
										Normal file
									
				@@ -0,0 +1,46 @@

				name: Regenerate Postgres Settings

				on:

				  pull_request:

				    types:

				      - opened

				      - synchronize

				      - reopened

				    paths:

				      - pgxn/neon/**.c

				      - vendor/postgres-v*

				      - vendor/revisions.json

				concurrency:

				  group: ${{ github.workflow }}-${{ github.head_ref }}

				  cancel-in-progress: true

				permissions:

				  pull-requests: write

				jobs:

				  regenerate-pg-settings:

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - name: Add comment

				        uses: thollander/actions-comment-pull-request@65f9e5c9a1f2cd378bd74b2e057c9736982a8e74 # v3

				        with:

				          comment-tag: ${{ github.job }}

				          pr-number: ${{ github.event.number }}

				          message: |

				            If this PR added a GUC in the Postgres fork or `neon` extension,

				            please regenerate the Postgres settings in the `cloud` repo:

				            ```

				            make NEON_WORKDIR=path/to/neon/checkout \

				              -C goapp/internal/shareddomain/postgres generate

				            ```

				            If you're an external contributor, a Neon employee will assist in

				            making sure this step is done.

									
										7

.github/workflows/release-notify.yml
									
										vendored
									
				@@ -22,7 +22,12 @@ jobs:

				    runs-on: ubuntu-22.04

				    steps:

				      - uses: neondatabase/dev-actions/release-pr-notify@main

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				        with:

				          egress-policy: audit

				      - uses: neondatabase/dev-actions/release-pr-notify@483a843f2a8bcfbdc4c69d27630528a3ddc4e14b # main

				        with:

				          slack-token: ${{ secrets.SLACK_BOT_TOKEN }}

				          slack-channel-id: ${{ vars.SLACK_UPCOMING_RELEASE_CHANNEL_ID || 'C05QQ9J1BRC' }} # if not set, then `#test-release-notifications`

									
										104

.github/workflows/release.yml
									
										vendored
									
				@@ -3,8 +3,9 @@ name: Create Release Branch

				on:

				  schedule:

				    # It should be kept in sync with if-condition in jobs

				    - cron: '0 6 * * MON' # Storage release

				    - cron: '0 6 * * THU' # Proxy release

				    - cron: '0 6 * * TUE' # Proxy release

				    - cron: '0 6 * * FRI' # Storage release

				    - cron: '0 7 * * FRI' # Compute release

				  workflow_dispatch:

				    inputs:

				      create-storage-release-branch:

				@@ -15,6 +16,10 @@ on:

				        type: boolean

				        description: 'Create Proxy release PR'

				        required: false

				      create-compute-release-branch:

				        type: boolean

				        description: 'Create Compute release PR'

				        required: false

				# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.

				permissions: {}

				@@ -25,83 +30,40 @@ defaults:

				jobs:

				  create-storage-release-branch:

				    if: ${{ github.event.schedule == '0 6 * * MON' || format('{0}', inputs.create-storage-release-branch) == 'true' }}

				    runs-on: ubuntu-22.04

				    if: ${{ github.event.schedule == '0 6 * * FRI' || inputs.create-storage-release-branch }}

				    permissions:

				      contents: write # for `git push`

				      contents: write

				    steps:

				    - name: Check out code

				      uses: actions/checkout@v4

				      with:

				        ref: main

				    - name: Set environment variables

				      run: |

				        echo "RELEASE_DATE=$(date +'%Y-%m-%d')" | tee -a $GITHUB_ENV

				        echo "RELEASE_BRANCH=rc/$(date +'%Y-%m-%d')" | tee -a $GITHUB_ENV

				    - name: Create release branch

				      run: git checkout -b $RELEASE_BRANCH

				    - name: Push new branch

				      run: git push origin $RELEASE_BRANCH

				    - name: Create pull request into release

				      env:

				        GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				      run: |

				        TITLE="Storage & Compute release ${RELEASE_DATE}"

				        cat << EOF > body.md

				          ## ${TITLE}

				          **Please merge this Pull Request using 'Create a merge commit' button**

				        EOF

				        gh pr create --title "${TITLE}" \

				                     --body-file "body.md" \

				                     --head "${RELEASE_BRANCH}" \

				                     --base "release"

				    uses: ./.github/workflows/_create-release-pr.yml

				    with:

				      component-name: 'Storage'

				      source-branch: ${{ github.ref_name }}

				    secrets:

				      ci-access-token: ${{ secrets.CI_ACCESS_TOKEN }}

				  create-proxy-release-branch:

				    if: ${{ github.event.schedule == '0 6 * * THU' || format('{0}', inputs.create-proxy-release-branch) == 'true' }}

				    runs-on: ubuntu-22.04

				    if: ${{ github.event.schedule == '0 6 * * TUE' || inputs.create-proxy-release-branch }}

				    permissions:

				      contents: write # for `git push`

				      contents: write

				    steps:

				    - name: Check out code

				      uses: actions/checkout@v4

				      with:

				        ref: main

				    uses: ./.github/workflows/_create-release-pr.yml

				    with:

				      component-name: 'Proxy'

				      source-branch: ${{ github.ref_name }}

				    secrets:

				      ci-access-token: ${{ secrets.CI_ACCESS_TOKEN }}

				    - name: Set environment variables

				      run: |

				        echo "RELEASE_DATE=$(date +'%Y-%m-%d')" | tee -a $GITHUB_ENV

				        echo "RELEASE_BRANCH=rc/proxy/$(date +'%Y-%m-%d')" | tee -a $GITHUB_ENV

				  create-compute-release-branch:

				    if: ${{ github.event.schedule == '0 7 * * FRI' || inputs.create-compute-release-branch }}

				    - name: Create release branch

				      run: git checkout -b $RELEASE_BRANCH

				    permissions:

				      contents: write

				    - name: Push new branch

				      run: git push origin $RELEASE_BRANCH

				    - name: Create pull request into release

				      env:

				        GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				      run: |

				        TITLE="Proxy release ${RELEASE_DATE}"

				        cat << EOF > body.md

				          ## ${TITLE}

				          **Please merge this Pull Request using 'Create a merge commit' button**

				        EOF

				        gh pr create --title "${TITLE}" \

				                     --body-file "body.md" \

				                     --head "${RELEASE_BRANCH}" \

				                     --base "release-proxy"

				    uses: ./.github/workflows/_create-release-pr.yml

				    with:

				      component-name: 'Compute'

				      source-branch: ${{ github.ref_name }}

				    secrets:

				      ci-access-token: ${{ secrets.CI_ACCESS_TOKEN }}
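The comment in this workflow says the cron entries must be kept in sync with the `if` conditions in the jobs. A minimal sketch of that mapping, using the new schedule strings from this diff; the function name `component_for_schedule` is hypothetical and not part of the workflow. The patterns are quoted so the `*` characters match literally rather than as globs.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map the cron string that triggered the run
# (github.event.schedule) to the component whose release PR is created.
component_for_schedule() {
  case "$1" in
    '0 6 * * TUE') echo "Proxy" ;;
    '0 6 * * FRI') echo "Storage" ;;
    '0 7 * * FRI') echo "Compute" ;;
    *) return 1 ;;  # unknown schedule: no release job should match
  esac
}

component_for_schedule '0 7 * * FRI'
```

Because each job compares the schedule as an exact string, changing a cron entry without updating the matching `if` condition would leave that job silently skipped on scheduled runs.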

									
										71

.github/workflows/report-workflow-stats-batch.yml
									
										vendored
									
										Normal file
									
				@@ -0,0 +1,71 @@

				name: Report Workflow Stats Batch

				on:

				  schedule:

				    - cron: '*/15 * * * *'

				    - cron: '25 0 * * *'

				    - cron: '25 1 * * 6'

				permissions:

				  contents: read

				jobs:

				  gh-workflow-stats-batch-2h:

				    name: GitHub Workflow Stats Batch 2 hours

				    if: github.event.schedule == '*/15 * * * *'

				    runs-on: ubuntu-22.04

				    permissions:

				      actions: read

				    steps:

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - name: Export Workflow Run for the past 2 hours

				      uses: neondatabase/gh-workflow-stats-action@701b1f202666d0b82e67b4d387e909af2b920127 # v0.2.2

				      with:

				        db_uri: ${{ secrets.GH_REPORT_STATS_DB_RW_CONNSTR }}

				        db_table: "gh_workflow_stats_neon"

				        gh_token: ${{ secrets.GITHUB_TOKEN }}

				        duration: '2h'

				  gh-workflow-stats-batch-48h:

				    name: GitHub Workflow Stats Batch 48 hours

				    if: github.event.schedule == '25 0 * * *'

				    runs-on: ubuntu-22.04

				    permissions:

				      actions: read

				    steps:

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - name: Export Workflow Run for the past 48 hours

				      uses: neondatabase/gh-workflow-stats-action@701b1f202666d0b82e67b4d387e909af2b920127 # v0.2.2

				      with:

				        db_uri: ${{ secrets.GH_REPORT_STATS_DB_RW_CONNSTR }}

				        db_table: "gh_workflow_stats_neon"

				        gh_token: ${{ secrets.GITHUB_TOKEN }}

				        duration: '48h'

				  gh-workflow-stats-batch-30d:

				    name: GitHub Workflow Stats Batch 30 days

				    if: github.event.schedule == '25 1 * * 6'

				    runs-on: ubuntu-22.04

				    permissions:

				      actions: read

				    steps:

				    - name: Harden the runner (Audit all outbound calls)

				      uses: step-security/harden-runner@4d991eb9b905ef189e4c376166672c3f2f230481 # v2.11.0

				      with:

				        egress-policy: audit

				    - name: Export Workflow Run for the past 30 days

				      uses: neondatabase/gh-workflow-stats-action@701b1f202666d0b82e67b4d387e909af2b920127 # v0.2.2

				      with:

				        db_uri: ${{ secrets.GH_REPORT_STATS_DB_RW_CONNSTR }}

				        db_table: "gh_workflow_stats_neon"

				        gh_token: ${{ secrets.GITHUB_TOKEN }}

				        duration: '720h'

									
										114

.github/workflows/trigger-e2e-tests.yml
									
										vendored
									
				@@ -5,6 +5,13 @@ on:

				    types:

				      - ready_for_review

				  workflow_call:

				    inputs:

				      github-event-name:

				        type: string

				        required: true

				      github-event-json:

				        type: string

				        required: true

				defaults:

				  run:

				@@ -15,11 +22,23 @@ env:

				  E2E_CONCURRENCY_GROUP: ${{ github.repository }}-e2e-tests-${{ github.ref_name }}-${{ github.ref_name == 'main' && github.sha || 'anysha' }}

				jobs:

				  check-permissions:

				    if: ${{ !contains(github.event.pull_request.labels.*.name, 'run-no-ci') }}

				    uses: ./.github/workflows/check-permissions.yml

				    with:

				      github-event-name: ${{ inputs.github-event-name || github.event_name }}

				  cancel-previous-e2e-tests:

				    needs: [ check-permissions ]

				    if: github.event_name == 'pull_request'

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@v2

				        with:

				          egress-policy: audit

				      - name: Cancel previous e2e-tests runs for this PR

				        env:

				          GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				@@ -28,45 +47,37 @@ jobs:

				            run cancel-previous-in-concurrency-group.yml \

				              --field concurrency_group="${{ env.E2E_CONCURRENCY_GROUP }}"

				  tag:

				    runs-on: ubuntu-22.04

				    outputs:

				      build-tag: ${{ steps.build-tag.outputs.tag }}

				    steps:

				      - name: Checkout

				        uses: actions/checkout@v4

				        with:

				          fetch-depth: 0

				      - name: Get build tag

				        env:

				          GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				          CURRENT_BRANCH: ${{ github.head_ref || github.ref_name }}

				          CURRENT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}

				        run: |

				          if [[ "$GITHUB_REF_NAME" == "main" ]]; then

				            echo "tag=$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT

				          elif [[ "$GITHUB_REF_NAME" == "release" ]]; then

				            echo "tag=release-$(git rev-list --count HEAD)" | tee -a $GITHUB_OUTPUT

				          elif [[ "$GITHUB_REF_NAME" == "release-proxy" ]]; then

				            echo "tag=release-proxy-$(git rev-list --count HEAD)" >> $GITHUB_OUTPUT

				          else

				            echo "GITHUB_REF_NAME (value '$GITHUB_REF_NAME') is not set to either 'main' or 'release'"

				            BUILD_AND_TEST_RUN_ID=$(gh run list -b $CURRENT_BRANCH -c $CURRENT_SHA -w 'Build and Test' -L 1 --json databaseId --jq '.[].databaseId')

				            echo "tag=$BUILD_AND_TEST_RUN_ID" | tee -a $GITHUB_OUTPUT

				          fi

				        id: build-tag

				  meta:

				    uses: ./.github/workflows/_meta.yml

				    with:

				      github-event-name: ${{ inputs.github-event-name || github.event_name }}

				      github-event-json: ${{ inputs.github-event-json || toJSON(github.event) }}

				  trigger-e2e-tests:

				    needs: [ tag ]

				    needs: [ meta ]

				    runs-on: ubuntu-22.04

				    env:

				      EVENT_ACTION: ${{ github.event.action }}

				      GH_TOKEN: ${{ secrets.CI_ACCESS_TOKEN }}

				      TAG: ${{ needs.tag.outputs.build-tag }}

				      TAG: >-

				        ${{

				          contains(fromJSON('["compute-release", "compute-rc-pr"]'), needs.meta.outputs.run-kind)

				          && needs.meta.outputs.previous-storage-release

				          || needs.meta.outputs.build-tag

				        }}

				      COMPUTE_TAG: >-

				        ${{

				          contains(fromJSON('["storage-release", "storage-rc-pr", "proxy-release", "proxy-rc-pr"]'), needs.meta.outputs.run-kind)

				          && needs.meta.outputs.previous-compute-release

				          || needs.meta.outputs.build-tag

				        }}

				    steps:

				      - name: Wait for `promote-images` job to finish

				      - name: Harden the runner (Audit all outbound calls)

				        uses: step-security/harden-runner@v2

				        with:

				          egress-policy: audit

				      - name: Wait for `push-{neon,compute}-image-dev` job to finish

				        # It's important to have a timeout here, the script in the step can run infinitely

				        timeout-minutes: 60

				        run: |

				@@ -77,20 +88,20 @@ jobs:

				          # For PRs we use the run id as the tag

				          BUILD_AND_TEST_RUN_ID=${TAG}

				          while true; do

				            conclusion=$(gh run --repo ${GITHUB_REPOSITORY} view ${BUILD_AND_TEST_RUN_ID} --json jobs --jq '.jobs[] | select(.name == "promote-images") | .conclusion')

				            case "$conclusion" in

				              success)

				                break

				                ;;

				              failure | cancelled | skipped)

				                echo "The 'promote-images' job didn't succeed: '${conclusion}'. Exiting..."

				                exit 1

				                ;;

				              *)

				                echo "The 'promote-images' hasn't succeed yet. Waiting..."

				                sleep 60

				                ;;

				            esac

				            gh run --repo ${GITHUB_REPOSITORY} view ${BUILD_AND_TEST_RUN_ID} --json jobs --jq '[.jobs[] | select((.name | startswith("push-neon-image-dev")) or (.name | startswith("push-compute-image-dev"))) | {"name": .name, "conclusion": .conclusion, "url": .url}]' > jobs.json

				            if [ $(jq '[.[] | select(.conclusion == "success")] | length' jobs.json) -eq 2 ]; then

				              break

				            fi

				            jq -c '.[]' jobs.json | while read -r job; do

				              case $(echo $job | jq .conclusion) in

				                failure | cancelled | skipped)

				                  echo "The '$(echo $job | jq .name)' job didn't succeed: '$(echo $job | jq .conclusion)'. See log in '$(echo $job | jq .url)' Exiting..."

				                  exit 1

				                  ;;

				              esac

				            done

				            echo "The 'push-{neon,compute}-image-dev' jobs haven't succeeded yet. Waiting..."

				            sleep 60

				          done

				      - name: Set e2e-platforms

				@@ -102,12 +113,17 @@ jobs:

				          # Default set of platforms to run e2e tests on

				          platforms='["docker", "k8s"]'

				          # If the PR changes vendor/, pgxn/ or libs/vm_monitor/ directories, or Dockerfile.compute-node, add k8s-neonvm to the list of platforms.

				          # If a PR changes anything that affects computes, add k8s-neonvm to the list of platforms.

				          # If the workflow run is not a pull request, add k8s-neonvm to the list.

				          if [ "$GITHUB_EVENT_NAME" == "pull_request" ]; then

				            for f in $(gh api "/repos/${GITHUB_REPOSITORY}/pulls/${PR_NUMBER}/files" --paginate --jq '.[].filename'); do

				              case "$f" in

				                vendor/*|pgxn/*|libs/vm_monitor/*|Dockerfile.compute-node)

				                # List of directories that contain code which affect compute images.

				                #

				                # This isn't exhaustive, just the paths that are most directly compute-related.

				                # For example, compute_ctl also depends on libs/utils, but we don't trigger

				                # an e2e run on that.

				                vendor/*|pgxn/*|compute_tools/*|libs/vm_monitor/*|compute/compute-node.Dockerfile)

				                  platforms=$(echo "${platforms}" | jq --compact-output '. += ["k8s-neonvm"] | unique')

				                  ;;

				                *)

				@@ -142,6 +158,6 @@ jobs:

				              --raw-field "commit_hash=$COMMIT_SHA" \

				              --raw-field "remote_repo=${GITHUB_REPOSITORY}" \

				              --raw-field "storage_image_tag=${TAG}" \

				              --raw-field "compute_image_tag=${TAG}" \

				              --raw-field "compute_image_tag=${COMPUTE_TAG}" \

				              --raw-field "concurrency_group=${E2E_CONCURRENCY_GROUP}" \

				              --raw-field "e2e-platforms=${E2E_PLATFORMS}"
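The `Set e2e-platforms` step above decides whether to add `k8s-neonvm` by matching each changed file against a short list of compute-related paths. A minimal sketch of that matching in plain bash follows. It assumes the changed files arrive as arguments instead of being fetched with `gh api`, it builds the JSON list by hand instead of with `jq`, and it leaves out the non-pull-request branch. The helper names `needs_neonvm` and `select_platforms` are hypothetical and do not appear in the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if any given path is compute-related.
# The pattern list mirrors the updated `case` arm shown in the diff.
needs_neonvm() {
  local f
  for f in "$@"; do
    case "$f" in
      vendor/*|pgxn/*|compute_tools/*|libs/vm_monitor/*|compute/compute-node.Dockerfile)
        return 0
        ;;
    esac
  done
  return 1
}

# Hypothetical helper: start from the default platforms and
# add k8s-neonvm only when a compute-related path was touched.
select_platforms() {
  if needs_neonvm "$@"; then
    echo '["docker", "k8s", "k8s-neonvm"]'
  else
    echo '["docker", "k8s"]'
  fi
}
```

For example, `select_platforms README.md pgxn/neon/neon.c` selects all three platforms, while a change limited to `libs/utils/` keeps the default two, which matches the comment in the workflow about not triggering on `libs/utils`.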

2

.gitignore vendored


@@ -6,6 +6,8 @@ __pycache__/
 test_output/
 .vscode
 .idea
 *.swp
 tags
 neon.iml
 /.neon
 /integration_tests/.neon

4

.gitmodules vendored


@@ -10,3 +10,7 @@
 	path = vendor/postgres-v16
 	url = https://github.com/neondatabase/postgres.git
 	branch = REL_16_STABLE_neon
 [submodule "vendor/postgres-v17"]
 	path = vendor/postgres-v17
 	url = https://github.com/neondatabase/postgres.git
 	branch = REL_17_STABLE_neon

32

CODEOWNERS


@@ -1,13 +1,29 @@
 /compute_tools/ @neondatabase/control-plane @neondatabase/compute
 # Autoscaling
 /libs/vm_monitor/ @neondatabase/autoscaling
 # DevProd & PerfCorr
 /.github/ @neondatabase/developer-productivity @neondatabase/performance-correctness
 # Compute
 /pgxn/ @neondatabase/compute
 /vendor/ @neondatabase/compute
 /compute/ @neondatabase/compute
 /compute_tools/ @neondatabase/compute
 # Proxy
 /libs/proxy/ @neondatabase/proxy
 /proxy/ @neondatabase/proxy
 # Storage
 /pageserver/ @neondatabase/storage
 /safekeeper/ @neondatabase/storage
 /storage_controller @neondatabase/storage
 /storage_scrubber @neondatabase/storage
 /libs/pageserver_api/ @neondatabase/storage
 /libs/postgres_ffi/ @neondatabase/compute @neondatabase/storage
 /libs/remote_storage/ @neondatabase/storage
 /libs/safekeeper_api/ @neondatabase/storage
 /libs/vm_monitor/ @neondatabase/autoscaling
 /pageserver/ @neondatabase/storage
 /pgxn/ @neondatabase/compute
 # Shared
 /pgxn/neon/ @neondatabase/compute @neondatabase/storage
 /proxy/ @neondatabase/proxy
 /safekeeper/ @neondatabase/storage
 /vendor/ @neondatabase/compute
 /libs/compute_api/ @neondatabase/compute @neondatabase/control-plane
 /libs/postgres_ffi/ @neondatabase/compute @neondatabase/storage
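In a CODEOWNERS file the last matching pattern takes precedence, which is why a more specific entry such as `/pgxn/neon/` can sit after the general `/pgxn/` rule and override it. The sketch below illustrates that ordering in bash under a strong simplification: it handles directory-prefix rules only, whereas real CODEOWNERS patterns use gitignore-style globbing. The `RULES` array holds three entries copied from the lines above; `owners_for` and the example file paths are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified owner lookup: directory-prefix rules only, last matching rule wins.
# Real CODEOWNERS matching supports globs, which this sketch ignores.
RULES=(
  "/pgxn/ @neondatabase/compute"
  "/pgxn/neon/ @neondatabase/compute @neondatabase/storage"
  "/libs/postgres_ffi/ @neondatabase/compute @neondatabase/storage"
)

# Hypothetical helper: print the owners of a repository-relative path.
owners_for() {
  local path="/$1" rule pattern owners result=""
  for rule in "${RULES[@]}"; do
    pattern="${rule%% *}"   # text before the first space
    owners="${rule#* }"     # text after the first space
    if [[ "$path" == "$pattern"* ]]; then
      result="$owners"      # a later rule overrides an earlier one
    fi
  done
  echo "$result"
}

owners_for "pgxn/neon/libpagestore.c"
owners_for "pgxn/neon_walredo/walredoproc.c"
```

The first call matches both `/pgxn/` and `/pgxn/neon/`, so the later, shared rule applies; the second matches only `/pgxn/` and stays with the compute team.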

3308

Cargo.lock generated


File diff suppressed because it is too large

									
										164

Cargo.toml
									
												
				@@ -11,12 +11,14 @@ members = [

				    "pageserver/pagebench",

				    "proxy",

				    "safekeeper",

				    "safekeeper/client",

				    "storage_broker",

				    "storage_controller",

				    "storage_controller/client",

				    "storage_scrubber",

				    "workspace_hack",

				    "libs/compute_api",

				    "libs/http-utils",

				    "libs/pageserver_api",

				    "libs/postgres_ffi",

				    "libs/safekeeper_api",

				@@ -33,52 +35,56 @@ members = [

				    "libs/postgres_ffi/wal_craft",

				    "libs/vm_monitor",

				    "libs/walproposer",

				    "libs/wal_decoder",

				    "libs/postgres_initdb",

				    "libs/proxy/postgres-protocol2",

				    "libs/proxy/postgres-types2",

				    "libs/proxy/tokio-postgres2",

				]

				[workspace.package]

				edition = "2021"

				edition = "2024"

				license = "Apache-2.0"

				## All dependency versions, used in the project

				[workspace.dependencies]

				ahash = "0.8"

				anyhow = { version = "1.0", features = ["backtrace"] }

				arc-swap = "1.6"

				arc-swap = "1.7"

				async-compression = { version = "0.4.0", features = ["tokio", "gzip", "zstd"] }

				atomic-take = "1.1.0"

				azure_core = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls", "hmac_rust"] }

				azure_identity = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }

				azure_storage = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }

				azure_storage_blobs = { version = "0.19", default-features = false, features = ["enable_reqwest_rustls"] }

				flate2 = "1.0.26"

				assert-json-diff = "2"

				async-stream = "0.3"

				async-trait = "0.1"

				aws-config = { version = "1.3", default-features = false, features=["rustls"] }

				aws-sdk-s3 = "1.26"

				aws-sdk-iam = "1.15.0"

				aws-config = { version = "1.5", default-features = false, features=["rustls", "sso"] }

				aws-sdk-s3 = "1.52"

				aws-sdk-iam = "1.46.0"

				aws-sdk-kms = "1.47.0"

				aws-smithy-async = { version = "1.2.1", default-features = false, features=["rt-tokio"] }

				aws-smithy-types = "1.1.9"

				aws-smithy-types = "1.2"

				aws-credential-types = "1.2.0"

				aws-sigv4 = { version = "1.2.1", features = ["sign-http"] }

				aws-types = "1.2.0"

				axum = { version = "0.6.20", features = ["ws"] }

				aws-sigv4 = { version = "1.2", features = ["sign-http"] }

				aws-types = "1.3"

				axum = { version = "0.8.1", features = ["ws"] }

				axum-extra = { version = "0.10.0", features = ["typed-header"] }

				base64 = "0.13.0"

				bincode = "1.3"

				bindgen = "0.70"

				bindgen = "0.71"

				bit_field = "0.10.2"

				bstr = "1.0"

				byteorder = "1.4"

				bytes = "1.0"

				bytes = "1.9"

				camino = "1.1.6"

				cfg-if = "1.0.0"

				cron = "0.15"

				chrono = { version = "0.4", default-features = false, features = ["clock"] }

				clap = { version = "4.0", features = ["derive"] }

				clap = { version = "4.0", features = ["derive", "env"] }

				clashmap = { version = "1.0", features = ["raw-api"] }

				comfy-table = "7.1"

				const_format = "0.2"

				crc32c = "0.6"

				crossbeam-deque = "0.8.5"

				crossbeam-utils = "0.8.5"

				dashmap = { version = "5.5.0", features = ["raw-api"] }

				diatomic-waker = { version = "0.2.3" }

				either = "1.8"

				enum-map = "2.4.2"

				enumset = "1.0.12"

				@@ -89,60 +95,69 @@ futures = "0.3"

				futures-core = "0.3"

				futures-util = "0.3"

				git-version = "0.3"

				governor = "0.8"

				hashbrown = "0.14"

				hashlink = "0.9.1"

				hdrhistogram = "7.5.2"

				hex = "0.4"

				hex-literal = "0.4"

				hmac = "0.12.1"

				hostname = "0.3.1"

				hostname = "0.4"

				http = {version = "1.1.0", features = ["std"]}

				http-types = { version = "2", default-features = false }

				humantime = "2.1"

				http-body-util = "0.1.2"

				humantime = "2.2"

				humantime-serde = "1.1.1"

				hyper = "0.14"

				tokio-tungstenite = "0.20.0"

				indexmap = "2"

				hyper0 = { package = "hyper", version = "0.14" }

				hyper = "1.4"

				hyper-util = "0.1"

				tokio-tungstenite = "0.21.0"

				indexmap = { version = "2", features = ["serde"] }

				indoc = "2"

				inotify = "0.10.2"

				ipnet = "2.9.0"

				ipnet = "2.10.0"

				itertools = "0.10"

				itoa = "1.0.11"

				jemalloc_pprof = { version = "0.7", features = ["symbolize", "flamegraph"] }

				jsonwebtoken = "9"

				lasso = "0.7"

				libc = "0.2"

				md5 = "0.7.0"

				measured = { version = "0.0.22", features=["lasso"] }

				measured-process = { version = "0.0.22" }

				memoffset = "0.8"

				memoffset = "0.9"

				nix = { version = "0.27", features = ["dir", "fs", "process", "socket", "signal", "poll"] }

				# Do not update to >= 7.0.0, at least. The update will have a significant impact

				# on compute startup metrics (start_postgres_ms), >= 25% degradation.

				notify = "6.0.0"

				num_cpus = "1.15"

				num-traits = "0.2.15"

				num-traits = "0.2.19"

				once_cell = "1.13"

				opentelemetry = "0.20.0"

				opentelemetry-otlp = { version = "0.13.0", default-features=false, features = ["http-proto", "trace", "http", "reqwest-client"] }

				opentelemetry-semantic-conventions = "0.12.0"

				opentelemetry = "0.27"

				opentelemetry_sdk = "0.27"

				opentelemetry-otlp = { version = "0.27", default-features = false, features = ["http-proto", "trace", "http", "reqwest-client"] }

				opentelemetry-semantic-conventions = "0.27"

				parking_lot = "0.12"

				parquet = { version = "53", default-features = false, features = ["zstd"] }

				parquet_derive = "53"

				pbkdf2 = { version = "0.12.1", features = ["simple", "std"] }

				pin-project-lite = "0.2"

				pprof = { version = "0.14", features = ["criterion", "flamegraph", "frame-pointer", "prost-codec"] }

				procfs = "0.16"

				prometheus = {version = "0.13", default-features=false, features = ["process"]} # removes protobuf dependency

				prost = "0.11"

				prost = "0.13"

				rand = "0.8"

				redis = { version = "0.25.2", features = ["tokio-rustls-comp", "keep-alive"] }

				redis = { version = "0.29.2", features = ["tokio-rustls-comp", "keep-alive"] }

				regex = "1.10.2"

				reqwest = { version = "0.12", default-features = false, features = ["rustls-tls"] }

				reqwest-tracing = { version = "0.5", features = ["opentelemetry_0_20"] }

				reqwest-middleware = "0.3.0"

				reqwest-retry = "0.5"

				reqwest-tracing = { version = "0.5", features = ["opentelemetry_0_27"] }

				reqwest-middleware = "0.4"

				reqwest-retry = "0.7"

				routerify = "3"

				rpds = "0.13"

				rustc-hash = "1.1.0"

				rustls = "0.22"

				rustls = { version = "0.23.16", default-features = false }

				rustls-pemfile = "2"

				rustls-split = "0.3"

				rustls-pki-types = "1.11"

				scopeguard = "1.1"

				sysinfo = "0.29.2"

				sd-notify = "0.4.1"

				@@ -151,7 +166,7 @@ sentry = { version = "0.32", default-features = false, features = ["backtrace",

				serde = { version = "1.0", features = ["derive"] }

				serde_json = "1"

				serde_path_to_error = "0.1"

				serde_with = "2.0"

				serde_with = { version = "2.0", features = [ "base64" ] }

				serde_assert = "0.5.0"

				sha2 = "0.10.2"

				signal-hook = "0.3"

				@@ -164,26 +179,33 @@ strum_macros = "0.26"

				svg_fmt = "0.4.3"

				sync_wrapper = "0.1.2"

				tar = "0.4"

				task-local-extensions = "0.1.4"

				test-context = "0.3"

				thiserror = "1.0"

				tikv-jemallocator = "0.5"

				tikv-jemalloc-ctl = "0.5"

				tokio = { version = "1.17", features = ["macros"] }

				tikv-jemallocator = { version = "0.6", features = ["profiling", "stats", "unprefixed_malloc_on_supported_platforms"] }

				tikv-jemalloc-ctl = { version = "0.6", features = ["stats"] }

				tokio = { version = "1.41", features = ["macros"] }

				tokio-epoll-uring = { git = "https://github.com/neondatabase/tokio-epoll-uring.git" , branch = "main" }

				tokio-io-timeout = "1.2.0"

				tokio-postgres-rustls = "0.11.0"

				tokio-rustls = "0.25"

				tokio-postgres-rustls = "0.12.0"

				tokio-rustls = { version = "0.26.0", default-features = false, features = ["tls12", "ring"]}

				tokio-stream = "0.1"

				tokio-tar = "0.3"

				tokio-util = { version = "0.7.10", features = ["io", "rt"] }

				toml = "0.8"

				toml_edit = "0.22"

				tonic = {version = "0.9", features = ["tls", "tls-roots"]}

				tower-service = "0.3.2"

				tonic = {version = "0.12.3", default-features = false, features = ["channel", "tls", "tls-roots"]}

				tower = { version = "0.5.2", default-features = false }

				tower-http = { version = "0.6.2", features = ["auth", "request-id", "trace"] }

				# This revision uses opentelemetry 0.27. There's no tag for it.

				tower-otel = { git = "https://github.com/mattiapenati/tower-otel", rev = "56a7321053bcb72443888257b622ba0d43a11fcd" }

				tower-service = "0.3.3"

				tracing = "0.1"

				tracing-error = "0.2.0"

				tracing-opentelemetry = "0.21.0"

				tracing-error = "0.2"

				tracing-log = "0.2"

				tracing-opentelemetry = "0.28"

				tracing-serde = "0.2.0"

				tracing-subscriber = { version = "0.3", default-features = false, features = ["smallvec", "fmt", "tracing-log", "std", "env-filter", "json"] }

				try-lock = "0.2.5"

				twox-hash = { version = "1.6.3", default-features = false }

				@@ -192,44 +214,45 @@ url = "2.2"

				urlencoding = "2.1"

				uuid = { version = "1.6.1", features = ["v4", "v7", "serde"] }

				walkdir = "2.3.2"

				rustls-native-certs = "0.7"

				x509-parser = "0.15"

				rustls-native-certs = "0.8"

				whoami = "1.5.1"

				zerocopy = { version = "0.7", features = ["derive"] }

				json-structural-diff = { version = "0.2.0" }

				x509-cert = { version = "0.2.5" }

				## TODO replace this with tracing

				env_logger = "0.10"

				env_logger = "0.11"

				log = "0.4"

				## Libraries from neondatabase/ git forks, ideally with changes to be upstreamed

				postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }

				postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }

				postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }

				tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }

				# We want to use the 'neon' branch for these, but there's currently one

				# incompatible change on the branch. See:

				#

				# - PR #8076 which contained changes that depended on the new changes in

				#   the rust-postgres crate, and

				# - PR #8654 which reverted those changes and made the code in proxy incompatible

				#   with the tip of the 'neon' branch again.

				#

				# When those proxy changes are re-applied (see PR #8747), we can switch using

				# the tip of the 'neon' branch again.

				postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev = "20031d7a9ee1addeae6e0968e3899ae6bf01cee2" }

				postgres-protocol = { git = "https://github.com/neondatabase/rust-postgres.git", rev = "20031d7a9ee1addeae6e0968e3899ae6bf01cee2" }

				postgres-types = { git = "https://github.com/neondatabase/rust-postgres.git", rev = "20031d7a9ee1addeae6e0968e3899ae6bf01cee2" }

				tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev = "20031d7a9ee1addeae6e0968e3899ae6bf01cee2" }

				## Azure SDK crates

				azure_core = { git = "https://github.com/neondatabase/azure-sdk-for-rust.git", branch = "neon", default-features = false, features = ["enable_reqwest_rustls", "hmac_rust"] }

				azure_identity = { git = "https://github.com/neondatabase/azure-sdk-for-rust.git", branch = "neon", default-features = false, features = ["enable_reqwest_rustls"] }

				azure_storage = { git = "https://github.com/neondatabase/azure-sdk-for-rust.git", branch = "neon", default-features = false, features = ["enable_reqwest_rustls"] }

				azure_storage_blobs = { git = "https://github.com/neondatabase/azure-sdk-for-rust.git", branch = "neon", default-features = false, features = ["enable_reqwest_rustls"] }

				## Local libraries

				compute_api = { version = "0.1", path = "./libs/compute_api/" }

				consumption_metrics = { version = "0.1", path = "./libs/consumption_metrics/" }

				http-utils = { version = "0.1", path = "./libs/http-utils/" }

				metrics = { version = "0.1", path = "./libs/metrics/" }

				pageserver = { path = "./pageserver" }

				pageserver_api = { version = "0.1", path = "./libs/pageserver_api/" }

				pageserver_client = { path = "./pageserver/client" }

				pageserver_compaction = { version = "0.1", path = "./pageserver/compaction/" }

				postgres_backend = { version = "0.1", path = "./libs/postgres_backend/" }

				postgres_connection = { version = "0.1", path = "./libs/postgres_connection/" }

				postgres_ffi = { version = "0.1", path = "./libs/postgres_ffi/" }

				postgres_initdb = { path = "./libs/postgres_initdb" }

				pq_proto = { version = "0.1", path = "./libs/pq_proto/" }

				remote_storage = { version = "0.1", path = "./libs/remote_storage/" }

				safekeeper_api = { version = "0.1", path = "./libs/safekeeper_api" }

				safekeeper_client = { path = "./safekeeper/client" }

				desim = { version = "0.1", path = "./libs/desim" }

				storage_broker = { version = "0.1", path = "./storage_broker/" } # Note: main broker code is inside the binary crate, so linking with the library shouldn't be heavy.

				storage_controller_client = { path = "./storage_controller/client" }

				@@ -238,27 +261,30 @@ tracing-utils = { version = "0.1", path = "./libs/tracing-utils/" }

				utils = { version = "0.1", path = "./libs/utils/" }

				vm_monitor = { version = "0.1", path = "./libs/vm_monitor/" }

				walproposer = { version = "0.1", path = "./libs/walproposer/" }

				wal_decoder = { version = "0.1", path = "./libs/wal_decoder" }

				## Common library dependency

				workspace_hack = { version = "0.1", path = "./workspace_hack/" }

				## Build dependencies

				criterion = "0.5.1"

				rcgen = "0.12"

				rcgen = "0.13"

				rstest = "0.18"

				camino-tempfile = "1.0.2"

				tonic-build = "0.9"

				tonic-build = "0.12"

				[patch.crates-io]

				# Needed to get `tokio-postgres-rustls` to depend on our fork.

				tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", rev = "20031d7a9ee1addeae6e0968e3899ae6bf01cee2" }

				tokio-postgres = { git = "https://github.com/neondatabase/rust-postgres.git", branch = "neon" }

				################# Binary contents sections

				[profile.release]

				# This is useful for profiling and, to some extent, debug.

				# Besides, debug info should not affect the performance.

				#

				# NB: we also enable frame pointers for improved profiling, see .cargo/config.toml.

				debug = true

				# disable debug symbols for all packages except this one to decrease binaries size

									
										65

Dockerfile
									
												
				@@ -2,9 +2,35 @@

				### The image itself is mainly used as a container for the binaries and for starting e2e tests with custom parameters.

				### By default, the binaries inside the image have some mock parameters and can start, but are not intended to be used

				### inside this image in the real deployments.

				ARG REPOSITORY=neondatabase

				ARG REPOSITORY=ghcr.io/neondatabase

				ARG IMAGE=build-tools

				ARG TAG=pinned

				ARG DEFAULT_PG_VERSION=17

				ARG STABLE_PG_VERSION=16

				ARG DEBIAN_VERSION=bookworm

				ARG DEBIAN_FLAVOR=${DEBIAN_VERSION}-slim

				# Here are the INDEX DIGESTS for the images we use.

				# For now, you can get them with the following steps:

				# 1. Get an authentication token from DockerHub:

				#    TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/debian:pull" | jq -r .token)

				# 2. Using that token, query index for the given tag:

				#    curl -s -H "Authorization: Bearer $TOKEN" \

				#       -H "Accept: application/vnd.docker.distribution.manifest.list.v2+json" \

				#       "https://registry.hub.docker.com/v2/library/debian/manifests/bullseye-slim" \

				#       -I | grep -i docker-content-digest

				# 3. As a next step, TODO(fedordikarev): create a script and schedule a workflow to run these checks

				#    and updates on a regular basis and in an automated way.

				ARG BOOKWORM_SLIM_SHA=sha256:40b107342c492725bc7aacbe93a49945445191ae364184a6d24fedb28172f6f7

				ARG BULLSEYE_SLIM_SHA=sha256:e831d9a884d63734fe3dd9c491ed9a5a3d4c6a6d32c5b14f2067357c49b0b7e1

				# Here we use the ${var/search/replace} syntax to check

				# whether the base image is one of the images we pin an image index for.

				# If var matches one of the known images, we replace it with the known sha.

				# If there is no match, the value is unaffected and the build proceeds with an unpinned image.

				ARG BASE_IMAGE_SHA=debian:${DEBIAN_FLAVOR}

				ARG BASE_IMAGE_SHA=${BASE_IMAGE_SHA/debian:bookworm-slim/debian@$BOOKWORM_SLIM_SHA}

				ARG BASE_IMAGE_SHA=${BASE_IMAGE_SHA/debian:bullseye-slim/debian@$BULLSEYE_SLIM_SHA}

				# Build Postgres

				FROM $REPOSITORY/$IMAGE:$TAG AS pg-build

				@@ -13,6 +39,7 @@ WORKDIR /home/nonroot

				COPY --chown=nonroot vendor/postgres-v14 vendor/postgres-v14

				COPY --chown=nonroot vendor/postgres-v15 vendor/postgres-v15

				COPY --chown=nonroot vendor/postgres-v16 vendor/postgres-v16

				COPY --chown=nonroot vendor/postgres-v17 vendor/postgres-v17

				COPY --chown=nonroot pgxn pgxn

				COPY --chown=nonroot Makefile Makefile

				COPY --chown=nonroot scripts/ninstall.sh scripts/ninstall.sh

				@@ -23,21 +50,38 @@ RUN set -e \

				    && rm -rf pg_install/build \

				    && tar -C pg_install -czf /home/nonroot/postgres_install.tar.gz .

				# Prepare cargo-chef recipe

				FROM $REPOSITORY/$IMAGE:$TAG AS plan

				WORKDIR /home/nonroot

				COPY --chown=nonroot . .

				RUN cargo chef prepare --recipe-path recipe.json

				# Build neon binaries

				FROM $REPOSITORY/$IMAGE:$TAG AS build

				WORKDIR /home/nonroot

				ARG GIT_VERSION=local

				ARG BUILD_TAG

				ARG STABLE_PG_VERSION

				COPY --from=pg-build /home/nonroot/pg_install/v14/include/postgresql/server pg_install/v14/include/postgresql/server

				COPY --from=pg-build /home/nonroot/pg_install/v15/include/postgresql/server pg_install/v15/include/postgresql/server

				COPY --from=pg-build /home/nonroot/pg_install/v16/include/postgresql/server pg_install/v16/include/postgresql/server

				COPY --from=pg-build /home/nonroot/pg_install/v17/include/postgresql/server pg_install/v17/include/postgresql/server

				COPY --from=pg-build /home/nonroot/pg_install/v16/lib                       pg_install/v16/lib

				COPY --from=pg-build /home/nonroot/pg_install/v17/lib                       pg_install/v17/lib

				COPY --from=plan     /home/nonroot/recipe.json                              recipe.json

				ARG ADDITIONAL_RUSTFLAGS=""

				RUN set -e \

				    && RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=mold -Clink-arg=-Wl,--no-rosegment -Cforce-frame-pointers=yes ${ADDITIONAL_RUSTFLAGS}" cargo chef cook --locked --release --recipe-path recipe.json

				COPY --chown=nonroot . .

				ARG ADDITIONAL_RUSTFLAGS

				RUN set -e \

				    && PQ_LIB_DIR=$(pwd)/pg_install/v16/lib RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=mold -Clink-arg=-Wl,--no-rosegment ${ADDITIONAL_RUSTFLAGS}" cargo build \

				    && RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=mold -Clink-arg=-Wl,--no-rosegment -Cforce-frame-pointers=yes ${ADDITIONAL_RUSTFLAGS}" cargo build \

				      --bin pg_sni_router  \

				      --bin pageserver  \

				      --bin pagectl  \

				@@ -51,15 +95,21 @@ RUN set -e \

				# Build final image

				#

				FROM debian:bullseye-slim

				FROM $BASE_IMAGE_SHA

				ARG DEFAULT_PG_VERSION

				WORKDIR /data

				RUN set -e \

				    && echo 'Acquire::Retries "5";' > /etc/apt/apt.conf.d/80-retries \

				    && apt update \

				    && apt install -y \

				        libreadline-dev \

				        libseccomp-dev \

				        ca-certificates \

					# System postgres for use with client libraries (e.g. in storage controller)

				        postgresql-15 \

				        openssl \

				    && rm -f /etc/apt/apt.conf.d/80-retries \

				    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* \

				    && useradd -d /data neon \

				    && chown -R neon:neon /data

				@@ -77,6 +127,7 @@ COPY --from=build --chown=neon:neon /home/nonroot/target/release/storage_scrubbe

				COPY --from=pg-build /home/nonroot/pg_install/v14 /usr/local/v14/

				COPY --from=pg-build /home/nonroot/pg_install/v15 /usr/local/v15/

				COPY --from=pg-build /home/nonroot/pg_install/v16 /usr/local/v16/

				COPY --from=pg-build /home/nonroot/pg_install/v17 /usr/local/v17/

				COPY --from=pg-build /home/nonroot/postgres_install.tar.gz /data/

				# By default, pageserver uses `.neon/` working directory in WORKDIR, so create one and fill it with the dummy config.

				@@ -91,15 +142,9 @@ RUN mkdir -p /data/.neon/ && \

				  > /data/.neon/pageserver.toml && \

				  chown -R neon:neon /data/.neon

				# When running a binary that links with libpq, default to using our most recent postgres version.  Binaries

				# that want a particular postgres version will select it explicitly: this is just a default.

				ENV LD_LIBRARY_PATH=/usr/local/v16/lib

				VOLUME ["/data"]

				USER neon

				EXPOSE 6400

				EXPOSE 9898

				CMD ["/usr/local/bin/pageserver", "-D", "/data/.neon"]
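The `BASE_IMAGE_SHA` lines in the Dockerfile above rely on `${var/search/replace}` substitution to swap a known `debian:<flavor>` tag for a pinned digest while leaving any other value untouched. Bash offers the same substitution form, so the effect can be sketched in a shell script, keeping in mind that the Dockerfile itself is evaluated by the image builder and not by bash. The digest values below are placeholders, not the real ones, and `pin_base_image` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Placeholder digests for illustration only; the real values are the ARGs in the Dockerfile.
BOOKWORM_SLIM_SHA="sha256:bookworm-placeholder"
BULLSEYE_SLIM_SHA="sha256:bullseye-placeholder"

# Hypothetical helper: map a Debian flavor to a pinned image reference when one is known.
pin_base_image() {
  local image="debian:$1"
  # Each substitution is a no-op unless the tag matches exactly.
  image="${image/debian:bookworm-slim/debian@$BOOKWORM_SLIM_SHA}"
  image="${image/debian:bullseye-slim/debian@$BULLSEYE_SLIM_SHA}"
  echo "$image"
}

pin_base_image bookworm-slim
pin_base_image trixie-slim
```

A flavor with a known digest comes back as `debian@sha256:...`, while an unknown flavor such as `trixie-slim` falls through both substitutions and is used as an ordinary, unpinned tag.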

									
										229

Dockerfile.build-tools
									
											
				@@ -1,229 +0,0 @@

				FROM debian:bullseye-slim

				# Use ARG as a build-time environment variable here to allow.

				# It's not supposed to be set outside.

				# Alternatively it can be obtained using the following command

				# ```

				# . /etc/os-release && echo "${VERSION_CODENAME}"

				# ```

				ARG DEBIAN_VERSION_CODENAME=bullseye

				# Add nonroot user

				RUN useradd -ms /bin/bash nonroot -b /home

				SHELL ["/bin/bash", "-c"]

				# System deps

				RUN set -e \

				    && apt update \

				    && apt install -y \

				        autoconf \

				        automake \

				        bison \

				        build-essential \

				        ca-certificates \

				        cmake \

				        curl \

				        flex \

				        git \

				        gnupg \

				        gzip \

				        jq \

				        libcurl4-openssl-dev \

				        libbz2-dev \

				        libffi-dev \

				        liblzma-dev \

				        libncurses5-dev \

				        libncursesw5-dev \

				        libreadline-dev \

				        libseccomp-dev \

				        libsqlite3-dev \

				        libssl-dev \

				        libstdc++-10-dev \

				        libtool \

				        libxml2-dev \

				        libxmlsec1-dev \

				        libxxhash-dev \

				        lsof \

				        make \

				        netcat \

				        net-tools \

				        openssh-client \

				        parallel \

				        pkg-config \

				        unzip \

				        wget \

				        xz-utils \

				        zlib1g-dev \

				        zstd \

				    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

				# protobuf-compiler (protoc)

				ENV PROTOC_VERSION=25.1

				RUN curl -fsSL "https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOC_VERSION}/protoc-${PROTOC_VERSION}-linux-$(uname -m | sed 's/aarch64/aarch_64/g').zip" -o "protoc.zip" \

				    && unzip -q protoc.zip -d protoc \

				    && mv protoc/bin/protoc /usr/local/bin/protoc \

				    && mv protoc/include/google /usr/local/include/google \

				    && rm -rf protoc.zip protoc

				# s5cmd

				ENV S5CMD_VERSION=2.2.2

				RUN curl -sL "https://github.com/peak/s5cmd/releases/download/v${S5CMD_VERSION}/s5cmd_${S5CMD_VERSION}_Linux-$(uname -m | sed 's/x86_64/64bit/g' | sed 's/aarch64/arm64/g').tar.gz" | tar zxvf - s5cmd \

				    && chmod +x s5cmd \

				    && mv s5cmd /usr/local/bin/s5cmd

				# LLVM

				ENV LLVM_VERSION=18

				RUN curl -fsSL 'https://apt.llvm.org/llvm-snapshot.gpg.key' | apt-key add - \

				    && echo "deb http://apt.llvm.org/${DEBIAN_VERSION_CODENAME}/ llvm-toolchain-${DEBIAN_VERSION_CODENAME}-${LLVM_VERSION} main" > /etc/apt/sources.list.d/llvm.stable.list \

				    && apt update \

				    && apt install -y clang-${LLVM_VERSION} llvm-${LLVM_VERSION} \

				    && bash -c 'for f in /usr/bin/clang*-${LLVM_VERSION} /usr/bin/llvm*-${LLVM_VERSION}; do ln -s "${f}" "${f%-${LLVM_VERSION}}"; done' \

				    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

				# Install docker

				RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg \

				    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian ${DEBIAN_VERSION_CODENAME} stable" > /etc/apt/sources.list.d/docker.list \

				    && apt update \

				    && apt install -y docker-ce docker-ce-cli \

				    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

				# Configure sudo & docker

				RUN usermod -aG sudo nonroot && \

				    echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers && \

				    usermod -aG docker nonroot

				# AWS CLI

				RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-$(uname -m).zip" -o "awscliv2.zip" \

				    && unzip -q awscliv2.zip \

				    && ./aws/install \

				    && rm awscliv2.zip

				# Mold: A Modern Linker

				ENV MOLD_VERSION=v2.33.0

				RUN set -e \

				    && git clone https://github.com/rui314/mold.git \

				    && mkdir mold/build \

				    && cd mold/build \

				    && git checkout ${MOLD_VERSION} \

				    && cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=clang++ .. \

				    && cmake --build . -j $(nproc) \

				    && cmake --install . \

				    && cd .. \

				    && rm -rf mold

				# LCOV

				# Build lcov from a fork:

				# It includes several bug fixes on top of the v2.0 release (https://github.com/linux-test-project/lcov/compare/v2.0...master)

				# And patches from us:

				# - Generates json file with code coverage summary (https://github.com/neondatabase/lcov/commit/426e7e7a22f669da54278e9b55e6d8caabd00af0.tar.gz)

				RUN for package in Capture::Tiny DateTime Devel::Cover Digest::MD5 File::Spec JSON::XS Memory::Process Time::HiRes JSON; do yes | perl -MCPAN -e "CPAN::Shell->notest('install', '$package')"; done \

				    && wget https://github.com/neondatabase/lcov/archive/426e7e7a22f669da54278e9b55e6d8caabd00af0.tar.gz -O lcov.tar.gz \

				    && echo "61a22a62e20908b8b9e27d890bd0ea31f567a7b9668065589266371dcbca0992  lcov.tar.gz" | sha256sum --check \

				    && mkdir -p lcov && tar -xzf lcov.tar.gz -C lcov --strip-components=1 \

				    && cd lcov \

				    && make install \

				    && rm -rf ../lcov.tar.gz

				# Compile and install the static OpenSSL library

				ENV OPENSSL_VERSION=1.1.1w

				ENV OPENSSL_PREFIX=/usr/local/openssl

				RUN wget -O /tmp/openssl-${OPENSSL_VERSION}.tar.gz https://www.openssl.org/source/openssl-${OPENSSL_VERSION}.tar.gz && \

				    echo "cf3098950cb4d853ad95c0841f1f9c6d3dc102dccfcacd521d93925208b76ac8 /tmp/openssl-${OPENSSL_VERSION}.tar.gz" | sha256sum --check && \

				    cd /tmp && \

				    tar xzvf /tmp/openssl-${OPENSSL_VERSION}.tar.gz && \

				    rm /tmp/openssl-${OPENSSL_VERSION}.tar.gz && \

				    cd /tmp/openssl-${OPENSSL_VERSION} && \

				    ./config --prefix=${OPENSSL_PREFIX}  -static --static no-shared -fPIC && \

				    make -j "$(nproc)" && \

				    make install && \

				    cd /tmp && \

				    rm -rf /tmp/openssl-${OPENSSL_VERSION}

				# Use the same version of libicu as the compute nodes so that

				# clusters created using initdb on pageserver can be used by computes.

				#

				# TODO: at this time, Dockerfile.compute-node uses the debian bullseye libicu

				# package, which is 67.1. We're duplicating that knowledge here, and also, technically,

				# Debian has a few patches on top of 67.1 that we're not adding here.

				ENV ICU_VERSION=67.1

				ENV ICU_PREFIX=/usr/local/icu

				# Download and build static ICU

				RUN wget -O /tmp/libicu-${ICU_VERSION}.tgz https://github.com/unicode-org/icu/releases/download/release-${ICU_VERSION//./-}/icu4c-${ICU_VERSION//./_}-src.tgz && \

				    echo "94a80cd6f251a53bd2a997f6f1b5ac6653fe791dfab66e1eb0227740fb86d5dc /tmp/libicu-${ICU_VERSION}.tgz" | sha256sum --check && \

				    mkdir /tmp/icu && \

				    pushd /tmp/icu && \

				    tar -xzf /tmp/libicu-${ICU_VERSION}.tgz && \

				    pushd icu/source && \

				    ./configure --prefix=${ICU_PREFIX}  --enable-static --enable-shared=no CXXFLAGS="-fPIC" CFLAGS="-fPIC" && \

				    make -j "$(nproc)" && \

				    make install && \

				    popd && \

				    rm -rf icu && \

				    rm -f /tmp/libicu-${ICU_VERSION}.tgz && \

				    popd

				# Switch to nonroot user

				USER nonroot:nonroot

				WORKDIR /home/nonroot

				# Python

				ENV PYTHON_VERSION=3.9.19 \

				    PYENV_ROOT=/home/nonroot/.pyenv \

				    PATH=/home/nonroot/.pyenv/shims:/home/nonroot/.pyenv/bin:/home/nonroot/.poetry/bin:$PATH

				RUN set -e \

				    && cd $HOME \

				    && curl -sSO https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer \

				    && chmod +x pyenv-installer \

				    && ./pyenv-installer \

				    && export PYENV_ROOT=/home/nonroot/.pyenv \

				    && export PATH="$PYENV_ROOT/bin:$PATH" \

				    && export PATH="$PYENV_ROOT/shims:$PATH" \

				    && pyenv install ${PYTHON_VERSION} \

				    && pyenv global ${PYTHON_VERSION} \

				    && python --version \

				    && pip install --upgrade pip \

				    && pip --version \

				    && pip install pipenv wheel poetry

				# Switch to nonroot user (again)

				USER nonroot:nonroot

				WORKDIR /home/nonroot

				# Rust

				# Please keep the version of llvm (installed above) in sync with rust llvm (`rustc --version --verbose | grep LLVM`)

				ENV RUSTC_VERSION=1.81.0

				ENV RUSTUP_HOME="/home/nonroot/.rustup"

				ENV PATH="/home/nonroot/.cargo/bin:${PATH}"

				ARG RUSTFILT_VERSION=0.2.1

				ARG CARGO_HAKARI_VERSION=0.9.30

				ARG CARGO_DENY_VERSION=0.16.1

				ARG CARGO_HACK_VERSION=0.6.31

				ARG CARGO_NEXTEST_VERSION=0.9.72

				RUN curl -sSO https://static.rust-lang.org/rustup/dist/$(uname -m)-unknown-linux-gnu/rustup-init && whoami && \

					chmod +x rustup-init && \

					./rustup-init -y --default-toolchain ${RUSTC_VERSION} && \

					rm rustup-init && \

				    export PATH="$HOME/.cargo/bin:$PATH" && \

				    . "$HOME/.cargo/env" && \

				    cargo --version && rustup --version && \

				    rustup component add llvm-tools rustfmt clippy && \

				    cargo install rustfilt            --version ${RUSTFILT_VERSION} && \

				    cargo install cargo-hakari        --version ${CARGO_HAKARI_VERSION} && \

				    cargo install cargo-deny --locked --version ${CARGO_DENY_VERSION} && \

				    cargo install cargo-hack          --version ${CARGO_HACK_VERSION} && \

				    cargo install cargo-nextest       --version ${CARGO_NEXTEST_VERSION} && \

				    rm -rf /home/nonroot/.cargo/registry && \

				    rm -rf /home/nonroot/.cargo/git

				# Show versions

				RUN whoami \

				    && python --version \

				    && pip --version \

				    && cargo --version --verbose \

				    && rustup --version --verbose \

				    && rustc --version --verbose \

				    && clang --version

				# Set the following flag so the Makefile can check whether it's running in Docker

				RUN touch /home/nonroot/.docker_build
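The marker file created by the last `RUN` step is what the Makefile in this comparison tests for with `$(shell test -e /home/nonroot/.docker_build && echo -n yes)`. A minimal shell sketch of that check follows. The function name `docker_build_flag` and the temporary directory are illustrative assumptions; only the `test -e` logic mirrors the Makefile.

```shell
#!/bin/sh
# Sketch of the marker-file check. The function name and the temporary
# path are hypothetical; the real build uses /home/nonroot/.docker_build.
docker_build_flag() {
    # Print "yes" when the marker exists and nothing otherwise, the same
    # shape as the Makefile's `test -e ... && echo -n yes`.
    if test -e "$1"; then
        printf 'yes'
    fi
}

tmpdir=$(mktemp -d)
marker="$tmpdir/.docker_build"

before=$(docker_build_flag "$marker")   # marker absent
touch "$marker"                         # what the Dockerfile's final RUN does
after=$(docker_build_flag "$marker")    # marker present

echo "before='$before' after='$after'"
rm -rf "$tmpdir"
```

Because the Makefile compares the command output with the literal `yes`, an absent marker yields an empty string and the static OpenSSL and ICU flags are skipped for local builds.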

1039

Dockerfile.compute-node


File diff suppressed because it is too large

									
										103

Makefile
									
												
				@@ -3,7 +3,6 @@ ROOT_PROJECT_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))

				# Where to install Postgres, default is ./pg_install, maybe useful for package managers

				POSTGRES_INSTALL_DIR ?= $(ROOT_PROJECT_DIR)/pg_install/

				OPENSSL_PREFIX_DIR := /usr/local/openssl

				ICU_PREFIX_DIR := /usr/local/icu

				#

				@@ -11,33 +10,46 @@ ICU_PREFIX_DIR := /usr/local/icu

				# environment variable.

				#

				BUILD_TYPE ?= debug

				WITH_SANITIZERS ?= no

				PG_CFLAGS = -fsigned-char

				ifeq ($(BUILD_TYPE),release)

					PG_CONFIGURE_OPTS = --enable-debug --with-openssl

					PG_CFLAGS = -O2 -g3 $(CFLAGS)

					PG_CFLAGS += -O2 -g3 $(CFLAGS)

					PG_LDFLAGS = $(LDFLAGS)

					# Unfortunately, `--profile=...` is a nightly feature

					CARGO_BUILD_FLAGS += --release

				else ifeq ($(BUILD_TYPE),debug)

					PG_CONFIGURE_OPTS = --enable-debug --with-openssl --enable-cassert --enable-depend

					PG_CFLAGS = -O0 -g3 $(CFLAGS)

					PG_CFLAGS += -O0 -g3 $(CFLAGS)

					PG_LDFLAGS = $(LDFLAGS)

				else

					$(error Bad build type '$(BUILD_TYPE)', see Makefile for options)

				endif

				ifeq ($(WITH_SANITIZERS),yes)

					PG_CFLAGS += -fsanitize=address -fsanitize=undefined -fno-sanitize-recover

					COPT += -Wno-error # to avoid failing on warnings induced by sanitizers

					PG_LDFLAGS = -fsanitize=address -fsanitize=undefined -static-libasan -static-libubsan $(LDFLAGS)

					export CC := gcc

					export ASAN_OPTIONS := detect_leaks=0

				endif

				ifeq ($(shell test -e /home/nonroot/.docker_build && echo -n yes),yes)

					# Exclude static build openssl, icu for local build (MacOS, Linux)

					# Only keep for build type release and debug

					PG_CFLAGS += -I$(OPENSSL_PREFIX_DIR)/include

					PG_CONFIGURE_OPTS += --with-icu

					PG_CONFIGURE_OPTS += ICU_CFLAGS='-I/$(ICU_PREFIX_DIR)/include -DU_STATIC_IMPLEMENTATION'

					PG_CONFIGURE_OPTS += ICU_LIBS='-L$(ICU_PREFIX_DIR)/lib -L$(ICU_PREFIX_DIR)/lib64 -licui18n -licuuc -licudata -lstdc++ -Wl,-Bdynamic -lm'

					PG_CONFIGURE_OPTS += LDFLAGS='-L$(OPENSSL_PREFIX_DIR)/lib -L$(OPENSSL_PREFIX_DIR)/lib64 -L$(ICU_PREFIX_DIR)/lib -L$(ICU_PREFIX_DIR)/lib64 -Wl,-Bstatic -lssl -lcrypto -Wl,-Bdynamic -lrt -lm -ldl -lpthread'

				endif

				UNAME_S := $(shell uname -s)

				ifeq ($(UNAME_S),Linux)

					# Seccomp BPF is only available for Linux

					PG_CONFIGURE_OPTS += --with-libseccomp

					ifneq ($(WITH_SANITIZERS),yes)

						PG_CONFIGURE_OPTS += --with-libseccomp

					endif

				else ifeq ($(UNAME_S),Darwin)

					PG_CFLAGS += -DUSE_PREFETCH

					ifndef DISABLE_HOMEBREW

						# macOS with brew-installed openssl requires explicit paths

						# It can be configured with OPENSSL_PREFIX variable

				@@ -66,8 +78,6 @@ CARGO_BUILD_FLAGS += $(filter -j1,$(MAKEFLAGS))

				CARGO_CMD_PREFIX += $(if $(filter n,$(MAKEFLAGS)),,+)

				# Force cargo not to print progress bar

				CARGO_CMD_PREFIX += CARGO_TERM_PROGRESS_WHEN=never CI=1

				# Set PQ_LIB_DIR to make sure `storage_controller` get linked with bundled libpq (through diesel)

				CARGO_CMD_PREFIX += PQ_LIB_DIR=$(POSTGRES_INSTALL_DIR)/v16/lib

				CACHEDIR_TAG_CONTENTS := "Signature: 8a477f597d28d172789f06886806bc55"

				@@ -110,7 +120,7 @@ $(POSTGRES_INSTALL_DIR)/build/%/config.status:

					EXTRA_VERSION=$$(cd $(ROOT_PROJECT_DIR)/vendor/postgres-$$VERSION && git rev-parse HEAD); \

					(cd $(POSTGRES_INSTALL_DIR)/build/$$VERSION && \

					env PATH="$(EXTRA_PATH_OVERRIDES):$$PATH" $(ROOT_PROJECT_DIR)/vendor/postgres-$$VERSION/configure \

						CFLAGS='$(PG_CFLAGS)' \

						CFLAGS='$(PG_CFLAGS)' LDFLAGS='$(PG_LDFLAGS)' \

						$(PG_CONFIGURE_OPTS) --with-extra-version=" ($$EXTRA_VERSION)" \

						--prefix=$(abspath $(POSTGRES_INSTALL_DIR))/$$VERSION > configure.log)

				@@ -119,6 +129,8 @@ $(POSTGRES_INSTALL_DIR)/build/%/config.status:

				# I'm not sure why it wouldn't work, but this is the only place (apart from

				# the "build-all-versions" entry points) where direct mention of PostgreSQL

				# versions is used.

				.PHONY: postgres-configure-v17

				postgres-configure-v17: $(POSTGRES_INSTALL_DIR)/build/v17/config.status

				.PHONY: postgres-configure-v16

				postgres-configure-v16: $(POSTGRES_INSTALL_DIR)/build/v16/config.status

				.PHONY: postgres-configure-v15

				@@ -144,8 +156,12 @@ postgres-%: postgres-configure-% \

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pg_prewarm install

					+@echo "Compiling pg_buffercache $*"

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pg_buffercache install

					+@echo "Compiling pg_visibility $*"

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pg_visibility install

					+@echo "Compiling pageinspect $*"

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pageinspect install

					+@echo "Compiling pg_trgm $*"

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/pg_trgm install

					+@echo "Compiling amcheck $*"

					$(MAKE) -C $(POSTGRES_INSTALL_DIR)/build/$*/contrib/amcheck install

					+@echo "Compiling test_decoding $*"

				@@ -166,27 +182,27 @@ postgres-check-%: postgres-%

				neon-pg-ext-%: postgres-%

					+@echo "Compiling neon $*"

					mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-$*

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config COPT='$(COPT)' \

						-C $(POSTGRES_INSTALL_DIR)/build/neon-$* \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile install

					+@echo "Compiling neon_walredo $*"

					mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-walredo-$*

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config COPT='$(COPT)' \

						-C $(POSTGRES_INSTALL_DIR)/build/neon-walredo-$* \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon_walredo/Makefile install

					+@echo "Compiling neon_rmgr $*"

					mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-rmgr-$*

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config COPT='$(COPT)' \

						-C $(POSTGRES_INSTALL_DIR)/build/neon-rmgr-$* \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon_rmgr/Makefile install

					+@echo "Compiling neon_test_utils $*"

					mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$*

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config COPT='$(COPT)' \

						-C $(POSTGRES_INSTALL_DIR)/build/neon-test-utils-$* \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon_test_utils/Makefile install

					+@echo "Compiling neon_utils $*"

					mkdir -p $(POSTGRES_INSTALL_DIR)/build/neon-utils-$*

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/$*/bin/pg_config COPT='$(COPT)' \

						-C $(POSTGRES_INSTALL_DIR)/build/neon-utils-$* \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon_utils/Makefile install

				@@ -215,29 +231,31 @@ neon-pg-clean-ext-%:

				# they depend on openssl and other libraries that are not included in our

				# Rust build.

				.PHONY: walproposer-lib

				walproposer-lib: neon-pg-ext-v16

				walproposer-lib: neon-pg-ext-v17

					+@echo "Compiling walproposer-lib"

					mkdir -p $(POSTGRES_INSTALL_DIR)/build/walproposer-lib

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v16/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v17/bin/pg_config COPT='$(COPT)' \

						-C $(POSTGRES_INSTALL_DIR)/build/walproposer-lib \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile walproposer-lib

					cp $(POSTGRES_INSTALL_DIR)/v16/lib/libpgport.a $(POSTGRES_INSTALL_DIR)/build/walproposer-lib

					cp $(POSTGRES_INSTALL_DIR)/v16/lib/libpgcommon.a $(POSTGRES_INSTALL_DIR)/build/walproposer-lib

				ifeq ($(UNAME_S),Linux)

					cp $(POSTGRES_INSTALL_DIR)/v17/lib/libpgport.a $(POSTGRES_INSTALL_DIR)/build/walproposer-lib

					cp $(POSTGRES_INSTALL_DIR)/v17/lib/libpgcommon.a $(POSTGRES_INSTALL_DIR)/build/walproposer-lib

					$(AR) d $(POSTGRES_INSTALL_DIR)/build/walproposer-lib/libpgport.a \

						pg_strong_random.o

					$(AR) d $(POSTGRES_INSTALL_DIR)/build/walproposer-lib/libpgcommon.a \

						pg_crc32c.o \

						hmac_openssl.o \

						checksum_helper.o \

						cryptohash_openssl.o \

						scram-common.o \

						hmac_openssl.o \

						md5_common.o \

						checksum_helper.o

						parse_manifest.o \

						scram-common.o

				ifeq ($(UNAME_S),Linux)

					$(AR) d $(POSTGRES_INSTALL_DIR)/build/walproposer-lib/libpgcommon.a \

						pg_crc32c.o

				endif

				.PHONY: walproposer-lib-clean

				walproposer-lib-clean:

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v16/bin/pg_config \

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v17/bin/pg_config \

						-C $(POSTGRES_INSTALL_DIR)/build/walproposer-lib \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile clean

				@@ -245,48 +263,55 @@ walproposer-lib-clean:

				neon-pg-ext: \

					neon-pg-ext-v14 \

					neon-pg-ext-v15 \

					neon-pg-ext-v16

					neon-pg-ext-v16 \

					neon-pg-ext-v17

				.PHONY: neon-pg-clean-ext

				neon-pg-clean-ext: \

					neon-pg-clean-ext-v14 \

					neon-pg-clean-ext-v15 \

					neon-pg-clean-ext-v16

					neon-pg-clean-ext-v16 \

					neon-pg-clean-ext-v17

				# shorthand to build all Postgres versions

				.PHONY: postgres

				postgres: \

					postgres-v14 \

					postgres-v15 \

					postgres-v16

					postgres-v16 \

					postgres-v17

				.PHONY: postgres-headers

				postgres-headers: \

					postgres-headers-v14 \

					postgres-headers-v15 \

					postgres-headers-v16

					postgres-headers-v16 \

					postgres-headers-v17

				.PHONY: postgres-clean

				postgres-clean: \

					postgres-clean-v14 \

					postgres-clean-v15 \

					postgres-clean-v16

					postgres-clean-v16 \

					postgres-clean-v17

				.PHONY: postgres-check

				postgres-check: \

					postgres-check-v14 \

					postgres-check-v15 \

					postgres-check-v16

					postgres-check-v16 \

					postgres-check-v17

				# This doesn't remove the effects of 'configure'.

				.PHONY: clean

				clean: postgres-clean neon-pg-clean-ext

					$(MAKE) -C compute clean

					$(CARGO_CMD_PREFIX) cargo clean

				# This removes everything

				.PHONY: distclean

				distclean:

					rm -rf $(POSTGRES_INSTALL_DIR)

					$(RM) -r $(POSTGRES_INSTALL_DIR)

					$(CARGO_CMD_PREFIX) cargo clean

				.PHONY: fmt

				@@ -318,16 +343,16 @@ postgres-%-pgindent: postgres-%-pg-bsd-indent postgres-%-typedefs.list

						$(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/pgindent/pgindent --typedefs postgres-$*-typedefs-full.list \

						$(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/ \

						--excludes $(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/pgindent/exclude_file_patterns

					rm -f pg*.BAK

					$(RM) pg*.BAK

				# Indent pgxn/neon.

				.PHONY: pgindent

				neon-pgindent: postgres-v16-pg-bsd-indent neon-pg-ext-v16

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v16/bin/pg_config CFLAGS='$(PG_CFLAGS) $(COPT)' \

						FIND_TYPEDEF=$(ROOT_PROJECT_DIR)/vendor/postgres-v16/src/tools/find_typedef \

						INDENT=$(POSTGRES_INSTALL_DIR)/build/v16/src/tools/pg_bsd_indent/pg_bsd_indent \

						PGINDENT_SCRIPT=$(ROOT_PROJECT_DIR)/vendor/postgres-v16/src/tools/pgindent/pgindent \

						-C $(POSTGRES_INSTALL_DIR)/build/neon-v16 \

				.PHONY: neon-pgindent

				neon-pgindent: postgres-v17-pg-bsd-indent neon-pg-ext-v17

					$(MAKE) PG_CONFIG=$(POSTGRES_INSTALL_DIR)/v17/bin/pg_config COPT='$(COPT)' \

						FIND_TYPEDEF=$(ROOT_PROJECT_DIR)/vendor/postgres-v17/src/tools/find_typedef \

						INDENT=$(POSTGRES_INSTALL_DIR)/build/v17/src/tools/pg_bsd_indent/pg_bsd_indent \

						PGINDENT_SCRIPT=$(ROOT_PROJECT_DIR)/vendor/postgres-v17/src/tools/pgindent/pgindent \

						-C $(POSTGRES_INSTALL_DIR)/build/neon-v17 \

						-f $(ROOT_PROJECT_DIR)/pgxn/neon/Makefile pgindent

									
										12

README.md
									
												
				@@ -21,8 +21,10 @@ The Neon storage engine consists of two major components:

				See developer documentation in [SUMMARY.md](/docs/SUMMARY.md) for more information.

				## Running local installation

				## Running a local development environment

				Neon can be run on a workstation for small experiments and to test code changes, by

				following these instructions.

				#### Installing dependencies on Linux

				1. Install build dependencies and other applicable packages

				@@ -31,7 +33,7 @@ See developer documentation in [SUMMARY.md](/docs/SUMMARY.md) for more informati

				```bash

				apt install build-essential libtool libreadline-dev zlib1g-dev flex bison libseccomp-dev \

				libssl-dev clang pkg-config libpq-dev cmake postgresql-client protobuf-compiler \

				libcurl4-openssl-dev openssl python3-poetry lsof libicu-dev

				libprotobuf-dev libcurl4-openssl-dev openssl python3-poetry lsof libicu-dev

				```

				* On Fedora, these packages are needed:

				```bash

				@@ -58,7 +60,7 @@ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

				1. Install XCode and dependencies

				```

				xcode-select --install

				brew install protobuf openssl flex bison icu4c pkg-config

				brew install protobuf openssl flex bison icu4c pkg-config m4

				# add openssl to PATH, required for ed25519 keys generation in neon_local

				echo 'export PATH="$(brew --prefix openssl)/bin:$PATH"' >> ~/.zshrc

				@@ -132,7 +134,7 @@ make -j`sysctl -n hw.logicalcpu` -s

				To run the `psql` client, install the `postgresql-client` package or modify `PATH` and `LD_LIBRARY_PATH` to include `pg_install/bin` and `pg_install/lib`, respectively.

				To run the integration tests or Python scripts (not required to use the code), install

				Python (3.9 or higher), and install the python3 packages using `./scripts/pysync` (requires [poetry>=1.8](https://python-poetry.org/)) in the project directory.

				Python (3.11 or higher), and install the python3 packages using `./scripts/pysync` (requires [poetry>=1.8](https://python-poetry.org/)) in the project directory.

				#### Running neon database

				@@ -238,7 +240,7 @@ postgres=# select * from t;

				> cargo neon stop

				```

				More advanced usages can be found at [Control Plane and Neon Local](./control_plane/README.md).

				More advanced usages can be found at [Local Development Control Plane (`neon_local`)](./control_plane/README.md).

				#### Handling build failures

									
										340

build-tools.Dockerfile
									
										Normal file
									
												
				@@ -0,0 +1,340 @@

				ARG DEBIAN_VERSION=bookworm

				ARG DEBIAN_FLAVOR=${DEBIAN_VERSION}-slim

				# Here are the INDEX DIGESTS for the images we use.

				# For now, you can get them by following these steps:

				# 1. Get an authentication token from DockerHub:

				#    TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/debian:pull" | jq -r .token)

				# 2. Using that token, query index for the given tag:

				#    curl -s -H "Authorization: Bearer $TOKEN" \

				#       -H "Accept: application/vnd.docker.distribution.manifest.list.v2+json" \

				#       "https://registry.hub.docker.com/v2/library/debian/manifests/bullseye-slim" \

				#       -I | grep -i docker-content-digest

				# 3. As a next step, TODO(fedordikarev): create a script and schedule a workflow to run these checks

				#    and updates on a regular basis and in an automated way.

				ARG BOOKWORM_SLIM_SHA=sha256:40b107342c492725bc7aacbe93a49945445191ae364184a6d24fedb28172f6f7

				ARG BULLSEYE_SLIM_SHA=sha256:e831d9a884d63734fe3dd9c491ed9a5a3d4c6a6d32c5b14f2067357c49b0b7e1

				# Here we use the ${var/search/replace} syntax to check

				# whether the base image is one of the images we pin an image index for.

				# If the variable matches one of the known images, we replace it with the known sha.

				# If there is no match, the value is left unchanged and the build proceeds with an unpinned image.

				ARG BASE_IMAGE_SHA=debian:${DEBIAN_FLAVOR}

				ARG BASE_IMAGE_SHA=${BASE_IMAGE_SHA/debian:bookworm-slim/debian@$BOOKWORM_SLIM_SHA}

				ARG BASE_IMAGE_SHA=${BASE_IMAGE_SHA/debian:bullseye-slim/debian@$BULLSEYE_SLIM_SHA}

				FROM $BASE_IMAGE_SHA AS pgcopydb_builder

				ARG DEBIAN_VERSION

				# Use strict mode for bash to catch errors early

				SHELL ["/bin/bash", "-euo", "pipefail", "-c"]

				# By default, /bin/sh used in debian images treats '\n' as eol,

				# but we use bash as SHELL, and the built-in echo in bash requires the '-e' flag for that.

				RUN echo 'Acquire::Retries "5";' > /etc/apt/apt.conf.d/80-retries && \

				    echo -e "retry_connrefused=on\ntimeout=15\ntries=5\nretry-on-host-error=on\n" > /root/.wgetrc && \

				    echo -e "--retry-connrefused\n--connect-timeout 15\n--retry 5\n--max-time 300\n" > /root/.curlrc

				COPY build_tools/patches/pgcopydbv017.patch /pgcopydbv017.patch

				RUN if [ "${DEBIAN_VERSION}" = "bookworm" ]; then \

				        set -e && \

				        apt update && \

				        apt install -y --no-install-recommends \

				        ca-certificates wget gpg && \

				        wget -qO - https://www.postgresql.org/media/keys/ACCC4CF8.asc | gpg --dearmor -o /usr/share/keyrings/postgresql-keyring.gpg && \

				        echo "deb [signed-by=/usr/share/keyrings/postgresql-keyring.gpg] http://apt.postgresql.org/pub/repos/apt bookworm-pgdg main" > /etc/apt/sources.list.d/pgdg.list && \

				        apt-get update && \

				        apt install -y --no-install-recommends \

				        build-essential \

				        autotools-dev \

				        libedit-dev \

				        libgc-dev \

				        libpam0g-dev \

				        libreadline-dev \

				        libselinux1-dev \

				        libxslt1-dev \

				        libssl-dev \

				        libkrb5-dev \

				        zlib1g-dev \

				        liblz4-dev \

				        libpq5 \

				        libpq-dev \

				        libzstd-dev \

				        postgresql-16 \

				        postgresql-server-dev-16 \

				        postgresql-common  \

				        python3-sphinx && \

				        wget -O /tmp/pgcopydb.tar.gz https://github.com/dimitri/pgcopydb/archive/refs/tags/v0.17.tar.gz && \

				        mkdir /tmp/pgcopydb && \

				        tar -xzf /tmp/pgcopydb.tar.gz -C /tmp/pgcopydb --strip-components=1 && \

				        cd /tmp/pgcopydb && \

				        patch -p1 < /pgcopydbv017.patch && \

				        make -s clean && \

				        make -s -j12 install && \

				        libpq_path=$(find /lib /usr/lib -name "libpq.so.5" | head -n 1) && \

				        mkdir -p /pgcopydb/lib && \

				        cp "$libpq_path" /pgcopydb/lib/; \

				    else \

				        # copy command below will fail if we don't have dummy files, so we create them for other debian versions

				        mkdir -p /usr/lib/postgresql/16/bin && touch /usr/lib/postgresql/16/bin/pgcopydb && \

				        mkdir -p /pgcopydb/lib && touch /pgcopydb/lib/libpq.so.5; \

				    fi

				FROM $BASE_IMAGE_SHA AS build_tools

				ARG DEBIAN_VERSION

				# Add nonroot user

				RUN useradd -ms /bin/bash nonroot -b /home

				# Use strict mode for bash to catch errors early

				SHELL ["/bin/bash", "-euo", "pipefail", "-c"]

				RUN mkdir -p /pgcopydb/bin && \

				    mkdir -p /pgcopydb/lib && \

				    chmod -R 755 /pgcopydb && \

				    chown -R nonroot:nonroot /pgcopydb

				COPY --from=pgcopydb_builder /usr/lib/postgresql/16/bin/pgcopydb /pgcopydb/bin/pgcopydb

				COPY --from=pgcopydb_builder /pgcopydb/lib/libpq.so.5 /pgcopydb/lib/libpq.so.5

				RUN echo 'Acquire::Retries "5";' > /etc/apt/apt.conf.d/80-retries && \

				    echo -e "retry_connrefused=on\ntimeout=15\ntries=5\nretry-on-host-error=on\n" > /root/.wgetrc && \

				    echo -e "--retry-connrefused\n--connect-timeout 15\n--retry 5\n--max-time 300\n" > /root/.curlrc

				# System deps

				#

				# 'gdb' is included so that we get backtraces of core dumps produced in

				# regression tests

				RUN set -e \

				    && apt update \

				    && apt install -y \

				        autoconf \

				        automake \

				        bison \

				        build-essential \

				        ca-certificates \

				        cmake \

				        curl \

				        flex \

				        gdb \

				        git \

				        gnupg \

				        gzip \

				        jq \

				        jsonnet \

				        libcurl4-openssl-dev \

				        libbz2-dev \

				        libffi-dev \

				        liblzma-dev \

				        libncurses5-dev \

				        libncursesw5-dev \

				        libreadline-dev \

				        libseccomp-dev \

				        libsqlite3-dev \

				        libssl-dev \

				        $([[ "${DEBIAN_VERSION}" = "bullseye" ]] && echo libstdc++-10-dev || echo libstdc++-11-dev) \

				        libtool \

				        libxml2-dev \

				        libxmlsec1-dev \

				        libxxhash-dev \

				        lsof \

				        make \

				        netcat-openbsd \

				        net-tools \

				        openssh-client \

				        parallel \

				        pkg-config \

				        unzip \

				        wget \

				        xz-utils \

				        zlib1g-dev \

				        zstd \

				    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

				# sql_exporter

				# Keep the version the same as in compute/compute-node.Dockerfile and

				# test_runner/regress/test_compute_metrics.py.

				ENV SQL_EXPORTER_VERSION=0.17.0

				RUN curl -fsSL \

				    "https://github.com/burningalchemist/sql_exporter/releases/download/${SQL_EXPORTER_VERSION}/sql_exporter-${SQL_EXPORTER_VERSION}.linux-$(case "$(uname -m)" in x86_64) echo amd64;; aarch64) echo arm64;; esac).tar.gz" \

				    --output sql_exporter.tar.gz \

				    && mkdir /tmp/sql_exporter \

				    && tar xzvf sql_exporter.tar.gz -C /tmp/sql_exporter --strip-components=1 \

				    && mv /tmp/sql_exporter/sql_exporter /usr/local/bin/sql_exporter \

				    && rm sql_exporter.tar.gz

				# protobuf-compiler (protoc)

				ENV PROTOC_VERSION=25.1

				RUN curl -fsSL "https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOC_VERSION}/protoc-${PROTOC_VERSION}-linux-$(uname -m | sed 's/aarch64/aarch_64/g').zip" -o "protoc.zip" \

				    && unzip -q protoc.zip -d protoc \

				    && mv protoc/bin/protoc /usr/local/bin/protoc \

				    && mv protoc/include/google /usr/local/include/google \

				    && rm -rf protoc.zip protoc

				# s5cmd

				ENV S5CMD_VERSION=2.2.2

				RUN curl -sL "https://github.com/peak/s5cmd/releases/download/v${S5CMD_VERSION}/s5cmd_${S5CMD_VERSION}_Linux-$(uname -m | sed 's/x86_64/64bit/g' | sed 's/aarch64/arm64/g').tar.gz" | tar zxvf - s5cmd \

				    && chmod +x s5cmd \

				    && mv s5cmd /usr/local/bin/s5cmd

				# LLVM

				ENV LLVM_VERSION=19

				RUN curl -fsSL 'https://apt.llvm.org/llvm-snapshot.gpg.key' | apt-key add - \

				    && echo "deb http://apt.llvm.org/${DEBIAN_VERSION}/ llvm-toolchain-${DEBIAN_VERSION}-${LLVM_VERSION} main" > /etc/apt/sources.list.d/llvm.stable.list \

				    && apt update \

				    && apt install -y clang-${LLVM_VERSION} llvm-${LLVM_VERSION} \

				    && bash -c 'for f in /usr/bin/clang*-${LLVM_VERSION} /usr/bin/llvm*-${LLVM_VERSION}; do ln -s "${f}" "${f%-${LLVM_VERSION}}"; done' \

				    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

				# Install docker

				RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg \

				    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian ${DEBIAN_VERSION} stable" > /etc/apt/sources.list.d/docker.list \

				    && apt update \

				    && apt install -y docker-ce docker-ce-cli \

				    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

				# Configure sudo & docker

				RUN usermod -aG sudo nonroot && \

				    echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers && \

				    usermod -aG docker nonroot

				# AWS CLI

				RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-$(uname -m).zip" -o "awscliv2.zip" \

				    && unzip -q awscliv2.zip \

				    && ./aws/install \

				    && rm awscliv2.zip

				# Mold: A Modern Linker

				ENV MOLD_VERSION=v2.34.1

				RUN set -e \

				    && git clone https://github.com/rui314/mold.git \

				    && mkdir mold/build \

				    && cd mold/build \

				    && git checkout ${MOLD_VERSION} \

				    && cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=clang++ .. \

				    && cmake --build . -j $(nproc) \

				    && cmake --install . \

				    && cd .. \

				    && rm -rf mold

				# LCOV

				# Build lcov from a fork:

				# It includes several bug fixes on top of the v2.0 release (https://github.com/linux-test-project/lcov/compare/v2.0...master)

				# And patches from us:

				# - Generates json file with code coverage summary (https://github.com/neondatabase/lcov/commit/426e7e7a22f669da54278e9b55e6d8caabd00af0.tar.gz)

				RUN set +o pipefail && \

					 for package in Capture::Tiny DateTime Devel::Cover Digest::MD5 File::Spec JSON::XS Memory::Process Time::HiRes JSON; do \

						yes | perl -MCPAN -e "CPAN::Shell->notest('install', '$package')";\

					 done && \

					set -o pipefail

				# Split into separate step to debug flaky failures here

				RUN wget https://github.com/neondatabase/lcov/archive/426e7e7a22f669da54278e9b55e6d8caabd00af0.tar.gz -O lcov.tar.gz \

				    && ls -laht lcov.tar.gz && sha256sum lcov.tar.gz \

				    && echo "61a22a62e20908b8b9e27d890bd0ea31f567a7b9668065589266371dcbca0992  lcov.tar.gz" | sha256sum --check \

				    && mkdir -p lcov && tar -xzf lcov.tar.gz -C lcov --strip-components=1 \

				    && cd lcov \

				    && make install \

				    && rm -rf ../lcov.tar.gz

				# Use the same version of libicu as the compute nodes so that

				# clusters created using initdb on pageserver can be used by computes.

				#

				# TODO: at this time, compute-node.Dockerfile uses the debian bullseye libicu

				# package, which is 67.1. We're duplicating that knowledge here, and also, technically,

				# Debian has a few patches on top of 67.1 that we're not adding here.

				ENV ICU_VERSION=67.1

				ENV ICU_PREFIX=/usr/local/icu

				# Download and build static ICU

				RUN wget -O /tmp/libicu-${ICU_VERSION}.tgz https://github.com/unicode-org/icu/releases/download/release-${ICU_VERSION//./-}/icu4c-${ICU_VERSION//./_}-src.tgz && \

				    echo "94a80cd6f251a53bd2a997f6f1b5ac6653fe791dfab66e1eb0227740fb86d5dc /tmp/libicu-${ICU_VERSION}.tgz" | sha256sum --check && \

				    mkdir /tmp/icu && \

				    pushd /tmp/icu && \

				    tar -xzf /tmp/libicu-${ICU_VERSION}.tgz && \

				    pushd icu/source && \

				    ./configure --prefix=${ICU_PREFIX}  --enable-static --enable-shared=no CXXFLAGS="-fPIC" CFLAGS="-fPIC" && \

				    make -j "$(nproc)" && \

				    make install && \

				    popd && \

				    rm -rf icu && \

				    rm -f /tmp/libicu-${ICU_VERSION}.tgz && \

				    popd

				# Switch to nonroot user

				USER nonroot:nonroot

				WORKDIR /home/nonroot

				RUN echo -e "--retry-connrefused\n--connect-timeout 15\n--retry 5\n--max-time 300\n" > /home/nonroot/.curlrc

				# Python

				ENV PYTHON_VERSION=3.11.10 \

				    PYENV_ROOT=/home/nonroot/.pyenv \

				    PATH=/home/nonroot/.pyenv/shims:/home/nonroot/.pyenv/bin:/home/nonroot/.poetry/bin:$PATH

				RUN set -e \

				    && cd $HOME \

				    && curl -sSO https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer \

				    && chmod +x pyenv-installer \

				    && ./pyenv-installer \

				    && export PYENV_ROOT=/home/nonroot/.pyenv \

				    && export PATH="$PYENV_ROOT/bin:$PATH" \

				    && export PATH="$PYENV_ROOT/shims:$PATH" \

				    && pyenv install ${PYTHON_VERSION} \

				    && pyenv global ${PYTHON_VERSION} \

				    && python --version \

				    && pip install --upgrade pip \

				    && pip --version \

				    && pip install pipenv wheel poetry

				# Switch to nonroot user (again)

				USER nonroot:nonroot

				WORKDIR /home/nonroot

				# Rust

				# Please keep the version of llvm (installed above) in sync with rust llvm (`rustc --version --verbose | grep LLVM`)

				ENV RUSTC_VERSION=1.86.0

				ENV RUSTUP_HOME="/home/nonroot/.rustup"

				ENV PATH="/home/nonroot/.cargo/bin:${PATH}"

				ARG RUSTFILT_VERSION=0.2.1

				ARG CARGO_HAKARI_VERSION=0.9.33

				ARG CARGO_DENY_VERSION=0.16.2

				ARG CARGO_HACK_VERSION=0.6.33

				ARG CARGO_NEXTEST_VERSION=0.9.85

				ARG CARGO_CHEF_VERSION=0.1.71

				ARG CARGO_DIESEL_CLI_VERSION=2.2.6

				RUN curl -sSO https://static.rust-lang.org/rustup/dist/$(uname -m)-unknown-linux-gnu/rustup-init && whoami && \

					chmod +x rustup-init && \

					./rustup-init -y --default-toolchain ${RUSTC_VERSION} && \

					rm rustup-init && \

				    export PATH="$HOME/.cargo/bin:$PATH" && \

				    . "$HOME/.cargo/env" && \

				    cargo --version && rustup --version && \

				    rustup component add llvm-tools rustfmt clippy && \

				    cargo install rustfilt            --version ${RUSTFILT_VERSION} && \

				    cargo install cargo-hakari        --version ${CARGO_HAKARI_VERSION} && \

				    cargo install cargo-deny --locked --version ${CARGO_DENY_VERSION} && \

				    cargo install cargo-hack          --version ${CARGO_HACK_VERSION} && \

				    cargo install cargo-nextest       --version ${CARGO_NEXTEST_VERSION} && \

				    cargo install cargo-chef --locked --version ${CARGO_CHEF_VERSION} && \

				    cargo install diesel_cli          --version ${CARGO_DIESEL_CLI_VERSION} \

				                                      --features postgres-bundled --no-default-features && \

				    rm -rf /home/nonroot/.cargo/registry && \

				    rm -rf /home/nonroot/.cargo/git

				# Show versions

				RUN whoami \

				    && python --version \

				    && pip --version \

				    && cargo --version --verbose \

				    && rustup --version --verbose \

				    && rustc --version --verbose \

				    && clang --version

				RUN if [ "${DEBIAN_VERSION}" = "bookworm" ]; then \

				    LD_LIBRARY_PATH=/pgcopydb/lib /pgcopydb/bin/pgcopydb --version; \

				else \

				    echo "pgcopydb is not available for ${DEBIAN_VERSION}"; \

				fi

				# Set the following flag so the Makefile can check whether it's running in Docker

				RUN touch /home/nonroot/.docker_build

									
57 build_tools/patches/pgcopydbv017.patch Normal file
												
				@@ -0,0 +1,57 @@

				diff --git a/src/bin/pgcopydb/copydb.c b/src/bin/pgcopydb/copydb.c

				index d730b03..69a9be9 100644

				--- a/src/bin/pgcopydb/copydb.c

				+++ b/src/bin/pgcopydb/copydb.c

				@@ -44,6 +44,7 @@ GUC dstSettings[] = {

				 	{ "synchronous_commit", "'off'" },

				 	{ "statement_timeout", "0" },

				 	{ "lock_timeout", "0" },

				+	{ "idle_in_transaction_session_timeout", "0" },

				 	{ NULL, NULL },

				 };

				diff --git a/src/bin/pgcopydb/pgsql.c b/src/bin/pgcopydb/pgsql.c

				index 94f2f46..e051ba8 100644

				--- a/src/bin/pgcopydb/pgsql.c

				+++ b/src/bin/pgcopydb/pgsql.c

				@@ -2319,6 +2319,11 @@ pgsql_execute_log_error(PGSQL *pgsql,

				 	LinesBuffer lbuf = { 0 };

				+	if (message != NULL){

				+		// make sure message is writable by splitLines

				+		message = strdup(message);

				+	}

				+

				 	if (!splitLines(&lbuf, message))

				 	{

				 		/* errors have already been logged */

				@@ -2332,6 +2337,7 @@ pgsql_execute_log_error(PGSQL *pgsql,

				 				  PQbackendPID(pgsql->connection),

				 				  lbuf.lines[lineNumber]);

				 	}

				+        free(message); // free copy of message we created above

				 	if (pgsql->logSQL)

				 	{

				@@ -3174,11 +3180,18 @@ pgcopy_log_error(PGSQL *pgsql, PGresult *res, const char *context)

				 		/* errors have already been logged */

				 		return;

				 	}

				-

				 	if (res != NULL)

				 	{

				 		char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

				-		strlcpy(pgsql->sqlstate, sqlstate, sizeof(pgsql->sqlstate));

				+		if (sqlstate == NULL)

				+		{

				+			// PQresultErrorField returned NULL!

				+			pgsql->sqlstate[0] = '\0';  // Set to an empty string to avoid segfault

				+		}

				+		else

				+		{

				+			strlcpy(pgsql->sqlstate, sqlstate, sizeof(pgsql->sqlstate));

				+		}

				 	}

				 	char *endpoint =

5 compute/.gitignore vendored Normal file

@@ -0,0 +1,5 @@
 # sql_exporter config files generated from Jsonnet
 etc/neon_collector.yml
 etc/neon_collector_autoscaling.yml
 etc/sql_exporter.yml
 etc/sql_exporter_autoscaling.yml

									
50 compute/Makefile Normal file
												
				@@ -0,0 +1,50 @@

				jsonnet_files = $(wildcard \

					etc/*.jsonnet \

					etc/sql_exporter/*.libsonnet)

				.PHONY: all

				all: neon_collector.yml neon_collector_autoscaling.yml sql_exporter.yml sql_exporter_autoscaling.yml

				neon_collector.yml: $(jsonnet_files)

					JSONNET_PATH=jsonnet:etc jsonnet \

						--output-file etc/$@ \

						--ext-str pg_version=$(PG_VERSION) \

						etc/neon_collector.jsonnet

				neon_collector_autoscaling.yml: $(jsonnet_files)

					JSONNET_PATH=jsonnet:etc jsonnet \

						--output-file etc/$@ \

						--ext-str pg_version=$(PG_VERSION) \

						etc/neon_collector_autoscaling.jsonnet

				sql_exporter.yml: $(jsonnet_files)

					JSONNET_PATH=etc jsonnet \

						--output-file etc/$@ \

						--tla-str collector_name=neon_collector \

						--tla-str collector_file=neon_collector.yml \

						--tla-str 'connection_string=postgresql://cloud_admin@127.0.0.1:5432/postgres?sslmode=disable&application_name=sql_exporter' \

						etc/sql_exporter.jsonnet

				sql_exporter_autoscaling.yml: $(jsonnet_files)

					JSONNET_PATH=etc jsonnet \

						--output-file etc/$@ \

						--tla-str collector_name=neon_collector_autoscaling \

						--tla-str collector_file=neon_collector_autoscaling.yml \

						--tla-str 'connection_string=postgresql://cloud_admin@127.0.0.1:5432/postgres?sslmode=disable&application_name=sql_exporter_autoscaling' \

						etc/sql_exporter.jsonnet

				.PHONY: clean

				clean:

					$(RM) \

						etc/neon_collector.yml \

						etc/neon_collector_autoscaling.yml \

						etc/sql_exporter.yml \

						etc/sql_exporter_autoscaling.yml

				.PHONY: jsonnetfmt-test

				jsonnetfmt-test:

					jsonnetfmt --test $(jsonnet_files)

				.PHONY: jsonnetfmt-format

				jsonnetfmt-format:

					jsonnetfmt --in-place $(jsonnet_files)

21 compute/README.md Normal file

@@ -0,0 +1,21 @@
 This directory contains files that are needed to build the compute
 images, or included in the compute images.
 compute-node.Dockerfile
 	To build the compute image
 vm-image-spec.yaml
 	Instructions for vm-builder, to turn the compute-node image into
 	corresponding vm-compute-node image.
 etc/
 	Configuration files included in /etc in the compute image
 patches/
 	Some extensions need to be patched to work with Neon. This
 	directory contains such patches. They are applied to the extension
 	sources in compute-node.Dockerfile
 In addition to these, postgres itself, the neon postgres extension,
 and compute_ctl are built and copied into the compute image by
 compute-node.Dockerfile.

1968 compute/compute-node.Dockerfile Normal file

File diff suppressed because it is too large

									
17 compute/etc/README.md Normal file
												
				@@ -0,0 +1,17 @@

				# Compute Configuration

				These files are the configuration files for various other pieces of software

				that will be running in the compute alongside Postgres.

				## `sql_exporter`

				### Adding a `sql_exporter` Metric

				We use `sql_exporter` to export various metrics from Postgres. In order to add

				a metric, you will need to create two files: a `libsonnet` and a `sql` file. You

				will then import the `libsonnet` file in one of the collector files, and the

				`sql` file will be imported in the `libsonnet` file.

				In the event your statistic is an LSN, you may want to cast it to a `float8`

				because Prometheus only supports floats. It's probably fine because `float8` can

				store integers from `-2^53` to `+2^53` exactly.
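The `float8` claim above can be checked with a minimal Python sketch. Python's `float` is an IEEE 754 double, which is assumed here to match Postgres `float8`; the helper name `is_exact_as_float` is hypothetical and exists only for illustration.

```python
# Sketch: check whether an integer survives a round trip through a
# 64-bit IEEE 754 double (assumed equivalent to Postgres float8).

def is_exact_as_float(value: int) -> bool:
    """Return True if converting to float and back yields the same integer."""
    return int(float(value)) == value


limit = 2 ** 53

# Every integer up to 2^53 in magnitude is represented exactly.
print(is_exact_as_float(limit - 1))
print(is_exact_as_float(limit))
print(is_exact_as_float(-limit))

# The first integer past the limit is rounded to a neighbouring value.
print(is_exact_as_float(limit + 1))
```

An LSN is a byte position in the WAL, so a metric cast to `float8` stays exact until that position exceeds `2^53` bytes.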

									
56 compute/etc/neon_collector.jsonnet Normal file
												
				@@ -0,0 +1,56 @@

				{

				  collector_name: 'neon_collector',

				  metrics: [

				    import 'sql_exporter/checkpoints_req.libsonnet',

				    import 'sql_exporter/checkpoints_timed.libsonnet',

				    import 'sql_exporter/compute_backpressure_throttling_seconds_total.libsonnet',

				    import 'sql_exporter/compute_current_lsn.libsonnet',

				    import 'sql_exporter/compute_logical_snapshot_files.libsonnet',

				    import 'sql_exporter/compute_logical_snapshots_bytes.libsonnet',

				    import 'sql_exporter/compute_max_connections.libsonnet',

				    import 'sql_exporter/compute_receive_lsn.libsonnet',

				    import 'sql_exporter/compute_subscriptions_count.libsonnet',

				    import 'sql_exporter/connection_counts.libsonnet',

				    import 'sql_exporter/db_total_size.libsonnet',

				    import 'sql_exporter/file_cache_read_wait_seconds_bucket.libsonnet',

				    import 'sql_exporter/file_cache_read_wait_seconds_count.libsonnet',

				    import 'sql_exporter/file_cache_read_wait_seconds_sum.libsonnet',

				    import 'sql_exporter/file_cache_write_wait_seconds_bucket.libsonnet',

				    import 'sql_exporter/file_cache_write_wait_seconds_count.libsonnet',

				    import 'sql_exporter/file_cache_write_wait_seconds_sum.libsonnet',

				    import 'sql_exporter/getpage_prefetch_discards_total.libsonnet',

				    import 'sql_exporter/getpage_prefetch_misses_total.libsonnet',

				    import 'sql_exporter/getpage_prefetch_requests_total.libsonnet',

				    import 'sql_exporter/getpage_prefetches_buffered.libsonnet',

				    import 'sql_exporter/getpage_sync_requests_total.libsonnet',

				    import 'sql_exporter/getpage_wait_seconds_bucket.libsonnet',

				    import 'sql_exporter/getpage_wait_seconds_count.libsonnet',

				    import 'sql_exporter/getpage_wait_seconds_sum.libsonnet',

				    import 'sql_exporter/lfc_approximate_working_set_size.libsonnet',

				    import 'sql_exporter/lfc_approximate_working_set_size_windows.libsonnet',

				    import 'sql_exporter/lfc_cache_size_limit.libsonnet',

				    import 'sql_exporter/lfc_chunk_size.libsonnet',

				    import 'sql_exporter/lfc_hits.libsonnet',

				    import 'sql_exporter/lfc_misses.libsonnet',

				    import 'sql_exporter/lfc_used.libsonnet',

				    import 'sql_exporter/lfc_used_pages.libsonnet',

				    import 'sql_exporter/lfc_writes.libsonnet',

				    import 'sql_exporter/logical_slot_restart_lsn.libsonnet',

				    import 'sql_exporter/max_cluster_size.libsonnet',

				    import 'sql_exporter/pageserver_disconnects_total.libsonnet',

				    import 'sql_exporter/pageserver_requests_sent_total.libsonnet',

				    import 'sql_exporter/pageserver_send_flushes_total.libsonnet',

				    import 'sql_exporter/pageserver_open_requests.libsonnet',

				    import 'sql_exporter/pg_stats_userdb.libsonnet',

				    import 'sql_exporter/replication_delay_bytes.libsonnet',

				    import 'sql_exporter/replication_delay_seconds.libsonnet',

				    import 'sql_exporter/retained_wal.libsonnet',

				    import 'sql_exporter/wal_is_lost.libsonnet',

				  ],

				  queries: [

				    {

				      query_name: 'neon_perf_counters',

				      query: importstr 'sql_exporter/neon_perf_counters.sql',

				    },

				  ],

				}

									
11 compute/etc/neon_collector_autoscaling.jsonnet Normal file
												
				@@ -0,0 +1,11 @@

				{

				  collector_name: 'neon_collector_autoscaling',

				  metrics: [

				    import 'sql_exporter/lfc_approximate_working_set_size_windows.autoscaling.libsonnet',

				    import 'sql_exporter/lfc_cache_size_limit.libsonnet',

				    import 'sql_exporter/lfc_hits.libsonnet',

				    import 'sql_exporter/lfc_misses.libsonnet',

				    import 'sql_exporter/lfc_used.libsonnet',

				    import 'sql_exporter/lfc_writes.libsonnet',

				  ],

				}

									
30 compute/etc/pgbouncer.ini Normal file
												
				@@ -0,0 +1,30 @@

				[databases]

				;; pgbouncer propagates application_name (if it's specified) to the server, but some

				;; clients don't set it. We set default application_name=pgbouncer to make it

				;; easier to identify pgbouncer connections in Postgres. If client sets

				;; application_name, it will be used instead.

				*=host=localhost port=5432 auth_user=cloud_admin application_name=pgbouncer

				[pgbouncer]

				listen_port=6432

				listen_addr=0.0.0.0

				auth_type=scram-sha-256

				auth_user=cloud_admin

				auth_dbname=postgres

				client_tls_sslmode=disable

				server_tls_sslmode=disable

				pool_mode=transaction

				max_client_conn=10000

				default_pool_size=64

				max_prepared_statements=0

				admin_users=postgres

				unix_socket_dir=/tmp/

				unix_socket_mode=0777

				; required for pgbouncer_exporter

				ignore_startup_parameters=extra_float_digits

				;; Disable connection logging. It produces a lot of logs that no one looks at,

				;; and we can get similar log entries from the proxy too. We had incidents in

				;; the past where the logging significantly stressed the log device or pgbouncer

				;; itself.

				log_connections=0

				log_disconnections=0

0 compute/etc/postgres_exporter.yml Normal file

									
40 compute/etc/sql_exporter.jsonnet Normal file
												
				@@ -0,0 +1,40 @@

				function(collector_name, collector_file, connection_string) {

				  // Configuration for sql_exporter for autoscaling-agent

				  // Global defaults.

				  global: {

				    // If scrape_timeout <= 0, no timeout is set unless Prometheus provides one. The default is 10s.

				    scrape_timeout: '10s',

				    // Subtracted from Prometheus' scrape_timeout to give us some headroom and prevent Prometheus from timing out first.

				    scrape_timeout_offset: '500ms',

				    // Minimum interval between collector runs: by default (0s) collectors are executed on every scrape.

				    min_interval: '0s',

				    // Maximum number of open connections to any one target. Metric queries will run concurrently on multiple connections,

				    // as will concurrent scrapes.

				    max_connections: 1,

				    // Maximum number of idle connections to any one target. Unless you use very long collection intervals, this should

				    // always be the same as max_connections.

				    max_idle_connections: 1,

				    // Maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse.

				    // If 0, connections are not closed due to a connection's age.

				    max_connection_lifetime: '5m',

				  },

				  // The target to monitor and the collectors to execute on it.

				  target: {

				    // Data source name always has a URI schema that matches the driver name. In some cases (e.g. MySQL)

				    // the schema gets dropped or replaced to match the driver expected DSN format.

				    data_source_name: connection_string,

				    // Collectors (referenced by name) to execute on the target.

				    // Glob patterns are supported (see <https://pkg.go.dev/path/filepath#Match> for syntax).

				    collectors: [

				      collector_name,

				    ],

				  },

				  // Collector files specifies a list of globs. One collector definition is read from each matching file.

				  // Glob patterns are supported (see <https://pkg.go.dev/path/filepath#Match> for syntax).

				  collector_files: [

				    collector_file,

				  ],

				}
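As a rough illustration of what the Jsonnet function above evaluates to, here is a Python sketch that builds the same structure as a plain dictionary. The function name `build_sql_exporter_config` is hypothetical. The three parameters mirror the top-level arguments that `compute/Makefile` passes with `--tla-str`, and the field values are copied from the Jsonnet source.

```python
# Sketch: the sql_exporter.jsonnet function expressed as a Python dict builder.
# build_sql_exporter_config is an illustrative name, not part of the repository.

def build_sql_exporter_config(collector_name: str,
                              collector_file: str,
                              connection_string: str) -> dict:
    return {
        "global": {
            "scrape_timeout": "10s",
            "scrape_timeout_offset": "500ms",
            "min_interval": "0s",
            "max_connections": 1,
            "max_idle_connections": 1,
            "max_connection_lifetime": "5m",
        },
        "target": {
            "data_source_name": connection_string,
            "collectors": [collector_name],
        },
        "collector_files": [collector_file],
    }


# Arguments taken from the sql_exporter.yml rule in compute/Makefile.
config = build_sql_exporter_config(
    "neon_collector",
    "neon_collector.yml",
    "postgresql://cloud_admin@127.0.0.1:5432/postgres"
    "?sslmode=disable&application_name=sql_exporter",
)
print(config["target"]["collectors"])
```

The autoscaling variant differs only in its arguments, which is why one Jsonnet file can serve both Makefile targets.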

									
1 compute/etc/sql_exporter/checkpoints_req.17.sql Normal file
												
				@@ -0,0 +1 @@

				SELECT num_requested AS checkpoints_req FROM pg_stat_checkpointer;

									
15 compute/etc/sql_exporter/checkpoints_req.libsonnet Normal file
												
				@@ -0,0 +1,15 @@

				local neon = import 'neon.libsonnet';

				local pg_stat_bgwriter = importstr 'sql_exporter/checkpoints_req.sql';

				local pg_stat_checkpointer = importstr 'sql_exporter/checkpoints_req.17.sql';

				{

				  metric_name: 'checkpoints_req',

				  type: 'gauge',

				  help: 'Number of requested checkpoints',

				  key_labels: null,

				  values: [

				    'checkpoints_req',

				  ],

				  query: if neon.PG_MAJORVERSION_NUM < 17 then pg_stat_bgwriter else pg_stat_checkpointer,

				}
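The version switch in the `query` field above can be sketched in Python as follows. The function name `checkpoints_req_query` is hypothetical, and the major version is passed in as an integer because `neon.libsonnet`, which derives `PG_MAJORVERSION_NUM`, is not shown in this diff. Both SQL strings are copied from the `.sql` files in this change.

```python
# Sketch: choose the checkpoints_req query by Postgres major version,
# mirroring `if neon.PG_MAJORVERSION_NUM < 17 then ... else ...`.

PG_STAT_BGWRITER = "SELECT checkpoints_req FROM pg_stat_bgwriter;"
PG_STAT_CHECKPOINTER = (
    "SELECT num_requested AS checkpoints_req FROM pg_stat_checkpointer;"
)


def checkpoints_req_query(pg_major_version: int) -> str:
    """Return the query text for the given major version."""
    if pg_major_version < 17:
        return PG_STAT_BGWRITER
    return PG_STAT_CHECKPOINTER


for version in (14, 16, 17):
    print(version, checkpoints_req_query(version))
```

Aliasing `num_requested` back to `checkpoints_req` in the newer query keeps the column listed under `values` identical for both versions.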

									
1 compute/etc/sql_exporter/checkpoints_req.sql Normal file
												
				@@ -0,0 +1 @@

				SELECT checkpoints_req FROM pg_stat_bgwriter;

									
1 compute/etc/sql_exporter/checkpoints_timed.17.sql Normal file
												
				@@ -0,0 +1 @@

				SELECT num_timed AS checkpoints_timed FROM pg_stat_checkpointer;

									
15 compute/etc/sql_exporter/checkpoints_timed.libsonnet Normal file
												
				@@ -0,0 +1,15 @@

				local neon = import 'neon.libsonnet';

				local pg_stat_bgwriter = importstr 'sql_exporter/checkpoints_timed.sql';

				local pg_stat_checkpointer = importstr 'sql_exporter/checkpoints_timed.17.sql';

				{

				  metric_name: 'checkpoints_timed',

				  type: 'gauge',

				  help: 'Number of scheduled checkpoints',

				  key_labels: null,

				  values: [

				    'checkpoints_timed',

				  ],

				  query: if neon.PG_MAJORVERSION_NUM < 17 then pg_stat_bgwriter else pg_stat_checkpointer,

				}

									
1 compute/etc/sql_exporter/checkpoints_timed.sql Normal file
												
				@@ -0,0 +1 @@

				SELECT checkpoints_timed FROM pg_stat_bgwriter;

									
10 compute/etc/sql_exporter/compute_backpressure_throttling_seconds_total.libsonnet Normal file
												
				@@ -0,0 +1,10 @@

				{

				  metric_name: 'compute_backpressure_throttling_seconds_total',

				  type: 'counter',

				  help: 'Time compute has spent throttled',

				  key_labels: null,

				  values: [

				    'throttled',

				  ],

				  query: importstr 'sql_exporter/compute_backpressure_throttling_seconds_total.sql',

				}

									
1 compute/etc/sql_exporter/compute_backpressure_throttling_seconds_total.sql Normal file
												
				@@ -0,0 +1 @@

				SELECT (neon.backpressure_throttling_time()::float8 / 1000000) AS throttled;

									
10 compute/etc/sql_exporter/compute_current_lsn.libsonnet Normal file
												
				@@ -0,0 +1,10 @@

				{

				  metric_name: 'compute_current_lsn',

				  type: 'gauge',

				  help: 'Current LSN of the database',

				  key_labels: null,

				  values: [

				    'lsn',

				  ],

				  query: importstr 'sql_exporter/compute_current_lsn.sql',

				}

									
4 compute/etc/sql_exporter/compute_current_lsn.sql Normal file
												
				@@ -0,0 +1,4 @@

				SELECT CASE

				  WHEN pg_catalog.pg_is_in_recovery() THEN (pg_last_wal_replay_lsn() - '0/0')::FLOAT8

				  ELSE (pg_current_wal_lsn() - '0/0')::FLOAT8

				END AS lsn;
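The expression `lsn - '0/0'` above turns a `pg_lsn` into a byte offset before the cast to `FLOAT8`. A minimal Python sketch of the same conversion, assuming the usual textual LSN form of two hexadecimal numbers separated by a slash (high 32 bits, then low 32 bits); `lsn_to_bytes` is a hypothetical helper name.

```python
# Sketch: convert a textual LSN such as '0/16B3748' to a byte offset,
# which is the quantity the query computes with `lsn - '0/0'`.

def lsn_to_bytes(lsn: str) -> int:
    """Parse 'HI/LO' (both hexadecimal) into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


for text in ("0/0", "0/16B3748", "1/0"):
    offset = lsn_to_bytes(text)
    # float(offset) stands in for the ::FLOAT8 cast in the query.
    print(text, offset, float(offset))
```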

									
12 compute/etc/sql_exporter/compute_logical_snapshot_files.libsonnet Normal file
												
				@@ -0,0 +1,12 @@

				{

				  metric_name: 'compute_logical_snapshot_files',

				  type: 'gauge',

				  help: 'Number of snapshot files in pg_logical/snapshot',

				  key_labels: [

				    'timeline_id',

				  ],

				  values: [

				    'num_logical_snapshot_files',

				  ],

				  query: importstr 'sql_exporter/compute_logical_snapshot_files.sql',

				}

									
7 compute/etc/sql_exporter/compute_logical_snapshot_files.sql Normal file
												
				@@ -0,0 +1,7 @@

				SELECT

				  (SELECT setting FROM pg_settings WHERE name = 'neon.timeline_id') AS timeline_id,

				  -- Postgres creates temporary snapshot files of the form %X-%X.snap.%d.tmp.

				  -- These temporary snapshot files are renamed to the actual snapshot files

				  -- after they are completely built. We only WAL-log the completely built

				  -- snapshot files

				  (SELECT COUNT(*) FROM pg_ls_dir('pg_logical/snapshots') AS name WHERE name LIKE '%.snap') AS num_logical_snapshot_files;
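The `LIKE '%.snap'` filter above keeps completed snapshot files and skips the temporary `%X-%X.snap.%d.tmp` files described in the SQL comment. A small Python sketch of the same filter, using made-up file names; `count_built_snapshots` is a hypothetical helper.

```python
# Sketch: count completed snapshot files the way `LIKE '%.snap'` does,
# i.e. by matching names that end in '.snap'.

def count_built_snapshots(names: list[str]) -> int:
    """Count directory entries whose name ends with '.snap'."""
    return sum(1 for name in names if name.endswith(".snap"))


# Illustrative directory listing, not taken from a real system.
listing = [
    "0-16B3748.snap",           # completed snapshot: counted
    "0-16B3750.snap.4242.tmp",  # still being written: skipped
    "0-16B3800.snap",           # completed snapshot: counted
]
print(count_built_snapshots(listing))
```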

									
7 compute/etc/sql_exporter/compute_logical_snapshots_bytes.15.sql Normal file
												
				@@ -0,0 +1,7 @@

				SELECT

				  (SELECT current_setting('neon.timeline_id')) AS timeline_id,

				  -- Postgres creates temporary snapshot files of the form %X-%X.snap.%d.tmp.

				  -- These temporary snapshot files are renamed to the actual snapshot files

				  -- after they are completely built. We only WAL-log the completely built

				  -- snapshot files

				  (SELECT COALESCE(sum(size), 0) FROM pg_ls_logicalsnapdir() WHERE name LIKE '%.snap') AS logical_snapshots_bytes;

									
17 compute/etc/sql_exporter/compute_logical_snapshots_bytes.libsonnet Normal file
												
				@@ -0,0 +1,17 @@

				local neon = import 'neon.libsonnet';

				local pg_ls_logicalsnapdir = importstr 'sql_exporter/compute_logical_snapshots_bytes.15.sql';

				local pg_ls_dir = importstr 'sql_exporter/compute_logical_snapshots_bytes.sql';

				{

				  metric_name: 'compute_logical_snapshots_bytes',

				  type: 'gauge',

				  help: 'Size of the pg_logical/snapshots directory, not including temporary files',

				  key_labels: [

				    'timeline_id',

				  ],

				  values: [

				    'logical_snapshots_bytes',

				  ],

				  query: if neon.PG_MAJORVERSION_NUM < 15 then pg_ls_dir else pg_ls_logicalsnapdir,

				}

									
9 compute/etc/sql_exporter/compute_logical_snapshots_bytes.sql Normal file
												
				@@ -0,0 +1,9 @@

				SELECT

				  (SELECT setting FROM pg_settings WHERE name = 'neon.timeline_id') AS timeline_id,

				  -- Postgres creates temporary snapshot files of the form %X-%X.snap.%d.tmp.

				  -- These temporary snapshot files are renamed to the actual snapshot files

				  -- after they are completely built. We only WAL-log the completely built

				  -- snapshot files

				  (SELECT COALESCE(sum((pg_stat_file('pg_logical/snapshots/' || name, missing_ok => true)).size), 0)

				    FROM (SELECT * FROM pg_ls_dir('pg_logical/snapshots') WHERE pg_ls_dir LIKE '%.snap') AS name

				  ) AS logical_snapshots_bytes;

									
10 compute/etc/sql_exporter/compute_max_connections.libsonnet Normal file
												
				@@ -0,0 +1,10 @@

				{

				  metric_name: 'compute_max_connections',

				  type: 'gauge',

				  help: 'Max connections allowed for Postgres',

				  key_labels: null,

				  values: [

				    'max_connections',

				  ],

				  query: importstr 'sql_exporter/compute_max_connections.sql',

				}

									
1 compute/etc/sql_exporter/compute_max_connections.sql Normal file
												
				@@ -0,0 +1 @@

				SELECT current_setting('max_connections') as max_connections;

									
10 compute/etc/sql_exporter/compute_receive_lsn.libsonnet Normal file
												
				@@ -0,0 +1,10 @@

				{

				  metric_name: 'compute_receive_lsn',

				  type: 'gauge',

				  help: 'Returns the last write-ahead log location that has been received and synced to disk by streaming replication',

				  key_labels: null,

				  values: [

				    'lsn',

				  ],

				  query: importstr 'sql_exporter/compute_receive_lsn.sql',

				}

									
4 compute/etc/sql_exporter/compute_receive_lsn.sql Normal file
												
				@@ -0,0 +1,4 @@

				SELECT CASE

				  WHEN pg_catalog.pg_is_in_recovery() THEN (pg_last_wal_receive_lsn() - '0/0')::FLOAT8

				  ELSE 0

				END AS lsn;

									
12 compute/etc/sql_exporter/compute_subscriptions_count.libsonnet Normal file
												
				@@ -0,0 +1,12 @@

				{

				  metric_name: 'compute_subscriptions_count',

				  type: 'gauge',

				  help: 'Number of logical replication subscriptions grouped by enabled/disabled',

				  key_labels: [

				    'enabled',

				  ],

				  values: [

				    'subscriptions_count',

				  ],

				  query: importstr 'sql_exporter/compute_subscriptions_count.sql',

				}


Compare commits

2245 Commits hackathon/ ... release-82

10 .cargo/config.toml
3 .config/hakari.toml
8 .dockerignore
1 .github/ISSUE_TEMPLATE/bug-template.md vendored
1 .github/ISSUE_TEMPLATE/epic-template.md vendored
21 .github/PULL_REQUEST_TEMPLATE/release-pr.md vendored
20 .github/actionlint.yml vendored
29 .github/actions/allure-report-generate/action.yml vendored
21 .github/actions/allure-report-store/action.yml vendored
9 .github/actions/download/action.yml vendored
12 .github/actions/neon-branch-create/action.yml vendored
74 .github/actions/neon-project-create/action.yml vendored
49 .github/actions/run-python-test-set/action.yml vendored
2 .github/actions/save-coverage-data/action.yml vendored
36 .github/actions/set-docker-config-dir/action.yml vendored
29 .github/actions/upload/action.yml vendored
13 .github/file-filters.yaml vendored Normal file
11 .github/pull_request_template.md vendored
69 .github/scripts/generate_image_maps.py vendored Normal file
110 .github/scripts/lint-release-pr.sh vendored Executable file
31 .github/scripts/previous-releases.jq vendored Normal file
45 .github/scripts/push_with_image_map.py vendored Normal file
60 .github/workflows/_benchmarking_preparation.yml vendored
204 .github/workflows/_build-and-test-locally.yml vendored
55 .github/workflows/_check-codestyle-python.yml vendored Normal file
102 .github/workflows/_check-codestyle-rust.yml vendored Normal file
100 .github/workflows/_create-release-pr.yml vendored Normal file
169 .github/workflows/_meta.yml vendored Normal file
56 .github/workflows/_push-to-acr.yml vendored
128 .github/workflows/_push-to-container-registry.yml vendored Normal file
11 .github/workflows/actionlint.yml vendored
29 .github/workflows/approved-for-ci-run.yml vendored
485 .github/workflows/benchmarking.yml vendored
178 .github/workflows/build-build-tools-image.yml vendored
304 .github/workflows/build-macos.yml vendored Normal file
1401 .github/workflows/build_and_test.yml vendored
144 .github/workflows/build_and_test_with_sanitizers.yml vendored Normal file
69 .github/workflows/cargo-deny.yml vendored Normal file
51 .github/workflows/check-build-tools-image.yml vendored
5 .github/workflows/check-permissions.yml vendored
5 .github/workflows/cleanup-caches-by-a-branch.yml vendored
138 .github/workflows/cloud-regress.yml vendored Normal file
41 .github/workflows/fast-forward.yml vendored Normal file
82 .github/workflows/force-test-extensions-upgrade.yml vendored Normal file
200 .github/workflows/ingest_benchmark.yml vendored Normal file
10 .github/workflows/label-for-external-users.yml vendored
194 .github/workflows/large_oltp_benchmark.yml vendored Normal file
32 .github/workflows/lint-release-pr.yml vendored Normal file
181 .github/workflows/neon_extra_builds.yml vendored
69 .github/workflows/periodic_pagebench.yml vendored
56 .github/workflows/pg-clients.yml vendored
82 .github/workflows/pin-build-tools-image.yml vendored
177 .github/workflows/pre-merge-checks.yml vendored Normal file

46 .github/workflows/regenerate-pg-setting.yml vendored Normal file Unescape Escape View File

7 .github/workflows/release-notify.yml vendored Unescape Escape View File

104 .github/workflows/release.yml vendored Unescape Escape View File

71 .github/workflows/report-workflow-stats-batch.yml vendored Normal file Unescape Escape View File

114 .github/workflows/trigger-e2e-tests.yml vendored Unescape Escape View File

2 .gitignore vendored Unescape Escape View File

4 .gitmodules vendored Unescape Escape View File

32 CODEOWNERS Unescape Escape View File

3308 Cargo.lock generated View File

164 Cargo.toml Unescape Escape View File

65 Dockerfile Unescape Escape View File

229 Dockerfile.build-tools Unescape Escape View File

1039 Dockerfile.compute-node View File

103 Makefile Unescape Escape View File

12 README.md Unescape Escape View File

340 build-tools.Dockerfile Normal file Unescape Escape View File

57 build_tools/patches/pgcopydbv017.patch Normal file Unescape Escape View File

5 compute/.gitignore vendored Normal file Unescape Escape View File

50 compute/Makefile Normal file Unescape Escape View File

21 compute/README.md Normal file Unescape Escape View File

1968 compute/compute-node.Dockerfile Normal file View File

17 compute/etc/README.md Normal file Unescape Escape View File

56 compute/etc/neon_collector.jsonnet Normal file Unescape Escape View File

11 compute/etc/neon_collector_autoscaling.jsonnet Normal file Unescape Escape View File

30 compute/etc/pgbouncer.ini Normal file Unescape Escape View File

2245 Commits

hackathon/ ... release-82

10 .cargo/config.toml
3 .config/hakari.toml
8 .dockerignore
1 .github/ISSUE_TEMPLATE/bug-template.md vendored
1 .github/ISSUE_TEMPLATE/epic-template.md vendored
21 .github/PULL_REQUEST_TEMPLATE/release-pr.md vendored
20 .github/actionlint.yml vendored
29 .github/actions/allure-report-generate/action.yml vendored
21 .github/actions/allure-report-store/action.yml vendored
9 .github/actions/download/action.yml vendored
12 .github/actions/neon-branch-create/action.yml vendored
74 .github/actions/neon-project-create/action.yml vendored
49 .github/actions/run-python-test-set/action.yml vendored
2 .github/actions/save-coverage-data/action.yml vendored
36 .github/actions/set-docker-config-dir/action.yml vendored
29 .github/actions/upload/action.yml vendored
13 .github/file-filters.yaml vendored Normal file
11 .github/pull_request_template.md vendored
69 .github/scripts/generate_image_maps.py vendored Normal file
110 .github/scripts/lint-release-pr.sh vendored Executable file
31 .github/scripts/previous-releases.jq vendored Normal file
45 .github/scripts/push_with_image_map.py vendored Normal file
60 .github/workflows/_benchmarking_preparation.yml vendored
204 .github/workflows/_build-and-test-locally.yml vendored
55 .github/workflows/_check-codestyle-python.yml vendored Normal file
102 .github/workflows/_check-codestyle-rust.yml vendored Normal file
100 .github/workflows/_create-release-pr.yml vendored Normal file
169 .github/workflows/_meta.yml vendored Normal file
56 .github/workflows/_push-to-acr.yml vendored
128 .github/workflows/_push-to-container-registry.yml vendored Normal file
11 .github/workflows/actionlint.yml vendored
29 .github/workflows/approved-for-ci-run.yml vendored
485 .github/workflows/benchmarking.yml vendored
178 .github/workflows/build-build-tools-image.yml vendored
304 .github/workflows/build-macos.yml vendored Normal file
1401 .github/workflows/build_and_test.yml vendored
144 .github/workflows/build_and_test_with_sanitizers.yml vendored Normal file
69 .github/workflows/cargo-deny.yml vendored Normal file
51 .github/workflows/check-build-tools-image.yml vendored
5 .github/workflows/check-permissions.yml vendored
5 .github/workflows/cleanup-caches-by-a-branch.yml vendored
138 .github/workflows/cloud-regress.yml vendored Normal file
41 .github/workflows/fast-forward.yml vendored Normal file
82 .github/workflows/force-test-extensions-upgrade.yml vendored Normal file
200 .github/workflows/ingest_benchmark.yml vendored Normal file
10 .github/workflows/label-for-external-users.yml vendored
194 .github/workflows/large_oltp_benchmark.yml vendored Normal file
32 .github/workflows/lint-release-pr.yml vendored Normal file
181 .github/workflows/neon_extra_builds.yml vendored
69 .github/workflows/periodic_pagebench.yml vendored
56 .github/workflows/pg-clients.yml vendored
82 .github/workflows/pin-build-tools-image.yml vendored
177 .github/workflows/pre-merge-checks.yml vendored Normal file
46 .github/workflows/regenerate-pg-setting.yml vendored Normal file
7 .github/workflows/release-notify.yml vendored
104 .github/workflows/release.yml vendored
71 .github/workflows/report-workflow-stats-batch.yml vendored Normal file
114 .github/workflows/trigger-e2e-tests.yml vendored
2 .gitignore vendored
4 .gitmodules vendored
32 CODEOWNERS
3308 Cargo.lock generated
164 Cargo.toml
65 Dockerfile
229 Dockerfile.build-tools
1039 Dockerfile.compute-node
103 Makefile
12 README.md
340 build-tools.Dockerfile Normal file
57 build_tools/patches/pgcopydbv017.patch Normal file
5 compute/.gitignore vendored Normal file
50 compute/Makefile Normal file
21 compute/README.md Normal file
1968 compute/compute-node.Dockerfile Normal file
17 compute/etc/README.md Normal file
56 compute/etc/neon_collector.jsonnet Normal file
11 compute/etc/neon_collector_autoscaling.jsonnet Normal file
30 compute/etc/pgbouncer.ini Normal file
0 compute/etc/postgres_exporter.yml Normal file