Commit Graph

1276 Commits

Tristan Partin
2526f6aea1 Add remote extension test with library component (#11301)
The current test covered only SQL files, but we also want to test a
remote extension that includes a loadable library. With both extensions
we should cover a larger portion of compute_ctl's remote extension code
paths.

Fixes: https://github.com/neondatabase/neon/issues/11146

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-24 22:33:46 +00:00
Vlad Lazar
5ba7315c84 storage_controller: reconcile completed imports at start-up (#11614)
## Problem

In https://github.com/neondatabase/neon/pull/11345 coordination of
imports moved to the storage controller.
It involves notifying cplane when the import has been completed by
calling an idempotent endpoint.

If the storage controller shuts down in the middle of finalizing an
import, the notification would never be retried.

## Summary of changes

Reconcile imports at start-up by fetching the complete imports from the
database and spawning a background
task which notifies cplane.
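
Illustratively (a Python sketch with made-up names for the database and
cplane client; the actual controller code is Rust), the start-up path
looks like:

```
import asyncio

async def reconcile_imports_on_startup(db, cplane):
    # Imports that finished, but whose cplane notification may have been
    # lost when the previous process shut down mid-finalize.
    for imp in await db.list_complete_timeline_imports():
        # The cplane endpoint is idempotent, so re-notifying after a
        # crash is safe.
        asyncio.create_task(cplane.notify_import_complete(imp))
```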

Closes: https://github.com/neondatabase/neon/issues/11570
2025-04-24 18:39:19 +00:00
Vlad Lazar
6f7e3c18e4 storage_controller: make leadership protocol more robust (#11703)
## Problem

We saw the following scenario in staging:
1. Pod A starts up. Becomes leader and steps down the previous pod
cleanly.
2. Pod B starts up (deployment).
3. Step down request from pod B to pod A times out. Pod A did not manage
to stop its reconciliations within 10 seconds and exited with return
code 1
([code](7ba8519b43/storage_controller/src/service.rs (L8686-L8702))).
4. Pod B marks itself as the leader and finishes start-up
5. k8s restarts pod A
6. k8s marks pod B as ready
7. pod A sends a step down request to pod B - this succeeds => pod A is
now the leader
8. k8s kills pod A because it thinks pod B is healthy and pod A is part
of the old replica set

We end up in a situation where the only pod we have (B) is stepped down
and attempts to forward requests to a leader that doesn't exist. k8s
can't detect that pod B is in a bad state since the /status endpoint
simply returns 200 if the pod is running.

## Summary of changes

This PR includes a number of robustness improvements to the leadership
protocol:
* use a single step down task per controller
* add a new endpoint to be used as a k8s liveness probe and check
leadership status there (sketched below)
* handle restarts explicitly (i.e. don't step yourself down)
* increase the step down retry count
* don't kill the process on long step down since k8s will just restart
it
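
A rough sketch of the liveness probe idea (Python with invented names;
the controller itself is Rust):

```
from http import HTTPStatus

def handle_live(service):
    # /status returns 200 whenever the process runs; the liveness probe
    # additionally fails once this controller has been stepped down, so
    # k8s restarts it instead of leaving a stepped-down pod serving.
    if service.leadership_status() == "SteppedDown":
        return HTTPStatus.SERVICE_UNAVAILABLE
    return HTTPStatus.OK
```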
2025-04-24 16:59:56 +00:00
Alexey Kondratov
985056be37 feat(compute): Introduce Postgres downtime metrics (#11346)
## Problem

Currently, we only report the timestamp of the last moment we think
Postgres was active. The problem is that if Postgres becomes completely
unresponsive, we still report some old timestamp, and it's impossible to
distinguish between 'Postgres is effectively down' and 'Postgres is
running, but there is no client activity'.

## Summary of changes

Refactor compute_ctl's compute monitor so that it is easier to track
connection errors and failed activity checks, and report
- `now() - last_successful_check` as current downtime on any failure
- cumulative Postgres downtime during the whole compute lifetime
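
A minimal sketch of the accounting (field names invented; compute_ctl's
monitor is Rust):

```
import time

class DowntimeTracker:
    def __init__(self):
        now = time.monotonic()
        self.last_successful_check = now
        self._last_check = now
        self.cumulative_downtime = 0.0  # seconds over the compute lifetime

    def record_check(self, ok: bool) -> float:
        """Return the current downtime in seconds (0.0 while Postgres is up)."""
        now = time.monotonic()
        if ok:
            self.last_successful_check = now
            current = 0.0
        else:
            current = now - self.last_successful_check
            self.cumulative_downtime += now - self._last_check
        self._last_check = now
        return current
```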

After adding a test, I also noticed that the compute monitor may not
reconnect even though queries fail with `connection closed` or `error
communicating with the server: Connection reset by peer (os error 54)`,
but for some reason we do not catch it with `client.is_closed()`, so I
added an explicit reconnect in case of any failures.

Discussion:
https://neondb.slack.com/archives/C03TN5G758R/p1742489426966639
2025-04-24 13:51:09 +00:00
Vlad Lazar
3a50d95b6d storage_controller: coordinate imports across shards in the storage controller (#11345)
## Problem

Pageservers notify control plane directly when a shard import has
completed.
Control plane has to download the status of each shard from S3 and
figure out if everything is truly done,
before proceeding with branch activation.

Issues with this approach are:
* We can't control shard split behaviour on the storage controller side.
It's unsafe to split
during import.
* Control plane needs to know about shards and implement logic to check
all timelines are indeed ready.

## Summary of changes

In short, the storage controller coordinates imports and notifies
control plane only when everything is done.

Big rocks:
1. Store timeline imports in the storage controller database. Each
import stores the status of its shards in the database.
We hook into the timeline creation call as our entry point for this.
2. Pageservers get a new upcall endpoint to notify the storage
controller of shard import updates.
3. Storage controller handles these updates by updating persisted state.
If an update finalizes the import, the controller polls pageservers
until timeline activation and then notifies the control plane that the
import is complete.

Cplane side change with new endpoint is in
https://github.com/neondatabase/cloud/pull/26166

Closes https://github.com/neondatabase/neon/issues/11566
2025-04-24 11:26:06 +00:00
Mikhail Kot
c3534cea39 Rename object_storage->endpoint_storage (#11678)
1. Rename service to avoid ambiguity as discussed in Slack
2. Ignore endpoint_id in read paths as requested in
https://github.com/neondatabase/cloud/issues/26346#issuecomment-2806758224
2025-04-23 14:03:19 +00:00
Tristan Partin
b00db536bb Add CPU architecture to the remote extensions object key (#11590)
ARM computes are incoming and we need to account for that in remote
extensions. Previously, we just blindly assumed that all computes were
x86_64.

Note that we use the Go architecture naming convention instead of the
Rust one directly, to stay consistent across the stack.
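
For illustration, the mapping and a hypothetical key layout (only the
arch component reflects this change; the rest of the path is made up):

```
# GOARCH names for the architectures we care about.
GO_ARCH = {"x86_64": "amd64", "aarch64": "arm64"}

def remote_extension_key(build_tag: str, rust_arch: str, filename: str) -> str:
    # Hypothetical layout; the point is the Go-style arch component.
    return f"{build_tag}/{GO_ARCH[rust_arch]}/extensions/{filename}"

assert (remote_extension_key("release-7000", "aarch64", "pg_embedding.tar.zst")
        == "release-7000/arm64/extensions/pg_embedding.tar.zst")
```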

Part-of: https://github.com/neondatabase/cloud/issues/23148

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-22 22:47:22 +00:00
Arpad Müller
149cbd1e0a Support single and two safekeeper scenarios (#11483)
In tests, and when one safekeeper is down in small regions, we need to
contend with one or two safekeepers. Before, we returned an error from
`safekeepers_for_new_timeline`. Now we silently allow the timeline to be
created on one or two safekeepers.

Part of #9011
2025-04-22 21:27:01 +00:00
Alexander Lakhin
7b949daf13 fix(test): allow reconcile errors in test_storage_controller_heartbeats (#11665)
## Problem

test_storage_controller_heartbeats is flaky because of reconciler
errors that are not in the allow list (#11625)

## Summary of changes

Allow reconcile errors as in other tests in test_storage_controller.py.
2025-04-22 18:13:16 +00:00
Arpad Müller
c1e4befd56 Additional fixes and improvements to storcon safekeeper timelines (#11477)
This delivers some additional fixes and improvements to storcon managed
safekeeper timelines:

* use `i32::MAX` for the generation number of timeline deletion
* start the generation for new timelines at 1 instead of 0: this ensures
that the other components actually are generation enabled
* fix database operations we use for metrics
* use a join in list_pending_ops to prevent the classic N+1 ORM issue
where one does many DB queries
* use enums in `test_storcon_create_delete_sk_down`. we are adding a
second parameter, and having two bool parameters is weird.
* extend `test_storcon_create_delete_sk_down` with a test of whole
tenant deletion. this hasn't been tested before.
* remove some redundant logging contexts
* Don't require mutable access to the service lock for scheduling
pending ops in memory. In order to pull this off, create reconcilers
eagerly.

Part of #9011

---------

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2025-04-17 20:25:30 +00:00
Alex Chi Z.
ad0c5fdae7 fix(test): allow stale generation warnings in storcon (#11624)
## Problem

https://github.com/neondatabase/neon/pull/11531 did not fully fix the
problem because the warning is part of the storcon instead of
pageserver.

## Summary of changes

Allow stale generation error in storcon.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-17 16:12:24 +00:00
Christian Schwarz
2b041964b3 cover direct IO + concurrent IO in unit, regression & perf tests (#11585)
This mirrors the production config.

Thread that discusses the merits of this:
- https://neondb.slack.com/archives/C033RQ5SPDH/p1744742010740569

# Refs
- context
https://neondb.slack.com/archives/C04BLQ4LW7K/p1744724844844589?thread_ts=1744705831.014169&cid=C04BLQ4LW7K
- prep for https://github.com/neondatabase/neon/pull/11558 which adds
new io mode `direct-rw`

# Impact on CI turnaround time

Spot-checking impact on CI timings

- Baseline: [some recent main
commit](https://github.com/neondatabase/neon/actions/runs/14471549758/job/40587837475)
- Comparison: [this
commit](https://github.com/neondatabase/neon/actions/runs/14471945087/job/40589613274)
in this PR here

Impact on CI turnaround time

- Regression tests:
  - x64: very minor, sometimes better; likely in the noise
  - arm64: substantial: 30 min => 40 min
- Benchmarks (x86 only I think): very minor; noise seems higher than
regress tests

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Alex Chi Z. <4198311+skyzh@users.noreply.github.com>
Co-authored-by: Peter Bendel <peterbendel@neon.tech>
Co-authored-by: Alex Chi Z <chi@neon.tech>
2025-04-17 15:53:10 +00:00
Alexander Bayandin
07c2411f6b tests: remove mentions of ALLOW_*_COMPATIBILITY_BREAKAGE (#11618)
## Problem

There are mentions of `ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE` and
`ALLOW_FORWARD_COMPATIBILITY_BREAKAGE`, but in reality, this mechanism
doesn't work, so let's remove it to avoid confusion.

The idea behind it was to allow some breaking changes by adding a
special label to a PR that would `xfail` the test. However, in practice,
this means we would need to carry this label through all subsequent PRs
until the release (and artifact regeneration). This approach isn't
really viable, as it increases the risk of missing a compatibility break
in another PR.

## Summary of changes
- Remove mentions and handling of
`ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE` /
`ALLOW_FORWARD_COMPATIBILITY_BREAKAGE`
2025-04-17 10:03:21 +00:00
Konstantin Knizhnik
b7548de814 Disable autovacuum and increase limit for WS approximation (#11583)
## Problem

The LFC working set approximation test became flaky after recent
changes in prefetch. It may be caused by updating the HLL in
`lfc_write`, or by some other reason.

## Summary of changes

1. Disable autovacuum in this test (as possible source of extra page
accesses).
2. Increase upper boundary for WS approximation from 12 to 20.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-17 05:07:45 +00:00
Tristan Partin
9794f386f4 Make Postgres 17 the default version (#11619)
This is mostly a documentation update, but it includes a few updates to
neon_local, the pageserver, and tests.

17 is our default for users in production, so dropping references to 16
makes sense.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-16 23:23:37 +00:00
Vlad Lazar
0e00faf528 tests: stability fixes for test_migration_to_cold_secondary (#11606)
1. Compute may generate WAL on shutdown. The test assumes that after
shutdown, no further ingest happens. Tweak the compute shutdown to make
the assumption true.
2. The assertion on local layer count after the cold migration is not
right, since we may have downloaded layers due to ingest. Remove it.

Closes https://github.com/neondatabase/neon/issues/11587
2025-04-16 16:31:23 +00:00
Erik Grinaker
00eeff9b8d pageserver: add compaction_shard_ancestor to disable shard ancestor compaction (#11608)
## Problem

Splits of large tenants (several TB) can cause a huge amount of shard
ancestor compaction work, which can overload Pageservers.

Touches https://github.com/neondatabase/cloud/issues/22532.

## Summary of changes

Add a setting `compaction_shard_ancestor` (default `true`) to disable
shard ancestor compaction on a per-tenant basis.
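
A sketch of the gating in Python (the real setting is a per-tenant
config field on the pageserver; names around it are illustrative):

```
def should_compact_shard_ancestors(tenant_conf: dict) -> bool:
    # Default true preserves existing behaviour; operators can switch it
    # off per tenant while a large split's children catch up.
    return tenant_conf.get("compaction_shard_ancestor", True)
```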
2025-04-16 14:41:02 +00:00
Alex Chi Z.
aa19f10e7e fix(test): allow shutdown warning in preempt tests (#11600)
## Problem

test_gc_compaction_preempt is still flaky

## Summary of changes

- allow shutdown warning logs

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-15 21:50:28 +00:00
Alex Chi Z.
057ce115de fix(test): allow stale generation errors (1/2) (#11531)
## Problem

Part of https://github.com/neondatabase/neon/issues/11486

## Summary of changes

50% of the test instability of `test_create_churn_during_restart` is
due to the error message having changed. Allow the new error message.

Still need to fix other errors due to failure to acquire semaphore in
this or the next patch.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-14 14:51:17 +00:00
Vlad Lazar
a338984dc7 pageserver: support keys at different LSNs in one get page batch (#11494)
## Problem

Get page batching stops when we encounter requests at different LSNs.
We are leaving batching factor on the table.

## Summary of changes

The goal is to support keys with different LSNs in a single batch and
still serve them with a single vectored get.
Important restriction: the same key at different LSNs is not supported
in one batch. Returning different key
versions is a much more intrusive change.

Firstly, the read path is changed to support "scattered" queries. This
is a conceptually simple step from
https://github.com/neondatabase/neon/pull/11463. Instead of initializing
the fringe for one keyspace, we do it for multiple keyspaces at
different LSNs and let the logic already present in the fringe handle
the selection.

Secondly, page service code is updated to support batching at different
LSNs. Each request parsed from the wire determines its effective request
LSN and keeps it in memory for the batcher to inspect. The batcher
allows keys at different LSNs in one batch as long as no single key is
requested at different LSNs.
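
The admission rule, sketched in Python (illustrative; the real batcher
lives in the pageserver's page service code):

```
def can_join_batch(batch: dict, key, lsn, max_batch_size: int = 32) -> bool:
    # `batch` maps key -> effective request LSN. Keys at different LSNs
    # may share a batch, but the same key requested at two different
    # LSNs must break it: a single vectored get returns one version per
    # key.
    if len(batch) >= max_batch_size:
        return False
    return batch.get(key, lsn) == lsn
```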

I'd suggest doing the first pass commit by commit to get a feel for the
changes.

## Results

I used the batching test from [Christian's
PR](https://github.com/neondatabase/neon/pull/11391), which increases
the chance of batch breaks. Looking at the logs, I think the new code is
at the max batching factor for the workload (we only break batches
because they are oversized or because the executor is idle).

```
Main:
Reasons for stopping batching: {'LSN changed': 22843, 'of batch size': 33417}
test_throughput[release-pg16-50-pipelining_config0-30-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 14.6662

My branch:
Reasons for stopping batching: {'of batch size': 37024}
test_throughput[release-pg16-50-pipelining_config0-30-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 19.8333
```

Related: https://github.com/neondatabase/neon/issues/10765
2025-04-14 09:05:29 +00:00
Konstantin Knizhnik
8936a7abd8 Increase limit for worker processes for isolation test (#11504)
## Problem

See https://github.com/neondatabase/neon/issues/10652

The Neon extension launches 2 BGWs, which reduces the limit for
parallel workers and so affects the parallel_deadlock isolation test.

## Summary of changes

Increase `max_worker_processes` from default 8 to 16 for isolation test.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-12 18:09:12 +00:00
Alex Chi Z.
4f7b2cdd4f feat(pageserver): gc-compaction result verification (#11515)
## Problem

Part of #9114 

There was a debug-mode verification that checks the result at every
retain_lsn. However, the code was tangled within the actual history
generation itself, and it was hard to reason about its correctness. This
patch adds a separate post-verification of the gc-compaction result that
redoes logs at every retain_lsn and every record above the GC horizon.
This ensures that all key history we produce with gc-compaction is
readable, and if there are read errors after gc-compaction, they can
only be read-path errors instead of gc-compaction bugs.

## Summary of changes

* Add a gc_compaction_verification flag, defaulting to true.
* Implement a post-verification process.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-11 15:50:29 +00:00
Alex Chi Z.
66f56ddaec fix(pageserver): allow shutdown errors for gc compaction tests (#11530)
## Problem

`test_pageserver_compaction_preempt` is flaky.

## Summary of changes

Allow the shutdown errors.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-11 15:20:51 +00:00
Tristan Partin
ff5a527167 Consolidate compute_ctl configuration structures (#11514)
Previously, the structure of the spec file was just the compute spec.
However, the response from the control plane get spec request included
the compute spec and the compute_ctl config. This divergence was
hindering other work such as adding regression tests for compute_ctl
HTTP authorization.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2025-04-11 15:06:29 +00:00
Christian Schwarz
8884865bca tests: make test_pageserver_getpage_throttle less flaky (#11482)
# Refs

- fixes https://github.com/neondatabase/neon/issues/11395

# Problem

Since 2025-03-10, we have observed increased flakiness of
`test_pageserver_getpage_throttle`.

The test is timing-dependent by nature, and was hitting the

```
assert duration_secs >= 10 * actual_smgr_query_seconds, (
    "smgr metrics should not include throttle wait time"
)
```

quite frequently.

# Analysis

These failures are not reproducible.

In this PR's history is a commit that reran the test 100 times without
requiring a single retry.

In https://github.com/neondatabase/neon/issues/11395 there is a link to
a query to the test results database.
It shows that the flakiness was not constant, but rather episodic:
2025-03-{10,11,12,13}, 2025-03-{19,20,21}, 2025-03-31, and 2025-04-01.

To me, this suggests variability in available CPU.

# Solution

The point of the offending assertion is to ensure that most of the
request latency is spent on throttling, because testing of the
throttling mechanism is the point of the test.
The `10` magic number means at most 10% of mean latency may be spent on
request processing.

Ideally we would control the passage of time (virtual clock source) to
make this test deterministic.

But I don't see that happening in our regression test setup.

So, this PR de-flakes the test as follows:
- allot up to 66% of mean latency for request processing
- increase duration from 10s to 20s, hoping to get better protection
from momentary CPU spikes in noisy neighbor tests or VMs on the runner
host
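
In terms of the assertion above, the relaxation is roughly this
(variable names as in the snippet; the exact bound in the test may
differ):

```
def check_throttling_dominates(duration_secs, actual_smgr_query_seconds):
    # before: duration_secs >= 10 * actual_smgr_query_seconds
    #         (at most 10% of mean latency on request processing)
    # after: allow up to ~66%, so throttling must still dominate, but
    #        momentary CPU spikes on the runner no longer trip the check
    assert actual_smgr_query_seconds <= 0.66 * duration_secs, (
        "smgr metrics should not include throttle wait time"
    )
```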

As a drive-by, switch to `pytest.approx` and remove one self-test
assertion I can't make sense of anymore.
2025-04-11 09:38:05 +00:00
Dmitrii Kovalkov
4c4e33bc2e storage: add http/https server and cert resolver metrics (#11450)
## Problem
We need to export some metrics about certs/connections to configure
alerts and make sure that all HTTP requests are gone before turning
https-only mode on.
- Closes: https://github.com/neondatabase/cloud/issues/25526

## Summary of changes
- Add started connection and connection error metrics to http/https
Server.
- Add certificate expiration time and reload metrics to
ReloadingCertificateResolver.
2025-04-11 06:11:35 +00:00
John Spray
52dee408dc storage controller: improve safety of shard splits coinciding with controller restarts (#11412)
## Problem

The graceful leadership transfer process involves calling step_down on
the old controller, but this was not waiting for shard splits to
complete, and the new controller could therefore end up trying to abort
a shard split while it was still going on.

We mitigated this already in #11256 by avoiding the case where shard
split completion would update the database incorrectly, but this was a
fragile fix because it assumes that is the only problematic part of the
split running concurrently.

Precursors:
- #11290 
- #11256

Closes: #11254 

## Summary of changes

- Hold the reconciler gate from shard splits, so that step_down will
wait for them. Splits should always be fairly prompt, so it is okay to
wait here.
- Defense in depth: if step_down times out (hardcoded 10 second limit),
then fully terminate the controller process rather than letting it
continue running, potentially doing split-brainy things. This makes
sense because the new controller will always declare itself leader
unilaterally if step_down fails, so leaving an old controller running is
not beneficial.
- Tests: extend
`test_storage_controller_leadership_transfer_during_split` to separately
exercise the case of a split holding up step_down, and the case where
the overall timeout on step_down is hit and the controller terminates.
2025-04-10 16:55:37 +00:00
Christian Schwarz
2e35f23085 tests: remove ignored fair field (#11521)
Pageserver has been ignoring the field
`tenant_config.timeline_get_throttle.fair`
for many months, since we removed it from the config struct in
neondatabase/neon#8539.

Refs
- epic https://github.com/neondatabase/cloud/issues/27320
2025-04-10 14:24:30 +00:00
Erik Grinaker
0122d97f95 test_runner: only use last gen in test_location_conf_churn (#11511)
## Problem

`test_location_conf_churn` performs random location updates on
Pageservers. While doing this, it could instruct the compute to connect
to a stale generation and execute queries. This is invalid, and will
fail if a newer generation has removed layer files used by the stale
generation.

Resolves #11348.

## Summary of changes

Only connect to the latest generation when executing queries.
2025-04-10 10:07:16 +00:00
Arseny Sher
fae7528adb walproposer: make it aware of membership (#11407)
## Problem

Walproposer should get elected and commit WAL on safekeepers specified
by the membership configuration.

## Summary of changes

- Add to wp `members_safekeepers` and `new_members_safekeepers` arrays
mapping configuration members to connection slots. Establish this
mapping (by node id) when a safekeeper sends its greeting, giving its
id, and when mconf becomes known / changes.
- Add membership-aware logic to TermsCollected, VotesCollected, and
GetAcknowledgedByQuorumWALPosition. Currently it partially duplicates
the existing logic, but we'll drop the latter eventually.
- In python, rename Configuration to MembershipConfiguration for
clarity.
- Add test_quorum_sanity testing new logic.

ref https://github.com/neondatabase/neon/issues/10851
2025-04-10 09:55:37 +00:00
Alex Chi Z.
405a17bf0b fix(pageserver): ensure gc-compaction gets preempted by L0 (#11512)
## Problem

Part of #9114 

## Summary of changes

The gc-compaction flag was not correctly set, causing gc-compaction to
not get preempted by L0 compaction.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-09 20:57:50 +00:00
Erik Grinaker
63ee8e2181 test_runner: ignore .___temp files in evict_random_layers (#11509)
## Problem

`test_location_conf_churn` often fails with `neither image nor delta
layer`, but doesn't say what the file actually is. However, past local
failures have indicated that it might be `.___temp` files.

Touches https://github.com/neondatabase/neon/issues/11348.

## Summary of changes

Ignore `.___temp` files when evicting local layers, and include the file
name in the error message.
2025-04-09 19:03:49 +00:00
Dmitrii Kovalkov
e7502a3d63 pageserver: return 412 PreconditionFailed in get_timestamp_of_lsn if timestamp is not found (#11491)
## Problem
Now `get_timestamp_of_lsn` returns `404 NotFound` if there are no clog
pages for the given LSN, and it's difficult to distinguish this from
other 404 errors. A separate status code for this error will allow the
control plane to handle this case.
- Closes: https://github.com/neondatabase/neon/issues/11439
- Corresponding PR in control plane:
https://github.com/neondatabase/cloud/pull/27125

## Summary of changes
- Return `412 PreconditionFailed` instead of `404 NotFound` if no
timestamp is found for the given LSN.
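
A client such as the control plane can now tell the cases apart; a
hypothetical sketch (the endpoint path is abbreviated):

```
import requests

def timestamp_of_lsn(timeline_url: str, lsn: str):
    resp = requests.get(f"{timeline_url}/get_timestamp_of_lsn",
                        params={"lsn": lsn})
    if resp.status_code == 412:
        return None  # LSN is valid, but no timestamp is recorded for it
    resp.raise_for_status()  # 404 (missing tenant/timeline) etc. still raise
    return resp.json()
```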

I looked briefly through the current error handling code in cloud.git,
and the status code change should not affect anything for the existing
code. The change in the corresponding PR also looks fine and should work
with the current PS status code. Additionally, here is an OK to merge it
from the control plane team:
https://github.com/neondatabase/neon/issues/11439#issuecomment-2789327552

---------

Co-authored-by: John Spray <john@neon.tech>
2025-04-09 13:16:15 +00:00
Erik Grinaker
a6ff8ec3d4 storcon: change default stripe size to 16 MB (#11168)
## Problem

The current stripe size of 256 MB is a bit large, and can cause load
imbalances across shards. A stripe size of 16 MB appears more reasonable
to avoid hotspots, although we don't see evidence of this in benchmarks.

Resolves https://github.com/neondatabase/cloud/issues/25634.
Touches https://github.com/neondatabase/cloud/issues/21870.

## Summary of changes

* Change the default stripe size to 16 MB.
* Remove `ShardParameters::DEFAULT_STRIPE_SIZE`, and only use
`pageserver_api::shard::DEFAULT_STRIPE_SIZE`.
* Update a bunch of tests that assumed a certain stripe size.
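
To see why stripe size affects load balance, here is a deliberately
simplified model of block-to-shard striping (the real mapping lives in
`pageserver_api::shard` and also involves the key; sizes are in 8 KiB
pages):

```
STRIPE_SIZE_16MB = 16 * 1024 * 1024 // 8192  # 2048 pages

def shard_for_block(block_no: int, shard_count: int,
                    stripe_size: int = STRIPE_SIZE_16MB) -> int:
    # Consecutive stripes round-robin across shards; smaller stripes
    # spread a hot contiguous range over more shards.
    return (block_no // stripe_size) % shard_count
```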
2025-04-09 08:41:38 +00:00
Erik Grinaker
c610f3584d test_runner: tweak test_create_snapshot compaction (#11495)
## Problem

With the recent improvements to L0 compaction responsiveness,
`test_create_snapshot` now ends up generating 10,000 layer files
(compared to 1,000 in previous snapshots). This increases the snapshot
size by 4x, and significantly slows down tests.

## Summary of changes

Increase the target layer size from 128 KB to 256 KB, and the L0
compaction threshold from 1 to 5. This reduces the layer count from
about 10,000 to 1,000.
2025-04-09 06:52:49 +00:00
Erik Grinaker
7679b63a2c pageserver: persist stripe size in tenant manifest for tenant_import (#11181)
## Problem

`tenant_import`, used to import an existing tenant from remote storage
into a storage controller for support and debugging, assumed
`DEFAULT_STRIPE_SIZE` since this can't be recovered from remote storage.
In #11168, we are changing the stripe size, which will break
`tenant_import`.

Resolves #11175.

## Summary of changes

* Add `stripe_size` to the tenant manifest.
* Add `TenantScanRemoteStorageShard::stripe_size` and return from
`tenant_scan_remote` if present.
* Recover the stripe size during `tenant_import`, or fall back to 32768
(the original default stripe size).
* Add tenant manifest compatibility snapshot:
`2025-04-08-pgv17-tenant-manifest-v1.tar.zst`

There are no cross-version concerns here, since unknown fields are
ignored during deserialization where relevant.
2025-04-08 20:43:27 +00:00
Mikhail Kot
6138d61592 Object storage proxy (#11357)
Service targeted for storing and retrieving LFC prewarm data.
Can be used for proxying S3 access for Postgres extensions like
pg_mooncake as well.

Requests must include a Bearer JWT token.
Token is validated using a pemfile (should be passed in infra/).

Note: the app is not tolerant of extra trailing slashes; see the app.rs
`delete_prefix` test for comments.

Resolves: https://github.com/neondatabase/cloud/issues/26342
Unrelated changes: gate a `rename_noreplace` feature and disable it in
`remote_storage` so that `object_storage` can be built with musl
2025-04-08 14:54:53 +00:00
Dmitrii Kovalkov
7791a49dd4 fix(tests): improve test_scrubber_tenant_snapshot stability (#11471)
## Problem
`test_scrubber_tenant_snapshot` is flaky with `request was dropped`
errors. More details are in the issue.
- Closes: https://github.com/neondatabase/neon/issues/11278

## Summary of changes
- Disable shard scheduling during pageservers restart
- Add `reconcile_until_idle` at the end of the test
2025-04-08 10:03:38 +00:00
Konstantin Knizhnik
b2a0b2e9dd Skip hole tags in local_cache view (#11454)
## Problem

If the local file cache is shrunk, so that we punch some holes in the
underlying file, the local_cache view displays the holes incorrectly.
See https://github.com/neondatabase/neon/issues/10770

## Summary of changes

Skip hole tags in the local_cache view.

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-04-08 03:52:50 +00:00
Erik Grinaker
99d8788756 pageserver: improve tenant manifest lifecycle (#11328)
## Problem

Currently, the tenant manifest is only uploaded if there are offloaded
timelines. The checks are also a bit loose (e.g. they only check the
number of offloaded timelines). We want to start using the manifest for
other things too (e.g. stripe size).

Resolves #11271.

## Summary of changes

This patch ensures that a tenant manifest always exists. The lifecycle
is:

* During preload, fetch the existing manifest, if any.
* During attach, upload a tenant manifest if it differs from the
preloaded one (or does not exist).
* Upload a new manifest as needed, if it differs from the last-known
manifest (ignoring version number).
* On splits, pre-populate the manifest from the parent.
* During Pageserver physical GC, remove old manifests but keep the
latest 2 generations.
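
The "upload only if it differs" step, sketched with illustrative field
names:

```
def manifest_needs_upload(current: dict, last_known: dict | None) -> bool:
    def strip_version(m: dict) -> dict:
        # The version number changes on every upload, so ignore it when
        # comparing.
        return {k: v for k, v in m.items() if k != "version"}

    return last_known is None or strip_version(current) != strip_version(last_known)
```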

This will cause nearly all existing tenants to upload a new tenant
manifest on their first attach after this change. Attaches are
concurrency-limited in the storage controller, so we expect this will be
fine.

Also updates `make_broken` to automatically log at `INFO` level when the
tenant has been cancelled, to avoid spurious error logs during shutdown.
2025-04-07 19:10:36 +00:00
Arpad Müller
8a2b19f467 Allow potential warning in test_storcon_create_delete_sk_down (#11466)
Since merging #11400 and adding `test_storcon_create_delete_sk_down`,
we've seen an error occur multiple times.

https://github.com/neondatabase/neon/pull/11400#issuecomment-2782528369
2025-04-07 16:52:54 +00:00
Christian Schwarz
aad410c8f1 improve ondemand-download latency observability (#11421)
## Problem

We don't have metrics to exactly quantify the end user impact of
on-demand downloads.

Perf tracing is underway (#11140) to supply us with high-resolution
*samples*.

But it will also be useful to have some aggregate per-timeline and
per-instance metrics that definitively contain all observations.

## Summary of changes

This PR consists of independent commits that should be reviewed
independently.

However, for convenience, we're going to merge them together.

- refactor(metrics): measure_remote_op can use async traits
- impr(pageserver metrics): task_kind dimension for
remote_timeline_client latency histo
  - implements https://github.com/neondatabase/cloud/issues/26800
- refs
https://github.com/neondatabase/cloud/issues/26193#issuecomment-2769705793
- use the opportunity to rename the metric and add a _global suffix;
checked grafana export, it's only used in two personal dashboards, one
of them mine, the other by Heikki
- log on-demand download latency for expensive-to-query but precise
ground truth
- metric for wall clock time spent waiting for on-demand downloads

## Refs

- refs https://github.com/neondatabase/cloud/issues/26800
- a bunch of minor investigations / incidents into latency outliers
2025-04-04 18:04:39 +00:00
Christian Schwarz
4f94751b75 pageserver config: ignore+warn about unknown fields (instead of deny_unknown_fields) (#11275)
# Refs
- refs https://github.com/neondatabase/neon/issues/8915
- discussion thread:
https://neondb.slack.com/archives/C033RQ5SPDH/p1742406381132599
- stacked atop https://github.com/neondatabase/neon/pull/11298
- corresponding internal docs update that illustrates how this PR
removes friction: https://github.com/neondatabase/docs/pull/404

# Problem

Rejecting `pageserver.toml`s with unknown fields adds friction,
especially when using `pageserver.toml` fields as feature flags that
need to be decommissioned.

See the added paragraphs on `pageserver_api::models::ConfigToml` for
details on what kind of friction it causes.

Also read the corresponding internal docs update linked above to see a
more imperative guide for using `pageserver.toml` flags as feature
flags.

# Solution

## Ignoring unknown fields

Ignoring is the serde default behavior.

So, just remove `serde(deny_unknown_fields)` from all structs in
`pageserver_api::config::ConfigToml` and
`pageserver_api::config::TenantConfigToml`.

I went through all the child fields and verified they don't use
`deny_unknown_fields` either, including those shared with
`pageserver_api::models`.

## Warning about unknown fields

We still want to warn about unknown fields to 
- be informed about typos in the config template
- be reminded about feature-flag style configs that have been cleaned up
in code but not yet in config templates

We tried `serde_ignore` (cf draft #11319) but it doesn't work with
`serde(flatten)`.

The solution we arrived at is to compare the on-disk TOML with the TOML
that we produce if we serialize the `ConfigToml` again.
Any key specified in the on-disk TOML but not present in the serialized
TOML is flagged as an ignored key.
The mechanism to do it is a tiny recursive descent visitor on the
`toml_edit::DocumentMut`.
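
A toy version of that comparison, with Python dicts standing in for TOML
tables (the real visitor walks a `toml_edit::DocumentMut`):

```
def find_ignored_keys(on_disk: dict, reserialized: dict, path=()) -> list:
    ignored = []
    for key, value in on_disk.items():
        here = (*path, key)
        if key not in reserialized:
            # Present on disk but unknown to the config struct:
            # warn instead of failing startup.
            ignored.append(".".join(here))
        elif isinstance(value, dict) and isinstance(reserialized[key], dict):
            ignored.extend(find_ignored_keys(value, reserialized[key], here))
    return ignored
```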

# Future Work

Invalid config _values_ in known fields will continue to fail pageserver
startup.
See
- https://github.com/neondatabase/cloud/issues/24349
for current worst case impact to deployments & ideas to improve.
2025-04-04 17:30:58 +00:00
Vlad Lazar
1ef4258f29 pageserver: add tenant level performance tracing sampling ratio (#11433)
## Problem

https://github.com/neondatabase/neon/pull/11140 introduces performance
tracing with OTEL
and a pageserver config which configures the sampling ratio of get page
requests.

Enabling a non-zero sampling ratio on a per region basis is too
aggressive and comes with perf
impact that isn't very well understood yet.

## Summary of changes

Add a `sampling_ratio` tenant-level config which overrides the
pageserver-level config. Note that we do not cache the config and
instead load it on every get page request, so that changes propagate
promptly.
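
The override semantics, sketched with invented names:

```
import random

def should_sample_get_page(tenant_ratio, pageserver_ratio) -> bool:
    # The tenant-level setting wins when present; re-reading it on every
    # request is what makes config changes take effect promptly.
    ratio = tenant_ratio if tenant_ratio is not None else pageserver_ratio
    return ratio > 0 and random.random() < ratio
```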

Note that I've had to remove the `SHARD_SELECTION` span to get this to
work. The tracing library doesn't
expose a neat way to drop a span if one realises it's not needed at
runtime.

Closes https://github.com/neondatabase/neon/issues/11392
2025-04-04 13:41:28 +00:00
Vlad Lazar
65e2aae6e4 pageserver/secondary: deregister IO metrics (#11283)
## Problem

IO metrics for secondary locations do not get deregistered when the
timeline is removed.

## Summary of changes

Stash the request context to be used for downloads in
`SecondaryTimelineDetail`. These objects match the lifetime of the
secondary timeline location pretty well.

When the timeline is removed, deregister the metrics too.

Closes https://github.com/neondatabase/neon/issues/11156
2025-04-04 10:52:59 +00:00
Arpad Müller
a917952b30 Add test_storcon_create_delete_sk_down and make it work (#11400)
Adds a test `test_storcon_create_delete_sk_down` which tests the
reconciler and pending op persistence when faced with a temporary
safekeeper downtime during timeline creation or deletion. This is in
contrast to `test_explicit_timeline_creation_storcon`, which tests the
happy path.

We also do some fixes:

* timeline and tenant deletion http requests didn't expect a body, but
`()` sent one.
* we got the tenant deletion http request's return type wrong: it's
supposed to be a hash map
* we add some logging to improve observability
* We fix `list_pending_ops`, which had broken code meant to make it
possible to restrict oneself to a single pageserver. Sadly, diesel
doesn't support that, or at least I couldn't figure out a way to make it
work. We don't need that functionality, so remove it.
* We add an info span to the heartbeater futures with the node id, so
that there are no context-free messages like "Backoff: waiting 1.1
seconds before processing with the task" in the storcon logs. We could
also add the full base url of the node, but don't, as most other log
lines contain that information already, and if we duplicate it, it
should at least not be verbose. One can always find out the base url
from the node id.

Successor of #11261
Part of #9011
2025-04-04 00:17:40 +00:00
Alex Chi Z.
bfc767d60d fix(test): wait for shard split complete for test_lsn_lease_storcon (#11436)
## Problem

close https://github.com/neondatabase/neon/issues/11397
ref https://github.com/neondatabase/cloud/issues/23667

## Summary of changes

We need to wait until the shard split is complete, otherwise it will
print warnings like "waiting for shard split exclusive lock" for 30s.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-03 17:49:45 +00:00
Vlad Lazar
74920d8cd8 storcon: notify compute if correct observed state was refreshed (#11342)
## Problem

Previously, if the observed state was refreshed and matched the intent,
we wouldn't send a compute notification. This is unsafe: there's no
guarantee that the location landed on the pageserver _and_ a compute
notification for it was delivered.

See
https://github.com/neondatabase/neon/issues/11291#issuecomment-2743205411
for one such example.

## Summary of changes

Add a reproducer and notify the compute if the correct observed state
required a refresh.

Closes https://github.com/neondatabase/neon/issues/11291
2025-04-03 16:35:55 +00:00
Alex Chi Z.
131b32ef48 fix(pageserver): clean up aux files before detaching (#11299)
## Problem

Related to https://github.com/neondatabase/cloud/issues/26091 and
https://github.com/neondatabase/cloud/issues/25840

Close https://github.com/neondatabase/neon/issues/11297

Discussion on Slack:
https://neondb.slack.com/archives/C033RQ5SPDH/p1742320666313969

## Summary of changes

* When detaching, scan all aux files within
`sparse_non_inherited_keyspace` in the ancestor timeline and create an
image layer exactly at the ancestor LSN. All scanned keys will map to an
empty value, which is a delete tombstone.
- Note that end_lsn for rewritten delta layers = ancestor_lsn + 1, so
the image layer will have image_end_lsn=end_lsn. With the current
`select_layer` logic, the read path will always first read the image
layer.
* Add a test case.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2025-04-03 15:55:22 +00:00
Alexander Lakhin
4e8e0951be Increase timeout for test_pageserver_gc_compaction_smoke (#11410)
## Problem
The test_pageserver_gc_compaction_smoke fails rather often due to a
timeout on slow machines.
See https://github.com/neondatabase/neon/issues/11355.

## Summary of changes
Increase the timeout for the test.
2025-04-03 11:23:30 +00:00