rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 05:52:55 +00:00

Author	SHA1	Message	Date
Dmitrii Kovalkov	9bf59989db	storcon: add https API (#11239 ) ## Problem Pageservers use unencrypted HTTP requests for storage controller API. - Closes: https://github.com/neondatabase/cloud/issues/25524 ## Summary of changes - Replace hyper0::server::Server with http_utils::server::Server in storage controller. - Add HTTPS handler for storage controller API. - Support `ssl_ca_file` in pageserver.	2025-03-20 08:22:02 +00:00
Arpad Müller	56149a046a	Add test_explicit_timeline_creation_storcon and make it work (#11261 ) Adds a basic test that makes the storcon issue explicit creation of a timeline on safeekepers (main storcon PR in #11058). It was adapted from `test_explicit_timeline_creation` from #11002. Also, do a bunch of fixes needed to get the test work (the API definitions weren't correct), and log more stuff when we can't create a new timeline due to no safekeepers being active. Part of #9011 --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2025-03-17 16:28:21 +00:00
Dmitrii Kovalkov	f68be2b5e2	safekeeper: https for management API (#11171 ) ## Problem Storage controller uses unencrypted HTTP requests for safekeeper management API. - Closes: https://github.com/neondatabase/cloud/issues/24836 ## Summary of changes - Replace `hyper0::server::Server` with `http_utils::server::Server` in safekeeper. - Add HTTPS handler for safekeeper management API.	2025-03-14 11:41:22 +00:00
Alex Chi Z.	23b713900e	feat(storcon): passthrough ancestor detach behavior (#11199 ) ## Problem https://github.com/neondatabase/neon/issues/10310 https://github.com/neondatabase/neon/pull/11158 ## Summary of changes We need to passthrough the new detach behavior through the storcon API. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-13 20:21:23 +00:00
Arpad Müller	b1a1be6a4c	switch pytests and neon_local to control_plane_hooks_api (#11195 ) We want to switch away from and deprecate the `--compute-hook-url` param for the storcon in favour of `--control-plane-url` because it allows us to construct urls with `notify-safekeepers`. This PR switches the pytests and neon_local from a `control_plane_compute_hook_api` to a new param named `control_plane_hooks_api` which is supposed to point to the parent of the `notify-attach` URL. We still support reading the old url from disk to not be too disruptive with existing deployments, but we just ignore it. Also add docs for the `notify-safekeepers` upcall API. Follow-up of #11173 Part of https://github.com/neondatabase/neon/issues/11163	2025-03-13 19:50:52 +00:00
Christian Schwarz	ed31dd2a3c	pageserver: better observability for slow wait_lsn (#11176 ) # Problem We leave too few observability breadcrumbs in the case where wait_lsn is exceptionally slow. # Changes - refactor: extract the monitoring logic out of `log_slow` into `monitor_slow_future` - add global + per-timeline counter for time spent waiting for wait_lsn - It is updated while we're still waiting, similar to what we do for page_service response flush. - add per-timeline counterpair for started & finished wait_lsn count - add slow-logging to leave breadcrumbs in logs, not just metrics For the slow-logging, we need to consider not flooding the logs during a broker or network outage/blip. The solution is a "log-streak-level" concurrency limit per timeline. At any given time, there is at most one slow wait_lsn that is logging the "still running" and "completed" sequence of logs. Other concurrent slow wait_lsn's don't log at all. This leaves at least one breadcrumb in each timeline's logs if some wait_lsn was exceptionally slow during a given period. The full degree of slowness can then be determined by looking at the per-timeline metric. # Performance Reran the `bench_log_slow` benchmark, no difference, so, existing call sites are fine. We do use a Semaphore, but only try_acquire it _after_ things have already been determined to be slow. So, no baseline overhead anticipated. # Refs - https://github.com/neondatabase/cloud/issues/23486#issuecomment-2711587222	2025-03-13 15:03:53 +00:00
Alex Chi Z.	c3b3b507f7	feat(pageserver): support detaching behavior v2 (#11158 ) ## Problem close https://github.com/neondatabase/neon/issues/10310 ## Summary of changes This patch adds a new behavior for the detach_ancestor API: detach with multi-level ancestor and no reparenting. Though we can potentially support multi-level + do reparenting / single-level + no-reparenting in the future, as it's not required for the recovery/snapshot epic, I'd prefer keeping things simple now that we only handle the old one and the new one instead of supporting the full feature matrix. I only added a test case of successful detaching instead of testing failures. I'd like to make this into staging and add more tests in the future. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-12 22:27:23 +00:00
Dmitrii Kovalkov	63b22d3fb1	pageserver: https for management API (#11025 ) ## Problem Storage controller uses unencrypted HTTP requests for pageserver management API. Closes: https://github.com/neondatabase/cloud/issues/24283 ## Summary of changes - Implement `http_utils::server::Server` with TLS support. - Replace `hyper0::server::Server` with `http_utils::server::Server` in pageserver. - Add HTTPS handler for pageserver management API. - Generate local SSL certificates in neon local.	2025-03-10 15:07:59 +00:00
Alex Chi Z.	cd438406fb	feat(pageserver): add force patch index_part API (#11119 ) ## Problem As part of the disaster recovery tool. Partly for https://github.com/neondatabase/neon/issues/9114. ## Summary of changes * Add a new pageserver API to force patch the fields in index_part and modify the timeline internal structures. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-07 17:42:52 +00:00
John Spray	87e6117dfd	storage controller: API-driven graceful migrations (#10913 ) ## Problem The current migration API does a live migration, but if the destination doesn't already have a secondary, that live migration is unlikely to be able to warm up a tenant properly within its timeout (full warmup of a big tenant can take tens of minutes). Background optimisation code knows how to do this gracefully by creating a secondary first, but we don't currently give a human a way to trigger that. Closes: https://github.com/neondatabase/neon/issues/10540 ## Summary of changes - Add `prefererred_node` parameter to TenantShard, which is respected by optimize_attachment - Modify migration API to have optional prewarm=true mode, in which we set preferred_node and call optimize_attachment, rather than directly modifying intentstate - Require override_scheduler=true flag if migrating somewhere that is a less-than-optimal scheduling location (e.g. wrong AZ) - Add `origin_node_id` to migration API so that callers can ensure they're moving from where they think they're moving from - Add tests for the above The storcon_cli wrapper for this has a 'watch' mode that waits for eventual cutover. This doesn't show the warmth of the secondary evolve because we don't currently have an API for that in the controller, as the passthrough API only targets attached locations, not secondaries. It would be straightforward to add later as a dedicated endpoint for getting secondary status, then extend the storcon_cli to consume that and print a nice progress indicator.	2025-03-07 17:02:38 +00:00
Vlad Lazar	084fc4a757	pageserver: enable previous heatmaps by default (#11132 ) We add the off by default configs in https://github.com/neondatabase/neon/pull/11088 because the unarchival heatmap was causing oversized secondary locations. That was fixed in https://github.com/neondatabase/neon/pull/11098, so let's turn them on by default.	2025-03-07 16:05:31 +00:00
Alexey Kondratov	f5aa8c3eac	feat(compute_ctl): Add a basic HTTP API benchmark (#11123 ) ## Problem We just had a regression reported at https://neondb.slack.com/archives/C08EXUJF554/p1741102467515599, which clearly came with one of the releases. It's not a huge problem yet, but it's annoying that we cannot quickly attribute it to a specific commit. ## Summary of changes Add a very simple `compute_ctl` HTTP API benchmark that does 10k requests to `/status` and `metrics.json` and reports p50 and p99. --------- Co-authored-by: Peter Bendel <peterbendel@neon.tech>	2025-03-07 12:35:42 +00:00
Alex Chi Z.	2de3629b88	test(pageserver): use reldirv2 by default in regress tests (#11081 ) ## Problem For pg_regress test, we do both v1 and v2; for all the rest, we default to v2. part of https://github.com/neondatabase/neon/issues/9516 ## Summary of changes Use reldir v2 across test cases by default. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-05 21:02:44 +00:00
Peter Bendel	604eb5e8d4	fix grafana dashboard link for pooler endoints (#11099 ) ## Problem Our benchmarking workflows contain links to grafana dashboards to troubleshoot problems. This works fine for non-pooled endpoints. For pooled endpoints we need to remove the `-pooler` suffix from the endpoint's hostname to get a valid endpoint ID. Example link that doesn't work in this run https://github.com/neondatabase/neon/actions/runs/13678933253/job/38246028316#step:8:311 ## Summary of changes Check if connection string is a -pooler connection string and if so remove this suffix from the endpoint ID. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-03-05 20:01:17 +00:00
Arpad Müller	2d45522fa6	storcon db: load safekeepers from DB again (#11087 ) Earlier PR #11041 soft-disabled the loading code for safekeepers from the storcon db. This PR makes us load the safekeepers from the database again, now that we have [JWTs available on staging](https://github.com/neondatabase/neon/pull/11087) and soon on prod. This reverts commit `23fb8053c5`. Part of https://github.com/neondatabase/cloud/issues/24727	2025-03-05 15:45:43 +00:00
Vlad Lazar	8c12ccf729	pageserver: gate previous heatmap behind config flag (#11088 ) ## Problem On unarchival, we update the previous heatmap with all visible layers. When the primary generates a new heatmap it includes all those layers, so the secondary will download them. Since they're not actually resident on the primary (we didn't call the warm up API), they'll never be evicted, so they remain in the heatmap. This leads to oversized secondary locations like we saw in pre-prod. ## Summary of changes Gate the loading of the previous heatmaps and the heatmap generation on unarchival behind configuration flags. They are disabled by default, but enabled in tests.	2025-03-05 12:20:18 +00:00
Erik Grinaker	65addfc524	storcon: add per-tenant rate limiting for API requests (#10924 ) ## Problem Incoming requests often take the service lock, and sometimes even do database transactions. That creates a risk that a rogue client can starve the controller of the ability to do its primary job of reconciling tenants to an available state. ## Summary of changes * Use the `governor` crate to rate limit tenant requests at 10 requests per second. This is ~10-100x lower than the worst "attack" we've seen from a client bug. Admin APIs are not rate limited. * Add a `storage_controller_http_request_rate_limited` histogram for rate limited requests. * Log a warning every 10 seconds for rate limited tenants. The rate limiter is parametrized on TenantId, because the kinds of client bug we're protecting against generally happen within tenant scope, and the rates should be somewhat stable: we expect the global rate of requests to increase as we do more work, but we do not expect the rate of requests to one tenant to increase. --------- Co-authored-by: John Spray <john@neon.tech>	2025-03-03 22:04:59 +00:00
John Spray	b953daa21f	safekeeper: allow remote deletion to proceed after dropped requests (#11042 ) ## Problem If a caller times out on safekeeper timeline deletion on a large timeline, and waits a while before retrying, the deletion will not progress while the retry is waiting. The net effect is very very slow deletion as it only proceeds in 30 second bursts across 5 minute idle periods. Related: https://github.com/neondatabase/neon/issues/10265 ## Summary of changes - Run remote deletion in a background task - Carry a watch::Receiver on the Timeline for other callers to join the wait - Restart deletion if the API is called again and the previous attempt failed	2025-03-03 16:03:51 +00:00
Peter Bendel	a07599949f	First version of a new benchmark to test larger OLTP workload (#11053 ) ## Problem We want to support larger tenants (regarding logical database size, number of transactions per second etc.) and should increase our test coverage of OLTP transactions at larger scale. ## Summary of changes Start a new benchmark that over time will add more OLTP tests at larger scale. This PR covers the first version and will be extended in further PRs. Also fix some infrastructure: - default for new connections and large tenants is to use connection pooler pgbouncer, however our fixture always added `statement_timeout=120` which is not compatible with pooler [see](https://neon.tech/docs/connect/connection-errors#unsupported-startup-parameter) - action to create branch timed out after 10 seconds and 10 retries but for large tenants it can take longer so use increasing back-off for retries ## Test run https://github.com/neondatabase/neon/actions/runs/13593446706	2025-03-03 15:25:48 +00:00
Arseny Sher	ef2b50994c	walproposer: basic infra to enable generations (#11002 ) ## Problem Preparation for https://github.com/neondatabase/neon/issues/10851 ## Summary of changes Add walproposer `safekeepers_generations` field which can be set by prefixing `neon.safekeepers` GUC with `g#n:`. Non zero value (n) forces walproposer to use generations. In particular, this also disables implicit timeline creation as timeline will be created by storcon. Add test checking this. Also add missing infra: `--safekeepers-generation` flag to neon_local endpoint start + fix `--start-timeout` flag: it existed but value wasn't used.	2025-03-03 13:20:20 +00:00
Vlad Lazar	23fb8053c5	storcon: soft disable SK heartbeats (#11041 ) ## Problem JWT tokens aren't in place, so all SK heartbeats fail. This is equivalent to a wait before applying the PS heartbeats and makes things more flaky. ## Summary of Changes Add a flag that skips loading SKs from the db on start-up and at runtime.	2025-02-28 15:49:09 +00:00
Conrad Ludgate	d9ced89ec0	feat(proxy): require TLS to compute if prompted by cplane (#10717 ) https://github.com/neondatabase/cloud/issues/23008 For TLS between proxy and compute, we are using an internally provisioned CA to sign the compute certificates. This change ensures that proxy will load them from a supplied env var pointing to the correct file - this file and env var will be configured later, using a kubernetes secret. Control plane responds with a `server_name` field if and only if the compute uses TLS. This server name is the name we use to validate the certificate. Control plane still sends us the IP to connect to as well (to support overlay IP). To support this change, I'd had to split `host` and `host_addr` into separate fields. Using `host_addr` and bypassing `lookup_addr` if possible (which is what happens in production). `host` then is only used for the TLS connection. There's no blocker to merging this. The code paths will not be triggered until the new control plane is deployed and the `enableTLS` compute flag is enabled on a project.	2025-02-28 14:20:25 +00:00
Vlad Lazar	0d6d58bd3e	pageserver: make heatmap layer download API more cplane friendly (#10957 ) ## Problem We intend for cplane to use the heatmap layer download API to warm up timelines after unarchival. It's tricky for them to recurse in the ancestors, and the current implementation doesn't work well when unarchiving a chain of branches and warming them up. ## Summary of changes * Add a `recurse` flag to the API. When the flag is set, the operation recurses into the parent timeline after the current one is done. * Be resilient to warming up a chain of unarchived branches. Let's say we unarchived `B` and `C` from the `A -> B -> C` branch hierarchy. `B` got unarchived first. We generated the unarchival heatmaps and stash them in `A` and `B`. When `C` unarchived, it dropped it's unarchival heatmap since `A` and `B` already had one. If `C` needed layers from `A` and `B`, it was out of luck. Now, when choosing whether to keep an unarchival heatmap we look at its end LSN. If it's more inclusive than what we currently have, keep it.	2025-02-28 10:36:53 +00:00
Christian Schwarz	e35f7758d8	impr(controller_upcall_client): clean up copy-pasta code & add context to retries (#10991 ) Before this PR, re-attach and validate would log the same warning ``` calling control plane generation validation API failed ``` on retry errors. This can be confusing. This PR makes the message generically valid for any upcall and adds additional tracing spans to capture context. Along the way, clean up some copy-pasta variable naming. refs - https://github.com/neondatabase/neon/issues/10381#issuecomment-2684755827 --------- Co-authored-by: Alexander Lakhin <alexander.lakhin@neon.tech>	2025-02-27 10:59:43 +00:00
Arseny Sher	643a48210f	safekeeper: exclude API (#10757 ) ## Problem https://github.com/neondatabase/neon/pull/10241 added configuration switch endpoint, but it didn't delete timeline if node was excluded. ## Summary of changes Add separate /exclude API endpoint which similarly accepts membership configuration where sk is supposed by be excluded. Implementation deletes the timeline locally. Some more small related tweaks: - make mconf switch API PUT instead of POST as it is idempotent; - return 409 if switch was refused instead of 200 with requested & current; - remove unused was_active flag from delete response; - remove meaningless _force suffix from delete functions names; - reuse timeline.rs delete_dir function in timelines_global_map instead of its own copy. part of https://github.com/neondatabase/neon/issues/9965	2025-02-26 19:26:33 +00:00
Arseny Sher	01581f3af5	safekeeper: drop json_ctrl (#10722 ) ## Problem json_ctrl.rs is an obsolete attempt to have tests with fine control of feeding messages into safekeeper superseded by desim framework. ## Summary of changes Drop it.	2025-02-26 13:32:37 +00:00
Vlad Lazar	34996416d6	pageserver: guard against WAL gaps in the interpreted protocol (#10858 ) ## Problem The interpreted SK <-> PS protocol does not guard against gaps (neither does the Vanilla one, but that's beside the point). ## Summary of changes Extend the protocol to include the start LSN of the PG WAL section from which the records were interpreted. Validation is enabled via a config flag on the pageserver and works as follows: Case 1: `raw_wal_start_lsn` is smaller than the requested LSN There can't be gaps here, but we check that the shard received records which it hasn't seen before. Case 2: `raw_wal_start_lsn` is equal to the requested LSN This is the happy case. No gap and nothing to check Case 3: `raw_wal_start_lsn` is greater than the requested LSN This is a gap. To make Case 3 work I had to bend the protocol a bit. We read record chunks of WAL which aren't record aligned and feed them to the decoder. The picture below shows a shard which subscribes at a position somewhere within Record 2. We already have a wal reader which is below that position so we wait to catch up. We read some wal in Read 1 (all of Record 1 and some of Record 2). The new shard doesn't need Record 1 (it has already processed it according to the starting position), but we read past it's starting position. When we do Read 2, we decode Record 2 and ship it off to the shard, but the starting position of Read 2 is greater than the starting position the shard requested. This looks like a gap. ![image](https://github.com/user-attachments/assets/8aed292e-5d62-46a3-9b01-fbf9dc25efe0) To make it work, we extend the protocol to send an empty `InterpretedWalRecords` to shards if the WAL the records originated from ends the requested start position. On the pageserver, that just updates the tracking LSNs in memory (no-op really). This gives us a workaround for the fake gap. As a drive by, make `InterpretedWalRecords::next_record_lsn` mandatory in the application level definition. It's always included. Related: https://github.com/neondatabase/cloud/issues/23935	2025-02-20 17:49:05 +00:00
Dmitrii Kovalkov	e808e9432a	storcon: use https for pageservers (#10759 ) ## Problem Storage controller uses unsecure http for pageserver API. Closes: https://github.com/neondatabase/cloud/issues/23734 Closes: https://github.com/neondatabase/cloud/issues/24091 ## Summary of changes - Add an optional `listen_https_port` field to storage controller's Node state and its API (RegisterNode/ListNodes/etc). - Allow updating `listen_https_port` on node registration to gradually add https port for all nodes. - Add `use_https_pageserver_api` CLI option to storage controller to enable https. - Pageserver doesn't support https for now and always reports `https_port=None`. This will be addressed in follow-up PR.	2025-02-20 17:16:04 +00:00
Peter Bendel	9d074db18d	Use link to cross-service-endpoint dashboard in allure reports and benchmarking workflow logs (#10874 ) ## Problem We have links to deprecated dashboards in our logs Example https://github.com/neondatabase/neon/actions/runs/13382454571/job/37401983608#step:8:348 ## Summary of changes Use link to cross service endpoint instead. Example: https://github.com/neondatabase/neon/actions/runs/13395407925/job/37413056148#step:7:345	2025-02-18 19:54:21 +00:00
Vlad Lazar	1a69a8cba7	storage: add APIs for warming up location after cold migrations (#10788 ) ## Problem We lack an API for warming up attached locations based on the heatmap contents. This is problematic in two places: 1. If we manually migrate and cut over while the secondary is still cold 2. When we re-attach a previously offloaded tenant ## Summary of changes https://github.com/neondatabase/neon/pull/10597 made heatmap generation additive across migrations, so we won't clobber it a after a cold migration. This allows us to implement: 1. An endpoint for downloading all missing heatmap layers on the pageserver: `/v1/tenant/:tenant_shard_id/timeline/:timeline_id/download_heatmap_layers`. Only one such operation per timeline is allowed at any given time. The granularity is tenant shard. 2. An endpoint to the storage controller to trigger the downloads on the pageserver: `/v1/tenant/:tenant_shard_id/timeline/:timeline_id/download_heatmap_layers`. This works both at tenant and tenant shard level. If an unsharded tenant id is provided, the operation is started on all shards, otherwise only the specified shard. 3. A storcon cli command. Again, tenant and tenant-shard level granularities are supported. Cplane will call into storcon and trigger the downloads for all shards. When we want to rescue a migration, we will use storcon cli targeting the specific tenant shard. Related: https://github.com/neondatabase/neon/issues/10541	2025-02-18 16:09:06 +00:00
Alexander Bayandin	274cb13293	test_runner: fix mismatch versions tests on linux (#10869 ) ## Problem Tests with mixed-version binaries always use the latest binaries on CI ([an example](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10848/13378137061/index.html#suites/8fc5d1648d2225380766afde7c428d81/1ccefc4cfd4ef176/)): The versions of new `storage_broker` and old `pageserver` are the same: `b45254a5605f6fdafdf475cdd3e920fe00898543`. This affects only Linux, on macOS the version mixed correctly. ## Summary of changes - Use hardlinks instead of symlinks to create a directory with mixed-version binaries	2025-02-18 15:52:00 +00:00
Alexander Lakhin	f81259967d	Add test to make sure sanitizers really work when expected (#10838 )	2025-02-18 13:23:18 +00:00
Alexander Bayandin	27241f039c	test_runner: fix `neon_local` usage for version mismatch tests (#10859 ) ## Problem Tests with mixed versions of binaries always pick up new versions if services are started using `neon_local`. ## Summary of changes - Set `neon_local_binpath` along with `neon_binpath` and `pg_distrib_dir` for tests with mixed versions	2025-02-17 20:29:14 +00:00
Gleb Novikov	3d7a32f619	fast import: allow restore to provided connection string (#10407 ) Within https://github.com/neondatabase/cloud/issues/22089 we decided that would be nice to start with import that runs dump-restore into a running compute (more on this [here](https://www.notion.so/neondatabase/2024-Jan-13-Migration-Assistant-Next-Steps-Proposal-Revised-17af189e004780228bdbcad13eeda93f?pvs=4#17af189e004780de816ccd9c13afd953)) We could do it by writing another tool or by extending existing `fast_import.rs`, we chose the latter. In this PR, I have added optional `restore_connection_string` as a cli arg and as a part of the json spec. If specified, the script will not run postgres and will just perform restore into provided connection string. TODO: - [x] fast_import.rs: - [x] cli arg in the fast_import.rs - [x] encoded connstring in json spec - [x] simplify `fn main` a little, take out too verbose stuff to some functions - [ ] ~~allow streaming from dump stdout to restore stdin~~ will do in a separate PR - [ ] ~~address https://github.com/neondatabase/neon/pull/10251#pullrequestreview-2551877845~~ will do in a separate PR - [x] tests: - [x] restore with cli arg in the fast_import.rs - [x] restore with encoded connstring in json spec in s3 - [ ] ~~test with custom dbname~~ will do in a separate PR - [ ] ~~test with s3 + pageserver + fast import binary~~ https://github.com/neondatabase/neon/pull/10487 - [ ] ~~https://github.com/neondatabase/neon/pull/10271#discussion_r1923715493~~ will do in a separate PR neondatabase/cloud#22775 --------- Co-authored-by: Eduard Dykman <bird.duskpoet@gmail.com>	2025-02-14 16:10:06 +00:00
John Spray	a4d0a34591	tests: flush in test_isolation (#10658 ) ## Problem This test occasionally fails while the test teardown tries to do a graceful shutdown, because the test has quickly written lots of data into the pageserver. Closes: #10654 ## Summary of changes - Call `post_checks` at the end of `test_isolation`, as we already do for test_pg_regress -- this improves our detection of issues, and as a nice side effect flushes the pageserver. - Ignore pg_notify files when validating state at end of test, these are not expected to be the same	2025-02-13 16:23:51 +00:00
Vlad Lazar	8fea43a5ba	pageserver: make heatmap generation additive (#10597 ) ## Problem Previously, when cutting over to cold secondary locations, we would clobber the previous, good, heatmap with a cold one. This is because heatmap generation used to include only resident layers. Once this merges, we can add an endpoint which triggers full heatmap hydration on attached locations to heal cold migrations. ## Summary of changes With this patch, heatmap generation becomes additive. If we have a heatmap from when this location was secondary, the new uploaded heatmap will be the result of a reconciliation between the old one and the on disk resident layers. More concretely, when we have the previous heatmap: 1. Filter the previous heatmap and keep layers that are (a) present in the current layer map, (b) visible, (c) not resident. Call this set of layers `visible_non_resident`. 2. From the layer map, select all layers that are resident and visible. Call this set of layers `resident`. 3. The new heatmap is the result of merging the two disjoint sets. Related https://github.com/neondatabase/neon/issues/10541	2025-02-13 12:48:47 +00:00
John Spray	9989d8bfae	tests: make Workload more determinstic (#10741 ) ## Problem Previously, Workload was reconfiguring the compute before each run of writes, which was meant to be a no-op when nothing changed, but was actually writing extra data due to an issue being fixed in https://github.com/neondatabase/neon/pull/10696. The row counts in tests were too low in some cases, these tests were only working because of those extra writes that shouldn't have been happening, and moreover were relying on checkpoints happening. ## Summary of changes - Only reconfigure compute if the attached pageserver actually changed. If pageserver is set to None, that means controller is managing everything, so never reconfigure compute. - Update tests that wrote too few rows. --------- Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>	2025-02-12 12:35:29 +00:00
Tristan Partin	da9c101939	Implement a second HTTP server within compute_ctl (#10574 ) The compute_ctl HTTP server has the following purposes: - Allow management via the control plane - Provide an endpoint for scaping metrics - Provide APIs for compute internal clients - Neon Postgres extension for installing remote extensions - local_proxy for installing extensions and adding grants The first two purposes require the HTTP server to be available outside the compute. The Neon threat model is a bad actor within our internal network. We need to reduce the surface area of attack. By exposing unnecessary unauthenticated HTTP endpoints to the internal network, we increase the surface area of attack. For endpoints described in the third bullet point, we can just run an extra HTTP server, which is only bound to the loopback interface since all consumers of those endpoints are within the compute.	2025-02-11 18:02:22 +00:00
John Spray	5e95860e70	tests: wait for manifest persistence in test_timeline_archival_chaos (#10719 ) ## Problem This test would sometimes fail its assertion that a timeline does not revert to active once archived. That's because it was using the in-memory offload state, not the persistent state, so this was sometimes lost across a pageserver restart. Closes: https://github.com/neondatabase/neon/issues/10389 ## Summary of changes - When reading offload status, read from pageserver API _and_ remote storage before considering the timeline offloaded	2025-02-07 16:27:39 +00:00
John Spray	95220ba43e	tests: fix flaky endpoint in test_ingest_logical_message (#10700 ) ## Problem Endpoint kept running while timeline was deleted, causing forbidden warnings on the pageserver when the tenant is not found. ## Summary of changes - Explicitly stop the endpoint before the end of the test, so that it isn't trying to talk to the pageserver in the background while things are torn down	2025-02-07 14:51:36 +00:00
Fedor Dikarev	8f651f9582	switch from localtest.me to local.neon.build (#10714 ) ## Problem Ref: https://github.com/neondatabase/neon/issues/10632 We use dns named `.localtest.me` in our test, and that domain is well-known and widely used for that, with all the records there resolve to the localhost, both IPv4 and IPv6: `127.0.0.1` and `::1` In some cases on our runners these addresses resolves only to `IPv6`, and so components fail to connect when runner doesn't have `IPv6` address. We suspect issue in systemd-resolved here (https://github.com/systemd/systemd/issues/17745) To workaround that and improve test stability, we introduced our own domain `.local.neon.build` with IPv4 address `127.0.0.1` only See full details and troubleshoot log in referred issue. p.s. If you're FritzBox user, don't forget to add that domain `local.neon.build` to the `DNS Rebind Protection` section under `Home Network -> Network -> Network Settings`, otherwise FritzBox will block addresses, resolving to the local addresses. For other devices/vendors, please check corresponding documentation, if resolving `local.neon.build` will produce empty answer for you. ## Summary of changes Replace all the occurrences of `localtest.me` with `local.neon.build`	2025-02-07 12:25:16 +00:00
Alexander Lakhin	977781e423	Enable sanitizers for postgres v17 (#10401 ) Add a build with sanitizers (asan, ubsan) to the CI pipeline and run tests on it. See https://github.com/neondatabase/neon/issues/6053 --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-02-06 12:53:43 +00:00
Vlad Lazar	f9009d6b80	pageserver: write heatmap to disk after uploading it (#10650 ) ## Problem We wish to make heatmap generation additive in https://github.com/neondatabase/neon/pull/10597. However, if the pageserver restarts and has a heatmap on disk from when it was a secondary long ago, we can end up keeping extra layers on the secondary's disk. ## Summary of changes Persist the heatmap after a successful upload.	2025-02-04 17:52:54 +00:00
Anastasia Lubennikova	b1bc33eb4d	Fix logical_replication_sync test fixture (#10531 ) Fixes flaky test_lr_with_slow_safekeeper test #10242 Fix query to `pg_catalog.pg_stat_subscription` catalog to handle table synchronization and parallel LR correctly.	2025-02-03 12:44:47 +00:00
Erik Grinaker	6a2afa0c02	pageserver: add per-timeline read amp histogram (#10566 ) ## Problem We don't have per-timeline observability for read amplification. Touches https://github.com/neondatabase/cloud/issues/23283. ## Summary of changes Add a per-timeline `pageserver_layers_per_read` histogram. NB: per-timeline histograms are expensive, but probably worth it in this case.	2025-01-30 11:24:49 +00:00
Tristan Partin	707a926057	Remove unused compute_ctl HTTP routes (#10544 ) These are not used anywhere within the platform, so let's remove dead code. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-29 19:22:01 +00:00
Erik Grinaker	ed942b05f7	Revert "pageserver: revert flush backpressure" (#10402 )" (#10533 ) This reverts commit `9e55d79803`. We'll still need this until we can tune L0 flush backpressure and compaction. I'll add a setting to disable this separately.	2025-01-28 13:33:58 +00:00
Vlad Lazar	de8276488d	tests: enable wal reader fanout in tests (#10301 ) Note: this has to merge after the release is cut on `2025-01-17` for compat tests to start passing. ## Problem SK wal reader fan-out is not enabled in tests by default. ## Summary of changes Enable it.	2025-01-24 10:34:57 +00:00
Erik Grinaker	ddb9ae1214	pageserver: add compaction backpressure for layer flushes (#10405 ) ## Problem There is no direct backpressure for compaction and L0 read amplification. This allows a large buildup of compaction debt and read amplification. Resolves #5415. Requires #10402. ## Summary of changes Delay layer flushes based on the number of level 0 delta layers: * `l0_flush_delay_threshold`: delay flushes such that they take 2x as long (default `2 * compaction_threshold`). * `l0_flush_stall_threshold`: stall flushes until level 0 delta layers drop below threshold (default `4 * compaction_threshold`). If either threshold is reached, ephemeral layer rolls also synchronously wait for layer flushes to propagate this backpressure up into WAL ingestion. This will bound the number of frozen layers to 1 once backpressure kicks in, since all other frozen layers must flush before the rolled layer. ## Analysis This will significantly change the compute backpressure characteristics. Recall the three compute backpressure knobs: * `max_replication_write_lag`: 500 MB (based on Pageserver `last_received_lsn`). * `max_replication_flush_lag`: 10 GB (based on Pageserver `disk_consistent_lsn`). * `max_replication_apply_lag`: disabled (based on Pageserver `remote_consistent_lsn`). Previously, the Pageserver would keep ingesting WAL and build up ephemeral layers and L0 layers until the compute hit `max_replication_flush_lag` at 10 GB and began backpressuring. Now, once we delay/stall WAL ingestion, the compute will begin backpressuring after `max_replication_write_lag`, i.e. 500 MB. This is probably a good thing (we're not building up a ton of compaction debt), but we should consider tuning these settings. `max_replication_flush_lag` probably doesn't serve a purpose anymore, and we should consider removing it. Furthermore, the removal of the upload barrier in #10402 will mean that we no longer backpressure flushes based on S3 uploads, since `max_replication_apply_lag` is disabled. We should consider enabling this as well. ### When and what do we compact? Default compaction settings: * `compaction_threshold`: 10 L0 delta layers. * `compaction_period`: 20 seconds (between each compaction loop check). * `checkpoint_distance`: 256 MB (size of L0 delta layers). * `l0_flush_delay_threshold`: 20 L0 delta layers. * `l0_flush_stall_threshold`: 40 L0 delta layers. Compaction characteristics: * Minimum compaction volume: 10 layers * 256 MB = 2.5 GB. * Additional compaction volume (assuming 128 MB/s WAL): 128 MB/s * 20 seconds = 2.5 GB (10 L0 layers). * Required compaction bandwidth: 5.0 GB / 20 seconds = 256 MB/s. ### When do we hit `max_replication_write_lag`? Depending on how fast compaction and flushes happens, the compute will backpressure somewhere between `l0_flush_delay_threshold` or `l0_flush_stall_threshold` + `max_replication_write_lag`. * Minimum compute backpressure lag: 20 layers * 256 MB + 500 MB = 5.6 GB * Maximum compute backpressure lag: 40 layers * 256 MB + 500 MB = 10.0 GB This seems like a reasonable range to me.	2025-01-24 09:47:28 +00:00
Erik Grinaker	9e55d79803	Reapply "pageserver: revert flush backpressure" (#10270 ) (#10402 ) This reapplies #10135. Just removing this flush backpressure without further mitigations caused read amp increases during bulk ingestion (predictably), so it was reverted. We will replace it by compaction-based backpressure. ## Problem In #8550, we made the flush loop wait for uploads after every layer. This was to avoid unbounded buildup of uploads, and to reduce compaction debt. However, the approach has several problems: * It prevents upload parallelism. * It prevents flush and upload pipelining. * It slows down ingestion even when there is no need to backpressure. * It does not directly backpressure based on compaction debt and read amplification. We will instead implement compaction-based backpressure in a PR immediately following this removal (#5415). Touches #5415. Touches #10095. ## Summary of changes Remove waiting on the upload queue in the flush loop.	2025-01-24 08:35:35 +00:00

1 2 3 4 5 ...

1012 Commits