rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-04 03:52:56 +00:00

Author	SHA1	Message	Date
John Spray	a82a6631fd	storage controller: prioritize reconciles for user-facing operations (#10822 ) ## Problem Some situations may produce a large number of pending reconciles. If we experience an issue where reconciles are processed more slowly than expected, that can prevent us responding promptly to user requests like tenant/timeline CRUD. This is a cleaner implementation of the hotfix in https://github.com/neondatabase/neon/pull/10815 ## Summary of changes - Introduce a second semaphore for high priority tasks, with configurable units (default 256). The intent is that in practical situations these user-facing requests should never have to wait. - Use the high priority semaphore for: tenant/timeline CRUD, and shard splitting operations. Use normal priority for everything else.	2025-02-14 13:25:43 +00:00
John Spray	996f0a3753	storcon: fix eliding parameters from proxied URL labels (#10817 ) ## Problem We had code for stripping IDs out of proxied paths to reduce cardinality of metrics, but it was only stripping out tenant IDs, and leaving in timeline IDs and query parameters (e.g. LSN in lsn->timestamp lookups). ## Summary of changes - Use a more general regex approach. There is still some risk that a future pageserver API might include a parameter in `/the/path/`, but we control that API and it is not often extended. We will also alert on metrics cardinality in staging so that if we made that mistake we would notice.	2025-02-14 09:57:19 +00:00
Vlad Lazar	8fea43a5ba	pageserver: make heatmap generation additive (#10597 ) ## Problem Previously, when cutting over to cold secondary locations, we would clobber the previous, good, heatmap with a cold one. This is because heatmap generation used to include only resident layers. Once this merges, we can add an endpoint which triggers full heatmap hydration on attached locations to heal cold migrations. ## Summary of changes With this patch, heatmap generation becomes additive. If we have a heatmap from when this location was secondary, the new uploaded heatmap will be the result of a reconciliation between the old one and the on disk resident layers. More concretely, when we have the previous heatmap: 1. Filter the previous heatmap and keep layers that are (a) present in the current layer map, (b) visible, (c) not resident. Call this set of layers `visible_non_resident`. 2. From the layer map, select all layers that are resident and visible. Call this set of layers `resident`. 3. The new heatmap is the result of merging the two disjoint sets. Related https://github.com/neondatabase/neon/issues/10541	2025-02-13 12:48:47 +00:00
Arpad Müller	536bdb3209	storcon: track safekeepers in memory, send heartbeats to them (#10583 ) In #9011, we want to schedule timelines to safekeepers. In order to do such scheduling, we need information about how utilized a safekeeper is and if it's available or not. Therefore, send constant heartbeats to the safekeepers and try to figure out if they are online or not. Includes some code from #10440.	2025-02-13 11:06:30 +00:00
Heikki Linnakangas	635b67508b	Split utils::http to separate crate (#10753 ) Avoids compiling the crate and its dependencies into binaries that don't need them. Shrinks the compute_ctl binary from about 31MB to 28MB in the release-line-debug-size-lto profile.	2025-02-11 22:06:53 +00:00
John Spray	14e05276a3	storcon: fix a case where optimise could get stuck on unschedulable node (#10648 ) ## Problem When a shard has two secondary locations, but one of them is on a node with MaySchedule::No, the optimiser would get stuck, because it couldn't decide which secondary to remove. This is generally okay if a node is offline, but if a node is in Pause mode for a long period of time, it's a problem. Closes: https://github.com/neondatabase/neon/issues/10646 ## Summary of changes - Instead of insisting on finding a node in the wrong AZ to remove, find an available node in the _right_ AZ, and remove all the others. This ensures that if there is one live suitable node, then other offline/paused nodes cannot hold things up.	2025-02-05 16:05:12 +00:00
Vlad Lazar	47975d06d9	storcon: silence cplane 404s on tenant creation (#10665 ) ## Problem We get WARN log noise on tenant creations. Cplane creates tenants via /location_config. That returns the attached locations in the response and spawns a reconciliation which will also attempt to notify cplane. If the notification is attempted before cplane persists the shards to its database, storcon gets back a 404. The situation is harmless, but annoying. ## Summary of Changes * Add a tenant creation hint to the reconciler config * If the hint is true and we get back a 404 on the notification from cplane, ignore the error, but still queue the reconcile up for a retry. Closes https://github.com/neondatabase/cloud/issues/20732	2025-02-05 12:41:09 +00:00
Arpad Müller	b6e9daea9a	storcon: only allow errrors of the server cert verification (#10644 ) This PR does a bunch of things: * only allow errors of the server cert verification, not of the TLS handshake. The TLS handshake doesn't cause any errors for us so we can just always require it to be valid. This simplifies the code a little. * As the solution is more permanent than originally anticipated, I think it makes sense to move the `AcceptAll` verifier outside. * log the connstr information. this helps with figuring out which domain names are configured in the connstr, etc. I think it is generally useful to print it. make extra sure that the password is not leaked. Follow-up of #10640	2025-02-04 14:01:57 +00:00
John Spray	715e20343a	storage controller: improve scheduling of tenants created in PlacementPolicy::Secondary (#10590 ) ## Problem I noticed when onboarding lots of tenants that the AZ scheduling violation stat was climbing, before falling later as optimisations happened. This was happening because we first add the tenant with PlacementPolicy::Secondary, and then later go to PlacementPolicy::Attached, and the scheduler's behavior led to a bad AZ choice: 1. Create a secondary location in the non-preferred AZ 2. Upgrade to Attached where we promote that non-preferred-AZ location to attached and then create another secondary 3. Optimiser later realises we're in the wrong AZ and moves us ## Summary of changes - Extend some logging to give more information about AZs - When scheduling secondary location in PlacementPolicy::Secondary, select it as if we were attached: in this mode, our business goal is to have a warm pageserver location that we can make available as attached quickly if needed, therefore we want it to be in the preferred AZ. - Make optimize_secondary logic the same, so that it will consider a secondary location in the preferred AZ to be optimal when in PlacementPolicy::Secondary - When transitioning to from PlacementPolicy::Attached(N) to PlacementPolicy::Secondary, instead of arbitrarily picking a location to keep, prefer to keep the location in the preferred AZ	2025-02-03 19:01:16 +00:00
Arpad Müller	c774f0a147	storcon db: allow accepting any TLS certificate (#10640 ) We encountered some TLS validation errors for the storcon since applying #10614. Add an option to downgrade them to logged errors instead to allow us to debug with more peace. cc issue https://github.com/neondatabase/cloud/issues/23583	2025-02-03 18:21:01 +00:00
Arpad Müller	87ad50c925	storcon: use diesel-async again, now with tls support (#10614 ) Successor of #10280 after it was reverted in #10592. Re-introduce the usage of diesel-async again, but now also add TLS support so that we connect to the storcon database using TLS. By default, diesel-async doesn't support TLS, so add some code to make us explicitly request TLS. cc https://github.com/neondatabase/cloud/issues/23583	2025-02-03 11:53:51 +00:00
John Spray	5e0c40709f	storcon: refine chaos selection logic (#10600 ) ## Problem In https://github.com/neondatabase/neon/pull/10438 it was pointed out that it would be good to avoid picking tenants in ID order, and also to avoid situations where we might double-select the same tenant. There was an initial swing at this in https://github.com/neondatabase/neon/pull/10443, where Chi suggested a simpler approach which is done in this PR ## Summary of changes - Split total set of tenants into in and out of home AZ - Consume out of home AZ first, and if necessary shuffle + consume from out of home AZ	2025-01-30 22:45:43 +00:00
John Spray	d18f6198e1	storcon: fix AZ-driven tenant selection in chaos (#10443 ) ## Problem In https://github.com/neondatabase/neon/pull/10438 I had got the function for picking tenants backwards, and it was preferring to move things _away_ from their preferred AZ. ## Summary of changes - Fix condition in `is_attached_outside_preferred_az`	2025-01-30 22:17:07 +00:00
Arpad Müller	4d2c2e9460	Revert "storcon: switch to diesel-async and tokio-postgres (#10280 )" (#10592 ) There was a regression of #10280, tracked in [#23583](https://github.com/neondatabase/cloud/issues/23583). I have ideas how to fix the issue, but we are too close to the release cutoff, so revert #10280 for now. We can revert the revert later :).	2025-01-30 19:23:25 +00:00
Vlad Lazar	c54cd9e76a	storcon: signal LSN wait to pageserver during live migration (#10452 ) ## Problem We've seen the ingest connection manager get stuck shortly after a migration. ## Summary of changes A speculative mitigation is to use the same mechanism as get page requests for kicking LSN ingest. The connection manager monitors LSN waits and queries the broker if no updates are received for the timeline. Closes https://github.com/neondatabase/neon/issues/10351	2025-01-28 17:33:07 +00:00
Arpad Müller	b0b4b7dd8f	storcon: switch to diesel-async and tokio-postgres (#10280 ) Switches the storcon away from using diesel's synchronous APIs in favour of `diesel-async`. Advantages: * less C dependencies, especially no openssl, which might be behind the bug: https://github.com/neondatabase/cloud/issues/21010 * Better to only have async than mix of async plus `spawn_blocking` We had to turn off usage of the connection pool for migrations, as diesel migrations don't support async APIs. Thus we still use `spawn_blocking` in that one place. But this is explicitly done in one of the `diesel-async` examples.	2025-01-27 14:25:11 +00:00
John Spray	7d761a9d22	storage controller: make chaos less disruptive to AZ locality (#10438 ) ## Problem Since #9916 , the chaos code is actively fighting the optimizer: tenants tend to be attached in their preferred AZ, so most chaos migrations were moving them to a non-preferred AZ. ## Summary of changes - When picking migrations, prefer to migrate things _toward_ their preferred AZ when possible. Then pick shards to move the other way when necessary. The resulting behavior should be an alternating "back and forth" where the chaos code migrates thiings away from home, and then migrates them back on the next iteration. The side effect will be that the chaos code actively helps to push things into their home AZ. That's not contrary to its purpose though: we mainly just want it to continuously migrate things to exercise migration+notification code.	2025-01-20 09:47:23 +00:00
John Spray	da13154791	storcon: revise fill logic to prioritize AZ (#10411 ) ## Problem Node fills were limited to moving (total shards / node_count) shards. In systems that aren't perfectly balanced already, that leads us to skip migrating some of the shards that belong on this node, generating work for the optimizer later to gradually move them back. ## Summary of changes - Where a shard has a preferred AZ and is currently attached outside this AZ, then always promote it during fill, irrespective of target fill count	2025-01-16 17:33:46 +00:00
John Spray	2e13a3aa7a	storage controller: handle legacy TenantConf in consistency_check (#10422 ) ## Problem We were comparing serialized configs from the database with serialized configs from memory. If fields have been added/removed to TenantConfig, this generates spurious consistency errors. This is fine in test environments, but limits the usefulness of this debug API in the field. Closes: https://github.com/neondatabase/neon/issues/10369 ## Summary of changes - Do a decode/encode cycle on the config before comparing it, so that it will have exactly the expected fields.	2025-01-16 16:56:44 +00:00
Arpad Müller	e436dcad57	Rename "disabled" safekeeper scheduling policy to "pause" (#10410 ) Rename the safekeeper scheduling policy "disabled" to "pause". A rename was requested in https://github.com/neondatabase/neon/pull/10400#discussion_r1916259124, as the "disabled" policy is meant to be analogous to the "pause" policy for pageservers. Also simplify the `SkSchedulingPolicyArg::from_str` function, relying on the `from_str` implementation of `SkSchedulingPolicy`. Latter is used for the database format as well, so it is quite stable. If we ever want to change the UI, we'll need to duplicate the function again but this is cheap.	2025-01-16 14:30:49 +00:00
Arpad Müller	efaec6cdf8	Add endpoint and storcon cli cmd to set sk scheduling policy (#10400 ) Implementing the last missing endpoint of #9981, this adds support to set the scheduling policy of an individual safekeeper, as specified in the RFC. However, unlike in the RFC we call the endpoint `scheduling_policy` not `status` Closes #9981. As for why not use the upsert endpoint for this: we want to have the safekeeper upsert endpoint be used for testing and for deploying new safekeepers, but not for changes of the scheduling policy. We don't want to change any of the other fields when marking a safekeeper as decommissioned for example, so we'd have to first fetch them only to then specify them again. Of course one can also design an endpoint where one can omit any field and it doesn't get modified, but it's still not great for observability to put everything into one big "change something about this safekeeper" endpoint.	2025-01-15 18:15:30 +00:00
John Spray	47c1640acc	storage controller: pagination for tenant listing API (#10365 ) ## Problem For large deployments, the `control/v1/tenant` listing API can time out transmitting a monolithic serialized response. ## Summary of changes - Add `limit` and `start_after` parameters to listing API - Update storcon_cli to use these parameters and limit requests to 1000 items at a time	2025-01-14 21:37:32 +00:00
John Spray	aa7323a384	storage controller: quality of life improvements for AZ handling (#10379 ) ## Problem Since https://github.com/neondatabase/neon/pull/9916, the preferred AZ of a tenant is much more impactful, and we would like to make it more visible in tooling. ## Summary of changes - Include AZ in node describe API - Include AZ info in node & tenant outputs in CLI - Add metrics for per-node shard counts, labelled by AZ - Add a CLI for setting preferred AZ on a tenant - Extend AZ-setting API+CLI to handle None for clearing preferred AZ	2025-01-14 15:30:43 +00:00
John Spray	fd1368d31e	storcon: rework scheduler optimisation, prioritize AZ (#9916 ) ## Problem We want to do a more robust job of scheduling tenants into their home AZ: https://github.com/neondatabase/neon/issues/8264. Closes: https://github.com/neondatabase/neon/issues/8969 ## Summary of changes ### Scope This PR combines prioritizing AZ with a larger rework of how we do optimisation. The rationale is that just bumping AZ in the order of Score attributes is a very tiny change: the interesting part is lining up all the optimisation logic to respect this properly, which means rewriting it to use the same scores as the scheduler, rather than the fragile hand-crafted logic that we had before. Separating these changes out is possible, but would involve doing two rounds of test updates instead of one. ### Scheduling optimisation `TenantShard`'s `optimize_attachment` and `optimize_secondary` methods now both use the scheduler to pick a new "favourite" location. Then there is some refined logic for whether + how to migrate to it: - To decide if a new location is sufficiently "better", we generate scores using some projected ScheduleContexts that exclude the shard under consideration, so that we avoid migrating from a node with AffinityScore(2) to a node with AffinityScore(1), only to migrate back later. - Score types get a `for_optimization` method so that when we compare scores, we will only do an optimisation if the scores differ by their highest-ranking attributes, not just because one pageserver is lower in utilization. Eventually we _will_ want a mode that does this, but doing it here would make scheduling logic unstable and harder to test, and to do this correctly one needs to know the size of the tenant that one is migrating. - When we find a new attached location that we would like to move to, we will create a new secondary location there, even if we already had one on some other node. This handles the case where we have a home AZ A, and want to migrate the attachment between pageservers in that AZ while retaining a secondary location in some other AZ as well. - A unit test is added for https://github.com/neondatabase/neon/issues/8969, which is implicitly fixed by reworking optimisation to use the same scheduling scores as scheduling.	2025-01-13 19:33:00 +00:00
John Spray	ef8bfacd6b	storage controller: API + CLI for migrating secondary locations (#10284 ) ## Problem Currently, if we want to move a secondary there isn't a neat way to do that: we just have migration API for the attached location, and it is only clean to use that if you've manually created a secondary via pageserver API in the place you're going to move it to. Secondary migration API enables: - Moving the secondary somewhere because we would like to later move the attached location there. - Move the secondary location because we just want to reclaim some disk space from its current location. ## Summary of changes - Add `/migrate_secondary` API - Add `tenant-shard-migrate-secondary` CLI - Add tests for above	2025-01-13 14:52:43 +00:00
John Spray	d1bc36f536	storage controller: fix retries of compute hook notifications while a secondary node is offline (#10352 ) ## Problem We would sometimes fail to retry compute notifications: 1. Try and send, set compute_notify_failure if we can't 2. On next reconcile, reconcile() fails for some other reason (e.g. tried to talk to an offline node), and we fail the `result.is_ok() && must_notify` condition around the re-sending. Closes: https://github.com/neondatabase/cloud/issues/22612 ## Summary of changes - Clarify the meaning of the reconcile result: it should be Ok(()) if configuring attached location worked, even if secondary or detach locations cannot be reached. - Skip trying to talk to secondaries if they're offline - Even if reconcile fails and we can't send the compute notification (we can't send it because we're not sure if it's really attached), make sure we save the `compute_notify_failure` flag so that subsequent reconciler runs will try again - Add a regression test for the above	2025-01-13 13:31:57 +00:00
John Spray	12053cf832	storage controller: improve consistency_check_api (#10363 ) ## Problem Limitations found while using this to investigate https://github.com/neondatabase/neon/issues/10234: - If we hit a node consistency issue, we drop out and don't check shards for consistency - The messages printed after a shard consistency issue are huge, and grafana appears to drop them. ## Summary of changes - Defer node consistency errors until the end of the function, so that we always proceed to check shards for consistency - Print out smaller log lines that just point out the diffs between expected and persistent state	2025-01-13 11:18:14 +00:00
Arpad Müller	23c0748cdd	Remove active column (#10335 ) We don't need or want the `active` column. Remove it. Vlad pointed out that this is safe. Thanks to the separation of the schemata in earlier PRs, this is easy. follow-up of #10205 Part of https://github.com/neondatabase/neon/issues/9981	2025-01-11 02:52:45 +00:00
Vlad Lazar	3cd5034eac	storcon: don't assume host:port format in storcon client (#10347 ) ## Problem https://github.com/neondatabase/infra/pull/2725 updated the scrubber to use a non-host port endpoint for storcon. That breaks when unwrapping the port. ## Summary of changes Support both `host:port` and `host` formats for the storcon api.	2025-01-10 18:35:16 +00:00
Arpad Müller	bebc46e713	Add scheduling_policy column to safekeepers table (#10205 ) Add a `scheduling_policy` column to the safekeepers table of the storage controller. Part of #9981	2025-01-09 15:55:02 +00:00
John Spray	ac6cca17ac	storcon: don't log a heartbeat error during shutdown (#10325 ) ## Problem Occasionally we see an unexpected error like: ``` ERROR spawn_heartbeat_driver: Failed to update node state 1 after heartbeat round: Shutting down\n') Hint: use scripts/check_allowed_errors.sh to test any new allowed_error you add ``` https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10324/12690404952/index.html#/testresult/63406a0687bf6eca ## Summary of changes - Explicitly handle ApiError::ShuttingDown as a no-op when mutating node status	2025-01-09 15:33:44 +00:00
John Spray	68d8acfd05	storage controller: don't hold detached tenants in memory (#10264 ) ## Problem Typical deployments of neon have some tenants that stay in use continuously, and a background churning population of tenants that are created and then fall idle, and are configured to Detached state. Currently, this churn of short lived tenants results in an ever-increasing memory footprint. Closes: https://github.com/neondatabase/neon/issues/9712 ## Summary of changes - At startup, filter to only load shards that don't have Detached policy - In process_result, check if a tenant's shards are all Detached and observed=={}, and if so drop them from memory - In tenant_location_conf and other tenant mutators, load the tenants' shards on-demand if they are not present	2025-01-08 18:12:09 +00:00
Vlad Lazar	dc284247a5	storage_controller: fix node flap detach race (#10298 ) ## Problem The observed state removal may race with the inline updates of the observed state done from `Service::node_activate_reconcile`. This was intended to work as follows: 1. Detaches while the node is unavailable remove the entry from the observed state. 2. `Service::node_activate_reconcile` diffs the locations returned by the pageserver with the observed state and detaches in-line when required. ## Summary of changes This PR removes step (1) and lets background reconciliations deal with the mismatch between the intent and observed state. A follow up will attempt to remove `Service::node_activate_reconcile` altogether. Closes https://github.com/neondatabase/neon/issues/10253	2025-01-08 10:26:53 +00:00
John Spray	c08759f367	storcon: verbose logs in rare case of shards not attached yet (#10262 ) ## Problem When we do a timeline CRUD operation, we check that the shards we need to mutate are currently attached to a pageserver, by reading `generation` and `generation_pageserver` from the database. If any don't appear to be attached, we respond with a a 503 and "One or more shards in tenant is not yet attached". This is happening more often than expected, and it's not obvious with current logging what's going on: specifically which shard has a problem, and exactly what we're seeing in these persistent generation columns. (Aside: it's possible that we broke something with the change in #10011 which clears generation_pageserver when we detach a shard, although if so the mechanism isn't trivial: what should happen is that if we stamp on generation_pageserver if a reconciler is running, then it shouldn't matter because we're about to ## Summary of changes - When we are in Attached mode but find that generation_pageserver/generation are unset, output details while looping over shards.	2025-01-03 10:55:15 +00:00
John Spray	2d4f267983	cargo: update diesel, pq-sys (#10256 ) ## Problem Versions of `diesel` and `pq-sys` were somewhat stale. I was checking on libpq->openssl versions while investigating a segfault via https://github.com/neondatabase/cloud/issues/21010. I don't think these rust bindings are likely to be the source of issues, but we might as well freshen them as a precaution. ## Summary of changes - Update diesel to 2.2.6 - Update pq-sys to 0.6.3	2025-01-03 10:20:18 +00:00
Arpad Müller	85696297c5	Add safekeepers command to storcon_cli for listing (#10151 ) Add a `safekeepers` subcommand to `storcon_cli` that allows listing the safekeepers. ``` $ curl -X POST --url http://localhost:1234/control/v1/safekeeper/42 --data \ '{"active":true, "id":42, "created_at":"2023-10-25T09:11:25Z", "updated_at":"2024-08-28T11:32:43Z","region_id":"neon_local","host":"localhost","port":5454,"http_port":0,"version":123,"availability_zone_id":"us-east-2b"}' $ cargo run --bin storcon_cli -- --api http://localhost:1234 safekeepers Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.38s Running `target/debug/storcon_cli --api 'http://localhost:1234' safekeepers` +----+---------+-----------+------+-----------+------------+ \| Id \| Version \| Host \| Port \| Http Port \| AZ Id \| +==========================================================+ \| 42 \| 123 \| localhost \| 5454 \| 0 \| us-east-2b \| +----+---------+-----------+------+-----------+------------+ ``` Also: * Don't return the raw `SafekeeperPersistence` struct that contains the raw database presentation, but instead a new `SafekeeperDescribeResponse` struct. * The `SafekeeperPersistence` struct leaves out the `active` field on purpose because we want to deprecate it and replace it with a `scheduling_policy` one. Part of https://github.com/neondatabase/neon/issues/9981	2024-12-18 12:47:56 +00:00
John Spray	fd230227f2	storcon: include preferred AZ in compute notifications (#9953 ) ## Problem It is unreliable for the control plane to infer the AZ for computes from where the tenant is currently attached, because if a tenant happens to be in a degraded state or a release is ongoing while a compute starts, then the tenant's attached AZ can be a different one to where it will run long-term, and the control plane doesn't check back later to restart the compute. This can land in parallel with https://github.com/neondatabase/neon/pull/9947 ## Summary of changes - Thread through the preferred AZ into the compute hook code via the reconciler - Include the preferred AZ in the body of compute hook notifications	2024-12-17 20:04:09 +00:00
Conrad Ludgate	6565fd4056	chore: fix clippy lints 2024-12-06 (#10138 )	2024-12-16 15:33:21 +00:00
John Spray	a93e3d31cc	storcon: refine logic for choosing AZ on tenant creation (#10054 ) ## Problem When we update our scheduler/optimization code to respect AZs properly (https://github.com/neondatabase/neon/pull/9916), the choice of AZ becomes a much higher-stakes decision. We will pretty much always run a tenant in its preferred AZ, and that AZ is fixed for the lifetime of the tenant (unless a human intervenes) Eventually, when we do auto-balancing based on utilization, I anticipate that part of that will be to automatically change the AZ of tenants if our original scheduling decisions have caused imbalance, but as an interim measure, we can at least avoid making this scheduling decision based purely on which AZ contains the emptiest node. This is a precursor to https://github.com/neondatabase/neon/pull/9947 ## Summary of changes - When creating a tenant, instead of scheduling a shard and then reading its preferred AZ back, make the AZ decision first. - Instead of choosing AZ based on which node is emptiest, use the median utilization of nodes in each AZ to pick the AZ to use. This avoids bad AZ decisions during periods when some node has very low utilization (such as after replacing a dead node) I considered also making the selection a weighted pseudo-random choice based on utilization, but wanted to avoid destabilising tests with that for now.	2024-12-12 19:35:38 +00:00
Arpad Müller	342cbea255	storcon: add safekeeper list API (#10089 ) This adds an API to the storage controller to list safekeepers registered to it. This PR does a `diesel print-schema > storage_controller/src/schema.rs` because of an inconsistency between up.sql and schema.rs, introduced by [this](`2c142f14f7`) commit, so there is some updates of `schema.rs` due to that. As a followup to this, we should maybe think about running `diesel print-schema` in CI. Part of #9981	2024-12-12 01:09:24 +00:00
Vlad Lazar	e8395807a5	storcon: allow for more concurrency in drain/fill operations (#10093 ) ## Problem We saw the drain/fill operations not drain fast enough in ap-southeast. ## Summary of changes These are some quick changes to speed it up: * double reconcile concurrency - this is now half of the available reconcile bandwidth * reduce the waiter polling timeout - this way we can spawn new reconciliations faster	2024-12-11 19:43:40 +00:00
Vlad Lazar	a3e80448e8	pageserver/storcon: add patch endpoints for tenant config metrics (#10020 ) ## Problem Cplane and storage controller tenant config changes are not additive. Any change overrides all existing tenant configs. This would be fine if both did client side patching, but that's not the case. Once this merges, we must update cplane to use the PATCH endpoint. ## Summary of changes ### High Level Allow for patching of tenant configuration with a `PATCH /v1/tenant/config` endpoint. It takes the same data as it's PUT counterpart. For example the payload below will update `gc_period` and unset `compaction_period`. All other fields are left in their original state. ``` { "tenant_id": "1234", "gc_period": "10s", "compaction_period": null } ``` ### Low Level * PS and storcon gain `PATCH /v1/tenant/config` endpoints. PS endpoint is only used for cplane managed instances. * `storcon_cli` is updated to have separate commands for `set-tenant-config` and `patch-tenant-config` Related https://github.com/neondatabase/cloud/issues/21043	2024-12-11 19:16:33 +00:00
John Spray	ec790870d5	storcon: automatically clear Pause/Stop scheduling policies to enable detaches (#10011 ) ## Problem We saw a tenant get stuck when it had been put into Pause scheduling mode to pin it to a pageserver, then it was left idle for a while and the control plane tried to detach it. Close: https://github.com/neondatabase/neon/issues/9957 ## Summary of changes - When changing policy to Detached or Secondary, set the scheduling policy to Active. - Add a test that exercises this - When persisting tenant shards, set their `generation_pageserver` to null if the placement policy is not Attached (this enables consistency checks to work, and avoids leaving state in the DB that could be confusing/misleading in future)	2024-12-07 13:05:09 +00:00
Erik Grinaker	db79304416	storage_controller: increase shard scan timeout (#10000 ) ## Problem The node shard scan timeout of 1 second is a bit too aggressive, and we've seen this cause test failures. The scans are performed in parallel across nodes, and the entire operation has a 15 second timeout. Resolves #9801. ## Summary of changes Increase the timeout to 5 seconds. This is still enough to time out on a network failure and retry successfully within 15 seconds.	2024-12-05 17:29:21 +00:00
Erik Grinaker	699a213c5d	Display reqwest error source (#10004 ) ## Problem Reqwest errors don't include details about the inner source error. This means that we get opaque errors like: ``` receive body: error sending request for url (http://localhost:9898/v1/location_config) ``` Instead of the more helpful: ``` receive body: error sending request for url (http://localhost:9898/v1/location_config): operation timed out ``` Touches #9801. ## Summary of changes Include the source error for `reqwest::Error` wherever it's displayed.	2024-12-04 13:05:53 +00:00
Vlad Lazar	68205c48ed	storcon: return an error for drain attempts while paused (#9997 ) ## Problem We currently allow drain operations to proceed while the node policy is paused. ## Summary of changes Return a precondition failed error in such cases. The orchestrator is updated in https://github.com/neondatabase/infra/pull/2544 to skip drain and fills if the pageserver is paused. Closes: https://github.com/neondatabase/neon/issues/9907	2024-12-04 09:25:29 +00:00
John Spray	71d004289c	storcon: in shard splits, inherit parent's AZ (#9946 ) ## Problem Sharded tenants should be run in a single AZ for best performance, so that computes have AZ-local latency to all the shards. Part of https://github.com/neondatabase/neon/issues/8264 ## Summary of changes - When we split a tenant, instead of updating each shard's preferred AZ to wherever it is scheduled, propagate the preferred AZ from the parent. - Drop the check in `test_shard_preferred_azs` that asserts shards end up in their preferred AZ: this will not be true again until the optimize_attachment logic is updated to make this so. The existing check wasn't testing anything about scheduling, it was just asserting that we set preferred AZ in a way that matches the way things happen to be scheduled at time of split.	2024-12-03 16:55:00 +00:00
John Spray	aaee713e53	storcon: use proper schedule context during node delete (#9958 ) ## Problem I was touching `test_storage_controller_node_deletion` because for AZ scheduling work I was adding a change to the storage controller (kick secondaries during optimisation) that made a FIXME in this test defunct. While looking at it I also realized that we can easily fix the way node deletion currently doesn't use a proper ScheduleContext, using the iterator type recently added for that purpose. ## Summary of changes - A testing-only behavior in storage controller where if a secondary location isn't yet ready during optimisation, it will be actively polled. - Remove workaround in `test_storage_controller_node_deletion` that previously was needed because optimisation would get stuck on cold secondaries. - Update node deletion code to use a `TenantShardContextIterator` and thereby a proper ScheduleContext	2024-12-03 08:59:38 +00:00
John Spray	bd09369198	storcon: add metric for AZ scheduling violations (#9949 ) ## Problem We can't easily tell how far the state of shards is from their AZ preferences. This can be a cause of performance issues, so it's important for diagnosability that we can tell easily if there are significant numbers of shards that aren't running in their preferred AZ. Related: https://github.com/neondatabase/cloud/issues/15413 ## Summary of changes - In reconcile_all, count shards that are scheduled into the wrong AZ (if they have a preference), and publish it as a prometheus gauge. - Also calculate a statistic for how many shards wanted to reconcile but couldn't. This is clearly a lazy calculation: reconcile all only runs periodically. But that's okay: shards in the wrong AZ is something that only matters if it stays that way for some period of time.	2024-12-02 11:50:22 +00:00
John Spray	14853a3284	storcon: don't take any Service locks in /status and /ready (#9944 ) ## Problem We saw unexpected container terminations when running in k8s with with small CPU resource requests. The /status and /ready handlers called `maybe_forward`, which always takes the lock on Service::inner. If there is a lot of writer lock contention, and the container is starved of CPU, this increases the likelihood that we will get killed by the kubelet. It isn't certain that this was a cause of issues, but it is a potential source that we can eliminate. ## Summary of changes - Revise logic to return immediately if the URL is in the non-forwarded list, rather than calling maybe_forward	2024-12-01 18:09:58 +00:00

1 2 3 4

172 Commits