Commit Graph

328 Commits

Author SHA1 Message Date
Arpad Müller
e3d27b2f68 Start safekeeper node IDs with 0 and forbid 0 from registering (#11419)
Right now we start safekeeper node ids at 0. However, other code treats
0 as invalid (see #11407). We decided on latter. Therefore, make the
register python tests register safekeepers starting at node id 1 instead
of 0, and forbid safekeepers with id 0 from registering.

Context:
https://github.com/neondatabase/neon/pull/11407#discussion_r2024852328
2025-04-02 18:36:50 +00:00
Alex Chi Z.
dd1299f337 feat(storcon): passthrough mark invisible and add tests (#11401)
## Problem

close https://github.com/neondatabase/neon/issues/11279

## Summary of changes

* Allow passthrough of other methods in tenant timeline shard0
passthrough of storcon.
* Passthrough mark invisible API in storcon.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-04-02 17:11:49 +00:00
Erik Grinaker
9df230c837 storcon: improve autosplit defaults (#11332)
## Problem

In #11122, we changed the autosplit behavior to allow repeated and
initial splits. The defaults were set such that they retain the current
production settings (8 shards at 64 GB). However, these defaults don't
really make sense by themselves.

Once we deploy new settings to production, we should change the defaults
to something more reasonable.

## Summary of changes

Changes the following default settings:

* `max_split_shards`: 8 → 16
* `initial_split_threshold`: 64 GB → disabled
* `initial_split_shards`: 8 → 2
2025-04-02 15:11:52 +00:00
John Spray
7dc8370848 storcon: do graceful migrations from chaos injector (#11028)
## Problem

Followup to https://github.com/neondatabase/neon/pull/10913

Existing chaos injection just does simple cutovers to secondary
locations. Let's also exercise code for doing graceful migrations. This
should implicitly test how such migrations cope with overlapping with
service restarts.

## Summary of changes
2025-04-02 10:08:05 +00:00
Arpad Müller
1fad1abb24 storcon: timeline deletion improvements and fixes (#11334)
This PR contains a bunch of smaller followups and fixes of the original
PR #11058. most of these implement suggestions from Arseny:

* remove `Queryable, Selectable` from `TimelinePersistence`: they are
not needed.
* no `Arc` around `CancellationToken`: it itself is an arc wrapper
* only schedule deletes instead of scheduling excludes and deletes
* persist and delete deletion ops
* delete rows in timelines table upon tenant and timeline deletion
* set `deleted_at` for timelines we are deleting before we start any
reconciles: this flag will help us later to recognize half-executed
deletions, or when we crashed before we could remove the timeline row
but after we removed the last pending op (handling these situations are
left for later).

Part of #9011
2025-04-01 15:16:33 +00:00
Alex Chi Z.
47d47000df fix(pageserver): passthrough lsn lease in storcon API (#11386)
## Problem

part of https://github.com/neondatabase/cloud/issues/23667

## Summary of changes

lsn_lease API can only be used on pageservers. This patch enables
storcon passthrough.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-31 19:16:42 +00:00
Vlad Lazar
9fc7c22cc9 storcon: add use_local_compute_notifications flag (#11333)
## Problem

While working on bulk import, I want to use the `control-plane-url` flag
for a different request.
Currently, the local compute hook is used whenever no control plane is
specified in the config.
My test requires local compute notifications and a configured
`control-plane-url` which isn't supported.

## Summary of changes

Add a `use-local-compute-notifications` flag. When this is set, we use
the local flow regardless of other config values.
It's enabled by default in neon_local and disabled by default in all
other envs. I had to turn the flag off in tests
that wish to bypass the local flow, but that's expected.

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2025-03-21 15:31:06 +00:00
Dmitrii Kovalkov
aeb53fea94 storage: support multiple SSL CA certificates (#11341)
## Problem
- We need to support multiple SSL CA certificates for graceful root CA
certificate rotation.
- Closes: https://github.com/neondatabase/cloud/issues/25971

## Summary of changes
- Parses `ssl_ca_file` as a pem bundle, which may contain multiple
certificates. Single pem cert is a valid pem bundle, so the change is
backward compatible.
2025-03-21 13:43:38 +00:00
Dmitrii Kovalkov
0f367cb665 storcon: reuse reqwest http client (#11327)
## Problem

- Part of https://github.com/neondatabase/neon/issues/11113
- Building a new `reqwest::Client` for every request is expensive
because it parses CA certs under the hood. It's noticeable in storcon's
flamegraph.

## Summary of changes
- Reuse one `reqwest::Client` for all API calls to avoid parsing CA
certificates every time.
2025-03-21 11:48:22 +00:00
John Spray
76088c16d2 storcon: reproduce shard split issue (#11290)
## Problem

Issue https://github.com/neondatabase/neon/issues/11254 describes a case
where restart during a shard split can result in a bad end state in the
database.

## Summary of changes

- Add a reproducer for the issue
- Tighten an existing safety check around updated row counts in
complete_shard_split
2025-03-21 08:48:56 +00:00
Dmitrii Kovalkov
28fc051dcc storage: live ssl certificate reload (#11309)
## Problem
SSL certs are loaded only during start up. It doesn't allow the rotation
of short-lived certificates without server restart.

- Closes: https://github.com/neondatabase/cloud/issues/25525

## Summary of changes
- Implement `ReloadingCertificateResolver` which reloads certificates
from disk periodically.
2025-03-20 16:26:27 +00:00
Erik Grinaker
65d690b21d storcon: add repeated auto-splits and initial splits (#11122)
## Problem

Currently, we only split tenants into 8 shards once, at the 64 GB split
threshold. For very large tenants, we need to keep splitting to avoid
huge shards. And we also want to eagerly split at a lower threshold to
improve throughput during initial ingestion.

See
https://github.com/neondatabase/cloud/issues/22532#issuecomment-2706215907
for details.

Touches https://github.com/neondatabase/cloud/issues/22532.
Requires #11157.

## Summary of changes

This adds parameters and logic to enable repeated splits when a tenant's
largest timeline divided by shard count exceeds `split_threshold`, as
well as eager initial splits at a lower threshold to speed up initial
ingestion. The default parameters are all set such that they retain the
current behavior in production (only split into 8 shards once, at 64
GB).

* `split_threshold` now specifies a maximum shard size. When a shard
exceeds it, all tenant shards are split by powers of 2 such that all
tenant shards fall below `split_threshold`. Disabled by default, like
today.
* Add `max_split_shards` to specify a max shard count for autosplits.
Defaults to 8 to retain current behavior.
* Add `initial_split_threshold` and `initial_split_shards` to specify a
threshold and target count for eager splits of unsharded tenants.
Defaults to 64 GB and 8 shards to retain current production behavior.

Because this PR sets `initial_split_threshold` to 64 GB by default, it
has the effect of enabling autosplits by default. This was not the case
previously, since `split_threshold` defaults to None, but it is already
enabled across production and staging. This is temporary until we
complete the production rollout.

For more details, see code comments.

This must wait until #11157 has been deployed to Pageservers.

Once this has been deployed to production, we plan to change the
parameters to:

* `split-threshold`: 256 GB
* `initial-split-threshold`: 16 GB
* `initial-split-shards`: 4
* `max-split-shards`: 16

The final split points will thus be:

* Start: 1 shard
* 16 GB: 4 shards
* 1 TB: 8 shards
* 2 TB: 16 shards

We will then change the default settings to be disabled by default.

---------

Co-authored-by: John Spray <john@neon.tech>
2025-03-20 15:43:57 +00:00
Arpad Müller
91dad2514f storcon: also support tenant deletion for safekeepers (#11289)
If a tenant gets deleted, delete also all of its timelines. We assume
that by the time a tenant is being deleted, no new timelines are being
created, so we don't need to worry about races with creation in this
situation.

Unlike #11233, which was very simple because it listed the timelines and
invoked timeline deletion, this PR obtains a list of safekeepers to
invoke the tenant deletion on, and then invokes tenant deletion on each
safekeeper that has one or multiple timelines.

Alternative to #11233
Builds on #11288
Part of #9011
2025-03-20 10:52:21 +00:00
Dmitrii Kovalkov
9bf59989db storcon: add https API (#11239)
## Problem

Pageservers use unencrypted HTTP requests for storage controller API.

- Closes: https://github.com/neondatabase/cloud/issues/25524

## Summary of changes

- Replace hyper0::server::Server with http_utils::server::Server in
storage controller.
- Add HTTPS handler for storage controller API.
- Support `ssl_ca_file` in pageserver.
2025-03-20 08:22:02 +00:00
Arpad Müller
2cf6ae76fc storcon: move safekeeper related stuff out of service.rs (#11288)
There is no functional change here. We move safekeeper related code from
`service.rs` to `service/safekeeper_service.rs`, so that safekeeper
related stuff is contained in a single file. This also helps with
preventing `service.rs` from growing even further.

Part of #9011.
2025-03-18 09:00:53 +00:00
Arpad Müller
0d3d639ef3 storcon: remove timeouts for safekeeper heartbeating (#11232)
PRs #10891 and #10902 have time-bounded the safekeeper heartbeating of
the storage controller. Those timeouts were not meant to be permanent,
but temporary until we figured out the reasons for the safekeeper
heartbeating causing problems.

Now they are better understood and resolved. A comment is
[here](https://github.com/neondatabase/cloud/issues/24396#issuecomment-2679342929),
but most importantly, we've had:

* #10954 to send heartbeats concurrently (before the issue was we sent
them sequentially, so the total time time was number of nodes times time
for timeout to be hit, now the total time is the maximum of all things
we are heartbeating)
* work to actually make heartbeats work and not error, i.e. JWT rollout
for storcon, not sending heartbeats to decomissioned safekeepers,
removal of decomissioned safekeepers from the databases

Part of https://github.com/neondatabase/cloud/issues/25473
2025-03-18 03:37:45 +00:00
Arpad Müller
56149a046a Add test_explicit_timeline_creation_storcon and make it work (#11261)
Adds a basic test that makes the storcon issue explicit creation of a
timeline on safeekepers (main storcon PR in #11058). It was adapted from
`test_explicit_timeline_creation` from #11002.

Also, do a bunch of fixes needed to get the test work (the API
definitions weren't correct), and log more stuff when we can't create a
new timeline due to no safekeepers being active.

Part of #9011

---------

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2025-03-17 16:28:21 +00:00
John Spray
a674ed8caf storcon: safety check when completing shard split (#11256)
## Problem

There is a rare race between controller graceful deployment and shard
splitting where we may incorrectly both abort _and_ complete the split
(on different pods), and thereby leave no shards at all in the database.

Related: #11254

## Summary of changes

- In complete_shard_split, refuse to delete anything if child shards are
not found
2025-03-14 20:08:24 +00:00
Christian Schwarz
04370b48b3 fix(storcon): optimization validation makes decisions based on wrong SecondaryProgress (#11229)
# Refs

- fixes https://github.com/neondatabase/neon/issues/11228

# Problem High-Level

When storcon validates whether a `ScheduleOptimizationAction` should be
applied, it retrieves the `tenant_secondary_status` to determine whether
a secondary is ready for the optimization.

When collecting results, it associates secondary statuses with the wrong
optimization actions in the batch of optimizations that we're
validating.

The result is that we make the decision for shard/location X based on
the SecondaryStatus of a random secondary location Y in the current
batch of optimizations.

A possible symptom is an early cutover, as seen in this engineering
investigation here:
- https://github.com/neondatabase/cloud/issues/25734

# Problem Code-Level

This code here in `optimize_all_validate`


97e2e27f68/storage_controller/src/service.rs (L7012-L7029)

zips the `want_secondary_status` with the Vec returned from
`tenant_for_shards_api` .

However, the Vec returned from `want_secondary_status` is not ordered
(it uses FuturesUnordered internally).

# Solution

Sort the Vec in input order before returning it.

`optimize_all_validate` was the only caller affected by this problem

While at it, also future-proof similar-looking function
`tenant_for_shards`.
None of its callers care about the order, but this type of function
signature is easy to use incorrectly.

# Future Work

Avoid the additional iteration, map, and allocation.

Change API to leverage AsyncFn (async closure).
And/or invert `tenant_for_shards_api` into a Future ext trait / iterator
adaptor thing.
2025-03-14 11:21:16 +00:00
Arpad Müller
5359cf717c storcon: add API definitions for exclude_timeline and term_bump (#11197)
Adds API definitions for the safekeeper API endpoints `exclude_timeline`
and `term_bump`. Also does a bugfix to return the correct type from
`delete_timeline`.

Part of #8614
2025-03-14 00:00:37 +00:00
Alex Chi Z.
23b713900e feat(storcon): passthrough ancestor detach behavior (#11199)
## Problem

https://github.com/neondatabase/neon/issues/10310
https://github.com/neondatabase/neon/pull/11158

## Summary of changes

We need to passthrough the new detach behavior through the storcon API.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-03-13 20:21:23 +00:00
Erik Grinaker
5a245a837d storcon: retain stripe size when autosplitting sharded tenants (#11194)
## Problem

Autosplits always request `DEFAULT_STRIPE_SIZE` for splits. However,
splits do not allow changing the stripe size of already-sharded tenants,
and will error out if it differs.

In #11168, we are changing the stripe size, which could hit this when
attempting to autosplit already sharded tenants.

Touches #11168.

## Summary of changes

Pass `new_stripe_size: None` when autosplitting already sharded tenants.
Otherwise, pass `DEFAULT_STRIPE_SIZE` instead of the shard identity's
stripe size, since we want to use the current default rather than an
old, persisted default.
2025-03-13 13:28:10 +00:00
Vlad Lazar
02a83913ec storcon: do not update observed state on node activation (#11155)
## Problem

When a node becomes active, we query its locations and update the
observed state in-place.
This can race with the observed state updates done when processing
reconcile results.

## Summary of changes

The argument for this reconciliation step is that is reduces the need
for background reconciliations.
I don't think is actually true anymore. There's two cases.

1. Restart of node after drain. Usually the node does not go through the
offline state here, so observed locations
were not marked as none. In any case, there should be a handful of
shards max on the node since we've just drained it.
2. Node comes back online after failure or network partition. When the
node is marked offline, we reschedule everything away from it. When it
later becomes active, the previous observed location is extraneous and
requires a reconciliation anyway.

Closes https://github.com/neondatabase/neon/issues/11148
2025-03-12 15:31:28 +00:00
Erik Grinaker
c7717c85c7 storcon,pageserver: use persisted stripe size when loading unsharded tenants (#11193)
## Problem

When the storage controller and Pageserver loads tenants from persisted
storage, it uses `ShardIdentity::unsharded()` for unsharded tenants.
However, this replaces the persisted stripe size of unsharded tenants
with the default stripe size.

This doesn't really matter for practical purposes, since the stripe size
is meaningless for unsharded tenants anyway, but can cause consistency
check failures if the persisted stripe size differs from the default.
This was seen in #11168, where we change the default stripe size.

Touches #11168.

## Summary of changes

Carry over the persisted stripe size from `TenantShardPersistence` for
unsharded tenants, and from `LocationConf` on Pageservers.

Also add bounds checks for type casts when loading persisted shard
metadata.
2025-03-12 15:16:54 +00:00
Arpad Müller
da2431f11f storcon: add --control-plane-url config option (#11173)
Adds the `--control-plane-url` config option to the storcon, which we
want to migrate to instead of using `notify-attach`.

Part of #11163
2025-03-12 02:30:56 +00:00
Arpad Müller
4d3c477689 storcon: timetime table, creation and deletion (#11058)
This PR extends the storcon with basic safekeeper management of
timelines, mainly timeline creation and deletion. We want to make the
storcon manage safekeepers in the future. Timeline creation is
controlled by the `--timelines-onto-safekeepers` flag.

1. it adds the `timelines` and `safekeeper_timeline_pending_ops` tables
to the storcon db
2. extend code for the timeline creation and deletion
4. it adds per-safekeeper reconciler tasks 

TODO:

* maybe not immediately schedule reconciliations for deletions but have
a prior manual step
* tenant deletions
* add exclude API definitions (probably separate PR)
* how to choose safekeeper to do exclude on vs deletion? this can be a
bit hairy because the safekeeper might go offline in the meantime.
* error/failure case handling
* tests (cc test_explicit_timeline_creation from #11002)
* single safekeeper mode: we often only have one SK (in tests for
example)
* `notify-safekeepers` hook:
https://github.com/neondatabase/neon/issues/11163

TODOs implemented:

* cancellations of enqueued reconciliations on a per-timeline basis,
helpful if there is an ongoing deletion
* implement pending ops overwrite behavior
* load pending operations from db

RFC section for important reading:
[link](https://github.com/neondatabase/neon/blob/main/docs/rfcs/035-safekeeper-dynamic-membership-change.md#storage_controller-implementation)

Implements the bulk of #9011
Successor of #10440.

---------

Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
2025-03-11 02:31:22 +00:00
Dmitrii Kovalkov
63b22d3fb1 pageserver: https for management API (#11025)
## Problem

Storage controller uses unencrypted HTTP requests for pageserver
management API.

Closes: https://github.com/neondatabase/cloud/issues/24283


## Summary of changes
- Implement `http_utils::server::Server` with TLS support.
- Replace `hyper0::server::Server` with `http_utils::server::Server` in
pageserver.
- Add HTTPS handler for pageserver management API.
- Generate local SSL certificates in neon local.
2025-03-10 15:07:59 +00:00
Dmitrii Kovalkov
66881b4394 storcon: update scheduler stats when changing node's preferred az (#11147)
## Problem

`home_shard_count` is not updated on the preferred AZ change.

Closes: https://github.com/neondatabase/neon/issues/10493

## Summary of changes
- Update scheduler stats (node ref counts) on preferred AZ change.
2025-03-10 11:33:00 +00:00
Dmitrii Kovalkov
e876794ce5 storcon: use https safekeeper api (#11065)
## Problem

Storage controller uses http for requests to safekeeper management API.

Closes: https://github.com/neondatabase/cloud/issues/24835

## Summary of changes
- Add `use_https_safekeeper_api` option to storcon to use https api
- Use https for requests to safekeeper management API if this option is
enabled
- Add `ssl_ca_file` option to storcon for ability to specify custom root
CA certificate
2025-03-07 17:22:47 +00:00
John Spray
87e6117dfd storage controller: API-driven graceful migrations (#10913)
## Problem

The current migration API does a live migration, but if the destination
doesn't already have a secondary, that live migration is unlikely to be
able to warm up a tenant properly within its timeout (full warmup of a
big tenant can take tens of minutes).

Background optimisation code knows how to do this gracefully by creating
a secondary first, but we don't currently give a human a way to trigger
that.

Closes: https://github.com/neondatabase/neon/issues/10540

## Summary of changes

- Add `prefererred_node` parameter to TenantShard, which is respected by
optimize_attachment
- Modify migration API to have optional prewarm=true mode, in which we
set preferred_node and call optimize_attachment, rather than directly
modifying intentstate
- Require override_scheduler=true flag if migrating somewhere that is a
less-than-optimal scheduling location (e.g. wrong AZ)
- Add `origin_node_id` to migration API so that callers can ensure
they're moving from where they think they're moving from
- Add tests for the above

The storcon_cli wrapper for this has a 'watch' mode that waits for
eventual cutover. This doesn't show the warmth of the secondary evolve
because we don't currently have an API for that in the controller, as
the passthrough API only targets attached locations, not secondaries. It
would be straightforward to add later as a dedicated endpoint for
getting secondary status, then extend the storcon_cli to consume that
and print a nice progress indicator.
2025-03-07 17:02:38 +00:00
Erik Grinaker
eedd179f0c storcon: initial autosplit tweaks (#11134)
## Problem

This patch makes some initial tweaks as preparation for
https://github.com/neondatabase/cloud/issues/22532, where we will be
introducing additional autosplit logic.

The plan is outlined in
https://github.com/neondatabase/cloud/issues/22532#issuecomment-2706215907.

## Summary of changes

Minor code cleanups and behavioral changes:

* Decide that we'll split based on `max_logical_size` (later possibly
`total_logical_size`).
* Fix a bug that would split the smallest candidate rather than the
largest.
* Pick the largest candidate by `max_logical_size` rather than
`resident_size`, for consistency (debatable).
* Split out `get_top_tenant_shards()` to fetch split candidates.
* Fetch candidates concurrently from all nodes.
* Make `TenantShard.get_scheduling_policy()` return a copy instead of a
reference.
2025-03-07 14:38:01 +00:00
Arpad Müller
f1b18874c3 storcon: require safekeeper jwt's in strict mode (#11116)
We have introduced the ability to specify safekeeper JWTs for the
storage controller. It now does heartbeats. We now want to also require
the presence of those JWTs.

Let's merge this PR shortly after the release cutoff.

Part of / follow-up of
https://github.com/neondatabase/cloud/issues/24727
2025-03-07 13:29:48 +00:00
Arpad Müller
2d45522fa6 storcon db: load safekeepers from DB again (#11087)
Earlier PR #11041 soft-disabled the loading code for safekeepers from
the storcon db. This PR makes us load the safekeepers from the database
again, now that we have [JWTs available on
staging](https://github.com/neondatabase/neon/pull/11087) and soon on
prod.

This reverts commit 23fb8053c5.

Part of https://github.com/neondatabase/cloud/issues/24727
2025-03-05 15:45:43 +00:00
Erik Grinaker
65addfc524 storcon: add per-tenant rate limiting for API requests (#10924)
## Problem

Incoming requests often take the service lock, and sometimes even do
database transactions. That creates a risk that a rogue client can
starve the controller of the ability to do its primary job of
reconciling tenants to an available state.

## Summary of changes

* Use the `governor` crate to rate limit tenant requests at 10 requests
per second. This is ~10-100x lower than the worst "attack" we've seen
from a client bug. Admin APIs are not rate limited.
* Add a `storage_controller_http_request_rate_limited` histogram for
rate limited requests.
* Log a warning every 10 seconds for rate limited tenants.

The rate limiter is parametrized on TenantId, because the kinds of
client bug we're protecting against generally happen within tenant
scope, and the rates should be somewhat stable: we expect the global
rate of requests to increase as we do more work, but we do not expect
the rate of requests to one tenant to increase.

---------

Co-authored-by: John Spray <john@neon.tech>
2025-03-03 22:04:59 +00:00
Alex Chi Z.
f79ee0bb88 fix(storcon): loop in chaos injection (#11004)
## Problem

Somehow the previous patch loses the loop in the chaos injector function
so everything will only run once.
https://github.com/neondatabase/neon/pull/10934

## Summary of changes

Add back the loop.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-28 15:49:15 +00:00
Vlad Lazar
23fb8053c5 storcon: soft disable SK heartbeats (#11041)
## Problem

JWT tokens aren't in place, so all SK heartbeats fail. This is
equivalent to a wait before applying the PS heartbeats and makes things
more flaky.

## Summary of Changes

Add a flag that skips loading SKs from the db on start-up and at
runtime.
2025-02-28 15:49:09 +00:00
Vlad Lazar
0d6d58bd3e pageserver: make heatmap layer download API more cplane friendly (#10957)
## Problem

We intend for cplane to use the heatmap layer download API to warm up
timelines after unarchival. It's tricky for them to recurse in the
ancestors,
and the current implementation doesn't work well when unarchiving a
chain
of branches and warming them up.

## Summary of changes

* Add a `recurse` flag to the API. When the flag is set, the operation
recurses into the parent
timeline after the current one is done.
* Be resilient to warming up a chain of unarchived branches. Let's say
we unarchived `B` and `C` from
the `A -> B -> C` branch hierarchy. `B` got unarchived first. We
generated the unarchival heatmaps
and stash them in `A` and `B`. When `C` unarchived, it dropped it's
unarchival heatmap since `A` and `B`
already had one. If `C` needed layers from `A` and `B`, it was out of
luck. Now, when choosing whether
to keep an unarchival heatmap we look at its end LSN. If it's more
inclusive than what we currently have,
keep it.
2025-02-28 10:36:53 +00:00
John Spray
55633ebe3a storcon: enable API passthrough to nonzero shards (#11026)
## Problem

Storage controller will proxy GETs to pageserver-like tenant/timeline
paths through to the pageserver.

Usually GET passthroughs make sense to go to shard 0, e.g. if you want
to list timelines.

But sometimes you really want to know about a particular shard, e.g.
reading its cache state or similar.

## Summary of changes

- Accept shard IDs as well as tenant IDs in the passthrough route
- Refactor node lookup to take a shard ID and make the tenant ID case a
layer on top of that. This is one more lock take-drop during these
requests, but it's not particularly expensive and these requests
shouldn't be terribly frequent

This is not immediately used by anything, but will be there any time we
want to e.g. do a pass-through query to check the warmth of a tenant
cache on a particular shard or somesuch.
2025-02-28 08:42:08 +00:00
Arpad Müller
a22be5af72 Migrate the last crates to edition 2024 (#10998)
Migrates the remaining crates to edition 2024. We like to stay on the
latest edition if possible. There is no functional changes, however some
code changes had to be done to accommodate the edition's breaking
changes.

Like the previous migration PRs, this is comprised of three commits:

* the first does the edition update and makes `cargo check`/`cargo
clippy` pass. we had to update bindgen to make its output [satisfy the
requirements of edition
2024](https://doc.rust-lang.org/edition-guide/rust-2024/unsafe-extern.html)
* the second commit does a `cargo fmt` for the new style edition.
* the third commit reorders imports as a one-off change. As before, it
is entirely optional.

Part of #10918
2025-02-27 09:40:40 +00:00
Arpad Müller
26bda17551 storcon: use the SchedulingPolicy enum in SafekeeperPersistence (#10897)
We don't want to serialize to/from string all the time, so use
`SchedulingPolicy` in `SafekeeperPersistence` via the use of a wrapper.

Stacked atop #10891
2025-02-26 12:12:50 +00:00
Arpad Müller
920040e402 Update storage components to edition 2024 (#10919)
Updates storage components to edition 2024. We like to stay on the
latest edition if possible. There is no functional changes, however some
code changes had to be done to accommodate the edition's breaking
changes.

The PR has two commits:

* the first commit updates storage crates to edition 2024 and appeases
`cargo clippy` by changing code. i have accidentially ran the formatter
on some files that had other edits.
* the second commit performs a `cargo fmt`

I would recommend a closer review of the first commit and a less close
review of the second one (as it just runs `cargo fmt`).

part of https://github.com/neondatabase/neon/issues/10918
2025-02-25 23:51:37 +00:00
Vlad Lazar
5d17640944 storcon: send heartbeats concurrently (#10954)
## Problem

While looking at logs I noticed that heartbeats are sent sequentially.
The loop polling the UnorderedSet is at the wrong level of identation.
Instead of doing it after we have the full set, we did after each entry.

## Summary of Changes

Poll the UnorderedSet properly.
2025-02-25 09:33:08 +00:00
Alex Chi Z.
5fad4a4cee feat(storcon): chaos injection of force exit (#10934)
## Problem

close https://github.com/neondatabase/cloud/issues/24485

## Summary of changes

This patch adds a new chaos injection mode for the storcon. The chaos
injector reads the crontab and exits immediately at the configured time.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-02-24 15:30:21 +00:00
Vlad Lazar
3e82addd64 storcon: use Duration for duration's in the storage controller tenant config (#10928)
## Problem

The storage controller treats durations in the tenant config as strings.
These are loaded from the db.
The pageserver maps these durations to a seconds only format and we
always get a mismatch compared
to what's in the db.

## Summary of changes

Treat durations as durations inside the storage controller and not as
strings.
Nothing changes in the cross service API's themselves or the way things
are stored in the db.

I also added some logging which I would have made the investigation a
10min job:
1. Reason for why the reconciliation was spawned
2. Location config diff between the observed and wanted states
2025-02-21 15:45:00 +00:00
John Spray
5e3c234edc storcon: do more concurrent optimisations (#10929)
## Problem

Now that we rely on the optimisation logic to handle fixing things up
after some tenants are in the wrong AZ (e.g. after node failure), it's
no longer appropriate to treat optimisations as an ultra-low-priority
task. We used to reflect that low priority with a very low limit on
concurrent execution, such that we would only migrate 2 things every 20
seconds.

## Summary of changes

- Increase MAX_OPTIMIZATIONS_EXEC_PER_PASS from 2 to 16
- Increase MAX_OPTIMIZATIONS_PLAN_PER_PASS from 8 to 64.

Since we recently gave user-initiated actions their own semaphore, this
should not risk starving out API requests.
2025-02-21 14:58:49 +00:00
Arpad Müller
ff3819efc7 storcon: infrastructure for safekeeper specific JWT tokens (#10905)
Safekeepers only respond to requests with the per-token scope, or the
`safekeeperdata` JWT scope. Therefore, add infrastructure in the storage
controller for safekeeper JWTs. Also, rename the ambiguous `jwt_token`
to `pageserver_jwt_token`.

Part of #9011
Related: https://github.com/neondatabase/cloud/issues/24727
2025-02-21 11:02:02 +00:00
Arpad Müller
f927ae6e15 Return a json response in scheduling_policy handler (#10904)
Return an empty json response in the `scheduling_policy` handler.

This prevents errors of the form:

```
Error: receive body: error decoding response body: EOF while parsing a value at line 1 column 0
```

when setting the scheduling policy via the `storcon_cli`.

part of #9011.
2025-02-21 11:01:57 +00:00
Dmitrii Kovalkov
e808e9432a storcon: use https for pageservers (#10759)
## Problem

Storage controller uses unsecure http for pageserver API.

Closes: https://github.com/neondatabase/cloud/issues/23734
Closes: https://github.com/neondatabase/cloud/issues/24091

## Summary of changes

- Add an optional `listen_https_port` field to storage controller's Node
state and its API (RegisterNode/ListNodes/etc).
- Allow updating `listen_https_port` on node registration to gradually
add https port for all nodes.
- Add `use_https_pageserver_api` CLI option to storage controller to
enable https.
- Pageserver doesn't support https for now and always reports
`https_port=None`. This will be addressed in follow-up PR.
2025-02-20 17:16:04 +00:00
Arpad Müller
bb7e244a42 storcon: fix heartbeats timing out causing a panic (#10902)
Fix an issue caused by PR
https://github.com/neondatabase/neon/pull/10891: we introduced the
concept of timeouts for heartbeats, where we would hang up on the other
side of the oneshot channel if a timeout happened (future gets
cancelled, receiver is dropped).

This hang up would make the heartbeat task panic when it did obtain the
response, as we unwrap the result of the result sending operation. The
panic would lead to the heartbeat task panicing itself, which is then
according to logs the last sign of life we of that process invocation.
I'm not sure what brings down the process, in theory tokio [should
continue](https://docs.rs/tokio/latest/tokio/runtime/enum.UnhandledPanic.html#variant.Ignore),
but idk.

Alternative to #10901.
2025-02-19 23:04:05 +00:00
Arpad Müller
787b98f8f2 storcon: log all safekeepers marked as offline (#10898)
Doing this to help debugging offline safekeepers.

Part of https://github.com/neondatabase/neon/issues/9011
2025-02-19 20:45:22 +00:00