## Problem
When creating a new shard the storage controller schedules via
Scheduler::schedule_shard. This does not take into account the number of
attached shards. What it does take into account is the node affinity:
when a shard is scheduled, all its nodes (primaries and secondaries) get
their affinity incremented.
For two node clusters and shards with one secondary we have a
pathological case where all primaries are scheduled on the same node.
Now that we track the count of attached shards per node, this is trivial
to fix. Still, the "proper" fix is to use the pageserver's utilization
score.
Closes https://github.com/neondatabase/neon/issues/8041
## Summary of changes
Use attached shard count when deciding which node to schedule a fresh
shard on.
## Problem
Pageserver restarts cause read availablity downtime for tenants. See
`Motivation` section in the
[RFC](https://github.com/neondatabase/neon/pull/7704).
## Summary of changes
* Introduce a new `NodeSchedulingPolicy`: `PauseForRestart`
* Implement the first take of drain and fill algorithms
* Add a node status endpoint which can be polled to figure out when an
operation is done
The implementation follows the RFC, so it might be useful to peek at it
as you're reviewing.
Since the PR is rather chunky, I've made sure all commits build (with
warnings), so you can
review by commit if you prefer that.
RFC: https://github.com/neondatabase/neon/pull/7704
Related https://github.com/neondatabase/neon/issues/7387
## Problem
The storage controller does not track the number of shards attached to a
given pageserver. This is a requirement for various scheduling
operations (e.g. draining and filling will use this to figure out if the
cluster is balanced)
## Summary of Changes
Track the number of shards attached to each node.
Related https://github.com/neondatabase/neon/issues/7387
## Problem
- https://github.com/neondatabase/neon/issues/7355
The optimize_secondary function calls schedule_shard to check for
improvements, but if there are exactly the same number of nodes as there
are replicas of the shard, it emits some scary looking logs about no
nodes being elegible.
Closes https://github.com/neondatabase/neon/issues/7355
## Summary of changes
- Add a mode to SchedulingContext that controls logging: this should be
useful in future any time we add a log to the scheduling path, to avoid
it becoming a source of spam when the scheduler is called during
optimization.
The binary etc were renamed some time ago, but the path in the source
tree remained "attachment_service" to avoid disruption to ongoing PRs.
There aren't any big PRs out right now, so it's a good time to cut over.
- Rename `attachment_service` to `storage_controller`
- Move it to the top level for symmetry with `storage_broker` & to avoid
mixing the non-prod neon_local stuff (`control_plane/`) with the storage
controller which is a production component.