mirror of
https://github.com/neondatabase/neon.git
synced 2026-05-30 11:30:37 +00:00
## Problem To test sharding, we need something to control it. We could write python code for doing this from the test runner, but this wouldn't be usable with neon_local run directly, and when we want to write tests with large number of shards/tenants, Rust is a better fit efficiently handling all the required state. This service enables automated tests to easily get a system with sharding/HA without the test itself having to set this all up by hand: existing tests can be run against sharded tenants just by setting a shard count when creating the tenant. ## Summary of changes Attachment service was previously a map of TenantId->TenantState, where the principal state stored for each tenant was the generation and the last attached pageserver. This enabled it to serve the re-attach and validate requests that the pageserver requires. In this PR, the scope of the service is extended substantially to do overall management of tenants in the pageserver, including tenant/timeline creation, live migration, evacuation of offline pageservers etc. This is done using synchronous code to make declarative changes to the tenant's intended state (`TenantState.policy` and `TenantState.intent`), which are then translated into calls into the pageserver by the `Reconciler`. Top level summary of modules within `control_plane/attachment_service/src`: - `tenant_state`: structure that represents one tenant shard. - `service`: implements the main high level such as tenant/timeline creation, marking a node offline, etc. - `scheduler`: for operations that need to pick a pageserver for a tenant, construct a scheduler and call into it. - `compute_hook`: receive notifications when a tenant shard is attached somewhere new. Once we have locations for all the shards in a tenant, emit an update to postgres configuration via the neon_local `LocalEnv`. - `http`: HTTP stubs. These mostly map to methods on `Service`, but are separated for readability and so that it'll be easier to adapt if/when we switch to another RPC layer. - `node`: structure that describes a pageserver node. The most important attribute of a node is its availability: marking a node offline causes tenant shards to reschedule away from it. This PR is a precursor to implementing the full sharding service for prod (#6342). What's the difference between this and a production-ready controller for pageservers? - JSON file persistence to be replaced with a database - Limited observability. - No concurrency limits. Marking a pageserver offline will try and migrate every tenant to a new pageserver concurrently, even if there are thousands. - Very simple scheduler that only knows to pick the pageserver with fewest tenants, and place secondary locations on a different pageserver than attached locations: it does not try to place shards for the same tenant on different pageservers. This matters little in tests, because picking the least-used pageserver usually results in round-robin placement. - Scheduler state is rebuilt exhaustively for each operation that requires a scheduler. - Relies on neon_local mechanisms for updating postgres: in production this would be something that flows through the real control plane. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
58 lines
1.2 KiB
Rust
58 lines
1.2 KiB
Rust
use serde::{Deserialize, Serialize};
|
|
use utils::seqwait::MonotonicCounter;
|
|
|
|
mod compute_hook;
|
|
pub mod http;
|
|
mod node;
|
|
pub mod persistence;
|
|
mod reconciler;
|
|
mod scheduler;
|
|
pub mod service;
|
|
mod tenant_state;
|
|
|
|
#[derive(Clone, Serialize, Deserialize)]
|
|
enum PlacementPolicy {
|
|
/// Cheapest way to attach a tenant: just one pageserver, no secondary
|
|
Single,
|
|
/// Production-ready way to attach a tenant: one attached pageserver and
|
|
/// some number of secondaries.
|
|
Double(usize),
|
|
}
|
|
|
|
#[derive(Ord, PartialOrd, Eq, PartialEq, Copy, Clone)]
|
|
struct Sequence(u64);
|
|
|
|
impl Sequence {
|
|
fn initial() -> Self {
|
|
Self(0)
|
|
}
|
|
}
|
|
|
|
impl std::fmt::Display for Sequence {
|
|
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
|
|
write!(f, "{}", self.0)
|
|
}
|
|
}
|
|
|
|
impl MonotonicCounter<Sequence> for Sequence {
|
|
fn cnt_advance(&mut self, v: Sequence) {
|
|
assert!(*self <= v);
|
|
*self = v;
|
|
}
|
|
fn cnt_value(&self) -> Sequence {
|
|
*self
|
|
}
|
|
}
|
|
|
|
impl Sequence {
|
|
fn next(&self) -> Sequence {
|
|
Sequence(self.0 + 1)
|
|
}
|
|
}
|
|
|
|
impl Default for PlacementPolicy {
|
|
fn default() -> Self {
|
|
PlacementPolicy::Double(1)
|
|
}
|
|
}
|