mirror of
https://github.com/neondatabase/neon.git
synced 2026-01-12 16:02:56 +00:00
NB: effectively a no-op in the neon env since the handling is config gated in storcon

## Problem

When a pageserver suffers from a local disk/node failure and restarts, the storage controller will receive a re-attach call and return all the tenants the pageserver is supposed to attach, but the pageserver will not act on any tenants that it doesn't know about locally. As a result, the pageserver will not rehydrate any tenants from remote storage if it restarted following a local disk loss, while the storage controller still thinks that the pageserver has all the tenants attached. This leaves the system in a bad state, and the symptom is that PG's pageserver connections will fail with "tenant not found" errors.

## Summary of changes

Made a slight change to the storage controller's `re_attach` API:

* The pageserver will set an additional bit `empty_local_disk` in the reattach request, indicating whether it has started with an empty disk or does not know about any tenants.
* Upon receiving the reattach request, if this `empty_local_disk` bit is set, the storage controller will go ahead and clear all observed locations referencing the pageserver. The reconciler will then discover the discrepancy between the intended state and observed state of the tenant and take care of the situation.

To facilitate rollouts, this extra behavior in the `re_attach` API is guarded by the `handle_ps_local_disk_loss` command line flag of the storage controller.

---------

Co-authored-by: William Huang <william.huang@databricks.com>
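The gated `re_attach` handling described in the summary can be sketched roughly as below. This is a minimal, standard-library-only illustration and not the storage controller's actual code: `ObservedState`, `clear_node`, `handle_re_attach`, and the `u64`/`String` stand-ins for `NodeId` and `TenantShardId` are all hypothetical names introduced here. Only the `empty_local_disk` bit and the `handle_ps_local_disk_loss` flag come from the description above.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real `NodeId` and `TenantShardId` types.
type NodeId = u64;
type TenantId = String;

/// Simplified (assumed) view of where the controller believes tenants are located.
#[derive(Default)]
struct ObservedState {
    /// Tenant -> nodes on which a location for that tenant has been observed.
    locations: HashMap<TenantId, HashSet<NodeId>>,
}

impl ObservedState {
    /// Forget every observed location on `node`.
    /// Returns the affected tenants in sorted order.
    fn clear_node(&mut self, node: NodeId) -> Vec<TenantId> {
        let mut affected = Vec::new();
        for (tenant, nodes) in self.locations.iter_mut() {
            if nodes.remove(&node) {
                affected.push(tenant.clone());
            }
        }
        affected.sort();
        affected
    }
}

/// Sketch of the gated behavior: observed locations are cleared only when the
/// `handle_ps_local_disk_loss` flag is enabled AND the pageserver reported
/// `empty_local_disk = Some(true)`. A missing bit is treated as "not empty".
/// Returns the tenants whose observed state changed, i.e. the ones a
/// reconciler would then need to look at.
fn handle_re_attach(
    observed: &mut ObservedState,
    node: NodeId,
    empty_local_disk: Option<bool>,
    handle_ps_local_disk_loss: bool,
) -> Vec<TenantId> {
    if handle_ps_local_disk_loss && empty_local_disk.unwrap_or(false) {
        observed.clear_node(node)
    } else {
        Vec::new()
    }
}
```

In this sketch the intended state is left untouched. Only the observed state is cleared, so the mismatch between the two is what would prompt a reconciler to act.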
83 lines
2.7 KiB
Rust
//! Types in this file are for pageserver's upward-facing API calls to the storage controller,
//! required for acquiring and validating tenant generation numbers.
//!
//! See docs/rfcs/025-generation-numbers.md

use serde::{Deserialize, Serialize};
use utils::generation::Generation;
use utils::id::{NodeId, TimelineId};

use crate::controller_api::NodeRegisterRequest;
use crate::models::{LocationConfigMode, ShardImportStatus};
use crate::shard::{ShardStripeSize, TenantShardId};

/// Upcall message sent by the pageserver to the configured `control_plane_api` on
/// startup.
#[derive(Serialize, Deserialize)]
pub struct ReAttachRequest {
    pub node_id: NodeId,

    /// Optional inline self-registration: this is useful with the storage controller,
    /// if the node already has a node_id set.
    #[serde(skip_serializing_if = "Option::is_none", default)]
    pub register: Option<NodeRegisterRequest>,

    /// Hadron: Optional flag to indicate whether the node is starting with an empty local disk.
    /// Will be set to true if the node couldn't find any local tenant data on startup, could be
    /// due to the node starting for the first time or due to a local SSD failure/disk wipe event.
    /// The flag may be used by the storage controller to update its observed state of the world
    /// to make sure that it sends explicit location_config calls to the node following the
    /// re-attach request.
    pub empty_local_disk: Option<bool>,
}

#[derive(Serialize, Deserialize, Debug)]
pub struct ReAttachResponseTenant {
    pub id: TenantShardId,
    /// Mandatory if LocationConfigMode is None or set to an Attached* mode
    pub r#gen: Option<u32>,
    pub mode: LocationConfigMode,
    pub stripe_size: ShardStripeSize,
}
#[derive(Serialize, Deserialize)]
pub struct ReAttachResponse {
    pub tenants: Vec<ReAttachResponseTenant>,
}

#[derive(Serialize, Deserialize)]
pub struct ValidateRequestTenant {
    pub id: TenantShardId,
    pub r#gen: u32,
}

#[derive(Serialize, Deserialize)]
pub struct ValidateRequest {
    pub tenants: Vec<ValidateRequestTenant>,
}

#[derive(Serialize, Deserialize)]
pub struct ValidateResponse {
    pub tenants: Vec<ValidateResponseTenant>,
}

#[derive(Serialize, Deserialize)]
pub struct ValidateResponseTenant {
    pub id: TenantShardId,
    pub valid: bool,
}

#[derive(Serialize, Deserialize)]
pub struct TimelineImportStatusRequest {
    pub tenant_shard_id: TenantShardId,
    pub timeline_id: TimelineId,
    pub generation: Generation,
}

#[derive(Serialize, Deserialize)]
pub struct PutTimelineImportStatusRequest {
    pub tenant_shard_id: TenantShardId,
    pub timeline_id: TimelineId,
    pub status: ShardImportStatus,
    pub generation: Generation,
}
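
The `ValidateRequest` / `ValidateResponse` pair above carries, per tenant, a generation number and a `valid` verdict. One way a receiver might compute that verdict is sketched below, using only the standard library. The names `RequestTenant`, `ResponseTenant`, `validate`, and the `unknown_is_valid` parameter are hypothetical, tenant ids are plain strings instead of `TenantShardId`, and the rule "valid means equal to the latest known generation" is an assumption of this sketch, not a statement about the storage controller's implementation.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ValidateRequestTenant` (string id instead of `TenantShardId`).
struct RequestTenant {
    id: String,
    generation: u32,
}

/// Hypothetical stand-in for `ValidateResponseTenant`.
struct ResponseTenant {
    id: String,
    valid: bool,
}

/// Assumed rule: a generation is valid only if it equals the latest generation
/// recorded for that tenant. How to answer for tenants that are not known at
/// all is a policy choice, so the caller passes it in as `unknown_is_valid`.
fn validate(
    latest: &HashMap<String, u32>,
    requests: &[RequestTenant],
    unknown_is_valid: bool,
) -> Vec<ResponseTenant> {
    requests
        .iter()
        .map(|req| {
            let valid = match latest.get(&req.id) {
                Some(current) => *current == req.generation,
                None => unknown_is_valid,
            };
            ResponseTenant {
                id: req.id.clone(),
                valid,
            }
        })
        .collect()
}
```

The response keeps the request order, so a caller can pair each verdict with the tenant it asked about without a second lookup.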