mirror of
https://github.com/neondatabase/neon.git
synced 2026-05-30 03:20:36 +00:00
fix(compute_ctl): Only offload LFC state if no prewarming is in progress (#12645)
## Problem We currently offload LFC state unconditionally, which can cause problems. Imagine a situation: 1. Endpoint started with `autoprewarm: true`. 2. While prewarming is not completed, we upload the new incomplete state. 3. Compute gets interrupted and restarts. 4. We start again and try to prewarm with the state from 2. instead of the previous complete state. During the orchestrated prewarming, it's probably not a big issue, but it's still better to do not interfere with the prewarm process. ## Summary of changes Do not offload LFC state if we are currently prewarming or any issue occurred. While on it, also introduce `Skipped` LFC prewarm status, which is used when the corresponding LFC state is not present in the endpoint storage. It's primarily needed to distinguish the first compute start for particular endpoint, as it's completely valid to do not have LFC state yet.
This commit is contained in:
@@ -46,16 +46,33 @@ pub struct ExtensionInstallResponse {
|
||||
pub version: ExtVersion,
|
||||
}
|
||||
|
||||
/// Status of the LFC prewarm process. The same state machine is reused for
|
||||
/// both autoprewarm (prewarm after compute/Postgres start using the previously
|
||||
/// stored LFC state) and explicit prewarming via API.
|
||||
#[derive(Serialize, Default, Debug, Clone, PartialEq)]
|
||||
#[serde(tag = "status", rename_all = "snake_case")]
|
||||
pub enum LfcPrewarmState {
|
||||
/// Default value when compute boots up.
|
||||
#[default]
|
||||
NotPrewarmed,
|
||||
/// Prewarming thread is active and loading pages into LFC.
|
||||
Prewarming,
|
||||
/// We found requested LFC state in the endpoint storage and
|
||||
/// completed prewarming successfully.
|
||||
Completed,
|
||||
Failed {
|
||||
error: String,
|
||||
},
|
||||
/// Unexpected error happened during prewarming. Note, `Not Found 404`
|
||||
/// response from the endpoint storage is explicitly excluded here
|
||||
/// because it can normally happen on the first compute start,
|
||||
/// since LFC state is not available yet.
|
||||
Failed { error: String },
|
||||
/// We tried to fetch the corresponding LFC state from the endpoint storage,
|
||||
/// but received `Not Found 404`. This should normally happen only during the
|
||||
/// first endpoint start after creation with `autoprewarm: true`.
|
||||
///
|
||||
/// During the orchestrated prewarm via API, when a caller explicitly
|
||||
/// provides the LFC state key to prewarm from, it's the caller responsibility
|
||||
/// to handle this status as an error state in this case.
|
||||
Skipped,
|
||||
}
|
||||
|
||||
impl Display for LfcPrewarmState {
|
||||
@@ -64,6 +81,7 @@ impl Display for LfcPrewarmState {
|
||||
LfcPrewarmState::NotPrewarmed => f.write_str("NotPrewarmed"),
|
||||
LfcPrewarmState::Prewarming => f.write_str("Prewarming"),
|
||||
LfcPrewarmState::Completed => f.write_str("Completed"),
|
||||
LfcPrewarmState::Skipped => f.write_str("Skipped"),
|
||||
LfcPrewarmState::Failed { error } => write!(f, "Error({error})"),
|
||||
}
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user