pageserver: make heatmap generation additive (#10597)

## Problem

Previously, when cutting over to cold secondary locations,
we would clobber the previous, good, heatmap with a cold one.
This is because heatmap generation used to include only resident layers.

Once this merges, we can add an endpoint which triggers full heatmap
hydration on attached locations to heal cold migrations.

## Summary of changes

With this patch, heatmap generation becomes additive. If we have a
heatmap from when this location was secondary, the new uploaded heatmap
will be the result of a reconciliation between the old one and the on
disk resident layers.

More concretely, when we have the previous heatmap:
1. Filter the previous heatmap and keep layers that are (a) present
in the current layer map, (b) visible, (c) not resident. Call this set
of layers `visible_non_resident`.
2. From the layer map, select all layers that are resident and visible.
Call this set of layers `resident`.
3. The new heatmap is the result of merging the two disjoint sets.

Related https://github.com/neondatabase/neon/issues/10541
This commit is contained in:
Vlad Lazar
2025-02-13 12:48:47 +00:00
committed by GitHub
parent 536bdb3209
commit 8fea43a5ba
12 changed files with 511 additions and 33 deletions

View File

@@ -1,7 +1,7 @@
use crate::pageserver_client::PageserverClient;
use crate::persistence::Persistence;
use crate::{compute_hook, service};
use pageserver_api::controller_api::{AvailabilityZone, PlacementPolicy};
use pageserver_api::controller_api::{AvailabilityZone, MigrationConfig, PlacementPolicy};
use pageserver_api::models::{
LocationConfig, LocationConfigMode, LocationConfigSecondary, TenantConfig, TenantWaitLsnRequest,
};
@@ -162,6 +162,22 @@ impl ReconcilerConfig {
}
}
impl From<&MigrationConfig> for ReconcilerConfig {
fn from(value: &MigrationConfig) -> Self {
let mut builder = ReconcilerConfigBuilder::new();
if let Some(timeout) = value.secondary_warmup_timeout {
builder = builder.secondary_warmup_timeout(timeout)
}
if let Some(timeout) = value.secondary_download_request_timeout {
builder = builder.secondary_download_request_timeout(timeout)
}
builder.build()
}
}
/// RAII resource units granted to a Reconciler, which it should keep alive until it finishes doing I/O
pub(crate) struct ReconcileUnits {
_sem_units: tokio::sync::OwnedSemaphorePermit,

View File

@@ -5213,7 +5213,12 @@ impl Service {
shard.sequence = shard.sequence.next();
}
self.maybe_reconcile_shard(shard, nodes)
let reconciler_config = match migrate_req.migration_config {
Some(cfg) => (&cfg).into(),
None => ReconcilerConfig::default(),
};
self.maybe_configured_reconcile_shard(shard, nodes, reconciler_config)
};
if let Some(waiter) = waiter {