storcon: use Duration for duration's in the storage controller tenant config (#10928)

## Problem

The storage controller treats durations in the tenant config as strings.
These are loaded from the db.
The pageserver maps these durations to a seconds only format and we
always get a mismatch compared
to what's in the db.

## Summary of changes

Treat durations as durations inside the storage controller and not as
strings.
Nothing changes in the cross service API's themselves or the way things
are stored in the db.

I also added some logging which I would have made the investigation a
10min job:
1. Reason for why the reconciliation was spawned
2. Location config diff between the observed and wanted states
This commit is contained in:
Vlad Lazar
2025-02-21 15:45:00 +00:00
committed by GitHub
parent 5e3c234edc
commit 3e82addd64
11 changed files with 234 additions and 80 deletions

View File

@@ -3817,3 +3817,43 @@ def test_update_node_on_registration(neon_env_builder: NeonEnvBuilder):
nodes = env.storage_controller.node_list()
assert len(nodes) == 1
assert nodes[0]["listen_https_port"] is None
def test_storage_controller_location_conf_equivalence(neon_env_builder: NeonEnvBuilder):
"""
Validate that a storage controller restart with no shards in a transient state
performs zero reconciliations at start-up. Implicitly, this means that the location
configs returned by the pageserver are identical to the persisted state in the
storage controller database.
"""
neon_env_builder.num_pageservers = 1
neon_env_builder.storage_controller_config = {
"start_as_candidate": False,
}
env = neon_env_builder.init_configs()
env.start()
tenant_id = TenantId.generate()
env.storage_controller.tenant_create(
tenant_id, shard_count=2, tenant_config={"pitr_interval": "1h2m3s"}
)
env.storage_controller.reconcile_until_idle()
reconciles_before_restart = env.storage_controller.get_metric_value(
"storage_controller_reconcile_complete_total", filter={"status": "ok"}
)
assert reconciles_before_restart != 0
env.storage_controller.stop()
env.storage_controller.start()
env.storage_controller.reconcile_until_idle()
reconciles_after_restart = env.storage_controller.get_metric_value(
"storage_controller_reconcile_complete_total", filter={"status": "ok"}
)
assert reconciles_after_restart == 0