neon/pageserver at 4ce6e2d2fc83ff8664eef2f80912cade71240669 - neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-11 07:22:55 +00:00

John Spray 4ce6e2d2fc pageserver: fix secondary progress stats when layers are 404 (#7814)

## Problem

Noticed this issue in staging.

When a tenant is under somewhat heavy timeline creation/deletion
thrashing, it becomes quite common for secondary downloads to encounter
404s downloading layers. This is tolerated by design, because heatmaps
are not guaranteed to be up to date with what layers/timelines actually
exist.

However, we were not updating the SecondaryProgress structure in this
case, so after such a download pass, we would leave a SecondaryProgress
state with lower "downloaded" stats than "total" stats. This causes the
storage controller to consider this secondary location ineligible for
optimization actions such as those we perform after shard splits.

This issue has relatively low impact because a typical tenant will
eventually upload a heatmap where we do download all the layers and
thereby enable the controller to progress with migrations -- the heavy
thrashing of timeline creation/deletion is an artifact of our nightly
stress tests.

## Summary of changes

- In the layer 404 case, subtract the skipped layer's stats from the
totals, so that at the end of this download pass we should still end up
in a complete state.
- When updating `last_downloaded`, do a sanity check that our progress
is complete. In debug builds, fail an assertion if this is not the case.
In prod builds, correct the stats and log a warning.
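
The accounting described above can be sketched roughly as follows. This is a
minimal illustration, not the pageserver's actual code: the field names, the
`LayerOutcome` enum, and the `record`, `is_complete`, and `reconcile` methods
are all assumptions made for the example. The direction of the correction in
`reconcile` (lowering totals to match what was downloaded) is also an
assumption, since the summary only says the stats are corrected.

```rust
/// Hypothetical, simplified stand-in for the progress structure.
/// Field names are assumptions made for this sketch.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct SecondaryProgress {
    pub layers_downloaded: usize,
    pub layers_total: usize,
    pub bytes_downloaded: u64,
    pub bytes_total: u64,
}

/// Hypothetical outcome of attempting to download one layer.
pub enum LayerOutcome {
    /// The layer was fetched successfully.
    Downloaded,
    /// Remote storage returned 404: the heatmap was stale.
    NotFound,
}

impl SecondaryProgress {
    /// Account for one layer of `layer_size` bytes.
    pub fn record(&mut self, outcome: LayerOutcome, layer_size: u64) {
        match outcome {
            LayerOutcome::Downloaded => {
                self.layers_downloaded += 1;
                self.bytes_downloaded += layer_size;
            }
            LayerOutcome::NotFound => {
                // The skipped layer will never be downloaded in this pass,
                // so remove it from the totals instead of leaving a gap.
                self.layers_total = self.layers_total.saturating_sub(1);
                self.bytes_total = self.bytes_total.saturating_sub(layer_size);
            }
        }
    }

    /// True when "downloaded" matches "total" for both layers and bytes.
    pub fn is_complete(&self) -> bool {
        self.layers_downloaded == self.layers_total
            && self.bytes_downloaded == self.bytes_total
    }

    /// End-of-pass sanity check. Returns true if the stats had to be
    /// corrected. A caller could pair this with `debug_assert!` in debug
    /// builds and a logged warning in production builds.
    pub fn reconcile(&mut self) -> bool {
        if self.is_complete() {
            return false;
        }
        // Assumed correction: bring the totals in line with what was
        // actually downloaded.
        self.layers_total = self.layers_downloaded;
        self.bytes_total = self.bytes_downloaded;
        true
    }
}
```

With this shape, a pass over three layers in which one returns 404 ends with
the totals reduced to two layers, so `is_complete()` holds and `reconcile()`
has nothing to correct.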

2024-05-21 13:46:04 +01:00

benches

chore!: always use async walredo, warn if sync is configured (#7754)

2024-05-15 15:04:52 +02:00

client

feat(pagebench): add aux file bench (#7746)

2024-05-17 20:04:02 +00:00

compaction

Tiered compaction: improvements to the windows (#7787)

2024-05-16 22:25:19 +02:00

ctl

feat(pageserver): persist aux file policy in index part (#7668)