mirror of
https://github.com/neondatabase/neon.git
synced 2026-01-09 22:42:57 +00:00
## Problem We've seen some strange behaviors when doing lots of migrations involving secondary locations. One of these was where a tenant was apparently stuck in the `Scheduler::running` list, but didn't appear to be making any progress. Another was a shutdown hang (https://github.com/neondatabase/cloud/issues/13576). ## Summary of changes - Fix one issue (probably not the only one) where a tenant in the `pending` list could proceed to `spawn` even if the same tenant already had a running task via `handle_command` (this could have resulted in a weird value of SecondaryProgress) - Add various extra logging: - log before as well as after layer downloads so that it would be obvious if we were stuck in remote storage code (we shouldn't be, it has built in timeouts) - log the number of running + pending jobs from the scheduler every time it wakes up to do a scheduling iteration (~10s) -- this is quite chatty, but not compared with the volume of logs on a busy pageserver. It should give us confidence that the scheduler loop is still alive, and visibility of how many tasks the scheduler thinks are running.