mirror of
https://github.com/neondatabase/neon.git
synced 2026-01-11 07:22:55 +00:00
## Problem Storage controller memory can spike very high if we have many tenants and they all try to reconcile at the same time. Related: - https://github.com/neondatabase/neon/issues/7463 - https://github.com/neondatabase/neon/issues/7460 Not closing those issues in this PR, because the test coverage for them will be in https://github.com/neondatabase/neon/pull/7475 ## Summary of changes - Add a CLI arg `--reconciler-concurrency`, defaulted to 128 - Add a semaphore to Service with this many units - In `maybe_reconcile_shard`, try to acquire semaphore unit. If we can't get one, return a ReconcileWaiter for a future sequence number, and push the TenantShardId onto a channel of delayed IDs. - In `process_result`, consume from the channel of delayed IDs if there are semaphore units available and call maybe_reconcile_shard again for these delayed shards. This has been tested in https://github.com/neondatabase/neon/pull/7475, but will land that PR separately because it contains other changes & needs the test stabilizing. This change is worth merging sooner, because it fixes a practical issue with larger shard counts.