neon/control_plane
Christian Schwarz 1648639874 fix(neon_local): long init_tenant_mgr causes pageserver startup failure
Before this PR, if neon_local's `start_process()` ran out of retries
before pageserver started listening for requests, it would give up.
As of PR #6474 we at least kill the starting pageserver process in that
case; before that, we would leak it.

Pageserver `bind()`s the mgmt API early, but only starts `accept()`ing
HTTP requests after it has finished `init_tenant_mgr()` (plus some other
stuff).
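
To illustrate why a timeout is the expected symptom here, the following is a
minimal sketch using only the Rust standard library. It is not pageserver code:
the function name and the request path are hypothetical, and it assumes the
socket is bound and listening (which `TcpListener::bind` does) while nothing
calls `accept()`. The client connects and sends a request, but never receives
a response.

```rust
use std::io::{ErrorKind, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::time::Duration;

/// Hypothetical demo: talk to a listener that is bound but never accept()s.
/// Returns Ok(true) if waiting for the response timed out.
fn request_times_out_before_accept() -> std::io::Result<bool> {
    // Bound and listening, but accept() is never called, mimicking a server
    // that is still busy with its startup work.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // The kernel completes the TCP handshake via the listen backlog,
    // so connecting and writing both succeed.
    let mut stream = TcpStream::connect_timeout(&addr, Duration::from_secs(1))?;
    stream.set_read_timeout(Some(Duration::from_millis(200)))?;
    // Placeholder request; the path is an assumption for illustration only.
    stream.write_all(b"GET /status HTTP/1.1\r\nHost: localhost\r\n\r\n")?;

    // Nobody reads the request, so no response arrives and the read times out.
    // The error kind is WouldBlock on Unix and TimedOut on Windows.
    let mut buf = [0u8; 64];
    let timed_out = match stream.read(&mut buf) {
        Err(e) => matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut),
        Ok(_) => false,
    };
    drop(listener);
    Ok(timed_out)
}
```

A status check that treats this timeout as fatal would give up on a server
that is merely still initializing.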

init_tenant_mgr can take a long time with many tenants, i.e., longer
than the total time covered by the retries that neon_local permits.

Changes
=======

This PR changes the status check that neon_local performs when starting
pageserver to ignore connect & timeout errors, as those are expected
(see explanation above).
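
As a rough illustration of the idea (not the actual neon_local code), here is
a minimal sketch of a readiness loop in which connect and timeout errors do
not consume the retry budget. All names (`ProbeError`, `wait_until_ready`,
`max_retries`) are hypothetical, and treating "ignore" as "keep waiting
without counting a retry" is an assumption about the intended behavior.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical classification of a failed status probe.
#[derive(Debug, PartialEq)]
enum ProbeError {
    Connect,       // e.g. connection refused
    Timeout,       // e.g. no response before the deadline
    Other(String), // anything unexpected
}

/// Polls `probe` until it succeeds and returns the number of attempts made.
/// Connect and timeout errors are expected during startup and are ignored;
/// only other errors count against `max_retries`.
fn wait_until_ready<F>(
    mut probe: F,
    max_retries: u32,
    interval: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> Result<(), ProbeError>,
{
    let mut retries_left = max_retries;
    let mut attempts = 0u32;
    loop {
        attempts += 1;
        match probe() {
            Ok(()) => return Ok(attempts),
            // Expected while the server is still initializing: keep waiting.
            Err(ProbeError::Connect) | Err(ProbeError::Timeout) => {}
            // Unexpected errors consume the retry budget.
            Err(ProbeError::Other(msg)) => {
                if retries_left == 0 {
                    return Err(format!("giving up after {} attempts: {}", attempts, msg));
                }
                retries_left -= 1;
            }
        }
        sleep(interval);
    }
}
```

With this shape, a slow startup that only produces connect or timeout errors
can take arbitrarily many probes, while persistent unexpected errors still
cause the caller to give up.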

I verified that this allows for arbitrarily long `init_tenant_mgr()`
by adding a timeout at the top of that function.
2024-01-25 15:12:07 +00:00