part of #6628

Before this PR, we used a `std::sync::RwLock` to coalesce multiple callers on one walredo spawning. One thread would win the write lock, and the others would queue up at either the `read()` or the `write()` lock call.

In a scenario where a compute initiates multiple getpage requests from different Postgres backends (= different page_service conns) and we don't have a walredo process around, all of these page_service handler tasks enter the spawning code path: one of them does the spawning, and the others stall their respective executor thread because they make a blocking `read()`/`write()` lock call.

I don't know exactly how bad the impact is in reality, because `posix_spawn` uses `CLONE_VFORK` under the hood, which means the entire parent process stalls anyway until the child calls `exec`, which in turn resumes the parent. But, anyway, we won't know until we fix this issue. And there's definitely a future way out of stalling the pageserver on `posix_spawn`, namely forking template walredo processes that fork again when they need to become per-tenant. This idea is tracked in https://github.com/neondatabase/neon/issues/7320.

Changes
-------

This PR fixes that scenario by switching to `heavier_once_cell` for coalescing. There is a comment on the struct field that explains it in a bit more nuance. A minimal sketch of the coalescing pattern is included at the end of this description.

### Alternative Design

An alternative would be to use `tokio::sync::RwLock`. I did this in the first commit of this PR branch, before switching to `heavier_once_cell`.

Performance
-----------

I re-ran `bench_walredo` and updated the results, showing that the changes are negligible. For the record, the earlier commit in this PR branch that uses `tokio::sync::RwLock` also has updated benchmark numbers, and the results / kind of tiny regression were equivalent to `heavier_once_cell`.

Note that the above doesn't measure performance on the cold path, i.e., when we need to launch the process and coalesce. We don't have a benchmark for that, and I don't expect any significant changes. We have metrics and we log spawn latency, so we can monitor it in staging & prod.

Risks
-----

As "usual", replacing a std::sync primitive with something that yields to the executor risks exposing concurrency that was previously implicitly limited to the number of executor threads. This would be the first such change for walredo.

The risk is that we get descheduled while the reconstruct data is already there, which could let reconstruct data pile up. In practice, I think the risk is low: once we get scheduled again, we'll likely have a walredo process ready, and there is no further await point until walredo is complete and the reconstruct data has been dropped. This will change with the async walredo PR #6548, and I'm well aware of it in that PR.
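For illustration only, here is a minimal sketch of the coalescing pattern this PR is after: many concurrent handler tasks needing a walredo process, exactly one of them spawning it, and the rest awaiting the result without blocking an executor thread. The sketch uses `tokio::sync::OnceCell` purely as a stand-in for `heavier_once_cell` (which, unlike a plain once-cell, also has to deal with taking the process back out when it dies); the names `WalRedoManager`, `WalRedoProcess`, and `spawn_walredo_process` are hypothetical and not the pageserver's actual types.

```rust
// Sketch, not Neon's actual code: async coalescing of walredo spawning.
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::OnceCell;

/// Hypothetical stand-in for the per-tenant walredo process handle.
#[derive(Debug)]
struct WalRedoProcess {
    pid: u32,
}

async fn spawn_walredo_process() -> Arc<WalRedoProcess> {
    // The real pageserver would posix_spawn the walredo child here;
    // we just simulate the spawn latency.
    tokio::time::sleep(Duration::from_millis(10)).await;
    Arc::new(WalRedoProcess { pid: 4242 })
}

struct WalRedoManager {
    // All concurrent getpage handlers share this cell; only one of them
    // runs the spawn, the rest await its completion without blocking a thread.
    process: OnceCell<Arc<WalRedoProcess>>,
}

impl WalRedoManager {
    async fn get_or_spawn(&self) -> Arc<WalRedoProcess> {
        self.process
            .get_or_init(spawn_walredo_process)
            .await
            .clone()
    }
}

#[tokio::main]
async fn main() {
    let mgr = Arc::new(WalRedoManager {
        process: OnceCell::new(),
    });

    // Simulate several page_service handler tasks arriving at once with no
    // walredo process around: exactly one spawn happens, and all tasks end
    // up with the same handle.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let mgr = mgr.clone();
            tokio::spawn(async move {
                let proc = mgr.get_or_spawn().await;
                println!("handler {i} using walredo pid {}", proc.pid);
            })
        })
        .collect();

    for h in handles {
        h.await.unwrap();
    }
}
```

The point of the sketch is the waiting behavior: tasks that lose the race await a future instead of parking an executor thread on a blocking `read()`/`write()` call, which is the difference between the old `std::sync::RwLock` approach and the new one.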