The logic might seem a bit intricate / over-optimized, but I recently spent time benchmarking this code path in the context of a nightly pagebench regression (https://github.com/neondatabase/cloud/issues/21759) and I want to avoid regressing it any further. (A rough sketch of the intended logging behavior, not the actual code, is appended at the end of this description.)

Ideally we would also log the socket send & recv queue lengths, like we do on the compute side in
- https://github.com/neondatabase/neon/pull/10673

but that is proving difficult due to the Rust abstractions that wrap the socket fd. Work in progress on that is happening in
- https://github.com/neondatabase/neon/pull/10823

(See the second sketch at the end of this description for the queue-length ioctls themselves.)

Regarding production impact, I am worried at a theoretical level that the additional logging may cause a downward spiral when a pageserver is slow to flush because there is not enough CPU: the logging would consume more CPU and thereby slow down flushes even more. However, I don't think this matters practically speaking.

# Refs

- context: https://neondb.slack.com/archives/C08DE6Q9C3B/p1739464533762049?thread_ts=1739462628.361019&cid=C08DE6Q9C3B
- fixes https://github.com/neondatabase/neon/issues/10668
- part of https://github.com/neondatabase/cloud/issues/23515

# Testing

Tested locally by running

```
./target/debug/pagebench get-page-latest-lsn --num-clients=1000 --queue-depth=1000
```

in one terminal, waiting a bit, then

```
pkill -STOP pagebench
```

and waiting for slow-flush logs to show up in `pageserver.log`. To see that the completion log message is logged, run

```
pkill -CONT pagebench
```
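
---

For illustration, here is a minimal sketch of the kind of behavior exercised in the Testing section above: warn once a flush of the pagestream socket has been stalled past a threshold, then log a completion message once it finally goes through. The function name, threshold, and log fields are all made up for the example; this is not the code the PR adds.

```rust
use std::time::{Duration, Instant};

use tokio::io::{AsyncWrite, AsyncWriteExt};

/// Illustrative only: warn if a flush stalls past a (made-up) threshold,
/// then log a completion message once the flush finally finishes.
async fn flush_with_slow_logging<W: AsyncWrite + Unpin>(w: &mut W) -> std::io::Result<()> {
    // Hypothetical threshold; the real code's value and structure may differ.
    const SLOW_FLUSH_WARN_AFTER: Duration = Duration::from_secs(10);

    let started = Instant::now();
    let mut flush = std::pin::pin!(w.flush());

    match tokio::time::timeout(SLOW_FLUSH_WARN_AFTER, &mut flush).await {
        // Flush completed (or failed) within the threshold: nothing to log.
        Ok(res) => res,
        // Threshold exceeded: warn now, keep waiting, log again on completion.
        Err(_elapsed) => {
            tracing::warn!(elapsed_ms = started.elapsed().as_millis() as u64, "slow flush");
            let res = flush.await;
            tracing::info!(elapsed_ms = started.elapsed().as_millis() as u64, "slow flush completed");
            res
        }
    }
}
```

With the `pkill -STOP pagebench` test above, the stopped client stops reading, the socket send buffer fills, the flush stalls, and the warning fires; `pkill -CONT` lets the client drain the socket again, so the completion message follows.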
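And a minimal sketch of the queue-length logging mentioned above, assuming the raw fd can be recovered from the wrapper types (which is exactly the difficult part). `socket_queue_lengths` is a hypothetical helper; on Linux the SIOCOUTQ / SIOCINQ ioctls are exposed by the libc crate as `TIOCOUTQ` / `FIONREAD`.

```rust
use std::io;
use std::os::fd::AsRawFd;

/// Hypothetical helper (not part of this PR): read the kernel's send and
/// receive queue lengths for a connected TCP socket on Linux, the same
/// numbers the compute side logs in #10673. Assumes something implementing
/// AsRawFd is available behind the tokio/TLS wrappers.
fn socket_queue_lengths(sock: &impl AsRawFd) -> io::Result<(libc::c_int, libc::c_int)> {
    let fd = sock.as_raw_fd();
    let mut send_queue: libc::c_int = 0; // bytes not yet acked by the peer (SIOCOUTQ)
    let mut recv_queue: libc::c_int = 0; // bytes received but not yet read (SIOCINQ)
    // SAFETY: plain ioctls on a valid fd, writing into local ints of the expected size.
    unsafe {
        if libc::ioctl(fd, libc::TIOCOUTQ, &mut send_queue as *mut libc::c_int) != 0 {
            return Err(io::Error::last_os_error());
        }
        if libc::ioctl(fd, libc::FIONREAD, &mut recv_queue as *mut libc::c_int) != 0 {
            return Err(io::Error::last_os_error());
        }
    }
    Ok((send_queue, recv_queue))
}
```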