rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-28 18:40:38 +00:00

Author	SHA1	Message	Date
Em Sharnoff	9fe5cc6a82	vm-monitor: Switch from memory.high to polling memory.stat (#5524 ) tl;dr it's really hard to avoid throttling from memory.high, and it counts tmpfs & page cache usage, so it's also hard to make sense of. In the interest of fixing things quickly with something that should be good enough, this PR switches to instead periodically fetch memory statistics from the cgroup's memory.stat and use that data to determine if and when we should upscale. This PR fixes #5444, which has a lot more detail on the difficulties we've hit with memory.high. This PR also supersedes #5488.	2023-10-17 15:30:40 -07:00
Em Sharnoff	48e85460fc	vm-monitor: Unset memory.high on start + refactor cgroup handling (#5348 ) ## Problem Over the past couple days, we've had a couple VMs hit issues with postgres getting hit by memory.high throttling, even after #5303 was supposed to fix that. The tl;dr of those issues is that because vm-monitor startup sets the file cache size first, before interacting with the cgroup, cgroup throttling can mean we timeout connecting to the file cache and never reset the cgroup, even if memory has been upscaled since then. See e.g.: - https://neondb.slack.com/archives/C03F5SM1N02/p1695218132208249 - https://neondb.slack.com/archives/C03F5SM1N02/p1695314613696659 ## Summary of changes This PR adds an additional step into vm-monitor startup, where we first set the cgroup's memory.high value to 'max', removing the capacity for throttling. This preferable to just setting memory.high before the file cache, because it's theoretically possible that the new value of memory.high could still be less than the current memory usage, in which case postgres could continue to be throttled without sufficient memory events to relieve that. Implementing this properly involved adding a method to our internal cgroup interface, and it seemed like there was duplicated functionality there, so this PR unifies that as well, making things a bit more consistent.	2023-09-27 21:27:23 -07:00
Em Sharnoff	722e5260bf	vm-monitor: Don't set cgroup memory.max (#5333 ) All it does is make postgres OOM more often (which, tbf, means we're less likely to have e.g. compute_ctl get OOM-killed, but that tradeoff isn't worth it). Internally, this means removing all references to `memory.max` and the places where we calculate or store the intended value. As discussed in the sync earlier. ref: - https://neondb.slack.com/archives/C03H1K0PGKH/p1694698949252439?thread_ts=1694505575.693449&cid=C03H1K0PGKH - https://neondb.slack.com/archives/C03H1K0PGKH/p1695049198622759	2023-09-18 17:47:48 +00:00
Em Sharnoff	3895829bda	vm-monitor: Fix cgroup throttling (#5303 ) I believe this (not actual IO problems) is the cause of the "disk speed issue" that we've had for VMs recently. See e.g.: 1. https://neondb.slack.com/archives/C03H1K0PGKH/p1694287808046179?thread_ts=1694271790.580099&cid=C03H1K0PGKH 2. https://neondb.slack.com/archives/C03H1K0PGKH/p1694511932560659 The vm-informant (and now, the vm-monitor, its replacement) is supposed to gradually increase the `neon-postgres` cgroup's memory.high value, because otherwise the kernel will throttle all the processes in the cgroup. This PR fixes a bug with the vm-monitor's implementation of this behavior. --- Other references, for the vm-informant's implementation: - Original issue: neondatabase/autoscaling#44 - Original PR: neondatabase/autoscaling#223	2023-09-14 13:21:50 +03:00
Felix Prasanna	40268dcd8d	monitor: fix filecache calculations (#5112 ) ## Problem An underflow bug in the filecache calculations. ## Summary of changes Fixed the bug, cleaned up calculations in general.	2023-08-25 13:29:10 -04:00
Felix Prasanna	3128eeff01	compute_ctl: add vm-monitor (#4946 ) Co-authored-by: Em Sharnoff <sharnoff@neon.tech>	2023-08-24 15:54:37 -04:00

6 Commits