rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-09 22:42:57 +00:00


Em Sharnoff 3895829bda vm-monitor: Fix cgroup throttling (#5303)

I believe this (not actual IO problems) is the cause of the "disk speed
issue" that we've had for VMs recently. See e.g.:

1. https://neondb.slack.com/archives/C03H1K0PGKH/p1694287808046179?thread_ts=1694271790.580099&cid=C03H1K0PGKH
2. https://neondb.slack.com/archives/C03H1K0PGKH/p1694511932560659

The vm-informant (and now, the vm-monitor, its replacement) is supposed
to gradually increase the `neon-postgres` cgroup's memory.high value,
because otherwise the kernel will throttle all the processes in the
cgroup.

This PR fixes a bug with the vm-monitor's implementation of this
behavior.
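The gradual increase described in this commit message can be illustrated with a minimal Rust sketch. This is not the vm-monitor's actual code: it assumes a fixed step size and a hard upper bound, and the function names `next_memory_high` and `memory_high_schedule`, as well as the example sizes, are hypothetical.

```rust
/// Hypothetical helper: compute the next `memory.high` value after a
/// throttling event, raising it by `step` bytes but never above `max`.
fn next_memory_high(current: u64, step: u64, max: u64) -> u64 {
    // saturating_add avoids overflow; min() enforces the upper bound.
    current.saturating_add(step).min(max)
}

/// Hypothetical helper: list every `memory.high` value reached when stepping
/// from `start` up to `max` in increments of `step`.
fn memory_high_schedule(start: u64, step: u64, max: u64) -> Vec<u64> {
    assert!(step > 0, "step must be positive to make progress");
    let mut current = start.min(max);
    let mut values = vec![current];
    while current < max {
        current = next_memory_high(current, step, max);
        values.push(current);
    }
    values
}

fn main() {
    const MIB: u64 = 1 << 20;
    // Assumed example: start at 512 MiB, step by 128 MiB, cap at 1 GiB.
    let schedule = memory_high_schedule(512 * MIB, 128 * MIB, 1024 * MIB);
    for value in &schedule {
        println!("{} MiB", value / MIB); // 512, 640, 768, 896, 1024
    }
}
```

Stepping `memory.high` upward in bounded increments is one way to relieve throttling without lifting the limit to its maximum all at once; the step size and timing the monitor actually uses are not specified here.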

---

Other references, for the vm-informant's implementation:

- Original issue: neondatabase/autoscaling#44
- Original PR: neondatabase/autoscaling#223

2023-09-14 13:21:50 +03:00

src

vm-monitor: Fix cgroup throttling (#5303)

2023-09-14 13:21:50 +03:00

Cargo.toml

compute_ctl: add vm-monitor (#4946)

2023-08-24 15:54:37 -04:00

README.md

monitor/compute_ctl: remove references to the informant (#5115)

2023-08-29 02:59:27 +03:00


`vm-monitor`

The vm-monitor (or just monitor) is a core component of the autoscaling system, along with the autoscale-scheduler and the autoscaler-agents. The monitor has two primary roles: 1) notifying agents when immediate upscaling is necessary due to memory conditions and 2) managing Postgres' file cache and a cgroup to carry out upscaling and downscaling decisions.

More on scaling

We scale CPU and memory using NeonVM, our in-house QEMU tool for use with Kubernetes. To control the thresholds at which we receive memory usage notifications, we start Postgres in the neon-postgres cgroup and set its memory.{max,high} values.
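The cgroup v2 interface behind memory.{max,high} is a set of plain files, so the mechanism can be sketched in a few lines of Rust. This is a minimal sketch, not the monitor's implementation, assuming a cgroup v2 hierarchy and the hypothetical helper names below: limits are set by writing byte counts to `memory.high` and `memory.max`, and the `high` counter in `memory.events` records how many times the cgroup's processes were throttled for exceeding `memory.high`. The demo writes to a scratch directory under the system temp dir instead of a real cgroup such as `/sys/fs/cgroup/neon-postgres` (an assumed path).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: set `memory.high` and `memory.max` (in bytes)
/// for the cgroup whose directory is `cgroup_dir`.
fn set_memory_limits(cgroup_dir: &Path, high: u64, max: u64) -> io::Result<()> {
    fs::write(cgroup_dir.join("memory.high"), format!("{high}\n"))?;
    fs::write(cgroup_dir.join("memory.max"), format!("{max}\n"))?;
    Ok(())
}

/// Extract the `high` counter from the contents of a `memory.events` file,
/// whose lines look like "low 0", "high 12", "max 0", ...
fn parse_high_events(contents: &str) -> Option<u64> {
    contents.lines().find_map(|line| {
        let mut parts = line.split_whitespace();
        match (parts.next(), parts.next()) {
            (Some("high"), Some(count)) => count.parse().ok(),
            _ => None,
        }
    })
}

/// Hypothetical helper: read the current `high` event count for a cgroup.
fn read_high_events(cgroup_dir: &Path) -> io::Result<Option<u64>> {
    let contents = fs::read_to_string(cgroup_dir.join("memory.events"))?;
    Ok(parse_high_events(&contents))
}

fn main() -> io::Result<()> {
    // Scratch directory standing in for a real cgroup, so this runs without root.
    let dir = std::env::temp_dir().join("neon-postgres-sketch");
    fs::create_dir_all(&dir)?;
    // Simulate the kernel-maintained memory.events file.
    fs::write(dir.join("memory.events"), "low 0\nhigh 3\nmax 0\noom 0\noom_kill 0\n")?;

    const MIB: u64 = 1 << 20;
    set_memory_limits(&dir, 768 * MIB, 1024 * MIB)?;
    println!("memory.high events so far: {:?}", read_high_events(&dir)?); // Some(3)
    Ok(())
}
```

The kernel also raises a file-modified event when `memory.events` changes, so a watcher can wait for notifications instead of polling; which mechanism the real CgroupWatcher uses is not shown here.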

See also: neondatabase/autoscaling
See also: neondatabase/vm-monitor, where initial development of the monitor happened. The repository is no longer maintained, but its commit history may be useful for debugging.

Structure

The vm-monitor is loosely composed of a few systems:

- the server: a simple axum server that accepts requests and upgrades them to websocket connections. The server allows only one connection at a time, so upon receiving a new connection it terminates the old one, if one exists.
- the filecache: a struct that allows communication with the Postgres file cache. On startup, we connect to the file cache and hold on to the connection for the entire lifetime of the monitor.
- the cgroup watcher: the CgroupWatcher manages the neon-postgres cgroup by listening for memory.high events and setting its memory.{high,max} values.
- the runner: the runner ties the filecache and cgroup watcher together, communicating with the agent through the Dispatcher and calling filecache and cgroup watcher functions as needed to upscale and downscale.
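The server's one-connection-at-a-time rule can be sketched with the standard library alone. The real server is built on axum websockets; this minimal sketch only models the bookkeeping, assuming each connection handler periodically checks a shared cancellation flag. `CancelToken` and `ConnectionSlot` are hypothetical names, not types from the vm-monitor.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical cancellation flag shared between the server and one handler.
#[derive(Clone, Default)]
struct CancelToken(Arc<AtomicBool>);

impl CancelToken {
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Hypothetical holder for the single active connection.
#[derive(Default)]
struct ConnectionSlot {
    current: Mutex<Option<CancelToken>>,
}

impl ConnectionSlot {
    /// Register a new connection; the previous one, if any, is told to stop.
    fn accept(&self) -> CancelToken {
        let token = CancelToken::default();
        let mut current = self.current.lock().unwrap();
        if let Some(old) = current.replace(token.clone()) {
            old.cancel();
        }
        token
    }
}

fn main() {
    let slot = ConnectionSlot::default();
    let first = slot.accept();
    let second = slot.accept();
    println!("first cancelled: {}", first.is_cancelled()); // true
    println!("second cancelled: {}", second.is_cancelled()); // false
}
```

Replacing the stored token under the lock means two connections can never both believe they are current; in an async server the same idea would typically use a cancellation primitive from the runtime rather than a polled flag.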