Files
neon/libs/vm_monitor
Em Sharnoff 9bf7664049 vm-monitor: Remove spammy log line (#6284)
During a previous incident, we noticed that this particular line can be
repeatedly logged every 100ms if the memory usage continues is
persistently high enough to warrant upscaling.

Per the added comment: Ideally we'd still like to include this log line,
because it's useful information, but the simple way to include it
produces far too many log lines, and the more complex ways to
deduplicate the log lines while still including the information are
probably not worth the effort right now.
2024-01-08 21:12:39 -08:00
..

vm-monitor

The vm-monitor (or just monitor) is a core component of the autoscaling system, along with the autoscale-scheduler and the autoscaler-agents. The monitor has two primary roles: 1) notifying agents when immediate upscaling is necessary due to memory conditions and 2) managing Postgres' file cache and a cgroup to carry out upscaling and downscaling decisions.

More on scaling

We scale CPU and memory using NeonVM, our in-house QEMU tool for use with Kubernetes. To control thresholds for receiving memory usage notifications, we start Postgres in the neon-postgres cgroup and set its memory.{max,high}.

Structure

The vm-monitor is loosely comprised of a few systems. These are:

  • the server: this is just a simple axum server that accepts requests and upgrades them to websocket connections. The server only allows one connection at a time. This means that upon receiving a new connection, the server will terminate and old one if it exists.
  • the filecache: a struct that allows communication with the Postgres file cache. On startup, we connect to the filecache and hold on to the connection for the entire monitor lifetime.
  • the cgroup watcher: the CgroupWatcher polls the neon-postgres cgroup's memory usage and sends rolling aggregates to the runner.
  • the runner: the runner marries the filecache and cgroup watcher together, communicating with the agent throught the Dispatcher, and then calling filecache and cgroup watcher functions as needed to upscale and downscale