Files
neon/compute_tools
Alexey Kondratov 1c432d5492 [compute_ctl] Do not miss short-living connections (#6008)
## Problem

Currently, activity monitor in `compute_ctl` has 500 ms polling
interval. It also looks on the list of current client backends looking
for an active one or one with the most recent state change. This means
we can miss short-living connections.

Yet, during testing this PR I realized that it's usually not a problem
with pooled connection, as pgbouncer maintains connections to Postgres
even though client connection are short-living. We can still miss direct
connections.

## Summary of changes

This commit introduces another way to detect user activity on the
compute. It polls a sum of `active_time` and sum of `sessions` from all
non-system databases in the `pg_stat_database` [1]. If user runs some
queries or just open a direct connection, it will rise; if user will
drop db, it can go down, but it's still a change and will be detected as
activity.

New statistic-based logic seems to be working fine. Yet, after having it
running for a couple of hours I've seen several odd cases with
connections via pgbouncer:

1. Sometimes, if you run just `psql pooler_connstr -c 'select 1;'`
   `active_time` could be not updated immediately, and it may take a couple
   of dozens of seconds. This doesn't seem critical, though.
2. Same query with pooler, `active_time` can be bumped a bit, then
   pgbouncer keeps open connection to Postgres for ~10 minutes, then it
   disconnects, and `active_time` *could be* bumped a bit again. 'Could be'
   because I've seen it once, but it didn't reproduce for a second try.

I think this can create false-positives (hopefully rare), when we will
not suspend some computes because of lagged statistics update OR because
some non-user processes will try to connect to user databases.
Currently, we don't touch them outside of startup and
`postgres_exporter` is configured to do not discover other databases,
but this can change in the future.

New behavior is covered by feature flag `activity_monitor_experimental`,
which should be provided by control plane via neondatabase/cloud#9171

[1] https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-DATABASE-VIEW

Related to neondatabase/cloud#7966, neondatabase/cloud#7198
2024-01-12 18:15:41 +01:00
..
2023-10-18 16:42:22 +03:00

Compute node tools

Postgres wrapper (compute_ctl) is intended to be run as a Docker entrypoint or as a systemd ExecStart option. It will handle all the Neon specifics during compute node initialization:

  • compute_ctl accepts cluster (compute node) specification as a JSON file.
  • Every start is a fresh start, so the data directory is removed and initialized again on each run.
  • Next it will put configuration files into the PGDATA directory.
  • Sync safekeepers and get commit LSN.
  • Get basebackup from pageserver using the returned on the previous step LSN.
  • Try to start postgres and wait until it is ready to accept connections.
  • Check and alter/drop/create roles and databases.
  • Hang waiting on the postmaster process to exit.

Also compute_ctl spawns two separate service threads:

  • compute-monitor checks the last Postgres activity timestamp and saves it into the shared ComputeNode;
  • http-endpoint runs a Hyper HTTP API server, which serves readiness and the last activity requests.

If AUTOSCALING environment variable is set, compute_ctl will start the vm-monitor located in [neon/libs/vm_monitor]. For VM compute nodes, vm-monitor communicates with the VM autoscaling system. It coordinates downscaling and requests immediate upscaling under resource pressure.

Usage example:

compute_ctl -D /var/db/postgres/compute \
            -C 'postgresql://cloud_admin@localhost/postgres' \
            -S /var/db/postgres/specs/current.json \
            -b /usr/local/bin/postgres

Tests

Cargo formatter:

cargo fmt

Run tests:

cargo test

Clippy linter:

cargo clippy --all --all-targets -- -Dwarnings -Drust-2018-idioms

Cross-platform compilation

Imaging that you are on macOS (x86) and you want a Linux GNU (x86_64-unknown-linux-gnu platform in rust terminology) executable.

Using docker

You can use a throw-away Docker container (rustlang/rust image) for doing that:

docker run --rm \
    -v $(pwd):/compute_tools \
    -w /compute_tools \
    -t rustlang/rust:nightly cargo build --release --target=x86_64-unknown-linux-gnu

or one-line:

docker run --rm -v $(pwd):/compute_tools -w /compute_tools -t rust:latest cargo build --release --target=x86_64-unknown-linux-gnu

Using rust native cross-compilation

Another way is to add x86_64-unknown-linux-gnu target on your host system:

rustup target add x86_64-unknown-linux-gnu

Install macOS cross-compiler toolchain:

brew tap SergioBenitez/osxct
brew install x86_64-unknown-linux-gnu

And finally run cargo build:

CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=x86_64-unknown-linux-gnu-gcc cargo build --target=x86_64-unknown-linux-gnu --release