mirror of
https://github.com/neondatabase/neon.git
synced 2026-05-26 01:20:38 +00:00
## Problem
We have been dealing with a number of issues with the SC (storage
controller) compute notification mechanism. Various race conditions exist
in the PG/HCC/cplane/PS distributed system, and relying on the SC to
notify the compute node of PS (pageserver) changes is not
robust. We decided to pursue a more robust option where the compute node
itself discovers whether it may be pointing to the incorrect PSs and
proactively reconfigures itself if issues are suspected.
## Summary of changes
To support this self-healing reconfiguration mechanism, several pieces
are needed. This PR adds a mechanism to `compute_ctl` called "refresh
configuration", where the compute node reaches out to the control plane
to pull a new config and reconfigures PG using the new config, instead of
listening for a notification message containing a config to arrive from
the control plane. Main changes to `compute_ctl`:
1. The `compute_ctl` state machine now has a new state,
`RefreshConfigurationPending`. The compute node may enter this state
upon receiving a signal that it may be using the incorrect page servers.
2. Upon entering the `RefreshConfigurationPending` state, the background
configurator thread in `compute_ctl` wakes up, pulls a new config from
the control plane, and reconfigures PG (with `pg_ctl reload`) according
to the new config.
3. The compute node may enter the new `RefreshConfigurationPending`
state from the `Running` or `Failed` states. If the configurator manages
to configure the compute node successfully, the node enters the `Running`
state. Otherwise, it stays in `RefreshConfigurationPending`, and the
configurator thread waits for the next notification if an incorrect
config is still suspected.
4. Added various plumbing in `compute_ctl` data structures to allow the
configurator thread to perform the config fetch.
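The transitions described in items 1 to 3 can be sketched as a small state machine. This is a minimal illustration in Python rather than the actual `compute_ctl` code; the names `State`, `request_refresh`, `configurator_step`, `fetch_config`, and `apply_config` are hypothetical and chosen only for this sketch.

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    # Only the states relevant to the refresh flow are modelled here.
    RUNNING = auto()
    FAILED = auto()
    REFRESH_CONFIGURATION_PENDING = auto()


def request_refresh(state: State) -> State:
    """Handle an "incorrect config suspected" signal.

    The new state may be entered from Running or Failed; any other
    state is returned unchanged.
    """
    if state in (State.RUNNING, State.FAILED):
        return State.REFRESH_CONFIGURATION_PENDING
    return state


def configurator_step(
    state: State,
    fetch_config: Callable[[], dict],      # hypothetical: pulls a new config
    apply_config: Callable[[dict], None],  # hypothetical: reconfigures PG
) -> State:
    """One wake-up of the configurator: fetch, apply, then pick the next state."""
    if state is not State.REFRESH_CONFIGURATION_PENDING:
        return state
    try:
        apply_config(fetch_config())
    except Exception:
        # On failure, stay pending and wait for the next notification.
        return State.REFRESH_CONFIGURATION_PENDING
    return State.RUNNING
```

Keeping the fetch and apply steps as injected callables mirrors the testing shortcut described later, where the config comes from a local file instead of the control plane.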
The "incorrect config suspected" notification is delivered using an HTTP
endpoint, `/refresh_configuration`, on `compute_ctl`. This endpoint is
currently not called by anyone other than the tests. In a follow-up PR I
will set up some code in the PG extension/libpagestore to call this HTTP
endpoint whenever PG suspects that it is pointing to the wrong page
servers.
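As an illustration of how a caller might trigger the endpoint, here is a minimal Python sketch. The log excerpt in this description shows the request arriving as a `POST`; the base URL, the empty request body, and the absence of any authentication header are assumptions, and the helper names `build_refresh_request` and `send_refresh_request` are hypothetical.

```python
import urllib.request


def build_refresh_request(base_url: str) -> urllib.request.Request:
    """Build a POST request for the /refresh_configuration endpoint.

    base_url is the address of the compute_ctl HTTP server (an assumption
    of this sketch; pass whatever address your deployment uses).
    """
    url = base_url.rstrip("/") + "/refresh_configuration"
    # An empty body is assumed: the endpoint is described as a signal only,
    # and the compute node pulls the config itself.
    return urllib.request.Request(url, data=b"", method="POST")


def send_refresh_request(base_url: str, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_refresh_request(base_url), timeout=timeout) as resp:
        return resp.status
```

A caller would run something like `send_refresh_request("http://localhost:3080")`, where the host and port are placeholders for the real `compute_ctl` address.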
## How is this tested?
Modified `test_runner/regress/test_change_pageserver.py` to add a
scenario where we use the new `/refresh_configuration` mechanism instead
of the existing `/configure` mechanism (which requires us to send a full
config to `compute_ctl`) to have the compute node reload and reconfigure
its pageservers.
I took one shortcut to reduce the scope of this change when it comes to
testing: the compute node uses a local config file instead of pulling a
config over the network from the HCC. This simplifies the test setup in
the following ways:
* The existing test framework is set up to use local config files for
compute nodes only, so it's convenient if I just stick with it.
* The HCC today generates a compute config with production settings
(e.g., assuming 4 CPUs, 16GB RAM, with local file caches), which is
probably not suitable in tests. We may need to add another test-only
endpoint config to the control plane to make this work.
The config-fetch part of the code is relatively straightforward (and
well-covered in both production and the KIND test) so it is probably
fine to replace it with loading from the local config file for these
integration tests.
In addition to making sure that the tests pass, I also manually
inspected the logs to make sure that the compute node is indeed
reloading the config using the new mechanism instead of going down the
old `/configure` path (it turns out the test has bugs that cause
compute `/configure` messages to be sent despite the test intending to
disable/blackhole them).
```text
2024-09-24T18:53:29.573650Z INFO http request{otel.name=/refresh_configuration http.method=POST}: serving /refresh_configuration POST request
2024-09-24T18:53:29.573689Z INFO configurator_main_loop: compute node suspects its configuration is out of date, now refreshing configuration
2024-09-24T18:53:29.573706Z INFO configurator_main_loop: reloading config.json from path: /workspaces/hadron/test_output/test_change_pageserver_using_refresh[release-pg16]/repo/endpoints/ep-1/spec.json
PG:2024-09-24 18:53:29.574 GMT [52799] LOG: received SIGHUP, reloading configuration files
PG:2024-09-24 18:53:29.575 GMT [52799] LOG: parameter "neon.extension_server_port" cannot be changed without restarting the server
PG:2024-09-24 18:53:29.575 GMT [52799] LOG: parameter "neon.pageserver_connstring" changed to "postgresql://no_user@localhost:15008"
...
```
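The manual log inspection described above could be automated along these lines. This is a hedged sketch: the `/refresh_configuration` marker string is taken from the log excerpt, while the marker for the old `/configure` path is an assumption made by analogy, and `refresh_path_used` is a hypothetical helper name.

```python
# Marker copied from the log excerpt above.
REFRESH_MARKER = "serving /refresh_configuration POST request"
# Assumed by analogy with the line above; the real /configure log line may differ.
CONFIGURE_MARKER = "serving /configure "


def refresh_path_used(log_text: str) -> bool:
    """Return True if the log shows the refresh path and never the /configure path."""
    saw_refresh = False
    for line in log_text.splitlines():
        if CONFIGURE_MARKER in line:
            # The old path was taken at least once, so the check fails.
            return False
        if REFRESH_MARKER in line:
            saw_refresh = True
    return saw_refresh
```

The trailing space in `CONFIGURE_MARKER` keeps it from matching other paths that merely start with `/configure`.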
Co-authored-by: William Huang <william.huang@databricks.com>
119 lines
4.3 KiB
Markdown
# Compute node tools

The Postgres wrapper (`compute_ctl`) is intended to be run as a Docker entrypoint or as a `systemd`
`ExecStart` option. It handles all the `Neon` specifics during compute node
initialization:

- `compute_ctl` accepts the cluster (compute node) specification as a JSON file.
- Every start is a fresh start, so the data directory is removed and
  initialized again on each run.
- Next it puts configuration files into the `PGDATA` directory.
- Sync safekeepers and get the commit LSN.
- Get `basebackup` from the pageserver using the LSN returned by the previous step.
- Try to start `postgres` and wait until it is ready to accept connections.
- Check and alter/drop/create roles and databases.
- Hang waiting on the `postmaster` process to exit.

Also, `compute_ctl` spawns two separate service threads:

- `compute-monitor` checks the last Postgres activity timestamp and saves it
  into the shared `ComputeNode`;
- `http-endpoint` runs a Hyper HTTP API server, which serves readiness and the
  last activity requests.

If the `AUTOSCALING` environment variable is set, `compute_ctl` will start the
`vm-monitor` located in [`neon/libs/vm_monitor`]. For VM compute nodes,
`vm-monitor` communicates with the VM autoscaling system. It coordinates
downscaling and requests immediate upscaling under resource pressure.

Usage example:

```sh
compute_ctl -D /var/db/postgres/compute \
            -C 'postgresql://cloud_admin@localhost/postgres' \
            -S /var/db/postgres/specs/current.json \
            -b /usr/local/bin/postgres
```

## State Diagram

Computes can be in various states. Below is a diagram that details how a
compute moves between states.

```mermaid
%% https://mermaid.js.org/syntax/stateDiagram.html
stateDiagram-v2
  [*] --> Empty : Compute spawned
  Empty --> ConfigurationPending : Waiting for compute spec
  ConfigurationPending --> Configuration : Received compute spec
  Configuration --> Failed : Failed to configure the compute
  Configuration --> Running : Compute has been configured
  Empty --> Init : Compute spec is immediately available
  Empty --> TerminationPendingFast : Requested termination
  Empty --> TerminationPendingImmediate : Requested termination
  Init --> Failed : Failed to start Postgres
  Init --> Running : Started Postgres
  Running --> TerminationPendingFast : Requested termination
  Running --> TerminationPendingImmediate : Requested termination
  Running --> ConfigurationPending : Received a /configure request with spec
  Running --> RefreshConfigurationPending : Received a /refresh_configuration request, compute node will pull a new spec and reconfigure
  RefreshConfigurationPending --> Running : Compute has been re-configured
  TerminationPendingFast --> Terminated : Terminated compute with 30s delay for cplane to inspect status
  TerminationPendingImmediate --> Terminated : Terminated compute immediately
  Running --> TerminationPending : Requested termination
  TerminationPending --> Terminated : Terminated compute
  Failed --> RefreshConfigurationPending : Received a /refresh_configuration request
  Failed --> [*] : Compute exited
  Terminated --> [*] : Compute exited
```
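The edges of the diagram can also be written down as a lookup table, which is handy for checking whether a given transition is allowed. The following Python sketch only transcribes the diagram; `TRANSITIONS` and `can_transition` are hypothetical names and do not correspond to anything in `compute_ctl`.

```python
# Allowed transitions, transcribed from the state diagram above.
# The initial [*] and final [*] pseudo-states are omitted.
TRANSITIONS = {
    "Empty": {
        "ConfigurationPending",
        "Init",
        "TerminationPendingFast",
        "TerminationPendingImmediate",
    },
    "ConfigurationPending": {"Configuration"},
    "Configuration": {"Failed", "Running"},
    "Init": {"Failed", "Running"},
    "Running": {
        "TerminationPendingFast",
        "TerminationPendingImmediate",
        "TerminationPending",
        "ConfigurationPending",
        "RefreshConfigurationPending",
    },
    "RefreshConfigurationPending": {"Running"},
    "TerminationPendingFast": {"Terminated"},
    "TerminationPendingImmediate": {"Terminated"},
    "TerminationPending": {"Terminated"},
    "Failed": {"RefreshConfigurationPending"},
    "Terminated": set(),
}


def can_transition(src: str, dst: str) -> bool:
    """Return True if the diagram has an edge from src to dst."""
    return dst in TRANSITIONS.get(src, set())
```

For example, `can_transition("Failed", "RefreshConfigurationPending")` holds, while a direct move from `Failed` to `Running` does not appear in the diagram.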

## Tests

Cargo formatter:

```sh
cargo fmt
```

Run tests:

```sh
cargo test
```

Clippy linter:

```sh
cargo clippy --all --all-targets -- -Dwarnings -Drust-2018-idioms
```

## Cross-platform compilation

Imagine that you are on macOS (x86) and you want a Linux GNU (`x86_64-unknown-linux-gnu` platform in `rust` terminology) executable.

### Using docker

You can use a throw-away Docker container ([rustlang/rust](https://hub.docker.com/r/rustlang/rust/) image) for doing that:

```sh
docker run --rm \
    -v $(pwd):/compute_tools \
    -w /compute_tools \
    -t rustlang/rust:nightly cargo build --release --target=x86_64-unknown-linux-gnu
```

or as a one-liner:

```sh
docker run --rm -v $(pwd):/compute_tools -w /compute_tools -t rust:latest cargo build --release --target=x86_64-unknown-linux-gnu
```

### Using rust native cross-compilation

Another way is to add the `x86_64-unknown-linux-gnu` target on your host system:

```sh
rustup target add x86_64-unknown-linux-gnu
```

Install the macOS cross-compiler toolchain:

```sh
brew tap SergioBenitez/osxct
brew install x86_64-unknown-linux-gnu
```

And finally run `cargo build`:

```sh
CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=x86_64-unknown-linux-gnu-gcc cargo build --target=x86_64-unknown-linux-gnu --release
```