With this commit one can start compute with something like
```shell
cargo run --bin compute_ctl -- -i no-compute \
-p http://localhost:9095 \
-D compute_pgdata \
-C "postgresql://cloud_admin@127.0.0.1:5434/postgres" \
-b ./pg_install/v15/bin/postgres
```
and it will hang waiting for spec.
Then send one spec
```shell
curl -d "$(cat ./compute-spec.json)" http://localhost:3080/spec
```
Postgres will be started and configured.
Then reconfigure it with
```shell
curl -d "$(cat ./compute-spec-new.json)" http://localhost:3080/spec
```
Most of safeguards and comments are added. Some polishing especially
around HTTP API is still needed.
Accept spec in JSON format and request compute reconfiguration from
the configurator thread. If anything goes wrong after we set the
compute state to `ConfigurationPending` and / or sent spec to the
configurator thread, we basically leave compute in the potentially
wrong state. That said, it's control-plane's responsibility to
watch compute state after reconfiguration request and to clean
restart it in case of errors.
It still lacks ability of starting up without spec and some validations,
i.e. that live reconfiguration should be only available with
`--compute-id` and `--control-plane-uri` options.
Otherwise, it works fine and could be tested by running `compute_ctl`
locally, then sending it a new spec:
```shell
curl -d "$(cat ./compute-spec-new.json)" http://localhost:3080/spec
```
We have one configurator thread and async http server, so generally we
have single consumer - multiple producers pattern here. That's why we
use `mpsc` channel, not `tokio::sync::watch`. Actually, concurrency of
producers is limited to one due to code logic, but we still need an
ability to potentially pass `Sender` to several threads.
Next, we use async `hyper` + `tokio` http server, but all the other code
is completely synchronous. So we need to send data from async to sync,
that's why we use `mpsc::unbounded_channel` here, not `mpsc::channel`.
It doesn't make much sense to rewrite all code to async now, but we can
consider doing this in the future.
I think that a combination of `Mutex` and `CondVar` would work just fine
too, but as we already have `tokio`, I decided to try something from it.
Currently we don't see from the logs, if shutting down tracing takes
long time or not. We do see that shutting down computes gets delayed for
some reason and hits thhe grace period limit. Moving the shutdown
message to slightly later, when we don't have anything else than just
exit left.
## Issue ticket number and link
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
We read the pageserver connection string from the spec file, so let's
read the auth token from the same place.
We've been talking about pre-launching compute nodes that are not
associated with any particular tenant at startup, so that the spec
file is delivered to the compute node later. We cannot change the env
variables after the process has been launched.
We still pass the token to 'postgres' binary in the NEON_AUTH_TOKEN
env variable, but compute_ctl is now responsible for setting it.
- Remove the neon.safekeeper_token_env GUC. It was used to set the
name of an environment variable, which was then used in pageserver
and safekeeper connection strings to in place of the
password. Instead, always look up the environment variable called
NEON_AUTH_TOKEN. That's what neon.safekeeper_token_env was always
set to in practice, and I don't see the need for the extra level of
indirection or configurability.
- Instead of substituting $NEON_AUTH_TOKEN in the connection strings,
pass $NEON_AUTH_TOKEN "out-of-band" as the password, when we connect
to the pageserver or safekeepers. That's simpler.
- Also use the password from $NEON_AUTH_TOKEN in compute_ctl, when it
connects to the pageserver to get the "base backup".
After enabling autoscaling, we faced the issue that customers are not
able to get the number of CPUs they use at this moment. Therefore I've
added these two options:
1. Postgresql function to allow customers to call it whenever they want
2. `compute_ctl` endpoint to show these number in console
This commit adds a basic HTTP API endpoint that allows scraping the
`pg_stat_statements` data and getting a list of slow queries. New
insights like cache hit rate and so on could be added later.
Extension `pg_stat_statements` is checked / created only if compute
tries to load the corresponding shared library. The latter is configured
by control-plane and currently covered with feature flag.
Co-authored by Eduard Dyckman (bird.duskpoet@gmail.com)
I added these spans to trace how long the queries take, but I didn't realize
that there's a difference between:
let _ = span.entered();
and
let _guard = span.entered();
The former drops the guard immediately, while the latter holds it
until the end of the scope. As a result, the span was ended
immediately, and the query was executed outside the span.
This allows fine-grained distributed tracing of the 'start_compute'
operation from the cloud console. The startup actions performed by
'compute_ctl' are now performed in a child of the 'start_compute'
context, so you can trace through the whole compute start operation.
This needs a corresponding change in the cloud console to fill in the
'startup_tracing_context' field in the json spec. If it's missing, the
startup operations are simply traced as a separate trace, without
a parent.
This allows tracing the startup actions e.g. with Jaeger
(https://www.jaegertracing.io/). We use the "tracing-opentelemetry"
crate, which turns tracing spans into OpenTelemetry spans, so you can
use the usual "#[instrument]" directives to add tracing.
I put the tracing initialization code to a separate crate,
`tracing-utils`, so that we can reuse it in other programs. We
probably want to set up tracing in the same way in all our programs.
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
I noticed that `compute_ctl` updates all roles on each start, search for
rows like
> - web_access:[FILTERED] -> update
in the compute startup log.
It happens since we had an adhoc hack for md5 hashes comparison, which
doesn't work with scram hashes stored in the `pg_authid`. It doesn't
really hurt, as nothing changes, but we just run >= 2 extra queries on
each start, so fix it.
Previously, we were trying to re-assign owned objects of the already
deleted role. This were causing a crash loop in the case when compute
was restarted with a spec that includes delta operation for role
deletion. To avoid such cases, check that role is still present before
calling `reassign_owned_objects`.
Resolvesneondatabase/cloud#3553
Refactors Compute::prepare_and_run. It's split into subroutines
differently, to make it easier to attach tracing spans to the
different stages. The high-level logic for waiting for Postgres to
exit is moved to the caller.
Replace 'env_logger' with 'tracing', and add `#instrument` directives
to different stages fo the startup process. This is a fairly
mechanical change, except for the changes in 'spec.rs'. 'spec.rs'
contained some complicated formatting, where parts of log messages
were printed directly to stdout with `print`s. That was a bit messed
up because the log normally goes to stderr, but those lines were
printed to stdout. In our docker images, stderr and stdout both go to
the same place so you wouldn't notice, but I don't think it was
intentional.
This changes the log format to the default
'tracing_subscriber::format' format. It's different from the Postgres
log format, however, and because both compute_tools and Postgres print
to the same log, it's now a mix of two different formats. I'm not
sure how the Grafana log parsing pipeline can handle that. If it's a
problem, we can build custom formatter to change the compute_tools log
format to be the same as Postgres's, like it was before this commit,
or we can change the Postgres log format to match tracing_formatter's,
or we can start printing compute_tool's log output to a different
destination than Postgres
The general idea is that the VM informant binary is added to the
vm-compute-node images only. `compute_tools` then will run whatever's at
`/bin/vm-informant`, if the path exists.
Scan core dumps directory on exit. In case of existing core dumps
call gdb/lldb to get a backtrace and log it. By default look for
core dumps in postgres data directory as core.<pid>. That is how
core collection is configured in our k8s nodes (and a reasonable
convention in general).
1.66 release speeds up compile times for over 10% according to tests.
Also its Clippy finds plenty of old nits in our code:
* useless conversion, `foo as u8` where `foo: u8` and similar, removed
`as u8` and similar
* useless references and dereferenced (that were automatically adjusted
by the compiler), removed various `&` and `*`
* bool -> u8 conversion via `if/else`, changed to `u8::from`
* Map `.iter()` calls where only values were used, changed to
`.values()` instead
Standing out lints:
* `Eq` is missing in our protoc generated structs. Silenced, does not
seem crucial for us.
* `fn default` looks like the one from `Default` trait, so I've
implemented that instead and replaced the `dummy_*` method in tests with
`::default()` invocation
* Clippy detected that
```
if retry_attempt < u32::MAX {
retry_attempt += 1;
}
```
is a saturating add and proposed to replace it.
This fixes all kinds of problems related to missing params,
like broken timestamps (due to `integer_datetimes`).
This solution is not ideal, but it will help. Meanwhile,
I'm going to dedicate some time to improving connection machinery.
Note that this **does not** fix problems with passing certain parameters
in a reverse direction, i.e. **from client to compute**. This is a
separate matter and will be dealt with in an upcoming PR.
`GRANT CREATE ON SCHEMA public` fails if there is no schema `public`.
Disable it in release for now and make a better fix later (it is
needed for v15 support).
We've got at least one user in production that cannot create a
database with a trailing space in the name.
This happens because we use `url` crate for manipulating the
DATABASE_URL, but it follows a standard that doesn't fit really
well with Postgres. For example, it trims all trailing spaces
from the path:
> Remove any leading and trailing C0 control or space from input.
> See: https://url.spec.whatwg.org/#url-parsing
But we used `set_path()` to set database name and it's totally valid
to have trailing spaces in the database name in Postgres.
Thus, use `postgres::config::Config` to modify database name in the
connection details.
Compute node startup time is very important. After launching
PostgreSQL, use 'notify' to be notified immediately when it has
updated the PID file, instead of polling. The polling loop had 100 ms
interval so this shaves up to 100 ms from the startup time.
The loop checked if the TCP port is open for connections, by trying to
connect to it. That seems unnecessary. By the time the postmaster.pid
file says that it's ready, the port should be open. Remove that check.
Including, but not limited to:
* Fixes to neon management code to support walproposer-as-an-extension
* Fix issue in expected output of pg settings serialization.
* Show the logs of a failed --sync-safekeepers process in CI
* Add compat layer for renamed GUCs in postgres.conf
* Update vendor/postgres to the latest origin/main