## Problem
My benchmarks show that the prometheus crate does not perform well.
https://github.com/conradludgate/measured
We're already using measured in storage_controller and it seems to be working
well.
## Summary of changes
Replace prometheus with my new measured crate in proxy only.
Apologies for the large diff. I tried to keep it as minimal as I could.
The label types add a bit of boilerplate (but reduce the chance that we
mistype a label), and some of our custom metrics like CounterPair and
HLL needed to be rewritten.
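To illustrate what the label types buy us, here's a conceptual sketch (not measured's actual derive API): with an enum, a typo in a label value becomes a compile error instead of silently creating a new time series.
```rust
/// Conceptual sketch only; the real code uses measured's derive macros.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Protocol {
    Tcp,
    Ws,
    Http,
}

impl Protocol {
    /// The string that ends up in the exported label value.
    fn as_str(self) -> &'static str {
        match self {
            Protocol::Tcp => "tcp",
            Protocol::Ws => "ws",
            Protocol::Http => "http",
        }
    }
}

// Stringly typed: `counter.with_label_values(&["tpc"])` silently creates
// a bogus series. Strongly typed: `Protocol::Tpc` fails to compile.
```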
## Problem
It would be nice to have a bit more info in our cold start metrics.
## Summary of changes
* Change connect compute latency to include `cold_start_info`.
* Update `ColdStartInfo` to include `HttpPoolHit` and `WarmCached` (see the sketch after this list).
* Several changes to make more use of interned strings
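For context, a rough sketch of the enum shape after this change; only the two added variants come from this PR, the rest are illustrative placeholders.
```rust
#[derive(Clone, Copy, Debug)]
enum ColdStartInfo {
    Unknown,     // illustrative placeholder variant
    Warm,        // illustrative placeholder variant
    HttpPoolHit, // added: the request hit the HTTP connection pool
    WarmCached,  // added: compute info was served from proxy's cache
}
```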
## Problem
I noticed that code coverage for `auth_quirks` was pretty bare.
## Summary of changes
Adds three happy-path unit tests for `auth_quirks`:
* scram
* cleartext (websockets)
* cleartext (password hack)
## Problem
Not really a problem, just refactoring.
## Summary of changes
Separate authentication from wake compute.
Do not call wake compute a second time if we managed to connect to
postgres, or if the node info did not come from the cache.
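A minimal sketch of that policy, with illustrative stand-ins for proxy's real types:
```rust
struct NodeInfo;
struct Conn;
#[derive(Debug)]
struct ConnError;

async fn wake_compute() -> Result<NodeInfo, ConnError> { Ok(NodeInfo) }
async fn connect_once(_node: &NodeInfo) -> Result<Conn, ConnError> { Ok(Conn) }

/// `cached` says whether `node` came from the cache (and may be stale).
async fn connect_to_compute(node: NodeInfo, cached: bool) -> Result<Conn, ConnError> {
    match connect_once(&node).await {
        // connected: never wake compute a second time
        Ok(conn) => Ok(conn),
        // info was fresh from a wake: waking again won't help
        Err(e) if !cached => Err(e),
        // cached info may be stale: wake compute once and retry
        Err(_) => {
            let fresh = wake_compute().await?;
            connect_once(&fresh).await
        }
    }
}
```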
## Problem
Taking my ideas from https://github.com/neondatabase/neon/pull/6283 but
making less radical changes, in smaller commits.
We currently don't report error classifications in proxy, as the current
error handling made it hard to do so.
## Summary of changes
1. Add a `ReportableError` trait that all errors will implement (sketched
below). This provides the error classification functionality.
2. Make client request handling return a strongly typed error.
   * This error is a `ReportableError` and is logged appropriately.
3. The handle-client error only has a few possible variants, to account
for the fact that at this point errors should be returned to the user.
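A minimal sketch of the trait shape this describes; the variant names are illustrative, not the exact set used in proxy:
```rust
use std::fmt;

/// Coarse classes for metrics/reporting (illustrative set).
#[derive(Clone, Copy, Debug)]
enum ErrorKind {
    User,         // the client did something wrong
    Disconnect,   // the client went away
    ControlPlane, // upstream console / control plane failure
    Compute,      // failure talking to the compute node
    Service,      // a bug or overload in proxy itself
}

trait ReportableError: fmt::Display {
    fn get_error_kind(&self) -> ErrorKind;
}
```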
## Summary of changes
### RequestMonitoring
We want to add an event stream with information on each request for
easier analysis than what we can do with diagnostic logs alone
(https://github.com/neondatabase/cloud/issues/8807). This
RequestMonitoring will keep a record of the final state of a request. On
drop it will be pushed into a queue to be uploaded.
Because this context is a bag of data, I don't want this information to
impact the logic of request handling. I personally think that weakly
typed data (such as all these options) makes for spaghetti code. I will,
however, allow this data to impact rate-limiting and blocking of
requests, as that does not _really_ change how a request is handled.
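The push-on-drop mechanics would look roughly like this (field names illustrative; the real RequestData carries much more):
```rust
use tokio::sync::mpsc;

/// Illustrative final-state record.
#[derive(Default)]
struct RequestData {
    endpoint: Option<String>,
    error: Option<&'static str>,
}

struct RequestMonitoring {
    data: RequestData,
    sender: mpsc::UnboundedSender<RequestData>,
}

impl Drop for RequestMonitoring {
    fn drop(&mut self) {
        // move the record out of &mut self; if the uploader has shut
        // down, the record is silently dropped
        let data = std::mem::take(&mut self.data);
        let _ = self.sender.send(data);
    }
}
```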
### Parquet
Each `RequestMonitoring` is flushed into a channel where it is converted
into `RequestData`, which is accumulated into parquet files. Each file
will have a certain number of rows per row group, and several row groups
will eventually fill up the file, which we then upload to S3.
We will also upload smaller files if they take too long to construct.
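A sketch of that accumulation loop under stated assumptions: the thresholds are illustrative, and `write_row_group` / `upload_file` are stand-ins for the real `parquet` writer and S3 upload.
```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::{sleep_until, Instant};

struct RequestData; // see the sketch above

const ROWS_PER_GROUP: usize = 10_000; // illustrative
const GROUPS_PER_FILE: usize = 10;    // illustrative
const MAX_FILE_AGE: Duration = Duration::from_secs(300); // illustrative

fn write_row_group(rows: &mut Vec<RequestData>) { rows.clear() } // stand-in
async fn upload_file() {}                                        // stand-in

async fn parquet_worker(mut rx: mpsc::Receiver<RequestData>) {
    let mut rows = Vec::new();
    let mut groups = 0;
    let mut deadline = Instant::now() + MAX_FILE_AGE;
    loop {
        tokio::select! {
            maybe_row = rx.recv() => match maybe_row {
                Some(row) => {
                    rows.push(row);
                    if rows.len() >= ROWS_PER_GROUP {
                        write_row_group(&mut rows); // cut one row group
                        groups += 1;
                    }
                }
                None => break, // senders gone: shut down
            },
            _ = sleep_until(deadline) => {
                // the file is taking too long: flush what we have
                write_row_group(&mut rows);
                groups = GROUPS_PER_FILE;
            }
        }
        if groups >= GROUPS_PER_FILE {
            upload_file().await; // close the file and push it to S3
            groups = 0;
            deadline = Instant::now() + MAX_FILE_AGE;
        }
    }
}
```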
## Problem
The `src/proxy.rs` file is far too large
## Summary of changes
Creates 3 new files:
```
src/metrics.rs
src/proxy/retry.rs
src/proxy/connect_compute.rs
```
## Problem
The websockets gauge for active db connections seems to be growing more
than the gauge for client connections over websockets, which does not
make sense.
## Summary of changes
Refactor how our counter-pair gauges are represented. I'm not sure if
this will fix the problem, but it should now be harder to mess up the
counters. The API is also much nicer and no longer requires
`scopeguard::defer` hacks (see the sketch below).
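The new representation is essentially an RAII guard (a sketch, not the exact new API): the increment and decrement are tied together, so call sites can't forget one half.
```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Default)]
struct CounterPair {
    opened: AtomicU64,
    closed: AtomicU64,
}

struct Guard<'a>(&'a CounterPair);

impl CounterPair {
    /// The gauge value is `opened - closed`.
    fn guard(&self) -> Guard<'_> {
        self.opened.fetch_add(1, Ordering::Relaxed);
        Guard(self)
    }
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        // runs on every exit path: no scopeguard::defer needed
        self.0.closed.fetch_add(1, Ordering::Relaxed);
    }
}
```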
## Problem
Fixes #5654: `WakeComputeError`s occurring during `connect_to_compute` got
propagated as IO errors, which were forwarded to the user as "Couldn't
connect to compute node" with no helpful message.
## Summary of changes
Handle WakeComputeError during ConnectionError properly
The idea is to pass `neon_*`-prefixed options to the control plane. They
can be used by cplane to dynamically create timelines and computes. Such
options should also be excluded from what we pass on to compute. Another
issue is connection caching: since a compute's instance now depends not
only on the hostname but probably on such options too, I included them
in the cache key.
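A sketch of the intended split (the real parsing also has to deal with `-c` flags and quoting):
```rust
/// Partition space-separated startup options into those consumed by the
/// control plane (`neon_*`-prefixed) and those forwarded to compute.
fn split_options(options: &str) -> (Vec<&str>, Vec<&str>) {
    options
        .split_whitespace()
        .partition(|opt| opt.starts_with("neon_"))
}
```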
## Problem
Pre-requisites for #4852 and #4853
## Summary of changes
1. Includes the client's IP address (which we already log) with the span
info so we have it on all associated logs (see the sketch after this
list). This makes it easier to build dashboards based on IP addresses.
2. Switch to a consistent error/warning log for errors during
connection. This includes error, num_retries, retriable=true/false and a
consistent log message that we can grep for.
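Point 1 is the standard tracing pattern; the span and field names here are illustrative:
```rust
use std::net::SocketAddr;
use tracing::{info, info_span, Instrument};

async fn handle_client() {
    // every log emitted inside the span carries the client IP
    info!("accepted connection");
}

async fn serve(peer_addr: SocketAddr) {
    handle_client()
        .instrument(info_span!("client", ip = %peer_addr.ip()))
        .await;
}
```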
## Problem
Half of #4699.
TCP/WS have one implementation of `connect_to_compute`, HTTP has another
implementation of `connect_to_compute`.
Having both is annoying to deal with.
## Summary of changes
Creates a set of traits, `ConnectMechanism` and `ShouldError`, that allow
`connect_to_compute` to be generic over raw TCP streams or
tokio_postgres-based connections (sketched below).
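Roughly the shape of those traits, with stand-in types and illustrative signatures (`async fn` in traits needs Rust 1.75+ or the async-trait crate):
```rust
struct NodeInfo; // stand-in for proxy's cached compute node info

/// One attempt to reach a compute node. Raw TCP and tokio_postgres
/// each get an implementation, and `connect_to_compute` stays generic.
trait ConnectMechanism {
    type Connection;
    type Error;
    async fn connect_once(&self, node: &NodeInfo) -> Result<Self::Connection, Self::Error>;
}

/// Decides whether an error should abort retries and surface to the user.
trait ShouldError {
    fn should_error(&self) -> bool;
}
```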
I'm not super happy with this. It would be nice to remove tokio_postgres
entirely, but that needs a lot more thought.
I have also slightly refactored the caching to use fewer references,
instead using ownership to ensure that the retry state is encoded in the
type system.
## Problem
`cargo +nightly doc` is giving a lot of warnings: broken links, naked
URLs, etc.
## Summary of changes
* update the `proc-macro2` dependency so that it can compile on latest
Rust nightly, see https://github.com/dtolnay/proc-macro2/pull/391 and
https://github.com/dtolnay/proc-macro2/issues/398
* allow the `private_intra_doc_links` lint (see the snippet after this
list), as linking to something that's private is always more useful than
just mentioning it without a link: if the link breaks in the future, at
least there is a warning due to that. Also, one might enable
[`--document-private-items`](https://doc.rust-lang.org/cargo/commands/cargo-doc.html#documentation-options)
in the future and make these links work in general.
* fix all the remaining warnings given by `cargo +nightly doc`
* make it possible to run `cargo doc` on stable Rust by updating
`opentelemetry` and associated crates to version 0.19, pulling in a fix
that previously broke `cargo doc` on stable:
https://github.com/open-telemetry/opentelemetry-rust/pull/904
* Add `cargo doc` to CI to ensure that it won't get broken in the
future.
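For reference, the crate-level allow from the lint bullet above:
```rust
// at the crate root (e.g. lib.rs)
#![allow(rustdoc::private_intra_doc_links)]
```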
Fixes #2557
## Future work
* Potentially, it might make sense, for development purposes, to publish
the generated rustdocs somewhere, like for example [how the rust
compiler does
it](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/index.html).
I will file an issue for discussion.
In case we try to connect to an outdated address that is no longer valid, the
default behavior of Kubernetes is to drop the packets, causing us to wait for
the entire timeout period. We want to fail fast in such cases.
A specific case to consider is when we have cached compute node information
with a 5-minute TTL (Time To Live), but the user has executed a `/suspend` API
call, resulting in the nonexistence of the compute node.
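A sketch of failing fast, assuming a plain tokio TCP connect; the 2-second bound is illustrative, not the value proxy uses:
```rust
use std::time::Duration;
use tokio::net::TcpStream;
use tokio::time::timeout;

/// Bound the TCP connect so a blackholed address errors out quickly
/// instead of hanging for the full default timeout.
async fn connect_fast(addr: &str) -> anyhow::Result<TcpStream> {
    let stream = timeout(Duration::from_secs(2), TcpStream::connect(addr)).await??;
    Ok(stream)
}
```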
Clients may specify endpoint/project name via `options=project=...`,
so we should not only remove `project=` from `options` but also
drop `options` entirely, because connection pools don't support it.
Discussion: https://neondb.slack.com/archives/C033A2WE6BZ/p1676464382670119
Upstream proxy erroneously stores user & dbname in compute node info
cache entries, thus causing "funny" connection problems if such an entry
is reused while connecting to e.g. a different DB on the same compute node.
This PR fixes the problem but doesn't eliminate the root cause just yet.
I'll revisit this code and make it more type-safe in the upcoming PR.
This patch adds a timed LRU cache implementation and a compute node info cache on top of it.
Cache entries may expire on their own (default ttl = 5 min) or become invalid due to real-world events,
e.g. a compute node scale-to-zero event, so we add a connection retry loop with a wake-up call.
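A minimal sketch of the TimedLru idea, assuming the `lru` crate as the underlying implementation (the text doesn't name the one actually chosen): entries carry a deadline, and an expired hit counts as a miss.
```rust
use std::hash::Hash;
use std::time::{Duration, Instant};
use lru::LruCache; // assumed dependency

struct TimedLru<K: Hash + Eq, V> {
    cache: LruCache<K, (Instant, V)>,
    ttl: Duration, // default 5 minutes per the description
}

impl<K: Hash + Eq, V> TimedLru<K, V> {
    fn insert(&mut self, key: K, value: V) {
        self.cache.put(key, (Instant::now() + self.ttl, value));
    }

    fn get(&mut self, key: &K) -> Option<&V> {
        let expired = matches!(
            self.cache.get(key),
            Some((deadline, _)) if *deadline <= Instant::now()
        );
        if expired {
            self.cache.pop(key); // treat an expired hit as a miss
            return None;
        }
        self.cache.get(key).map(|(_, value)| value)
    }
}
```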
Solved problems:
- [x] Find a decent LRU implementation.
- [x] Implement timed LRU on top of that.
- [x] Cache results of `proxy_wake_compute` API call.
- [x] Don't invalidate newer cache entries for the same key.
- [x] Add cmdline configuration knobs (requires some refactoring).
- [x] Add a failed connection establishment metric.
- [x] Refactor auth backends to make things simpler (retries, cache
placement, etc).
- [x] Address review comments (add code comments + cleanup).
- [x] Retry `/proxy_wake_compute` if we couldn't connect to a compute
(e.g. stale cache entry).
- [x] Add high-level description for `TimedLru`.
TODOs (will be addressed later):
- [ ] Add cache metrics (hit, spurious hit, miss).
- [ ] Synchronize http requests across concurrent per-client tasks
(https://github.com/neondatabase/neon/pull/3331#issuecomment-1399216069).
- [ ] Cache results of `proxy_get_role_secret` API call.
This fixes all kinds of problems related to missing params,
like broken timestamps (due to `integer_datetimes`).
This solution is not ideal, but it will help. Meanwhile,
I'm going to dedicate some time to improving connection machinery.
Note that this **does not** fix problems with passing certain parameters
in a reverse direction, i.e. **from client to compute**. This is a
separate matter and will be dealt with in an upcoming PR.
Previously, proxy didn't forward auxiliary `options` parameter
and other ones to the client's compute node, e.g.
```
$ psql "user=john host=localhost dbname=postgres options='-cgeqo=off'"
postgres=# show geqo;
┌──────┐
│ geqo │
├──────┤
│ on │
└──────┘
(1 row)
```
With this patch we now forward `options`, `application_name` and `replication`.
Further reading: https://www.postgresql.org/docs/current/libpq-connect.html
Fixes #1287.
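A sketch of the forwarding rule; the parameter list is from the text above, the helper itself is illustrative:
```rust
/// Client startup parameters that proxy now forwards to compute.
const FORWARDED_PARAMS: &[&str] = &["options", "application_name", "replication"];

fn params_to_forward<'a>(
    client_params: &'a [(String, String)],
) -> impl Iterator<Item = &'a (String, String)> {
    client_params
        .iter()
        .filter(|(name, _)| FORWARDED_PARAMS.contains(&name.as_str()))
}
```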
[proxy] Add the `password hack` authentication flow
This lets us authenticate users who can use neither
SNI (due to an old libpq) nor the connection string `options`
(due to restrictions in other client libraries).
Note: `PasswordHack` will accept passwords which are not
encoded in base64 via the "password" field. The assumption
is that most user passwords will be valid utf-8 strings,
and the rest may still be passed via "password_".
The proxy binary now accepts an `--auth-backend` CLI option, which
determines the auth scheme and cluster routing method (example invocation
after the list). The following backends are currently implemented:
* `legacy`: the old method; when the username ends with `@zenith`, it
uses md5 auth with the dbname as the cluster name; otherwise, it sends a
login link and waits for the console to call back
* `console`: the new SCRAM-based console API; uses SNI info to select
the destination cluster
* `postgres`: uses postgres to select auth secrets of existing roles;
useful for local testing
* `link`: sends a login link for all usernames
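For example (flags other than `--auth-backend` omitted; invocation illustrative):
```
proxy --auth-backend console
proxy --auth-backend postgres
```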
* `cloud::legacy` talks to Cloud API V1.
* `cloud::api` defines Cloud API v2.
* `cloud::local` mocks the Cloud API V2 using a local postgres instance.
* It's possible to choose between API versions using the `--api-version` flag.
* [proxy] Propagate most errors to user
This change enables propagation of most errors to the user
(e.g. auth and connectivity errors). Some of them will be
stripped of sensitive information.
As a side effect, most occurrences of `anyhow::Error` were
replaced with concrete error types.
* [proxy] Box weighty errors
This change makes most parts of the code asynchronous, except
for the `mgmt` subsystem (we're going to drop it anyway).
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>