Follow up to #12701, which introduced a new regression. When profiling
locally I noticed that writes have the tendency to always reallocate. On
investigation I found that even if the `Connection`'s write buffer is
empty, if it still shares the same data pointer as the `Client`'s write
buffer then the client cannot reclaim it.
The best way I found to fix this is to just drop the `Connection`'s
write buffer each time we fully flush it.
Additionally, I remembered that `BytesMut` has an `unsplit` method which
is allows even better sharing over the previous optimisation I had when
'encoding'.
Another go at #12341. LKB-2497
We now only need 1 connect mechanism (and 1 more for testing) which
saves us some code and complexity. We should be able to remove the final
connect mechanism when we create a separate worker task for
pglb->compute connections - either via QUIC streams or via in-memory
channels.
This also now ensures that connect_once always returns a ConnectionError
type - something simple enough we can probably define a serialisation
for in pglb.
* I've abstracted connect_to_compute to always use TcpMechanism and the
ProxyConfig.
* I've abstracted connect_to_compute_and_auth to perform authentication,
managing any retries for stale computes
* I had to introduce a separate `managed` function for taking ownership
of the compute connection into the Client/Connection pair
Second attempt at #12130, now with a smaller diff.
This allows us to skip allocating for things like parameter status and
notices that we will either just forward untouched, or discard.
LKB-2494
A replacement for #10254 which allows us to introduce notice messages
for sql-over-http in the future if we want to. This also removes the
`ParameterStatus` and `Notification` handling as there's nothing we
could/should do for those.
## Problem
While working more on TLS to compute, I realised that Console Redirect
-> pg-sni-router -> compute would break if channel binding was set to
prefer. This is because the channel binding data would differ between
Console Redirect -> pg-sni-router vs pg-sni-router -> compute.
I also noticed that I actually disabled channel binding in #12145, since
`connect_raw` would think that the connection didn't support TLS.
## Summary of changes
Make sure we specify the channel binding.
Make sure that `connect_raw` can see if we have TLS support.
## Problem
PGLB/Neonkeeper needs to separate the concerns of connecting to compute,
and authenticating to compute.
Additionally, the code within `connect_to_compute` is rather messy,
spending effort on recovering the authentication info after
wake_compute.
## Summary of changes
Split `ConnCfg` into `ConnectInfo` and `AuthInfo`. `wake_compute` only
returns `ConnectInfo` and `AuthInfo` is determined separately from the
`handshake`/`authenticate` process.
Additionally, `ConnectInfo::connect_raw` is in-charge or establishing
the TLS connection, and the `postgres_client::Config::connect_raw` is
configured to use `NoTls` which will force it to skip the TLS
negotiation. This should just work.
## Problem
For #11992 I realised we need to get the type info before executing the
query. This is important to know how to decode rows with custom types,
eg the following query:
```sql
CREATE TYPE foo AS ENUM ('foo','bar','baz');
SELECT ARRAY['foo'::foo, 'bar'::foo, 'baz'::foo] AS data;
```
Getting that to work was harder that it seems. The original
tokio-postgres setup has a split between `Client` and `Connection`,
where messages are passed between. Because multiple clients were
supported, each client message included a dedicated response channel.
Each request would be terminated by the `ReadyForQuery` message.
The flow I opted to use for parsing types early would not trigger a
`ReadyForQuery`. The flow is as follows:
```
PARSE "" // parse the user provided query
DESCRIBE "" // describe the query, returning param/result type oids
FLUSH // force postgres to flush the responses early
// wait for descriptions
// check if we know the types, if we don't then
// setup the typeinfo query and execute it against each OID:
PARSE typeinfo // prepare our typeinfo query
DESCRIBE typeinfo
FLUSH // force postgres to flush the responses early
// wait for typeinfo statement
// for each OID we don't know:
BIND typeinfo
EXECUTE
FLUSH
// wait for type info, might reveal more OIDs to inspect
// close the typeinfo query, we cache the OID->type map and this is kinder to pgbouncer.
CLOSE typeinfo
// finally once we know all the OIDs:
BIND "" // bind the user provided query - already parsed - to the user provided params
EXECUTE // run the user provided query
SYNC // commit the transaction
```
## Summary of changes
Please review commit by commit. The main challenge was allowing one
query to issue multiple sub-queries. To do this I first made sure that
the client could fully own the connection, which required removing any
shared client state. I then had to replace the way responses are sent to
the client, by using only a single permanent channel. This required some
additional effort to track which query is being processed. Lastly I had
to modify the query/typeinfo functions to not issue `sync` commands, so
it would fit into the desired flow above.
To note: the flow above does force an extra roundtrip into each query. I
don't know yet if this has a measurable latency overhead.
https://github.com/neondatabase/cloud/issues/23008
For TLS between proxy and compute, we are using an internally
provisioned CA to sign the compute certificates. This change ensures
that proxy will load them from a supplied env var pointing to the
correct file - this file and env var will be configured later, using a
kubernetes secret.
Control plane responds with a `server_name` field if and only if the
compute uses TLS. This server name is the name we use to validate the
certificate. Control plane still sends us the IP to connect to as well
(to support overlay IP).
To support this change, I'd had to split `host` and `host_addr` into
separate fields. Using `host_addr` and bypassing `lookup_addr` if
possible (which is what happens in production). `host` then is only used
for the TLS connection.
There's no blocker to merging this. The code paths will not be triggered
until the new control plane is deployed and the `enableTLS` compute flag
is enabled on a project.
This upgrades the `proxy/` crate as well as the forked libraries in
`libs/proxy/` to edition 2024.
Also reformats the imports of those forked libraries via:
```
cargo +nightly fmt -p proxy -p postgres-protocol2 -p postgres-types2 -p tokio-postgres2 -- -l --config imports_granularity=Module,group_imports=StdExternalCrate,reorder_imports=true
```
It can be read commit-by-commit: the first commit has no formatting
changes, only changes to accomodate the new edition.
Part of #10918
(stacked on #9990 and #9995)
Partially fixes#1287 with a custom option field to enable the fixed
behaviour. This allows us to gradually roll out the fix without silently
changing the observed behaviour for our customers.
related to https://github.com/neondatabase/cloud/issues/15284
Fixes https://github.com/neondatabase/cloud/issues/20973.
This refactors `connect_raw` in order to return direct access to the
delayed notices.
I cannot find a way to test this with psycopg2 unfortunately, although
testing it with psql does return the expected results.
Our rust-postgres fork is getting messy. Mostly because proxy wants more
control over the raw protocol than tokio-postgres provides. As such,
it's diverging more and more. Storage and compute also make use of
rust-postgres, but in more normal usage, thus they don't need our crazy
changes.
Idea:
* proxy maintains their subset
* other teams use a minimal patch set against upstream rust-postgres
Reviewing this code will be difficult. To implement it, I
1. Copied tokio-postgres, postgres-protocol and postgres-types from
00940fcdb5
2. Updated their package names with the `2` suffix to make them compile
in the workspace.
3. Updated proxy to use those packages
4. Copied in the code from tokio-postgres-rustls 0.13 (with some patches
applied https://github.com/jbg/tokio-postgres-rustls/pull/32https://github.com/jbg/tokio-postgres-rustls/pull/33)
5. Removed as much dead code as I could find in the vendored libraries
6. Updated the tokio-postgres-rustls code to use our existing channel
binding implementation