misc changes split out from #8855
- **allow cloning the request context in a read-only fashion for
background tasks**
- **propagate endpoint and request context through the jwk cache**
- **only allow password based auth for md5 during testing**
- **remove auth info from conn info**
## Problem
#8736 is getting too big. splitting off some simple changes here
## Summary of changes
Local proxy wont always be using tls, so make it optional. Local proxy
wont be using ws for now, so make it optional. Remove a dead config var.
This reverts #8076 - which was already reverted from the release branch
since forever (it would have been a breaking change to release for all
users who currently set TimeZone options). It's causing conflicts now so
we should revert it here as well.
## Problem
1. Hard to correlate startup parameters with the endpoint that provided
them.
2. Some configurations are not needed in the `ProxyConfig` struct.
## Summary of changes
Because of some borrow checker fun, I needed to switch to an
interior-mutability implementation of our `RequestMonitoring` context
system. Using https://docs.rs/try-lock/latest/try_lock/ as a cheap lock
for such a use-case (needed to be thread safe).
Removed the lock of each startup message, instead just logging only the
startup params in a successful handshake.
Also removed from values from `ProxyConfig` and kept as arguments.
(needed for local-proxy config)
## Problem
new clippy warnings on nightly.
## Summary of changes
broken up each commit by warning type.
1. Remove some unnecessary refs.
2. In edition 2024, inference will default to `!` and not `()`.
3. Clippy complains about doc comment indentation
4. Fix `Trait + ?Sized` where `Trait: Sized`.
5. diesel_derives triggering `non_local_defintions`
This tweaks the rows-to-JSON rendering logic in order to avoid
allocating 0-sized temporary vectors and later growing them
to insert elements.
As the exact size is known in advance, both vectors can be built
with an exact capacity upfront. This will avoid further vector
growing/reallocation in the rendering hotpath.
Signed-off-by: Luca BRUNO <lucab@lucabruno.net>
## Problem
Hard to debug the disconnection reason currently.
## Summary of changes
Keep track of error-direction, and therefore error source (client vs
compute) during passthrough.
## Problem
1. Proxy is retrying errors from cplane that shouldn't be retried
2. ~~Proxy is not using the retry_after_ms value~~
## Summary of changes
1. Correct the could_retry impl for ConsoleError.
2. ~~Update could_retry interface to support returning a fixed wait
duration.~~
## Problem
Fixes https://github.com/neondatabase/neon/issues/1287
## Summary of changes
tokio-postgres now supports arbitrary server params through the
`param(key, value)` method. Some keys are special so we explicitly
filter them out.
## Problem
Some tasks are using around upwards of 10KB of memory at all times,
sometimes having buffers that swing them up to 30MB.
## Summary of changes
Split some of the async tasks in selective places and box them as
appropriate to try and reduce the constant memory usage. Especially in
the locations where the large future is only a small part of the total
runtime of the task.
Also, reduces the size of the CopyBuffer buffer size from 8KB to 1KB.
In my local testing and in staging this had a minor improvement. sadly
not the improvement I was hoping for :/ Might have more impact in
production
Closes#7406.
## Problem
When a `get_lsn_by_timestamp` request is cancelled, an anyhow error is
exposed to handle that case, which verbosely logs the error. However, we
don't benefit from having the full backtrace provided by anyhow in this
case.
## Summary of changes
This PR introduces a new `ApiError` type to handle errors caused by
cancelled request more robustly.
- A new enum variant `ApiError::Cancelled`
- Currently the cancelled request is mapped to status code 500.
- Need to handle this error in proxy's `http_util` as well.
- Added a failpoint test to simulate cancelled `get_lsn_by_timestamp`
request.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
## Problem
proxy params being a `HashMap<String,String>` when it contains just
```
application_name: psql
database: neondb
user: neondb_owner
```
is quite wasteful allocation wise.
## Summary of changes
Keep the params in the wire protocol form, eg:
```
application_name\0psql\0database\0neondb\0user\0neondb_owner\0
```
Using a linear search for the map is fast enough at small sizes, which
is the normal case.
## Problem
Computes that are healthy can manage many connection attempts at a time.
Unhealthy computes cannot. We initially handled this with a fixed
concurrency limit, but it seems this inhibits pgbench.
## Summary of changes
Support AIMD for connect_to_compute lock to allow varying the
concurrency limit based on compute health
## Problem
Seems the websocket buffering was broken for large query responses only
## Summary of changes
Move buffering until after the underlying stream is ready.
Tested locally confirms this fixes the bug.
Also fixes the pg-sni-router missing metrics bug
## Problem
Despite making password hashing async, it can still take time away from
the network code.
## Summary of changes
Introduce a custom threadpool, inspired by rayon. Features:
### Fairness
Each task is tagged with it's endpoint ID. The more times we have seen
the endpoint, the more likely we are to skip the task if it comes up in
the queue. This is using a min-count-sketch estimator for the number of
times we have seen the endpoint, resetting it every 1000+ steps.
Since tasks are immediately rescheduled if they do not complete, the
worker could get stuck in a "always work available loop". To combat
this, we check the global queue every 61 steps to ensure all tasks
quickly get a worker assigned to them.
### Balanced
Using crossbeam_deque, like rayon does, we have workstealing out of the
box. I've tested it a fair amount and it seems to balance the workload
accordingly
## Problem
I wanted to do a deep dive of the tungstenite codebase.
tokio-tungstenite is incredibly convoluted... In my searching I found
[fastwebsockets by deno](https://github.com/denoland/fastwebsockets),
but it wasn't quite sufficient.
This also removes the default 16MB/64MB frame/message size limitation.
framed-websockets solves this by inserting continuation frames for
partially received messages, so the whole message does not need to be
entirely read into memory.
## Summary of changes
I took the fastwebsockets code as a starting off point and rewrote it to
be simpler, server-only, and be poll-based to support our Read/Write
wrappers.
I have replaced our tungstenite code with my framed-websockets fork.
<https://github.com/neondatabase/framed-websockets>
## Problem
There is no global per-ep rate limiter in proxy.
## Summary of changes
* Return global per-ep rate limiter back.
* Rename weak compute rate limiter (the cli flags were not used
anywhere, so it's safe to rename).
## Problem
If a permit cannot be acquired to connect to compute, the cache is
invalidated. This had the observed affect of sending more traffic to
ProxyWakeCompute on cplane.
## Summary of changes
Make sure that permit acquire failures are marked as "should not
invalidate cache".
## Problem
Some HTTP client connections can stay open for quite a long time.
## Summary of changes
When there are too many HTTP client connections, pick a random
connection and gracefully cancel it.
## Problem
Too many connect_compute attempts can overwhelm postgres, getting the
connections stuck.
## Summary of changes
Limit number of connection attempts that can happen at a given time.
## Problem
It's not possible to get the duration of the session from proxy events.
## Summary of changes
* Added a separate events folder in s3, to record disconnect events.
* Disconnect events are exactly the same as normal events, but also have
`disconnect_timestamp` field not empty.
* @oruen suggested to fill it with the same information as the original
events to avoid potentially heavy joins.
## Problem
Currently we cannot configure retries, also, we don't really have
visibility of what's going on there.
## Summary of changes
* Added cli params
* Improved logging
* Decrease the number of retries: it feels like most of retries doesn't
help. Once there would be better errors handling, we can increase it
back.
## Problem
Many users have access to ipv6 subnets (eg a /64). That gives them 2^64
addresses to play with
## Summary of changes
Truncate the address to /64 to reduce the attack surface.
Todo:
~~Will NAT64 be an issue here? AFAIU they put the IPv4 address at the
end of the IPv6 address. By truncating we will lose all that detail.~~
It's the same problem as a host sharing IPv6 addresses between clients.
I don't think it's up to us to solve. If a customer is getting DDoSed,
then they likely need to arrange a dedicated IP with us.
## Problem
possible for the database connections to not close in time.
## Summary of changes
force the closing of connections if the client has hung up
## Problem
My benchmarks show that prometheus is not very good.
https://github.com/conradludgate/measured
We're already using it in storage_controller and it seems to be working
well.
## Summary of changes
Replace prometheus with my new measured crate in proxy only.
Apologies for the large diff. I tried to keep it as minimal as I could.
The label types add a bit of boiler plate (but reduce the chance we
mistype the labels), and some of our custom metrics like CounterPair and
HLL needed to be rewritten.
## Problem
hyper1 offers control over the HTTP connection that hyper0_14 does not.
We're blocked on switching all services to hyper1 because of how we use
tonic, but no reason we can't switch proxy over.
## Summary of changes
1. hyper0.14 -> hyper1
1. self managed server
2. Remove the `WithConnectionGuard` wrapper from `protocol2`
2. Remove TLS listener as it's no longer necessary
3. include first session ID in connection startup logs
## Problem
Would be nice to have a bit more info on cold start metrics.
## Summary of changes
* Change connect compute latency to include `cold_start_info`.
* Update `ColdStartInfo` to include HttpPoolHit and WarmCached.
* Several changes to make more use of interned strings
## Problem
https://github.com/neondatabase/cloud/issues/11051
additionally, I felt like the http logic was a bit complex.
## Summary of changes
1. Removes timeout for HTTP requests.
2. Split out header parsing to a `HttpHeaders` type.
3. Moved db client handling to `QueryData::process` and
`BatchQueryData::process` to simplify the logic of `handle_inner` a bit.
## Problem
https://github.com/neondatabase/cloud/issues/9642
## Summary of changes
1. Make `EndpointRateLimiter` generic, renamed as `BucketRateLimiter`
2. Add support for claiming multiple tokens at once
3. Add `AuthRateLimiter` alias.
4. Check `(Endpoint, IP)` pair during authentication, weighted by how
many hashes proxy would be doing.
TODO: handle ipv6 subnets. will do this in a separate PR.