## Problem
Hard to debug the disconnection reason currently.
## Summary of changes
Keep track of error-direction, and therefore error source (client vs
compute) during passthrough.
## Problem
1. Proxy is retrying errors from cplane that shouldn't be retried
2. ~~Proxy is not using the retry_after_ms value~~
## Summary of changes
1. Correct the could_retry impl for ConsoleError.
2. ~~Update could_retry interface to support returning a fixed wait
duration.~~
## Problem
Fixes https://github.com/neondatabase/neon/issues/1287
## Summary of changes
tokio-postgres now supports arbitrary server params through the
`param(key, value)` method. Some keys are special so we explicitly
filter them out.
## Problem
Some tasks are using around upwards of 10KB of memory at all times,
sometimes having buffers that swing them up to 30MB.
## Summary of changes
Split some of the async tasks in selective places and box them as
appropriate to try and reduce the constant memory usage. Especially in
the locations where the large future is only a small part of the total
runtime of the task.
Also, reduces the size of the CopyBuffer buffer size from 8KB to 1KB.
In my local testing and in staging this had a minor improvement. sadly
not the improvement I was hoping for :/ Might have more impact in
production
Closes#7406.
## Problem
When a `get_lsn_by_timestamp` request is cancelled, an anyhow error is
exposed to handle that case, which verbosely logs the error. However, we
don't benefit from having the full backtrace provided by anyhow in this
case.
## Summary of changes
This PR introduces a new `ApiError` type to handle errors caused by
cancelled request more robustly.
- A new enum variant `ApiError::Cancelled`
- Currently the cancelled request is mapped to status code 500.
- Need to handle this error in proxy's `http_util` as well.
- Added a failpoint test to simulate cancelled `get_lsn_by_timestamp`
request.
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
## Problem
proxy params being a `HashMap<String,String>` when it contains just
```
application_name: psql
database: neondb
user: neondb_owner
```
is quite wasteful allocation wise.
## Summary of changes
Keep the params in the wire protocol form, eg:
```
application_name\0psql\0database\0neondb\0user\0neondb_owner\0
```
Using a linear search for the map is fast enough at small sizes, which
is the normal case.
## Problem
Computes that are healthy can manage many connection attempts at a time.
Unhealthy computes cannot. We initially handled this with a fixed
concurrency limit, but it seems this inhibits pgbench.
## Summary of changes
Support AIMD for connect_to_compute lock to allow varying the
concurrency limit based on compute health
## Problem
Seems the websocket buffering was broken for large query responses only
## Summary of changes
Move buffering until after the underlying stream is ready.
Tested locally confirms this fixes the bug.
Also fixes the pg-sni-router missing metrics bug
## Problem
Despite making password hashing async, it can still take time away from
the network code.
## Summary of changes
Introduce a custom threadpool, inspired by rayon. Features:
### Fairness
Each task is tagged with it's endpoint ID. The more times we have seen
the endpoint, the more likely we are to skip the task if it comes up in
the queue. This is using a min-count-sketch estimator for the number of
times we have seen the endpoint, resetting it every 1000+ steps.
Since tasks are immediately rescheduled if they do not complete, the
worker could get stuck in a "always work available loop". To combat
this, we check the global queue every 61 steps to ensure all tasks
quickly get a worker assigned to them.
### Balanced
Using crossbeam_deque, like rayon does, we have workstealing out of the
box. I've tested it a fair amount and it seems to balance the workload
accordingly
## Problem
I wanted to do a deep dive of the tungstenite codebase.
tokio-tungstenite is incredibly convoluted... In my searching I found
[fastwebsockets by deno](https://github.com/denoland/fastwebsockets),
but it wasn't quite sufficient.
This also removes the default 16MB/64MB frame/message size limitation.
framed-websockets solves this by inserting continuation frames for
partially received messages, so the whole message does not need to be
entirely read into memory.
## Summary of changes
I took the fastwebsockets code as a starting off point and rewrote it to
be simpler, server-only, and be poll-based to support our Read/Write
wrappers.
I have replaced our tungstenite code with my framed-websockets fork.
<https://github.com/neondatabase/framed-websockets>
## Problem
There is no global per-ep rate limiter in proxy.
## Summary of changes
* Return global per-ep rate limiter back.
* Rename weak compute rate limiter (the cli flags were not used
anywhere, so it's safe to rename).
## Problem
If a permit cannot be acquired to connect to compute, the cache is
invalidated. This had the observed affect of sending more traffic to
ProxyWakeCompute on cplane.
## Summary of changes
Make sure that permit acquire failures are marked as "should not
invalidate cache".
## Problem
Some HTTP client connections can stay open for quite a long time.
## Summary of changes
When there are too many HTTP client connections, pick a random
connection and gracefully cancel it.
## Problem
Too many connect_compute attempts can overwhelm postgres, getting the
connections stuck.
## Summary of changes
Limit number of connection attempts that can happen at a given time.
## Problem
It's not possible to get the duration of the session from proxy events.
## Summary of changes
* Added a separate events folder in s3, to record disconnect events.
* Disconnect events are exactly the same as normal events, but also have
`disconnect_timestamp` field not empty.
* @oruen suggested to fill it with the same information as the original
events to avoid potentially heavy joins.
## Problem
Currently we cannot configure retries, also, we don't really have
visibility of what's going on there.
## Summary of changes
* Added cli params
* Improved logging
* Decrease the number of retries: it feels like most of retries doesn't
help. Once there would be better errors handling, we can increase it
back.
## Problem
Many users have access to ipv6 subnets (eg a /64). That gives them 2^64
addresses to play with
## Summary of changes
Truncate the address to /64 to reduce the attack surface.
Todo:
~~Will NAT64 be an issue here? AFAIU they put the IPv4 address at the
end of the IPv6 address. By truncating we will lose all that detail.~~
It's the same problem as a host sharing IPv6 addresses between clients.
I don't think it's up to us to solve. If a customer is getting DDoSed,
then they likely need to arrange a dedicated IP with us.
## Problem
possible for the database connections to not close in time.
## Summary of changes
force the closing of connections if the client has hung up
## Problem
My benchmarks show that prometheus is not very good.
https://github.com/conradludgate/measured
We're already using it in storage_controller and it seems to be working
well.
## Summary of changes
Replace prometheus with my new measured crate in proxy only.
Apologies for the large diff. I tried to keep it as minimal as I could.
The label types add a bit of boiler plate (but reduce the chance we
mistype the labels), and some of our custom metrics like CounterPair and
HLL needed to be rewritten.
## Problem
hyper1 offers control over the HTTP connection that hyper0_14 does not.
We're blocked on switching all services to hyper1 because of how we use
tonic, but no reason we can't switch proxy over.
## Summary of changes
1. hyper0.14 -> hyper1
1. self managed server
2. Remove the `WithConnectionGuard` wrapper from `protocol2`
2. Remove TLS listener as it's no longer necessary
3. include first session ID in connection startup logs
## Problem
Would be nice to have a bit more info on cold start metrics.
## Summary of changes
* Change connect compute latency to include `cold_start_info`.
* Update `ColdStartInfo` to include HttpPoolHit and WarmCached.
* Several changes to make more use of interned strings
## Problem
https://github.com/neondatabase/cloud/issues/11051
additionally, I felt like the http logic was a bit complex.
## Summary of changes
1. Removes timeout for HTTP requests.
2. Split out header parsing to a `HttpHeaders` type.
3. Moved db client handling to `QueryData::process` and
`BatchQueryData::process` to simplify the logic of `handle_inner` a bit.
## Problem
https://github.com/neondatabase/cloud/issues/9642
## Summary of changes
1. Make `EndpointRateLimiter` generic, renamed as `BucketRateLimiter`
2. Add support for claiming multiple tokens at once
3. Add `AuthRateLimiter` alias.
4. Check `(Endpoint, IP)` pair during authentication, weighted by how
many hashes proxy would be doing.
TODO: handle ipv6 subnets. will do this in a separate PR.
## Problem
Support of IAM Roles for Service Accounts for authentication.
## Summary of changes
* Obtain aws 15m-long credentials
* Retrieve redis password from credentials
* Update every 1h to keep connection for more than 12h
* For now allow to have different endpoints for pubsub/stream redis.
TODOs:
* PubSub doesn't support credentials refresh, consider using stream
instead.
* We need an AWS role for proxy to be able to connect to both: S3 and
elasticache.
Credentials obtaining and connection refresh was tested on xenon
preview.
https://github.com/neondatabase/cloud/issues/10365
## Problem
hyper auto-cancels the request futures on connection close.
`sql_over_http::handle` is not 'drop cancel safe', so we need to do some
other work to make sure connections are queries in the right way.
## Summary of changes
1. tokio::spawn the request handler to resolve the initial cancel-safety
issue
2. share a cancellation token, and cancel it when the request `Service`
is dropped.
3. Add a new log span to be able to track the HTTP connection lifecycle.
## Problem
* quotes in serialized string
* no status if connection is from local cache
## Summary of changes
* remove quotes
* report warm if connection if from local cache
## Problem
Missing error classification for SQL-over-HTTP queries.
Not respecting `UserFacingError` for SQL-over-HTTP queries.
## Summary of changes
Adds error classification.
Adds user facing errors.
## Problem
Now that we have tls-listener vendored, we can refactor and remove a lot
of bloated code and make the whole flow a bit simpler
## Summary of changes
1. Remove dead code
2. Move the error handling to inside the `TlsListener` accept() function
3. Extract the peer_addr from the PROXY protocol header and log it with
errors
## Problem
On HTTP query timeout, we should try and cancel the current in-flight
SQL query.
## Summary of changes
Trigger a cancellation command in postgres once the timeout is reach
## Problem
For the ephemeral endpoint feature, it's not really too helpful to keep
them around in the connection pool. This isn't really pressing but I
think it's still a bit better this way.
## Summary of changes
Add `is_ephemeral` function to `NeonOptions`. Allow
`serverless::ConnInfo::endpoint_cache_key()` to return an `Option`.
Handle that option appropriately
`std` has had `pin!` macro for some time, there is no need for us to use
the older alternatives. Cannot disallow `tokio::pin` because tokio
macros use that.
Nightly has added a bunch of compiler and linter warnings. There is also
two dependencies that fail compilation on latest nightly due to using
the old `stdsimd` feature name. This PR fixes them.
## Problem
Hard to find error reasons by endpoint for HTTP flow.
## Summary of changes
I want all root spans to have session id and endpoint id. I want all
root spans to be consistent.
## Problem
## Summary of changes
1. Classify further cplane API errors
2. add 'serviceratelimit' and make a few of the timeout errors return
that.
3. a few additional minor changes
## Problem
Not really a problem, just refactoring.
## Summary of changes
Separate authenticate from wake compute.
Do not call wake compute second time if we managed to connect to
postgres or if we got it not from cache.
## Problem
hard to see where time is taken during HTTP flow.
## Summary of changes
add a lot more for query state. add a conn_id field to the sql-over-http
span
## Problem
usernames and passwords can be URL 'percent' encoded in the connection
string URL provided by serverless driver.
## Summary of changes
Decode the parameters when getting conn info
## Problem
Taking my ideas from https://github.com/neondatabase/neon/pull/6283 and
doing a bit less radical changes. smaller commits.
We currently don't report error classifications in proxy as the current
error handling made it hard to do so.
## Summary of changes
1. Add a `ReportableError` trait that all errors will implement. This
provides the error classification functionality.
2. Handle Client requests a strongly typed error
* this error is a `ReportableError` and is logged appropriately
3. The handle client error only has a few possible error types, to
account for the fact that at this point errors should be returned to the
user.
## Problem
Drizzle needs to be able to configure the array_mode flag per query.
## Summary of changes
Adds an array_mode flag to the query data json that will otherwise
default to the header flag.