## Problem
1. Hard to correlate startup parameters with the endpoint that provided
them.
2. Some configurations are not needed in the `ProxyConfig` struct.
## Summary of changes
Because of some borrow checker fun, I needed to switch to an
interior-mutability implementation of our `RequestMonitoring` context
system. Using https://docs.rs/try-lock/latest/try_lock/ as a cheap lock
for such a use-case (needed to be thread safe).
Removed the lock of each startup message, instead just logging only the
startup params in a successful handshake.
Also removed from values from `ProxyConfig` and kept as arguments.
(needed for local-proxy config)
## Problem
If there's a quota error, it makes sense to cache it for a short window
of time. Many clients do not handle database connection errors
gracefully, so just spam retry 🤡
## Summary of changes
Updates the node_info cache to support storing console errors. Store
console errors if they cannot be retried (using our own heuristic.
should only trigger for quota exceeded errors).
## Problem
Sometimes rejected metric might record invalid events.
## Summary of changes
* Only record it `rejected` was explicitly set.
* Change order in logs.
* Report metrics if not under high-load.
## Problem
Actually read redis events.
## Summary of changes
This is revert of https://github.com/neondatabase/neon/pull/7350 +
fixes.
* Fixed events parsing
* Added timeout after connection failure
* Separated regional and global redis clients.
## Problem
Proxy doesn't know about existing endpoints.
## Summary of changes
* Added caching of all available endpoints.
* On the high load, use it before going to cplane.
* Report metrics for the outcome.
* For rate limiter and credentials caching don't distinguish between
`-pooled` and not
TODOs:
* Make metrics more meaningful
* Consider integrating it with the endpoint rate limiter
* Test it together with cplane in preview
## Problem
Would be nice to have a bit more info on cold start metrics.
## Summary of changes
* Change connect compute latency to include `cold_start_info`.
* Update `ColdStartInfo` to include HttpPoolHit and WarmCached.
* Several changes to make more use of interned strings
## Problem
https://github.com/neondatabase/cloud/issues/9642
## Summary of changes
1. Make `EndpointRateLimiter` generic, renamed as `BucketRateLimiter`
2. Add support for claiming multiple tokens at once
3. Add `AuthRateLimiter` alias.
4. Check `(Endpoint, IP)` pair during authentication, weighted by how
many hashes proxy would be doing.
TODO: handle ipv6 subnets. will do this in a separate PR.
Nightly has added a bunch of compiler and linter warnings. There is also
two dependencies that fail compilation on latest nightly due to using
the old `stdsimd` feature name. This PR fixes them.
## Problem
Running some memory profiling with high concurrent request rate shows
seemingly some memory fragmentation.
## Summary of changes
Eventually, we will want to separate global memory (caches) from local
memory (per connection handshake and per passthrough).
Using a string interner for project info cache helps reduce some of the
fragmentation of the global cache by having a single heap dedicated to
project strings, and not scattering them throughout all a requests.
At the same time, the interned key is 4 bytes vs the 24 bytes that
`SmolStr` offers.
Important: we should only store verified strings in the interner because
there's no way to remove them afterwards. Good for caching responses
from console.
## Problem
too many string based IDs. easy to mix up ID types.
## Summary of changes
Add a bunch of `SmolStr` wrappers that provide convenience methods but
are type safe
## Problem
Parsing the IP address at check time is a little wasteful.
## Summary of changes
Parse the IP when we get it from cplane. Adding a `None` variant to
still allow malformed patterns
## Problem
There are a lot of responses with 404 role not found error, which are
not getting cached in proxy.
## Summary of changes
If there was returned an empty secret but with the project_id, store it
in cache.
## Problem
Current cache doesn't support any updates from the cplane.
## Summary of changes
* Added redis notifier listner.
* Added cache which can be invalidated with the notifier. If the
notifier is not available, it's just a normal ttl cache.
* Updated cplane api.
The motivation behind this organization of the data is the following:
* In the Neon data model there are projects. Projects could have
multiple branches and each branch could have more than one endpoint.
* Also there is one special `main` branch.
* Password reset works per branch.
* Allowed IPs are the same for every branch in the project (except,
maybe, the main one).
* The main branch can be changed to the other branch.
* The endpoint can be moved between branches.
Every event described above requires some special processing on the
porxy (or cplane) side.
The idea of invalidating for the project is that whenever one of the
events above is happening with the project, proxy can invalidate all
entries for the entire project.
This approach also requires some additional API change (returning
project_id inside the auth info).