misc changes split out from #8855
- **allow cloning the request context in a read-only fashion for
background tasks**
- **propagate endpoint and request context through the jwk cache**
- **only allow password based auth for md5 during testing**
- **remove auth info from conn info**
## Problem
1. Hard to correlate startup parameters with the endpoint that provided
them.
2. Some configurations are not needed in the `ProxyConfig` struct.
## Summary of changes
Because of some borrow checker fun, I needed to switch our `RequestMonitoring` context system to an interior-mutability implementation, using https://docs.rs/try-lock/latest/try_lock/ as a cheap lock for this use case (it needs to be thread safe).
Removed the log of each startup message; instead, the startup params are logged only on a successful handshake.
Also removed some values from `ProxyConfig` and kept them as plain arguments (needed for the local-proxy config).
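A minimal sketch of the interior-mutability shape this enables, assuming illustrative field and method names (not the real `RequestMonitoring` API):

```rust
use try_lock::TryLock;

// Illustrative context with interior mutability; mutations go through a
// cheap try-lock so the handle can be shared across threads without `&mut`.
pub struct RequestContext {
    inner: TryLock<ContextData>,
}

struct ContextData {
    success: bool,
    endpoint: Option<String>,
}

impl RequestContext {
    pub fn new() -> Self {
        Self {
            inner: TryLock::new(ContextData { success: false, endpoint: None }),
        }
    }

    // The context is only touched from one place at a time, so the try-lock
    // is expected to always succeed; contention would indicate a bug.
    pub fn set_success(&self) {
        if let Some(mut data) = self.inner.try_lock() {
            data.success = true;
        }
    }

    pub fn set_endpoint(&self, endpoint: String) {
        if let Some(mut data) = self.inner.try_lock() {
            data.endpoint = Some(endpoint);
        }
    }
}
```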
## Problem
The current bucket-based rate limiter is not very intuitive and has some bad failure cases.
## Summary of changes
Switches from fixed-interval buckets to a leaky-bucket implementation. There is a single bucket per endpoint, which drains over time: each check measures the time since the last check and drains the corresponding number of tokens en masse. Garbage collection works similarly to before: it sweeps one shard (1/64th of the set) every 2048 checks, and it only removes buckets that are empty.
To stay compatible with the existing config, I've faffed about so that the existing min and max rps are taken as the sustained rps and the max bucket size, which should be roughly equivalent.
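A minimal sketch of the drain-on-check bucket described above, as an illustrative standalone struct (the real implementation is sharded per endpoint and garbage collected):

```rust
use std::time::Instant;

// One bucket per endpoint: requests add tokens, elapsed time drains them.
struct LeakyBucket {
    tokens: f64,        // tokens currently in the bucket
    bucket_size: f64,   // max burst (derived from the old "max rps")
    drain_per_sec: f64, // sustained rate (derived from the old "min rps")
    last_check: Instant,
}

impl LeakyBucket {
    /// Returns true if the request is admitted.
    fn check(&mut self, now: Instant) -> bool {
        // Drain tokens en masse based on the time since the last check.
        let elapsed = now.duration_since(self.last_check).as_secs_f64();
        self.tokens = (self.tokens - elapsed * self.drain_per_sec).max(0.0);
        self.last_check = now;

        // Admit if adding one more token still fits in the bucket.
        if self.tokens + 1.0 <= self.bucket_size {
            self.tokens += 1.0;
            true
        } else {
            false
        }
    }
}
```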
## Problem
If there's a quota error, it makes sense to cache it for a short window of time. Many clients do not handle database connection errors gracefully and just spam retries 🤡
## Summary of changes
Updates the node_info cache to support storing console errors. Console errors are stored only if they cannot be retried (using our own heuristic; this should only trigger for quota-exceeded errors).
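A hedged sketch of what storing errors in the cache can look like, using placeholder types (the real `NodeInfo`/`ConsoleError` and the retry heuristic live in proxy):

```rust
use std::collections::HashMap;

// Placeholder types standing in for the real proxy structs.
struct NodeInfo { host: String }
struct ConsoleError { message: String }

impl ConsoleError {
    // Assumed heuristic: quota-exceeded errors are not worth retrying.
    fn could_retry(&self) -> bool {
        !self.message.contains("quota exceeded")
    }
}

// Cache entries may now hold either node info or a non-retryable error.
struct NodeInfoCache {
    entries: HashMap<String, Result<NodeInfo, ConsoleError>>,
}

impl NodeInfoCache {
    fn insert_outcome(&mut self, endpoint: String, outcome: Result<NodeInfo, ConsoleError>) {
        let cacheable = match &outcome {
            Ok(_) => true,
            // Only cache errors that retrying will not fix.
            Err(e) => !e.could_retry(),
        };
        if cacheable {
            self.entries.insert(endpoint, outcome);
        }
    }
}
```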
## Problem
1. Proxy is retrying errors from cplane that shouldn't be retried
2. ~~Proxy is not using the retry_after_ms value~~
## Summary of changes
1. Correct the could_retry impl for ConsoleError.
2. ~~Update could_retry interface to support returning a fixed wait
duration.~~
## Problem
There was a bug in the dynamic rate limiter which exhausted CPU in proxy, leaving proxy unable to accept any connections.
## Summary of changes
1. `if self.available > 1` -> `if self.available >= 1`
2. replace `timeout_at` with a plain `timeout`
3. remove potential infinite loops which can exhaust CPUs (see the sketch below)
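A simplified sketch of the corrected acquire path; the struct, the notify handle, and the retry bound are illustrative, not the real limiter:

```rust
use std::time::Duration;
use tokio::sync::{Mutex, Notify};
use tokio::time::timeout;

struct Limiter {
    available: f64,
}

impl Limiter {
    fn try_take(&mut self) -> bool {
        // Fix 1: `>= 1` instead of `> 1`, so the last available token is usable.
        if self.available >= 1.0 {
            self.available -= 1.0;
            true
        } else {
            false
        }
    }
}

async fn acquire(limiter: &Mutex<Limiter>, released: &Notify) -> bool {
    // Fix 3: a bounded number of attempts instead of a loop that can spin forever.
    for _ in 0..10 {
        if limiter.lock().await.try_take() {
            return true;
        }
        // Fix 2: a relative `timeout` on the wait rather than `timeout_at`
        // with an absolute deadline.
        let _ = timeout(Duration::from_millis(100), released.notified()).await;
    }
    false
}
```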
## Problem
We were rate limiting wake_compute in the wrong place
## Summary of changes
Move the wake_compute rate limit to after the permit is acquired. Also slightly refactors `normalize`, as it caught my eye.
## Problem
Computes that are healthy can manage many connection attempts at a time.
Unhealthy computes cannot. We initially handled this with a fixed
concurrency limit, but it seems this inhibits pgbench.
## Summary of changes
Support AIMD for the connect_to_compute lock, allowing the concurrency limit to vary with compute health.
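A minimal AIMD sketch with illustrative knobs (the real lock ties this into permit acquisition and connect outcomes):

```rust
// Additive-increase / multiplicative-decrease for the concurrency limit:
// grow slowly while computes are healthy, back off sharply on failure.
struct Aimd {
    limit: f64,
    min: f64,
    max: f64,
    increase: f64,        // added to the limit on each success
    decrease_factor: f64, // e.g. 0.75: limit shrinks by 25% on failure
}

impl Aimd {
    fn on_connect_success(&mut self) {
        self.limit = (self.limit + self.increase).min(self.max);
    }

    fn on_connect_failure(&mut self) {
        self.limit = (self.limit * self.decrease_factor).max(self.min);
    }

    fn current_limit(&self) -> usize {
        self.limit as usize
    }
}
```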
## Problem
There is no global per-ep rate limiter in proxy.
## Summary of changes
* Bring the global per-endpoint rate limiter back.
* Rename the weak compute rate limiter (the CLI flags were not used anywhere, so it's safe to rename).
## Problem
If a permit cannot be acquired to connect to compute, the cache is invalidated. This had the observed effect of sending more traffic to ProxyWakeCompute on cplane.
## Summary of changes
Make sure that permit acquire failures are marked as "should not
invalidate cache".
## Problem
Too many connect_compute attempts can overwhelm postgres, leaving connections stuck.
## Summary of changes
Limit the number of connection attempts that can be in flight at a given time.
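A sketch of such a fixed limit using a semaphore (illustrative; the real lock is keyed per compute node):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Bound how many connect attempts can be in flight at once; extra attempts
// queue here instead of piling onto a struggling compute.
// e.g. share Arc::new(Semaphore::new(100)) between all connection tasks.
async fn connect_with_limit(limiter: Arc<Semaphore>) -> Result<(), tokio::sync::AcquireError> {
    let _permit = limiter.acquire().await?;
    // ... perform the actual connection attempt while holding the permit ...
    Ok(())
}
```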
## Problem
Proxy is not actually reading redis events.
## Summary of changes
This is a revert of https://github.com/neondatabase/neon/pull/7350 plus fixes.
* Fixed event parsing
* Added a timeout after connection failure
* Separated the regional and global redis clients.
## Problem
My benchmarks show that the prometheus crate is not very good.
https://github.com/conradludgate/measured is my replacement.
We're already using it in storage_controller and it seems to be working well.
## Summary of changes
Replace prometheus with my new measured crate, in proxy only.
Apologies for the large diff; I tried to keep it as minimal as I could.
The label types add a bit of boilerplate (but reduce the chance we mistype the labels), and some of our custom metrics like CounterPair and HLL needed to be rewritten.
## Problem
Proxy doesn't know about existing endpoints.
## Summary of changes
* Added caching of all available endpoints.
* Under high load, consult it before going to cplane.
* Report metrics for the outcome.
* For the rate limiter and credentials caching, don't distinguish between `-pooled` and non-pooled endpoints.
TODOs:
* Make metrics more meaningful
* Consider integrating it with the endpoint rate limiter
* Test it together with cplane in preview
## Problem
Would be nice to have a bit more info on cold start metrics.
## Summary of changes
* Change connect compute latency to include `cold_start_info`.
* Update `ColdStartInfo` to include HttpPoolHit and WarmCached.
* Several changes to make more use of interned strings
## Problem
I noticed code coverage for auth_quirks was pretty bare
## Summary of changes
Adds 3 happy path unit tests for auth_quirks
* scram
* cleartext (websockets)
* cleartext (password hack)
## Problem
Currently cplane communication is part of the latency monitoring, which makes it hard to set up proper alerting based on proxy latency alone.
## Summary of changes
Added a dimension to exclude cplane latency.
## Problem
* quotes in the serialized string
* no status if the connection is from the local cache
## Summary of changes
* remove the quotes
* report warm if the connection is from the local cache
## Problem
`422 Unprocessable Entity: compute time quota of non-primary branches is
exceeded` being marked as a control plane error.
## Summary of changes
Add manual checks to make this a user error that should not be retried.
## Problem
Branch/project and coldStart were not populated in data events.
## Summary of changes
Populate them. Also added logging for the cold start info.
## Problem
It's a good idea to distinguish the case when it's a genuine cold start from the case when we took the compute from the pool.
## Summary of changes
Updated it to an enum.
Nightly has added a bunch of compiler and linter warnings. There are also two dependencies that fail to compile on the latest nightly because they use the old `stdsimd` feature name. This PR fixes them.
## Problem
Data team cannot distinguish between cold start and not cold start.
## Summary of changes
Report `is_cold_start` to analytics.
---------
Co-authored-by: Conrad Ludgate <conrad@neon.tech>
## Problem
## Summary of changes
1. Classify further cplane API errors.
2. Add 'serviceratelimit' and make a few of the timeout errors return it.
3. A few additional minor changes.
## Problem
Not really a problem, just refactoring.
## Summary of changes
Separate authenticate from wake_compute.
Do not call wake_compute a second time if we managed to connect to postgres, or if the compute info did not come from the cache.
## Problem
Taking my ideas from https://github.com/neondatabase/neon/pull/6283 but making slightly less radical changes, in smaller commits.
We currently don't report error classifications in proxy, as the current error handling made it hard to do so.
## Summary of changes
1. Add a `ReportableError` trait that all errors will implement (sketched after this list). This provides the error classification functionality.
2. Handling client requests now produces a strongly typed error
* this error is a `ReportableError` and is logged appropriately
3. The handle-client error only has a few possible error types, to account for the fact that at this point errors should be returned to the user.
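A hedged sketch of the trait's shape; the exact kinds and method names in proxy may differ:

```rust
// Coarse classification used for metrics and logging.
#[derive(Clone, Copy, Debug)]
enum ErrorKind {
    User,         // caused by the client, returned to the user
    ControlPlane, // cplane failed
    Compute,      // postgres/compute failed
    Service,      // proxy itself failed
    RateLimit,    // request was throttled
}

trait ReportableError: std::fmt::Display {
    fn get_error_kind(&self) -> ErrorKind;
}

// Example implementation for one concrete error.
struct QuotaExceeded;

impl std::fmt::Display for QuotaExceeded {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str("compute time quota exceeded")
    }
}

impl ReportableError for QuotaExceeded {
    fn get_error_kind(&self) -> ErrorKind {
        ErrorKind::User
    }
}
```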
## Problem
The password check logic for sql-over-http is a bit non-intuitive.
## Summary of changes
1. Perform scram auth using the same logic as for websocket cleartext
password.
2. Split establish connection logic and connection pool.
3. Parallelize the param parsing logic with authentication + wake compute (see the sketch after this list).
4. Limit the total number of clients
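A sketch of item 3, with hypothetical stand-ins for the real sql-over-http handlers:

```rust
// Hypothetical handlers; the real param parsing and auth/wake-compute logic
// live in proxy's sql-over-http module.
async fn parse_params(body: String) -> Result<Vec<String>, String> {
    Ok(body.split(',').map(str::to_owned).collect())
}

async fn auth_and_wake_compute() -> Result<String, String> {
    Ok("compute-host:5432".to_owned())
}

async fn handle(body: String) -> Result<(), String> {
    // Run param parsing concurrently with authentication + wake_compute;
    // both results are needed before the query can be issued.
    let (params, compute) = tokio::join!(parse_params(body), auth_and_wake_compute());
    let (_params, _compute) = (params?, compute?);
    // ... open the connection and run the query ...
    Ok(())
}
```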
## Problem
not really any problem, just some drive-by changes
## Summary of changes
1. move wake compute
2. move json processing
3. move handle_try_wake
4. move test backend to api provider
5. reduce wake-compute concerns
6. remove duplicate wake-compute loop
## Problem
Follow-up to #5461.
In my memory usage/fragmentation measurements, these metrics came up as a large source of small allocations. The replacement metric has been in use for a long time now, so I think it's good to finally remove this. Per-endpoint data is still tracked elsewhere.
## Summary of changes
remove the per-client bytes metrics
## Problem
Right now, if the get_role_secret response wasn't cached (e.g. the cache already reached its max size), proxy will send a second, identical request.
## Summary of changes
Avoid the needless request.