## Problem
Not all cplane errors are properly recognized and cached/retried.
## Summary of changes
Add more cplane error reasons. Also, use retry_delay_ms as cache TTL if
present.
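A minimal sketch of the TTL selection, assuming the error response carries an optional `retry_delay_ms` field (the helper name is illustrative, not the actual proxy API):
```rust
use std::time::Duration;

/// Prefer the control plane's suggested retry delay as the cache TTL for
/// a cached error; otherwise fall back to the default error TTL.
fn error_cache_ttl(retry_delay_ms: Option<u64>, default_ttl: Duration) -> Duration {
    retry_delay_ms.map(Duration::from_millis).unwrap_or(default_ttl)
}
```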
Related to https://github.com/neondatabase/cloud/issues/19353
## Problem
The endpoint filter cache is still unused because it's not yet reliable
enough to be used; right now it only consumes a lot of memory.
## Summary of changes
Remove the code. Needs a new design.
neondatabase/cloud#30634
Health checks for pg-sni-router open a TCP connection and immediately
close it again. This is noisy. We will filter out any EOF errors on the
first message.
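A minimal sketch of the filtering, assuming the disconnect surfaces as an `UnexpectedEof` I/O error on the first read (the function name is illustrative, and `tracing` is assumed for logging):
```rust
use std::io;

fn log_first_message_error(err: &io::Error) {
    if err.kind() == io::ErrorKind::UnexpectedEof {
        // Health checks connect and immediately disconnect; demote to debug.
        tracing::debug!("client disconnected before the first message");
    } else {
        tracing::error!(%err, "failed to read the first message");
    }
}
```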
"acquired permit" debug log is incorrect since it logs when we timedout
as well. This fixes the debug log.
## Problem
PGLB will do the connect_to_compute logic, and neonkeeper will do the
session establishment logic. We should split these responsibilities.
## Summary of changes
Moves postgres authentication to compute into a separate routine that
runs after connect_to_compute.
## Problem
PGLB/Neonkeeper needs to separate the concerns of connecting to compute,
and authenticating to compute.
Additionally, the code within `connect_to_compute` is rather messy,
spending effort on recovering the authentication info after
wake_compute.
## Summary of changes
Split `ConnCfg` into `ConnectInfo` and `AuthInfo`. `wake_compute` only
returns `ConnectInfo` and `AuthInfo` is determined separately from the
`handshake`/`authenticate` process.
Additionally, `ConnectInfo::connect_raw` is in charge of establishing
the TLS connection, and the `postgres_client::Config::connect_raw` is
configured to use `NoTls`, which forces it to skip the TLS negotiation.
This should just work.
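Roughly, the split looks like this (field names here are illustrative, not necessarily the exact ones in the PR):
```rust
/// Everything needed to reach the compute; the only thing wake_compute returns.
struct ConnectInfo {
    host: String,
    port: u16,
    // TLS settings for the compute connection live here. connect_raw
    // performs the TLS handshake itself, so the postgres client runs
    // with NoTls over the already-encrypted stream.
}

/// Everything needed to authenticate; determined during the
/// handshake/authenticate step, independently of wake_compute.
struct AuthInfo {
    user: String,
    dbname: String,
    password: Option<Vec<u8>>,
}
```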
A smaller version of #12066 that is somewhat easier to review.
Now that I've been using https://crates.io/crates/top-type-sizes, I've
found a lot more low-hanging fruit that can be tweaked to reduce memory
usage.
Some context for the optimisations:
Rust's stack allocation in futures is quite naive. Stack variables, even
if moved, often still end up taking space in the future. Rearranging the
order in which variables are defined, and properly scoping them can go a
long way.
`async fn` and `async move {}` have the consequence that they always
duplicate their "upvars" (aka captures). All captures are permanently
allocated in the future, even if moved out. We can be mindful when
writing futures to capture as little as possible.
`TlsStream` is massive. It needs boxing so it doesn't contribute to the
above issue.
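As an illustration of the scoping point, here's a hedged sketch (not code from this PR):
```rust
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

async fn handle_client(mut stream: TcpStream) -> std::io::Result<()> {
    let first_byte = {
        // Scoped to this block: the 8 KiB buffer is dead before the await
        // below, so it doesn't have to live in the future's state forever.
        let mut buf = [0u8; 8192];
        let n = stream.read(&mut buf).await?;
        if n == 0 {
            return Ok(());
        }
        buf[0]
    };

    // Had `buf` been declared in the outer scope, it could occupy space in
    // this future for its entire lifetime even though it's no longer used.
    // The same reasoning motivates boxing huge types like TlsStream: the
    // future then stores a pointer instead of the whole value inline.
    tokio::time::sleep(std::time::Duration::from_millis(u64::from(first_byte))).await;
    Ok(())
}
```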
## Measurements from `top-type-sizes`:
### Before
```
10328 {async block@proxy::proxy::task_main::{closure#0}::{closure#0}} align=8
6120 {async fn body of proxy::proxy::handle_client<proxy::protocol2::ChainRW<tokio::net::TcpStream>>()} align=8
```
### After
```
4040 {async block@proxy::proxy::task_main::{closure#0}::{closure#0}}
4704 {async fn body of proxy::proxy::handle_client<proxy::protocol2::ChainRW<tokio::net::TcpStream>>()} align=8
```
Precursor to https://github.com/neondatabase/cloud/issues/28333.
We want per-endpoint configuration for rate limits, which will be
distributed via the `GetEndpointAccessControl` API. This lays some of
the ground work.
1. Allow the endpoint rate limiter to accept a custom leaky bucket
config on check (see the sketch after this list).
2. Remove the unused auth rate limiter, as I don't want to think about
how it fits into this.
3. Refactor the caching of `GetEndpointAccessControl`, as it adds
friction for adding new cached data to the API.
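For the first item, a minimal sketch of a leaky-bucket check taking a per-call config (types and field names are assumptions, not the proxy's actual API):
```rust
use std::time::Instant;

pub struct LeakyBucketConfig {
    /// Sustained drain rate, in requests per second.
    pub rps: f64,
    /// Burst capacity: the bucket's maximum fill level.
    pub max: f64,
}

pub struct BucketState {
    level: f64,
    last: Instant,
}

impl BucketState {
    /// Returns true if the request is admitted under `config`.
    pub fn check(&mut self, config: &LeakyBucketConfig, now: Instant) -> bool {
        // Drain the bucket for the time elapsed since the last check.
        let elapsed = now.duration_since(self.last).as_secs_f64();
        self.level = (self.level - elapsed * config.rps).max(0.0);
        self.last = now;
        // Admit the request only if it fits within the burst capacity.
        if self.level + 1.0 <= config.max {
            self.level += 1.0;
            true
        } else {
            false
        }
    }
}
```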
That third item was rather large; I couldn't find any way to split it
up. The core idea is that there are now only 2 cache APIs:
`get_endpoint_access_controls` and `get_role_access_controls`.
I'm pretty sure the behaviour is unchanged, except I did a drive-by
change to fix #8989 because it felt harmless. The change in question is
that when a password validation fails, we eagerly expire the role cache
entry if it has been cached for 5 minutes or more. This allows for edge
cases where a user tries to connect with a reset password, but the cache
never expires the entry due to some redis-related quirk (lag,
misconfiguration, or a cplane error).
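A sketch of that eager-expiry rule, under the assumption that each cache entry records when it was inserted (names are illustrative):
```rust
use std::time::{Duration, Instant};

const MIN_AGE_FOR_EAGER_EXPIRY: Duration = Duration::from_secs(5 * 60);

/// Called when password validation fails against the cached secret.
fn maybe_expire_role(cached_at: Instant, invalidate: impl FnOnce()) {
    // An old entry may hold a stale secret (e.g. a missed invalidation
    // after a password reset), so drop it and refetch on the next attempt.
    if cached_at.elapsed() >= MIN_AGE_FOR_EAGER_EXPIRY {
        invalidate();
    }
}
```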
https://github.com/neondatabase/cloud/issues/23008
For TLS between proxy and compute, we are using an internally
provisioned CA to sign the compute certificates. This change ensures
that proxy will load the CA certificates from a file whose path is
supplied via an env var; this file and env var will be configured later,
using a kubernetes secret.
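Sketched out, the loading step might look like this (the env var name here is a placeholder, not the one the PR introduces):
```rust
use std::fs;

/// Load the compute CA bundle from the file named by the env var,
/// if the var is set.
fn load_compute_ca_bundle() -> std::io::Result<Option<Vec<u8>>> {
    match std::env::var("COMPUTE_CA_FILE") {
        Ok(path) => fs::read(path).map(Some),
        Err(_) => Ok(None), // not configured: TLS to compute stays off
    }
}
```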
Control plane responds with a `server_name` field if and only if the
compute uses TLS. This server name is the name we use to validate the
certificate. Control plane still sends us the IP to connect to as well
(to support overlay IP).
To support this change, I had to split `host` and `host_addr` into
separate fields. We use `host_addr` and bypass `lookup_addr` where
possible (which is what happens in production); `host` is then only used
for the TLS connection.
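In rough terms (field and method names assumed for illustration):
```rust
use std::net::IpAddr;

struct ConnectInfo {
    /// Name used for TLS certificate validation (cplane's `server_name`).
    host: String,
    /// Concrete IP to dial (supports overlay IP); lets us skip lookup_addr.
    host_addr: Option<IpAddr>,
}

impl ConnectInfo {
    fn dial_target(&self) -> String {
        match self.host_addr {
            Some(ip) => ip.to_string(), // production path: no DNS lookup
            None => self.host.clone(),  // fall back to resolving `host`
        }
    }
}
```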
There's no blocker to merging this. The code paths will not be triggered
until the new control plane is deployed and the `enableTLS` compute flag
is enabled on a project.
This upgrades the `proxy/` crate as well as the forked libraries in
`libs/proxy/` to edition 2024.
Also reformats the imports of those forked libraries via:
```
cargo +nightly fmt -p proxy -p postgres-protocol2 -p postgres-types2 -p tokio-postgres2 -- -l --config imports_granularity=Module,group_imports=StdExternalCrate,reorder_imports=true
```
It can be read commit-by-commit: the first commit has no formatting
changes, only changes to accommodate the new edition.
Part of #10918
- Wired up filtering on VPC endpoints
- Wired up blocking access from the public internet / VPC depending on a
per-project flag
- Added cache invalidation for VPC endpoints (partially based on a PR
from Raphael)
- Removed the BackendIpAllowlist trait
---------
Co-authored-by: Ivan Efremov <ivan@neon.tech>
## Problem
Because dashmap 6 switched to the hashbrown RawTable API, the upgrade
required us to use unsafe code:
https://github.com/neondatabase/neon/pull/8107
## Summary of changes
Switch to clashmap, a fork maintained by me, which removes much of the
internal unsafe code and ultimately switches from RawTable to HashTable,
removing the unsafe requirement on us.
## Problem
We were incorrectly constructing the ComputeUserInfo, used for
cancellation checks, based on the return parameters from postgres. This
didn't contain the correct info.
## Summary of changes
Propagate down the existing ComputeUserInfo.
This PR removes the direct dependency of the IP allowlist from
CancelClosure, allowing for more scalable and flexible IP restrictions
and enabling the future use of Redis-based CancelMap storage.
Changes:
- Introduce a new BackendAuth async trait that retrieves the IP
allowlist through existing authentication methods (sketched below);
- Improve cancellation error handling by instrumenting the async
cancel_session() rather than dropping it;
- Set and store the IP allowlist for SCRAM Proxy to consistently perform
the IP allowance check.
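A hedged sketch of the BackendAuth trait described in the first item; the method name, return type, and use of the async_trait and anyhow crates are assumptions:
```rust
use async_trait::async_trait;

type IpPattern = String; // stand-in for the real allowlist entry type

#[async_trait]
trait BackendAuth {
    /// Fetch the endpoint's IP allowlist via the existing auth backend,
    /// instead of carrying it inside CancelClosure.
    async fn get_allowed_ips(&self) -> anyhow::Result<Vec<IpPattern>>;
}
```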
Relates to #9660
Now that we construct the TLS client config for cancellation as well as
connect, it feels appropriate to construct the same config once and
re-use it elsewhere. It might also help should #7500 require any extra
setup, so we can easily add it to all the appropriate call sites.
As the title says, I updated the lint rules to no longer allow unwrap or
unimplemented.
Three special cases:
* Tests are allowed to use them
* std::sync::Mutex lock().unwrap() is common because it's usually
correct to continue panicking on poison
* `tokio::spawn_blocking(...).await.unwrap()` is common because it will
only error if the blocking fn panics, so continuing the panic is also
correct
I've introduced two extension traits to help with these last two, that
are a bit more explicit so they don't need an expect message every time.
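For example, the Mutex case could look like this (trait and method names here are illustrative, not necessarily the ones introduced):
```rust
use std::sync::{Mutex, MutexGuard};

trait MutexExt<T> {
    /// Like `lock().unwrap()`: if the mutex is poisoned, another thread
    /// panicked, and continuing the panic is usually correct.
    fn lock_propagate_poison(&self) -> MutexGuard<'_, T>;
}

impl<T> MutexExt<T> for Mutex<T> {
    fn lock_propagate_poison(&self) -> MutexGuard<'_, T> {
        self.lock().expect("propagating panic from poisoned mutex")
    }
}
```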
## Problem
While reviewing #10152 I found it tricky to actually determine whether
the connection used `allow_self_signed_compute` or not.
I've tried to remove this setting in the past:
* https://github.com/neondatabase/neon/pull/7884
* https://github.com/neondatabase/neon/pull/7437
* https://github.com/neondatabase/cloud/pull/13702
But each time, it turned out to be used by the e2e tests.
## Summary of changes
The `node_info.allow_self_signed_computes` is always initialised to
false, and then sometimes inherits the proxy config value. There's no
reason this needs to be in the node_info, so removing it and propagating
it via `TcpMechanism` is simpler.
## Problem
Now that https://github.com/neondatabase/cloud/issues/15245 is done, we
can remove the old code.
## Summary of changes
Removes support for the ManagementV2 API, in favour of the ProxyV1 API.
Keeping the `mock` postgres cplane adaptor using "stock" tokio-postgres
allows us to remove a lot of dead weight from our actual postgres
connection logic.
## Problem
Any errors from these async blocks are unconditionally logged at error
level
even though we already handle such errors based on context.
## Summary of changes
* Log raw errors from creating and executing cplane requests at debug
level.
* Inline macro calls to retain the correct callsite.
Follow up to #9803
See https://github.com/neondatabase/cloud/issues/14378
In collaboration with @cloneable and @awarus, we sifted through logs and
simply demoted some logs to debug. This is not at all finished and there
are more logs to review, but we ran out of time in the session we
organised. In any slightly more nuanced cases, we didn't touch the log,
instead leaving a TODO comment.
I've also slightly refactored the sql-over-http body read/length reject
code. I can split that into a separate PR. It just felt natural after I
switched to `read_body_with_limit` as we discussed during the meet.
## Problem
It is called context/ctx everywhere, and the Monitoring suffix is
needlessly confused with proper monitoring code.
## Summary of changes
* Rename RequestMonitoring to RequestContext
* Rename RequestMonitoringInner to RequestContextInner
I think I meant to make these changes over 6 months ago. Alas, better
late than never.
1. should_reject doesn't eagerly intern the endpoint string.
2. The rate limiter uses a std Mutex instead of a tokio Mutex (see the
sketch after this list).
3. Recently I introduced a `-local-proxy` endpoint suffix. I forgot to
add this to normalize.
4. Random but small cleanup: make ControlPlaneEvent deserialize
directly to the interned strings.
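On the second item, a short sketch of why a std Mutex suffices (simplified state, not the real limiter):
```rust
use std::sync::Mutex;

struct RateLimiter {
    // Guarded by a std Mutex: the critical section below never holds the
    // guard across an .await, so async-aware locking buys nothing.
    count: Mutex<u64>,
}

impl RateLimiter {
    fn check(&self, limit: u64) -> bool {
        let mut count = self.count.lock().unwrap();
        *count += 1;
        *count <= limit
    }
}
```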
The overall idea of the PR is to rename a few types to make their
purpose more clear, reduce abstraction where it's not needed, and move
types to better-suited modules.
## Problem
Fixes #8340
## Summary of changes
Introduced ErrorKind::quota to handle quota-related errors.