rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2025-12-22 21:59:59 +00:00

Author	SHA1	Message	Date
Alexey Immoreev	3baef0bca3	Improvement: add console redirect timeout warning (#9985 ) ## Problem There is no information on session being cancelled in 2 minutes at the moment ## Summary of changes The timeout being logged for the user	2024-12-03 18:59:44 +00:00
Conrad Ludgate	27a42d0f96	chore(proxy): remove postgres config parser and md5 support (#9990 ) Keeping the `mock` postgres cplane adaptor using "stock" tokio-postgres allows us to remove a lot of dead weight from our actual postgres connection logic.	2024-12-03 18:39:23 +00:00
Conrad Ludgate	2dc238e5b3	feat(proxy): emit JWT auth method and JWT issuer in parquet logs (#9971 ) Fix the HTTP AuthMethod to accommodate the JWT authorization method. Introduces the JWT issuer as an additional field in the parquet logs	2024-12-02 17:54:32 +00:00
Folke Behrens	1b60571636	proxy: Create Elasticache credentials provider lazily (#9967 ) ## Problem The credentials provider tries to connect to AWS STS even when we use plain Redis connections. ## Summary of changes * Construct the CredentialsProvider only when needed ("irsa").	2024-12-02 15:38:12 +00:00
Conrad Ludgate	cd1d2d1996	fix(proxy): forward notifications from authentication (#9948 ) Fixes https://github.com/neondatabase/cloud/issues/20973. This refactors `connect_raw` in order to return direct access to the delayed notices. I cannot find a way to test this with psycopg2 unfortunately, although testing it with psql does return the expected results.	2024-12-02 12:29:57 +00:00
Folke Behrens	4abc8e5282	Merge the consumption metric pushes (#9939 ) #8564 ## Problem The main and backup consumption metric pushes are completely independent, resulting in different event time windows and different idempotency keys. ## Summary of changes * Merge the push tasks, but keep chunks the same size.	2024-11-30 10:11:37 +00:00
John Spray	d5624cc505	pageserver: download small objects using a smaller timeout (#9938 ) ## Problem It appears that the Azure storage API tends to hang TCP connections more than S3 does. Currently we use a 2 minute timeout for all downloads. This is large because sometimes the objects we download are large. However, waiting 2 minutes when doing something like downloading a manifest on tenant attach is problematic, because when someone is doing a "create tenant, create timeline" workflow, that 2 minutes is long enough for them reasonably to give up creating that timeline. Rather than propagate oversized timeouts further up the stack, we should use a different timeout for objects that we expect to be small. Closes: https://github.com/neondatabase/neon/issues/9836 ## Summary of changes - Add a `small_timeout` configuration attribute to remote storage, defaulting to 30 seconds (still a very generous period to do something like download an index) - Add a DownloadKind parameter to DownloadOpts, so that callers can indicate whether they expect the object to be small or large. - In the azure client, use small timeout for HEAD requests, and for GET requests if DownloadKind::Small is used. - Use DownloadKind::Small for manifests, indices, and heatmap downloads. This PR intentionally does not make the equivalent change to the S3 client, to reduce blast radius in case this has unexpected consequences (we could accomplish the same thing by editing lots of configs, but just skipping the code is simpler for right now)	2024-11-29 15:11:44 +00:00
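The timeout selection described in the pageserver commit above can be sketched as follows. This is a minimal illustration, not the code from the PR: `DownloadKind` and `small_timeout` are named in the commit message, while `TimeoutConfig`, `Request`, and `timeout_for` are hypothetical names introduced here. The 2 minute and 30 second defaults are the values the message states.

```rust
use std::time::Duration;

/// Whether the caller expects the object to be small (manifest, index,
/// heatmap) or potentially large.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DownloadKind {
    Small,
    Large,
}

/// Hypothetical stand-in for the remote storage timeout settings.
#[derive(Clone, Copy, Debug)]
pub struct TimeoutConfig {
    /// General timeout, sized for large objects.
    pub timeout: Duration,
    /// Shorter timeout for objects expected to be small.
    pub small_timeout: Duration,
}

impl Default for TimeoutConfig {
    fn default() -> Self {
        Self {
            timeout: Duration::from_secs(120),
            small_timeout: Duration::from_secs(30),
        }
    }
}

/// Hypothetical request classification: HEAD, or GET with a size hint.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Request {
    Head,
    Get(DownloadKind),
}

impl TimeoutConfig {
    /// HEAD requests and small GETs use the small timeout; everything
    /// else keeps the general one.
    pub fn timeout_for(&self, request: Request) -> Duration {
        match request {
            Request::Head | Request::Get(DownloadKind::Small) => self.small_timeout,
            Request::Get(DownloadKind::Large) => self.timeout,
        }
    }
}
```

Passing the size hint with each request keeps the decision with the caller, which knows whether it is fetching an index or a large object, instead of pushing one oversized timeout up the stack.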
Conrad Ludgate	1d642d6a57	chore(proxy): vendor a subset of rust-postgres (#9930 ) Our rust-postgres fork is getting messy. Mostly because proxy wants more control over the raw protocol than tokio-postgres provides. As such, it's diverging more and more. Storage and compute also make use of rust-postgres, but in more normal usage, thus they don't need our crazy changes. Idea: * proxy maintains their subset * other teams use a minimal patch set against upstream rust-postgres Reviewing this code will be difficult. To implement it, I 1. Copied tokio-postgres, postgres-protocol and postgres-types from `00940fcdb5` 2. Updated their package names with the `2` suffix to make them compile in the workspace. 3. Updated proxy to use those packages 4. Copied in the code from tokio-postgres-rustls 0.13 (with some patches applied https://github.com/jbg/tokio-postgres-rustls/pull/32 https://github.com/jbg/tokio-postgres-rustls/pull/33) 5. Removed as much dead code as I could find in the vendored libraries 6. Updated the tokio-postgres-rustls code to use our existing channel binding implementation	2024-11-29 11:08:01 +00:00
Ivan Efremov	8173dc600a	proxy: spawn cancellation checks in the background (#9918 ) ## Problem For cancellation, a connection is open during all the cancel checks. ## Summary of changes Spawn cancellation checks in the background, and close connection immediately. Use task_tracker for cancellation checks.	2024-11-28 06:32:22 +00:00
Folke Behrens	5c41707bee	proxy: promote two logs to error, fix multiline log (#9913 ) * Promote two logs from mpsc send errors to error level. The channels are unbounded and there shouldn't be errors. * Fix one multiline log from anyhow::Error. Use Debug instead of Display.	2024-11-27 18:05:46 +00:00
Conrad Ludgate	96a1b71c84	chore(proxy): discard request context span during passthrough (#9882 ) ## Problem The RequestContext::span shouldn't live for the entire postgres connection, only the handshake. ## Summary of changes * Slight refactor to the RequestContext to discard the span upon handshake completion. * Make sure the temporary future for the handshake is dropped (not bound to a variable) * Runs our nightly fmt script	2024-11-25 21:32:53 +00:00
Folke Behrens	7404887b81	proxy: Demote errors from cplane request routines to debug (#9886 ) ## Problem Any errors from these async blocks are unconditionally logged at error level even though we already handle such errors based on context. ## Summary of changes * Log raw errors from creating and executing cplane requests at debug level. * Inline macro calls to retain the correct callsite.	2024-11-25 19:35:32 +00:00
Folke Behrens	87e4dd23a1	proxy: Demote all cplane error replies to info log level (#9880 ) ## Problem The vast majority of the error/warn logs from cplane are about time or data transfer quotas exceeded or endpoint-not-found errors and not operational errors in proxy or cplane. ## Summary of changes * Demote cplane error replies to info level. * Raise other errors from warn back to error.	2024-11-25 17:53:26 +00:00
Ivan Efremov	3b1ac8b14a	proxy: Implement cancellation rate limiting (#9739 ) Implement cancellation rate limiting and ip allowlist checks. Add ip_allowlist to the cancel closure Fixes [#16456](https://github.com/neondatabase/cloud/issues/16456)	2024-11-22 16:46:38 +00:00
Conrad Ludgate	8ab96cc71f	chore(proxy/jwks): reduce the rightward drift of jwks renewal (#9853 ) I found the rightward drift of the `renew_jwks` function hard to review. This PR splits out some major logic and uses early returns to make the happy path more linear.	2024-11-22 14:51:32 +00:00
Conrad Ludgate	725a5ff003	fix(proxy): CancelKeyData display log masking (#9838 ) Fixes the masking for the CancelKeyData display format. Due to negative i32 cast to u64, the top-bits all had `0xffffffff` prefix. On the bitwise-or that followed, these took priority. This PR also compresses 3 logs during sql-over-http into 1 log with durations as label fields, as prior discussed.	2024-11-21 16:46:30 +00:00
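The sign-extension problem described in the CancelKeyData fix above can be reproduced in a few lines. This is a sketch under assumptions, not the proxy's code: the function names and the packing layout (backend PID in the high 32 bits, cancel key in the low 32 bits) are chosen here for illustration only.

```rust
/// Buggy packing: casting a negative i32 straight to u64 sign-extends,
/// so the low half arrives with a 0xffffffff prefix in its top 32 bits
/// and the bitwise-or overwrites the high half.
pub fn combine_buggy(backend_pid: i32, cancel_key: i32) -> u64 {
    ((backend_pid as u64) << 32) | (cancel_key as u64)
}

/// Fixed packing: reinterpret each i32 as u32 first, so widening to u64
/// zero-extends and the two halves cannot overlap.
pub fn combine_fixed(backend_pid: i32, cancel_key: i32) -> u64 {
    (u64::from(backend_pid as u32) << 32) | u64::from(cancel_key as u32)
}
```

With `backend_pid = 1` and `cancel_key = -1`, the buggy version yields `0xffff_ffff_ffff_ffff` and loses the PID entirely, while the fixed version yields `0x0000_0001_ffff_ffff`.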
Ivan Efremov	2d6bf176a0	proxy: Refactor http conn pool (#9785 ) - Use the same ConnPoolEntry for http connection pool. - Rename EndpointConnPool to the HttpConnPool. - Narrow clone bound for client Fixes #9284	2024-11-20 19:36:29 +00:00
Vadim Kharitonov	313ebfdb88	[proxy] chore: allow bypassing empty `params` to `/sql` endpoint (#9827 ) ## Problem ``` curl -H "Neon-Connection-String: postgresql://neondb_owner:PASSWORD@ep-autumn-rain-a58lubg0.us-east-2.aws.neon.tech/neondb?sslmode=require" https://ep-autumn-rain-a58lubg0.us-east-2.aws.neon.tech/sql -d '{"query":"SELECT 1","params":[]}' ``` For such a query, I also need to send `params`. Do I really need it? ## Summary of changes I've marked `params` as optional	2024-11-20 19:36:23 +00:00
Conrad Ludgate	f36f0068b8	chore(proxy): demote more logs during successful connection attempts (#9828 ) Follow up to #9803 See https://github.com/neondatabase/cloud/issues/14378 In collaboration with @cloneable and @awarus, we sifted through logs and simply demoted some logs to debug. This is not at all finished and there are more logs to review, but we ran out of time in the session we organised. In any slightly more nuanced cases, we didn't touch the log, instead leaving a TODO comment. I've also slightly refactored the sql-over-http body read/length reject code. I can split that into a separate PR. It just felt natural after I switched to `read_body_with_limit` as we discussed during the meet.	2024-11-20 17:50:39 +00:00
Folke Behrens	bf7d859a8b	proxy: Rename RequestMonitoring to RequestContext (#9805 ) ## Problem It is called context/ctx everywhere and the Monitoring suffix needlessly confuses with proper monitoring code. ## Summary of changes * Rename RequestMonitoring to RequestContext * Rename RequestMonitoringInner to RequestContextInner	2024-11-20 12:50:36 +00:00
Conrad Ludgate	3ae0b2149e	chore(proxy): demote a ton of logs for successful connection attempts (#9803 ) See https://github.com/neondatabase/cloud/issues/14378 In collaboration with @cloneable and @awarus, we sifted through logs and simply demoted some logs to debug. This is not at all finished and there are more logs to review, but we ran out of time in the session we organised. In any slightly more nuanced cases, we didn't touch the log, instead leaving a TODO comment.	2024-11-20 10:14:28 +00:00
Conrad Ludgate	191f745c81	fix(proxy/auth_broker): ignore -pooler suffix (#9800 ) Fixes https://github.com/neondatabase/cloud/issues/20400 We cannot mix local_proxy and pgbouncer, so we are filtering out the `-pooler` suffix prior to calling wake_compute.	2024-11-19 13:58:26 +00:00
Conrad Ludgate	37b97b3a68	chore(local_proxy): reduce some startup logging (#9798 ) Currently, local_proxy will write an error log if it doesn't find the config file. This is expected for startup, so it's just noise. It is an error if we do receive an explicit SIGHUP though. I've also demoted the build info logs to be debug level. We don't need them in the compute image since we have other ways to determine what code is running. Lastly, I've demoted SIGHUP signal handling from warn to info, since it's not really a warning event. See https://github.com/neondatabase/cloud/issues/10880 for more details	2024-11-19 13:58:11 +00:00
Heikki Linnakangas	79929bb1b6	Disable `rust_2024_compatibility` lint option (#9615 ) Compiling with nightly rust compiler, I'm getting a lot of errors like this: error: `if let` assigns a shorter lifetime since Edition 2024 --> proxy/src/auth/backend/jwt.rs:226:16 \| 226 \| if let Some(permit) = self.try_acquire_permit() { \| ^^^^^^^^^^^^^^^^^^^------------------------- \| \| \| this value has a significant drop implementation which may observe a major change in drop order and requires your discretion \| = warning: this changes meaning in Rust 2024 = note: for more information, see issue #124085 <https://github.com/rust-lang/rust/issues/124085> help: the value is now dropped here in Edition 2024 --> proxy/src/auth/backend/jwt.rs:241:13 \| 241 \| } else { \| ^ note: the lint level is defined here --> proxy/src/lib.rs:8:5 \| 8 \| rust_2024_compatibility \| ^^^^^^^^^^^^^^^^^^^^^^^ = note: `#[deny(if_let_rescope)]` implied by `#[deny(rust_2024_compatibility)]` and this: error: these values and local bindings have significant drop implementation that will have a different drop order from that of Edition 2021 --> proxy/src/auth/backend/jwt.rs:376:18 \| 369 \| let client = Client::builder() \| ------ these values have significant drop implementation and will observe changes in drop order under Edition 2024 ... 376 \| map: DashMap::default(), \| ^^^^^^^^^^^^^^^^^^ \| = warning: this changes meaning in Rust 2024 = note: for more information, see issue #123739 <https://github.com/rust-lang/rust/issues/123739> = note: `#[deny(tail_expr_drop_order)]` implied by `#[deny(rust_2024_compatibility)]` They are caused by the `rust_2024_compatibility` lint option. When we actually switch to the 2024 edition, it makes sense to go through all these and check that the drop order changes don't break anything, but in the meanwhile, there's no easy way to avoid these errors. Disable it, to allow compiling with nightly again. 
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-11-08 08:35:03 +00:00
Conrad Ludgate	82e3f0ecba	[proxy/authorize]: improve JWKS reliability (#9676 ) While setting up some tests, I noticed that we didn't support keycloak. They make use of encryption JWKs as well as signature ones. Our current jwks crate does not support parsing encryption keys which caused the entire jwk set to fail to parse. Switching to lazy parsing fixes this. Also while setting up tests, I couldn't use localhost jwks server as we require HTTPS and we were using webpki so it was impossible to add a custom CA. Enabling native roots addresses this possibility. I saw some of our current e2e tests against our custom JWKS in s3 were taking a while to fetch. I've added a timeout + retries to address this.	2024-11-07 16:24:38 +00:00
Conrad Ludgate	73bdc9a2d0	[proxy]: minor changes to endpoint-cache handling (#9666 ) I think I meant to make these changes over 6 months ago. alas, better late than never. 1. should_reject doesn't eagerly intern the endpoint string 2. Rate limiter uses a std Mutex instead of a tokio Mutex. 3. Recently I introduced a `-local-proxy` endpoint suffix. I forgot to add this to normalize. 4. Random but a small cleanup making the ControlPlaneEvent deser directly to the interned strings.	2024-11-06 17:40:40 +00:00
Folke Behrens	bdd492b1d8	proxy: Replace "web(auth)" with "console redirect" everywhere (#9655 )	2024-11-06 11:03:38 +00:00
Folke Behrens	5d8284c7fe	proxy: Read cplane JWT with clap arg (#9654 )	2024-11-06 10:27:55 +00:00
Folke Behrens	ebc43efebc	proxy: Refactor cplane types (#9643 ) The overall idea of the PR is to rename a few types to make their purpose clearer, reduce abstraction where not needed, and move types to better-suited modules.	2024-11-05 23:03:53 +01:00
Folke Behrens	754d2950a3	proxy: Revert ControlPlaneEvent back to struct (#9649 ) Due to neondatabase/cloud#19815 we need to be more tolerant when reading events.	2024-11-05 21:32:33 +00:00
Conrad Ludgate	fcde40d600	[proxy] use the proxy protocol v2 command to silence some logs (#9620 ) The PROXY Protocol V2 offers a "command" concept. It can be of two different values. "Local" and "Proxy". The spec suggests that "Local" be used for health-checks. We can thus use this to silence logging for such health checks such as those from NLB. This additionally refactors the flow to be a bit more type-safe, self documenting and using zerocopy deser.	2024-11-05 17:23:00 +00:00
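The "command" concept from the PROXY protocol v2 commit above can be sketched as follows. In the v2 header, a 12-byte signature is followed by one byte whose high nibble is the version (2) and whose low nibble is the command (0 for LOCAL, 1 for PROXY). The names `Command`, `parse_command`, and `should_log` are hypothetical and do not come from the proxy code, which according to the commit uses zerocopy deserialization instead of manual indexing.

```rust
/// The fixed 12-byte signature that opens every PROXY protocol v2 header.
pub const SIGNATURE: [u8; 12] = [
    0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A,
];

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Command {
    /// Connection opened by the proxy itself, e.g. a health check.
    Local,
    /// Connection relayed on behalf of a real client.
    Proxy,
}

/// Reads the version/command byte that follows the signature.
/// Returns None for a short buffer, a wrong signature, a version other
/// than 2, or an unknown command.
pub fn parse_command(header: &[u8]) -> Option<Command> {
    if !header.starts_with(&SIGNATURE) {
        return None;
    }
    let ver_cmd = *header.get(12)?;
    if ver_cmd >> 4 != 2 {
        return None;
    }
    match ver_cmd & 0x0F {
        0 => Some(Command::Local),
        1 => Some(Command::Proxy),
        _ => None,
    }
}

/// Hypothetical policy: stay quiet for LOCAL connections such as
/// load balancer health checks.
pub fn should_log(command: Command) -> bool {
    command == Command::Proxy
}
```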
Ivan Efremov	2f1a56c8f9	proxy: Unify local and remote conn pool client structures (#9604 ) Unify client, EndpointConnPool and DbUserConnPool for remote and local conn. - Use new ClientDataEnum for additional client data. - Add ClientInnerCommon client structure. - Remove Client and EndpointConnPool code from local_conn_pool.rs	2024-11-05 17:33:41 +02:00
Folke Behrens	1085fe57d3	proxy: Rewrite ControlPlaneEvent as enum (#9627 )	2024-11-04 20:19:26 +01:00
Folke Behrens	59879985b4	proxy: Wrap JWT errors in separate AuthError variant (#9625 ) * Also rename `AuthFailed` variant to `PasswordFailed`. * Before this, all JWT errors ended up in `AuthError::AuthFailed()`, which expects a username and also causes cache invalidation.	2024-11-04 19:56:40 +01:00
Conrad Ludgate	81d1bb1941	quieten aws_config logs (#9626 ) logs during aws authentication are soooo noisy in staging 🙃	2024-11-04 17:28:10 +00:00
Conrad Ludgate	8ad1dbce72	[proxy]: parse proxy protocol TLVs with aws/azure support (#9610 ) AWS/azure private link shares extra information in the "TLV" values of the proxy protocol v2 header. This code doesn't action on it, but it parses it as appropriate.	2024-11-04 14:04:56 +00:00
Conrad Ludgate	3dcdbcc34d	remove aws-lc-rs dep and fix storage_broker tls (#9613 ) It seems the ecosystem is not so keen on moving to aws-lc-rs as its build setup is more complicated than ring (requiring cmake). Eventually I expect the ecosystem should pivot to https://github.com/ctz/graviola/tree/main/rustls-graviola as it stabilises (it has a very simple build step and license), but for now let's try not to have the headache of juggling two crypto libs. I also noticed that tonic will just fail with tls without a default provider, so I added some defensive code for that.	2024-11-04 13:29:13 +00:00
Conrad Ludgate	897cffb9d8	auth_broker: fix local_proxy conn count (#9593 ) our current metrics for http pool opened connections is always negative :D oops	2024-10-31 14:57:55 +00:00
Jakub Kołodziejczak	57499640c5	proxy: more granular http status codes for sql-over-http errors (#9549 ) closes #9532	2024-10-29 15:44:45 +00:00
Conrad Ludgate	47c35f67c3	[proxy]: fix JWT handling for AWS cognito. (#9536 ) In the base64 payload of an aws cognito jwt, I saw the following: ``` "iss":"https:\/\/cognito-idp.us-west-2.amazonaws.com\/us-west-2_redacted" ``` issuers are supposed to be URLs, and URLs are always valid un-escaped JSON. However, `\/` is a valid escape character so what AWS is doing is technically correct... sigh... This PR refactors the test suite and adds a new regression test for cognito.	2024-10-29 11:01:09 +00:00
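The Cognito issue above comes down to comparing the issuer after JSON string unescaping instead of comparing the raw text. The sketch below is a hypothetical, deliberately partial unescaper written for illustration. It does not handle `\uXXXX` escapes and is not the proxy's implementation; real code would rely on a full JSON parser.

```rust
/// Decodes the simple two-character JSON escapes in the contents of a
/// JSON string (the text between the quotes). Returns None on a
/// trailing backslash or an escape this sketch does not handle.
pub fn unescape_json_str(raw: &str) -> Option<String> {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next()? {
            // `\/` is a legal escape for a plain forward slash.
            '/' => out.push('/'),
            '\\' => out.push('\\'),
            '"' => out.push('"'),
            'n' => out.push('\n'),
            't' => out.push('\t'),
            'r' => out.push('\r'),
            'b' => out.push('\u{0008}'),
            'f' => out.push('\u{000C}'),
            // `\uXXXX` and anything else is out of scope here.
            _ => return None,
        }
    }
    Some(out)
}
```

Applied to the escaped issuer from the commit message, this returns `https://cognito-idp.us-west-2.amazonaws.com/us-west-2_redacted`, which can then be compared against the expected issuer URL.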
Conrad Ludgate	25f1e5cfeb	[proxy] demote warnings and remove dead-argument (#9512 ) fixes https://github.com/neondatabase/cloud/issues/19000	2024-10-28 15:02:20 +00:00
Conrad Ludgate	dbadb0f9bb	proxy: propagate session IDs (#9509 ) fixes #9367 by sending session IDs to local_proxy, and also returns session IDs to the client for easier debugging.	2024-10-25 14:34:19 +00:00
Jakub Kołodziejczak	9768f09f6b	proxy: don't follow redirects for user provided JWKS urls + set custom user agent (#9514 ) partially fixes https://github.com/neondatabase/cloud/issues/19249 ref https://docs.rs/reqwest/latest/reqwest/redirect/index.html > By default, a Client will automatically handle HTTP redirects, having a maximum redirect chain of 10 hops. To customize this behavior, a redirect::Policy can be used with a ClientBuilder.	2024-10-25 14:04:41 +02:00
Folke Behrens	92d5e0e87a	proxy: clear lib.rs of code items (#9479 ) We keep lib.rs for crate configs, lint configs and re-exports for the binaries.	2024-10-23 08:21:28 +02:00
Ivan Efremov	2dcac94194	proxy: Use common error interface for error handling with cplane (#9454 ) - Remove obsolete error handles. - Use one source of truth for cplane errors. #18468	2024-10-21 17:20:09 +03:00
Conrad Ludgate	cc25ef7342	bump pg-session-jwt version (#9455 ) forgot to bump this before	2024-10-20 14:42:50 +02:00
Conrad Ludgate	5cbdec9c79	[local_proxy]: install pg_session_jwt extension on demand (#9370 ) Follow up on #9344. We want to install the extension automatically. We didn't want to couple the extension into compute_ctl so instead local_proxy is the one to issue requests specific to the extension. depends on #9344 and #9395	2024-10-18 14:41:21 +01:00
Conrad Ludgate	b8304f90d6	2024 oct new clippy lints (#9448 ) Fixes new lints from `cargo +nightly clippy` (`clippy 0.1.83 (798fb83f 2024-10-16)`)	2024-10-18 10:27:50 +01:00
Conrad Ludgate	d762ad0883	update rustls (#9396 ) The forever ongoing effort of juggling multiple versions of rustls :3 now with new crypto library aws-lc. Because of dependencies, it is currently impossible to not have both ring and aws-lc in the dep tree, therefore our only options are not updating rustls or having both crypto backends enabled... According to benchmarks run by the rustls maintainer, aws-lc is faster than ring in some cases too <https://jbp.io/graviola/>, so it's not without its upsides.	2024-10-17 20:45:37 +01:00
Ivan Efremov	22d8834474	proxy: move the connection pools to separate file (#9398 ) First PR for #9284 Start unification of the client and connection pool interfaces: - Exclude the 'global_connections_count' out from the get_conn_entry() - Move remote connection pools to the conn_pool_lib as a reference - Unify clients among all the conn pools	2024-10-17 13:38:24 +03:00

689 Commits