rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2025-12-28 00:23:00 +00:00

Author	SHA1	Message	Date
Conrad Ludgate	719ec378cd	fix(local_proxy): discard all in tx (#10864 ) ## Problem `discard all` cannot run in a transaction (even if implicit) ## Summary of changes Split up the query into two, we don't need transaction support.	2025-02-18 08:54:20 +00:00
Conrad Ludgate	3204efc860	chore(proxy): use specially named prepared statements for type-checking (#10843 ) I was looking into https://github.com/neondatabase/serverless/issues/144, I recall previous cases where proxy would trigger these prepared statements which would conflict with other statements prepared by our client downstream. Because of that, and also to aid in debugging, I've made sure all prepared statements that proxy needs to make have specific names that likely won't conflict and makes it clear in a error log if it's our statements that are causing issues	2025-02-17 16:19:57 +00:00
Folke Behrens	da7496e1ee	proxy: Post-refactor + future clippy lint cleanup (#10824 ) * Clean up deps and code after logging and binary refactor * Also include future clippy lint cleanup	2025-02-14 12:34:09 +00:00
Ivan Efremov	356cca23a5	fix(proxy): Change HSet to HDel for cancellation key metric (#10789 )	2025-02-13 10:22:13 +00:00
Heikki Linnakangas	635b67508b	Split utils::http to separate crate (#10753 ) Avoids compiling the crate and its dependencies into binaries that don't need them. Shrinks the compute_ctl binary from about 31MB to 28MB in the release-line-debug-size-lto profile.	2025-02-11 22:06:53 +00:00
Folke Behrens	f62bc28086	proxy: Move binaries into the lib (#10758 ) * This way all clippy lints defined in the lib also cover the binary code. * It's much easier to detect unused code. * Fix all discovered lints.	2025-02-11 19:46:23 +00:00
Tristan Partin	da9c101939	Implement a second HTTP server within compute_ctl (#10574 ) The compute_ctl HTTP server has the following purposes: - Allow management via the control plane - Provide an endpoint for scaping metrics - Provide APIs for compute internal clients - Neon Postgres extension for installing remote extensions - local_proxy for installing extensions and adding grants The first two purposes require the HTTP server to be available outside the compute. The Neon threat model is a bad actor within our internal network. We need to reduce the surface area of attack. By exposing unnecessary unauthenticated HTTP endpoints to the internal network, we increase the surface area of attack. For endpoints described in the third bullet point, we can just run an extra HTTP server, which is only bound to the loopback interface since all consumers of those endpoints are within the compute.	2025-02-11 18:02:22 +00:00
Arpad Müller	a4ea1e53ae	Apply Azure SDK patch to periodically load workload identity file (#10415 ) The SDK bug https://github.com/Azure/azure-sdk-for-rust/issues/1739 was originally worked around via #10378, but now upstream has provided a fix in [this](https://github.com/Azure/azure-sdk-for-rust/pull/1997) PR, which we've been asked to test. So this is what this PR is doing: revert #10378 (to make sure we fail if the bug isn't fixed by the SDK PR), and apply the SDK PR to our fork. Currently pointing to my local branch to check CI. I'd like to merge the [SDK fork PR](https://github.com/neondatabase/azure-sdk-for-rust/pull/2) before merging this to main.	2025-02-11 09:40:22 +00:00
Ivan Efremov	73633e27ed	fix(proxy): Log errors from the local proxy in auth-broker (#10659 ) Handle errors from local proxy by parsing HTTP response in auth broker code Closes [#19476](https://github.com/neondatabase/cloud/issues/19476)	2025-02-10 16:06:13 +00:00
Ivan Efremov	e7118213ab	impr(proxy): Set TTL for Redis cancellation map keys (#10671 ) Use expire() op to set TTL for Redis cancellation key	2025-02-10 10:51:53 +00:00
Fedor Dikarev	8f651f9582	switch from localtest.me to local.neon.build (#10714 ) ## Problem Ref: https://github.com/neondatabase/neon/issues/10632 We use dns named `.localtest.me` in our test, and that domain is well-known and widely used for that, with all the records there resolve to the localhost, both IPv4 and IPv6: `127.0.0.1` and `::1` In some cases on our runners these addresses resolves only to `IPv6`, and so components fail to connect when runner doesn't have `IPv6` address. We suspect issue in systemd-resolved here (https://github.com/systemd/systemd/issues/17745) To workaround that and improve test stability, we introduced our own domain `.local.neon.build` with IPv4 address `127.0.0.1` only See full details and troubleshoot log in referred issue. p.s. If you're FritzBox user, don't forget to add that domain `local.neon.build` to the `DNS Rebind Protection` section under `Home Network -> Network -> Network Settings`, otherwise FritzBox will block addresses, resolving to the local addresses. For other devices/vendors, please check corresponding documentation, if resolving `local.neon.build` will produce empty answer for you. ## Summary of changes Replace all the occurrences of `localtest.me` with `local.neon.build`	2025-02-07 12:25:16 +00:00
Folke Behrens	dcf335a251	proxy: Switch proxy to JSON logging (#9857 ) ## Problem We want to switch proxy and ideally all Rust services to structured JSON logging to support better filtering and cross-referencing with tracing. ## Summary of changes * Introduce a custom tracing-subscriber to write the JSON. In a first attempt a customized tracing::fmt::FmtSubscriber was used, but it's very inefficient and can still generate invalid JSON. It's also doesn't allow us to add important fields to the root object. * Make this opt in: the `LOGFMT` env var can be set to `"json"` to enable to new logger at startup.	2025-02-04 14:50:53 +00:00
Folke Behrens	628a9616c4	fix(proxy): Don't use --is-private-access-proxy to disable IP check (#10633 ) ## Problem * The behavior of this flag changed. Plus, it's not necessary to disable the IP check as long as there are no IPs listed in the local postgres. ## Summary of changes * Drop the flag from the command in the README.md section. * Change the postgres URL passed to proxy to not use the endpoint hostname. * Also swap postgres creation and proxy startup, so the DB is running when proxy comes up.	2025-02-03 14:12:41 +00:00
Stefan Radig	6dd48ba148	feat(proxy): Implement access control with VPC endpoint checks and block for public internet / VPC (#10143 ) - Wired up filtering on VPC endpoints - Wired up block access from public internet / VPC depending on per project flag - Added cache invalidation for VPC endpoints (partially based on PR from Raphael) - Removed BackendIpAllowlist trait --------- Co-authored-by: Ivan Efremov <ivan@neon.tech>	2025-01-31 20:32:57 +00:00
Conrad Ludgate	ad1a41157a	feat(proxy): optimizing the chances of large write in copy_bidirectional (#10608 ) We forked copy_bidirectional to solve some issues like fast-shutdown (disallowing half-open connections) and to introduce better error tracking (which side of the conn closed down). A change recently made its way upstream offering performance improvements: https://github.com/tokio-rs/tokio/pull/6532. These seem applicable to our fork, thus it makes sense to apply them here as well.	2025-01-31 19:14:27 +00:00
Conrad Ludgate	738bf83583	chore: replace dashmap with clashmap (#10582 ) ## Problem Because dashmap 6 switched to hashbrown RawTable API, it required us to use unsafe code in the upgrade: https://github.com/neondatabase/neon/pull/8107 ## Summary of changes Switch to clashmap, a fork maintained by me which removes much of the unsafe and ultimately switches to HashTable instead of RawTable to remove much of the unsafe requirement on us.	2025-01-31 09:53:43 +00:00
Ivan Efremov	222cc181e9	impr(proxy): Move the CancelMap to Redis hashes (#10364 ) ## Problem The approach of having CancelMap as an in-memory structure increases code complexity, as well as putting additional load for Redis streams. ## Summary of changes - Implement a set of KV ops for Redis client; - Remove cancel notifications code; - Send KV ops over the bounded channel to the handling background task for removing and adding the cancel keys. Closes #9660	2025-01-29 11:19:10 +00:00
Folke Behrens	ae4b2af299	fix(proxy): Use correct identifier for usage metrics upload (#10538 ) ## Problem The request data and usage metrics S3 requests use the same identifier shown in logs, causing confusion about what type of upload failed. ## Summary of changes Use the correct identifier for usage metrics uploads. neondatabase/cloud#23084	2025-01-28 17:08:17 +00:00
Conrad Ludgate	df4abd8b14	fix: force-refresh azure identity token (#10378 ) ## Problem Because of https://github.com/Azure/azure-sdk-for-rust/issues/1739, our identity token file was not being refreshed. This caused our uploads to start failing when the storage token expired. ## Summary of changes Drop and recreate the remote storage config every time we upload in order to force reload the identity token file.	2025-01-14 12:53:32 +00:00
Conrad Ludgate	a338aee132	feat(local_proxy): use ed25519 signatures with pg_session_jwt (#10290 ) Generally ed25519 seems to be much preferred for cryptographic strength to P256 nowadays, and it is NIST approved finally. We should use it where we can as it's also faster than p256. This PR makes the re-signed JWTs between local_proxy and pg_session_jwt use ed25519. This does introduce a new dependency on ed25519, but I do recall some Neon Authorise customers asking for support for ed25519, so I am justifying this dependency addition in the context that we can then introduce support for customer ed25519 keys sources: * https://csrc.nist.gov/pubs/fips/186-5/final subsection 7 (EdDSA) * https://datatracker.ietf.org/doc/html/rfc8037#section-3.1	2025-01-13 15:20:46 +00:00
Conrad Ludgate	735c66dc65	fix(proxy): propagate the existing ComputeUserInfo to connect for cancellation (#10322 ) ## Problem We were incorrectly constructing the ComputeUserInfo, used for cancellation checks, based on the return parameters from postgres. This didn't contain the correct info. ## Summary of changes Propagate down the existing ComputeUserInfo.	2025-01-10 09:36:51 +00:00
Folke Behrens	77660f3d88	proxy: Fix parsing of UnknownTopic with payload (#10339 ) ## Problem When the proxy receives a `Notification` with an unknown topic it's supposed to use the `UnknownTopic` unit variant. Unfortunately, in adjacently tagged enums serde will not simply ignore the configured content if found and try to deserialize a map/object instead. ## Summary of changes * Use a custom deserialize function to ignore variant content. * Add a little unit test covering both cases.	2025-01-10 09:12:31 +00:00
Ivan Efremov	fcfff72454	impr(proxy): Decouple ip_allowlist from the CancelClosure (#10199 ) This PR removes the direct dependency of the IP allowlist from CancelClosure, allowing for more scalable and flexible IP restrictions and enabling the future use of Redis-based CancelMap storage. Changes: - Introduce a new BackendAuth async trait that retrieves the IP allowlist through existing authentication methods; - Improve cancellation error handling by instrument() async cancel_sesion() rather than dropping it. - Set and store IP allowlist for SCRAM Proxy to consistently perform IP allowance check Relates to #9660	2025-01-08 19:34:53 +00:00
Folke Behrens	0a117fb1f1	proxy: Parse Notification twice only for unknown topic (#10296 ) ## Problem We currently parse Notification twice even in the happy path. ## Summary of changes Use `#[serde(other)]` to catch unknown topics and defer the second parsing.	2025-01-07 15:24:54 +00:00
Conrad Ludgate	fda52a0005	feat(proxy): dont trigger error alerts for unknown topics (#10266 ) ## Problem Before the holidays, and just before our code freeze, a change to cplane was made that started publishing the topics from #10197. This triggered our alerts and put us in a sticky situation as it was not an error, and we didn't want to silence the alert for the entire holidays, and we didn't want to release proxy 2 days in a row if it was not essential. We fixed it eventually by rewriting the alert based on logs, but this is not a good solution. ## Summary of changes Introduces an intermediate parsing step to check the topic name first, to allow us to ignore parsing errors for any topics we do not know about.	2025-01-06 13:05:35 +00:00
Ivan Efremov	7a598b9842	[proxy/docs]imprv: Add local testing section to proxy README (#10230 ) Add commands to run proxy locally with the mocked control plane	2025-01-03 10:04:58 +00:00
Raphael 'kena' Poss	1622fd8bda	proxy: recognize but ignore the 3 new redis message types (#10197 ) ## Problem https://neondb.slack.com/archives/C085MBDUSS2/p1734604792755369 ## Summary of changes Recognize and ignore the 3 new broadcast messages: - `/block_public_or_vpc_access_updated` - `/allowed_vpc_endpoints_updated_for_org` - `/allowed_vpc_endpoints_updated_for_projects`	2025-01-02 16:02:48 +00:00
Folke Behrens	ee22d4c9ef	proxy: Set TCP_NODELAY for compute connections (#10240 ) neondatabase/cloud#19184	2025-01-02 13:32:24 +00:00
Conrad Ludgate	38c7a2abfc	chore(proxy): pre-load native tls certificates and propagate compute client config (#10182 ) Now that we construct the TLS client config for cancellation as well as connect, it feels appropriate to construct the same config once and re-use it elsewhere. It might also help should #7500 require any extra setup, so we can easily add it to all the appropriate call sites.	2025-01-02 09:36:13 +00:00
Folke Behrens	b135194090	proxy: Delay SASL complete message until auth is done (#10189 ) The final SASL complete message can be bundled with the remainder of the auth flow messages until ReadyForQuery. neondatabase/cloud#19184	2024-12-19 10:37:08 +00:00
Conrad Ludgate	d63602cc78	chore(proxy): fully remove allow-self-signed-compute flag (#10168 ) When https://github.com/neondatabase/cloud/pull/21856 is merged, this flag is no longer necessary.	2024-12-18 16:03:14 +00:00
Ivan Efremov	93e958341f	[proxy]: Use TLS for cancellation queries (#10152 ) ## Problem pg_sni_router assumes that all the streams are upgradable to TLS. Cancellation requests were declined because of using NoTls config. ## Summary of changes Provide TLS client config for cancellation requests. Fixes [#21789](https://github.com/orgs/neondatabase/projects/65/views/1?pane=issue&itemId=90911361&issue=neondatabase%7Ccloud%7C21789)	2024-12-17 19:26:54 +00:00
Folke Behrens	aa7ab9b3ac	proxy: Allow dumping TLS session keys for debugging (#10163 ) ## Problem To debug issues with TLS connections there's no easy way to decrypt packets unless a client has special support for logging the keys. ## Summary of changes Add TLS session keys logging to proxy via `SSLKEYLOGFILE` env var gated by flag.	2024-12-16 18:56:24 +00:00
Conrad Ludgate	59b7ff8988	chore(proxy): disallow unwrap and unimplemented (#10142 ) As the title says, I updated the lint rules to no longer allow unwrap or unimplemented. Three special cases: * Tests are allowed to use them * std::sync::Mutex lock().unwrap() is common because it's usually correct to continue panicking on poison * `tokio::spawn_blocking(...).await.unwrap()` is common because it will only error if the blocking fn panics, so continuing the panic is also correct I've introduced two extension traits to help with these last two, that are a bit more explicit so they don't need an expect message every time.	2024-12-16 16:37:15 +00:00
Conrad Ludgate	2e4c9c5704	chore(proxy): remove allow_self_signed from regular proxy (#10157 ) I noticed that the only place we use this flag is for testing console redirect proxy. Makes sense to me to make this assumption more explicit.	2024-12-16 16:11:39 +00:00
Conrad Ludgate	24d6587914	chore(proxy): refactor self-signed config (#10154 ) ## Problem While reviewing #10152 I found it tricky to actually determine whether the connection used `allow_self_signed_compute` or not. I've tried to remove this setting in the past: * https://github.com/neondatabase/neon/pull/7884 * https://github.com/neondatabase/neon/pull/7437 * https://github.com/neondatabase/cloud/pull/13702 But each time it seems it is used by e2e tests ## Summary of changes The `node_info.allow_self_signed_computes` is always initialised to false, and then sometimes inherits the proxy config value. There's no need this needs to be in the node_info, so removing it and propagating it via `TcpMechansim` is simpler.	2024-12-16 11:15:25 +00:00
Conrad Ludgate	e502e880b5	chore(proxy): remove code for old API (#10109 ) ## Problem Now that https://github.com/neondatabase/cloud/issues/15245 is done, we can remove the old code. ## Summary of changes Removes support for the ManagementV2 API, in favour of the ProxyV1 API.	2024-12-12 13:42:50 +00:00
Ivan Efremov	34c1295594	[proxy] impr: Additional logging for cancellation queries (#10039 ) ## Problem Since cancellation tasks spawned in the background sometimes logs missing context. https://neondb.slack.com/archives/C060N3SEF9D/p1733427801527419?thread_ts=1733419882.560159&cid=C060N3SEF9D ## Summary of changes Add `session_id` and change loglevel for cancellation queries	2024-12-10 10:14:28 +00:00
Ivan Efremov	ffc9c33eb2	proxy: Present new auth backend cplane_proxy_v1 (#10012 ) Implement a new auth backend based on the current Neon backend to switch to the new Proxy V1 cplane API. Implements [#21048](https://github.com/neondatabase/cloud/issues/21048)	2024-12-05 05:30:38 +00:00
Conrad Ludgate	bd52822e14	feat(proxy): add option to forward startup params (#9979 ) (stacked on #9990 and #9995) Partially fixes #1287 with a custom option field to enable the fixed behaviour. This allows us to gradually roll out the fix without silently changing the observed behaviour for our customers. related to https://github.com/neondatabase/cloud/issues/15284	2024-12-04 12:58:35 +00:00
Conrad Ludgate	9ef0662a42	chore(proxy): enforce single host+port (#9995 ) proxy doesn't ever provide multiple hosts/ports, so this code adds a lot of complexity of error handling for no good reason. (stacked on #9990)	2024-12-03 20:00:14 +00:00
Alexey Immoreev	3baef0bca3	Improvement: add console redirect timeout warning (#9985 ) ## Problem There is no information on session being cancelled in 2 minutes at the moment ## Summary of changes The timeout being logged for the user	2024-12-03 18:59:44 +00:00
Conrad Ludgate	27a42d0f96	chore(proxy): remove postgres config parser and md5 support (#9990 ) Keeping the `mock` postgres cplane adaptor using "stock" tokio-postgres allows us to remove a lot of dead weight from our actual postgres connection logic.	2024-12-03 18:39:23 +00:00
Conrad Ludgate	2dc238e5b3	feat(proxy): emit JWT auth method and JWT issuer in parquet logs (#9971 ) Fix the HTTP AuthMethod to accomodate the JWT authorization method. Introduces the JWT issuer as an additional field in the parquet logs	2024-12-02 17:54:32 +00:00
Folke Behrens	1b60571636	proxy: Create Elasticache credentials provider lazily (#9967 ) ## Problem The credentials providers tries to connect to AWS STS even when we use plain Redis connections. ## Summary of changes * Construct the CredentialsProvider only when needed ("irsa").	2024-12-02 15:38:12 +00:00
Conrad Ludgate	cd1d2d1996	fix(proxy): forward notifications from authentication (#9948 ) Fixes https://github.com/neondatabase/cloud/issues/20973. This refactors `connect_raw` in order to return direct access to the delayed notices. I cannot find a way to test this with psycopg2 unfortunately, although testing it with psql does return the expected results.	2024-12-02 12:29:57 +00:00
Folke Behrens	4abc8e5282	Merge the consumption metric pushes (#9939 ) #8564 ## Problem The main and backup consumption metric pushes are completely independent, resulting in different event time windows and different idempotency keys. ## Summary of changes * Merge the push tasks, but keep chunks the same size.	2024-11-30 10:11:37 +00:00
John Spray	d5624cc505	pageserver: download small objects using a smaller timeout (#9938 ) ## Problem It appears that the Azure storage API tends to hang TCP connections more than S3 does. Currently we use a 2 minute timeout for all downloads. This is large because sometimes the objects we download are large. However, waiting 2 minutes when doing something like downloading a manifest on tenant attach is problematic, because when someone is doing a "create tenant, create timeline" workflow, that 2 minutes is long enough for them reasonably to give up creating that timeline. Rather than propagate oversized timeouts further up the stack, we should use a different timeout for objects that we expect to be small. Closes: https://github.com/neondatabase/neon/issues/9836 ## Summary of changes - Add a `small_timeout` configuration attribute to remote storage, defaulting to 30 seconds (still a very generous period to do something like download an index) - Add a DownloadKind parameter to DownloadOpts, so that callers can indicate whether they expect the object to be small or large. - In the azure client, use small timeout for HEAD requests, and for GET requests if DownloadKind::Small is used. - Use DownloadKind::Small for manifests, indices, and heatmap downloads. This PR intentionally does not make the equivalent change to the S3 client, to reduce blast radius in case this has unexpected consequences (we could accomplish the same thing by editing lots of configs, but just skipping the code is simpler for right now)	2024-11-29 15:11:44 +00:00
Conrad Ludgate	1d642d6a57	chore(proxy): vendor a subset of rust-postgres (#9930 ) Our rust-postgres fork is getting messy. Mostly because proxy wants more control over the raw protocol than tokio-postgres provides. As such, it's diverging more and more. Storage and compute also make use of rust-postgres, but in more normal usage, thus they don't need our crazy changes. Idea: * proxy maintains their subset * other teams use a minimal patch set against upstream rust-postgres Reviewing this code will be difficult. To implement it, I 1. Copied tokio-postgres, postgres-protocol and postgres-types from `00940fcdb5` 2. Updated their package names with the `2` suffix to make them compile in the workspace. 3. Updated proxy to use those packages 4. Copied in the code from tokio-postgres-rustls 0.13 (with some patches applied https://github.com/jbg/tokio-postgres-rustls/pull/32 https://github.com/jbg/tokio-postgres-rustls/pull/33) 5. Removed as much dead code as I could find in the vendored libraries 6. Updated the tokio-postgres-rustls code to use our existing channel binding implementation	2024-11-29 11:08:01 +00:00
Ivan Efremov	8173dc600a	proxy: spawn cancellation checks in the background (#9918 ) ## Problem For cancellation, a connection is open during all the cancel checks. ## Summary of changes Spawn cancellation checks in the background, and close connection immediately. Use task_tracker for cancellation checks.	2024-11-28 06:32:22 +00:00

1 2 3 4 5 ...

580 Commits