rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-10 23:12:54 +00:00

Author	SHA1	Message	Date
Conrad Ludgate	d6beb3ffbb	[proxy] rewrite pg-text to json routines (#12413 ) We would like to move towards an arena system for JSON encoding the responses. This change pushes an "out" parameter into the pg-text to json routines to make swapping in an arena system easier in the future. (see #11992) This additionally removes the redundant `column: &[Type]` argument, as well as rewriting the pg_array parser. --- I rewrote the pg_array parser since while making these changes I found it hard to reason about. I went back to the specification and rewrote it from scratch. There are 4 separate routines: 1. pg_array_parse - checks for any prelude (multidimensional array ranges) 2. pg_array_parse_inner - only deals with the arrays themselves 3. pg_array_parse_item - parses a single item from the array, this might be quoted, unquoted, or another nested array. 4. pg_array_parse_quoted - parses a quoted string, following the relevant string escaping rules.	2025-07-02 12:46:11 +00:00
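The four-routine split described in this commit can be sketched roughly as follows. This is a simplified illustration, not the proxy's code: the `Item` enum and the `String` error type are assumptions made for the example, it builds an owned tree instead of writing JSON into an "out" parameter, and it ignores backslash escapes in unquoted items.

```rust
/// One parsed element of a Postgres text-format array (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Item {
    Null,
    Text(String),
    Array(Vec<Item>),
}

/// Entry point: skips an optional dimension prelude such as `[1:3]=`.
pub fn pg_array_parse(input: &str) -> Result<Item, String> {
    let body = match input.find('{') {
        Some(i) if input.starts_with('[') => &input[i..],
        _ => input,
    };
    let (item, rest) = pg_array_parse_inner(body)?;
    if rest.is_empty() {
        Ok(item)
    } else {
        Err(format!("trailing input: {rest:?}"))
    }
}

/// Parses `{a,b,...}` and returns the array plus the unconsumed remainder.
fn pg_array_parse_inner(input: &str) -> Result<(Item, &str), String> {
    let mut rest = input.strip_prefix('{').ok_or("expected '{'")?;
    let mut items = Vec::new();
    if let Some(after) = rest.strip_prefix('}') {
        return Ok((Item::Array(items), after)); // empty array `{}`
    }
    loop {
        let (item, after) = pg_array_parse_item(rest)?;
        items.push(item);
        if let Some(after) = after.strip_prefix(',') {
            rest = after;
        } else if let Some(after) = after.strip_prefix('}') {
            return Ok((Item::Array(items), after));
        } else {
            return Err("expected ',' or '}'".to_string());
        }
    }
}

/// Parses one item: a nested array, a quoted string, or an unquoted token.
fn pg_array_parse_item(input: &str) -> Result<(Item, &str), String> {
    if input.starts_with('{') {
        return pg_array_parse_inner(input);
    }
    if let Some(quoted) = input.strip_prefix('"') {
        let (text, rest) = pg_array_parse_quoted(quoted)?;
        return Ok((Item::Text(text), rest));
    }
    // Unquoted: runs until the next delimiter or closing brace.
    let end = input
        .find(|c: char| c == ',' || c == '}')
        .ok_or("unterminated item")?;
    let (token, rest) = input.split_at(end);
    if token.is_empty() {
        return Err("empty item".to_string());
    }
    let item = if token.eq_ignore_ascii_case("NULL") {
        Item::Null
    } else {
        Item::Text(token.to_string())
    };
    Ok((item, rest))
}

/// Parses a quoted string body (opening quote already consumed),
/// where a backslash escapes the character that follows it.
fn pg_array_parse_quoted(input: &str) -> Result<(String, &str), String> {
    let mut out = String::new();
    let mut chars = input.char_indices();
    while let Some((i, c)) = chars.next() {
        match c {
            '"' => return Ok((out, &input[i + 1..])),
            '\\' => match chars.next() {
                Some((_, escaped)) => out.push(escaped),
                None => break,
            },
            _ => out.push(c),
        }
    }
    Err("unterminated quoted string".to_string())
}
```

Each routine returns the unconsumed remainder of the input, which is what lets the item parser recurse into nested arrays without a shared cursor.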
Conrad Ludgate	4932963bac	[proxy]: don't log user errors from postgres (#12412 ) ## Problem #8843 User initiated sql queries are being classified as "postgres" errors, whereas they're really user errors. ## Summary of changes Classify user-initiated postgres errors as user errors if they are related to a sql query that we ran on their behalf. Do not log those errors.	2025-07-01 13:03:34 +00:00
Folke Behrens	0ee15002fc	proxy: Move client connection accept and handshake to pglb (#12380 ) * This must be a no-op. * Move proxy::task_main to pglb::task_main. * Move client accept, TLS and handshake to pglb. * Keep auth and wake in proxy.	2025-06-27 15:20:23 +00:00
Conrad Ludgate	fd1e8ec257	[proxy] review and cleanup CLI args (#12167 ) I was looking at how we could expose our proxy config as toml again, and as I was writing out the schema format, I noticed some cruft in our CLI args that no longer seem to be in use. The redis change is the most complex, but I am pretty sure it's sound. Since https://github.com/neondatabase/cloud/pull/15613 cplane no longer publishes to the global redis instance.	2025-06-26 11:25:41 +00:00
Conrad Ludgate	a298d2c29b	[proxy] replace the batch cancellation queue, shorten the TTL for cancel keys (#11943 ) See #11942 Idea: * if connections are short lived, they can get enqueued and then also remove themselves later if they never made it to redis. This reduces the load on the queue. * short lived connections (<10m, most?) will only issue 1 command, we remove the delete command and rely on ttl. * we can enqueue as many commands as we want, as we can always cancel the enqueue, thanks to the ~~intrusive linked lists~~ `BTreeMap`.	2025-06-20 11:48:01 +00:00
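The idea of a queue whose entries can remove themselves before being processed can be sketched with a `BTreeMap` keyed by a monotonically increasing ticket. This is a minimal single-threaded illustration only; the `CancellableQueue` type and its method names are hypothetical and are not taken from the proxy.

```rust
use std::collections::BTreeMap;

/// A FIFO queue whose entries can be cancelled by ticket before processing.
pub struct CancellableQueue<T> {
    next_id: u64,
    pending: BTreeMap<u64, T>,
}

impl<T> CancellableQueue<T> {
    pub fn new() -> Self {
        Self { next_id: 0, pending: BTreeMap::new() }
    }

    /// Enqueue a command; the returned ticket can be used to cancel it later.
    pub fn enqueue(&mut self, cmd: T) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.pending.insert(id, cmd);
        id
    }

    /// Remove a command that has not been processed yet.
    /// Returns the command if it was still queued.
    pub fn cancel(&mut self, ticket: u64) -> Option<T> {
        self.pending.remove(&ticket)
    }

    /// Take the oldest pending command (the smallest ticket).
    pub fn pop_next(&mut self) -> Option<T> {
        self.pending.pop_first().map(|(_, cmd)| cmd)
    }
}
```

Because tickets only grow, iteration order in the map matches insertion order, so the map gives FIFO processing plus logarithmic-time removal of any entry.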
Folke Behrens	1dce65308d	Update base64 to 0.22 (#12215 ) ## Problem Base64 0.13 is outdated. ## Summary of changes Update base64 to 0.22. Affects mostly proxy and proxy libs. Also upgrade serde_with to remove another dep on base64 0.13 from dep tree.	2025-06-12 16:12:47 +00:00
Conrad Ludgate	67b94c5992	[proxy] per endpoint configuration for rate limits (#12148 ) https://github.com/neondatabase/cloud/issues/28333 Adds a new `rate_limit` response type to EndpointAccessControl, uses it for rate limiting, and adds a generic invalidation for the cache.	2025-06-10 14:26:08 +00:00
Folke Behrens	e38193c530	proxy: Move connect_to_compute back to proxy (#12181 ) It's mostly responsible for waking, retrying, and caching. A new, thin wrapper around compute_once will be PGLB's entry point	2025-06-10 11:23:03 +00:00
Conrad Ludgate	58327ef74d	[proxy] fix sql-over-http password setting (#12177 ) ## Problem Looks like our sql-over-http tests get to rely on "trust" authentication, so the path that made sure the authkeys data was set was never being hit. ## Summary of changes Slight refactor to WakeComputeBackends, as well as making sure auth keys are propagated. Fix tests to ensure passwords are tested.	2025-06-10 08:46:29 +00:00
Conrad Ludgate	4d99b6ff4d	[proxy] separate compute connect from compute authentication (#12145 ) ## Problem PGLB/Neonkeeper needs to separate the concerns of connecting to compute, and authenticating to compute. Additionally, the code within `connect_to_compute` is rather messy, spending effort on recovering the authentication info after wake_compute. ## Summary of changes Split `ConnCfg` into `ConnectInfo` and `AuthInfo`. `wake_compute` only returns `ConnectInfo` and `AuthInfo` is determined separately from the `handshake`/`authenticate` process. Additionally, `ConnectInfo::connect_raw` is in charge of establishing the TLS connection, and the `postgres_client::Config::connect_raw` is configured to use `NoTls` which will force it to skip the TLS negotiation. This should just work.	2025-06-06 10:29:55 +00:00
Folke Behrens	1577665c20	proxy: Move PGLB-related modules into pglb root module. (#12144 ) Split the modules responsible for passing data and connecting to compute from auth and waking for PGLB. This PR just moves files. The waking is going to get removed from pglb after this.	2025-06-05 11:00:23 +00:00
Conrad Ludgate	c8a96cf722	update proxy protocol parsing to not use a rw wrapper (#12035 ) ## Problem I believe in all environments we now specify either required/rejected for proxy-protocol V2. We no longer rely on the supported flow. This means we no longer need to keep around read bytes in case they're not in a header. While I designed ChainRW to be fast (the hot path with an empty buffer is very easy to branch predict), it's still unnecessary. ## Summary of changes * Remove the ChainRW wrapper * Refactor how we read the proxy-protocol header using read_exact. Slightly worse perf but it's hardly significant. * Don't try and parse the header if it's rejected.	2025-06-05 07:12:00 +00:00
Conrad Ludgate	589bfdfd02	proxy: Changes to rate limits and GetEndpointAccessControl caches. (#12048 ) Precursor to https://github.com/neondatabase/cloud/issues/28333. We want per-endpoint configuration for rate limits, which will be distributed via the `GetEndpointAccessControl` API. This lays some of the ground work. 1. Allow the endpoint rate limiter to accept a custom leaky bucket config on check. 2. Remove the unused auth rate limiter, as I don't want to think about how it fits into this. 3. Refactor the caching of `GetEndpointAccessControl`, as it adds friction for adding new cached data to the API. That third one was rather large. I couldn't find any way to split it up. The core idea is that there's now only 2 cache APIs. `get_endpoint_access_controls` and `get_role_access_controls`. I'm pretty sure the behaviour is unchanged, except I did a drive by change to fix #8989 because it felt harmless. The change in question is that when a password validation fails, we eagerly expire the role cache if the role was cached for 5 minutes. This is to allow for edge cases where a user tries to connect with a reset password, but the cache never expires the entry due to some redis related quirk (lag, or misconfiguration, or cplane error)	2025-06-02 08:38:35 +00:00
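Point 1 above, a leaky bucket that accepts its configuration on each check, might look something like the following. This is a sketch under assumptions: `LeakyBucketConfig`, `LeakyBucket`, and the field names are illustrative and may not match the proxy's rate limiter.

```rust
use std::time::Instant;

/// Per-endpoint limits, passed in on every check (hypothetical shape).
#[derive(Clone, Copy)]
pub struct LeakyBucketConfig {
    /// Requests drained from the bucket per second.
    pub rps: f64,
    /// Maximum number of requests the bucket can hold.
    pub max: f64,
}

pub struct LeakyBucket {
    level: f64,
    last: Instant,
}

impl LeakyBucket {
    pub fn new(now: Instant) -> Self {
        Self { level: 0.0, last: now }
    }

    /// Returns true if one more request is admitted under `config`.
    pub fn check(&mut self, config: LeakyBucketConfig, now: Instant) -> bool {
        // Drain the bucket according to the time elapsed since the last check.
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.level = (self.level - elapsed * config.rps).max(0.0);
        self.last = now;
        if self.level + 1.0 > config.max {
            return false;
        }
        self.level += 1.0;
        true
    }
}
```

Keeping the config out of the bucket state means the same bucket can be checked against a default config or an endpoint-specific one without being rebuilt.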
Conrad Ludgate	87179e26b3	completely rewrite pq_proto (#12085 ) libs/pqproto is designed for safekeeper/pageserver with maximum throughput. proxy only needs it for handshakes/authentication where throughput is not a concern but memory efficiency is. For this reason, we switch to using read_exact and only allocating as much memory as we need to. All reads return a `&'a [u8]` instead of a `Bytes` because accidental sharing of bytes can cause fragmentation. Returning the reference enforces all callers only hold onto the bytes they absolutely need. For example, before this change, `pqproto` was allocating 8KiB for the initial read `BytesMut`, and proxy was holding the `Bytes` in the `StartupMessageParams` for the entire connection through to passthrough.	2025-06-01 18:41:45 +00:00
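The `read_exact` approach described here, allocating only what the message needs and handing back a borrowed slice, can be illustrated with a blocking `std::io::Read` sketch. The function name `read_message` and the `MAX_PAYLOAD` cap are assumptions for the example; the proxy itself works with async I/O.

```rust
use std::io::{self, Read};

/// Upper bound on payload size; an assumed value for this sketch.
const MAX_PAYLOAD: usize = 64 * 1024;

/// Reads one tagged Postgres message into `buf` and returns (tag, payload).
/// The payload borrows from `buf`, so callers cannot hold it past the next read.
pub fn read_message<'a, R: Read>(
    reader: &mut R,
    buf: &'a mut Vec<u8>,
) -> io::Result<(u8, &'a [u8])> {
    // Header: 1 tag byte followed by a big-endian 4-byte length.
    let mut header = [0u8; 5];
    reader.read_exact(&mut header)?;
    let tag = header[0];
    let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]) as usize;

    // The length field counts itself (4 bytes) but not the tag byte.
    let payload_len = match len.checked_sub(4) {
        Some(n) if n <= MAX_PAYLOAD => n,
        _ => return Err(io::Error::new(io::ErrorKind::InvalidData, "bad message length")),
    };

    // Size the buffer to exactly this message, then fill it.
    buf.clear();
    buf.resize(payload_len, 0);
    reader.read_exact(buf.as_mut_slice())?;
    Ok((tag, &buf[..]))
}
```

Returning `&'a [u8]` rather than an owned or reference-counted buffer makes the borrow checker enforce that callers copy out only the bytes they need to keep.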
Conrad Ludgate	6768a71c86	proxy(tokio-postgres): refactor typeinfo query to occur earlier (#11993 ) ## Problem For #11992 I realised we need to get the type info before executing the query. This is important to know how to decode rows with custom types, eg the following query:
```sql
CREATE TYPE foo AS ENUM ('foo','bar','baz');
SELECT ARRAY['foo'::foo, 'bar'::foo, 'baz'::foo] AS data;
```
Getting that to work was harder than it seems. The original tokio-postgres setup has a split between `Client` and `Connection`, where messages are passed between. Because multiple clients were supported, each client message included a dedicated response channel. Each request would be terminated by the `ReadyForQuery` message. The flow I opted to use for parsing types early would not trigger a `ReadyForQuery`. The flow is as follows:
```
PARSE ""    // parse the user provided query
DESCRIBE "" // describe the query, returning param/result type oids
FLUSH       // force postgres to flush the responses early

// wait for descriptions

  // check if we know the types, if we don't then
  // setup the typeinfo query and execute it against each OID:

  PARSE typeinfo    // prepare our typeinfo query
  DESCRIBE typeinfo
  FLUSH             // force postgres to flush the responses early

  // wait for typeinfo statement

    // for each OID we don't know:
    BIND typeinfo
    EXECUTE
    FLUSH

    // wait for type info, might reveal more OIDs to inspect

  // close the typeinfo query, we cache the OID->type map and this is kinder to pgbouncer.
  CLOSE typeinfo

// finally once we know all the OIDs:
BIND ""   // bind the user provided query - already parsed - to the user provided params
EXECUTE   // run the user provided query
SYNC      // commit the transaction
```
## Summary of changes Please review commit by commit. The main challenge was allowing one query to issue multiple sub-queries. To do this I first made sure that the client could fully own the connection, which required removing any shared client state. I then had to replace the way responses are sent to the client, by using only a single permanent channel. This required some additional effort to track which query is being processed. Lastly I had to modify the query/typeinfo functions to not issue `sync` commands, so it would fit into the desired flow above. To note: the flow above does force an extra roundtrip into each query. I don't know yet if this has a measurable latency overhead.	2025-05-23 19:41:12 +00:00
Conrad Ludgate	bef5954fd7	feat(proxy): track SNI usage by protocol, including for http (#11863 ) ## Problem We want to see how many users of the legacy serverless driver are still using the old URL for SQL-over-HTTP traffic. ## Summary of changes Adds a protocol field to the connections_by_sni metric. Ensures it's incremented for sql-over-http.	2025-05-08 16:46:57 +00:00
Jakub Kołodziejczak	79ee78ea32	feat(compute): enable audit logs for pg_session_jwt extension (#11829 ) related to https://github.com/neondatabase/cloud/issues/28480 related to https://github.com/neondatabase/pg_session_jwt/pull/36 cc @MihaiBojin @conradludgate @lneves12	2025-05-06 15:18:50 +00:00
Conrad Ludgate	6131d86ec9	proxy: allow invalid SNI (#11792 ) ## Problem Some PrivateLink customers are unable to use Private DNS. As such they use an invalid domain name to address Neon. We currently are rejecting those connections because we cannot resolve the correct certificate. ## Summary of changes 1. Ensure a certificate is always returned. 2. If there is an SNI field, use endpoint fallback if it doesn't match. I suggest reviewing each commit separately.	2025-05-05 11:18:55 +00:00
Folke Behrens	ec9079f483	Allow unwrap() in tests when clippy::unwrap_used is denied (#11616 ) ## Problem The proxy denies using `unwrap()`s in regular code, but we want to use it in test code and so have to allow it for each test block. ## Summary of changes Set `allow-unwrap-in-tests = true` in clippy.toml and remove all exceptions.	2025-04-16 20:05:21 +00:00
Ivan Efremov	b9b25e13a0	feat(proxy): Return prefixed errors to testodrome (#11561 ) Testodrome measures uptime based on the failed requests and errors. For testodrome requests, we send back an error based on the service. This will help us distinguish error types in testodrome and rely on the uptime SLI.	2025-04-16 19:03:23 +00:00
Conrad Ludgate	fc233794f6	fix(proxy): make sure that sql-over-http is TLS aware (#11612 ) I noticed that while auth-broker -> local-proxy is TLS aware, and TCP proxy -> postgres is TLS aware, HTTP proxy -> postgres is not 😅	2025-04-16 18:37:17 +00:00
Conrad Ludgate	72832b3214	chore: fix clippy lints from nightly-2025-03-16 (#11273 ) I like to run nightly clippy every so often to make our future rust upgrades easier. Some notable changes: * Prefer `next_back()` over `last()`. Generic iterators will implement `last()` to run forward through the iterator until the end. * Prefer `io::Error::other()`. * Use implicit returns One case where I haven't dealt with the issues is the now [more-sensitive "large enum variant" lint](https://github.com/rust-lang/rust-clippy/pull/13833). I chose not to take any decisions around it here, and simply marked them as allow for now.	2025-04-09 15:04:42 +00:00
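Two of the lint-driven changes listed in this commit can be shown in a few lines. The helper functions below are illustrative only and are not taken from the repository; `io::Error::other` requires Rust 1.74 or newer.

```rust
use std::io;

/// Last path segment. `next_back()` on a `DoubleEndedIterator` reads from
/// the end, whereas `last()` would walk every segment from the front.
pub fn last_segment(path: &str) -> Option<&str> {
    path.split('/').next_back()
}

/// Builds an `io::Error` of kind `Other`. Shorter than the equivalent
/// `io::Error::new(io::ErrorKind::Other, msg)`.
pub fn to_io_error(msg: &str) -> io::Error {
    io::Error::other(msg.to_string())
}
```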
Luís Tavares	43a7423f72	feat: bump pg_session_jwt extension to 0.3.0 (#11399 ) ## Problem Bumps https://github.com/neondatabase/pg_session_jwt to the latest release [v0.3.0](https://github.com/neondatabase/pg_session_jwt/releases/tag/v0.3.0) that introduces PostgREST fallback mechanisms. ## Summary of changes Updates the extension download tar and the extension version in the proxy constant. ## Subscribers @mrl5	2025-04-03 13:01:18 +00:00
Folke Behrens	4bb7087d4d	proxy: Fix some clippy warnings coming in next versions (#11359 )	2025-03-26 10:50:16 +00:00
Ivan Efremov	86fe26c676	fix(proxy): Fix testodrome HTTP header handling in proxy (#11292 ) Relates to #22486	2025-03-18 15:14:08 +00:00
Conrad Ludgate	7fe5a689b4	feat(proxy): export ingress metrics (#11244 ) ## Problem We exposed the direction tag in #10925 but didn't actually include the ingress tag in the export to allow for an adaption period. ## Summary of changes We now export the ingress direction	2025-03-14 13:54:57 +00:00
Conrad Ludgate	3dec117572	feat(compute_ctl): use TLS if configured (#10972 ) Closes: https://github.com/neondatabase/cloud/issues/22998 If control-plane reports that TLS should be used, load the certificates (and watch for updates), make sure postgres uses them, and detect updates. Procedure: 1. Load certificates 2. Reconfigure postgres/pgbouncer 3. Loop on a timer until certificates have loaded 4. Go to 1 Notes: 1. We only run this procedure if requested on startup by control plane. 2. We needed to compile pgbouncer with openssl enabled 3. Postgres doesn't allow tls keys to be globally accessible - must be read only to the postgres user. I couldn't convince the autoscaling team to let me put this logic into the VM settings, so instead compute_ctl will copy the keys to be read-only by postgres. 4. To mitigate a race condition, we also verify that the key matches the cert.	2025-03-13 15:03:22 +00:00
Conrad Ludgate	7aec1364dd	chore(proxy): remove enum and composite type queries (#11178 ) In our json encoding, we only need to know about array types. Information about composites or enums is not actually used. Enums are quite popular, so needing to type query them when not needed can add some latency cost for no gain.	2025-03-12 15:47:17 +00:00
Ivan Efremov	011f7c21a3	fix(proxy): Add testodrome query id HTTP header (#11167 ) Handle "X-Neon-Query-ID" header to glue data with testodrome queries. Relates to the #22486	2025-03-11 17:17:30 +00:00
Conrad Ludgate	d1b60fa0b6	fix(proxy): delete prepared statements when discarding (#11165 ) Fixes https://github.com/neondatabase/serverless/issues/144 When tables have enums, we need to perform type queries for that data. We cache these query statements for performance reasons. In Neon RLS, we run "discard all" for security reasons, which discards all the statements. When we need to type check again, the statements are no longer valid. This fixes it to discard the statements as well. I've also added some new logs and error types to monitor this. Currently we don't see the prepared statement errors in our logs.	2025-03-11 10:48:50 +00:00
Conrad Ludgate	d9ced89ec0	feat(proxy): require TLS to compute if prompted by cplane (#10717 ) https://github.com/neondatabase/cloud/issues/23008 For TLS between proxy and compute, we are using an internally provisioned CA to sign the compute certificates. This change ensures that proxy will load them from a supplied env var pointing to the correct file - this file and env var will be configured later, using a kubernetes secret. Control plane responds with a `server_name` field if and only if the compute uses TLS. This server name is the name we use to validate the certificate. Control plane still sends us the IP to connect to as well (to support overlay IP). To support this change, I'd had to split `host` and `host_addr` into separate fields. Using `host_addr` and bypassing `lookup_addr` if possible (which is what happens in production). `host` then is only used for the TLS connection. There's no blocker to merging this. The code paths will not be triggered until the new control plane is deployed and the `enableTLS` compute flag is enabled on a project.	2025-02-28 14:20:25 +00:00
Folke Behrens	0d36f52a6c	proxy: Record and export user-agent header (#10955 ) neondatabase/cloud#24464	2025-02-26 11:39:34 +00:00
Arpad Müller	fdde58120c	Upgrade proxy crates to edition 2024 (#10942 ) This upgrades the `proxy/` crate as well as the forked libraries in `libs/proxy/` to edition 2024. Also reformats the imports of those forked libraries via: ``` cargo +nightly fmt -p proxy -p postgres-protocol2 -p postgres-types2 -p tokio-postgres2 -- -l --config imports_granularity=Module,group_imports=StdExternalCrate,reorder_imports=true ``` It can be read commit-by-commit: the first commit has no formatting changes, only changes to accommodate the new edition. Part of #10918	2025-02-24 15:26:28 +00:00
Conrad Ludgate	fb77f28326	feat(proxy): add direction and private link id to billing export (#10925 ) ref: https://github.com/neondatabase/cloud/issues/23385 Adds a direction flag as well as private-link ID to the traffic reporting pipeline. We do not yet actually count ingress, but we include the flag anyway. I have additionally moved vpce_id string parsing earlier, since we expect it to be utf8 (ascii).	2025-02-24 11:49:11 +00:00
Conrad Ludgate	719ec378cd	fix(local_proxy): discard all in tx (#10864 ) ## Problem `discard all` cannot run in a transaction (even if implicit) ## Summary of changes Split up the query into two, we don't need transaction support.	2025-02-18 08:54:20 +00:00
Conrad Ludgate	3204efc860	chore(proxy): use specially named prepared statements for type-checking (#10843 ) I was looking into https://github.com/neondatabase/serverless/issues/144, I recall previous cases where proxy would trigger these prepared statements which would conflict with other statements prepared by our client downstream. Because of that, and also to aid in debugging, I've made sure all prepared statements that proxy needs to make have specific names that likely won't conflict and make it clear in an error log if it's our statements that are causing issues	2025-02-17 16:19:57 +00:00
Folke Behrens	da7496e1ee	proxy: Post-refactor + future clippy lint cleanup (#10824 ) * Clean up deps and code after logging and binary refactor * Also include future clippy lint cleanup	2025-02-14 12:34:09 +00:00
Heikki Linnakangas	635b67508b	Split utils::http to separate crate (#10753 ) Avoids compiling the crate and its dependencies into binaries that don't need them. Shrinks the compute_ctl binary from about 31MB to 28MB in the release-line-debug-size-lto profile.	2025-02-11 22:06:53 +00:00
Folke Behrens	f62bc28086	proxy: Move binaries into the lib (#10758 ) * This way all clippy lints defined in the lib also cover the binary code. * It's much easier to detect unused code. * Fix all discovered lints.	2025-02-11 19:46:23 +00:00
Ivan Efremov	73633e27ed	fix(proxy): Log errors from the local proxy in auth-broker (#10659 ) Handle errors from local proxy by parsing HTTP response in auth broker code Closes [#19476](https://github.com/neondatabase/cloud/issues/19476)	2025-02-10 16:06:13 +00:00
Stefan Radig	6dd48ba148	feat(proxy): Implement access control with VPC endpoint checks and block for public internet / VPC (#10143 ) - Wired up filtering on VPC endpoints - Wired up block access from public internet / VPC depending on per project flag - Added cache invalidation for VPC endpoints (partially based on PR from Raphael) - Removed BackendIpAllowlist trait --------- Co-authored-by: Ivan Efremov <ivan@neon.tech>	2025-01-31 20:32:57 +00:00
Conrad Ludgate	738bf83583	chore: replace dashmap with clashmap (#10582 ) ## Problem Because dashmap 6 switched to hashbrown RawTable API, it required us to use unsafe code in the upgrade: https://github.com/neondatabase/neon/pull/8107 ## Summary of changes Switch to clashmap, a fork maintained by me which removes much of the unsafe and ultimately switches to HashTable instead of RawTable to remove much of the unsafe requirement on us.	2025-01-31 09:53:43 +00:00
Ivan Efremov	222cc181e9	impr(proxy): Move the CancelMap to Redis hashes (#10364 ) ## Problem The approach of having CancelMap as an in-memory structure increases code complexity, as well as putting additional load for Redis streams. ## Summary of changes - Implement a set of KV ops for Redis client; - Remove cancel notifications code; - Send KV ops over the bounded channel to the handling background task for removing and adding the cancel keys. Closes #9660	2025-01-29 11:19:10 +00:00
Conrad Ludgate	a338aee132	feat(local_proxy): use ed25519 signatures with pg_session_jwt (#10290 ) Generally ed25519 seems to be much preferred for cryptographic strength to P256 nowadays, and it is NIST approved finally. We should use it where we can as it's also faster than p256. This PR makes the re-signed JWTs between local_proxy and pg_session_jwt use ed25519. This does introduce a new dependency on ed25519, but I do recall some Neon Authorise customers asking for support for ed25519, so I am justifying this dependency addition in the context that we can then introduce support for customer ed25519 keys. Sources: * https://csrc.nist.gov/pubs/fips/186-5/final subsection 7 (EdDSA) * https://datatracker.ietf.org/doc/html/rfc8037#section-3.1	2025-01-13 15:20:46 +00:00
Conrad Ludgate	38c7a2abfc	chore(proxy): pre-load native tls certificates and propagate compute client config (#10182 ) Now that we construct the TLS client config for cancellation as well as connect, it feels appropriate to construct the same config once and re-use it elsewhere. It might also help should #7500 require any extra setup, so we can easily add it to all the appropriate call sites.	2025-01-02 09:36:13 +00:00
Conrad Ludgate	59b7ff8988	chore(proxy): disallow unwrap and unimplemented (#10142 ) As the title says, I updated the lint rules to no longer allow unwrap or unimplemented. Three special cases: * Tests are allowed to use them * std::sync::Mutex lock().unwrap() is common because it's usually correct to continue panicking on poison * `tokio::spawn_blocking(...).await.unwrap()` is common because it will only error if the blocking fn panics, so continuing the panic is also correct I've introduced two extension traits to help with these last two, that are a bit more explicit so they don't need an expect message every time.	2024-12-16 16:37:15 +00:00
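An extension trait of the kind mentioned for the `Mutex` case could look roughly like this. The trait and method names here are assumptions for illustration, and the `spawn_blocking` counterpart is omitted to keep the sketch free of external crates.

```rust
use std::sync::{Mutex, MutexGuard};

/// Extension trait so call sites avoid a bare `lock().unwrap()`
/// (hypothetical name and signature).
pub trait MutexExt<T> {
    /// Locks the mutex, panicking if another thread panicked while holding it.
    fn lock_propagate_poison(&self) -> MutexGuard<'_, T>;
}

impl<T> MutexExt<T> for Mutex<T> {
    fn lock_propagate_poison(&self) -> MutexGuard<'_, T> {
        match self.lock() {
            Ok(guard) => guard,
            // A poisoned lock means another thread already panicked,
            // so continuing that panic is the intended behaviour.
            Err(poison) => panic!("mutex poisoned: {poison}"),
        }
    }
}
```

The method name documents the intent at the call site, so no `expect` message is needed each time and the `unwrap_used` lint stays enabled everywhere else.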
Conrad Ludgate	24d6587914	chore(proxy): refactor self-signed config (#10154 ) ## Problem While reviewing #10152 I found it tricky to actually determine whether the connection used `allow_self_signed_compute` or not. I've tried to remove this setting in the past: * https://github.com/neondatabase/neon/pull/7884 * https://github.com/neondatabase/neon/pull/7437 * https://github.com/neondatabase/cloud/pull/13702 But each time it seems it is used by e2e tests ## Summary of changes The `node_info.allow_self_signed_computes` is always initialised to false, and then sometimes inherits the proxy config value. There's no need for this to be in the node_info, so removing it and propagating it via `TcpMechanism` is simpler.	2024-12-16 11:15:25 +00:00
Conrad Ludgate	bd52822e14	feat(proxy): add option to forward startup params (#9979 ) (stacked on #9990 and #9995) Partially fixes #1287 with a custom option field to enable the fixed behaviour. This allows us to gradually roll out the fix without silently changing the observed behaviour for our customers. related to https://github.com/neondatabase/cloud/issues/15284	2024-12-04 12:58:35 +00:00
Conrad Ludgate	9ef0662a42	chore(proxy): enforce single host+port (#9995 ) proxy doesn't ever provide multiple hosts/ports, so this code adds a lot of complexity of error handling for no good reason. (stacked on #9990)	2024-12-03 20:00:14 +00:00
Conrad Ludgate	27a42d0f96	chore(proxy): remove postgres config parser and md5 support (#9990 ) Keeping the `mock` postgres cplane adaptor using "stock" tokio-postgres allows us to remove a lot of dead weight from our actual postgres connection logic.	2024-12-03 18:39:23 +00:00

180 Commits