rust/neon - neon - Gitea: Git with a cup of tea


mirror of https://github.com/neondatabase/neon.git synced 2026-01-03 19:42:55 +00:00

Author	SHA1	Message	Date
Conrad Ludgate	c0ff4f18dc	proxy: hyper1 for only proxy (#7073) ## Problem hyper1 offers control over the HTTP connection that hyper0_14 does not. We're blocked on switching all services to hyper1 because of how we use tonic, but there's no reason we can't switch proxy over. ## Summary of changes 1. hyper0.14 -> hyper1 1. self-managed server 2. Remove the `WithConnectionGuard` wrapper from `protocol2` 2. Remove TLS listener as it's no longer necessary 3. include first session ID in connection startup logs	2024-04-10 08:23:59 +00:00
Anna Khanova	dbac2d2c47	Proxy read ids from redis (#7205) ## Problem Proxy doesn't know about existing endpoints. ## Summary of changes * Added caching of all available endpoints. * Under high load, use it before going to cplane. * Report metrics for the outcome. * For rate limiter and credentials caching, don't distinguish between `-pooled` and non-pooled endpoints. TODOs: * Make metrics more meaningful * Consider integrating it with the endpoint rate limiter * Test it together with cplane in preview	2024-04-10 02:40:14 +02:00
Conrad Ludgate	55da8eff4f	proxy: report metrics based on cold start info (#7324) ## Problem It would be nice to have a bit more info in cold start metrics. ## Summary of changes * Change connect compute latency to include `cold_start_info`. * Update `ColdStartInfo` to include HttpPoolHit and WarmCached. * Several changes to make more use of interned strings	2024-04-05 16:14:50 +01:00
Anna Khanova	7ce613354e	Fix length (#7308) ## Problem Bug ## Summary of changes Use `compressed_data.len()` instead of `data.len()`.	2024-04-04 10:29:10 +00:00
Conrad Ludgate	d8da51e78a	remove http timeout (#7291) ## Problem https://github.com/neondatabase/cloud/issues/11051 Additionally, I felt the HTTP logic was a bit complex. ## Summary of changes 1. Removes timeout for HTTP requests. 2. Split out header parsing to a `HttpHeaders` type. 3. Moved db client handling to `QueryData::process` and `BatchQueryData::process` to simplify the logic of `handle_inner` a bit.	2024-04-03 11:23:26 +01:00
Anna Khanova	582cec53c5	proxy: upload consumption events to S3 (#7213) ## Problem If Vector is unavailable, we miss consumption events. https://github.com/neondatabase/cloud/issues/9826 ## Summary of changes Added integration with the consumption bucket.	2024-04-02 21:46:23 +02:00
Conrad Ludgate	12512f3173	add authentication rate limiting (#6865) ## Problem https://github.com/neondatabase/cloud/issues/9642 ## Summary of changes 1. Make `EndpointRateLimiter` generic, renamed to `BucketRateLimiter` 2. Add support for claiming multiple tokens at once 3. Add `AuthRateLimiter` alias. 4. Check `(Endpoint, IP)` pair during authentication, weighted by how many hashes proxy would be doing. TODO: handle IPv6 subnets; will do this in a separate PR.	2024-03-26 19:31:19 +00:00
Anna Khanova	6c18109734	proxy: reuse sess_id as request_id for the cplane requests (#7245) ## Problem https://github.com/neondatabase/cloud/issues/11599 ## Summary of changes Reuse the same sess_id for requests within a single session. TODO: get rid of `session_id` in query params.	2024-03-26 11:27:48 +00:00
Conrad Ludgate	72103d481d	proxy: fix stack overflow in cancel publisher (#7212) ## Problem Stack overflow in blanket impl for `CancellationPublisher` ## Summary of changes Removes `async_trait` and fixes the impl order to make it non-recursive.	2024-03-23 06:36:58 +00:00
Conrad Ludgate	77f3a30440	proxy: unit tests for auth_quirks (#7199) ## Problem I noticed code coverage for auth_quirks was pretty bare ## Summary of changes Adds 3 happy-path unit tests for auth_quirks * scram * cleartext (websockets) * cleartext (password hack)	2024-03-22 13:31:10 +00:00
Anna Khanova	6770ddba2e	proxy: connect redis with AWS IAM (#7189) ## Problem Support IAM Roles for Service Accounts for authentication. ## Summary of changes * Obtain 15-minute AWS credentials * Retrieve redis password from credentials * Refresh every hour to keep the connection alive for more than 12h * For now, allow different endpoints for pubsub/stream redis. TODOs: * PubSub doesn't support credentials refresh, consider using stream instead. * We need an AWS role for proxy to be able to connect to both S3 and ElastiCache. Obtaining credentials and refreshing the connection were tested on xenon preview. https://github.com/neondatabase/cloud/issues/10365	2024-03-22 09:38:04 +01:00
Conrad Ludgate	d5304337cf	proxy: simplify password validation (#7188) ## Problem For HTTP/WS/password hack flows we imitate SCRAM to validate passwords. This code was unnecessarily complicated. ## Summary of changes Copy in the `pbkdf2` and 'derive keys' steps from the `postgres_protocol` crate in our `rust-postgres` fork. Derive the `client_key`, `server_key` and `stored_key` from the password directly. Use constant-time equality to compare the `stored_key` and `server_key` with the ones sent from cplane.	2024-03-21 13:54:06 +00:00
Vlad Lazar	c75b584430	storage_controller: add metrics (#7178) ## Problem Storage controller had basically no metrics. ## Summary of changes 1. Migrate the existing metrics to use Conrad's [`measured`](https://docs.rs/measured/0.0.14/measured/) crate. 2. Add metrics for incoming http requests 3. Add metrics for outgoing http requests to the pageserver 4. Add metrics for outgoing pass-through requests to the pageserver 5. Add metrics for database queries Note that the metrics response for the attachment service does not use chunked encoding like the rest of the metrics endpoints. Conrad has kindly extended the crate such that it can now be done. Let's leave it for a follow-up since the payload shouldn't be that big at this point. Fixes https://github.com/neondatabase/neon/issues/6875	2024-03-21 12:00:20 +00:00
Conrad Ludgate	5ec6862bcf	proxy: async aware password validation (#7176) ## Problem spawn_blocking in #7171 was a hack ## Summary of changes https://github.com/neondatabase/rust-postgres/pull/29	2024-03-21 11:58:41 +01:00
Conrad Ludgate	6d996427b1	proxy: enable sha2 asm support (#7184) ## Problem Faster sha2 hashing. ## Summary of changes Enable the asm feature for sha2. This feature will be the default in sha2 0.11, so we might as well lean into it now. It provides a noticeable speed boost on macOS aarch64. Haven't tested on x86 though.	2024-03-20 12:26:31 +00:00
Conrad Ludgate	49be446d95	async password validation (#7171) ## Problem Password hashing can block the main thread ## Summary of changes spawn_blocking the password hash call	2024-03-18 23:57:32 +01:00
Anna Khanova	46098ea0ea	proxy: add more missing warm logging (#7133) ## Problem There is one more missing thing about cached connections for `cold_start_info`. ## Summary of changes Fix and add comments.	2024-03-15 11:13:15 +00:00
Conrad Ludgate	3bd6551b36	proxy http cancellation safety (#7117) ## Problem hyper auto-cancels the request futures on connection close. `sql_over_http::handle` is not 'drop cancel safe', so we need to do some other work to make sure connections are queried in the right way. ## Summary of changes 1. tokio::spawn the request handler to resolve the initial cancel-safety issue 2. share a cancellation token, and cancel it when the request `Service` is dropped. 3. Add a new log span to be able to track the HTTP connection lifecycle.	2024-03-14 08:20:56 +00:00
Anna Khanova	b0aff04157	proxy: add new dimension to exclude cplane latency (#7011) ## Problem Currently, cplane communication is included in the latency monitoring, which prevents setting up proper alerting based on proxy latency. ## Summary of changes Added a dimension to exclude cplane latency.	2024-03-13 13:50:05 +01:00
Anna Khanova	0554bee022	proxy: Report warm cold start if connection is from the local cache (#7104) ## Problem * quotes in serialized string * no status if connection is from local cache ## Summary of changes * remove quotes * report warm if connection is from local cache	2024-03-13 11:45:19 +00:00
Conrad Ludgate	83855a907c	proxy http error classification (#7098) ## Problem Missing error classification for SQL-over-HTTP queries. Not respecting `UserFacingError` for SQL-over-HTTP queries. ## Summary of changes Adds error classification. Adds user-facing errors.	2024-03-13 07:35:49 +01:00
Conrad Ludgate	1f7d54f987	proxy refactor tls listener (#7056) ## Problem Now that we have tls-listener vendored, we can refactor and remove a lot of bloated code and make the whole flow a bit simpler ## Summary of changes 1. Remove dead code 2. Move the error handling to inside the `TlsListener` accept() function 3. Extract the peer_addr from the PROXY protocol header and log it with errors	2024-03-12 13:05:40 +00:00
Conrad Ludgate	09699d4bd8	proxy: cancel http queries on timeout (#7031) ## Problem On HTTP query timeout, we should try and cancel the current in-flight SQL query. ## Summary of changes Trigger a cancellation command in postgres once the timeout is reached	2024-03-12 11:52:00 +00:00
Conrad Ludgate	cc5d6c66b3	proxy: categorise new cplane error message (#7057) ## Problem `422 Unprocessable Entity: compute time quota of non-primary branches is exceeded` is being marked as a control plane error. ## Summary of changes Add the manual checks to make this a user error that should not be retried.	2024-03-11 09:20:09 +01:00
Conrad Ludgate	2c132e45cb	proxy: do not store ephemeral endpoints in http pool (#6819) ## Problem For the ephemeral endpoint feature, it's not very helpful to keep them around in the connection pool. This isn't really pressing but I think it's still a bit better this way. ## Summary of changes Add `is_ephemeral` function to `NeonOptions`. Allow `serverless::ConnInfo::endpoint_cache_key()` to return an `Option`. Handle that option appropriately	2024-03-08 07:56:23 +00:00
Conrad Ludgate	02358b21a4	update rustls (#7048) ## Summary of changes Update rustls from 0.21 to 0.22. reqwest/tonic/aws-smithy still use rustls 0.21; no upgrade route is available yet.	2024-03-07 18:23:19 +00:00
Conrad Ludgate	c2876ec55d	proxy http tls investigations (#7045) ## Problem Some HTTP-specific TLS errors ## Summary of changes Add more logging, vendor `tls-listener` with minor modifications.	2024-03-07 12:36:47 +00:00
Anna Khanova	15b3665dc4	proxy: fix bug with populating the data (#7023) ## Problem Branch/project and coldStart were not populated in data events. ## Summary of changes Populate them. Also added logging for the coldstart info.	2024-03-05 15:32:58 +00:00
Anna Khanova	bdbb2f4afc	proxy: report redis broken message metric (#7021) ## Problem Not really a problem. Improving visibility around redis communication. ## Summary of changes Added a metric for the number of broken messages.	2024-03-05 16:02:51 +01:00
Joonas Koivunen	752bf5a22f	build: clippy disallow futures::pin_mut macro (#7016) `std` has had the `pin!` macro for some time, so there is no need for us to use the older alternatives. Cannot disallow `tokio::pin` because tokio macros use that.	2024-03-05 10:14:37 +00:00
Anna Khanova	3114be034a	proxy: change is cold start to enum (#6948) ## Problem It's a good idea to distinguish the case where it's a cold start but we took the compute from the pool. ## Summary of changes Updated to enum.	2024-03-04 10:31:28 +01:00
Arpad Müller	82853cc1d1	Fix warnings and compile errors on nightly (#6886) Nightly has added a bunch of compiler and linter warnings. There are also two dependencies that fail compilation on latest nightly due to using the old `stdsimd` feature name. This PR fixes them.	2024-03-01 17:14:19 +01:00
Conrad Ludgate	48957e23b7	proxy: refactor span usage (#6946) ## Problem Hard to find error reasons by endpoint for HTTP flow. ## Summary of changes I want all root spans to have session id and endpoint id. I want all root spans to be consistent.	2024-02-28 17:10:07 +04:00
Anna Khanova	896d51367e	proxy: introduce is cold start for analytics (#6902) ## Problem The data team cannot distinguish between cold starts and non-cold starts. ## Summary of changes Report `is_cold_start` to analytics. --------- Co-authored-by: Conrad Ludgate <conrad@neon.tech>	2024-02-27 19:53:02 +04:00
Anna Khanova	1718c0b59b	Proxy: cancel query on connection drop (#6832) ## Problem https://github.com/neondatabase/cloud/issues/10259 ## Summary of changes Make sure that the request is dropped once the connection is dropped.	2024-02-21 22:43:55 +00:00
Conrad Ludgate	60e5a56a5a	proxy: include client IP in ip deny message (#6854) ## Problem Debugging IP deny errors is difficult for our users ## Summary of changes Include the client IP in the deny message	2024-02-21 18:24:59 +01:00
Conrad Ludgate	e0af945f8f	proxy: improve error classification (#6841) ## Problem ## Summary of changes 1. Classify further cplane API errors 2. add 'serviceratelimit' and make a few of the timeout errors return that. 3. a few additional minor changes	2024-02-21 10:04:09 +00:00
Conrad Ludgate	21a86487a2	proxy: fix #6529 (#6807) ## Problem `application_name` for HTTP is not being recorded ## Summary of changes get `application_name` query param	2024-02-20 11:58:01 +01:00
Conrad Ludgate	686b3c79c8	http2 alpn (#6815) ## Problem Proxy already supported HTTP2, but I expect no one is using it because we don't advertise it in the TLS handshake. ## Summary of changes #6335 without the websocket changes.	2024-02-20 10:44:46 +00:00
Conrad Ludgate	d0d4871682	proxy: use postgres_protocol scram/sasl code (#4748) 1) `scram::password` was used in tests only and can be replaced with `postgres_protocol::password`. 2) `postgres_protocol::authentication::sasl` provides a client impl of SASL which improves our ability to test	2024-02-19 12:54:17 +00:00
Joonas Koivunen	80854b98ff	move timeouts and cancellation handling to remote_storage (#6697) Cancellation and timeouts are handled at remote_storage callsites, if they are. However, they should always be handled, because we've had transient problems with remote storage connections. - Add cancellation token to the `trait RemoteStorage` methods - For `download`, `list` methods there is `DownloadError::{Cancelled,Timeout}` - For the rest now using `anyhow::Error`, it will have root cause `remote_storage::TimeoutOrCancel::{Cancel,Timeout}` - Both types have `::is_permanent` equivalent which should be passed to `backoff::retry` - New generic RemoteStorageConfig option `timeout`, defaults to 120s - Start counting timeouts only after acquiring concurrency limiter permit - Cancellable permit acquiring - Download stream timeout or cancellation is communicated via an `std::io::Error` - Exit backoff::retry by marking cancellation errors permanent Fixes: #6096 Closes: #4781 Co-authored-by: arpad-m <arpad-m@users.noreply.github.com>	2024-02-14 23:24:07 +00:00
Anna Khanova	c7538a2c20	Proxy: remove fail fast logic to connect to compute (#6759) ## Problem Flaky tests ## Summary of changes Remove failfast logic	2024-02-14 18:43:52 +00:00
Conrad Ludgate	a9ec4eb4fc	hold cancel session (#6750) ## Problem In a recent refactor, we accidentally dropped the cancel session early ## Summary of changes Hold the cancel session during proxy passthrough	2024-02-14 10:26:32 +00:00
Anna Khanova	331935df91	Proxy: send cancel notifications to all instances (#6719) ## Problem If a cancel request ends up on the wrong proxy instance, it doesn't take effect. ## Summary of changes Send redis notifications to all proxy pods about the cancel request. Related issues: https://github.com/neondatabase/neon/issues/5839, https://github.com/neondatabase/cloud/issues/10262	2024-02-13 17:58:58 +01:00
Anna Khanova	fac50a6264	Proxy refactor auth+connect (#6708) ## Problem Not really a problem, just refactoring. ## Summary of changes Separate authenticate from wake compute. Do not call wake compute a second time if we managed to connect to postgres or if the result did not come from the cache.	2024-02-12 18:41:02 +00:00
Conrad Ludgate	789a71c4ee	proxy: add more http logging (#6726) ## Problem Hard to see where time is taken during HTTP flow. ## Summary of changes Add a lot more logging for query state. Add a conn_id field to the sql-over-http span	2024-02-12 15:03:45 +00:00
Conrad Ludgate	98ec5c5c46	proxy: some more parquet data (#6711) ## Summary of changes add auth_method and database to the parquet logs	2024-02-12 13:14:06 +00:00
Anna Khanova	020e607637	Proxy: copy bidirectional fork (#6720) ## Problem `tokio::io::copy_bidirectional` doesn't close the connection once one of the sides closes it. It's not really suitable for the postgres protocol. ## Summary of changes Fork `copy_bidirectional` and initiate a shutdown for both connections. --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2024-02-12 14:04:46 +01:00
Conrad Ludgate	cbd3a32d4d	proxy: decode username and password (#6700) ## Problem Usernames and passwords can be URL 'percent' encoded in the connection string URL provided by the serverless driver. ## Summary of changes Decode the parameters when getting conn info	2024-02-09 19:22:23 +00:00
Conrad Ludgate	96d89cde51	Proxy error reworking (#6453) ## Problem Taking my ideas from https://github.com/neondatabase/neon/pull/6283 and making somewhat less radical changes in smaller commits. We currently don't report error classifications in proxy as the current error handling made it hard to do so. ## Summary of changes 1. Add a `ReportableError` trait that all errors will implement. This provides the error classification functionality. 2. Handle client requests with a strongly typed error * this error is a `ReportableError` and is logged appropriately 3. The handle client error only has a few possible error types, to account for the fact that at this point errors should be returned to the user.	2024-02-09 15:50:51 +00:00
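Commit 12512f3173 (#6865) describes a `BucketRateLimiter` that supports claiming multiple tokens at once, so that an authentication attempt can be weighted by how many hashes proxy would be doing. As an illustration only, here is a minimal std-only Rust sketch of a continuously refilling token bucket with multi-token claims. The `TokenBucket` type, its fields, and `try_acquire` are hypothetical names and do not reflect the proxy's actual implementation.

```rust
use std::time::{Duration, Instant};

/// Hypothetical token bucket (not the proxy's `BucketRateLimiter`):
/// refills continuously at `rate` tokens per second, up to `capacity`,
/// and lets a caller claim `n` tokens in one call.
struct TokenBucket {
    capacity: f64,
    rate: f64,
    tokens: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, rate: f64, now: Instant) -> Self {
        // Start with a full bucket.
        Self { capacity, rate, tokens: capacity, last: now }
    }

    /// Try to claim `n` tokens at time `now`. Returns false and claims
    /// nothing if the bucket does not currently hold enough tokens.
    fn try_acquire(&mut self, n: f64, now: Instant) -> bool {
        // Refill based on elapsed time, capped at capacity.
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate).min(self.capacity);
        self.last = now;
        if self.tokens >= n {
            self.tokens -= n;
            true
        } else {
            false
        }
    }
}

fn main() {
    let t0 = Instant::now();
    let mut bucket = TokenBucket::new(10.0, 1.0, t0);
    // A weighted request (e.g. one costing 4 units of hashing work) claims 4 tokens.
    assert!(bucket.try_acquire(4.0, t0));
    assert!(bucket.try_acquire(4.0, t0));
    // Only 2 tokens remain, so a third weighted request is rejected.
    assert!(!bucket.try_acquire(4.0, t0));
    // Two seconds later, 2 more tokens have been refilled (2 + 2 = 4).
    assert!(bucket.try_acquire(4.0, t0 + Duration::from_secs(2)));
    println!("ok");
}
```

A rejected claim only applies the refill and never partially drains the bucket, so an expensive request that does not fit can be retried later without consuming budget.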


371 Commits
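Commit d5304337cf (#7188) compares the derived `stored_key` and `server_key` with the values sent from cplane using constant-time equality. The sketch below shows the general technique in std-only Rust: every byte is visited and differences are OR-accumulated, so the running time does not depend on where the first mismatch occurs. `ct_eq` is a hypothetical helper, not the proxy's code. An optimizing compiler is not formally obliged to preserve constant-time behaviour, which is one reason production code often relies on a dedicated crate such as `subtle`.

```rust
/// Hypothetical constant-time byte comparison. The loop always visits every
/// byte, so timing does not reveal the position of the first mismatch.
/// Lengths are treated as public (derived keys have a fixed size).
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate differences with OR instead of returning early.
        diff |= x ^ y;
    }
    diff == 0
}

fn main() {
    let stored_key = [0x1fu8; 32];
    let mut candidate = stored_key;
    assert!(ct_eq(&stored_key, &candidate));
    // Flip one bit in the last byte: still detected, with no early exit.
    candidate[31] ^= 1;
    assert!(!ct_eq(&stored_key, &candidate));
    println!("ok");
}
```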