rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-13 08:22:55 +00:00

Author	SHA1	Message	Date
Conrad Ludgate	32126d705b	proxy refactor serverless (#4685 ) ## Problem Our serverless backend was a bit jumbled. As a comment indicated, we were handling SQL-over-HTTP in our `websocket.rs` file. I've extracted out the `sql_over_http` and `websocket` files from the `http` module and put them into a new module called `serverless`. ## Summary of changes ```sh mkdir proxy/src/serverless mv proxy/src/http/{conn_pool,sql_over_http,websocket}.rs proxy/src/serverless/ mv proxy/src/http/server.rs proxy/src/http/health_server.rs mv proxy/src/metrics proxy/src/usage_metrics.rs ``` I have also extracted the hyper server and handler from websocket.rs into `serverless.rs`	2023-10-25 15:43:03 +01:00
Conrad Ludgate	a461c459d8	fix http pool test (#5653 ) ## Problem We defer the returning of connections to the connection pool. It's possible for our test to be faster than the returning of connections - which then gets a differing process ID because it opens a new connection. ## Summary of changes 1. Delay the tests just a little (20ms) to give more chance for connections to return. 2. Correlate connection IDs with the connection logs a bit more	2023-10-25 13:20:45 +01:00
Conrad Ludgate	94b4e76e13	proxy: latency connect outcome (#5588 ) ## Problem I recently updated the latency timers to include cache miss and pool miss, as well as connection protocol. By moving the latency timer to start before authentication, we count a lot more failures and it's messed up the latency dashboard. ## Summary of changes Add another label to LatencyTimer metrics for outcome. Explicitly report on success	2023-10-23 15:17:28 +01:00
Conrad Ludgate	543b8153c6	proxy: add flag to reject requests without proxy protocol client ip (#5417 ) ## Problem We need a flag to require proxy protocol (prerequisite for #5416) ## Summary of changes Add a cli flag to require client IP addresses. Error if IP address is missing when the flag is active.	2023-10-17 16:59:35 +01:00
Conrad Ludgate	f775928dfc	proxy: refactor how and when connections are returned to the pool (#5095 ) ## Problem Transactions break connections in the pool fixes #4698 ## Summary of changes * Pool `Client`s are smart objects that return themselves to the pool * Pool `Client`s can be 'discard'ed * Pool `Client`s are discarded when certain errors are encountered. * Pool `Client`s are discarded when ReadyForQuery returns a non-idle state.	2023-10-17 13:55:52 +00:00
Conrad Ludgate	8c522ea034	proxy: count cache-miss for compute latency (#5539 ) ## Problem Would be good to view latency for hot-path vs cold-path ## Summary of changes add some labels to latency metrics	2023-10-16 16:31:04 +01:00
khanova	21deb81acb	Fix case for array of jsons (#5523 ) ## Problem Currently proxy doesn't handle array of json parameters correctly. ## Summary of changes Added one more level of quotes escaping for the array of jsons case. Resolves: https://github.com/neondatabase/neon/issues/5515	2023-10-12 14:32:49 +02:00
khanova	dbb21d6592	Make http timeout configurable (#5532 ) ## Problem Currently http timeout is hardcoded to 15 seconds. ## Summary of changes Added an option to configure it via cli args. Context: https://neondb.slack.com/archives/C04DGM6SMTM/p1696941726151899	2023-10-12 11:41:07 +02:00
Conrad Ludgate	d4dc86f8e3	proxy: more connection metrics (#5464 ) ## Problem Hard to tell 1. How many clients are connected to proxy 2. How many requests clients are making 3. How many connections are made to a database 1 and 2 are different because of the properties of HTTP. We have 2 already tracked through `proxy_accepted_connections_total` and `proxy_closed_connections_total`, but nothing for 1 and 3 ## Summary of changes Adds 2 new counter gauges. * `proxy_opened_client_connections_total`,`proxy_closed_client_connections_total` - how many client connections are open to proxy * `proxy_opened_db_connections_total`,`proxy_closed_db_connections_total` - how many active connections are made through to a database. For TCP and Websockets, we expect all 3 of these quantities to be roughly the same, barring users connecting but with invalid details. For HTTP: * client_connections/connections can differ because the client connections can be reused. * connections/db_connections can differ because of connection pooling.	2023-10-10 16:33:20 +01:00
khanova	aec9188d36	Added timeout for http requests (#5514 ) # Problem Proxy timeout for HTTP-requests ## Summary of changes If the HTTP-request exceeds 15s, it would be killed. Resolves: https://github.com/neondatabase/neon/issues/4847	2023-10-10 13:39:38 +02:00
Conrad Ludgate	f002b1a219	proxy: http limits (#5460 ) ## Problem 1MB request body is apparently too small for some clients ## Summary of changes Update to 10 MB request body. Also revert the removal of response limits while we don't have streaming support.	2023-10-04 15:01:05 +01:00
Conrad Ludgate	528fb1bd81	proxy: metrics2 (#5179 ) ## Problem We need to count metrics always when a connection is open. Not only when the transfer is 0. We also need to count bytes usage for HTTP. ## Summary of changes New structure for usage metrics. A `DashMap<Ids, Arc<Counters>>`. If the arc has 1 owner (the map) then I can conclude that no connections are open. If the counters has "open_connections" non zero, then I can conclude a new connection was opened in the last interval and should be reported on. Also, keep count of how many bytes processed for HTTP and report it here.	2023-09-28 11:38:26 +01:00
Conrad Ludgate	d11621d904	Proxy: proxy protocol v2 (#5028 ) ## Problem We need to log the client IP, not the IP of the NLB. ## Summary of changes Parse the proxy [protocol version 2](https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt) if possible	2023-08-31 14:30:25 +03:00
Conrad Ludgate	3b81e0c86d	chore: remove webpki (#5069 ) ## Problem webpki is unmaintained Closes https://github.com/neondatabase/neon/security/dependabot/33 ## Summary of changes Update all dependents of webpki.	2023-08-30 15:14:03 +01:00
Conrad Ludgate	faf070f288	proxy: dont return connection pending (#5107 ) ## Problem We were returning Pending when a connection had a notice/notification (introduced recently in #5020). When returning pending, the runtime assumes you will call `cx.waker().wake()` in order to continue processing. We weren't doing that, so the connection task would get stuck ## Summary of changes Don't return pending. Loop instead	2023-08-25 15:08:45 +03:00
Conrad Ludgate	0b001a0001	proxy: remove connections on shutdown (#5051 ) ## Problem On shutdown, proxy connections are staying open. ## Summary of changes Remove the connections on shutdown	2023-08-21 19:20:58 +01:00
Conrad Ludgate	ec10838aa4	proxy: pool connection logs (#5020 ) ## Problem Errors and notices that happen during a pooled connection lifecycle have no session identifiers ## Summary of changes Using a watch channel, we set the session ID whenever it changes. This way we can see the status of a connection for that session Also, adding a connection id to be able to search the entire connection lifecycle	2023-08-18 11:44:08 +01:00
Conrad Ludgate	25934ec1ba	proxy: reduce global conn pool contention (#4747 ) ## Problem As documented, the global connection pool will be high contention. ## Summary of changes Use DashMap rather than Mutex<HashMap>. Of note, DashMap currently uses a RwLock internally, but it's partially sharded to reduce contention by a factor of N. We could potentially use flurry which is a port of Java's concurrent hashmap, but I have no good understanding of its performance characteristics. Dashmap is at least equivalent to hashmap but with less contention. See the read heavy benchmark to analyse our expected performance <https://github.com/xacrimon/conc-map-bench#ready-heavy> I also spoke with the developer of dashmap recently, and they are working on porting the implementation to use concurrent HAMT FWIW	2023-08-16 17:20:28 +01:00
Arthur Petukhovsky	1b97a3074c	Disable neon-pool-opt-in (#4995 )	2023-08-15 20:57:56 +03:00
George MacKerron	218be9eb32	Added deferrable transaction option to http batch queries (#4993 ) ## Problem HTTP batch queries currently allow us to set the isolation level and read only, but not deferrable. ## Summary of changes Add support for deferrable. Echo deferrable status in response headers only if true. Likewise, now echo read-only status in response headers only if true.	2023-08-15 14:52:00 +01:00
George MacKerron	1ca08cc523	Changed batch query body from [...] to { queries: [...] } (#4975 ) ## Problem It's nice if `single query : single response :: batch query : batch response`. But at present, in the single case we send `{ query: '', params: [] }` and get back a single `{ rows: [], ... }` object, while in the batch case we send an array of `{ query: '', params: [] }` objects and get back not an array of `{ rows: [], ... }` objects but a `{ results: [ { rows: [] , ... }, { rows: [] , ... }, ... ] }` object instead. ## Summary of changes With this change, the batch query body becomes `{ queries: [{ query: '', params: [] }, ... ] }`, which restores a consistent relationship between the request and response bodies.	2023-08-14 16:07:33 +01:00
Arthur Petukhovsky	3a6b99f03c	proxy: improve http logs (#4976 ) Fix multiline logs on websocket errors and always print sql-over-http errors sent to the user.	2023-08-11 18:18:07 +03:00
Arthur Petukhovsky	73d7a9bc6e	proxy: propagate ws span (#4966 ) Found this log on staging: ``` 2023-08-10T17:42:58.573790Z INFO handling interactive connection from client protocol="ws" ``` We seem to be losing websocket span in spawn, this patch fixes it.	2023-08-10 23:38:22 +03:00
George MacKerron	538373019a	Increase max sql-over-http response size from 1MB to 10MB (#4961 ) ## Problem 1MB response limit is very small. ## Summary of changes This data is not yet tracked, so we shouldn't raise the limit too high yet. But as discussed with @kelvich and @conradludgate, this PR lifts it to 10MB, and also adds details of the limit to the error response.	2023-08-10 17:21:52 +01:00
Alex Chi Z	7b6c849456	support isolation level + read only for http batch sql (#4830 ) We will retrieve `neon-batch-isolation-level` and `neon-batch-read-only` from the http header, which sets the txn properties. https://github.com/neondatabase/serverless/pull/38#issuecomment-1653130981 --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-08-01 02:59:11 +03:00
Conrad Ludgate	35370f967f	proxy: add some connection init logs (#4812 ) ## Problem The first session event we emit is after we receive the first startup packet from the client. This means we can't detect any issues between TCP open and handling of the first PG packet ## Summary of changes Add some new logs for websocket upgrade and connection handling	2023-07-26 15:03:51 +00:00
Alex Chi Z	bcc2aee704	proxy: add tests for batch http sql (#4793 ) This PR adds an integration test case for batch HTTP SQL endpoint. https://github.com/neondatabase/neon/pull/4654/ should be merged first before we land this PR. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-25 15:08:24 +00:00
Nick Randall	062159ac17	support non-interactive transactions in sql-over-http (#4654 ) This PR adds support for non-interactive transaction query endpoint. It accepts an array of queries and parameters and returns an array of query results. The queries will be run in a single transaction one after another on the proxy side.	2023-07-25 13:03:55 +01:00
Conrad Ludgate	2e8a3afab1	proxy: merge handle_client (#4740 ) ## Problem Second half of #4699. we were maintaining 2 implementations of handle_client. ## Summary of changes Merge the handle_client code, but abstract some of the details.	2023-07-17 22:20:23 +01:00
George MacKerron	196943c78f	CORS preflight OPTIONS support for /sql (http fetch) endpoint (#4706 ) ## Problem HTTP fetch can't be used from browsers because proxy doesn't support [CORS 'preflight' `OPTIONS` requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS#preflighted_requests). ## Summary of changes Added a simple `OPTIONS` endpoint for `/sql`.	2023-07-17 20:01:25 +01:00
Conrad Ludgate	7c85c7ea91	proxy: merge connect compute (#4713 ) ## Problem Half of #4699. TCP/WS have one implementation of `connect_to_compute`, HTTP has another implementation of `connect_to_compute`. Having both is annoying to deal with. ## Summary of changes Creates a set of traits `ConnectMechanism` and `ShouldError` that allows the `connect_to_compute` to be generic over raw TCP stream or tokio_postgres based connections. I'm not super happy with this. I think it would be nice to remove tokio_postgres entirely but that will need a lot more thought to be put into it. I have also slightly refactored the caching to use fewer references. Instead using ownership to ensure the state of retrying is encoded in the type system.	2023-07-17 15:53:01 +01:00
Conrad Ludgate	db4d094afa	proxy: add more error cases to retry connect (#4707 ) ## Problem In the logs, I noticed we still weren't retrying in some cases. Seemed to be timeouts but we explicitly wanted to handle those ## Summary of changes Retry on io::ErrorKind::TimedOut errors. Handle IO errors in tokio_postgres::Error.	2023-07-13 11:47:27 +01:00
Conrad Ludgate	0626e0bfd3	proxy: refactor some error handling and shutdowns (#4684 ) ## Problem It took me a while to understand the purpose of all the tasks spawned in the main functions. ## Summary of changes Utilising the type system and less macros, plus much more comments, document the shutdown procedure of each task in detail	2023-07-13 11:03:37 +01:00
Conrad Ludgate	a1d6b1a4af	proxy wake_compute loop (#4675 ) ## Problem If we fail to wake up the compute node, a subsequent connect attempt will definitely fail. However, kubernetes won't fail the connection immediately, instead it hangs until we timeout (10s). ## Summary of changes Refactor the loop to allow fast retries of compute_wake and to skip a connect attempt.	2023-07-12 11:38:36 +01:00
Conrad Ludgate	ac758e4f51	allow repeated IO errors from compute node (#4624 ) ## Problem #4598 compute nodes are not accessible some time after wake up due to kubernetes DNS not being fully propagated. ## Summary of changes Update connect retry mechanism to support handling IO errors and sleeping for 100ms	2023-07-07 19:50:50 +03:00
Stas Kelvich	dbf88cf2d7	Minimalistic pool for http endpoint compute connections (under opt-in flag) Cache up to 20 connections per endpoint. Once all pooled connections are used, the current implementation can open an extra connection, so the maximum number of simultaneous connections is not enforced. There are more things to do here, especially with background clean-up of closed connections, and checks for transaction state. But the current implementation allows us to check for the smaller connection latencies that this cache should bring.	2023-07-05 12:00:03 +03:00
Arthur Petukhovsky	a7a0c3cd27	Invalidate proxy cache in http-over-sql (#4500 ) HTTP queries failed with errors `error connecting to server: failed to lookup address information: Name or service not known\n\nCaused by:\n failed to lookup address information: Name or service not known` The fix reused cache invalidation logic in proxy from usual postgres connections and added it to HTTP-over-SQL queries. Also removed a timeout for HTTP request, because it almost never worked on staging (50s+ time just to start the compute), and we can have a similar case in production. Should be ok, since we have limits for the requests and responses.	2023-06-14 19:24:46 +03:00
Stas Kelvich	4385e0c291	Return more RowDescription fields via proxy json endpoint As we aim to align client-side behavior with node-postgres, it's necessary for us to return these fields, because node-postgres does so as well.	2023-06-13 22:31:18 +03:00
Stas Kelvich	c82d19d8d6	Fix NULLs handling in proxy json endpoint There were a few problems with null handling: * query_raw_txt() accepted vector of string so it always (erroneously) treated "null" as a string instead of null. Change rust pg client to accept the vector of Option<String> instead of just Strings. Adopt coding here to pass nulls as None. * pg_text_to_json() had a check that always interpreted "NULL" string as null. That is wrong and nulls were already handled by match None. This bug appeared as a bad attempt to parse arrays containing NULL elements. Fix coding by checking presence of quotes while parsing an array (no quotes -> null, quoted -> "null" string). Array parser fix also slightly changes behavior by always cleaning current entry when pushing to the resulting vector. This seems to be an omission by previous coding, however looks like it was harmless as entry was not cleared only at the end of the nested or top-level array.	2023-06-08 16:00:18 +03:00
Stas Kelvich	d73639646e	Add more output options to proxy json endpoint With this commit the client can pass the following optional headers: `Neon-Raw-Text-Output: true`. Return postgres values as text, without parsing them. So numbers, objects, booleans, nulls and arrays will be returned as text. That can be useful in cases when client code wants to implement its own parsing or reuse parsing libraries from e.g. node-postgres. `Neon-Array-Mode: true`. Return postgres rows as arrays instead of objects. That is a more compact representation and also helps in some edge cases where it is hard to use rows represented as objects (e.g. when several fields have the same name).	2023-06-08 16:00:18 +03:00
Stas Kelvich	dad3519351	Add SQL-over-HTTP endpoint to Proxy This commit introduces an SQL-over-HTTP endpoint in the proxy, with a JSON response structure resembling that of the node-postgres driver. This method, using HTTP POST, achieves smaller amortized latencies in edge setups due to fewer round trips and an enhanced open connection reuse by the v8 engine. This update involves several intricacies: 1. SQL injection protection: We employed the extended query protocol, modifying the rust-postgres driver to send queries in one roundtrip using a text protocol rather than binary, bypassing potential issues like those identified in https://github.com/sfackler/rust-postgres/issues/1030. 2. Postgres type compatibility: As not all postgres types have binary representations (e.g., acl's in pg_class), we adjusted rust-postgres to respond with text protocol, simplifying serialization and fixing queries with text-only types in response. 3. Data type conversion: Considering JSON supports fewer data types than Postgres, we perform conversions where possible, passing all other types as strings. Key conversions include: - postgres int2, int4, float4, float8 -> json number (NaN and Inf remain text) - postgres bool, null, text -> json bool, null, string - postgres array -> json array - postgres json and jsonb -> json object 4. Alignment with node-postgres: To facilitate integration with js libraries, we've matched the response structure of node-postgres, returning command tags and column oids. Command tag capturing was added to the rust-postgres functionality as part of this change.	2023-05-23 20:01:40 +03:00
Sasha Krassovsky	fd31fafeee	Make proxy shutdown when all connections are closed (#3764 ) ## Describe your changes Makes Proxy start draining connections on SIGTERM. ## Issue ticket number and link #3333	2023-04-13 19:31:30 +03:00
Dmitry Ivanov	956b6f17ca	[proxy] Handle some unix signals. On the surface, this doesn't add much, but there are some benefits: * We can do graceful shutdowns and thus record more code coverage data. * We now have a foundation for the more interesting behaviors, e.g. "stop accepting new connections after SIGTERM but keep serving the existing ones". * We give the otel machinery a chance to flush trace events before finally shutting down.	2023-02-17 15:32:14 +03:00
Arthur Petukhovsky	f383b4d540	Enable TCP_NODELAY for wss connections	2023-02-10 21:40:28 +03:00
Dmitry Ivanov	9657459d80	[proxy] Fix possible unsoundness in the websocket machinery (#3569 ) This PR replaces the ill-advised `unsafe Sync` impl with a de-facto standard way to solve the underlying problem. TLDR: - tokio::task::spawn requires future to be Send - ∀t. (t : Sync) <=> (&t : Send) - ∀t. (t : Send + !Sync) => (&t : !Send)	2023-02-10 12:45:38 +03:00
Dmitry Ivanov	ea0278cf27	[proxy] Implement compute node info cache (#3331 ) This patch adds a timed LRU cache implementation and a compute node info cache on top of that. Cache entries might expire on their own (default ttl=5mins) or become invalid due to real-world events, e.g. compute node scale-to-zero event, so we add a connection retry loop with a wake-up call. Solved problems: - [x] Find a decent LRU implementation. - [x] Implement timed LRU on top of that. - [x] Cache results of `proxy_wake_compute` API call. - [x] Don't invalidate newer cache entries for the same key. - [x] Add cmdline configuration knobs (requires some refactoring). - [x] Add failed connection estab metric. - [x] Refactor auth backends to make things simpler (retries, cache placement, etc). - [x] Address review comments (add code comments + cleanup). - [x] Retry `/proxy_wake_compute` if we couldn't connect to a compute (e.g. stalled cache entry). - [x] Add high-level description for `TimedLru`. TODOs (will be addressed later): - [ ] Add cache metrics (hit, spurious hit, miss). - [ ] Synchronize http requests across concurrent per-client tasks (https://github.com/neondatabase/neon/pull/3331#issuecomment-1399216069). - [ ] Cache results of `proxy_get_role_secret` API call.	2023-02-01 17:11:41 +03:00
Kirill Bulatov	fe8cef3427	Use ready! rustc 1.64 macro (#3315 ) rustc [1.64](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1640-2022-09-22) had brought `ready!` macro: https://doc.rust-lang.org/stable/std/task/macro.ready.html Use it to shorten the code slightly.	2023-01-12 21:27:34 +02:00
Arthur Petukhovsky	debd134b15	Implement wss support in proxy (#3247 ) This is a hacky implementation of WebSocket server, embedded into our postgres proxy. The server is used to allow https://github.com/neondatabase/serverless to connect to our postgres from browser and serverless javascript functions. How it will work (general schema): - browser opens a websocket connection to `wss://ep-abc-xyz-123.xx-central-1.aws.neon.tech/` - proxy accepts this connection and terminates TLS (https) - inside encrypted tunnel (HTTPS), browser initiates plain (non-encrypted) postgres connection - proxy performs auth as in usual plain pg connection and forwards connection to the compute Related issue: #3225	2023-01-06 18:34:18 +03:00
Dmitry Ivanov	e516c376d6	[proxy] Improve logging (#2554 ) * [proxy] Use `tracing::` instead of `println!` for logging Fix a minor misnomer * Log more stuff	2022-10-07 14:34:57 +03:00
Dmitry Ivanov	e9a103c09f	[proxy] Pass extra parameters to the console (#2467 ) With this change we now pass additional params to the console's auth methods.	2022-09-21 21:42:47 +03:00
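
Note on f775928dfc (#5095): the commit describes pool `Client`s as smart objects that return themselves to the pool and that can be discarded. A minimal sketch of that pattern follows, using only the standard library. The names `Conn`, `Pool`, `get`, `idle_len` and `id` are hypothetical and chosen for illustration; they are not taken from the proxy source, and the real pool is likely structured differently.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a real database connection.
#[derive(Debug)]
pub struct Conn {
    pub id: u32,
}

/// Hypothetical pool: a shared queue of idle connections.
#[derive(Clone, Default)]
pub struct Pool {
    idle: Arc<Mutex<VecDeque<Conn>>>,
}

impl Pool {
    /// Reuse an idle connection if one exists, otherwise "open" a new
    /// one with the given id (a placeholder for a real connect call).
    pub fn get(&self, next_id: u32) -> Client {
        let conn = self
            .idle
            .lock()
            .unwrap()
            .pop_front()
            .unwrap_or(Conn { id: next_id });
        Client { conn: Some(conn), pool: self.clone() }
    }

    /// Number of connections currently waiting in the pool.
    pub fn idle_len(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}

/// Guard that hands its connection back to the pool when dropped.
pub struct Client {
    conn: Option<Conn>,
    pool: Pool,
}

impl Client {
    /// Id of the held connection, or `None` once it has been discarded.
    pub fn id(&self) -> Option<u32> {
        self.conn.as_ref().map(|c| c.id)
    }

    /// Throw the connection away so that `Drop` does not return it,
    /// e.g. after an error or a non-idle transaction state.
    pub fn discard(&mut self) {
        self.conn = None;
    }
}

impl Drop for Client {
    fn drop(&mut self) {
        // Only connections that were not discarded go back to the pool.
        if let Some(conn) = self.conn.take() {
            self.pool.idle.lock().unwrap().push_back(conn);
        }
    }
}
```

In this sketch, dropping a `Client` returns its connection, while calling `discard()` first leaves the pool unchanged, which mirrors the "discard on error" behaviour the commit lists.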

50 Commits
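
Note on 528fb1bd81 (#5179): the commit describes usage metrics stored as a `DashMap<Ids, Arc<Counters>>`, where an `Arc` with a single owner (the map) means no connections are open. The sketch below illustrates that ownership trick under several assumptions: it substitutes a `Mutex<HashMap>` for DashMap to stay within the standard library, the fields of `Ids` and `Counters` are invented, and the reporting rule in `collect` is a guess at one reasonable policy rather than the proxy's actual behaviour.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical key; the commit only names the type `Ids`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Ids {
    pub endpoint_id: String,
}

/// Hypothetical counters shared between the map and each connection.
#[derive(Default)]
pub struct Counters {
    pub bytes: AtomicU64,
    pub opened_connections: AtomicU64,
}

#[derive(Default)]
pub struct Metrics {
    // The commit uses DashMap; Mutex<HashMap> is a std-only stand-in.
    endpoints: Mutex<HashMap<Ids, Arc<Counters>>>,
}

impl Metrics {
    /// Called when a connection opens. The returned Arc is held for
    /// the lifetime of the connection.
    pub fn register(&self, ids: Ids) -> Arc<Counters> {
        let mut map = self.endpoints.lock().unwrap();
        let counters = map.entry(ids).or_default().clone();
        counters.opened_connections.fetch_add(1, Ordering::Relaxed);
        counters
    }

    /// Called once per reporting interval. Returns (ids, bytes) for
    /// entries that saw traffic or a new connection, and removes
    /// entries whose only remaining owner is the map.
    pub fn collect(&self) -> Vec<(Ids, u64)> {
        let mut map = self.endpoints.lock().unwrap();
        let mut report = Vec::new();
        map.retain(|ids, counters| {
            let bytes = counters.bytes.swap(0, Ordering::Relaxed);
            let opened = counters.opened_connections.swap(0, Ordering::Relaxed);
            if bytes > 0 || opened > 0 {
                report.push((ids.clone(), bytes));
            }
            // strong_count == 1 means only the map holds the Arc,
            // so no connection is open and the entry can go.
            Arc::strong_count(counters) > 1
        });
        report
    }

    /// Number of endpoints currently tracked.
    pub fn len(&self) -> usize {
        self.endpoints.lock().unwrap().len()
    }
}
```

The appeal of this design is that connection lifetime is tracked by ownership alone: no explicit "closed" event has to reach the metrics task, because dropping the connection's `Arc` is the signal.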