rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-04 12:40:37 +00:00

Author	SHA1	Message	Date
Conrad Ludgate	e1a564ace2	proxy simplify cancellation (#5916 ) ## Problem The cancellation code was confusing and error prone (as seen before in our memory leaks). ## Summary of changes * Use the new `TaskTracker` primitve instead of JoinSet to gracefully wait for tasks to shutdown. * Updated libs/utils/completion to use `TaskTracker` * Remove `tokio::select` in favour of `futures::future::select` in a specialised `run_until_cancelled()` helper function	2023-12-08 16:21:17 +00:00
Conrad Ludgate	699049b8f3	proxy: make auth more type safe (#5689 ) ## Problem `a5292f7e67/proxy/src/auth/backend.rs (L146-L148)` `a5292f7e67/proxy/src/console/provider/neon.rs (L90)` `a5292f7e67/proxy/src/console/provider/neon.rs (L154)` ## Summary of changes 1. Test backend is only enabled on `cfg(test)`. 2. Postgres mock backend + MD5 auth keys are only enabled on `cfg(feature = testing)` 3. Password hack and cleartext flow will have their passwords validated before proceeding. 4. Distinguish between ClientCredentials with endpoint and without, removing many panics in the process	2023-12-08 11:48:37 +00:00
Anna Khanova	12f02523a4	Enable dynamic rate limiter (#6029 ) ## Problem Limit the number of open connections between the control plane and proxy. ## Summary of changes Enable dynamic rate limiter in prod. Unfortunately the latency metrics are a bit broken, but from logs I see that on staging for the past 7 days only 2 times latency for acquiring was greater than 1ms (for most of the cases it's insignificant).	2023-12-04 15:00:24 +00:00
Conrad Ludgate	f39fca0049	proxy: chore: replace strings with SmolStr (#5786 ) ## Problem no problem ## Summary of changes replaces boxstr with arcstr as it's cheaper to clone. mild perf improvement. probably should look into other smallstring optimsations tbh, they will likely be even better. The longest endpoint name I was able to construct is something like `ep-weathered-wildflower-12345678` which is 32 bytes. Most string optimisations top out at 23 bytes	2023-11-30 20:52:30 +00:00
Anna Khanova	e12e2681e9	IP allowlist on the proxy side (#5906 ) ## Problem Per-project IP allowlist: https://github.com/neondatabase/cloud/issues/8116 ## Summary of changes Implemented IP filtering on the proxy side. To retrieve ip allowlist for all scenarios, added `get_auth_info` call to the control plane for: * sql-over-http * password_hack * cleartext_hack Added cache with ttl for sql-over-http path This might slow down a bit, consider using redis in the future. --------- Co-authored-by: Conrad Ludgate <conrad@neon.tech>	2023-11-30 13:14:33 +00:00
Conrad Ludgate	fc77c42c57	proxy: add flag to enable http pool for all users (#5959 ) ## Problem #5123 ## Summary of changes Add `--sql-over-http-pool-opt-in true` default cli arg. Allows us to set `--sql-over-http-pool-opt-in false` region-by-region	2023-11-30 10:19:30 +00:00
Conrad Ludgate	316309c85b	channel binding (#5683 ) ## Problem channel binding protects scram from sophisticated MITM attacks where the attacker is able to produce 'valid' TLS certificates. ## Summary of changes get the tls-server-end-point channel binding, and verify it is correct for the SCRAM-SHA-256-PLUS authentication flow	2023-11-27 21:45:15 +00:00
Conrad Ludgate	a56fd45f56	proxy: fix memory leak again (#5909 ) ## Problem The connections.join_next helped but it wasn't enough... The way I implemented the improvement before was still faulty but it mostly worked so it looked like it was working correctly. From [`tokio::select` docs](https://docs.rs/tokio/latest/tokio/macro.select.html): > 4. Once an <async expression> returns a value, attempt to apply the value to the provided <pattern>, if the pattern matches, evaluate <handler> and return. If the pattern does not match, disable the current branch and for the remainder of the current call to select!. Continue from step 3. The `connections.join_next()` future would complete and `Some(Err(e))` branch would be evaluated but not match (as the future would complete without panicking, we would hope). Since the branch doesn't match, it's disabled. The select continues but never attempts to call `join_next` again. Getting unlucky, more TCP connections are created than we attempt to join_next. ## Summary of changes Replace the `Some(Err(e))` pattern with `Some(e)`. Because of the auto-disabling feature, we don't need the `if !connections.is_empty()` step as the `None` pattern will disable it for us.	2023-11-23 19:11:24 +00:00
khanova	2f0d245c2a	Proxy control plane rate limiter (#5785 ) ## Problem Proxy might overload the control plane. ## Summary of changes Implement rate limiter for proxy<->control plane connection. Resolves https://github.com/neondatabase/neon/issues/5707 Used implementation ideas from https://github.com/conradludgate/squeeze/	2023-11-15 09:15:59 +00:00
Conrad Ludgate	7cdde285a5	proxy: limit concurrent wake_compute requests per endpoint (#5799 ) ## Problem A user can perform many database connections at the same instant of time - these will all cache miss and materialise as requests to the control plane. #5705 ## Summary of changes I am using a `DashMap` (a sharded `RwLock<HashMap>`) of endpoints -> semaphores to apply a limiter. If the limiter is enabled (permits > 0), the semaphore will be retrieved per endpoint and a permit will be awaited before continuing to call the wake_compute endpoint. ### Important details This dashmap would grow uncontrollably without maintenance. It's not a cache so I don't think an LRU-based reclamation makes sense. Instead, I've made use of the sharding functionality of DashMap to lock a single shard and clear out unused semaphores periodically. I ran a test in release, using 128 tokio tasks among 12 threads each pushing 1000 entries into the map per second, clearing a shard every 2 seconds (64 second epoch with 32 shards). The endpoint names were sampled from a gamma distribution to make sure some overlap would occur, and each permit was held for 1ms. The histogram for time to clear each shard settled between 256-512us without any variance in my testing. Holding a lock for under a millisecond for 1 of the shards does not concern me as blocking	2023-11-09 14:14:30 +00:00
duguorong009	39f8fd6945	feat: add `build_tag` env support for `set_build_info_metric` (#5576 ) - Add a new util `project_build_tag` macro, similar to `project_git_version` - Update the `set_build_info_metric` to accept and make use of `build_tag` info - Update all codes which use the `set_build_info_metric`	2023-10-27 10:47:11 +01:00
Conrad Ludgate	32126d705b	proxy refactor serverless (#4685 ) ## Problem Our serverless backend was a bit jumbled. As a comment indicated, we were handling SQL-over-HTTP in our `websocket.rs` file. I've extracted out the `sql_over_http` and `websocket` files from the `http` module and put them into a new module called `serverless`. ## Summary of changes ```sh mkdir proxy/src/serverless mv proxy/src/http/{conn_pool,sql_over_http,websocket}.rs proxy/src/serverless/ mv proxy/src/http/server.rs proxy/src/http/health_server.rs mv proxy/src/metrics proxy/src/usage_metrics.rs ``` I have also extracted the hyper server and handler from websocket.rs into `serverless.rs`	2023-10-25 15:43:03 +01:00
khanova	b514da90cb	Set up timeout for scram protocol execution (#5551 ) ## Problem Context: https://github.com/neondatabase/neon/issues/5511#issuecomment-1759649679 Some of out scram protocol execution timed out only after 17 minutes. ## Summary of changes Make timeout for scram execution meaningful and configurable.	2023-10-23 15:11:05 +01:00
Conrad Ludgate	543b8153c6	proxy: add flag to reject requests without proxy protocol client ip (#5417 ) ## Problem We need a flag to require proxy protocol (prerequisite for #5416) ## Summary of changes Add a cli flag to require client IP addresses. Error if IP address is missing when the flag is active.	2023-10-17 16:59:35 +01:00
khanova	dbb21d6592	Make http timeout configurable (#5532 ) ## Problem Currently http timeout is hardcoded to 15 seconds. ## Summary of changes Added an option to configure it via cli args. Context: https://neondb.slack.com/archives/C04DGM6SMTM/p1696941726151899	2023-10-12 11:41:07 +02:00
Conrad Ludgate	c216b16b0f	proxy: fix memory leak (#5472 ) ## Problem these JoinSets live for the duration of the process. they might have many millions of connections spawned on them and they never get cleared. Fixes #4672 ## Summary of changes Drain the connections as we go	2023-10-05 07:30:28 +01:00
Alex Chi Z	1309571f5d	proxy: switch to structopt for clap parsing (#4714 ) Using `#[clap]` for parsing cli opts, which is easier to maintain. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2023-07-14 19:11:01 +03:00
Conrad Ludgate	0626e0bfd3	proxy: refactor some error handling and shutdowns (#4684 ) ## Problem It took me a while to understand the purpose of all the tasks spawned in the main functions. ## Summary of changes Utilising the type system and less macros, plus much more comments, document the shutdown procedure of each task in detail	2023-07-13 11:03:37 +01:00
Joonas Koivunen	ef80a902c8	pg_sni_router: add session_id to more messages (#4403 ) See superceded #4390. - capture log in test - expand the span to cover init and error reporting - remove obvious logging by logging only unexpected	2023-06-02 14:59:10 +03:00
Stas Kelvich	b1329db495	fix sigterm handling	2023-04-28 17:15:43 +03:00
Stas Kelvich	9486d76b2a	Add tests for link auth to compute connection	2023-04-28 17:15:43 +03:00
Stas Kelvich	645e4f6ab9	use TLS in link proxy	2023-04-28 17:15:43 +03:00
Heikki Linnakangas	53e5d18da5	Start passthrough earlier As soon as we have received the SSLRequest packet, and have figured out the hostname to connect to from the SNI, we can start passing through data. We don't need to parse the StartupPacket that the client will send next.	2023-04-28 17:15:43 +03:00
Heikki Linnakangas	3813c703c9	Add an option for destination port. Makes it easier to test locally.	2023-04-28 17:15:43 +03:00
Heikki Linnakangas	b15204fa8c	Fix --help, and required args	2023-04-28 17:15:43 +03:00
Alexey Kondratov	81c75586ab	Take port from SNI, formatting, make clippy happy	2023-04-28 17:15:43 +03:00
Anton Chaporgin	556fb1642a	fixed the way hostname is parsed	2023-04-28 17:15:43 +03:00
Stas Kelvich	23aca81943	Add SNI-based proxy router In order to not to create NodePorts for each compute we can setup services that accept connections on wildcard domains and then use information from domain name to route connection to some internal service. There are ready solutions for HTTPS and TLS connections but postgresql protocol uses opportunistic TLS and we haven't found any ready solutions. This patch introduces `pg_sni_router` which routes connections to `aaa--bbb--123.external.domain` to `aaa.bbb.123.internal.domain`. In the long run we can avoid console -> compute psql communications, but now this router seems to be the easier way forward.	2023-04-28 17:15:43 +03:00

28 Commits