rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-13 00:12:54 +00:00

Author	SHA1	Message	Date
Stas Kelvich	b1329db495	fix sigterm handling	2023-04-28 17:15:43 +03:00
Stas Kelvich	4ac6a9f089	add backward compatibility to proxy	2023-04-28 17:15:43 +03:00
Stas Kelvich	9486d76b2a	Add tests for link auth to compute connection	2023-04-28 17:15:43 +03:00
Stas Kelvich	040f736909	remove changes in main proxy that are now not needed	2023-04-28 17:15:43 +03:00
Stas Kelvich	645e4f6ab9	use TLS in link proxy	2023-04-28 17:15:43 +03:00
Heikki Linnakangas	53e5d18da5	Start passthrough earlier. As soon as we have received the SSLRequest packet, and have figured out the hostname to connect to from the SNI, we can start passing through data. We don't need to parse the StartupPacket that the client will send next.	2023-04-28 17:15:43 +03:00
Heikki Linnakangas	3813c703c9	Add an option for destination port. Makes it easier to test locally.	2023-04-28 17:15:43 +03:00
Heikki Linnakangas	b15204fa8c	Fix --help, and required args	2023-04-28 17:15:43 +03:00
Alexey Kondratov	81c75586ab	Take port from SNI, formatting, make clippy happy	2023-04-28 17:15:43 +03:00
Anton Chaporgin	556fb1642a	fixed the way hostname is parsed	2023-04-28 17:15:43 +03:00
Stas Kelvich	23aca81943	Add SNI-based proxy router. In order not to create NodePorts for each compute, we can set up services that accept connections on wildcard domains and then use information from the domain name to route the connection to some internal service. There are ready solutions for HTTPS and TLS connections, but the postgresql protocol uses opportunistic TLS and we haven't found any ready solutions. This patch introduces `pg_sni_router`, which routes connections to `aaa--bbb--123.external.domain` to `aaa.bbb.123.internal.domain`. In the long run we can avoid console -> compute psql communications, but for now this router seems to be the easier way forward.	2023-04-28 17:15:43 +03:00
Arseny Sher	f5b4697c90	Log session_id when proxy per client task errors out.	2023-04-27 19:08:22 +04:00
Arseny Sher	0112a602e1	Add timeout on proxy -> compute connection establishment. Otherwise we sit up to default tcp_syn_retries (about 2+ min) before getting os error 110 if compute has been migrated to another pod.	2023-04-27 09:50:52 +04:00
Anastasia Lubennikova	92214578af	Fix proxy_io_bytes_per_client metric: use branch_id identifier properly. (#4084) It fixes the miscalculation of the metric for projects that use multiple branches for the same endpoint. We were underbilling users with such projects, so we need to communicate the change in Release Notes.	2023-04-26 17:47:54 +03:00
Sasha Krassovsky	fd31fafeee	Make proxy shut down when all connections are closed (#3764). Makes Proxy start draining connections on SIGTERM. Issue: #3333	2023-04-13 19:31:30 +03:00
Stas Kelvich	5d0ecadf7c	Add support for non-SNI case in multi-cert proxy. When no SNI is provided use the default certificate, otherwise we can't get to the options parameter which can be used to set endpoint name too. That means that non-SNI flow will not work for CNAME domains in verify-full mode.	2023-04-12 18:16:49 +03:00
Stas Kelvich	b1c2a6384a	Set non-wildcard common names in link auth proxy. Old coding here ignored non-wildcard common names and passed None instead. With my recent changes I started throwing an error in that case. Old logic doesn't seem to be a great choice, so instead of passing None I actually set non-wildcard common names too. That way it is possible to avoid handling cases with None in downstream code.	2023-04-07 01:24:27 +03:00
Anastasia Lubennikova	6d01d835a8	[proxy] Report error if proxy_io_bytes_per_client metric has decreased	2023-04-06 23:14:07 +03:00
Stas Kelvich	d8df5237fa	Align extra certificate name with default cert-manager names	2023-04-05 21:29:21 +03:00
Stas Kelvich	c3ca48c62b	Support extra domain names for proxy. Make it possible to specify a directory where proxy will look for extra certificates. Proxy will iterate through subdirs of that directory and load `key.pem` and `cert.pem` files from each subdir. The certs directory structure may look like this: `certs/example.com/key.pem`, `certs/example.com/cert.pem`, `certs/foo.bar/key.pem`, `certs/foo.bar/cert.pem`. Actual domain names are taken from certs and key, subdir names are ignored.	2023-04-05 20:06:48 +03:00
Dmitry Ivanov	f85a61ceac	[proxy] Fix regression in logging. For some reason, `tracing::instrument` proc_macro doesn't always print elements specified via `fields()` or even show that it's impossible (e.g. there's no Display impl). Work around this using the `?foo` notation. Before: `2023-04-03T14:48:06.017504Z INFO handle_client:handshake: received SslRequest` After: `2023-04-03T14:51:24.424176Z INFO handle_client{session_id=7bd07be8-3462-404e-8ccc-0a5332bf3ace}:handshake: received SslRequest`	2023-04-03 18:49:30 +03:00
Arseny Sher	a7ab53c80c	Forward framed read buf contents to compute before proxy pass. Otherwise they get lost. Normally buffer is empty before proxy pass, but this is not the case with pipeline mode of our npm driver; fixes connection hangup introduced by `b80fe41af3` for it. fixes https://github.com/neondatabase/neon/issues/3822	2023-03-15 14:32:41 +03:00
Joonas Koivunen	c23c8946a3	chore: clippies introduced with rust 1.68 (#3781) - handle automatically fixable future clippies - tune run-clippy.sh to remove macos specifics which we no longer have Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-03-14 15:29:02 +02:00
Dmitry Ivanov	2e4bf7cee4	[proxy] Immediately log all compute node connection errors.	2023-03-14 01:45:57 +03:00
Arthur Petukhovsky	d9a1329834	Make postgres_backend use generic IO type (#3789) - Support measuring inbound and outbound traffic in MeasuredStream - Start using MeasuredStream in safekeepers code	2023-03-13 12:18:10 +03:00
Anastasia Lubennikova	b7fddfa70d	Add branch_id field to proxy_io_bytes_per_client metric. Since we allow switching endpoints between different branches, it is important to use composite key. Otherwise, we may try to calculate delta between metric values for two different branches.	2023-03-10 15:00:48 +02:00
Arseny Sher	b80fe41af3	Refactor postgres protocol parsing. 1) Remove allocation and data copy during each message read. Instead, parsing functions now accept BytesMut from which data they form messages, with pointers (e.g. in CopyData) pointing directly into BytesMut buffer. Accordingly, move ConnectionError containing IO error subtype into framed.rs providing this and leave in pq_proto only ProtocolError. 2) Remove anyhow from pq_proto. 3) Move FeStartupPacket out of FeMessage. Now FeStartupPacket::parse returns it directly, eliminating dead code where user wants startup packet but has to match for others. proxy stream.rs is adapted to framed.rs with minimal changes. It also benefits from framed.rs improvements described above.	2023-03-09 20:45:56 +03:00
Arseny Sher	0d8ced8534	Remove sync postgres_backend, tidy up its split usage. - Add support for splitting async postgres_backend into read and write halves. Safekeeper needs this for bidirectional streams. To this end, encapsulate reading-writing postgres messages to framed.rs with split support without any additional changes (relying on BufRead for reading and BytesMut out buffer for writing). - Use async postgres_backend throughout safekeeper (and in proxy auth link part). - In both safekeeper COPY streams, do read-write from the same thread/task with select! for easier error handling. - Tidy up finishing CopyBoth streams in safekeeper sending and receiving WAL -- join split parts back catching errors from them before returning. Initially I hoped to do that read-write without split at all, through polling IO: https://github.com/neondatabase/neon/pull/3522 However that turned out to be more complicated than I initially expected due to 1) borrow checking and 2) anon Future types. 1) required Rc<RefCell<...>> which is Send construct just to satisfy the checker; 2) can be worked around with transmute. But this is so messy that I decided to leave split.	2023-03-09 20:45:56 +03:00
Arseny Sher	7627d85345	Move async postgres_backend to its own crate. To untie cyclic dependency between sync and async versions of postgres_backend, copy QueryError and some logging/error routines to postgres_backend.rs. This is temporary glue to make commits smaller, sync version will be dropped by the upcoming commit completely.	2023-03-09 20:45:56 +03:00
Joonas Koivunen	d7d3f451f0	Use tracing panic hook in all binaries (#3634) Enables tracing panic hook in addition to pageserver introduced in #3475: - proxy - safekeeper - storage_broker For proxy, a drop guard which resets the original std panic hook was added on the first commit. Other binaries don't need it so they never reset anything by `disarm`ing the drop guard. The aim of the change is to make sure all panics a) have span information b) are logged similar to other messages, not interleaved with other messages as happens right now. Interleaving happens right now because std prints panics to stderr, and other logging happens in stdout. If this was handled gracefully by some utility, the log message splitter would treat panics as belonging to the previous message because it expects a message to start with a timestamp. Cc: #3468	2023-02-21 10:03:55 +02:00
Anastasia Lubennikova	40799d8ae7	Add debug messages to catch abnormal consumption metric values	2023-02-17 17:57:45 +02:00
Dmitry Ivanov	d90cd36bcc	[proxy] Improve tracing spans here and there.	2023-02-17 15:32:14 +03:00
Dmitry Ivanov	956b6f17ca	[proxy] Handle some unix signals. On the surface, this doesn't add much, but there are some benefits: * We can do graceful shutdowns and thus record more code coverage data. * We now have a foundation for the more interesting behaviors, e.g. "stop accepting new connections after SIGTERM but keep serving the existing ones". * We give the otel machinery a chance to flush trace events before finally shutting down.	2023-02-17 15:32:14 +03:00
Heikki Linnakangas	6f9af0aa8c	[proxy] Enable OpenTelemetry tracing. This commit sets up OpenTelemetry tracing and exporter, so that they can be exported as OpenTelemetry traces as well. All outgoing HTTP requests will be traced. A separate (child) span is created for each outgoing HTTP request, and the tracing context is also propagated to the server in the HTTP headers. If tracing is enabled in the control plane and compute node too, you can now get an end-to-end distributed trace of what happens when a new connection is established, starting from the handshake with the client, creating the 'start_compute' operation in the control plane, starting the compute node, all the way down to fetching the base backup and the availability checks in compute_ctl. Co-authored-by: Dmitry Ivanov <dima@neon.tech>	2023-02-17 15:32:14 +03:00
Dmitry Ivanov	a4d5c8085b	Move hacks to a dedicated module.	2023-02-16 22:10:56 +03:00
Dmitry Ivanov	edffe0dd9d	Extract password hack & cleartext hack	2023-02-16 22:10:56 +03:00
Heikki Linnakangas	d9c518b2cc	Refactor use_cleartext_password_flow. It's not a property of the credentials that we receive from the client, so remove it from ClientCredentials. Instead, pass it as an argument directly to 'authenticate' function, where it's actually used. All the rest of the changes is just plumbing to pass it through the call stack to 'authenticate'	2023-02-16 22:10:56 +03:00
Dmitry Ivanov	1d9d7c02db	[proxy] Don't forward empty `options` to compute nodes. Clients may specify endpoint/project name via `options=project=...`, so we should not only remove `project=` from `options` but also drop `options` entirely, because connection pools don't support it. Discussion: https://neondb.slack.com/archives/C033A2WE6BZ/p1676464382670119	2023-02-15 22:05:03 +03:00
Dmitry Ivanov	3569c1bacd	[proxy] Fix: don't cache user & dbname in node info cache. Upstream proxy erroneously stores user & dbname in compute node info cache entries, thus causing "funny" connection problems if such an entry is reused while connecting to e.g. a different DB on the same compute node. This PR fixes the problem but doesn't eliminate the root cause just yet. I'll revisit this code and make it more type-safe in the upcoming PR.	2023-02-14 17:54:01 +03:00
Dmitry Ivanov	eaff14da5f	[proxy] Restore INFO as the default tracing level. Also move tracing init to its own function.	2023-02-13 17:09:43 +03:00
Arthur Petukhovsky	f383b4d540	Enable TCP_NODELAY for wss connections	2023-02-10 21:40:28 +03:00
Dmitry Ivanov	694150ce40	[proxy] Respect the magic `RUST_LOG` env variable. Usage: `RUST_LOG=trace proxy ...`	2023-02-10 18:49:32 +03:00
Dmitry Ivanov	9657459d80	[proxy] Fix possible unsoundness in the websocket machinery (#3569) This PR replaces the ill-advised `unsafe Sync` impl with a de-facto standard way to solve the underlying problem. TLDR: - tokio::task::spawn requires future to be Send - ∀t. (t : Sync) <=> (&t : Send) - ∀t. (t : Send + !Sync) => (&t : !Send)	2023-02-10 12:45:38 +03:00
Dmitry Ivanov	96e78394f5	[proxy] Fix project (aka endpoint) init in the password hack handler (#3529) The project/endpoint should be set in the original (non-as_ref'd) creds, because we call `wake_compute` not only in `try_password_hack` but also later in the connection retry logic. This PR also removes the obsolete `as_ref` method and makes the code simpler because we no longer need this complication after a recent refactoring. Further action points: finally introduce typestate in creds (planned).	2023-02-02 22:56:15 +02:00
Dmitry Ivanov	ea0278cf27	[proxy] Implement compute node info cache (#3331) This patch adds a timed LRU cache implementation and a compute node info cache on top of that. Cache entries might expire on their own (default ttl=5mins) or become invalid due to real-world events, e.g. compute node scale-to-zero event, so we add a connection retry loop with a wake-up call. Solved problems: - [x] Find a decent LRU implementation. - [x] Implement timed LRU on top of that. - [x] Cache results of `proxy_wake_compute` API call. - [x] Don't invalidate newer cache entries for the same key. - [x] Add cmdline configuration knobs (requires some refactoring). - [x] Add failed connection estab metric. - [x] Refactor auth backends to make things simpler (retries, cache placement, etc). - [x] Address review comments (add code comments + cleanup). - [x] Retry `/proxy_wake_compute` if we couldn't connect to a compute (e.g. stalled cache entry). - [x] Add high-level description for `TimedLru`. TODOs (will be addressed later): - [ ] Add cache metrics (hit, spurious hit, miss). - [ ] Synchronize http requests across concurrent per-client tasks (https://github.com/neondatabase/neon/pull/3331#issuecomment-1399216069). - [ ] Cache results of `proxy_get_role_secret` API call.	2023-02-01 17:11:41 +03:00
Vadim Kharitonov	bc4f594ed6	Fix Sentry Version	2023-01-25 12:07:38 +01:00
danieltprice	424fd0bd63	Update auth.rs (#3349) Update SNI error message. Users now specify the endpoint ID when making a connection to Neon. This should be reflected in the error message.	2023-01-16 12:32:00 -04:00
Anastasia Lubennikova	2cbe84b78f	Proxy metrics (#3290) Implement proxy metrics collection. Only collect metric for outbound traffic. Add proxy CLI parameters: - metric-collection-endpoint - metric-collection-interval. Add test_proxy_metric_collection test. Move shared consumption metrics code to libs/consumption_metrics. Refactor the code.	2023-01-16 15:17:28 +00:00
Kirill Bulatov	bce4233d3a	Rework Cargo.toml dependencies (#3322) * Use workspace variables from cargo, coming with rustc [1.64](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1640-2022-09-22) See https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-package-table and https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-dependencies-table sections. Now, all dependencies in all non-root `Cargo.toml` files are defined as `clap.workspace = true` or sometimes, when extra features are needed, as `bytes = {workspace = true, features = ['serde'] }`, with the actual declarations (with shared features and version numbers/file paths/etc.) in the root Cargo.toml. Features are additive: https://doc.rust-lang.org/nightly/cargo/reference/specifying-dependencies.html#inheriting-a-dependency-from-a-workspace * Uses the mechanism above to set common, 2021, edition and license across the workspace * Mechanically bumps a few dependencies * Updates hakari format, as it suggested: ``` work/neon/neon kb/cargo-templated ❯ cargo hakari generate info: no changes detected info: new hakari format version available: 3 (current: 2) (add or update `dep-format-version = "3"` in hakari.toml, then run `cargo hakari generate && cargo hakari manage-deps`) ```	2023-01-13 18:13:34 +02:00
Kirill Bulatov	fe8cef3427	Use ready! rustc 1.64 macro (#3315) rustc [1.64](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1640-2022-09-22) had brought `ready!` macro: https://doc.rust-lang.org/stable/std/task/macro.ready.html Use it to shorten the code slightly.	2023-01-12 21:27:34 +02:00
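
Commit 23aca81943 in the list above says that `pg_sni_router` routes `aaa--bbb--123.external.domain` to `aaa.bbb.123.internal.domain`. The hostname rewrite it describes might look roughly like the sketch below. This is not the code from the commit: the function name `route_sni`, its parameters, and the rule of rejecting multi-label prefixes are assumptions made for illustration.

```rust
/// Hypothetical helper (not from the neon source): rewrite an external SNI
/// hostname into an internal one by replacing `--` with `.` in the first
/// label and swapping the domain suffix.
fn route_sni(sni: &str, external_suffix: &str, internal_suffix: &str) -> Option<String> {
    // Strip the external suffix and the dot that precedes it.
    let prefix = sni.strip_suffix(external_suffix)?.strip_suffix('.')?;
    // Assumption: the routable part is exactly one non-empty DNS label.
    if prefix.is_empty() || prefix.contains('.') {
        return None;
    }
    let internal = prefix.replace("--", ".");
    Some(format!("{internal}.{internal_suffix}"))
}

fn main() {
    let routed = route_sni(
        "aaa--bbb--123.external.domain",
        "external.domain",
        "internal.domain",
    );
    // prints Some("aaa.bbb.123.internal.domain")
    println!("{routed:?}");
}
```

Returning `Option` lets a caller reject connections whose SNI does not belong to the wildcard domain instead of guessing a destination.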

173 Commits
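
Commit 1d9d7c02db in the list above describes removing `project=` from the client's `options` and dropping `options` entirely when nothing else remains. A minimal sketch of that filtering follows, assuming `options` is a plain whitespace-separated list of tokens; the helper name `filter_options` is hypothetical, and the sketch ignores the backslash escaping that PostgreSQL permits inside `options`.

```rust
/// Hypothetical helper (not from the neon source): remove `project=...`
/// tokens from a startup `options` string. Returns `None` when nothing is
/// left, so the caller can omit `options` instead of forwarding an empty one.
fn filter_options(options: &str) -> Option<String> {
    // Assumption: tokens are separated by whitespace and contain no escapes.
    let kept: Vec<&str> = options
        .split_whitespace()
        .filter(|opt| !opt.starts_with("project="))
        .collect();
    if kept.is_empty() {
        None
    } else {
        Some(kept.join(" "))
    }
}

fn main() {
    // Only the project selector was present: drop `options` entirely.
    println!("{:?}", filter_options("project=my-endpoint"));
    // Other settings survive and are forwarded without the selector.
    println!("{:?}", filter_options("project=my-endpoint -c geqo=off"));
}
```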