Commit Graph

25 Commits

Author SHA1 Message Date
arpad-m
982fce1e72 Fix rustdoc warnings and test cargo doc in CI (#4711)
## Problem

`cargo +nightly doc` is giving a lot of warnings: broken links, naked
URLs, etc.

## Summary of changes

* update the `proc-macro2` dependency so that it can compile on latest
Rust nightly, see https://github.com/dtolnay/proc-macro2/pull/391 and
https://github.com/dtolnay/proc-macro2/issues/398
* allow the `private_intra_doc_links` lint, as linking to something
that's private is always more useful than just mentioning it without a
link: if the link breaks in the future, at least there is a warning due
to that. Also, one might enable
[`--document-private-items`](https://doc.rust-lang.org/cargo/commands/cargo-doc.html#documentation-options)
in the future and make these links work in general.
* fix all the remaining warnings given by `cargo +nightly doc`
* make it possible to run `cargo doc` on stable Rust by updating
`opentelemetry` and associated crates to version 0.19, pulling in a fix
that previously broke `cargo doc` on stable:
https://github.com/open-telemetry/opentelemetry-rust/pull/904
* Add `cargo doc` to CI to ensure that it won't get broken in the
future.

Fixes #2557

## Future work
* Potentially, it might make sense, for development purposes, to publish
the generated rustdocs somewhere, like for example [how the rust
compiler does
it](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/index.html).
I will file an issue for discussion.
2023-07-15 05:11:25 +03:00
Stas Kelvich
cbf9a40889 Set a shorter timeout for the initial connection attempti in proxy.
In case we try to connect to an outdated address that is no longer valid, the
default behavior of Kubernetes is to drop the packets, causing us to wait for
the entire timeout period. We want to fail fast in such cases.

A specific case to consider is when we have cached compute node information
with a 5-minute TTL (Time To Live), but the user has executed a `/suspend` API
call, resulting in the nonexistence of the compute node.
2023-07-04 20:34:22 +03:00
Stas Kelvich
dad3519351 Add SQL-over-HTTP endpoint to Proxy
This commit introduces an SQL-over-HTTP endpoint in the proxy, with a JSON
response structure resembling that of the node-postgres driver. This method,
using HTTP POST, achieves smaller amortized latencies in edge setups due to
fewer round trips and an enhanced open connection reuse by the v8 engine.

This update involves several intricacies:
1. SQL injection protection: We employed the extended query protocol, modifying
   the rust-postgres driver to send queries in one roundtrip using a text
   protocol rather than binary, bypassing potential issues like those identified
   in https://github.com/sfackler/rust-postgres/issues/1030.

2. Postgres type compatibility: As not all postgres types have binary
   representations (e.g., acl's in pg_class), we adjusted rust-postgres to
   respond with text protocol, simplifying serialization and fixing queries with
   text-only types in response.

3. Data type conversion: Considering JSON supports fewer data types than
   Postgres, we perform conversions where possible, passing all other types as
   strings. Key conversions include:
   - postgres int2, int4, float4, float8 -> json number (NaN and Inf remain
     text)
   - postgres bool, null, text -> json bool, null, string
   - postgres array -> json array
   - postgres json and jsonb -> json object

4. Alignment with node-postgres: To facilitate integration with js libraries,
   we've matched the response structure of node-postgres, returning command tags
   and column oids. Command tag capturing was added to the rust-postgres
   functionality as part of this change.
2023-05-23 20:01:40 +03:00
Stas Kelvich
9486d76b2a Add tests for link auth to compute connection 2023-04-28 17:15:43 +03:00
Stas Kelvich
5d0ecadf7c Add support for non-SNI case in multi-cert proxy
When no SNI is provided use the default certificate, otherwise we can't
get to the options parameter which can be used to set endpoint name too.
That means that non-SNI flow will not work for CNAME domains in verify-full
mode.
2023-04-12 18:16:49 +03:00
Stas Kelvich
b1c2a6384a Set non-wildcard common names in link auth proxy
Old coding here ignored non-wildcard common names and passed None instead. With my recent changes
I started throwing an error in that case. Old logic doesn't seem to be a great choice, so instead
of passing None I actually set non-wildcard common names too. That way it is possible to avoid handling
cases with None in downstream code.
2023-04-07 01:24:27 +03:00
Stas Kelvich
d8df5237fa Aligne extra certificate name with default cert-manager names 2023-04-05 21:29:21 +03:00
Stas Kelvich
c3ca48c62b Support extra domain names for proxy.
Make it possible to specify directory where proxy will look up for
extra certificates. Proxy will iterate through subdirs of that directory
and load `key.pem` and `cert.pem` files from each subdir. Certs directory
structure may look like that:

  certs
  |--example.com
  |  |--key.pem
  |  |--cert.pem
  |--foo.bar
     |--key.pem
     |--cert.pem

Actual domain names are taken from certs and key, subdir names are
ignored.
2023-04-05 20:06:48 +03:00
Dmitry Ivanov
956b6f17ca [proxy] Handle some unix signals.
On the surface, this doesn't add much, but there are some benefits:

* We can do graceful shutdowns and thus record more code coverage data.

* We now have a foundation for the more interesting behaviors, e.g. "stop
  accepting new connections after SIGTERM but keep serving the existing ones".

* We give the otel machinery a chance to flush trace events before
  finally shutting down.
2023-02-17 15:32:14 +03:00
Dmitry Ivanov
ea0278cf27 [proxy] Implement compute node info cache (#3331)
This patch adds a timed LRU cache implementation and a compute node info cache on top of that.
Cache entries might expire on their own (default ttl=5mins) or become invalid due to real-world events,
e.g. compute node scale-to-zero event, so we add a connection retry loop with a wake-up call.

Solved problems:
- [x] Find a decent LRU implementation.
- [x] Implement timed LRU on top of that.
- [x] Cache results of `proxy_wake_compute` API call.
- [x] Don't invalidate newer cache entries for the same key.
- [x] Add cmdline configuration knobs (requires some refactoring).
- [x] Add failed connection estab metric.
- [x] Refactor auth backends to make things simpler (retries, cache
placement, etc).
- [x] Address review comments (add code comments + cleanup).
- [x] Retry `/proxy_wake_compute` if we couldn't connect to a compute
(e.g. stalled cache entry).
- [x] Add high-level description for `TimedLru`.

TODOs (will be addressed later):
- [ ] Add cache metrics (hit, spurious hit, miss).
- [ ] Synchronize http requests across concurrent per-client tasks
(https://github.com/neondatabase/neon/pull/3331#issuecomment-1399216069).
- [ ] Cache results of `proxy_get_role_secret` API call.
2023-02-01 17:11:41 +03:00
Anastasia Lubennikova
2cbe84b78f Proxy metrics (#3290)
Implement proxy metrics collection.
Only collect metric for outbound traffic.

Add proxy CLI parameters:
- metric-collection-endpoint
- metric-collection-interval.

Add test_proxy_metric_collection test.

Move shared consumption metrics code to libs/consumption_metrics.
Refactor the code.
2023-01-16 15:17:28 +00:00
Dmitry Ivanov
e9a103c09f [proxy] Pass extra parameters to the console (#2467)
With this change we now pass additional params
to the console's auth methods.
2022-09-21 21:42:47 +03:00
Dmitry Ivanov
87bf7be537 [proxy] Drop support for legacy cloud API (#2448)
Apparently, it no longer exists in the cloud.
2022-09-14 21:27:47 +03:00
Dmitry Ivanov
5f4ccae5c5 [proxy] Add the password hack authentication flow (#2095)
[proxy] Add the `password hack` authentication flow

This lets us authenticate users which can use neither
SNI (due to old libpq) nor connection string `options`
(due to restrictions in other client libraries).

Note: `PasswordHack` will accept passwords which are not
encoded in base64 via the "password" field. The assumption
is that most user passwords will be valid utf-8 strings,
and the rest may still be passed via "password_".
2022-07-25 17:23:10 +03:00
KlimentSerafimov
d059e588a6 Added invariant check for project name. (#1921)
Summary: Added invariant checking for project name. Refactored ClientCredentials and TlsConfig.

* Added formatting invariant check for project name:
**\forall c \in project_name . c \in [alnum] U {'-'}. 
** sni_data == <project_name>.<common_name>
* Added exhaustive tests for get_project_name.
* Refactored TlsConfig to contain common_name : Option<String>.
* Refactored ClientCredentials construction to construct project_name directly.
* Merged ProjectNameError into ClientCredsParseError.
* Tweaked proxy tests to accommodate refactored ClientCredentials construction semantics. 
* [Pytests] Added project option argument to test_proxy_select_1.
* Removed project param from Api since now it's contained in creds.
* Refactored &Option<String> -> Option<&str>.

Co-authored-by: Dmitrii Ivanov <dima@neon.tech>.
2022-06-22 09:34:24 -04:00
Dmitry Ivanov
b3ec6e0661 [proxy] Propagate SASL/SCRAM auth errors to the user
This will replace the vague (and incorrect) "Internal error" with a nice
and helpful authentication error, e.g. "password doesn't match".
2022-05-27 21:50:43 +03:00
Dmitry Ivanov
5d813f9738 [proxy] Refactoring
This patch attempts to fix some of the technical debt
we had to introduce in previous patches.
2022-05-27 21:50:43 +03:00
Dmitry Rodionov
38f2d165b7 allow TLS 1.2 in proxy to be compatible with older client libraries 2022-05-26 13:21:29 +03:00
Stas Kelvich
0323bb5870 [proxy] Refactor cplane API and add new console SCRAM auth API
Now proxy binary accepts `--auth-backend` CLI option, which determines
auth scheme and cluster routing method. Following backends are currently
implemented:

* legacy
    old method, when username ends with `@zenith` it uses md5 auth dbname as
    the cluster name; otherwise, it sends a login link and waits for the console
    to call back
* console
    new SCRAM-based console API; uses SNI info to select the destination
    cluster
* postgres
    uses postgres to select auth secrets of existing roles. Useful for local
    testing
* link
    sends login link for all usernames
2022-05-02 18:32:18 +03:00
Dmitry Ivanov
af0195b604 [proxy] Introduce cloud::Api for communication with Neon Cloud
* `cloud::legacy` talks to Cloud API V1.
* `cloud::api` defines Cloud API v2.
* `cloud::local` mocks the Cloud API V2 using a local postgres instance.
* It's possible to choose between API versions using the `--api-version` flag.
2022-05-02 18:32:18 +03:00
Dmitry Ivanov
c9d897f9b6 [proxy] Update rustls (#1510) 2022-04-15 12:06:25 +03:00
Dmitry Ivanov
705f51db27 [proxy] Propagate some errors to user (#1329)
* [proxy] Propagate most errors to user

This change enables propagation of most errors to the user
(e.g. auth and connectivity errors). Some of them will be
stripped of sensitive information.

As a side effect, most occurrences of `anyhow::Error` were
replaced with concrete error types.

* [proxy] Box weighty errors
2022-03-16 21:20:04 +03:00
Bojan Serafimov
65a0b2736b Add static router 2022-02-21 16:43:28 +03:00
Bojan Serafimov
92787159f7 Add client auth method option 2022-02-21 16:43:28 +03:00
Dmitry Ivanov
a47dade622 [proxy] Migrate to async
This change makes most parts of the code asynchronous, except
for the `mgmt` subsystem (we're going to drop it anyway).

Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2022-02-17 11:54:27 +03:00