Commit Graph

62 Commits

Author SHA1 Message Date
Folke Behrens
f14e45f0ce proxy: format imports with nightly rustfmt (#9414)
```shell
cargo +nightly fmt -p proxy -- -l --config imports_granularity=Module,group_imports=StdExternalCrate,reorder_imports=true
```

These rust-analyzer settings for VSCode should help retain this style:
```json
  "rust-analyzer.imports.group.enable": true,
  "rust-analyzer.imports.prefix": "crate",
  "rust-analyzer.imports.merge.glob": false,
  "rust-analyzer.imports.granularity.group": "module",
  "rust-analyzer.imports.granularity.enforce": true,
```
2024-10-16 15:01:56 +02:00
Conrad Ludgate
ab5bbb445b proxy: refactor auth backends (#9271)
preliminary for #9270 

The auth::Backend didn't need to be in the mega ProxyConfig object, so I
split it off and passed it manually in the few places it was necessary.

I've also refined some of the uses of config I saw while doing this
small refactor.

I've also followed the trend and make the console redirect backend it's
own struct, same as LocalBackend and ControlPlaneBackend.
2024-10-11 20:14:52 +01:00
Folke Behrens
6baf1aae33 proxy: Demote some errors to warnings in logs (#9354) 2024-10-11 11:29:08 +02:00
Conrad Ludgate
75434060a5 local_proxy: integrate with pg_session_jwt extension (#9086) 2024-10-09 18:24:10 +01:00
Folke Behrens
54d1185789 proxy: Unalias hyper1 and replace one use of hyper0 in test (#9324)
Leaves one final use of hyper0 in proxy for the health service,
which requires some coordinated effort with other services.
2024-10-09 12:44:17 +02:00
Conrad Ludgate
94a5ca2817 proxy: auth broker (#8855)
Opens http2 connection to local-proxy and forwards requests over with
all headers and body

closes https://github.com/neondatabase/cloud/issues/16039
2024-09-30 20:43:45 +01:00
Conrad Ludgate
0a1ca7670c proxy: remove auth info from http conn info & fixup jwt api trait (#9047)
misc changes split out from #8855 

- **allow cloning the request context in a read-only fashion for
background tasks**
- **propagate endpoint and request context through the jwk cache**
- **only allow password based auth for md5 during testing**
- **remove auth info from conn info**
2024-09-19 15:09:30 +00:00
Folke Behrens
c5cd8577ff proxy: make sql-over-http max request/response sizes configurable (#9029) 2024-09-18 13:58:51 +02:00
Folke Behrens
af6f63617e proxy: clean up code and lints for 1.81 and 1.82 (#8945) 2024-09-06 17:13:30 +02:00
Conrad Ludgate
12850dd5e9 proxy: remove dead code (#8847)
By marking everything possible as pub(crate), we find a few dead code
candidates.
2024-08-27 12:00:35 +01:00
Folke Behrens
d6eede515a proxy: clippy lints: handle some low hanging fruit (#8829)
Should be mostly uncontroversial ones.
2024-08-26 15:16:54 +02:00
Conrad Ludgate
701cb61b57 proxy: local auth backend (#8806)
Adds a Local authentication backend. Updates http to extract JWT bearer
tokens and passes them to the local backend to validate.
2024-08-23 18:48:06 +00:00
Conrad Ludgate
0170611a97 proxy: small changes (#8752)
## Problem

#8736 is getting too big. splitting off some simple changes here

## Summary of changes

Local proxy wont always be using tls, so make it optional. Local proxy
wont be using ws for now, so make it optional. Remove a dead config var.
2024-08-20 14:16:27 +01:00
Tristan Partin
c624317b0e Decode the database name in SQL/HTTP connections
A url::Url does not hand you back a URL decoded value for path values,
so we must decode them ourselves.

Link: https://docs.rs/url/2.5.2/url/struct.Url.html#method.path
Link: https://docs.rs/url/2.5.2/url/struct.Url.html#method.path_segments
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-08-13 16:32:58 -05:00
Conrad Ludgate
7e08fbd1b9 Revert "proxy: update tokio-postgres to allow arbitrary config params (#8076)" (#8654)
This reverts #8076 - which was already reverted from the release branch
since forever (it would have been a breaking change to release for all
users who currently set TimeZone options). It's causing conflicts now so
we should revert it here as well.
2024-08-09 09:09:29 +01:00
Conrad Ludgate
ad0988f278 proxy: random changes (#8602)
## Problem

1. Hard to correlate startup parameters with the endpoint that provided
them.
2. Some configurations are not needed in the `ProxyConfig` struct.

## Summary of changes

Because of some borrow checker fun, I needed to switch to an
interior-mutability implementation of our `RequestMonitoring` context
system. Using https://docs.rs/try-lock/latest/try_lock/ as a cheap lock
for such a use-case (needed to be thread safe).

Removed the lock of each startup message, instead just logging only the
startup params in a successful handshake.

Also removed from values from `ProxyConfig` and kept as arguments.
(needed for local-proxy config)
2024-08-07 14:37:03 +01:00
Luca Bruno
8da3b547f8 proxy/http: switch to typed_json (#8377)
## Summary of changes

This switches JSON rendering logic to `typed_json` in order to
reduce the number of allocations in the HTTP responder path.

Followup from
https://github.com/neondatabase/neon/pull/8319#issuecomment-2216991760.

---------

Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
2024-07-15 12:38:52 +01:00
Luca BRUNO
c196cf6ac1 proxy/http: avoid spurious vector reallocations
This tweaks the rows-to-JSON rendering logic in order to avoid
allocating 0-sized temporary vectors and later growing them
to insert elements.
As the exact size is known in advance, both vectors can be built
with an exact capacity upfront. This will avoid further vector
growing/reallocation in the rendering hotpath.

Signed-off-by: Luca BRUNO <lucab@lucabruno.net>
2024-07-09 15:20:00 +01:00
Conrad Ludgate
78d9059fc7 proxy: update tokio-postgres to allow arbitrary config params (#8076)
## Problem

Fixes https://github.com/neondatabase/neon/issues/1287

## Summary of changes

tokio-postgres now supports arbitrary server params through the
`param(key, value)` method. Some keys are special so we explicitly
filter them out.
2024-06-24 10:20:27 +00:00
Conrad Ludgate
b998b70315 proxy: reduce some per-task memory usage (#8095)
## Problem

Some tasks are using around upwards of 10KB of memory at all times,
sometimes having buffers that swing them up to 30MB.

## Summary of changes

Split some of the async tasks in selective places and box them as
appropriate to try and reduce the constant memory usage. Especially in
the locations where the large future is only a small part of the total
runtime of the task.

Also, reduces the size of the CopyBuffer buffer size from 8KB to 1KB.

In my local testing and in staging this had a minor improvement. sadly
not the improvement I was hoping for :/ Might have more impact in
production
2024-06-19 13:34:15 +01:00
Conrad Ludgate
9a081c230f proxy: lazily parse startup pg params (#7905)
## Problem

proxy params being a `HashMap<String,String>` when it contains just
```
application_name: psql
database: neondb
user: neondb_owner
```
is quite wasteful allocation wise.

## Summary of changes

Keep the params in the wire protocol form, eg:
```
application_name\0psql\0database\0neondb\0user\0neondb_owner\0
```

Using a linear search for the map is fast enough at small sizes, which
is the normal case.
2024-05-30 11:02:38 +00:00
Conrad Ludgate
fddd11dd1a proxy: upload postgres connection options as json in the parquet upload (#7903)
## Problem

https://github.com/neondatabase/cloud/issues/9943

## Summary of changes

Captures the postgres options, converts them to json, uploads them in
parquet.
2024-05-30 11:10:27 +01:00
Conrad Ludgate
0c99e5ec6d proxy: cull http connections (#7632)
## Problem

Some HTTP client connections can stay open for quite a long time.

## Summary of changes

When there are too many HTTP client connections, pick a random
connection and gracefully cancel it.
2024-05-07 18:15:06 +01:00
Conrad Ludgate
e5c50bb12b proxy: rate limit authentication by masked IPv6. (#7316)
## Problem

Many users have access to ipv6 subnets (eg a /64). That gives them 2^64
addresses to play with

## Summary of changes

Truncate the address to /64 to reduce the attack surface.

Todo:
~~Will NAT64 be an issue here? AFAIU they put the IPv4 address at the
end of the IPv6 address. By truncating we will lose all that detail.~~
It's the same problem as a host sharing IPv6 addresses between clients.
I don't think it's up to us to solve. If a customer is getting DDoSed,
then they likely need to arrange a dedicated IP with us.
2024-04-16 14:16:34 +00:00
Conrad Ludgate
5299f917d6 proxy: replace prometheus with measured (#6717)
## Problem

My benchmarks show that prometheus is not very good.
https://github.com/conradludgate/measured

We're already using it in storage_controller and it seems to be working
well.

## Summary of changes

Replace prometheus with my new measured crate in proxy only.

Apologies for the large diff. I tried to keep it as minimal as I could.
The label types add a bit of boiler plate (but reduce the chance we
mistype the labels), and some of our custom metrics like CounterPair and
HLL needed to be rewritten.
2024-04-11 16:26:01 +00:00
Conrad Ludgate
c0ff4f18dc proxy: hyper1 for only proxy (#7073)
## Problem

hyper1 offers control over the HTTP connection that hyper0_14 does not.
We're blocked on switching all services to hyper1 because of how we use
tonic, but no reason we can't switch proxy over.

## Summary of changes

1. hyper0.14 -> hyper1
    1. self managed server
    2. Remove the `WithConnectionGuard` wrapper from `protocol2`
2. Remove TLS listener as it's no longer necessary
3. include first session ID in connection startup logs
2024-04-10 08:23:59 +00:00
Conrad Ludgate
d8da51e78a remove http timeout (#7291)
## Problem

https://github.com/neondatabase/cloud/issues/11051

additionally, I felt like the http logic was a bit complex.

## Summary of changes

1. Removes timeout for HTTP requests.
2. Split out header parsing to a `HttpHeaders` type.
3. Moved db client handling to `QueryData::process` and
`BatchQueryData::process` to simplify the logic of `handle_inner` a bit.
2024-04-03 11:23:26 +01:00
Anna Khanova
582cec53c5 proxy: upload consumption events to S3 (#7213)
## Problem

If vector is unavailable, we are missing consumption events.

https://github.com/neondatabase/cloud/issues/9826

## Summary of changes

Added integration with the consumption bucket.
2024-04-02 21:46:23 +02:00
Conrad Ludgate
3bd6551b36 proxy http cancellation safety (#7117)
## Problem

hyper auto-cancels the request futures on connection close.
`sql_over_http::handle` is not 'drop cancel safe', so we need to do some
other work to make sure connections are queries in the right way.

## Summary of changes

1. tokio::spawn the request handler to resolve the initial cancel-safety
issue
2. share a cancellation token, and cancel it when the request `Service`
is dropped.
3. Add a new log span to be able to track the HTTP connection lifecycle.
2024-03-14 08:20:56 +00:00
Conrad Ludgate
83855a907c proxy http error classification (#7098)
## Problem

Missing error classification for SQL-over-HTTP queries.
Not respecting `UserFacingError` for SQL-over-HTTP queries.

## Summary of changes

Adds error classification.
Adds user facing errors.
2024-03-13 07:35:49 +01:00
Conrad Ludgate
09699d4bd8 proxy: cancel http queries on timeout (#7031)
## Problem

On HTTP query timeout, we should try and cancel the current in-flight
SQL query.

## Summary of changes

Trigger a cancellation command in postgres once the timeout is reach
2024-03-12 11:52:00 +00:00
Joonas Koivunen
752bf5a22f build: clippy disallow futures::pin_mut macro (#7016)
`std` has had `pin!` macro for some time, there is no need for us to use
the older alternatives. Cannot disallow `tokio::pin` because tokio
macros use that.
2024-03-05 10:14:37 +00:00
Conrad Ludgate
48957e23b7 proxy: refactor span usage (#6946)
## Problem

Hard to find error reasons by endpoint for HTTP flow.

## Summary of changes

I want all root spans to have session id and endpoint id. I want all
root spans to be consistent.
2024-02-28 17:10:07 +04:00
Conrad Ludgate
e0af945f8f proxy: improve error classification (#6841)
## Problem

## Summary of changes

1. Classify further cplane API errors
2. add 'serviceratelimit' and make a few of the timeout errors return
that.
3. a few additional minor changes
2024-02-21 10:04:09 +00:00
Conrad Ludgate
21a86487a2 proxy: fix #6529 (#6807)
## Problem

`application_name` for HTTP is not being recorded

## Summary of changes

get `application_name` query param
2024-02-20 11:58:01 +01:00
Conrad Ludgate
789a71c4ee proxy: add more http logging (#6726)
## Problem

hard to see where time is taken during HTTP flow.

## Summary of changes

add a lot more for query state. add a conn_id field to the sql-over-http
span
2024-02-12 15:03:45 +00:00
Conrad Ludgate
98ec5c5c46 proxy: some more parquet data (#6711)
## Summary of changes

add auth_method and database to the parquet logs
2024-02-12 13:14:06 +00:00
Conrad Ludgate
cbd3a32d4d proxy: decode username and password (#6700)
## Problem

usernames and passwords can be URL 'percent' encoded in the connection
string URL provided by serverless driver.

## Summary of changes

Decode the parameters when getting conn info
2024-02-09 19:22:23 +00:00
Conrad Ludgate
96d89cde51 Proxy error reworking (#6453)
## Problem

Taking my ideas from https://github.com/neondatabase/neon/pull/6283 and
doing a bit less radical changes. smaller commits.

We currently don't report error classifications in proxy as the current
error handling made it hard to do so.

## Summary of changes

1. Add a `ReportableError` trait that all errors will implement. This
provides the error classification functionality.
2. Handle Client requests a strongly typed error
    * this error is a `ReportableError` and is logged appropriately
3. The handle client error only has a few possible error types, to
account for the fact that at this point errors should be returned to the
user.
2024-02-09 15:50:51 +00:00
Conrad Ludgate
ea089dc977 proxy: add per query array mode flag (#6678)
## Problem

Drizzle needs to be able to configure the array_mode flag per query.

## Summary of changes

Adds an array_mode flag to the query data json that will otherwise
default to the header flag.
2024-02-09 10:29:20 +00:00
Anna Khanova
c63e3e7e84 Proxy: improve http-pool (#6577)
## Problem

The password check logic for the sql-over-http is a bit non-intuitive. 

## Summary of changes

1. Perform scram auth using the same logic as for websocket cleartext
password.
2. Split establish connection logic and connection pool.
3. Parallelize param parsing logic with authentication + wake compute.
4. Limit the total number of clients
2024-02-08 12:57:05 +01:00
Conrad Ludgate
6506fd14c4 proxy: more refactors (#6526)
## Problem

not really any problem, just some drive-by changes

## Summary of changes

1. move wake compute
2. move json processing
3. move handle_try_wake
4. move test backend to api provider
5. reduce wake-compute concerns
6. remove duplicate wake-compute loop
2024-02-02 16:07:35 +00:00
Conrad Ludgate
ec8dcc2231 flatten proxy flow (#6447)
## Problem

Taking my ideas from https://github.com/neondatabase/neon/pull/6283 and
doing a bit less radical changes. smaller commits.

Proxy flow was quite deeply nested, which makes adding more interesting
error handling quite tricky.

## Summary of changes

I recommend reviewing commit by commit.

1. move handshake logic into a separate file
2. move passthrough logic into a separate file
3. no longer accept a closure in CancelMap session logic
4. Remove connect_to_db, copy logic into handle_client
5. flatten auth_and_wake_compute in authenticate
6. record info for link auth
2024-01-29 17:38:03 +00:00
Anna Khanova
8253cf1931 proxy: Relax endpoint check (#6503)
## Problem

http-over-sql allowes host to be in format api.aws.... however it's not
the case for the websocket flow.

## Summary of changes

Relax endpoint check for the ws serverless connections.
2024-01-28 21:27:14 +00:00
Conrad Ludgate
210700d0d9 proxy: add newtype wrappers for string based IDs (#6445)
## Problem

too many string based IDs. easy to mix up ID types.

## Summary of changes

Add a bunch of `SmolStr` wrappers that provide convenience methods but
are type safe
2024-01-24 16:38:10 +00:00
Conrad Ludgate
7e7e9f5191 proxy: add more columns to parquet upload (#6405)
## Problem

Some fields were missed in the initial spec.

## Summary of changes

Adds a success boolean (defaults to false unless specifically marked as
successful).
Adds a duration_us integer that tracks how many microseconds were taken
from session start through to request completion.
2024-01-20 09:38:11 +00:00
Anna Khanova
3f2187eb92 Proxy relax sni check (#6323)
## Problem

Using the same domain name () for serverless driver can help with
connection caching.
https://github.com/neondatabase/neon/issues/6290

## Summary of changes

Relax SNI check.
2024-01-16 08:42:13 +00:00
Conrad Ludgate
0bac8ddd76 proxy: fix serverless error message info (#6279)
## Problem


https://github.com/neondatabase/serverless/issues/51#issuecomment-1878677318

## Summary of changes

1. When we have a db_error, use db_error.message() as the message.
2. include error position.
3. line should be a string (weird?)
4. `datatype` -> `dataType`
2024-01-15 16:43:19 +01:00
Conrad Ludgate
551f0cc097 proxy: refactor how neon-options are handled (#6306)
## Problem

HTTP connection pool was not respecting the PitR options.

## Summary of changes

1. refactor neon_options a bit to allow easier access to cache_key
2. make HTTP not go through `StartupMessageParams`
3. expose SNI processing to replace what was removed in step 2.
2024-01-11 14:58:31 +00:00
Conrad Ludgate
8a646cb750 proxy: add request context for observability and blocking (#6160)
## Summary of changes

### RequestMonitoring

We want to add an event stream with information on each request for
easier analysis than what we can do with diagnostic logs alone
(https://github.com/neondatabase/cloud/issues/8807). This
RequestMonitoring will keep a record of the final state of a request. On
drop it will be pushed into a queue to be uploaded.

Because this context is a bag of data, I don't want this information to
impact logic of request handling. I personally think that weakly typed
data (such as all these options) makes for spaghetti code. I will
however allow for this data to impact rate-limiting and blocking of
requests, as this does not _really_ change how a request is handled.

### Parquet

Each `RequestMonitoring` is flushed into a channel where it is converted
into `RequestData`, which is accumulated into parquet files. Each file
will have a certain number of rows per row group, and several row groups
will eventually fill up the file, which we then upload to S3.

We will also upload smaller files if they take too long to construct.
2024-01-08 11:42:43 +00:00