## Problem
GetPage bulk requests such as prefetches and vacuum can head-of-line
block foreground requests, causing increased latency.
Touches #11735.
Requires #12469.
## Summary of changes
* Use dedicated channel/client/stream pools for bulk GetPage requests.
* Use lower concurrency but higher queue depth for bulk pools.
* Make pool limits configurable.
* Require unbounded client pool for stream pool, to avoid accidental
starvation.
## Problem
For the communicator, we need a rich Pageserver gRPC client.
Touches #11735.
Requires #12434.
## Summary of changes
This patch adds an initial rich Pageserver gRPC client. It supports:
* Sharded tenants across multiple Pageservers.
* Pooling of connections, clients, and streams for efficient resource
use.
* Concurrent use by many callers.
* Internal handling of GetPage bidirectional streams, with pipelining
and error handling.
* Automatic retries.
* Observability.
The client is still under development. In particular, it needs GetPage
batch splitting, shard map updates, and performance optimization. This
will be addressed in follow-up PRs.
This patch tightens up `page_api::Client`. It's mostly superficial
changes, but also adds a new constructor that takes an existing gRPC
channel, for use with the communicator connection pool.
## Problem
The gRPC API does not provide LSN leases.
## Summary of changes
* Add LSN lease support to the gRPC API.
* Use gRPC LSN leases for static computes with `grpc://` connstrings.
* Move `PageserverProtocol` into the `compute_api::spec` module and
reuse it.
## Problem
`compute_ctl` should support gRPC base backups.
Requires #12111.
Requires #12243.
Touches #11926.
## Summary of changes
Support `grpc://` connstrings for `compute_ctl` base backups.
## Problem
gRPC base backups use gRPC compression. However, this has two problems:
* Base backup caching will cache compressed base backups (making gRPC
compression pointless).
* Tonic does not support varying the compression level, and zstd default
level is 10% slower than gzip fastest level.
Touches https://github.com/neondatabase/neon/issues/11728.
Touches https://github.com/neondatabase/cloud/issues/29353.
## Summary of changes
This patch adds a gRPC parameter `BaseBackupRequest::compression`
specifying the compression algorithm. It also moves compression into
`send_basebackup_tarball` to reduce code duplication.
A follow-up PR will integrate the base backup cache with gRPC.
The 1.88.0 stable release is near (this Thursday). We'd like to fix most
warnings beforehand so that the compiler upgrade doesn't require
approval from too many teams.
This is therefore a preparation PR (like similar PRs before it).
There is a lot of changes for this release, mostly because the
`uninlined_format_args` lint has been added to the `style` lint group.
One can read more about the lint
[here](https://rust-lang.github.io/rust-clippy/master/#/uninlined_format_args).
The PR is the result of `cargo +beta clippy --fix` and `cargo fmt`. One
remaining warning is left for the proxy team.
---------
Co-authored-by: Conrad Ludgate <conrad@neon.tech>
This improves `pagebench getpage-latest-lsn` gRPC support by:
* Using `page_api::Client`.
* Removing `--protocol`, and using the `page-server-connstring` scheme
instead.
* Adding `--compression` to enable zstd compression.
## Problem
Pagebench does not support gRPC for `basebackup` benchmarks.
Requires #12243.
Touches #11728.
## Summary of changes
Add gRPC support via gRPC connstrings, e.g. `pagebench basebackup
--page-service-connstring grpc://localhost:51051`.
Also change `--gzip-probability` to `--no-compression`, since this must
be specified per-client for gRPC.
## Problem
The gRPC page service should support compression.
Requires #12111.
Touches #11728.
Touches https://github.com/neondatabase/cloud/issues/25679.
## Summary of changes
Add support for gzip and zstd compression in the server, and a client
parameter to enable compression.
This will need further benchmarking under realistic network conditions.
## Problem
Full basebackups are used in tests, and may be useful for debugging as
well, so we should support them in the gRPC API.
Touches #11728.
## Summary of changes
Add `GetBaseBackupRequest::full` to generate full base backups.
The libpq implementation also allows specifying `prev_lsn` for full
backups, i.e. the end LSN of the previous WAL record. This is omitted in
the gRPC API, since it's not used by any tests, and presumably of
limited value since it's autodetected. We can add it later if we find
that we need it.
## Problem
When converting `proto::GetPageRequest` into `page_api::GetPageRequest`
and validating the request, errors are returned as `tonic::Status`. This
will tear down the GetPage stream, which is disruptive and unnecessary.
## Summary of changes
Emit invalid request errors as `GetPageResponse` with an appropriate
`status_code` instead.
Also move the conversion from `tonic::Status` to `GetPageResponse` out
into the stream handler.
## Problem
The gRPC base backup implementation has a few issues: chunks are not
properly bounded, and it's not possible to omit the LSN.
Touches #11728.
## Summary of changes
* Properly bound chunks by using a limited writer.
* Use an `Option<Lsn>` rather than a `ReadLsn` (the latter requires an
LSN).
## Problem
The `page_api` domain types are missing a few derives.
## Summary of changes
Add `Clone`, `Copy`, and `Debug` derives for all types where
appropriate.
## Problem
Currently, `page_api` domain types validate message invariants both when
converting Protobuf → domain and domain → Protobuf. This is annoying for
clients, because they can't use stream combinators to convert streamed
requests (needed for hot path performance), and also performs the
validation twice in the common case.
Blocks #12099.
## Summary of changes
Only validate the Protobuf → domain type conversion, i.e. on the
receiver side, and make domain → Protobuf infallible. This is where it
matters -- the Protobuf types are less strict than the domain types, and
receivers should expect all sorts of junk from senders (they're not
required to validate anyway, and can just construct an invalid message
manually).
Also adds a missing `impl From<CheckRelExistsRequest> for
proto::CheckRelExistsRequest`.
## Problem
The gRPC `page_api` domain types used smallvecs to avoid heap
allocations in the common case where a single page is requested.
However, this is pointless: the Protobuf types use a normal vec, and
converting a smallvec into a vec always causes a heap allocation anyway.
## Summary of changes
Use a normal `Vec` instead of a `SmallVec` in `page_api` domain types.
## Problem
We should expose the page service over gRPC.
Requires #12093.
Touches #11728.
## Summary of changes
This patch adds an initial page service implementation over gRPC. It
ties in with the existing `PageServerHandler` request logic, to avoid
the implementations drifting apart for the core read path.
This is just a bare-bones functional implementation. Several important
aspects have been omitted, and will be addressed in follow-up PRs:
* Limited observability: minimal tracing, no logging, limited metrics
and timing, etc.
* Rate limiting will currently block.
* No performance optimization.
* No cancellation handling.
* No tests.
I've only done rudimentary testing of this, but Pagebench passes at
least.
## Problem
The page API gRPC errors need a few tweaks to integrate better with the
GetPage machinery.
Touches https://github.com/neondatabase/neon/issues/11728.
## Summary of changes
* Add `GetPageStatus::InternalError` for internal server errors.
* Rename `GetPageStatus::Invalid` to `InvalidRequest` for clarity.
* Rename `status` and `GetPageStatus` to `status_code` and
`GetPageStatusCode`.
* Add an `Into<tonic::Status>` implementation for `ProtocolError`.
## Problem
For the gRPC Pageserver API, we should convert the Protobuf types to
stricter, canonical Rust types.
Touches https://github.com/neondatabase/neon/issues/11728.
## Summary of changes
Adds Rust domain types that mirror the Protobuf types, with conversion
and validation.
## Problem
A binary Protobuf schema descriptor can be used to expose an API
reflection service, which in turn allows convenient usage of e.g.
`grpcurl` against the gRPC server.
Touches #11728.
## Summary of changes
* Generate a binary schema descriptor as
`pageserver_page_api::proto::FILE_DESCRIPTOR_SET`.
* Opportunistically rename the Protobuf package from `page_service` to
`page_api`.
## Problem
For the [communicator
project](https://github.com/neondatabase/company_projects/issues/352),
we want to move to gRPC for the page service protocol.
Touches #11728.
## Summary of changes
This patch adds an experimental gRPC Protobuf schema for the page
service. It is equivalent to the current page service, but with several
improvements, e.g.:
* Connection multiplexing.
* Reduced head-of-line blocking.
* Client-side batching.
* Explicit tenant shard routing.
* GetPage request classification (normal vs. prefetch).
* Explicit rate limiting ("slow down" response status).
The API is exposed as a new `pageserver/page_api` package. This is
separate from the `pageserver_api` package to reduce the dependency
footprint for the communicator. The longer-term plan is to also split
out e.g. the WAL ingestion service to a separate gRPC package, e.g.
`pageserver/wal_api`.
Subsequent PRs will: add Rust domain types for the Protobuf types,
expose a gRPC server, and implement the page service.
Preliminary prototype benchmarks of this gRPC API is within 10% of
baseline libpq performance. We'll do further benchmarking and
optimization as the implementation lands in `main` and is deployed to
staging.