Files
neon/pageserver
Erik Grinaker 65d1be6e90 pageserver: route gRPC requests to child shards (#12702)
## Problem

During shard splits, each parent shard is split and removed
incrementally. Only when all parent shards have split is the split
committed and the compute notified. This can take several minutes for
large tenants. In the meanwhile, the compute will be sending requests to
the (now-removed) parent shards.

This was (mostly) not a problem for the libpq protocol, because it does
shard routing on the server-side. The compute just sends requests to
some Pageserver, and the server will figure out which local shard should
serve it.

It is a problem for the gRPC protocol, where the client explicitly says
which shard it's talking to.

Touches [LKB-191](https://databricks.atlassian.net/browse/LKB-191).
Requires #12772.

## Summary of changes

* Add server-side routing of gRPC requests to any local child shards if
the parent does not exist.
* Add server-side splitting of GetPage batch requests straddling
multiple child shards.
* Move the `GetPageSplitter` into `pageserver_page_api`.

I really don't like this approach, but it avoids making changes to the
split protocol. I could be convinced we should change the split protocol
instead, e.g. to keep the parent shard alive until the split commits and
the compute has been notified, but we can also do that as a later change
without blocking the communicator on it.
2025-07-29 16:28:57 +00:00
..
2025-07-22 09:31:39 +00:00
2025-07-17 23:17:01 +00:00
2025-07-22 09:31:39 +00:00
2025-07-08 19:43:01 +00:00