pageserver: support keys at different LSNs in one get page batch (#11494)

## Problem

Get page batching stops when we encounter requests at different LSNs,
which leaves batching factor on the table.

## Summary of changes

The goal is to support keys with different LSNs in a single batch and
still serve them with a single vectored get.
One important restriction: the same key at different LSNs is not
supported in one batch, since returning different versions of the same
key would be a much more intrusive change.

Firstly, the read path is changed to support "scattered" queries. This
is a conceptually simple step from
https://github.com/neondatabase/neon/pull/11463: instead of initializing
the fringe for one keyspace, we initialize it for multiple keyspaces at
different LSNs and let the selection logic already present in the fringe
handle the rest.
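As a rough illustration of that step (hypothetical Python; the actual
read path is Rust inside the pageserver, and the names here are
invented), seeding the fringe with one keyspace per distinct LSN could
look like:

```python
# Illustrative sketch only. Grouping requested keys by their effective
# LSN yields one (keyspace, lsn) fringe seed per distinct LSN; the
# selection logic already present in the fringe then interleaves the
# layer visits across all seeds.

def plan_scattered_query(requests: list[tuple[str, int]]) -> dict[int, list[str]]:
    """Map each distinct LSN to the sorted keyspace requested at that LSN."""
    seeds: dict[int, list[str]] = {}
    for key, lsn in requests:
        seeds.setdefault(lsn, []).append(key)
    return {lsn: sorted(keys) for lsn, keys in seeds.items()}
```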

Secondly, the page service code is updated to support batching at
different LSNs. Each request parsed from the wire determines its
effective request LSN and keeps it in memory for the batcher to
inspect. The batcher allows keys at different LSNs in one batch as long
as no single key is requested at different LSNs.
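The admission rule described above can be sketched as follows
(hypothetical Python, not the actual Rust batcher; the real one also
breaks batches when the executor goes idle):

```python
# Sketch of the batch admission rule: requests at different LSNs may
# share a batch, but the same key may never appear at two different
# LSNs within one batch. Names are illustrative.

class GetPageBatch:
    def __init__(self, max_batch_size: int = 32):
        self.max_batch_size = max_batch_size
        self.lsn_by_key: dict[str, int] = {}  # key -> effective request LSN

    def try_add(self, key: str, effective_lsn: int) -> bool:
        """Return True if the request joined the batch, False if it must start a new batch."""
        if len(self.lsn_by_key) >= self.max_batch_size:
            return False  # batch is oversized: break
        prev = self.lsn_by_key.get(key)
        if prev is not None and prev != effective_lsn:
            return False  # same key at a different LSN: not supported in one batch
        self.lsn_by_key[key] = effective_lsn
        return True
```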

I'd suggest doing the first pass commit by commit to get a feel for the
changes.

## Results

I used the batching test from [Christian's
PR](https://github.com/neondatabase/neon/pull/11391), which increases
the chance of batch breaks. Looking at the logs, I think the new code is
at the maximum batching factor for the workload (we only break batches
because they are oversized or because the executor is idle).

```
Main:
Reasons for stopping batching: {'LSN changed': 22843, 'of batch size': 33417}
test_throughput[release-pg16-50-pipelining_config0-30-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 14.6662

My branch:
Reasons for stopping batching: {'of batch size': 37024}
test_throughput[release-pg16-50-pipelining_config0-30-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 19.8333
```

Related: https://github.com/neondatabase/neon/issues/10765
Author: Vlad Lazar
Date: 2025-04-14 10:05:29 +01:00 (committed via GitHub)
Parent: 8936a7abd8
Commit: a338984dc7
12 changed files with 591 additions and 265 deletions


@@ -1255,6 +1255,7 @@ class NeonEnv:
"mode": "pipelined",
"execution": "concurrent-futures",
"max_batch_size": 32,
"batching": "scattered-lsn",
}
get_vectored_concurrent_io = self.pageserver_get_vectored_concurrent_io
@@ -1321,6 +1322,10 @@ class NeonEnv:
log.info("test may use old binaries, ignoring warnings about unknown config items")
ps.allowed_errors.append(".*ignoring unknown configuration item.*")
# Allow old software to start until https://github.com/neondatabase/neon/pull/11275
# lands in the compatibility snapshot.
ps_cfg["page_service_pipelining"].pop("batching")
self.pageservers.append(ps)
cfg["pageservers"].append(ps_cfg)
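Taken together, the diff above adds a `batching` knob to the pipelining
config. A minimal sketch of the resulting config dict, assuming the
field values exercised by the tests:

```python
# The two batching modes exercised by the test matrix: "uniform-lsn"
# preserves the old behavior (break the batch when the LSN changes),
# while "scattered-lsn" enables the new cross-LSN batching.
page_service_pipelining = {
    "mode": "pipelined",
    "execution": "concurrent-futures",
    "max_batch_size": 32,
    "batching": "scattered-lsn",
}
```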


@@ -31,20 +31,28 @@ class PageServicePipeliningConfigSerial(PageServicePipeliningConfig):
class PageServicePipeliningConfigPipelined(PageServicePipeliningConfig):
max_batch_size: int
execution: str
batching: str
mode: str = "pipelined"
EXECUTION = ["concurrent-futures"]
BATCHING = ["uniform-lsn", "scattered-lsn"]
NON_BATCHABLE: list[PageServicePipeliningConfig] = [PageServicePipeliningConfigSerial()]
for max_batch_size in [1, 32]:
for execution in EXECUTION:
NON_BATCHABLE.append(PageServicePipeliningConfigPipelined(max_batch_size, execution))
for batching in BATCHING:
NON_BATCHABLE.append(
PageServicePipeliningConfigPipelined(max_batch_size, execution, batching)
)
BATCHABLE: list[PageServicePipeliningConfig] = []
for max_batch_size in [32]:
for execution in EXECUTION:
BATCHABLE.append(PageServicePipeliningConfigPipelined(max_batch_size, execution))
for batching in BATCHING:
BATCHABLE.append(
PageServicePipeliningConfigPipelined(max_batch_size, execution, batching)
)
@pytest.mark.parametrize(
@@ -300,7 +308,10 @@ def test_throughput(
PRECISION_CONFIGS: list[PageServicePipeliningConfig] = [PageServicePipeliningConfigSerial()]
for max_batch_size in [1, 32]:
for execution in EXECUTION:
PRECISION_CONFIGS.append(PageServicePipeliningConfigPipelined(max_batch_size, execution))
for batching in BATCHING:
PRECISION_CONFIGS.append(
PageServicePipeliningConfigPipelined(max_batch_size, execution, batching)
)
@pytest.mark.parametrize(


@@ -16,6 +16,7 @@ def test_slow_flush(neon_env_builder: NeonEnvBuilder, neon_binpath: Path, kind:
"mode": "pipelined",
"max_batch_size": 32,
"execution": "concurrent-futures",
"batching": "uniform-lsn",
}
neon_env_builder.pageserver_config_override = patch_pageserver_toml