Commit Graph

4 Commits

Author SHA1 Message Date
Heikki Linnakangas
1dce2a9e74 Change how pageserver connection info is passed in compute spec (#12604)
Add a new 'pageserver_connection_info' field in the compute spec. It
replaces the old 'pageserver_connstring' field with a more complicated
struct that includes both libpq and grpc URLs, for each shard (or only
one of the the URLs, depending on the configuration). It also includes a
flag suggesting which one to use; compute_ctl now uses it to decide
which protocol to use for the basebackup.

This is backwards-compatible with everything that's in production. If
the control plane fills in `pageserver_connection_info`, compute_ctl
uses that. If it fills in the
`pageserver_connstring`/`shard_stripe_size` fields, it uses those. As
last resort, it uses the 'neon.pageserver_connstring' GUC from the list
of Postgres settings.

The 'grpc' flag in the endpoint config is now more of a suggestion, and
it's used to populate the 'prefer_protocol' flag in the compute spec.
Regardless of the flag, compute_ctl gets both URLs, so it can choose to
use libpq or grpc as it wishes. It currently always obeys the flag to
choose which method to use for getting the basebackup, but Postgres
itself will always use the libpq protocol. (That will be changed with
the new rust-based communicator project, which implements the gRPC
client in the compute).

After that, the `pageserver_connection_info.prefer_protocol` flag in the
spec file can be used to control whether compute_ctl uses grpc or libpq.
The actual compute's grpc usage will be controlled by the
`neon.enable_new_communicator` GUC (not yet; that will be introduced in
the future, with the new rust-base communicator project). It can be set
separately from 'prefer_protocol'.

Later:

- Once all old computes are gone, remove the code to pass
`neon.pageserver_connstring`
2025-07-29 22:20:05 +00:00
Dmitrii Kovalkov
dee73f0cb4 pageserver: implement max_total_size_bytes limit for basebackup cache (#12230)
## Problem
The cache was introduced as a hackathon project and the only supported
limit was the number of entries.
The basebackup entry size may vary. We need to have more control over
disk space usage to ship it to production.

- Part of https://github.com/neondatabase/cloud/issues/29353

## Summary of changes
- Store the size of entries in the cache and use it to limit
`max_total_size_bytes`
- Add the size of the cache in bytes to metrics.
2025-06-17 15:08:59 +00:00
Dmitrii Kovalkov
f7ec7668a2 pageserver, tests: prepare test_basebackup_cache for --timelines-onto-safekeepers (#12143)
## Problem
- `test_basebackup_cache` fails in
https://github.com/neondatabase/neon/pull/11712 because after the
timelines on safekeepers are managed by storage controller, they do
contain proper start_lsn and the compute_ctl tool sends the first
basebackup request with this LSN.
- `Failed to prepare basebackup` log messages during timeline
initialization, because the timeline is not yet in the global timeline
map.

- Relates to https://github.com/neondatabase/cloud/issues/29353

## Summary of changes
- Account for `timeline_onto_safekeepers` storcon's option in the test.
- Do not trigger basebackup prepare during the timeline initialization.
2025-06-05 12:04:37 +00:00
Dmitrii Kovalkov
136eaeb74a pageserver: basebackup cache (hackathon project) (#11989)
## Problem
Basebackup cache is on the hot path of compute startup and is generated
on every request (may be slow).

- Issue: https://github.com/neondatabase/cloud/issues/29353

## Summary of changes
- Add `BasebackupCache` which stores basebackups on local disk.
- Basebackup prepare requests are triggered by
`XLOG_CHECKPOINT_SHUTDOWN` records in the log.
- Limit the size of the cache by number of entries.
- Add `basebackup_cache_enabled` feature flag to TenantConfig.
- Write tests for the cache

## Not implemented yet
- Limit the size of the cache by total size in bytes

---------

Co-authored-by: Aleksandr Sarantsev <aleksandr@neon.tech>
2025-05-22 12:45:00 +00:00