The `NeonCli.init()` persists the non-default pageserver config values
for remote storage & `NeonEnvBuilder.pageserver_config_override` in
`pageserver.toml`.
We don't need to repeat them on each pageserver start after that.
- On a non-pooled start, do not reset the 'start_time' after launching
the HTTP service. In a non-pooled start, it's fair to include that in
the total startup time.
- When setting wait_for_spec_ms and resetting start_time, call
Utc::now() only once. It's a waste of cycles to call it twice, but also,
it ensures the time between setting wait_for_spec_ms and resetting
start_time is included in one or the other time period.
These differences should be insignificant in practice, in the
microsecond range, but IMHO it seems more logical and readable this way
too. Also fix and clarify some of the surrounding comments.
(This caught my eye while reviewing PR #7577)
Part of neondatabase/cloud#12047. Resolves#7239.
In short, this PR:
1. Adds `ComputeSpec.swap_size_bytes: Option<u64>`
2. Adds a flag to compute_ctl: `--resize-swap-on-bind`
3. Implements running `/neonvm/bin/resize-swap` with the value from the
compute spec before starting postgres, if both the value in the spec
*AND* the flag are specified.
4. Adds `sudo` to the final image
5. Adds a file in `/etc/sudoers.d` to allow `compute_ctl` to resize swap
Various bits of reasoning about design decisions in the added comments.
In short: We have both a compute spec field and a flag to make rollout
easier to implement. The flag will most likely be removed as part of
cleanups for neondatabase/cloud#12047.
This is the first step towards representing all of Pageserver
configuration as clean `serde::Serialize`able Rust structs in
`pageserver_api`.
The `neon_local` code will then use those structs instead of the crude
`toml_edit` / string concatenation that it does today.
refs https://github.com/neondatabase/neon/issues/7555
---------
Co-authored-by: Alex Chi Z <iskyzh@gmail.com>
## Problem
Too many connect_compute attempts can overwhelm postgres, getting the
connections stuck.
## Summary of changes
Limit number of connection attempts that can happen at a given time.
This pull request adds the scan interface. Scan operates on a sparse
keyspace and retrieves all the key-value pairs from the keyspaces.
Currently, scan only supports the metadata keyspace, and by default do
not retrieve anything from the ancestor branch. This should be fixed in
the future if we need to have some keyspaces that inherits from the
parent.
The scan interface reuses the vectored get code path by disabling the
missing key errors.
This pull request also changes the behavior of vectored get on aux file
v1/v2 key/keyspace: if the key is not found, it is simply not included in the
result, instead of throwing a missing key error.
TODOs in future pull requests: limit memory consumption, ensure the
search stops when all keys are covered by the image layer, remove
`#[allow(dead_code)]` once the code path is used in basebackups / aux
files, remove unnecessary fine-grained keyspace tracking in vectored get
(or have another code path for scan) to improve performance.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
The logic in Service::optimize_all would sometimes choose to migrate a
tenant to a secondary location that was only recently created, resulting
in Reconciler::live_migrate hitting its 5 minute timeout warming up the
location, and proceeding to attach a tenant to a location that doesn't
have a warm enough local set of layer files for good performance.
Closes: #7532
## Summary of changes
- Add a pageserver API for checking download progress of a secondary
location
- During `optimize_all`, connect to pageservers of candidate
optimization secondary locations, and check they are warm.
- During shard split, do heatmap uploads and start secondary downloads,
so that the new shards' secondary locations start downloading ASAP,
rather than waiting minutes for background downloads to kick in.
I have intentionally not implemented this by continuously reading the
status of locations, to avoid dealing with the scale challenge of
efficiently polling & updating 10k-100k locations status. If we
implement that in the future, then this code can be simplified to act
based on latest state of a location rather than fetching it inline
during optimize_all.
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.
Release notes: https://blog.rust-lang.org/2024/05/02/Rust-1.78.0.html
Prior update was in #7198
## Problem
After some time the load from heatmap uploads gets rather spiky. They're
unintentionally synchronising.
Chart (does this make a _boing_ sound in anyone else's head?):

## Summary of changes
- Add a helper `period_jitter` and apply a 5% jitter from downloader and
heatmap_uploader when updating the next runtime at the end of an
interation.
- Refactor existing places that we pick a startup interval into
`period_warmup`, so that the intent is obvious.
The current implementation of finding timeline gc cutoff Lsn(s) is done
while holding `Tenant::gc_cs`. In recent incidents long create branch
times were caused by holding the `Tenant::gc_cs` over extremely long
`Timeline::find_lsn_by_timestamp`. The fix is to find the GC cutoff
values before taking the `Tenant::gc_cs` lock. This change is safe to do
because the GC cutoff values and the branch points have no dependencies
on each other. In the case of `Timeline::find_gc_cutoff` taking a long
time with this change, we should no longer see `Tenant::gc_cs`
interfering with branch creation.
Additionally, the `Tenant::refresh_gc_info` is now tolerant of timeline
deletions (or any other failures to find the pitr_cutoff). This helps
with the synthetic size calculation being constantly completed instead
of having a break for a timely timeline deletion.
Fixes: #7560Fixes: #7587
## Problem
We were matching on `/tenant/:tenant_id` and
`/tenant/:tenant_id/timeline*`, but not non-timeline tenant sub-paths.
There aren't many: this was only noticeable when using the
synthetic_size endpoint by hand.
## Summary of changes
- Change the wildcard from `/tenant/:tenant_id/timeline*` to
`/tenant/:tenant_id/*`
- Add test lines that exercise this
## Problem
This test triggers layer download failures on demand. It is possible to
modify the failpoint
during a `Timeline::get_vectored` right between the vectored read and
it's validation read.
This means that one of the reads can fail while the other one succeeds
and vice versa.
## Summary of changes
These errors are expected, so allow them to happen.
Split `GcInfo` and replace `Timeline::update_gc_info` with a method that
simply finds gc cutoffs `Timeline::find_gc_cutoffs` to be combined as
`Timeline::gc_info` at the caller.
This change will be followed up with a change that finds the GC cutoff
values before taking the `Tenant::gc_cs` lock.
Cc: #7560