We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.
[Announcement blog
post](https://blog.rust-lang.org/2025/04/03/Rust-1.86.0.html).
Prior update was in #10914.
## Problem
close https://github.com/neondatabase/neon/issues/11279
## Summary of changes
* Allow passthrough of other methods in tenant timeline shard0
passthrough of storcon.
* Passthrough mark invisible API in storcon.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Tenants in attachment state `Stale` can't upload layers, and don't run
compaction, but still do periodic L0 layer flushes in the tenant
housekeeping loop. If the tenant remains stuck in stale mode, this
causes a large buildup of L0 layers, causing logging, metrics increases,
and possibly alerts.
Resolves#11245.
## Summary of changes
Don't perform periodic layer flushes in stale attachment state.
## Problem
ref https://github.com/neondatabase/neon/issues/11279
Imagine we have a branch with 3 snapshots A, B, and C:
```
base---+---+---+---main
\-A \-B \-C
base=100G, base-A=1G, A-B=1G, B-C=1G, C-main=1G
```
at this point, the synthetic size should be 100+1+1+1+1=104G.
after the deletion, the structure looks like:
```
base---+---+---+
\-A \-B \-C
```
If we simply assume main never exists, the size will be calculated as
size(A) + size(B) + size(C)=300GB, which obviously is not what the user
would expect.
The correct way to do this is to assume part of main still exists, that
is to say, set C-main=1G:
```
base---+---+---+main
\-A \-B \-C
```
And we will get the correct synthetic size of 100G+1+1+1=103G.
## Summary of changes
* Do not generate gc cutoff point for invisible branches.
* Use the same LSN as the last branchpoint for branch end.
* Remove test_api_handler for mark_invisible.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Add an optional `safekeepers` field to `TimelineInfo` which is returned
by the storcon upon timeline creation if the
`--timelines-onto-safekeepers` flag is enabled. It contains the list of
safekeepers chosen.
Other contexts where we return `TimelineInfo` do not contain the
`safekeepers` field, sadly I couldn't make this more type safe like done
in Rust via `TimelineCreateResponseStorcon`, as there is no way of
flattening or inheritance (and I don't that duplicating the entire type
for some minor type safety improvements is worth it).
The storcon side has been done in #11058.
Part of https://github.com/neondatabase/cloud/issues/16176
cc https://github.com/neondatabase/cloud/issues/16796
## Problem
`CompactFlags::NoYield` was a bit inconvenient, since every caller
except for the background compaction loop should generally set it (e.g.
HTTP API calls, tests, etc). It was also inconsistent with
`CompactionOutcome::YieldForL0`.
## Summary of changes
Invert `CompactFlags::NoYield` as `CompactFlags::YieldForL0`. There
should be no behavioral changes.
## Problem
Previously, L0 flushes would wait for uploads, as a simple form of
backpressure. However, this prevented flush pipelining and upload
parallelism. It has since been disabled by default and replaced by L0
compaction backpressure.
Touches https://github.com/neondatabase/cloud/issues/24664.
## Summary of changes
This patch removes L0 flush upload waits, along with the
`l0_flush_wait_upload`. This can't be merged until the setting has been
removed across the fleet.
To help with narrowing down
https://github.com/neondatabase/cloud/issues/26362, we make the case
more noisy where we are wait for the shutdown of a specific task (in the
case of that issue, the `gc_loop`).
## Problem
- We need to support multiple SSL CA certificates for graceful root CA
certificate rotation.
- Closes: https://github.com/neondatabase/cloud/issues/25971
## Summary of changes
- Parses `ssl_ca_file` as a pem bundle, which may contain multiple
certificates. Single pem cert is a valid pem bundle, so the change is
backward compatible.
## Problem
- Part of https://github.com/neondatabase/neon/issues/11113
- Building a new `reqwest::Client` for every request is expensive
because it parses CA certs under the hood. It's noticeable in storcon's
flamegraph.
## Summary of changes
- Reuse one `reqwest::Client` for all API calls to avoid parsing CA
certificates every time.
## Problem
part of https://github.com/neondatabase/neon/issues/11279
## Summary of changes
The invisible flag is used to exclude a timeline from synthetic size
calculation. For the first step, let's persist this flag. Most of the
code are following the `is_archived` modification flow.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
SSL certs are loaded only during start up. It doesn't allow the rotation
of short-lived certificates without server restart.
- Closes: https://github.com/neondatabase/cloud/issues/25525
## Summary of changes
- Implement `ReloadingCertificateResolver` which reloads certificates
from disk periodically.
## Problem
Pageservers use unencrypted HTTP requests for storage controller API.
- Closes: https://github.com/neondatabase/cloud/issues/25524
## Summary of changes
- Replace hyper0::server::Server with http_utils::server::Server in
storage controller.
- Add HTTPS handler for storage controller API.
- Support `ssl_ca_file` in pageserver.
## Problem
The pageserver upcall api was designed to work with control plane or the
storage controller.
We have completed the transition period and now the upcall api only
targets the storage controller.
## Summary of changes
Rename types accordingly and tweak some comments.
## Problem
We're seeing timeline creation failures that look suspiciously like some
race with the cleanup-deletion of initdb temporary directories. I
couldn't spot the bug, but we can make it a bit easier to debug.
Related: https://github.com/neondatabase/neon/issues/11296
## Summary of changes
- Avoid surfacing distracting ENOENT failure to delete as a log error --
this is fine, and can happen if timeline is cancelled while doing
initdb, or if initdb itself has an error where it doesn't write the dir
(this error is surfaced separately)
- Log after purging initdb temp directories
The only difference between
- `pageserver_api::models::TenantConfig` and
- `pageserver::tenant::config::TenantConfOpt`
at this point is that `TenantConfOpt` serializes with
`skip_serializing_if = Option::is_none`.
That is an efficiency improvement for all the places that currently
serde `models::TenantConfig` because new serializations will no longer
write `$fieldname: null` for each field that is `None` at runtime.
This should be particularly beneficial for Storcon, which stores
JSON-serialized `models::TenantConfig` in its DB.
# Behavior Changes
This PR changes the serialization behavior: we omit `None` fields
instead of serializing `$fieldname: null`).
So it's a data format change (see section on compatibility below).
And it changes API responses from Storcon and Pageserver.
## API Response Compatibility
Storcon returns the location description.
Afaik it is passed through into
- storcon_cli output
- storcon UI in console admin UI
These outputs will no longer contain `$fieldname: null` values,
which de-bloats the output (good).
But in storcon UI, it also serves as an editor "default", which
will be eliminated after a storcon with this PR is released.
## Data Format Compatibility
Backwards compat: new software reading old serialized data will
deserialize to the same runtime value because all the field types
are exactly the same and `skip_serializing_if` does not affect
deserialization.
Forward compat: old software reading data serialized by new software
will map absence fields in the serialized form to runtime value
`Option::None`. This is serde default behavior, see this playground
to convince yourself:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=f7f4e1a169959a3085b6158c022a05eb
The `serde(with="humantime_serde")` however behaves strangely:
if used on an `Option<Duration>`, it still requires the field to be
present,
unlike the serde default behavior shown in the previous paragraph.
The workaround is to set `serde(default)`.
Previously it was set on each individual field, but, we do have the
container attribute, so, set it there.
This requires deriving a `Default` impl, which, because all fields are
`Option`,
is non-magic.
See my notes here:
https://gist.github.com/problame/eddbc225a5d12617e9f2c6413e0cf799
# Future Work
We should have separate types (& crates) for
- runtime types configuration (e.g. PageServerConf::tenant_config,
AttachedLocationConf)
- `config-v1` file pageserver local disk file format
- `mgmt API`
- `pageserver.toml`
Right now they all use the same, which is convenient but makes it hard
to reason about compatibility breakage.
# Refs
- corresponding docs.neon.build PR
https://github.com/neondatabase/docs/pull/470
In
-
https://github.com/neondatabase/neon/pull/10993#issuecomment-2690428336
I added infinite retries for buffered writer flush IOs, primarily to
gracefully handle ENOSPC but more generally so that the buffered writer
is not left in a state where reads from the surrounding InMemoryLayer
cause panics.
However, I didn't add cancellation sensitivity, which is concerning
because then there is no way to detach a timeline/tenant that is
encountering the write IO errors.
That’s a legitimate scenario in the case of some edge case bug.
See the #10993 description for details.
This PR
- first makes flush loop infallible, enabled by infinite retries
- then adds sensitivity to `Timeline::cancel` to the flush loop, thereby
making it fallible in one specific way again
- finally fixes the InMemoryLayer/EphemeralFile/BufferedWriter
amalgamate to remain read-available after flush loop is cancelled.
The support for read-availability after cancellation is necessary so
that reads from the InMemoryLayer that are already queued up behind the
RwLock that wraps the BufferedWriter won't panic because of the
`mutable=None` that we leave behind in case the flush loop gets
cancelled.
# Alternatives
One might think that we can only ship the change for read-availability
if flush encounters an error, without the infinite retrying and/or
cancellation sensitivity complexity.
The problem with that is that read-availability sounds good but is
really quite useless, because we cannot ingest new WAL without a
writable InMemoryLayer. Thus, very soon after we transition to read-only
mode, reads from compute are going to wait anyway, but on `wait_lsn`
instead of the RwLock, because ingest isn't progressing.
Thus, having the infinite flush retries still makes more sense because
they're just "slowness" to the user, whereas wait_lsn is hard errors.
## Problem
`test_metadata_image_creation ` became flaky with #11212, since image
compaction may yield to L0 compaction.
## Summary of changes
Set `NoYield` when compacting in tenant tests.
## Problem
`l0_flush_delay_threshold` has already been set to 30 in production for
a couple of weeks. Let's harmonize the default.
## Summary of changes
Update `DEFAULT_L0_FLUSH_DELAY_FACTOR` to 3 such that the default
`l0_flush_delay_threshold` is `3 * compaction_threshold`.
This differs from the production setting, which is hardcoded to 30 (with
`compaction_threshold` at 10), and is more appropriate for any tenants
that have custom `compaction_threshold` overrides.
# Problem
We leave too few observability breadcrumbs in the case where wait_lsn is
exceptionally slow.
# Changes
- refactor: extract the monitoring logic out of `log_slow` into
`monitor_slow_future`
- add global + per-timeline counter for time spent waiting for wait_lsn
- It is updated while we're still waiting, similar to what we do for
page_service response flush.
- add per-timeline counterpair for started & finished wait_lsn count
- add slow-logging to leave breadcrumbs in logs, not just metrics
For the slow-logging, we need to consider not flooding the logs during a
broker or network outage/blip.
The solution is a "log-streak-level" concurrency limit per timeline.
At any given time, there is at most one slow wait_lsn that is logging
the "still running" and "completed" sequence of logs.
Other concurrent slow wait_lsn's don't log at all.
This leaves at least one breadcrumb in each timeline's logs if some
wait_lsn was exceptionally slow during a given period.
The full degree of slowness can then be determined by looking at the
per-timeline metric.
# Performance
Reran the `bench_log_slow` benchmark, no difference, so, existing call
sites are fine.
We do use a Semaphore, but only try_acquire it _after_ things have
already been determined to be slow. So, no baseline overhead
anticipated.
# Refs
-
https://github.com/neondatabase/cloud/issues/23486#issuecomment-2711587222
## Problem
There was a panic on staging that compaction didn't find any keys. This
is possible if all layers selected for compaction does not contain any
keys within the current shard.
## Summary of changes
Make panic an error. In the future, we can try creating an empty image
layer so that GC can clean up those layers. Otherwise, for now, we can
only rely on shard ancestor compaction to remove these data.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
`compaction_l0_first` has already been enabled in production for a
couple of weeks.
## Summary of changes
Enable `compaction_l0_first` by default.
Also set `CompactFlags::NoYield` in `timeline_checkpoint_handler`, to
ensure explicitly requested compaction runs to completion. This endpoint
is mainly used in tests, and caused some flakiness where tests expected
compaction to complete.
## Problem
close https://github.com/neondatabase/neon/issues/10310
## Summary of changes
This patch adds a new behavior for the detach_ancestor API: detach with
multi-level ancestor and no reparenting. Though we can potentially
support multi-level + do reparenting / single-level + no-reparenting in
the future, as it's not required for the recovery/snapshot epic, I'd
prefer keeping things simple now that we only handle the old one and the
new one instead of supporting the full feature matrix.
I only added a test case of successful detaching instead of testing
failures. I'd like to make this into staging and add more tests in the
future.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
When the storage controller and Pageserver loads tenants from persisted
storage, it uses `ShardIdentity::unsharded()` for unsharded tenants.
However, this replaces the persisted stripe size of unsharded tenants
with the default stripe size.
This doesn't really matter for practical purposes, since the stripe size
is meaningless for unsharded tenants anyway, but can cause consistency
check failures if the persisted stripe size differs from the default.
This was seen in #11168, where we change the default stripe size.
Touches #11168.
## Summary of changes
Carry over the persisted stripe size from `TenantShardPersistence` for
unsharded tenants, and from `LocationConf` on Pageservers.
Also add bounds checks for type casts when loading persisted shard
metadata.
## Problem
`info_span!` is only used in a `linux` branch, causing the unused lint
to fire on macOS.
## Summary of changes
Fully qualify the `info_span!` use.
We want to export performance traces from the pageserver in OTEL format.
End goal is to see them in Grafana.
To this end, there are two changes here:
1. Update the `tracing-utils` crate to allow for explicitly specifying
the export configuration. Pageserver configuration is loaded from a file
on start-up. This allows us to use the same flow for export configs
there.
2. Update the `utils::logging::init` common entry point to set up OTEL
tracing infrastructure if requested. Note that an entirely different
tracing subscriber is used. This is to avoid interference with the
existing tracing set-up. For now, no service uses this functionality.
PR to plug this into the pageserver is
[here](https://github.com/neondatabase/neon/pull/11140).
Related https://github.com/neondatabase/neon/issues/9873
## Problem
This field was retained for backward compat only in
https://github.com/neondatabase/neon/pull/10707.
Once https://github.com/neondatabase/cloud/pull/25233 is released,
nothing external will be reading this field.
Internally, this was a mandatory field so storage controller is still
trying to decode it, so we must do this removal in two steps: this PR
makes the field optional, and after one release we can fully remove it.
Related: https://github.com/neondatabase/cloud/issues/24250
## Summary of changes
- Rename field to `_unused`
- Remove field from swagger
- Make field optional
# Problem
If the Pageserver ingest path
(InMemoryLayer=>EphemeralFile=>BufferedWriter)
encounters ENOSPC or any other write IO error when flushing the mutable
buffer
of the BufferedWriter, the buffered writer is left in a state where
subsequent _reads_
from the InMemoryLayer it will cause a `must not use after we returned
an error` panic.
The reason is that
1. the flush background task bails on flush failure,
2. causing the `FlushHandle::flush` function to fail at channel.recv()
and
3. causing the `FlushHandle::flush` function to bail with the flush
error,
4. leaving its caller `BufferedWriter::flush` with
`BufferedWriter::mutable = None`,
5. once the InMemoryLayer's RwLock::write guard is dropped, subsequent
reads can enter,
6. those reads find `mutable = None` and cause the panic.
# Context
It has always been the contract that writes against the BufferedWriter
API
must not be retried because the writer/stream-style/append-only
interface makes no atomicity guarantees ("On error, did nothing or a
piece of the buffer get appended?").
The idea was that the error would bubble up to upper layers that can
throw away the buffered writer and create a new one. (See our [internal
error handling policy document on how to handle e.g.
`ENOSPC`](c870a50bc0/src/storage/handling_io_and_logical_errors.md (L36-L43))).
That _might_ be true for delta/image layer writers, I haven't checked.
But it's certainly not true for the ingest path: there are no provisions
to throw away an InMemoryLayer that encountered a write error an
reingest the WAL already written to it.
Adding such higher-level retries would involve either resetting
last_record_lsn to a lower value and restarting walreceiver. The code
isn't flexible enough to do that, and such complexity likely isn't worth
it given that write errors are rare.
# Solution
The solution in this PR is to retry _any_ failing write operation
_indefinitely_ inside the buffered writer flush task, except of course
those that are fatal as per `maybe_fatal_err`.
Retrying indefinitely ensures that `BufferedWriter::mutable` is never
left `None` in the case of IO errors, thereby solving the problem
described above.
It's a clear improvement over the status quo.
However, while we're retrying, we build up backpressure because the
`flush` is only double-buffered, not infinitely buffered.
Backpressure here is generally good to avoid resource exhaustion, **but
blocks reads** and hence stalls GetPage requests because InMemoryLayer
reads and writes are mutually exclusive.
That's orthogonal to the problem that is solved here, though.
## Caveats
Note that there are some remaining conditions in the flush background
task where it can bail with an error. I have annotated one of them with
a TODO comment.
Hence the `FlushHandle::flush` is still fallible and hence the overall
scenario of leaving `mutable = None` on the bail path is still possible.
We can clean that up in a later commit.
Note also that retrying indefinitely is great for temporary errors like
ENOSPC but likely undesirable in case the `std::io::Error` we get is
really due to higher-level logic bugs.
For example, we could fail to flush because the timeline or tenant
directory got deleted and VirtualFile's reopen fails with ENOENT.
Note finally that cancellation is not respected while we're retrying.
This means we will block timeline/tenant/pageserver shutdown.
The reason is that the existing cancellation story for the buffered
writer background task was to recv from flush op channel until the
sending side (FlushHandle) is explicitly shut down or dropped.
Failing to handle cancellation carries the operational risk that even if
a single timeline gets stuck because of a logic bug such as the one laid
out above, we must still restart the whole pageserver process.
# Alternatives Considered
As pointed out in the `Context` section, throwing away a InMemoryLayer
that encountered an error and reingesting the WAL is a lot of complexity
that IMO isn't justified for such an edge case.
Also, it's wasteful.
I think it's a local optimum.
A more general and simpler solution for ENOSPC is to `abort()` the
process and run eviction on startup before bringing up the rest of
pageserver.
I argued for it in the past, the pro arguments are still valid and
complete:
https://neondb.slack.com/archives/C033RQ5SPDH/p1716896265296329
The trouble at the time was implementing eviction on startup.
However, maybe things are simpler now that we are fully storcon-managed
and all tenants have secondaries.
For example, if pageserver `abort()`s on ENOSPC and then simply don't
respond to storcon heartbeats while we're running eviction on startup,
storcon will fail tenants over to the secondary anyway, giving us all
the time we need to clean up.
The downside is that if there's a systemic space management bug, above
proposal will just propagate the problem to other nodes. But I imagine
that because of the delays involved with filling up disks, the system
might reach a half-stable state, providing operators more time to react.
# Demo
Intermediary commit `a03f335121480afc0171b0f34606bdf929e962c5` is demoed
in this (internal) screen recording:
https://drive.google.com/file/d/1nBC6lFV2himQ8vRXDXrY30yfWmI2JL5J/view?usp=drive_link
# Perf Testing
Ran `bench_ingest` on tmpfs, no measurable difference.
Spans are uniquely owned by the flush task, and the span stack isn't too
deep, so, enter and exit should be cheap.
Plus, each flush takes ~150us with direct IO enabled, so, not _that_
high frequency event anyways.
# Refs
- fixes https://github.com/neondatabase/neon/issues/10856
## Problem
This should also resolve the test flakiness of `test_gc_feedback`.
close https://github.com/neondatabase/neon/issues/11144
## Summary of changes
If `NoYield` is set, do not yield in gc-compaction.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Investigate https://github.com/neondatabase/neon/issues/11159
## Summary of changes
This doesn't fix the issue, but at least we can narrow down the cause
next time it happens by logging ancestor referenced layer cnt even if
it's 0.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
In #11122, we want to split shards once the logical size of the largest
timeline exceeds a split threshold. However, `get_top_tenants` currently
only returns `max_logical_size`, which tracks the max _total_ logical
size of a timeline across all shards.
This is problematic, because the storage controller needs to fetch a
list of N tenants that are eligible for splits, but the API doesn't
currently have a way to express this. For example, with a split
threshold of 1 GB, a tenant with `max_logical_size` of 4 GB is eligible
to split if it has 1 or 2 shards, but not if it already has 4 shards. We
need to express this in per-shard terms, otherwise the `get_top_tenants`
endpoint may end up only returning tenants that can't be split, blocking
splits entirely.
Touches https://github.com/neondatabase/neon/pull/11122.
Touches https://github.com/neondatabase/cloud/issues/22532.
## Summary of changes
Add `TenantShardItem::max_logical_size_per_shard` containing
`max_logical_size / shard_count`, and
`TenantSorting::MaxLogicalSizePerShard` to order and filter by it.
# Refs
- fixes https://github.com/neondatabase/neon/issues/6107
# Problem
`VirtualFile` currently parses the path it is opened with to identify
the `tenant,shard,timeline` labels to be used for the `STORAGE_IO_SIZE`
metric.
Further, for each read or write call to VirtualFile, it uses
`with_label_values` to retrieve the correct metrics object, which under
the hood is a global hashmap guarded by a parking_lot mutex.
We perform tens of thousands of reads and writes per second on every
pageserver instance; thus, doing the mutex lock + hashmap lookup is
wasteful.
# Changes
Apply the technique we use for all other timeline-scoped metrics to
avoid the repeat `with_label_values`: add it to `TimelineMetrics`.
Wrap `TimelineMetrics` into an `Arc`.
Propagate the `Arc<TimelineMetrics>` down do `VirtualFile`, and use
`Timeline::metrics::storage_io_size`.
To avoid contention on the `Arc<TimelineMetrics>`'s refcount atomics
between different connection handlers for the same timeline, we wrap it
into another Arc.
To avoid frequent allocations, we store that Arc<Arc<TimelineMetrics>>
inside the per-connection timeline cache.
Preliminary refactorings to enable this change:
- https://github.com/neondatabase/neon/pull/11001
- https://github.com/neondatabase/neon/pull/11030
# Performance
I ran the benchmarks in
`test_runner/performance/pageserver/pagebench/test_pageserver_max_throughput_getpage_at_latest_lsn.py`
on an `i3en.3xlarge` because that's what we currently run them on.
None of the benchmarks shows a meaningful difference in latency or
throughput or CPU utilization.
I would have expected some improvement in the
many-tenants-one-client-each workload because they all hit that hashmap
constantly, and clone the same `UintCounter` / `Arc` inside of it.
But apparently the overhead is miniscule compared to the remaining work
we do per getpage.
Yet, since the changes are already made, the added complexity is
manageable, and the perf overhead of `with_label_values` demonstrable in
micro-benchmarks, let's have this change anyway.
Also, propagating TimelineMetrics through RequestContext might come in
handy down the line.
The micro-benchmark that demonstrates perf impact of
`with_label_values`, along with other pitfalls and mitigation techniques
around the `metrics`/`prometheus` crate:
- https://github.com/neondatabase/neon/pull/11019
# Alternative Designs
An earlier iteration of this PR stored an `Arc<Arc<Timeline>>` inside
`RequestContext`.
The problem is that this risks reference cycles if the RequestContext
gets stored in an object that is owned directly or indirectly by
`Timeline`.
Ideally, we wouldn't be using this mess of Arc's at all and propagate
Rust references instead.
But tokio requires tasks to be `'static`, and so, we wouldn't be able to
propagate references across task boundaries, which is incompatible with
any sort of fan-out code we already have (e.g. concurrent IO) or future
code (parallel compaction).
So, opt for Arc for now.
We use the `metrics` / `prometheus` crate in the Pageserver code base.
This PR demonstrates
- typical performance pitfalls with that crate
- our current set of techniques to avoid most of these pitfalls.
refs
- https://github.com/neondatabase/neon/issues/10948
- https://github.com/neondatabase/neon/pull/7202
- I applied the `label_values__cache_label_values_lookup` technique
there.
- It didn't yield measurable results in high-level benchmarks though.
## Problem
As part of the disaster recovery tool. Partly for
https://github.com/neondatabase/neon/issues/9114.
## Summary of changes
* Add a new pageserver API to force patch the fields in index_part and
modify the timeline internal structures.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Part of https://github.com/neondatabase/neon/issues/9114
## Summary of changes
gc-compaction could take a long time in some cases, for example, if the
job split heuristics is wrong and we selected a too large region for
compaction that can't be finished within a reasonable amount of time. We
will give up such tasks and yield to L0 compaction. Each gc-compaction
sub-compaction job is atomic and cannot be split further so we have to
give up (instead of storing a state and continue later as in image
creation).
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
part of https://github.com/neondatabase/neon/issues/9114
## Summary of changes
* Add timers for each phase of the gc-compaction.
* Add a final ratio computation to directly show the garbage collection
ratio in the logs.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>