## Problem
This PR refactors some error handling to avoid log spam on
tenant/timeline shutdown.
- "ignoring failure to find gc cutoffs: timeline shutting down." logs
(https://github.com/neondatabase/neon/issues/8012)
- "synthetic_size_worker: failed to calculate synthetic size for tenant
...: Failed to refresh gc_info before gathering inputs: tenant shutting
down", for example here:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8049/9502988669/index.html#suites/3fc871d9ee8127d8501d607e03205abb/1a074a66548bbcea
Closes: https://github.com/neondatabase/neon/issues/8012
## Summary of changes
- Refactor: Add a PageReconstructError variant to GcError: this is the
only kind of error that find_gc_cutoffs can emit.
- Functional change: only ignore the shutdown PageReconstructError
variant; other variants are treated as real errors
- Refactor: add a structured CalculateSyntheticSizeError type and use it
instead of anyhow::Error in synthetic size calculations
- Functional change: while iterating through timelines gathering logical
sizes, only drop out if the whole tenant is cancelled: individual
timeline cancellations indicate deletion in progress and we can just
ignore those.
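For illustration, here is a minimal Rust sketch of the iteration change, using stand-in types and names (`CalculateSyntheticSizeError::TenantCancelled`, `gather_logical_sizes`) that are assumptions rather than the identifiers actually used in the tree:

```rust
#[derive(Debug)]
enum CalculateSyntheticSizeError {
    // Other variants (e.g. a wrapped GcError from refreshing gc_info) omitted.
    TenantCancelled,
}

struct Timeline {
    id: u32,
    cancelled: bool,
}

fn gather_logical_sizes(
    tenant_cancelled: bool,
    timelines: &[Timeline],
) -> Result<Vec<u32>, CalculateSyntheticSizeError> {
    let mut sizes = Vec::new();
    for tl in timelines {
        if tenant_cancelled {
            // The whole tenant is shutting down: give up on the calculation.
            return Err(CalculateSyntheticSizeError::TenantCancelled);
        }
        if tl.cancelled {
            // A single cancelled timeline means deletion in progress: skip it.
            continue;
        }
        sizes.push(tl.id); // stand-in for the real logical size gathering
    }
    Ok(sizes)
}

fn main() {
    let timelines = [
        Timeline { id: 1, cancelled: false },
        Timeline { id: 2, cancelled: true },
    ];
    println!("{:?}", gather_logical_sizes(false, &timelines));
}
```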
## Problem
This test could fail with a timeout waiting for tenant deletions.
Tenant deletions could get tripped up by nodes transitioning from
offline to online at the moment of the deletion. In a previous
reconciliation, the reconciler would skip detaching a particular
location because the node was offline, but then when we do the delete,
the node is marked as online and can be picked as the node to use for
issuing a deletion request. This hits the "Unexpectedly still attached"
path, which would still work if the caller kept calling DELETE, but it
doesn't work if a caller does a DELETE, GET, GET, GET poll, because the
GET calls fail after we've marked the tenant as detached.
## Summary of changes
Fix the undesirable storage controller behavior highlighted by this test
failure:
- Change tenant deletion flow to _always_ wait for reconciliation to
succeed: it was unsound to proceed and return 202 if something was still
attached, because after the 202 callers can no longer GET the tenant.
Stabilize the test:
- Add a reconcile_until_idle to the test, so that it will not have
reconciliations running in the background while we mark a node online.
This test is not meant to be a chaos test: we should test that kind of
complexity elsewhere.
- This reconcile_until_idle also fixes another failure mode where the
test might see a None for a tenant location because a reconcile was
mutating it
(https://neon-github-public-dev.s3.amazonaws.com/reports/pr-7288/9500177581/index.html#suites/8fc5d1648d2225380766afde7c428d81/4acece42ae00c442/)
It remains the case that a motivated tester could produce a situation
where a DELETE gives a 500, if precisely the wrong node transitions
from offline to available at the precise moment of a deletion (but a
500 is better than returning 202 and then failing all subsequent GETs).
Note that nodes don't go through the offline state during normal
restarts, so this is very rare. We should eventually fix this by making
a DELETE to the pageserver implicitly detach the tenant if it's attached,
but that should wait until nobody is using the legacy-style deletes (the
ones that use 202 + polling).
This failed once with `relation "test" does not exist` when trying to
run the query on the standby. It's possible that the standby is started
before the CREATE TABLE is processed in the pageserver, and the standby
opens up for queries before it has received the CREATE TABLE transaction
from the primary. To fix, wait for the standby to catch up to the
primary before starting to run the queries.
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8025/9483658488/index.html
If a standby is started right after switching to a new WAL segment, the
request LSN in the SLRU download request would point to the beginning of
the segment (e.g. 0/5000000), while the not-modified-since LSN would
point to just after the page header (e.g. 0/5000028). It's effectively
the same position, as there cannot be any WAL records in between, but
the pageserver rightly errors out on any request where the request LSN <
not-modified-since LSN.
To fix, round down the not-modified-since LSN to the beginning of the
page, like the request LSN.
Fixes issue #8030
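For reference, a minimal sketch of the rounding involved, assuming the standard 8 KiB WAL page size (the helper name is made up for illustration):

```rust
const XLOG_BLCKSZ: u64 = 8192; // standard Postgres WAL page size

/// Round an LSN down to the start of its WAL page, so that a not-modified-since
/// LSN just past a page header (e.g. 0/5000028) becomes the page boundary
/// (0/5000000), matching the request LSN.
fn round_down_to_wal_page(lsn: u64) -> u64 {
    lsn - (lsn % XLOG_BLCKSZ)
}

fn main() {
    assert_eq!(round_down_to_wal_page(0x0500_0028), 0x0500_0000);
}
```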
## Problem
The page bench test case test_pageserver_max_throughput_getpage_at_latest_lsn
had been deactivated because it was flaky.
We now ignore CopyFail error messages, like in
270d3be507/test_runner/regress/test_pageserver_getpage_throttle.py (L17-L20),
and want to reactivate it to see if it is still flaky.
## Summary of changes
- reactivate the test in CI
- ignore CopyFail error message during page bench test cases
## Checklist before requesting a review
- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Checklist before merging
- [ ] Do not forget to reformat commit message to not include the above
checklist
- Split the first and second parts of the test into two separate tests
- In the first test, disable the aggressive GC, compaction, and
autovacuum. They are only needed by the second test. I'd like to get the
first test to a point where the VM page is never all-zeros. Disabling
autovacuum in the first test is hopefully enough to accomplish that.
- Compare the full page images, don't skip the page header. After fixing
the previous point, there should be no discrepancy. The LSN still won't
match, though, because of commit 387a36874c.
Fixes issue https://github.com/neondatabase/neon/issues/7984
The S3 scrubber contains "S3" in its name, but we want to make it
generic in terms of which storage is used (#7547). Therefore, rename it
to "storage scrubber", following the naming scheme of already existing
components "storage broker" and "storage controller".
Part of #7547
We've stored metadata as bytes within the `index_part.json` for reasons
that have long since been fixed. #7693 added support for reading out a
normal json serialization of the `TimelineMetadata`.
Change the serialization to only write `TimelineMetadata` as json going
forward, keeping backward compatibility for reading the metadata as
bytes. Because of a failure to include `alias = "metadata"` in #7693,
one more follow-up is required to make the switch from the old name to
`"metadata": <json>`, but that affects only the field name in the
serialized format.
In documentation and naming, an effort is made to add enough warning
signs around TimelineMetadata so that it will receive no changes in the
future. We can add those fields to `IndexPart` directly instead.
Additionally, the path to cleaning up `metadata.rs` is documented in the
`metadata.rs` module comment. If we must extend `TimelineMetadata`
before that, the duplication suggested in [review comment] is the way to
go.
[review comment]:
https://github.com/neondatabase/neon/pull/7699#pullrequestreview-2107081558
Quite a few existing test cases create their own timelines instead of
using the default one. This pull request highlights that and hopefully
people can write simpler tests in the future.
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Yuchen Liang <70461588+yliang412@users.noreply.github.com>
As seen with the pgvector 0.7.0 index builds, we can receive large
batches of images, leading to very large L0 layers in the range of 1GB.
These large layers are produced because we are only able to roll the
layer after we have witnessed two different Lsns in a single
`DataDirModification::commit`. As single-Lsn batches of images can span
multiple `DataDirModification` lifespans, we currently rarely get to
write two different Lsns in a single `put_batch`.
The solution is to remember the TimelineWriterState instead of eagerly
forgetting it until we really open the next layer or someone else
flushes (while holding the write_guard).
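A compressed sketch of that idea, with made-up types standing in for `TimelineWriterState` and the layer-rolling check; the point is only that the state (including the first LSN seen for the open layer) now outlives a single commit:

```rust
struct WriterState {
    open_layer_size: u64,
    first_lsn: u64,
    last_lsn: u64,
}

struct TimelineWriter {
    // Kept across commits instead of being dropped after each one.
    state: Option<WriterState>,
    target_layer_size: u64,
}

impl TimelineWriter {
    fn put_batch(&mut self, lsn: u64, batch_size: u64) {
        let state = self.state.get_or_insert(WriterState {
            open_layer_size: 0,
            first_lsn: lsn,
            last_lsn: lsn,
        });
        state.open_layer_size += batch_size;
        state.last_lsn = lsn;

        // Roll only once the open layer is large enough *and* we have seen two
        // distinct LSNs, so the layer can end on a clean LSN boundary. Because
        // the state survives across commits, single-Lsn batches arriving in
        // separate commits no longer defeat this check.
        let should_roll = state.open_layer_size >= self.target_layer_size
            && state.first_lsn != state.last_lsn;
        if should_roll {
            println!("rolling layer at lsn {lsn:#x}");
            self.state = None;
        }
    }
}

fn main() {
    let mut w = TimelineWriter { state: None, target_layer_size: 1024 };
    w.put_batch(0x10, 900); // single-Lsn batch: too early to roll
    w.put_batch(0x20, 900); // a later commit brings a second Lsn: now we roll
}
```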
Additional changes are test fixes that either avoid the "initdb image
layer optimization" or ignore initdb layers in assertions.
Cc: #7197 because small `checkpoint_distance` will now trigger the
"initdb image layer optimization"
A simple API to collect some statistics after compaction to easily
understand the result.
The tool reads the layer map and analyzes it range by range instead of
doing single-key operations, which is more efficient than running a
benchmark to collect the result. It currently computes two key metrics:
* Latest data access efficiency, which finds how many delta layers /
image layers the system needs to iterate before returning any key in a
key range.
* (Approximate) PiTR efficiency, as in
https://github.com/neondatabase/neon/issues/7770, which is simply the
number of delta files in the range. The reasoning is that, assuming no
image layer is created, PiTR efficiency is simply the cost of collecting
records from the delta layers plus the replay time. The number of delta
files (or, in the future, the estimated size of reads) is a simple yet
effective way of estimating how much effort the pageserver needs to
reconstruct a page.
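A rough sketch of the per-range accounting, with illustrative names and simplified definitions rather than the tool's actual output:

```rust
#[derive(Debug, Default)]
struct RangeStats {
    layers_until_image: usize, // latest-data access efficiency for this range
    delta_layer_count: usize,  // approximate PiTR-efficiency proxy
}

enum LayerKind {
    Image,
    Delta,
}

/// `layers_newest_first` is the stack of layers covering one key range,
/// ordered from most recent to oldest.
fn analyze_range(layers_newest_first: &[LayerKind]) -> RangeStats {
    let mut stats = RangeStats::default();
    let mut seen_image = false;
    for layer in layers_newest_first {
        if let LayerKind::Delta = layer {
            stats.delta_layer_count += 1;
        }
        if !seen_image {
            // Layers a read of the latest data must visit before it can return.
            stats.layers_until_image += 1;
            seen_image = matches!(layer, LayerKind::Image);
        }
    }
    stats
}

fn main() {
    let range = [LayerKind::Delta, LayerKind::Delta, LayerKind::Image, LayerKind::Delta];
    println!("{:?}", analyze_range(&range));
}
```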
Signed-off-by: Alex Chi Z <chi@neon.tech>
In #7927 I needed to fix this test case, but the fixes should be
possible to land irrespective of the layer ingestion code change.
The most important fix is the behavior if an image layer is found: the
assertion message formatting raises a runtime error, which obscures the
fact that we found an image layer.
Closes #7406.
## Problem
When a `get_lsn_by_timestamp` request is cancelled, it surfaces as an
anyhow error, which gets logged verbosely. However, we don't benefit
from having the full backtrace provided by anyhow in this case.
## Summary of changes
This PR introduces a new `ApiError` variant to handle errors caused by
cancelled requests more robustly.
- A new enum variant `ApiError::Cancelled`
- Currently a cancelled request is mapped to status code 500.
- This error needs to be handled in proxy's `http_util` as well.
- Added a failpoint test to simulate a cancelled `get_lsn_by_timestamp`
request.
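A minimal sketch of the new variant and its status-code mapping, using stand-in definitions rather than the pageserver's real `ApiError`:

```rust
#[derive(Debug)]
enum ApiError {
    NotFound(String),
    // New: the request was cancelled (e.g. client went away); no backtrace needed.
    Cancelled,
    InternalServerError(String),
}

fn status_code(err: &ApiError) -> u16 {
    match err {
        ApiError::NotFound(_) => 404,
        // Mapped to 500 for now, as noted above.
        ApiError::Cancelled => 500,
        ApiError::InternalServerError(_) => 500,
    }
}

fn main() {
    println!("{}", status_code(&ApiError::Cancelled));
}
```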
Signed-off-by: Yuchen Liang <yuchen@neon.tech>
## Problem
As described in #7952, the controller's attempt to reconcile a tenant
before finally deleting it can get hung up waiting for the compute
notification hook to accept updates.
The fact that we try to reconcile a tenant at all during deletion is
part of a more general design issue (#5080), where deletion was
implemented as an operation on an attached tenant, requiring the tenant
to be attached in order to delete it, which is not in principle necessary.
Closes: #7952
## Summary of changes
- In the pageserver deletion API, only do the traditional deletion path
if the tenant is attached. If it's secondary, then tear down the
secondary location, and then do a remote delete. If it's not attached at
all, just do the remote delete.
- In the storage controller, instead of ensuring a tenant is attached
before deletion, do a best-effort detach of the tenant, and then call
into some arbitrary pageserver to issue a deletion of remote content.
The pageserver retains its existing delete behavior when invoked on
attached locations. We can remove this later, once all users of the API
have been updated to do a detach-before-delete. This will enable
removing the "weird" code paths during startup that sometimes load a
tenant and then immediately delete it, and removing the deletion markers
on tenants.
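A sketch of the pageserver-side dispatch this describes, with invented names for the slot states:

```rust
enum TenantSlot {
    Attached,   // traditional deletion path still applies
    Secondary,  // tear down the secondary location, then delete remote content
    NotPresent, // nothing local: only remote content to remove
}

fn delete_tenant(slot: TenantSlot) {
    match slot {
        TenantSlot::Attached => {
            println!("running the existing attached deletion path");
        }
        TenantSlot::Secondary => {
            println!("shutting down secondary location");
            println!("deleting remote content");
        }
        TenantSlot::NotPresent => {
            println!("deleting remote content only");
        }
    }
}

fn main() {
    delete_tenant(TenantSlot::Secondary);
}
```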
Fixes https://github.com/neondatabase/neon/issues/7790 (duplicating most
of the issue description here for posterity).
# Background
From the time before always-authoritative `index_part.json`, we had to
handle duplicate layers. See the RFC for an illustration of how
duplicate layers could happen:
a8e6d259cb/docs/rfcs/027-crash-consistent-layer-map-through-index-part.md (L41-L50)
As of #5198, we should not be exposed to that problem anymore.
# Problem 1
We still have
1. [code in
Pageserver](82960b2175/pageserver/src/tenant/timeline.rs (L4502-L4521))
that handles duplicate layers
2. [tests in the test
suite](d9dcbffac3/test_runner/regress/test_duplicate_layers.py (L15))
that demonstrate the problem using a failpoint
However, the test in the test suite doesn't use the failpoint to induce
a crash that could legitimately happen in production.
What it does instead is return early with an `Ok()`, so that the code
in Pageserver that handles duplicate layers (item 1) actually gets
exercised.
That "return early" would be a bug in the routine if it happened in
production.
So, the tests in the test suite are tests for their own sake, but don't
serve to actually regress-test any production behavior.
# Problem 2
Further, if production code _did_ (it nowadays doesn't!) create a
duplicate layer, the code in Pageserver that handles the condition (item
1 above) is too little and too late:
* the code handles it by discarding the newer `struct Layer`; that's
good.
* however, on disk, we have already overwritten the old with the new
layer file
* the fact that we do it atomically doesn't matter because ...
* if the new layer file is not bit-identical, then we have a cache
coherency problem
* PS PageCache block cache: caches the old bit pattern
* blob_io offsets stored in variables, based on pre-overwrite bit
pattern / offsets
* => reading based on these offsets from the new file might yield
different data than before
# Solution
- Remove the test suite code pertaining to Problem 1
- Move & rename test suite code that actually tests RFC-27
crash-consistent layer map.
- Remove the Pageserver code that handles duplicate layers too late
(Problem 1)
- Use `RENAME_NOREPLACE` to prevent renaming over an existing file
during `.finish()`, and bail with an error if that happens (Problem 2);
see the sketch after this list
- This bailing prevents the caller from even trying to insert into the
layer map, as they don't even get a `struct Layer` at hand.
- Add `abort`s in the place where we have the layer map lock and check
for duplicates (Problem 2)
- Note again, we can't reach there because we bail from `.finish()` much
earlier in the code.
- Share the logic to clean up after failed `.finish()` between image
layers and delta layers (drive-by cleanup)
- This exposed that test `image_layer_rewrite` was overwriting layer
files in place. Fix the test.
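For reference, a sketch of what a `RENAME_NOREPLACE` rename looks like at the syscall level on Linux, assuming the `libc` crate's `renameat2` binding (the real code presumably goes through its own wrapper):

```rust
use std::ffi::CString;
use std::io;

/// Rename `src` to `dst`, failing with EEXIST instead of silently replacing
/// an existing `dst` (Linux-only).
fn rename_noreplace(src: &str, dst: &str) -> io::Result<()> {
    let src_c = CString::new(src)?;
    let dst_c = CString::new(dst)?;
    let ret = unsafe {
        libc::renameat2(
            libc::AT_FDCWD,
            src_c.as_ptr(),
            libc::AT_FDCWD,
            dst_c.as_ptr(),
            libc::RENAME_NOREPLACE as libc::c_uint,
        )
    };
    if ret == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}

fn main() -> io::Result<()> {
    std::fs::write("layer.tmp", b"new layer bytes")?;
    std::fs::write("layer", b"pre-existing layer bytes")?;
    // A layer `.finish()` in this situation should bail rather than overwrite.
    match rename_noreplace("layer.tmp", "layer") {
        Ok(()) => println!("renamed"),
        Err(e) => println!("refusing to overwrite existing layer: {e}"),
    }
    Ok(())
}
```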
# Future Work
This PR adds a new failure scenario that was previously "papered over"
by the overwriting of layers:
1. Start a compaction that will produce 3 layers: A, B, C
2. Layer A is `finish()`ed successfully.
3. Layer B fails mid-way at some `put_value()`.
4. Compaction bails out, sleeps 20s.
5. Some disk space gets freed in the meantime.
6. Compaction wakes from sleep, another iteration starts, it attempts to
write Layer A again. But the `.finish()` **fails because A already
exists on disk**.
The failure in step 6 is new with this PR, and it **causes the
compaction to get stuck**.
Before, it would silently overwrite the file and "successfully" complete
the second iteration.
The mitigation for this is to `/reset` the tenant.
## Problem
This was an oversight when adding heatmaps: because they are at the top
level of the tenant, they aren't included in the catch-all list & delete
that happens for timeline paths.
This doesn't break anything, but it leaves behind a few kilobytes of
garbage in the S3 bucket after a tenant is deleted, generating work for
the scrubber.
## Summary of changes
- During deletion, explicitly remove the heatmap file
- In test_tenant_delete_smoke, upload a heatmap so that the test would
fail its "remote storage empty after delete" check if we didn't delete
it.
I suspected a wakeup could be lost with
`remote_storage::support::DownloadStream` if the cancellation and inner
stream wakeups happen simultaneously: the next poll would only return
the cancellation error without setting the wakeup. It turns out there is
no lost wakeup, because the single future for getting the cancellation
error is consumed when the value is ready, and a new future is created
for the *next* value. The new future is always polled. Similarly, if
only `Stream::poll_next` is being used after a `Some(_)` value has been
yielded, there is no reason to expect a wakeup for the *(N+1)th* stream
value to be set already, because when a value is wanted,
`Stream::poll_next` will be called again.
A test is added to show that the above is true.
Additionally, there was a question of these cancellations and timeouts
flowing to attached or secondary tenant downloads. A test is added to
show that this, in fact, happens.
Lastly, a warning message is logged when a download stream is polled
after a timeout or cancellation error (currently unexpected) so we can
rule it out while troubleshooting.
Store logical replication origin in KV storage
## Problem
See #6977
## Summary of changes
* Extract origin_lsn from the commit WAL record
* Add a ReplOrigin key to KV storage and store origin_lsn under it
* In basebackup, replace the snapshot origin_lsn with the last committed
origin_lsn at the basebackup LSN
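An illustrative sketch of the bookkeeping, with stand-in types (the real change stores this under a dedicated key in the pageserver's key/value space):

```rust
use std::collections::HashMap;

type RepOriginId = u16;
type Lsn = u64;

#[derive(Default)]
struct OriginStore {
    // Stand-in for a per-origin "ReplOrigin" key in the KV space.
    last_committed: HashMap<RepOriginId, Lsn>,
}

impl OriginStore {
    /// Called while ingesting a commit WAL record that carries a replication origin.
    fn ingest_commit(&mut self, origin: Option<(RepOriginId, Lsn)>) {
        if let Some((origin_id, origin_lsn)) = origin {
            self.last_committed.insert(origin_id, origin_lsn);
        }
    }

    /// At basebackup time, look up the last committed origin_lsn for an origin.
    fn last_committed_origin_lsn(&self, origin_id: RepOriginId) -> Option<Lsn> {
        self.last_committed.get(&origin_id).copied()
    }
}

fn main() {
    let mut store = OriginStore::default();
    store.ingest_commit(Some((1, 0x0100_0028)));
    println!("{:?}", store.last_committed_origin_lsn(1));
}
```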
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: Alex Chi Z <chi@neon.tech>
## Problem
Currently, we leave `index_part.json` objects from old generations
behind each time a pageserver restarts or a tenant is migrated. This
doesn't break anything, but it's annoying when a tenant has been around
for a long time and starts to accumulate 10s-100s of these.
Partially implements: #7043
## Summary of changes
- Add a new `pageserver-physical-gc` command to `s3_scrubber`
The name is a bit of a mouthful, but I think it makes sense:
- GC is the accurate term for what we are doing here: removing data that
takes up storage but can never be accessed.
- "physical" is a necessary distinction from the "normal" GC that we do
online in the pageserver, which operates at a higher level in terms of
LSNs+layers, whereas this type of GC is purely about S3 objects.
- "pageserver" makes clear that this command deals exclusively with
pageserver data, not safekeeper.
When running the regress tests locally without the environment variables
we use on CI, `test_pageserver_compaction_smoke` fails with a division by
zero. Fix it temporarily by allowing zero vectored reads to happen. To be
cleaned up when vectored get validation is removed and the default value
can be changed.
Cc: https://github.com/neondatabase/neon/issues/7381
## Problem
- Because GC exposes all errors as an anyhow::Error, we have
intermittent issues with spurious log errors during shutdown, e.g. in
this failure of a performance test
https://neon-github-public-dev.s3.amazonaws.com/reports/main/9300804302/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/214a2154f6f0217a/
```
Gc failed 1 times, retrying in 2s: shutting down
```
GC really doesn't do a lot of complicated IO: it doesn't benefit from
the backtrace capabilities of anyhow::Error, and can be expressed more
robustly as an enum.
## Summary of changes
- Add GcError type and use it instead of anyhow::Error in GC functions
- In `gc_iteration_internal`, return GcError::Cancelled on shutdown
rather than Ok(()) (we only used Ok before because we didn't have a
clear cancellation error variant to use).
- In `gc_iteration_internal`, skip past timelines that are shutting
down, to avoid having to go through another GC iteration if we happen to
see a deleting timeline during a GC run.
- In `refresh_gc_info_internal`, avoid an error case where a timeline
might not be found after being looked up, by carrying an Arc<Timeline>
instead of a TimelineId between the first loop and second loop in the
function.
- In HTTP request handler, handle Cancelled variants as 503 instead of
turning all GC errors into 500s.
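A stand-in sketch of the error shape and the HTTP mapping (variant names are assumptions):

```rust
#[derive(Debug)]
enum GcError {
    TenantCancelled,
    TimelineCancelled,
    Remote(String),
}

fn gc_status_code(err: &GcError) -> u16 {
    match err {
        // Shutdown/cancellation is expected during restarts: report 503.
        GcError::TenantCancelled | GcError::TimelineCancelled => 503,
        // Everything else remains an internal error.
        GcError::Remote(_) => 500,
    }
}

fn main() {
    println!("{}", gc_status_code(&GcError::TenantCancelled));
}
```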
## Problem
In all cases, AncestorStopping is equivalent to Cancelled.
This became more obvious in
https://github.com/neondatabase/neon/pull/7912#discussion_r1620582309
when updating these error types.
## Summary of changes
- Remove AncestorStopping, always use Cancelled instead
This is a preparation for
https://github.com/neondatabase/neon/issues/6337.
The idea is to add FullAccessTimeline, which will act as a guard for
tasks requiring access to WAL files. Eviction will be blocked on these
tasks and WAL won't be deleted from disk until there is at least one
active FullAccessTimeline.
To get FullAccessTimeline, tasks call `tli.full_access_guard().await?`.
After eviction is implemented, this function will be responsible for
downloading any missing WAL files and waiting until the download finishes.
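A minimal sketch of the guard pattern, with names and internals assumed (the real `full_access_guard()` is async precisely because it may first need to download a missing WAL file):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

struct Timeline {
    wal_users: AtomicUsize, // outstanding guards; eviction must wait for 0
}

struct FullAccessTimeline {
    tli: Arc<Timeline>,
}

impl Timeline {
    fn full_access_guard(self: &Arc<Self>) -> FullAccessTimeline {
        // Once eviction exists, a missing WAL file would be downloaded here
        // before the guard is handed out.
        self.wal_users.fetch_add(1, Ordering::SeqCst);
        FullAccessTimeline { tli: Arc::clone(self) }
    }

    fn can_evict_wal(&self) -> bool {
        self.wal_users.load(Ordering::SeqCst) == 0
    }
}

impl Drop for FullAccessTimeline {
    fn drop(&mut self) {
        self.tli.wal_users.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    let tli = Arc::new(Timeline { wal_users: AtomicUsize::new(0) });
    let guard = tli.full_access_guard();
    assert!(!tli.can_evict_wal()); // WAL must stay on disk while a guard lives
    drop(guard);
    assert!(tli.can_evict_wal());
}
```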
This commit also contains other small refactorings:
- Separate `get_tenant_dir` and `get_timeline_dir` functions for
building a local path. This is useful for looking at usages and finding
tasks requiring access to local filesystem.
- `timeline_manager` is now responsible for spawning all background
tasks
- WAL removal task is now spawned instantly after horizon is updated
## Problem
Looking at several noisy shutdown logs:
- In https://github.com/neondatabase/neon/issues/7861 we're hitting a
log error with `InternalServerError(timeline shutting down\n'` on the
checkpoint API handler.
- In the field, we see initial_logical_size_calculation errors on
shutdown, via DownloadError
- In the field, we see errors logged from layer download code
(independent of the error propagated) during shutdown
Closes: https://github.com/neondatabase/neon/issues/7861
## Summary of changes
The theme of these changes is to avoid propagating anyhow::Errors for
cases that aren't really unexpected error cases that we might want a
stacktrace for, and avoid "Other" error variants unless we really do
have unexpected error cases to propagate.
- On the flush_frozen_layers path, use the `FlushLayerError` type
throughout, rather than munging it into an anyhow::Error. Give
FlushLayerError an explicit from_anyhow helper that checks for timeline
cancellation, and uses it to give a Cancelled error instead of an Other
error when the timeline is shutting down.
- In logical size calculation, remove BackgroundCalculationError (this
type was just a Cancelled variant and an Other variant), and instead use
CalculateLogicalSizeError throughout. This can express a
PageReconstructError, and has a From impl that translates cancel-like
page reconstruct errors to Cancelled.
- Replace CalculateLogicalSizeError's Other(anyhow::Error) variant case
with a Decode(DeserializeError) variant, as this was the only kind of
error we actually used in the Other case.
- During layer download, drop out early if the timeline is shutting
down, so that we don't do an `error!()` log of the shutdown error in
this case.
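A sketch of the cancellation-aware conversion described above, with stand-in pieces (a plain `String` in place of `anyhow::Error`, a bool in place of the cancellation token):

```rust
#[derive(Debug)]
enum FlushLayerError {
    Cancelled,
    Other(String),
}

struct Timeline {
    cancel_requested: bool, // stand-in for the real cancellation token
}

impl FlushLayerError {
    /// Classify an untyped flush error: if the timeline is shutting down,
    /// report Cancelled rather than Other, so shutdown isn't logged as a failure.
    fn from_anyhow(timeline: &Timeline, err: String) -> Self {
        if timeline.cancel_requested {
            FlushLayerError::Cancelled
        } else {
            FlushLayerError::Other(err)
        }
    }
}

fn main() {
    let tl = Timeline { cancel_requested: true };
    println!("{:?}", FlushLayerError::from_anyhow(&tl, "write failed".into()));
}
```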
## Problem
See https://github.com/neondatabase/cloud/issues/10845
## Summary of changes
Do not report error if GIN page is not restored
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
We want to regularly verify the performance of pgvector HNSW parallel
index builds and parallel similarity search using HNSW indexes.
The first release that considerably improved index-build parallelism
was pgvector 0.7.0, and we want to make sure that we do not regress due
to our neon compute VM settings (swap, memory overcommit, pg conf, etc.).
## Summary of changes
Prepare a Neon project with 1 million OpenAI vector embeddings (vector
size 1536).
Run HNSW indexing operations in the regression test for the various
distance metrics.
Run similarity queries using pgbench with 100 concurrent clients.
I have also added the relevant metrics to the Grafana dashboards pgbench
and olape.
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Do pull_timeline while WAL is being removed. To this end:
- extract pausable_failpoint to utils and sprinkle pull_timeline with it
- add a 'checkpoint' safekeeper HTTP endpoint to force WAL removal.
After fixing the check of the pull-file status code, the test fails for
now, which is expected.
## Problem
- After a shard split of a large existing tenant, child tenants can end
up with oversized historic layers indefinitely, if those layers are
prevented from being GC'd by branchpoints.
This PR follows https://github.com/neondatabase/neon/pull/7531, and adds
rewriting of layers that contain a mixture of needed & un-needed
contents, in addition to dropping un-needed layers.
Closes: https://github.com/neondatabase/neon/issues/7504
## Summary of changes
- Add methods to ImageLayer for reading back existing layers
- Extend `compact_shard_ancestors` to rewrite layer files that contain a
mixture of keys that we want and keys we do not, if unwanted keys are
the majority of those in the file.
- Amend initialization code to handle multiple layers with the same
LayerName properly
- Get rid of renaming bad layer files to `.old`, since that's now
expected on restarts during rewrites.
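A sketch of the rewrite-versus-drop decision, with invented names and a deliberately simplified heuristic:

```rust
struct LayerSummary {
    wanted_keys: usize,   // keys this child shard still owns
    unwanted_keys: usize, // keys that belong to sibling shards after the split
}

enum Disposition {
    Keep,    // mostly wanted content: not worth rewriting
    Rewrite, // mostly unwanted: write a new layer containing only wanted keys
    Drop,    // nothing wanted at all: just delete the layer
}

fn disposition(layer: &LayerSummary) -> Disposition {
    if layer.wanted_keys == 0 {
        Disposition::Drop
    } else if layer.unwanted_keys > layer.wanted_keys {
        Disposition::Rewrite
    } else {
        Disposition::Keep
    }
}

fn main() {
    let l = LayerSummary { wanted_keys: 10, unwanted_keys: 90 };
    assert!(matches!(disposition(&l), Disposition::Rewrite));
}
```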
* Make PS connection startup use async APIs
This allows for improved query cancellation when we start connections
* Make PS connections have per-shard connection retry state.
Previously they shared global backoff state, which is bad for quickly
getting all connections started and/or back online.
* Make sure we clean up most connection state on failed connections.
Previously, we could technically leak some resources that we'd otherwise
clean up. Now, the resources are correctly cleaned up.
* pagestore_smgr.c now PANICs on unexpected response message types.
Unexpected responses are likely a symptom of having a desynchronized
view of the connection state. As a desynchronized connection state can
cause corruption, we PANIC, as we don't know what data may have been
written to buffers: the only solution is to fail fast & hope we didn't
write wrong data.
* Catch errors in sync pagestream request handling.
Previously, if a query was cancelled after a message was sent to
the pageserver, but before the data was received, the backend
could forget that it sent the synchronous request, and let others
deal with the repercussions. This could then lead to incorrect
responses, or errors such as "unexpected response from page
server with tag 0x68"
## Problem
This test relied on some sleeps, and was failing ~5% of the time.
## Summary of changes
Use log-watching rather than straight waits, and make timeouts more
generous for the CI environment.
Don't set last-written LSN of a page when the record is replayed, only
when the page is evicted from cache. For comparison, we don't update
the last-written LSN on every page modification on the primary either,
only when the page is evicted. Do update the last-written LSN when the
page update is skipped in WAL redo, however.
In neon_get_request_lsns(), don't be surprised if the last-written LSN
is equal to the record being replayed. Use the LSN of the record being
replayed as the request LSN in that case. Add a long comment
explaining how that can happen.
In neon_wallog_page, update last-written LSN also when Shutdown has
been requested. We might still fetch and evict pages for a while,
after shutdown has been requested, so we better continue to do that
correctly.
Enable the check that we don't evict a page with zero LSN also in
standby, but make it a LOG message instead of PANIC
Fixes issue https://github.com/neondatabase/neon/issues/7791
The openapi description with the error descriptions:
- 200 is used for "detached or has been detached previously"
- 400 is used for "cannot be detached right now" -- it's an odd thing,
but good enough
- 404 is used for tenant or timeline not found
- 409 is used for "can never be detached" (root timeline)
- 500 is used for transient errors (basically ill-defined shutdown
errors)
- 503 is used for busy (other tenant ancestor detach underway,
pageserver shutdown)
Cc: #6994
"taking a fullbackup" is an ugly multi-liner copypasted in multiple
places, most recently with timeline ancestor detach tests. move it under
`PgBin` which is not a great place, but better than yet another utility
function.
Additionally:
- cleanup `psql_env` repetition (PgBin already configures that)
- move the backup tar comparison as a yet another free utility function
- use backup tar comparison in `test_import.py` where a size check was
done previously
- cleanup extra timeline creation from test
Cc: #7715
Detaching a timeline from its ancestor can leave the resulting timeline
with more L0 layers than the compaction threshold. Most of the time the
detached timeline has made progress, and the next L0 -> L1 compaction
happens near the original branch point and not near the
last_record_lsn.
Add a test to ensure that inheriting the historical L0s does not change
fullbackup. Additionally:
- add `wait_until_completed` to the test-only timeline checkpoint and
compact HTTP endpoints; with `?wait_until_completed=true` the endpoints
will wait until the remote client has completed uploads.
- for delta layers, describe L0-ness with the `/layer` endpoint
Cc: #6994
Previously we worked around file comparison issues by dropping unlogged
relations in the pg_regress tests, but this would lead to an unnecessary
diff when compared to upstream in our Postgres fork. Instead, we can
precompute the files that we know will be different, and ignore them.