Commit Graph

142 Commits

Author SHA1 Message Date
Christian Schwarz
6f5c262684 pageserver: add testing API to scan layers for disposable keys (#9393)
This PR adds a pageserver mgmt API to scan a layer file for disposable
keys.

It hooks it up to the sharding compaction test, demonstrating that we're
not filtering out all disposable keys.

This is extracted from PGDATA import
(https://github.com/neondatabase/neon/pull/9218)
where I do the filtering of layer files based on `is_key_disposable`.
2024-10-25 14:16:45 +02:00
Arpad Müller
4d9036bf1f Support offloaded timelines during shard split (#9489)
Before, we didn't copy over the `index-part.json` of offloaded timelines
to the new shard's location, resulting in the new shard not knowing the
timeline even exists.

In #9444, we copy over the manifest, but we also need to do this for
`index-part.json`.

As the operations to do are mostly the same between offloaded and
non-offloaded timelines, we can iterate over all of them in the same
loop, after the introduction of a `TimelineOrOffloadedArcRef` type to
generalize over the two cases. This is analogous to the deletion code
added in #8907.

The added test also ensures that the sharded archival config endpoint
works, something that has not yet been ensured by tests.

Part of #8088
2024-10-25 12:32:46 +02:00
Christian Schwarz
b782b11b33 refactor(timeline creation): represent bootstrap vs branch using enum (#9366)
# Problem

Timeline creation can either be bootstrap or branch.
The distinction is made based on whether the `ancestor_*` fields are
present or not.

In the PGDATA import code
(https://github.com/neondatabase/neon/pull/9218), I add a third variant
to timeline creation.

# Solution

The above pushed me to refactor the code in Pageserver to distinguish
the different creation requests through enum variants.

There is no externally observable effect from this change.

On the implementation level, a notable change is that the acquisition of
the `TimelineCreationGuard` happens later than before. This is necessary
so that we have everything in place to construct the
`CreateTimelineIdempotency`. Notably, this moves the acquisition of the
creation guard _after_ the acquisition of the `gc_cs` lock in the case
of branching. This might appear as if we're at risk of holding `gc_cs`
longer than before this PR, but, even before this PR, we were holding
`gc_cs` until after the `wait_completion()` that makes the timeline
creation durable in S3 returns. I don't see any deadlock risk with
reversing the lock acquisition order.

As a drive-by change, I found that the `create_timeline()` function in
`neon_local` is unused, so I removed it.

# Refs

* platform context: https://github.com/neondatabase/neon/pull/9218
* product context: https://github.com/neondatabase/cloud/issues/17507
* next PR stacked atop this one:
https://github.com/neondatabase/neon/pull/9501
2024-10-25 10:04:27 +00:00
Erik Grinaker
4c9835f4a3 storage_controller: delete stale shards when deleting tenant (#9333)
## Problem

Tenant deletion only removes the current shards from remote storage. Any
stale parent shards (before splits) will be left behind. These shards
are kept since child shards may reference data from the parent until new
image layers are generated.

## Summary of changes

* Document a special case for pageserver tenant deletion that deletes
all shards in remote storage when given an unsharded tenant ID, as well
as any unsharded tenant data.
* Pass an unsharded tenant ID to delete all remote storage under the
tenant ID prefix.
* Split out `RemoteStorage::delete_prefix()` to delete a bucket prefix,
with additional test coverage.
* Add a `delimiter` argument to `asset_prefix_empty()` to support
partial prefix matches (i.e. all shards starting with a given tenant
ID).
2024-10-17 14:34:51 +00:00
Arpad Müller
ec4cc30de9 Shut down timelines during offload and add offload tests (#9289)
Add a test for timeline offloading, and subsequent unoffloading.

Also adds a manual endpoint, and issues a proper timeline shutdown
during offloading which prevents a pageserver hang at shutdown.

Part of #8088.
2024-10-15 09:46:51 +00:00
Tristan Partin
53147b51f9 Use valid type hints for Python 3.9
I have no idea how this made it past the linters.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-10 13:00:25 -05:00
Tristan Partin
5bd8e2363a Enable all pyupgrade checks in ruff
This will help to keep us from using deprecated Python features going
forward.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-08 14:32:26 -05:00
Heikki Linnakangas
56bb1ac458 tests: Move NeonCli and friends to separate file (#9195)
In the passing, rename it to NeonLocalCli, to reflect that the binary
is called 'neon_local'.

Add wrapper for the 'timeline_import' command, eliminating the last
raw call to the raw_cli() function from tests, except for a few in
test_neon_cli.py which are about testing the 'neon_local' iteself. All
the other calls are now made through the strongly-typed wrapper
functions
2024-10-03 22:03:25 +03:00
Alex Chi Z.
700885471f fix(test): only test num of L1 layers in compaction smoke test (#9186)
close https://github.com/neondatabase/neon/issues/9160

For whatever reason, pg17's WAL pattern seems different from others,
which triggers some flaky behavior within the compaction smoke test.

## Summary of changes

* Run L0 compaction before proceeding with the read benchmark.
* So that we can ensure the num of L0 layers is 0 and test the
compaction behavior only with L1 layers.

We have a threshold for triggering L0 compaction. In some cases, the
test case did not produce enough L0 layers to do a L0 compaction,
therefore leaving the layer map with 3+ L0 layers above the L1 layers.
This increases the average read depth for the timeline.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-02 17:42:35 +01:00
Nikita Kalyanov
f446e08fb8 change HTTP method to comply with spec (#9100)
There is discrepancy with the spec, it has PUT
2024-09-23 15:53:06 +02:00
Arpad Müller
a1b71b73fe Rename some S3 usages to "remote storage" in exposed messages (#8999)
In exposed messages like log messages we mentioned "S3", which is not
entirely accurate as we support Azure blob storage now as well.
2024-09-17 19:15:01 +02:00
Vlad Lazar
c43e664ff5 storcon: provide an az id in metadata.json from neon local (#8897)
## Problem
Neon local set-up does not inject an az id in `metadata.json`. See real
change in https://github.com/neondatabase/neon/pull/8852.

## Summary of changes
We piggyback on the existing `availability_zone` pageserver
configuration in order to avoid making neon local even more complex.
2024-09-03 15:11:30 +01:00
Vlad Lazar
793b5061ec storcon: track pageserver availability zone (#8852)
## Problem
In order to build AZ aware scheduling, the storage controller needs to
know what AZ pageservers are in.

Related https://github.com/neondatabase/neon/issues/8848

## Summary of changes
This patch set adds a new nullable column to the `nodes` table:
`availability_zone_id`. The node registration
request is extended to include the AZ id (pageservers already have this
in their `metadata.json` file).

If the node is already registered, then we update the persistent and
in-memory state with the provided AZ.
Otherwise, we add the node with the AZ to begin with.

A couple assumptions are made here:
1. Pageserver AZ ids are stable
2. AZ ids do not change over time

Once all pageservers have a configured AZ, we can remove the optionals
in the code and make the database column not nullable.
2024-08-28 18:23:55 +01:00
Heikki Linnakangas
c5ef779801 tests: Remove unnecessary entries from list of allowed errors (#8199)
The "manual_gc" context was removed in commit be0c73f8e7. The code that
generated the other error was removed in commit 9a6c0be823.
2024-08-27 17:47:05 +01:00
Arpad Müller
2dd53e7ae0 Timeline archival test (#8824)
This PR:

* Implements the rule that archived timelines require all of their
children to be archived as well, as specified in the RFC. There is no
fancy locking mechanism though, so the precondition can still be broken.
As a TODO for later, we still allow unarchiving timelines with archived
parents.
* Adds an `is_archived` flag to `TimelineInfo`
* Adds timeline_archival_config to `PageserverHttpClient`
* Adds a new `test_timeline_archive` test, loosely based on
`test_timeline_delete`

Part of #8088
2024-08-26 17:30:19 +02:00
Joonas Koivunen
73286e6b9f test: copy dict to avoid error on retry (#8811)
there is no "const" in python, so when we modify the global dict, it
will remain that way on the retry. fix to not have it influence other
tests which might be run on the same runner.

evidence:
<https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8625/10513146742/index.html#/testresult/453c4ce05ada7496>
2024-08-23 14:43:08 +01:00
Vlad Lazar
f5cef7bf7f storcon: skip draining shard if it's secondary is lagging too much (#8644)
## Problem
Migrations of tenant shards with cold secondaries are holding up drains
in during production deployments.

## Summary of changes
If a secondary locations is lagging by more than 256MiB (configurable,
but that's the default), then skip cutting it over to the secondary as part of the node drain.
2024-08-09 15:45:07 +01:00
Joonas Koivunen
a81fab4826 refactor(timeline_detach_ancestor): replace ordered reparented with a hashset (#8629)
Earlier I was thinking we'd need a (ancestor_lsn, timeline_id) ordered
list of reparented. Turns out we did not need it at all. Replace it with
an unordered hashset. Additionally refactor the reparented direct
children query out, it will later be used from more places.

Split off from #8430.

Cc: #6994
2024-08-07 18:19:00 +02:00
John Spray
3727c6fbbe pageserver: use layer visibility when composing heatmap (#8616)
## Problem

Sometimes, a layer is Covered by hasn't yet been evicted from local disk
(e.g. shortly after image layer generation). It is not good use of
resources to download these to a secondary location, as there's a good
chance they will never be read.

This follows the previous change that added layer visibility:
- #8511 

Part of epic:
- https://github.com/neondatabase/neon/issues/8398

## Summary of changes

- When generating heatmaps, only include Visible layers
- Update test_secondary_downloads to filter to visible layers when
listing layers from an attached location
2024-08-06 17:15:40 +01:00
Joonas Koivunen
138f008bab feat: persistent gc blocking (#8600)
Currently, we do not have facilities to persistently block GC on a
tenant for whatever reason. We could do a tenant configuration update,
but that is risky for generation numbers and would also be transient.
Introduce a `gc_block` facility in the tenant, which manages per
timeline blocking reasons.

Additionally, add HTTP endpoints for enabling/disabling manual gc
blocking for a specific timeline. For debugging, individual tenant
status now includes a similar string representation logged when GC is
skipped.

Cc: #6994
2024-08-06 10:09:56 +01:00
Arpad Müller
8c828c586e Wait for completion of the upload queue in flush_frozen_layer (#8550)
Makes `flush_frozen_layer` add a barrier to the upload queue and makes
it wait for that barrier to be reached until it lets the flushing be
completed.

This gives us backpressure and ensures that writes can't build up in an
unbounded fashion.

Fixes #7317
2024-08-02 13:07:12 +02:00
John Spray
842c3d8c10 tests: simplify code around unstable test_basebackup_with_high_slru_count (#8477)
## Problem

In `test_basebackup_with_high_slru_count`, the pageserver is sometimes
mysteriously hanging on startup, having been started+stopped earlier in
the test setup while populating template tenant data.

- #7586 

We can't see why this is hanging in this particular test. The test does
some weird stuff though, like attaching a load of broken tenants and
then doing a SIGQUIT kill of a pageserver.

## Summary of changes

- Attach tenants normally instead of doing a failpoint dance to attach
them as broken
- Shut the pageserver down gracefully during init instead of using
immediate mode
- Remove the "sequential" variant of the unstable test, as this is going
away soon anyway
- Log before trying to acquire lock file, so that if it hangs we have a
clearer sense of if that's really where it's hanging. It seems like it
is, but that code does a non-blocking flock so it's surprising.
2024-07-24 11:26:24 +01:00
John Spray
9e23410074 tests: allow-list a controller heartbeat error (#8471)
## Problem

`test_change_pageserver` stops pageservers in a way that can overlap
with the controller's heartbeats: the controller can get a heartbeat
success and then immediately find the node unavailable. This particular
situation triggers a log that isn't in our current allow-list of
messages for nodes offline

Example:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8339/10048487700/index.html#testresult/19678f27810231df/retries

## Summary of changes

- Add the message to the allow list
2024-07-23 16:09:05 -04:00
John Spray
80c8ceacbc tests: make test_scrubber_physical_gc_ancestors more stable (#8453)
## Problem

This test sometimes found that ancestors were getting cleaned up before
it had done any compaction.

Compaction was happening implicitly via Workload.

Example:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8298/10032173390/index.html#testresult/fb04786402f80822/retries

## Summary of changes

- Set upload=False when writing data after shard split, to avoid doing a
checkpoint
- Add a checkpoint_period & explicit wait for uploads so that we ensure
data lands in S3 without doing a checkpoint
2024-07-23 12:57:57 +01:00
John Spray
975f8ac658 tests: add test_compaction_l0_memory (#8403)
This test reproduces the case of a writer creating a deep stack of L0
layers. It uses realistic layer sizes and writes several gigabytes of
data, therefore runs as a performance test although it is validating
memory footprint rather than performance per se.

It acts a regression test for two recent fixes:
- https://github.com/neondatabase/neon/pull/8401
- https://github.com/neondatabase/neon/pull/8391

In future it will demonstrate the larger improvement of using a k-merge
iterator for L0 compaction (#8184)

This test can be extended to enforce limits on the memory consumption of
other housekeeping steps, by restarting the pageserver and then running
other things to do the same "how much did RSS increase" measurement.
2024-07-17 17:35:27 +00:00
Arpad Müller
66337097de Avoid the storage controller in test_tenant_creation_fails (#8392)
As described in #8385, the likely source for flakiness in
test_tenant_creation_fails is the following sequence of events:

1. test instructs the storage controller to create the tenant
2. storage controller adds the tenant and persists it to the database.
issues a creation request
3. the pageserver restarts with the failpoint disabled
4. storage controller's background reconciliation still wants to create
the tenant
5. pageserver gets new request to create the tenant from background
reconciliation

This commit just avoids the storage controller entirely. It has its own
set of issues, as the re-attach request will obviously not include the
tenant, but it's still useful to test for non-existence of the tenant.

The generation is also not optional any more during tenant attachment.
If you omit it, the pageserver yields an error. We change the signature
of `tenant_attach` to reflect that.

Alternative to #8385
Fixes #8266
2024-07-16 12:19:28 +02:00
Joonas Koivunen
730db859c7 feat(timeline_detach_ancestor): success idempotency (#8354)
Right now timeline detach ancestor reports an error (409, "no ancestor")
on a new attempt after successful completion. This makes it troublesome
for storage controller retries. Fix it to respond with `200 OK` as if
the operation had just completed quickly.

Additionally, the returned timeline identifiers in the 200 OK response
are now ordered so that responses between different nodes for error
comparison are done by the storage controller added in #8353.

Design-wise, this PR introduces a new strategy for accessing the latest
uploaded IndexPart:
`RemoteTimelineClient::initialized_upload_queue(&self) ->
Result<UploadQueueAccessor<'_>, NotInitialized>`. It should be a more
scalable way to query the latest uploaded `IndexPart` than to add a
query method for each question directly on `RemoteTimelineClient`.

GC blocking will need to be introduced to make the operation fully
idempotent. However, it is idempotent for the cases demonstrated by
tests.

Cc: #6994
2024-07-15 17:47:53 +00:00
Yuchen Liang
19accfee4e feat(pageserver): integrate lsn lease into synthetic size (#8220)
Part of #7497, closes #8071. (accidentally closed #8208, reopened here)

## Problem

After the changes in #8084, we need synthetic size to also account for
leased LSNs so that users do not get free retention by running a small
ephemeral endpoint for a long time.

## Summary of changes

This PR integrates LSN leases into the synthetic size calculation. We
model leases as read-only branches started at the leased LSN (except it
does not have a timeline id).

Other changes:
- Add new unit tests testing whether a lease behaves like a read-only
branch.
- Change `/size_debug` response to include lease point in the SVG
visualization.
- Fix `/lsn_lease` HTTP API to do proper parsing for POST.



Signed-off-by: Yuchen Liang <yuchen@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-07-04 15:09:05 +00:00
John Spray
c9e6dd45d3 pageserver: downgrade stale generation messages to INFO (#8256)
## Problem

When generations were new, these messages were an important way of
noticing if something unexpected was going on. We found some real issues
when investigating tests that unexpectedly tripped them.

At time has gone on, this code is now pretty battle-tested, and as we do
more live migrations etc, it's fairly normal to see the occasional
message from a node with a stale generation.

At this point the cognitive load on developers to selectively allow-list
these logs outweighs the benefit of having them at warn severity.

Closes: https://github.com/neondatabase/neon/issues/8080

## Summary of changes

- Downgrade "Dropped remote consistent LSN updates" and "Dropping stale
deletions" messages to INFO
- Remove all the allow-list entries for these logs.
2024-07-04 15:05:41 +01:00
Heikki Linnakangas
9ce193082a Restore running xacts from CLOG on replica startup (#7288)
We have one pretty serious MVCC visibility bug with hot standby
replicas. We incorrectly treat any transactions that are in progress
in the primary, when the standby is started, as aborted. That can
break MVCC for queries running concurrently in the standby. It can
also lead to hint bits being set incorrectly, and that damage can last
until the replica is restarted.

The fundamental bug was that we treated any replica start as starting
from a shut down server. The fix for that is straightforward: we need
to set 'wasShutdown = false' in InitWalRecovery() (see changes in the
postgres repo).

However, that introduces a new problem: with wasShutdown = false, the
standby will not open up for queries until it receives a running-xacts
WAL record from the primary. That's correct, and that's how Postgres
hot standby always works. But it's a problem for Neon, because:

* It changes the historical behavior for existing users. Currently,
  the standby immediately opens up for queries, so if they now need to
  wait, we can breka existing use cases that were working fine
  (assuming you don't hit the MVCC issues).

* The problem is much worse for Neon than it is for standalone
  PostgreSQL, because in Neon, we can start a replica from an
  arbitrary LSN. In standalone PostgreSQL, the replica always starts
  WAL replay from a checkpoint record, and the primary arranges things
  so that there is always a running-xacts record soon after each
  checkpoint record. You can still hit this issue with PostgreSQL if
  you have a transaction with lots of subtransactions running in the
  primary, but it's pretty rare in practice.

To mitigate that, we introduce another way to collect the
running-xacts information at startup, without waiting for the
running-xacts WAL record: We can the CLOG for XIDs that haven't been
marked as committed or aborted. It has limitations with
subtransactions too, but should mitigate the problem for most users.

See https://github.com/neondatabase/neon/issues/7236.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-07-01 12:58:12 +03:00
John Spray
063553a51b pageserver: remove tenant create API (#8135)
## Problem

For some time, we have created tenants with calls to location_conf. The
legacy "POST /v1/tenant" path was only used in some tests.

## Summary of changes

- Remove the API
- Relocate TenantCreateRequest to the controller API file (this used to
be used in both pageserver and controller APIs)
- Rewrite tenant_create test helper to use location_config API, as
control plane and storage controller do
- Update docker-compose test script to create tenants with
location_config API (this small commit is also present in
https://github.com/neondatabase/neon/pull/7947)
2024-06-28 09:14:19 +01:00
Alex Chi Z
9b98823d61 bottom-most-compaction: use in test_gc_feedback + fix bugs (#8103)
Adds manual compaction trigger; add gc compaction to test_gc_feedback

Part of https://github.com/neondatabase/neon/issues/8002

```
test_gc_feedback[debug-pg15].logical_size: 50 Mb
test_gc_feedback[debug-pg15].physical_size: 2269 Mb
test_gc_feedback[debug-pg15].physical/logical ratio: 44.5302 
test_gc_feedback[debug-pg15].max_total_num_of_deltas: 7 
test_gc_feedback[debug-pg15].max_num_of_deltas_above_image: 2 
test_gc_feedback[debug-pg15].logical_size_after_bottom_most_compaction: 50 Mb
test_gc_feedback[debug-pg15].physical_size_after_bottom_most_compaction: 287 Mb
test_gc_feedback[debug-pg15].physical/logical ratio after bottom_most_compaction: 5.6312 
test_gc_feedback[debug-pg15].max_total_num_of_deltas_after_bottom_most_compaction: 4 
test_gc_feedback[debug-pg15].max_num_of_deltas_above_image_after_bottom_most_compaction: 1
```

## Summary of changes

* Add the manual compaction trigger
* Use in test_gc_feedback
* Add a guard to avoid running it with retain_lsns
* Fix: Do `schedule_compaction_update` after compaction
* Fix: Supply deltas in the correct order to reconstruct value

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-06-25 23:00:14 +00:00
John Spray
07f21dd6b6 pageserver: remove attach/detach apis (#8134)
## Problem

These APIs have been deprecated for some time, but were still used from
test code.

Closes: https://github.com/neondatabase/neon/issues/4282

## Summary of changes

- It is still convenient to do a "tenant_attach" from a test without
having to write out a location_conf body, so those test methods have
been retained with implementations that call through to their
location_conf equivalent.
2024-06-25 17:38:06 +01:00
Yuchen Liang
219e78f885 feat(pageserver): add an optional lease to the get_lsn_by_timestamp API (#8104)
Part of #7497, closes #8072.

## Problem

Currently the `get_lsn_by_timestamp` and branch creation pageserver APIs do not provide a pleasant client experience where the looked-up LSN might be GC-ed between the two API calls.

This PR attempts to prevent common races between GC and branch creation by making use of LSN leases provided in #8084. A lease can be optionally granted to a looked-up LSN. With the lease, GC will not touch layers needed to reconstruct all pages at this LSN for the duration of the lease.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-06-24 20:12:24 +00:00
John Spray
b74232eb4d tests: allow-list neon_local endpoint errors from storage controller (#8123)
## Problem

For testing, the storage controller has a built-in hack that loads
neon_local endpoint config from disk, and uses it to reconfigure
endpoints when the attached pageserver changes.

Some tests that stop an endpoint while the storage controller is running
could occasionally fail on log errors from the controller trying to use
its special test-mode calls into neon local Endpoint.

Example:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8117/9592392425/index.html#/testresult/9d2bb8623d0d53f8

## Summary of changes

- Give NotifyError an explicit NeonLocal variant, to avoid munging these
into generic 500s (I don't want to ignore 500s in general)
- Allow-list errors related to the local notification hook.

The expectation is that tests using endpoints/workloads should be
independently checking that those endpoints work: if neon_local
generates an error inside the storage controller, that's ignorable.
2024-06-21 17:23:31 +00:00
John Spray
15728be0e1 pageserver: always detach before deleting (#8082)
In #7957 we enabled deletion without attachment, but retained the
old-style deletion (return 202, delete in background) for attached
tenants. In this PR, we remove the old-style deletion path, such that if
the tenant delete API is invoked while a tenant is detached, it is
simply detached before completing the deletion.

This intentionally doesn't rip out all the old deletion code: in case a
deletion was in progress at time of upgrade, we keep around the code for
finishing it for one release cycle. The rest of the code removal happens
in https://github.com/neondatabase/neon/pull/8091

Now that deletion will always be via the new path, the new path is also
updated to use some retries around remote storage operations, to
tripping up the control plane with 500s if S3 has an intermittent issue.
2024-06-21 15:39:19 +01:00
John Spray
59f949b4a8 pageserver: remove unused load/ignore APIs (#8122)
## Problem

These APIs have be unused for some time. They were superseded by
/location_conf: the equivalent of ignoring a tenant is now to put it in
secondary mode.

## Summary of changes

- Remove APIs
- Remove tests & helpers that used them
- Remove error variants that are no longer needed.
2024-06-21 10:02:15 +00:00
Vlad Lazar
e7d62a257d test: fix tenant duplication utility generation numbers (#8096)
## Problem
We have this set of test utilities which duplicate a tenant by copying
everything that's in remote storage and then attaching a tenant to the
pageserver and storage controller. When the "copied tenants" are created
on the storage controller, they start off from generation number 0. This
means that they can't see anything past that generation.

This issues has existed ever since generation numbers have been
introduced, but we've largely been lucky
for the generation to stay stable during the template tenant creation.

## Summary of Changes
Extend the storage controller debug attach hook to accept a generation
override. Use that in the tenant duplication logic to set the generation
number to something greater than the naturally reached generation. This
allows the tenants to see all layer files.
2024-06-19 11:55:59 +01:00
John Spray
eb0ca9b648 pageserver: improved synthetic size & find_gc_cutoff error handling (#8051)
## Problem

This PR refactors some error handling to avoid log spam on
tenant/timeline shutdown.

- "ignoring failure to find gc cutoffs: timeline shutting down." logs
(https://github.com/neondatabase/neon/issues/8012)
- "synthetic_size_worker: failed to calculate synthetic size for tenant
...: Failed to refresh gc_info before gathering inputs: tenant shutting
down", for example here:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8049/9502988669/index.html#suites/3fc871d9ee8127d8501d607e03205abb/1a074a66548bbcea

Closes: https://github.com/neondatabase/neon/issues/8012

## Summary of changes

- Refactor: Add a PageReconstructError variant to GcError: this is the
only kind of error that find_gc_cutoffs can emit.
- Functional change: only ignore shutdown PageReconstructError variant:
for other variants, treat it as a real error
- Refactor: add a structured CalculateSyntheticSizeError type and use it
instead of anyhow::Error in synthetic size calculations
- Functional change: while iterating through timelines gathering logical
sizes, only drop out if the whole tenant is cancelled: individual
timeline cancellations indicate deletion in progress and we can just
ignore those.
2024-06-14 11:08:11 +01:00
Joonas Koivunen
b52e31c1a4 fix: allow layer flushes more often (#7927)
As seen with the pgvector 0.7.0 index builds, we can receive large
batches of images, leading to very large L0 layers in the range of 1GB.
These large layers are produced because we are only able to roll the
layer after we have witnessed two different Lsns in a single
`DataDirModification::commit`. As the single Lsn batches of images can
span over multiple `DataDirModification` lifespans, we will rarely get
to write two different Lsns in a single `put_batch` currently.

The solution is to remember the TimelineWriterState instead of eagerly
forgetting it until we really open the next layer or someone else
flushes (while holding the write_guard).

Additional changes are test fixes to avoid "initdb image layer
optimization" or ignoring initdb layers for assertion.

Cc: #7197 because small `checkpoint_distance` will now trigger the
"initdb image layer optimization"
2024-06-10 13:50:17 +00:00
Alex Chi Z
3e63d0f9e0 test(pageserver): quantify compaction outcome (#7867)
A simple API to collect some statistics after compaction to easily
understand the result.

The tool reads the layer map, and analyze range by range instead of
doing single-key operations, which is more efficient than doing a
benchmark to collect the result. It currently computes two key metrics:

* Latest data access efficiency, which finds how many delta layers /
image layers the system needs to iterate before returning any key in a
key range.
* (Approximate) PiTR efficiency, as in
https://github.com/neondatabase/neon/issues/7770, which is simply the
number of delta files in the range. The reason behind that is, assume no
image layer is created, PiTR efficiency is simply the cost of collect
records from the delta layers, and the replay time. Number of delta
files (or in the future, estimated size of reads) is a simple yet
efficient way of estimating how much effort the page server needs to
reconstruct a page.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-06-10 10:42:13 +02:00
Yuchen Liang
630cfbe420 refactor(pageserver): designated api error type for cancelled request (#7949)
Closes #7406.

## Problem

When a `get_lsn_by_timestamp` request is cancelled, an anyhow error is
exposed to handle that case, which verbosely logs the error. However, we
don't benefit from having the full backtrace provided by anyhow in this
case.

## Summary of changes

This PR introduces a new `ApiError` type to handle errors caused by
cancelled request more robustly.
-  A new enum variant `ApiError::Cancelled`
- Currently the cancelled request is mapped to status code 500.
- Need to handle this error in proxy's `http_util` as well.
- Added a failpoint test to simulate cancelled `get_lsn_by_timestamp`
request.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-06-06 14:00:14 +00:00
John Spray
9fda85b486 pageserver: remove AncestorStopping error variants (#7916)
## Problem

In all cases, AncestorStopping is equivalent to Cancelled.

This became more obvious in
https://github.com/neondatabase/neon/pull/7912#discussion_r1620582309
when updating these error types.

## Summary of changes

- Remove AncestorStopping, always use Cancelled instead
2024-05-31 17:02:10 +01:00
John Spray
545f7e8cd7 tests: fix an allow list entry (#7856)
https://github.com/neondatabase/neon/pull/7844 typo'd one of the
expressions:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/9196993886/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/e420fbfdb193bf80/
2024-05-23 10:50:21 +01:00
John Spray
f98fdd20e3 tests: add a couple of allow lists for shutdown cases (#7844)
## Problem

Failures on some of our uglier shutdown log messages:

https://neon-github-public-dev.s3.amazonaws.com/reports/main/9192662995/index.html#suites/07874de07c4a1c9effe0d92da7755ebf/51b365408678c66f/

## Summary of changes

- Allow-list these errors.
2024-05-22 18:38:22 +00:00
Joonas Koivunen
a8a88ba7bc test(detach_ancestor): ensure L0 compaction in history is ok (#7813)
detaching a timeline from its ancestor can leave the resulting timeline
with more L0 layers than the compaction threshold. most of the time, the
detached timeline has made progress, and next time the L0 -> L1
compaction happens near the original branch point and not near the
last_record_lsn.

add a test to ensure that inheriting the historical L0s does not change
fullbackup. additionally:
- add `wait_until_completed` to test-only timeline checkpoint and
compact HTTP endpoints. with `?wait_until_completed=true` the endpoints
will wait until the remote client has completed uploads.
- for delta layers, describe L0-ness with the `/layer` endpoint

Cc: #6994
2024-05-21 20:08:43 +03:00
John Spray
c84656a53e pageserver: implement auto-splitting (#7681)
## Problem

Currently tenants are only split into multiple shards if a human being
calls the API to do it.

Issue: #7388 

## Summary of changes

- Add a pageserver API for returning the top tenants by size
- Add a step to the controller's background loop where if there is no
reconciliation or optimization to be done, it looks for things to split.
- Add a test that runs pgbench on many tenants concurrently, and checks
that splitting happens as expected as tenants grow, without interrupting
the client I/O.

This PR is quite basic: there is a tasklist in
https://github.com/neondatabase/neon/issues/7388 for further work. This
PR is meant to be safe (off by default), and sufficient to enable our
staging environment to run lots of sharded tenants without a human
having to set them up.
2024-05-17 16:01:24 +00:00
Joonas Koivunen
d9dcbffac3 python: allow using allowed_errors.py (#7719)
See #7718. Fix it by renaming all `types.py` to `common_types.py`.

Additionally, add an advert for using `allowed_errors.py` to test any
added regex.
2024-05-13 15:16:23 +03:00
John Spray
39c712f2ca tests: adjust log allow list since reqwest upgrade (#7666)
## Problem

Various performance test cases were destabilized by the recent upgrade
of `reqwest`, because it changes an error string.

Examples:
-
https://neon-github-public-dev.s3.amazonaws.com/reports/main/9005532594/index.html#testresult/3f984e471a9029a5/
-
https://neon-github-public-dev.s3.amazonaws.com/reports/main/9005532594/index.html#testresult/8bd0f095fe0402b7/

The performance tests suffer from this more than most tests, because
they churn enough data that the pageserver is still trying to contact
the storage controller while it is shut down at the end of tests.

## Summary of changes

s/Connection refused/error sending request/
2024-05-09 10:07:59 +01:00
John Spray
ca154d9cd8 pageserver: local layer path followups (#7640)
- Rename "filename" types which no longer map directly to a filename
(LayerFileName -> LayerName)
- Add a -v1- part to local layer paths to smooth the path to future
updates (we anticipate a -v2- that uses checksums later)
- Rename methods that refer to the string-ized version of a LayerName to
no longer be called "filename"
- Refactor reconcile() function to use a LocalLayerFileMetadata type
that includes the local path, rather than carrying local path separately
in a tuple and unwrap()'ing it later.
2024-05-08 16:50:21 +00:00