Minor changes from while I have been working on HA tests:
- Manual pytest executions came with some warnings from `log.warn()`
usage
- When something fails in a generations-enabled test, it it useful to
have a log from the attachment service of what attached when, and with
which generation.
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Part of #5172. First commits show that we used to allow starting up a
compute or creating a branch off a not yet uploaded timeline. This PR
moves activation of a timeline to happen **after** initial layer file(s)
(if any) and `index_part.json` have been uploaded. Simply moving
activation to be *after* downloads have finished works because we now
spawn a task per http request handler.
Current behaviour of uploading on the timelines on next startup is kept,
to be removed later as part of #5172.
Adds:
- `NeonCli.map_branch` and corresponding `neon_local` implementation:
allow creating computes for timelines managed via pageserver http
client/api
- possibly duplicate tests (I did not want to search for, will cleanup
in a follow-up if these duplicated)
Changes:
- make `wait_until_tenant_state` return immediatedly on `Broken` and not
wait more
## Problem
Pageservers must not delete objects or advertise updates to
remote_consistent_lsn without checking that they hold the latest
generation for the tenant in question (see [the RFC](
https://github.com/neondatabase/neon/blob/main/docs/rfcs/025-generation-numbers.md))
In this PR:
- A new "deletion queue" subsystem is introduced, through which
deletions flow
- `RemoteTimelineClient` is modified to send deletions through the
deletion queue:
- For GC & compaction, deletions flow through the full generation
verifying process
- For timeline deletions, deletions take a fast path that bypasses
generation verification
- The `last_uploaded_consistent_lsn` value in `UploadQueue` is replaced
with a mechanism that maintains a "projected" lsn (equivalent to the
previous property), and a "visible" LSN (which is the one that we may
share with safekeepers).
- Until `control_plane_api` is set, all deletions skip generation
validation
- Tests are introduced for the new functionality in
`test_pageserver_generations.py`
Once this lands, if a pageserver is configured with the
`control_plane_api` configuration added in
https://github.com/neondatabase/neon/pull/5163, it becomes safe to
attach a tenant to multiple pageservers concurrently.
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
this should allow test
test_delete_tenant_exercise_crash_safety_failpoints with
debug-pg16-Check.RETRY_WITH_RESTART-mock_s3-tenant-delete-before-remove-timelines-dir-True
to pass more reliably.
Remote storage cleanup split from #5198:
- pageserver, extensions, and safekeepers now have their separate remote
storage
- RemoteStorageKind has the configuration code
- S3Storage has the cleanup code
- with MOCK_S3, pageserver, extensions, safekeepers use different
buckets
- with LOCAL_FS, `repo_dir / "local_fs_remote_storage" / $user` is used
as path, where $user is `pageserver`, `safekeeper`
- no more `NeonEnvBuilder.enable_xxx_remote_storage` but one
`enable_{pageserver,extensions,safekeeper}_remote_storage`
Should not have any real changes. These will allow us to default to
`LOCAL_FS` for pageserver on the next PR, remove
`RemoteStorageKind.NOOP`, work towards #5172.
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
I'm still a bit nervous about attach -> crash case. But it should work.
(unlike case with timeline). Ideally would be cool to cover this with
test.
This continues tradition of adding bool flags for Tenant::set_stopping.
Probably lifecycle project will help with fixing it.
Rather temporary solution before proper:
https://github.com/neondatabase/neon/issues/5006
It requires more plumbing so lets not attach deleted tenants first and
then implement resume.
Additionally fix `assert_prefix_empty`. It had a buggy prefix calculation,
and since we always asserted for absence of stuff it worked. Here I
started to assert for presence of stuff too and it failed. Added more
"presence" asserts to other places to be confident that it works.
Resolves [#5016](https://github.com/neondatabase/neon/issues/5016)
## Problem
This was set to 5 seconds, which was very close to how long a compaction
took on my workstation, and when deletion is blocked on compaction the
test would fail.
We will fix this to make compactions drop out on deletion, but for the
moment let's stabilize the test.
## Summary of changes
Change timeout on timeline deletion in
`test_timeline_deletion_with_files_stuck_in_upload_queue` from 5 seconds
to 30 seconds.
## Problem
Deletions can be possibly reordered. Use fsync to avoid the case when
mark file doesnt exist but other tenant/timeline files do.
See added comments.
resolves#4987
Originated from test failure where we got SlowDown error from s3.
The patch generalizes `download_retry` to not be download specific.
Resulting `retry` function is moved to utils crate. `download_retries`
is now a thin wrapper around this `retry` function.
To ensure that all needed retries are in place test code now uses
`test_remote_failures=1` setting.
Ref https://neondb.slack.com/archives/C059ZC138NR/p1691743624353009
## Problem
Currently we delete local files first, so if pageserver restarts after
local files deletion then remote deletion is not continued. This can be
solved with inversion of these steps.
But even if these steps are inverted when index_part.json is deleted
there is no way to distinguish between "this timeline is good, we just
didnt upload it to remote" and "this timeline is deleted we should
continue with removal of local state". So to solve it we use another
mark file. After index part is deleted presence of this mark file
indentifies that it was a deletion intention.
Alternative approach that was discussed was to delete all except
metadata first, and then delete metadata and index part. In this case we
still do not support local only configs making them rather unsafe
(deletion in them is already unsafe, but this direction solidifies this
direction instead of fixing it). Another downside is that if we crash
after local metadata gets removed we may leave dangling index part on
the remote which in theory shouldnt be a big deal because the file is
small.
It is not a big change to choose another approach at this point.
## Summary of changes
Timeline deletion sequence:
1. Set deleted_at in remote index part.
2. Create local mark file.
3. Delete local files except metadata (it is simpler this way, to be
able to reuse timeline initialization code that expects metadata)
4. Delete remote layers
5. Delete index part
6. Delete meta, timeline directory.
7. Delete mark file.
This works for local only configuration without remote storage.
Sequence is resumable from any point.
resolves#4453
resolves https://github.com/neondatabase/neon/pull/4552 (the issue was
created with async cancellation in mind, but we can still have issues
with retries if metadata is deleted among the first by remove_dir_all
(which doesnt have any ordering guarantees))
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
1. During the rollout we got a panic: "timeline that we were deleting
was concurrently removed from 'timelines' map" that was caused by lock
guard not being propagated to the background part of the deletion.
Existing test didnt catch it because failpoint that was used for
verification was placed earlier prior to background task spawning.
2. When looking at surrounding code one more bug was detected. We
removed timeline from the map before deletion is finished, which breaks
client retry logic, because it will indicate 404 before actual deletion
is completed which can lead to client stopping its retry poll earlier.
## Summary of changes
1. Carry the lock guard over to background deletion. Ensure existing
test case fails without applied patch (second deletion becomes stuck
without it, which eventually leads to a test failure).
2. Move delete_all call earlier so timeline is removed from the map is
the last thing done during deletion.
Additionally I've added timeline_id to the `update_gc_info` span,
because `debug_assert_current_span_has_tenant_and_timeline_id` in
`download_remote_layer` was firing when `update_gc_info` lead to
on-demand downloads via `find_lsn_for_timestamp` (caught by @problame).
This is not directly related to the PR but fixes possible flakiness.
Another smaller set of changes involves deletion wrapper used in python
tests. Now there is a simpler wrapper that waits for deletions to
complete `timeline_delete_wait_completed`. Most of the
test_delete_timeline.py tests make negative tests, i.e., "does
ps_http.timeline_delete() fail in this and that scenario".
These can be left alone. Other places when we actually do the deletions,
we need to use the helper that polls for completion.
Discussion
https://neondb.slack.com/archives/C03F5SM1N02/p1686668007396639resolves#4496
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
Delete data from s3 when timeline deletion is requested
## Summary of changes
UploadQueue is altered to support scheduling of delete operations in
stopped state. This looks weird, and I'm thinking whether there are
better options/refactorings for upload client to make it look better.
Probably can be part of https://github.com/neondatabase/neon/issues/4378
Deletion is implemented directly in existing endpoint because changes are not
that significant. If we want more safety we can separate those or create
feature flag for new behavior.
resolves [#4193](https://github.com/neondatabase/neon/issues/4193)
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
Attach failures are not reported in public part of the api (in
`attachment_status` field of TenantInfo).
## Summary of changes
Expose TenantState::Broken as TenantAttachmentStatus::Failed
In the way its written Failed status will be reported even if no
attachment happened. (I e if tenant become broken on startup). This is
in line with other members. I e Active will be resolved to Attached even
if no actual attach took place.
This can be tweaked if needed. At the current stage it would be overengineering without clear motivation
resolves#4344
Before this patch, the following sequence would lead to the resurrection of a deleted timeline:
- create timeline
- wait for its index part to reach s3
- delete timeline
- wait an arbitrary amount of time, including 0 seconds
- detach tenant
- attach tenant
- the timeline is there and Active again
This happens because we only kept track of the deletion in the tenant dir (by deleting the timeline dir) but not in S3.
The solution is to turn the deleted timeline's IndexPart into a tombstone.
The deletion status of the timeline is expressed in the `deleted_at: Option<NativeDateTime>` field of IndexPart.
It's `None` while the timeline is alive and `Some(deletion time stamp)` if it is deleted.
We change the timeline deletion handler to upload this tombstoned IndexPart.
The handler does not return success if the upload fails.
Coincidentally, this fixes the long-stanging TODO about the `std::fs::remove_dir_all` being not atomic.
It need not be atomic anymore because we set the `deleted_at=Some()` before starting the `remove_dir_all`.
The tombstone is in the IndexPart only, not in the `metadata`.
So, we only have the tombstone and the `remove_dir_all` benefits mentioned above if remote storage is configured.
This was a conscious trade-off because there's no good format evolution story for the current metadata file format.
The introduction of this additional step into `delete_timeline` was painful because delete_timeline needs to be
1. cancel-safe
2. idempotent
3. safe to call concurrently
These are mostly self-inflicted limitations that can be avoided by using request-coalescing.
PR https://github.com/neondatabase/neon/pull/4159 will do that.
fixes https://github.com/neondatabase/neon/issues/3560
refs https://github.com/neondatabase/neon/issues/3889 (part of tenant relocation)
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
It had a couple of inherent races:
1) Even if compute is killed before the call, some more data might still arrive
to safekeepers after commit_lsn on them is polled, advancing it. Then checkpoint
on pageserver might not include this tail, and so upload of expected LSN won't
happen until one more checkpoint.
2) commit_lsn is updated asynchronously -- compute can commit transaction before
communicating commit_lsn to even single safekeeper (sync-safekeepers can be used
to forces the advancement). This makes semantics of
wait_for_sk_commit_lsn_to_reach_remote_storage quite complicated.
Replace it with last_flush_lsn_upload which
1) Learns last flush LSN on compute;
2) Waits for it to arrive to pageserver;
3) Checkpoints it;
4) Waits for the upload.
In some tests this keeps compute alive longer than before, but this doesn't seem
to be important.
There is a chance this fixes https://github.com/neondatabase/neon/issues/3209
Reason and backtrace are added to the Broken state. Backtrace is automatically collected when tenant entered the broken state. The format for API, CLI and metrics is changed and unified to return tenant state name in camel case. Previously snake case was used for metrics and camel case was used for everything else. Now tenant state field in TenantInfo swagger spec is changed to contain state name in "slug" field and other fields (currently only reason and backtrace for Broken variant in "data" field). To allow for this breaking change state was removed from TenantInfo swagger spec because it was not used anywhere.
Please note that the tenant's broken reason is not persisted on disk so the reason is lost when pageserver is restarted.
Requires changes to grafana dashboard that monitors tenant states.
Closes#3001
---------
Co-authored-by: theirix <theirix@gmail.com>