3624 Commits

Author SHA1 Message Date
John Spray
18159b7695 deletion queue: expose errors from push/flush 2023-08-22 10:01:10 +01:00
John Spray
c1bc9c0f70 Various test fixes + tweaks to flushing 2023-08-18 12:44:35 +01:00
John Spray
2de5efa208 Fix broken wait_untils in test_remote_storage_upload_queue_retries 2023-08-18 12:44:35 +01:00
John Spray
d330eac4bc clippy 2023-08-18 12:44:35 +01:00
John Spray
3ebceeda71 pageserver: refactor timeline args into TimelineResources
This sidesteps clippy complaining about function arg counts,
and will enable introducing more shared structures in future
without the noise of adding extra args to all the functions
involved in timeline setup.
2023-08-18 12:44:35 +01:00
John Spray
31729d6f4d pageserver: refactor tenant args into a structure
This way, when we add some new shared structure that the
tenants need a reference to, we do not have to add it
individually as an extra argument to the various functions.
2023-08-18 12:44:35 +01:00
John Spray
7e0e3517c1 clippy 2023-08-18 12:44:35 +01:00
John Spray
c4fc6e433d tests: add e2e deletion queue recovery test 2023-08-18 12:44:35 +01:00
John Spray
c36cba28d6 pageserver: generalize flush API 2023-08-18 12:44:35 +01:00
John Spray
8eaa4015de deletion queue: versions in keys 2023-08-18 12:44:35 +01:00
John Spray
10e927ee3e Add encoding versions to deletion queue structs 2023-08-18 12:44:35 +01:00
John Spray
bb3a59f275 clippy 2023-08-18 12:44:35 +01:00
John Spray
a0ed43cc12 deletion queue: add DeletionHeader for sequence numbers 2023-08-18 12:44:35 +01:00
John Spray
99dc5a5c27 Deletion queue: implement recovery on startup 2023-08-18 12:44:35 +01:00
John Spray
54db1f5d8a remote_storage: add a helper for downloading full objects
This is only for use with small objects that we will
deserialize in a non-streaming way.

Also add a strip_prefix method to RemotePath.
2023-08-18 12:44:35 +01:00
John Spray
404b25e45f Remove vestigial remote_timeline_client deletion paths 2023-08-18 12:44:35 +01:00
John Spray
f4dba9f907 tests: update tenant deletion tests for deletion queue 2023-08-18 12:44:35 +01:00
John Spray
4ec45bc7dc tests: update tenant deletion tests for deletion queue 2023-08-18 12:44:35 +01:00
John Spray
a00d4a8d8c tests: update test_remote_timeline_client_calls_started_metric for deletion queue 2023-08-18 12:44:35 +01:00
John Spray
43c9a09d8f tests: update remote storage test for deletion queue 2023-08-18 12:44:35 +01:00
John Spray
3edd7ece40 deletion queue: improve frontend retry 2023-08-18 12:44:35 +01:00
John Spray
504fe9c2b0 pageserver: send timeline deletions through the deletion queue 2023-08-18 12:44:35 +01:00
John Spray
10df237a81 deletion queue: add push for generic objects (layers and garbage) 2023-08-18 12:44:35 +01:00
John Spray
d40f8475a5 Error metric and retries 2023-08-18 12:44:35 +01:00
John Spray
164f916a40 Spawn deletion workers with info spans 2023-08-18 12:44:35 +01:00
John Spray
4ebc29768c Add failpoint for deletion execution 2023-08-18 12:44:35 +01:00
John Spray
bae62916dc pageserver/http: add /v1/deletion_queue/flush_execute
This is principally for testing, but might be useful in
the field if we want to e.g. flush a deletion queue
before running an external scrub tool
2023-08-18 12:44:35 +01:00
John Spray
5e2b8b376c utils: add ApiError::ShuttingDown
So that handlers that check their CancellationToken
explicitly can map it to a set HTTP status.
2023-08-18 12:44:35 +01:00
John Spray
54ec7919b8 pageserver: add deletion queue submitted/executed metrics 2023-08-18 12:44:35 +01:00
John Spray
e0bed0732c Tweak deletion queue constants 2023-08-18 12:44:35 +01:00
John Spray
9e92121cc3 pageserver: flush deletion queue on clean shutdown 2023-08-18 12:44:35 +01:00
John Spray
50a9508f4f clippy 2023-08-18 12:44:35 +01:00
John Spray
f61402be24 pageserver: testing for deletion queue 2023-08-18 12:44:35 +01:00
John Spray
975e4f2235 Refactor deletion worker construction 2023-08-18 12:44:35 +01:00
John Spray
537eca489e Implement flush_execute() in deletion queue 2023-08-18 12:44:35 +01:00
John Spray
de4882886e pageserver: implement batching in deletion queue 2023-08-18 12:44:35 +01:00
John Spray
6982288426 pageserver: implement frontend of deletion queue 2023-08-18 12:44:35 +01:00
John Spray
ccfcfa1098 remote_storage: implement Serialize/Deserialize for RemotePath 2023-08-18 12:44:35 +01:00
John Spray
e2c793c897 Use deletion queue in schedule_layer_file_deletion 2023-08-18 12:44:33 +01:00
John Spray
0fdc492aa4 Add MockDeletionQueue for unit tests 2023-08-18 11:25:40 +01:00
John Spray
787b099541 wire deletion queue into timeline 2023-08-18 11:25:40 +01:00
John Spray
3af693749d pageserver: wire deletion queue through to Tenant 2023-08-18 11:25:40 +01:00
John Spray
6f9ae6bb5f pageserver: instantiate deletion queue at process scope 2023-08-18 11:25:40 +01:00
John Spray
16d77dcb73 Initial stub implementation of deletion queue 2023-08-18 11:25:40 +01:00
Joonas Koivunen
67af24191e test: cleanup remote_timeline_client tests (#5013)
I will have to change these as I change the remote_timeline_client API in
#4938, so do a bit of cleanup first and handle my comments which were just
resolved during initial review.

Cleanup:
- use unwrap in tests instead of mixed `?` and `unwrap`
- use `Handle` instead of `&'static Reactor` to make the
RemoteTimelineClient more natural
- use arrays in tests
- use plain `#[tokio::test]`
2023-08-17 19:27:30 +03:00
Joonas Koivunen
6af5f9bfe0 fix: format context (#5022)
We return an error with unformatted `{timeline_id}`.
2023-08-17 14:30:25 +00:00
Dmitry Rodionov
64fc7eafcd Increase timeout once again. (#5021)
When the failpoint is early in the deletion process, it takes longer to
complete after the failpoint is removed.

Example was: https://neon-github-public-dev.s3.amazonaws.com/reports/main/5889544346/index.html#suites/3556ed71f2d69272a7014df6dcb02317/49826c68ce8492b1
2023-08-17 15:37:28 +03:00
Conrad Ludgate
3e4710c59e proxy: add more sasl logs (#5012)
## Problem

A customer is having trouble connecting to neon from their production
environment. The logs show a mix of "Internal error" and "authentication
protocol violation" but not the full error.

## Summary of changes

Make sure we don't miss any logs during SASL/SCRAM
2023-08-17 12:05:54 +01:00
Dmitry Rodionov
d8b0a298b7 Do not attach deleted tenants (#5008)
A rather temporary solution before the proper one:
https://github.com/neondatabase/neon/issues/5006

It requires more plumbing, so let's not attach deleted tenants first and
then implement resume.

Additionally fix `assert_prefix_empty`. It had a buggy prefix calculation,
and since we always asserted for absence of stuff it worked. Here I
started to assert for presence of stuff too and it failed. Added more
"presence" asserts to other places to be confident that it works.

Resolves [#5016](https://github.com/neondatabase/neon/issues/5016)
2023-08-17 13:46:49 +03:00
Alexander Bayandin
c8094ee51e test_compatibility: run amcheck unconditionally (#4985)
## Problem

The previous version of neon (that we use in the forward compatibility test)
now has the `amcheck` extension installed. We can run `pg_amcheck`
unconditionally.

## Summary of changes
- Run `pg_amcheck` in compatibility tests unconditionally
2023-08-17 11:46:00 +01:00