Commit Graph

1867 Commits

Author SHA1 Message Date
Christian Schwarz
f9b6e7bbbe Merge branch 'main' into problame/2024-02-walredo-work/prespawn/switch-to-heavier-once-cell-with-rwlock 2024-02-06 15:33:09 +00:00
Christian Schwarz
d7b29aace7 refactor(walredo): don't create WalRedoManager for broken tenants (#6597)
When we'll later introduce a global pool of pre-spawned walredo
processes (https://github.com/neondatabase/neon/issues/6581), this
refactoring avoids plumbing through the reference to the pool to all the
places where we create a broken tenant.

Builds atop the refactoring in #6583
2024-02-06 16:20:02 +01:00
Christian Schwarz
53a3ed0a7e debug_assert presence of shard_id tracing field (#6572)
also:
fixes https://github.com/neondatabase/neon/issues/6638
2024-02-06 14:43:33 +00:00
John Spray
6297843317 tests: flakiness fixes in pageserver tests (#6632)
Fix several test flakes:
- test_sharding_service_smoke had log failures on "Dropped LSN updates"
- test_emergency_mode had log failures on a deletion queue shutdown
check, where the check was incorrect because it was expecting channel
receiver to stay alive after cancellation token was fired.
- test_secondary_mode_eviction had racing heatmap uploads because the
test was using a live migration hook to set up locations, where that
migration was itself uploading heatmaps and generally making the
situation more complex than it needed to be.

These are the failure modes that I saw when spot checking the last few
failures of each test.

This will mostly/completely address #6511, but I'll leave that ticket
open for a couple days and then check if either of the tests named in
that ticket are flaky.

Related #6511
2024-02-06 12:49:41 +00:00
Christian Schwarz
0de46fd6f2 heavier_once_cell: switch to tokio::sync::RwLock (#6589)
Using the RwLock reduces contention on the hot path.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2024-02-06 14:04:15 +02:00
Joonas Koivunen
53743991de uploader: avoid cloning vecs just to get Bytes (#6645)
Fix cloning the serialized heatmap on every attempt by just turning it
into `bytes::Bytes` before clone so it will be a refcounted instead of
refcounting a vec clone later on.

Also fixes one cancellation token cloning I had missed in #6618.
Cc: #6096
2024-02-06 11:34:13 +00:00
Christian Schwarz
edcde05c1c refactor(walredo): split up the massive walredo.rs (#6583)
Part of https://github.com/neondatabase/neon/issues/6581
2024-02-06 09:44:49 +00:00
Christian Schwarz
e196d974cc pagebench: actually implement --num_clients (#6640)
Will need this to validate per-tenant throttling in
https://github.com/neondatabase/neon/issues/5899
2024-02-06 10:34:16 +01:00
Joonas Koivunen
947165788d refactor: needless cancellation token cloning (#6618)
The solution we ended up for `backoff::retry` requires always cloning of
cancellation tokens even though there is just `.await`. Fix that, and
also turn the return type into `Option<Result<T, E>>` avoiding the need
for the `E::cancelled()` fn passed in.

Cc: #6096
2024-02-06 09:39:06 +02:00
Joonas Koivunen
5e8deca268 metrics: remove broken tenants (#6586)
Before tenant migration it made sense to leak broken tenants in the
metrics until restart. Nowdays it makes less sense because on
cancellations we set the tenant broken. The set metric still allows
filterable alerting.

Fixes: #6507
2024-02-05 14:49:35 +02:00
Joonas Koivunen
db89b13aaa fix: use the shared constant download buffer size (#6620)
Noticed that we had forgotten to use
`remote_timeline_client.rs::BUFFER_SIZE` in one instance.
2024-02-05 13:10:08 +01:00
Arpad Müller
56cf360439 Don't preserve temp files on creation errors of delta layers (#6612)
There is currently no cleanup done after a delta layer creation error,
so delta layers can accumulate. The problem gets worse as the operation
gets retried and delta layers accumulate on the disk. Therefore, delete
them from disk (if something has been written to disk).
2024-02-05 09:53:37 +00:00
Joonas Koivunen
70f646ffe2 More logging fixes (#6584)
I was on-call this week, these would had made me understand more/faster
of the system:
- move stray attaching start logging inside the span it starts, add
generation
- log ancestor timeline_id or bootstrapping in the beginning of timeline
creation
2024-02-05 09:34:03 +02:00
Arpad Müller
aac8eb2c36 Minor logging improvements (#6593)
* log when `lsn_by_timestamp` finished together with its result
* add back logging of the layer name as suggested in
https://github.com/neondatabase/neon/pull/6549#discussion_r1475756808
2024-02-03 02:16:20 +01:00
Christian Schwarz
f8652fc738 Merge branch 'problame/2024-02-walredo-work/prespawn/heaver-once-cell-for-process-launch' into problame/2024-02-walredo-work/prespawn/switch-to-heavier-once-cell-with-rwlock 2024-02-02 18:36:17 +00:00
Christian Schwarz
bbf954d411 Merge branch 'problame/2024-02-walredo-work/prespawn/broken-tenants-no-walredo' into problame/2024-02-walredo-work/prespawn/heaver-once-cell-for-process-launch 2024-02-02 18:36:16 +00:00
Christian Schwarz
efb3e7bb15 Merge branch 'problame/2024-02-walredo-work/prespawn/split-code' into problame/2024-02-walredo-work/prespawn/broken-tenants-no-walredo 2024-02-02 18:36:15 +00:00
Christian Schwarz
01688a5ce1 Merge branch 'main' into problame/2024-02-walredo-work/prespawn/split-code 2024-02-02 18:36:14 +00:00
John Spray
2e5eab69c6 tests: remove test_gc_cutoff (#6587)
This test became flaky when postgres retry handling was fixed to use
backoff delays -- each iteration in this test's loop was taking much
longer because pgbench doesn't fail until postgres has given up on
retrying to the pageserver.

We are just removing it, because the condition it tests is no longer
risky: we reload all metadata from remote storage on restart, so
crashing directly between making local changes and doing remote uploads
isn't interesting any more.

Closes:  https://github.com/neondatabase/neon/issues/2856
Closes: https://github.com/neondatabase/neon/issues/5329
2024-02-02 18:20:18 +00:00
Christian Schwarz
64b4b498a4 Revert "remove the walredo usage, that'll be in the next pr"
This reverts commit 20e82629df.
2024-02-02 17:25:25 +00:00
Christian Schwarz
20e82629df remove the walredo usage, that'll be in the next pr 2024-02-02 17:21:59 +00:00
Christian Schwarz
0a09cff816 heavier_once_cell: switch to tokio::sync::RwLock
Using the RwLock reduces contention on the hot path.
2024-02-02 17:09:56 +00:00
Christian Schwarz
c29532cded Revert "Revert "[DO NOT MERGE] refactor(walredo): use replace RwLock with heavier_once_cell""
This reverts commit 6d94d9fb19.
2024-02-02 16:43:14 +00:00
Christian Schwarz
1102d3f0bf Revert "switch to tokio::RwLock"
This reverts commit e8f1af5527.
2024-02-02 16:43:08 +00:00
Christian Schwarz
e8f1af5527 switch to tokio::RwLock 2024-02-02 16:42:54 +00:00
Christian Schwarz
6d94d9fb19 Revert "[DO NOT MERGE] refactor(walredo): use replace RwLock with heavier_once_cell"
This reverts commit 2ab2608d4c.
2024-02-02 16:15:37 +00:00
Christian Schwarz
84169c926a Merge branch 'problame/2024-02-walredo-work/prespawn/broken-tenants-no-walredo' into problame/2024-02-walredo-work/prespawn/heaver-once-cell-for-process-launch 2024-02-02 15:53:57 +00:00
Christian Schwarz
acdebf2cec Merge branch 'problame/2024-02-walredo-work/prespawn/split-code' into problame/2024-02-walredo-work/prespawn/broken-tenants-no-walredo 2024-02-02 15:53:56 +00:00
Christian Schwarz
44cb5e5be6 Merge branch 'main' into problame/2024-02-walredo-work/prespawn/split-code 2024-02-02 15:53:55 +00:00
John Spray
46fb1a90ce pageserver: avoid calculating/sending logical sizes on shard !=0 (#6567)
## Problem

Sharded tenants only maintain accurate relation sizes on shard 0.
Therefore logical size can only be calculated on shard 0. Fortunately it
is also only _needed_ on shard 0, to provide Safekeeper feedback and to
send consumption metrics.

Closes: #6307

## Summary of changes

- Send 0 for logical size to safekeepers on shards !=0
- Skip logical size warmup task on shards !=0
- Skip imitate_layer_accesses on shards !=0
2024-02-02 15:52:03 +00:00
Christian Schwarz
2ab2608d4c [DO NOT MERGE] refactor(walredo): use replace RwLock with heavier_once_cell
The API is nice, exactly what we want, but we would want a more
optimistic underlying sync primitive.
2024-02-02 15:36:15 +00:00
Christian Schwarz
f73aa3eb32 refactor(walredo): avoid the need for a WalRedoManager in broken tenants
When we'll later introduce a global pool of pre-spawned walredo
processes (https://github.com/neondatabase/neon/issues/6581), this
refactoring avoids plumbing through the reference to the pool to all the
places where we create a broken tenant.

Builds atop the refactoring in #6583
2024-02-02 14:52:53 +00:00
Christian Schwarz
2374e1318e Merge branch 'main' into problame/2024-02-walredo-work/prespawn/split-code 2024-02-02 14:42:30 +00:00
John Spray
56171cbe8c pageserver: more permissive activation timeout when testing (#6564)
## Problem

The 5 second activation timeout is appropriate for production
environments, where we want to give a prompt response to the cloud
control plane, and if we fail it will retry the call. In tests however,
we don't want every call to e.g. timeline create to have to come with a
retry wrapper.

This issue has always been there, but it is more apparent in sharding
tests that concurrently attach several tenant shards.

Closes: https://github.com/neondatabase/neon/issues/6563

## Summary of changes

When `testing` feature is enabled, make `ACTIVE_TENANT_TIMEOUT` 30
seconds instead of 5 seconds.
2024-02-02 15:14:42 +01:00
Arpad Müller
48b05b7c50 Add a time_travel_remote_storage http endpoint (#6533)
Adds an endpoint to the pageserver to S3-recover an entire tenant to a
specific given timestamp.

Required input parameters:
* `travel_to`: the target timestamp to recover the S3 state to
* `done_if_after`: a timestamp that marks the beginning of the recovery
process. retries of the query should keep this value constant. it *must*
be after `travel_to`, and also after any changes we want to revert, and
must represent a point in time before the endpoint is being called, all
of these time points in terms of the time source used by S3. these
criteria need to hold even in the face of clock differences, so I
recommend waiting a specific amount of time, then taking
`done_if_after`, then waiting some amount of time again, and only then
issuing the request.

Also important to note: the timestamps in S3 work at second accuracy, so
one needs to add generous waits before and after for the process to work
smoothly (at least 2-3 seconds).

We ignore the added test for the mocked S3 for now due to a limitation
in moto: https://github.com/getmoto/moto/issues/7300 .

Part of https://github.com/neondatabase/cloud/issues/8233
2024-02-02 14:52:12 +01:00
Christian Schwarz
aa0e9fdaef Merge branch 'main' into problame/2024-02-walredo-work/prespawn/split-code 2024-02-02 11:50:15 +00:00
John Spray
24e916d37f pageserver: fix a syntax error in swagger (#6566)
A description was written as a follow-on to a section line, rather than
in the proper `description:` part. This caused swagger parsers to
rightly reject it.
2024-02-02 10:35:09 +00:00
Christian Schwarz
9b8aa270b8 cleanups 2024-02-02 10:19:18 +00:00
Christian Schwarz
4571db1750 extract NeonWalRecord apply logic 2024-02-02 10:14:50 +00:00
Christian Schwarz
6fe534fea3 move protocol ad child module of process, where it belongs 2024-02-02 10:05:50 +00:00
Christian Schwarz
8b258e20a0 move more stuff around 2024-02-02 10:03:40 +00:00
Christian Schwarz
29eec6c563 split off walredo process & protocol from walredo.rs 2024-02-02 09:59:31 +00:00
Heikki Linnakangas
350865392c Print checkpoint key contents with "pagectl print-layer-file" (#6541)
This was very useful in debugging the bugs fixed in #6410 and #6502.

There's a lot more we could do. This only adds the printing to delta
layers, not image layers, for example, and it might be useful to print
details of more record types. But this is a good start.
2024-02-02 01:35:31 +02:00
Christian Schwarz
1be5e564ce feat(walredo): use posix_spawn by moving close_fds() work to walredo C code (#6574)
The rust stdlib uses the efficient `posix_spawn` by default.
However, before this PR, pageserver used `pre_exec()` in our
`close_fds()` ext trait.

This PR moves the work that `close_fds()` did to the walredo C code.
I verified manually using `gdb` that we're now forking out the walredo
process using `posix_spawn`.

refs https://github.com/neondatabase/neon/issues/6565
2024-02-01 22:38:34 +01:00
Christian Schwarz
7a70ef991f feat(walredo): various observability improvements (#6573)
- log when we start walredo process
- include tenant shard id in walredo argv
- dump some basic walredo state in tenant details api
- more suitable walredo process launch histogram buckets
- avoid duplicate tracing labels in walredo launch spans
2024-02-01 21:59:40 +01:00
Vlad Lazar
221531c9db pageserver: lift ancestor timeline logic from read path (#6543)
When the read path needs to follow a key into the ancestor timeline, it
needs to wait for said ancestor to become active and aware of it's
branching lsn. The logic is lifted into a separate function with it's
own new error type.

This is done because the vectored read path needs the same logic. It's
also the reason for the newly introduced error type.

When we'll switch the read path to proxy into `get_vectored`, we can
remove the duplicated variants from `PageReconstructError`.
2024-02-01 10:35:18 +00:00
Christian Schwarz
4c173456dc pagebench: fix percentiles reporting (#6547)
Before this patch, pagebench was always showing the same value.

refs https://github.com/neondatabase/neon/issues/6509
2024-01-31 23:29:48 +00:00
Christian Schwarz
e82625b77d refactor(pageserver main): signal handling (#6554)
This refactoring makes it easier to experimentally replace
BACKGROUND_RUNTIME with a single-threaded runtime. Found this useful
[during benchmarking](https://github.com/neondatabase/neon/pull/6555).
2024-01-31 23:25:57 +00:00
Joonas Koivunen
3d5fab127a rewrite Gate impl for better observability (#6542)
changes:
- two messages instead of message every second when gate was closing
- replace the gate name string by using a pointer
- slow GateGuards are likely to log who they were (see example)

example found in regress tests: <https://github.com/neondatabase/neon/pull/6542#issuecomment-1919009256>
2024-01-31 22:15:58 +00:00
Joonas Koivunen
66719d7eaf logging: fix span usage (#6549)
Fixes some duplication due to extra or misconfigured `#[instrument]`,
while filling in the `timeline_id` to delete timeline flow calls.
2024-01-31 20:52:00 +00:00