Commit Graph

5051 Commits

Author SHA1 Message Date
Christian Schwarz
6c5c2c80d2 Revert "adjust bench for both sync and async benchmarking"
This reverts commit afd2c1369e.
2024-04-10 15:43:08 +00:00
Christian Schwarz
c9e8ae40dd Revert "benchmark numbers"
This reverts commit 6e0a02de27.
2024-04-09 11:35:47 +00:00
Christian Schwarz
0047481cc8 Revert "Revert "Revert "HACK: restore old impl, make runtime configurable (how to: reconfigure via HTTP, then kill existing walredo procs)"""
This reverts commit bea2e121dd.
2024-04-09 11:35:07 +00:00
Christian Schwarz
feaca7bc1e Revert "fixup: re-apply bring-back of wal_redo_timeout changes after file movements"
This reverts commit f489a10509.
2024-04-09 11:32:56 +00:00
Christian Schwarz
5d12475664 Revert "Revert "Revert "make the default process kind runtime-configurable, and switch to sync"""
This reverts commit 825c0e30e8.
2024-04-09 11:32:47 +00:00
Christian Schwarz
6e0a02de27 benchmark numbers 2024-04-09 10:59:29 +00:00
Christian Schwarz
afd2c1369e adjust bench for both sync and async benchmarking 2024-04-09 11:21:16 +00:00
Christian Schwarz
b44bbd276e HACK: set walredo process kind metric on startup 2024-04-09 10:33:57 +00:00
Christian Schwarz
aad5a672f0 DO NOT MERGE: diable materialized page cache for benchmarking 2024-04-08 15:42:50 +00:00
Christian Schwarz
5565087dba Merge remote-tracking branch 'origin/problame/async-walredo/benchmarking-2024-04-08--1' into problame/async-walredo/benchmarking-2024-04-08--1 2024-04-08 15:40:34 +00:00
Christian Schwarz
825c0e30e8 Revert "Revert "make the default process kind runtime-configurable, and switch to sync""
This reverts commit b72891d28c.
2024-04-08 15:36:34 +00:00
Christian Schwarz
f489a10509 fixup: re-apply bring-back of wal_redo_timeout changes after file movements 2024-04-08 15:26:12 +00:00
Christian Schwarz
bea2e121dd Revert "Revert "HACK: restore old impl, make runtime configurable (how to: reconfigure via HTTP, then kill existing walredo procs)""
This reverts commit c38b3e6ad6.
2024-04-08 15:14:40 +00:00
Christian Schwarz
82e7e4d84a DO NOT MERGE: benchmarking setup 2024-04-08 14:58:41 +00:00
Christian Schwarz
4ef2fb29fa bring back wal_redo_timeout 2024-04-08 14:53:52 +00:00
Christian Schwarz
ffef90f3db Merge remote-tracking branch 'origin/main' into problame/integrate-tokio-epoll-uring/benchmarking/2024-01-31-prs/async-walredo 2024-04-08 14:31:48 +00:00
Christian Schwarz
1081a4d246 pageserver: option to run with just one tokio runtime (#7331)
This PR is an off-by-default revision v2 of the (since-reverted) PR
#6555 / commit `3220f830b7fbb785d6db8a93775f46314f10a99b`.

See that PR for details on why running with a single runtime is
desirable and why we should be ready.

We reverted #6555 because it showed regressions in prodlike cloudbench,
see the revert commit message `ad072de4209193fd21314cf7f03f14df4fa55eb1`
for more context.

This PR makes it an opt-in choice via an env var.

The default is to use the 4 separate runtimes that we have today, there
shouldn't be any performance change.

I tested manually that the env var & added metric works.

```
# undefined env var => no change to before this PR, uses 4 runtimes
./target/debug/neon_local start
# defining the env var enables one-runtime mode, value defines that one runtime's configuration
NEON_PAGESERVER_USE_ONE_RUNTIME=current_thread ./target/debug/neon_local start
NEON_PAGESERVER_USE_ONE_RUNTIME=multi_thread:1 ./target/debug/neon_local start
NEON_PAGESERVER_USE_ONE_RUNTIME=multi_thread:2 ./target/debug/neon_local start
NEON_PAGESERVER_USE_ONE_RUNTIME=multi_thread:default ./target/debug/neon_local start

```

I want to use this change to do more manualy testing and potentially
testing in staging.

Future Work
-----------

Testing / deployment ergonomics would be better if this were a variable
in `pageserver.toml`.
It can be done, but, I don't need it right now, so let's stick with the
env var.
2024-04-08 16:27:08 +02:00
Christian Schwarz
d8a926618e tokio-test not necessary 2024-04-08 14:26:22 +00:00
Christian Schwarz
c38b3e6ad6 Revert "HACK: restore old impl, make runtime configurable (how to: reconfigure via HTTP, then kill existing walredo procs)"
This reverts commit cca66e5e82.
2024-04-08 14:14:07 +00:00
Christian Schwarz
b72891d28c Revert "make the default process kind runtime-configurable, and switch to sync"
This reverts commit 67a7abc7cf.
2024-04-08 14:08:36 +00:00
Christian Schwarz
5efaddea02 Merge remote-tracking branch 'origin/problame/configurable-one-runtime' into problame/integrate-tokio-epoll-uring/benchmarking/2024-01-31-prs/async-walredo 2024-04-08 14:03:18 +00:00
Arpad Müller
47b705cffe Remove async_trait from CompactionDeltaLayer (#7342)
Removes usage of async_trait from the `CompactionDeltaLayer` trait.

Split off from #7301

Related earlier work: https://github.com/neondatabase/neon/pull/6305,
https://github.com/neondatabase/neon/pull/6464,
https://github.com/neondatabase/neon/pull/7303
2024-04-08 14:59:08 +02:00
Christian Schwarz
aa5439cb6e Merge remote-tracking branch 'origin/main' into problame/configurable-one-runtime 2024-04-08 12:24:43 +00:00
Christian Schwarz
2d3c9f0d43 refactor(pageserver): use tokio::signal instead of spawn_blocking (#7332)
It's just unnecessary to use spawn_blocking there, and with
https://github.com/neondatabase/neon/pull/7331 , it will result in
really just one executor thread when enabling one-runtime with
current_thread executor.
2024-04-08 09:35:32 +00:00
Joonas Koivunen
21b3e1d13b fix(utilization): return used as does df (#7337)
We can currently underflow `pageserver_resident_physical_size_global`,
so the used disk bytes would show `u63::MAX` by mistake. The assumption
of the API (and the documented behavior) was to give the layer files
disk usage.

Switch to reporting numbers that match `df` output.

Fixes: #7336
2024-04-08 09:01:38 +03:00
John Spray
0788760451 tests: further stabilize test_deletion_queue_recovery (#7335)
This is the other main failure mode called out in #6092 , that the test
can shut down the pageserver while it has "future layers" in the index,
and that this results in unexpected stats after restart.

We can avoid this nondeterminism by shutting down the endpoint, flushing
everything from SK to PS, checkpointing, and then waiting for that final
LSN to be uploaded. This is more heavyweight than most of our tests
require, but useful in the case of tests that expect a particular
behavior after restart wrt layer deletions.
2024-04-07 21:21:18 +00:00
John Spray
74b2314a5d control_plane: revise compute_hook locking (don't serialise all calls) (#7088)
## Problem

- Previously, an async mutex was held for the duration of
`ComputeHook::notify`. This served multiple purposes:
  - Ensure updates to a given tenant are sent in the proper order
- Prevent concurrent calls into neon_local endpoint updates in test
environments (neon_local is not safe to call concurrently)
- Protect the inner ComputeHook::state hashmap that is used to calculate
when to send notifications.

This worked, but had the major downside that while we're waiting for a
compute hook request to the control plane to succeed, we can't notify
about any other tenants. Notifications block progress of live
migrations, so this is a problem.

## Summary of changes

- Protect `ComputeHook::state` with a sync lock instead of an async lock
- Use a separate async lock ( `ComputeHook::neon_local_lock` ) for
preventing concurrent calls into neon_local, and only take this in the
neon_local code path.
- Add per-tenant async locks in ShardedComputeHookTenant, and use these
to ensure that only one remote notification can be sent at once per
tenant. If several shards update concurrently, their updates will be
coalesced.
- Add an explicit semaphore that limits concurrency of calls into the
cloud control plane.
2024-04-06 19:51:59 +00:00
Christian Schwarz
edcaae6290 fixup: PR #7319 defined workload.py def stop() twice (#7333)
Somehow it made it through CI.
2024-04-05 19:11:04 +00:00
Christian Schwarz
dc8e318a42 fix copy-pasta 2024-04-05 17:58:21 +00:00
Christian Schwarz
871a3caca9 change thread name 2024-04-05 17:58:03 +00:00
Christian Schwarz
edd7f69c2d make current_thread mode work
We need to have &'static Runtime, not &'static Handle, because
&'static Handle doesn't drive IO/timers on current_thread RT.
2024-04-05 17:51:04 +00:00
Christian Schwarz
70fb7e3580 metric, useful for rollout / analyzing grafana metrics 2024-04-05 17:34:11 +00:00
John Spray
4fc95d2d71 pageserver: apply shard filtering to blocks ingested during initdb (#7319)
## Problem

Ingest filtering wasn't being applied to timeline creations, so a
timeline created on a sharded tenant would use 20MB+ on each shard (each
shard got a full copy). This didn't break anything, but is inefficient
and leaves the system in a harder-to-validate state where shards
initially have some data that they will eventually drop during
compaction.

Closes: https://github.com/neondatabase/neon/issues/6649

## Summary of changes

- in `import_rel`, filter block-by-block with is_key_local
- During test_sharding_smoke, check that per-shard physical sizes are as
expected
- Also extend the test to check deletion works as expected (this was an
outstanding tech debt task)
2024-04-05 18:07:35 +01:00
John Spray
534c099b42 tests: improve stability of test_deletion_queue_recovery (#7325)
## Problem

As https://github.com/neondatabase/neon/issues/6092 points out, this
test was (ab)using a failpoint!() with 'pause', which was occasionally
causing index uploads to get hung on a stuck executor thread, resulting
in timeouts waiting for remote_consistent_lsn.

That is one of several failure modes, but by far the most frequent.

## Summary of changes

- Replace the failpoint! with a `sleep_millis_async`, which is not only
async but also supports clean shutdown.
- Improve debugging: log the consistent LSN when scheduling an index
upload
- Tidy: remove an unnecessary checkpoint in the test code, where
last_flush_lsn_upload had just been called (this does a checkpoint
internally)
2024-04-05 18:01:31 +01:00
Christian Schwarz
6b820bb423 fixup env var value parsing 2024-04-05 16:42:44 +00:00
John Spray
ec01292b55 storage controller: rename TenantState to TenantShard (#7329)
This is a widely used type that had a misleading name: it's not the
total state of a tenant, but rrepresents one shard.
2024-04-05 16:29:53 +00:00
Christian Schwarz
740efb0ab5 cleanup 2024-04-05 17:22:06 +02:00
Christian Schwarz
5cf45df692 remove env_config::Bool 2024-04-05 17:22:06 +02:00
Christian Schwarz
3779854f12 rename "single runtime" to "one runtime", allow configuring current_thread and multi_thread:$num_workers 2024-04-05 17:22:06 +02:00
Christian Schwarz
dc03f7a44f pageserver: ability to use a single runtime
This PR allows running the pageserver with a single tokio runtime.
2024-04-05 17:22:06 +02:00
Christian Schwarz
43cf9d10d2 env_config improvements 2024-04-05 17:22:06 +02:00
Christian Schwarz
31d4d1e233 env_config from PR #6125 2024-04-05 17:22:06 +02:00
John Spray
66fc465484 Clean up 'attachment service' names to storage controller (#7326)
The binary etc were renamed some time ago, but the path in the source
tree remained "attachment_service" to avoid disruption to ongoing PRs.
There aren't any big PRs out right now, so it's a good time to cut over.

- Rename `attachment_service` to `storage_controller`
- Move it to the top level for symmetry with `storage_broker` & to avoid
mixing the non-prod neon_local stuff (`control_plane/`) with the storage
controller which is a production component.
2024-04-05 16:18:00 +01:00
Conrad Ludgate
55da8eff4f proxy: report metrics based on cold start info (#7324)
## Problem

Would be nice to have a bit more info on cold start metrics.

## Summary of changes

* Change connect compute latency to include `cold_start_info`.
* Update `ColdStartInfo` to include HttpPoolHit and WarmCached.
* Several changes to make more use of interned strings
2024-04-05 16:14:50 +01:00
Arpad Müller
0fa517eb80 Update test-context dependency to 0.3 (#7303)
Updates the `test-context` dev-dependency of the `remote_storage` crate
to 0.3. This removes a lot of `async_trait` instances.

Related earlier work: #6305, #6464
2024-04-05 15:53:29 +02:00
Arthur Petukhovsky
8ceb4f0a69 Fix partial zero segment upload (#7318)
Found these logs on staging safekeepers:
```
INFO Partial backup{ttid=X/Y}: failed to upload 000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial: Failed to open file "/storage/safekeeper/data/X/Y/000000010000000000000000.partial" for wal backup: No such file or directory (os error 2)
INFO Partial backup{ttid=X/Y}:upload{name=000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial}: starting upload PartialRemoteSegment { status: InProgress, name: "000000010000000000000000_173_0000000000000000_0000000000000000_sk56.partial", commit_lsn: 0/0, flush_lsn: 0/0, term: 173 }
```

This is because partial backup tries to upload zero segment when there
is no data in timeline. This PR fixes this bug introduced in #6530.
2024-04-05 11:48:08 +01:00
John Spray
6019ccef06 tests: extend log allow list in test_storcon_cli (#7321)
This test was occasionally flaky: it already allowed the log for the
scheduler complaining about Stop state, but not the log for
maybe_reconcile complaining.
2024-04-05 11:44:15 +01:00
John Spray
0c6367a732 storage controller: fix repeated location_conf returning no shards (#7314)
## Problem

When a location_conf request was repeated with no changes, we failed to
build the list of shards in the result.

## Summary of changes

Remove conditional that only generated a list of updates if something
had really changed. This does some redundant database updates, but it is
preferable to having a whole separate code path for no-op changes.

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-04-04 17:34:05 +00:00
John Spray
e17bc6afb4 pageserver: update mgmt_api to use TenantShardId (#7313)
## Problem

The API client was written around the same time as some of the server
APIs changed from TenantId to TenantShardId

Closes: https://github.com/neondatabase/neon/issues/6154

## Summary of changes

- Refactor mgmt_api timeline_info and keyspace methods to use
TenantShardId to match the server

This doesn't make pagebench sharding aware, but it paves the way to do
so later.
2024-04-04 18:23:45 +01:00
John Spray
ac7fc6110b pageserver: handle WAL gaps on sharded tenants (#6788)
## Problem

In the test for https://github.com/neondatabase/neon/pull/6776, a test
cases uses tiny layer sizes and tiny stripe sizes. This hits a scenario
where a shard's checkpoint interval spans a region where none of the
content in the WAL is ingested by this shard. Since there is no layer to
flush, we do not advance disk_consistent_lsn, and this causes the test
to fail while waiting for LSN to advance.

## Summary of changes

- Pass an LSN through `layer_flush_start_tx`. This is the LSN to which
we have frozen at the time we ask the flush to flush layers frozen up to
this point.
- In the layer flush task, if the layers we flush do not reach
`frozen_to_lsn`, then advance disk_consistent_lsn up to this point.
- In `maybe_freeze_ephemeral_layer`, handle the case where
last_record_lsn has advanced without writing a layer file: this ensures
that disk_consistent_lsn and remote_consistent_lsn advance anyway.

The net effect is that the disk_consistent_lsn is allowed to advance
past regions in the WAL where a shard ingests no data, and that we
uphold our guarantee that remote_consistent_lsn always eventually
reaches the tip of the WAL.

The case of no layer at all is hard to test at present due to >0 shards
being polluted with SLRU writes, but I have tested it locally with a
branch that disables SLRU writes on shards >0. We can tighten up the
testing on this in future as/when we refine shard filtering (currently
shards >0 need the SLRU because they use it to figure out cutoff in GC
using timestamp-to-lsn).
2024-04-04 16:54:38 +00:00