Commit Graph

4477 Commits

Christian Schwarz
1daeba6d87 another attempt to reduce allocations, don't know if it helped; certainly didn't eliminate all of them 2024-01-30 14:10:20 +00:00
Christian Schwarz
f3e1ae6740 try (and fail) to implement borrowed deserialize of Value
(neon-_e02wX9z-py3.9) admin@ip-172-31-13-23:[~/neon-main]: cargo lcheck  --features testing
    Checking pageserver v0.1.0 (/home/admin/neon-main/pageserver)
    Building [=======================> ] 716/721: pageserver
error: implementation of `Deserialize` is not general enough
   --> pageserver/src/tenant/storage_layer/inmemory_layer.rs:179:29
    |
179 |                 let value = ValueDe::des(&reconstruct_state.scratch)?;
    |                             ^^^^^^^^^^^^ implementation of `Deserialize` is not general enough
    |
    = note: `ValueDe<'_>` must implement `Deserialize<'0>`, for any lifetime `'0`...
    = note: ...but `ValueDe<'_>` actually implements `Deserialize<'1>`, for some specific lifetime `'1`

error: implementation of `Deserialize` is not general enough
   --> pageserver/src/tenant/storage_layer/delta_layer.rs:792:23
    |
792 |             let val = ValueDe::des(&reconstruct_state.scratch).with_context(|| {
    |                       ^^^^^^^^^^^^ implementation of `Deserialize` is not general enough
    |
    = note: `ValueDe<'_>` must implement `Deserialize<'0>`, for any lifetime `'0`...
    = note: ...but `ValueDe<'_>` actually implements `Deserialize<'1>`, for some specific lifetime `'1`
2024-01-30 10:43:24 +00:00
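
For context, a minimal sketch of the lifetime problem behind this error; `ValueDe`, `des_owned`, and `des_borrowed` are hypothetical stand-ins, assuming a bincode-style helper that requires `DeserializeOwned`:

```rust
use serde::Deserialize;

// Hypothetical borrowed counterpart of `Value`: it keeps a slice into the
// serialized bytes instead of copying them.
#[derive(Deserialize)]
struct ValueDe<'a> {
    #[serde(borrow)]
    img: &'a [u8],
}

// A `des` helper in the spirit of the existing one, requiring
// `DeserializeOwned`, i.e. `for<'de> Deserialize<'de>`.
fn des_owned<T: serde::de::DeserializeOwned>(buf: &[u8]) -> bincode::Result<T> {
    bincode::deserialize(buf)
}

// `ValueDe<'_>` only implements `Deserialize<'de>` for the one lifetime tied
// to the buffer it borrows from, so this line produces the "implementation of
// `Deserialize` is not general enough" error:
//
//     let v: ValueDe = des_owned(&scratch)?;
//
// Tying the output lifetime to the input buffer compiles instead:
fn des_borrowed<'de, T: Deserialize<'de>>(buf: &'de [u8]) -> bincode::Result<T> {
    bincode::deserialize(buf)
}
```
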
Christian Schwarz
de8076d97d use smallvec & pooling to avoid allocations on reconstruction path 2024-01-30 09:37:53 +00:00
Christian Schwarz
a28cdf1c28 wal-redo: consume reconstruct state as references (needed for next patch, useful independently)
We didn't take advantage of having the owned types inside walredo.rs,
so we might as well pass them in as references and re-use their
allocations in the next commit.
2024-01-30 09:11:16 +00:00
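
A minimal sketch of the idea, with hypothetical names rather than the actual walredo API: the caller owns the buffers and passes them in as mutable references, so their heap allocations are reused across requests:

```rust
// Hypothetical scratch buffers owned by the caller, not by walredo.
struct ReconstructBuffers {
    records: Vec<u8>, // scratch space for serialized WAL records
    page: Vec<u8>,    // output page image
}

fn apply_walredo(bufs: &mut ReconstructBuffers, input: &[u8]) {
    // clear() keeps the capacity, so repeated calls don't reallocate.
    bufs.records.clear();
    bufs.records.extend_from_slice(input);
    bufs.page.clear();
    bufs.page.resize(8192, 0);
    // ... run redo into bufs.page ...
}
```
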
Christian Schwarz
3cd4f8aa59 possibly found the place where we do all those allocations, will check tomorrow 2024-01-29 20:25:02 +00:00
Christian Schwarz
c98215674c avoid Vec::new() in walredo code path; still no dramatic improvement over before_scratch.svg 2024-01-29 20:10:03 +00:00
Christian Schwarz
0e3561f6d1 WIP: try to eliminate the raw_vec::finish_grow and bytes::promotable_even-drop
This one doesn't make a big difference.
2024-01-29 19:52:05 +00:00
Christian Schwarz
28a4247c97 rip out slot pinning, has about 5% speedup 2024-01-29 19:37:35 +00:00
Christian Schwarz
70bc01494c Revert "broken impl of a permit pool to shave off its allocations"
This reverts commit a1af2c7150.
2024-01-29 19:23:09 +00:00
Christian Schwarz
a1af2c7150 broken impl of a permit pool to shave off its allocations 2024-01-29 19:22:55 +00:00
Christian Schwarz
043ed5edea for posterity: RSS is about 18GB with previous bench at env.pageserver_config_override='page_cache_size=2097152;max_file_descriptors=500000;virtual_file_io_engine="tokio-epoll-uring"' 2024-01-29 18:35:07 +00:00
Christian Schwarz
6753ff089c results: req_lru_size=2 gives tokio-epoll-uring 16k GetPage/s @ 110k IOPS; std-fs: 9.5k GetPage/s @ 65k IOPS
RUST_BACKTRACE=1 ./target/release/pagebench get-page-latest-lsn --mgmt-api-endpoint http://localhost:15011 --page-service-connstring=postgresql://localhost:15010  --keyspace-cache keyspace.cache   --limit-to-first-n-targets 1000 --set-io-engine tokio-epoll-uring --set-req-lru-size 2 --runtime 2m

Biggest gain is from increasing lru_size from 0 to 1, yay.
Adding one more gives another 1-2k GetPage/s.

cgroup mem.high unlimited
made sure global page cache is large enough to not have any misses

Make sure to warm up; it takes a while. Still don't know why warmup is
needed that badly.

std-fs: 50% CPU, a lot of iowait
2024-01-29T18:25:52.923572Z  INFO all clients stopped
{
  "total": {
    "request_count": 1194213,
    "latency_mean": "68ms 343us",
    "latency_percentiles": {
      "p95": "152ms 63us",
      "p99": "201ms 215us",
      "p99.9": "260ms 991us",
      "p99.99": "314ms 623us"
    }
  }
}

tokio-epoll-uring: 100% CPU utilization
Disk isn't saturated.
We're CPU bound here.

{
  "total": {
    "request_count": 1927700,
    "latency_mean": "43ms 11us",
    "latency_percentiles": {
      "p95": "83ms 263us",
      "p99": "101ms 887us",
      "p99.9": "124ms 991us",
      "p99.99": "147ms 583us"
    }
  }
}
2024-01-29 18:33:04 +00:00
Christian Schwarz
49a5e411d6 implement request-scoped LRU cache 2024-01-29 18:22:00 +00:00
Christian Schwarz
21a11822e8 results: tokio-epoll-uring 3.3kGetPage/s@240k IOPS, std-fs: 1.2kGetPage/s@80k IOPS
We have immense read amplification; I think we read the same block
multiple times during one getpage request.

Before the switch to O_DIRECT, we'd go to the kernel page cache
many times. std-fs has an edge there, it's more efficient than
tokio-epoll-uring for workloads that have a high kernel page cache hit
rate.

With O_DIRECT, we now go to the disk for each read, making the inefficiency apparent.
tokio-epoll-uring is much better there: as we can see, it can drive up to
240k IOPS, which is ~2 GiB/s of random 8k reads, which afaik is the max that
the EC2 NVMe allows.
CPU isn't near 100%.
So, we're IO bound.

Idea to try out to reduce the read amplification: request-local page cache.
2024-01-29 16:21:22 +00:00
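
A rough sketch of the request-local page cache idea mentioned above (hypothetical types, not the eventual implementation): a tiny LRU keyed by (layer file, block number) that lives only for the duration of one getpage request, so re-reads of the same block hit memory instead of the disk:

```rust
use std::collections::VecDeque;

const PAGE_SZ: usize = 8192;

/// Hypothetical per-request cache; `capacity` plays the role of the
/// `req_lru_size` knob mentioned in the later benchmark commits.
struct RequestPageCache {
    capacity: usize,
    // (file id, block number) -> page bytes; most recently used at the back.
    entries: VecDeque<((u64, u32), [u8; PAGE_SZ])>,
}

impl RequestPageCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::with_capacity(capacity) }
    }

    fn get(&mut self, key: (u64, u32)) -> Option<&[u8; PAGE_SZ]> {
        let pos = self.entries.iter().position(|(k, _)| *k == key)?;
        // Move the hit to the back so it is evicted last.
        let hit = self.entries.remove(pos).unwrap();
        self.entries.push_back(hit);
        self.entries.back().map(|(_, page)| page)
    }

    fn put(&mut self, key: (u64, u32), page: [u8; PAGE_SZ]) {
        if self.entries.len() == self.capacity {
            self.entries.pop_front(); // evict least recently used
        }
        self.entries.push_back((key, page));
    }
}
```
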
Christian Schwarz
aca2d7bdea use O_DIRECT for VirtualFile reads 2024-01-29 16:21:14 +00:00
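
For reference, a minimal sketch of what an O_DIRECT read looks like on Linux (not the actual VirtualFile code); O_DIRECT requires the buffer, offset, and length to be aligned, typically to the logical block size:

```rust
use std::fs::OpenOptions;
use std::os::unix::fs::{FileExt, OpenOptionsExt};

fn main() -> std::io::Result<()> {
    // Path is a placeholder for illustration only.
    let file = OpenOptions::new()
        .read(true)
        .custom_flags(libc::O_DIRECT)
        .open("/tmp/some-layer-file")?;

    // O_DIRECT needs an aligned buffer; 4096 covers common logical block sizes.
    let layout = std::alloc::Layout::from_size_align(8192, 4096).unwrap();
    let buf = unsafe { std::alloc::alloc_zeroed(layout) };
    assert!(!buf.is_null());
    let slice = unsafe { std::slice::from_raw_parts_mut(buf, 8192) };

    // The offset must also be a multiple of the block size.
    file.read_exact_at(slice, 0)?;

    unsafe { std::alloc::dealloc(buf, layout) };
    Ok(())
}
```
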
Christian Schwarz
db44395ee2 rip out materialized page cache 2024-01-29 14:45:16 +00:00
Christian Schwarz
03874009ec add back page cache but not for DeltaLayerValue and ImageLayerValue 2024-01-29 14:44:55 +00:00
Christian Schwarz
0033b4c985 results: both tokio-epoll-uring and std-fs achieve about 4k GetPage/sec @ 60k IOPS 2024-01-29 13:53:13 +00:00
Christian Schwarz
a608667301 rip out page cache 2024-01-29 13:53:07 +00:00
Christian Schwarz
b9b7670a3a hack: use a single runtime in pageserver
doesn't seem to make a meaningful perf difference under
get-page-latest-lsn load
2024-01-29 12:23:45 +00:00
Christian Schwarz
62a3d87098 results under higher memory pressure show that tokio-epoll-uring pays off
setup:

admin@ip-172-31-13-23:[~/neon-main]: sudo mkdir /sys/fs/cgroup/benchmark
admin@ip-172-31-13-23:[~/neon-main]: sudo chown admin:admin /sys/fs/cgroup/benchmark
admin@ip-172-31-13-23:[~/neon-main]: sudo chown admin:admin /sys/fs/cgroup/benchmark/cgroup.procs
admin@ip-172-31-13-23:[~/neon-main]: echo THE_PID_OF_THE_SHELL_WHERE_WE_LAUNCH_PAGESERVER > /sys/fs/cgroup/benchmark/cgroup.procs

From another shell that's not in the cgroup, run pagebench:

admin@ip-172-31-13-23:[~/neon-main]: RUST_BACKTRACE=1 ./target/release/pagebench get-page-latest-lsn --mgmt-api-endpoint http://localhost:15011 --page-service-connstring=postgresql://localhost:15010  --keyspace-cache keyspace.cache  --per-target-rate-limit 2000 --limit-to-first-n-targets 500  --set-io-engine YOUR_IO_ENGINE --runtime 10s

tokio-epoll-uring:

{
  "total": {
    "request_count": 63780,
    "latency_mean": "77ms 993us",
    "latency_percentiles": {
      "p95": "120ms 703us",
      "p99": "143ms 743us",
      "p99.9": "171ms 775us",
      "p99.99": "195ms 583us"
    }
  }
}

Does ca 85-90k IOPS to the NVMe.

std-fs

{
  "total": {
    "request_count": 49303,
    "latency_mean": "100ms 669us",
    "latency_percentiles": {
      "p95": "214ms 399us",
      "p99": "268ms 799us",
      "p99.9": "335ms 359us",
      "p99.99": "399ms 615us"
    }
  }
}

Does ca 70k IOPS to the NVMe.

with higher memory pressure
2024-01-29 10:50:52 +00:00
Christian Schwarz
8d6ce71b29 hacky: ability to set io_engine via mgmt_api => pagebench 2024-01-29 10:50:20 +00:00
Christian Schwarz
d23ea718ee 2min, 3 tenants, 2000 req/s each; that is a 0 IOPS workload (all in PS/kernel page cache)
Very comparable.
tokio-epoll-uring
{
  "total": {
    "request_count": 719999,
    "latency_mean": "375us",
    "latency_percentiles": {
      "p95": "576us",
      "p99": "649us",
      "p99.9": "823us",
      "p99.99": "1ms 636us"
    }
  }
}

std-fs
{
  "total": {
    "request_count": 719997,
    "latency_mean": "341us",
    "latency_percentiles": {
      "p95": "543us",
      "p99": "618us",
      "p99.9": "748us",
      "p99.99": "1ms 358us"
    }
  }
}
2024-01-27 12:59:15 +00:00
Christian Schwarz
73a7ca38b3 same config, but with a rate limit of 2/sec per tenant => bursty due to ticker behavior
RUST_BACKTRACE=1 ./target/release/pagebench get-page-latest-lsn --mgmt-api-endpoint http://localhost:15011 --page-service-connstring=postgresql://localhost:15010  --keyspace-cache keyspace.cache  --per-target-rate-limit 2 --runtime 2m

std-fs

{
  "total": {
    "request_count": 240001,
    "latency_mean": "73ms 562us",
    "latency_percentiles": {
      "p95": "101ms 311us",
      "p99": "106ms 431us",
      "p99.9": "115ms 455us",
      "p99.99": "129ms 407us"
    }
  }
}

tokio-epoll-uring

{
  "total": {
    "request_count": 240000,
    "latency_mean": "84ms 517us",
    "latency_percentiles": {
      "p95": "116ms 671us",
      "p99": "125ms 759us",
      "p99.9": "138ms 239us",
      "p99.99": "148ms 223us"
    }
  }
}
2024-01-27 12:51:11 +00:00
Christian Schwarz
7eb1d4cfa6 manual 2min test run including warmup
RUST_BACKTRACE=1 ./target/release/pagebench get-page-latest-lsn --mgmt-api-endpoint http://localhost:15011 --page-service-connstring=postgresql://localhost:15010  --keyspace-cache keyspace.cache --runtime 2m

2min std-fs
{
  "total": {
    "request_count": 1213184,
    "latency_mean": "67ms 793us",
    "latency_percentiles": {
      "p95": "153ms 471us",
      "p99": "197ms 247us",
      "p99.9": "246ms 399us",
      "p99.99": "288ms 255us"
    }
  }
}

2min tokio-epoll-uring

{
  "total": {
    "request_count": 825637,
    "latency_mean": "108ms 702us",
    "latency_percentiles": {
      "p95": "136ms 959us",
      "p99": "191ms 615us",
      "p99.9": "9s 977ms 855us",
      "p99.99": "16s 334ms 847us"
    }
  }
}
2024-01-27 12:48:32 +00:00
Christian Schwarz
6ebd683327 TODO/workaround: walredo quiescing broken with compaction_period=0 2024-01-27 12:48:27 +00:00
Christian Schwarz
b1ecdfe099 WIP: async walredo 2024-01-27 12:47:29 +00:00
Christian Schwarz
82a74d0e77 pagebench: fix percentiles reporting 2024-01-27 12:46:36 +00:00
Christian Schwarz
49b43c75e2 run test_pageserver_max_throughput_getpage_at_latest_lsn with 1k tenants, compare std-fs with tokio-epoll-uring 2024-01-26 16:49:12 +00:00
Vlad Lazar
5b34d5f561 pageserver: add vectored get latency histogram (#6461)
This patch introduces a new Grafana histogram metric:
pageserver_get_vectored_seconds_bucket{task_kind="Compaction|PageRequestHandler"}.

While it has a `task_kind` label, only compaction and SLRU fetches are
tracked. This reduces the increase in cardinality to 24.

The metric should allow us to isolate performance regressions while the
vectorized get is being implemented. Once the implementation is
complete, it'll also allow us to quantify the improvements.
2024-01-26 13:40:03 +00:00
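
Roughly how such a labeled histogram is declared with the `prometheus` crate; this is a sketch, not the exact pageserver code, and the bucket values are illustrative:

```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram_vec, HistogramVec};

// One histogram, one label; only a couple of label values are ever used,
// which keeps the cardinality increase small.
static GET_VECTORED_SECONDS: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "pageserver_get_vectored_seconds",
        "Time spent in the vectored get implementation",
        &["task_kind"],
        vec![0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0] // illustrative buckets
    )
    .unwrap()
});

fn observe_example() {
    let timer = GET_VECTORED_SECONDS
        .with_label_values(&["Compaction"])
        .start_timer();
    // ... do the vectored get ...
    timer.observe_duration();
}
```
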
Alexander Bayandin
26c55b0255 Compute: fix rdkit extension build (#6488)
## Problem

`rdkit` extension build started to fail because of the changed checksum
of the Comic Neue font:

```
Downloading https://fonts.google.com/download?family=Comic%20Neue...
CMake Error at Code/cmake/Modules/RDKitUtils.cmake:257 (MESSAGE):
  The md5 checksum for /rdkit-src/Code/GraphMol/MolDraw2D/Comic_Neue.zip is
  incorrect; expected: 850b0df852f1cda4970887b540f8f333, found:
  b7fd0df73ad4637504432d72a0accb8f
```

https://github.com/neondatabase/neon/actions/runs/7666530536/job/20895534826

Ref https://neondb.slack.com/archives/C059ZC138NR/p1706265392422469

## Summary of changes
- Disable comic fonts for `rdkit` extension
2024-01-26 12:39:20 +00:00
Vadim Kharitonov
12e9b2a909 Update plv8 (#6465) 2024-01-26 09:56:11 +00:00
Christian Schwarz
918b03b3b0 integrate tokio-epoll-uring as alternative VirtualFile IO engine (#5824) 2024-01-26 09:25:07 +01:00
Alexander Bayandin
d36623ad74 CI: cancel old e2e-tests on new commits (#6463)
## Problem

The triggered `e2e-tests` job is not cancelled along with the other jobs in a PR
if the PR gets new commits. We can improve the situation by setting
`concurrency_group` for the remote workflow
(https://github.com/neondatabase/cloud/pull/9622 adds
`concurrency_group` group input to the remote workflow).

Ref https://neondb.slack.com/archives/C059ZC138NR/p1706087124297569

Cloud's part added in https://github.com/neondatabase/cloud/pull/9622

## Summary of changes
- Set `concurrency_group` parameter when triggering `e2e-tests`
- At the beginning of a CI pipeline, trigger Cloud's
`cancel-previous-in-concurrency-group.yml` workflow which cancels
previously triggered e2e-tests
2024-01-25 19:25:29 +00:00
Christian Schwarz
689ad72e92 fix(neon_local): leaks child process if it fails to start & pass checks (#6474)
refs https://github.com/neondatabase/neon/issues/6473

Before this PR, if process_started() didn't return Ok(true) until we
ran out of retries, we'd return an error but leave the process running.

Try it by adding a 20s sleep to the pageserver `main()`, e.g., right
before we claim the pidfile.

Without this PR, output looks like so:

```
(.venv) cs@devvm-mbp:[~/src/neon-work-2]: ./target/debug/neon_local start
Starting neon broker at 127.0.0.1:50051.
storage_broker started, pid: 2710939
.
attachment_service started, pid: 2710949
Starting pageserver node 1 at '127.0.0.1:64000' in ".neon/pageserver_1".....
pageserver has not started yet, continuing to wait.....
pageserver 1 start failed: pageserver did not start in 10 seconds
No process is holding the pidfile. The process must have already exited. Leave in place to avoid race conditions: ".neon/pageserver_1/pageserver.pid"
No process is holding the pidfile. The process must have already exited. Leave in place to avoid race conditions: ".neon/safekeepers/sk1/safekeeper.pid"
Stopping storage_broker with pid 2710939 immediately.......
storage_broker has not stopped yet, continuing to wait.....
neon broker stop failed: storage_broker with pid 2710939 did not stop in 10 seconds
Stopping attachment_service with pid 2710949 immediately.......
attachment_service has not stopped yet, continuing to wait.....
attachment service stop failed: attachment_service with pid 2710949 did not stop in 10 seconds
```

and we leak the pageserver process

```
(.venv) cs@devvm-mbp:[~/src/neon-work-2]: ps aux | grep pageserver
cs       2710959  0.0  0.2 2377960 47616 pts/4   Sl   14:36   0:00 /home/cs/src/neon-work-2/target/debug/pageserver -D .neon/pageserver_1 -c id=1 -c pg_distrib_dir='/home/cs/src/neon-work-2/pg_install' -c http_auth_type='Trust' -c pg_auth_type='Trust' -c listen_http_addr='127.0.0.1:9898' -c listen_pg_addr='127.0.0.1:64000' -c broker_endpoint='http://127.0.0.1:50051/' -c control_plane_api='http://127.0.0.1:1234/' -c remote_storage={local_path='../local_fs_remote_storage/pageserver'}
```

After this PR, there is no leaked process.
2024-01-25 19:20:02 +01:00
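
A minimal sketch of the fixed behavior, with hypothetical helper names rather than the actual neon_local code: if the readiness checks don't pass before the retries run out, kill and reap the child instead of returning an error with the process still running:

```rust
use std::process::{Child, Command};
use std::thread::sleep;
use std::time::Duration;

fn start_and_check(
    mut cmd: Command,
    retries: u32,
    ready: impl Fn() -> bool,
) -> anyhow::Result<Child> {
    let mut child = cmd.spawn()?;
    for _ in 0..retries {
        if ready() {
            return Ok(child);
        }
        sleep(Duration::from_secs(1));
    }
    // Before the fix, the child was left running here; now we kill and reap it.
    child.kill().ok();
    child.wait().ok();
    anyhow::bail!("process did not pass readiness checks in time")
}
```
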
Christian Schwarz
fd4cce9417 test_pageserver_max_throughput_getpage_at_latest_lsn: remove n_tenants=100 combination (#6477)
Need to fix the neon_local timeouts first
(https://github.com/neondatabase/neon/issues/6473)
and also not run them on every merge, but only nightly:
https://github.com/neondatabase/neon/issues/6476
2024-01-25 18:17:53 +00:00
Arpad Müller
d52b81340f S3 based recovery (#6155)
Adds a new `time_travel_recover` function to the `RemoteStorage` trait
that allows time-travel-like functionality for S3 buckets, regardless of
their content (it is not even pageserver related). It takes a different
approach from [this
post](https://aws.amazon.com/blogs/storage/point-in-time-restore-for-amazon-s3-buckets/),
which is more complicated.

It takes as input a prefix, a target timestamp, and a limit timestamp:

* executes [`ListObjectVersions`](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectVersions.html)
* obtains the latest version that comes before the target timestamp
* copies that latest version to the same prefix
* if there are versions newer than the limit timestamp, it doesn't do
anything for the file

The limit timestamp is meant to be some timestamp before the start of
the recovery operation and after any changes that one wants to revert.
For example, it might be the time point after a tenant was detached from
all involved pageservers. The limiting mechanism ensures that the
operation is idempotent and can be retried without causing additional
writes/copies.

The approach fulfills all the requirements laid out in 8233, and is a
recoverable operation. Nothing is deleted permanently; only new entries are
added to the version log.

I also enable [nextest retries](https://nexte.st/book/retries.html) to
help with some general S3 flakiness (on top of low level retries).

Part of https://github.com/neondatabase/cloud/issues/8233
2024-01-25 18:23:18 +01:00
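
The per-key selection logic described above, sketched over hypothetical version-listing types (not the actual `RemoteStorage` code): skip the key if any version is newer than the limit timestamp, otherwise restore the newest version at or before the target timestamp:

```rust
use std::time::SystemTime;

// Hypothetical, simplified view of one entry from ListObjectVersions.
struct ObjectVersion {
    key: String,
    version_id: String,
    last_modified: SystemTime,
}

/// Decide what to do for one key, given all of its versions.
/// Returns the version to copy back on top of the key, or None to leave it alone.
fn pick_version_to_restore<'a>(
    versions: &'a [ObjectVersion], // all versions of a single key
    target: SystemTime,            // point in time to travel back to
    limit: SystemTime,             // newer than this means "already touched, skip"
) -> Option<&'a ObjectVersion> {
    // Idempotency: if someone (e.g. a previous run of this recovery) wrote the
    // key after the limit timestamp, do nothing for this key.
    if versions.iter().any(|v| v.last_modified > limit) {
        return None;
    }
    // Otherwise restore the newest version that is not newer than the target.
    versions
        .iter()
        .filter(|v| v.last_modified <= target)
        .max_by_key(|v| v.last_modified)
}
```
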
Joonas Koivunen
8dee9908f8 fix(compaction_task): wrong log levels (#6442)
Filter what we log in the compaction task. Per discussion in the last triage
call, these are fixed by introducing and inspecting the root cause within
anyhow::Error instead of rolling out proper conversions.

Fixes: #6365
Fixes: #6367
2024-01-25 18:45:17 +02:00
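
The pattern, sketched with an illustrative error type (the `std::io::Error` case here is just an example): downcast the `anyhow::Error`'s root cause and pick the log level from it, instead of converting every call site to typed errors:

```rust
fn log_compaction_error(err: &anyhow::Error) {
    // root_cause() walks the source() chain down to the innermost error.
    let root = err.root_cause();
    if let Some(io) = root.downcast_ref::<std::io::Error>() {
        if io.kind() == std::io::ErrorKind::NotFound {
            // e.g. a layer file went away because the tenant is shutting down
            tracing::info!("compaction skipped: {:#}", err);
            return;
        }
    }
    tracing::error!("compaction failed: {:#}", err);
}
```
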
Konstantin Knizhnik
19ed230708 Add support for PS sharding in compute (#6205)
refer #5508

replaces #5837

## Problem

This PR implements sharding support on the compute side. Relations are
split into stripes, and `get_page` requests are redirected to the
particular shard where the stripe is located. All other requests (e.g. get
relation or database size) are always sent to shard 0.

## Summary of changes

Support for sharding on the compute side includes three things:
1. Make it possible to specify, and change at runtime, connections to more
than one pageserver
2. Send `get_page` requests to the particular shard (determined by the hash
of the page key)
3. Support multiple servers in prefetch ring requests

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
Co-authored-by: John Spray <john@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2024-01-25 15:53:31 +02:00
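
Illustrative arithmetic for the stripe-to-shard mapping; the constants and the hash are stand-ins, not the real sharding parameters or key hash:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-ins for the real sharding parameters.
const STRIPE_SIZE_BLOCKS: u32 = 32768; // blocks per stripe
const SHARD_COUNT: u64 = 4;

/// Which shard a get_page request for `block_number` of a relation goes to:
/// consecutive stripes of a relation are spread over shards, and the shard is
/// picked from a hash of the page key (DefaultHasher is a stand-in here).
fn shard_for_page(rel_id: u32, block_number: u32) -> u64 {
    let stripe_index = block_number / STRIPE_SIZE_BLOCKS;
    let mut h = DefaultHasher::new();
    (rel_id, stripe_index).hash(&mut h);
    h.finish() % SHARD_COUNT
}

fn main() {
    // get_page requests are routed like this; everything else (relation size,
    // database size, ...) always goes to shard 0.
    println!("shard = {}", shard_for_page(1234, 100_000));
}
```
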
Joonas Koivunen
463b6a26b5 test: show relative order eviction with "fast growing tenant" (#6377)
Refactor out test_disk_usage_eviction tenant creation and add a custom
case with 4 tenants, 3 made with pgbench scale=1 and 1 made with pgbench
scale=4.

Because the tenants are created in order of scales [1, 1, 1, 4], this is
simple enough to demonstrate the problem with using absolute access
times: on a disk-usage-based eviction run we will
disproportionately target the *first* scale=1 tenant(s), and the later,
larger tenant does not lose anything.

This test is not enough to show the difference between `relative_equal`
and `relative_spare` (the fudge factor); a much larger scale would be
needed for "the large tenant", but that would make debug-mode tests
slower.

Cc: #5304
2024-01-25 15:38:28 +02:00
John Spray
c9b1657e4c pageserver: fixes for creation operations overlapping with shutdown/startup (#6436)
## Problem

For #6423, creating a reproducer turned out to be very easy, as an
extension to test_ondemand_activation.

However, before I had diagnosed the issue, I was starting with a more
brute force approach of running creation API calls in the background
while restarting a pageserver, and that shows up a bunch of other
interesting issues.

In this PR:
- Add the reproducer for #6423 by extending `test_ondemand_activation`
(confirmed that this test fails if I revert the fix from
https://github.com/neondatabase/neon/pull/6430)
- In timeline creation, return 503 responses when we get an error and
the tenant's cancellation token is set: this covers the cases where we
get an anyhow::Error from something during timeline creation as a result
of shutdown.
- While waiting for tenants to become active during creation, don't
.map_err() the result to a 500: instead let the `From` impl map the
result to something appropriate (this includes mapping shutdown to 503)
- During tenant creation, we were calling `Tenant::load_local` because
no Preload object was provided. This is usually harmless because the
tenant dir is empty, but if there are some half-created timelines in
there, bad things can happen. Propagate the SpawnMode into
Tenant::attach, so that it can properly skip _any_ attempt to load
timelines if creating.
- When we call upsert_location, there's a SpawnMode that tells us
whether to load from remote storage or not. But if the operation is a
retry and we already have the tenant, it is not correct to skip loading
from remote storage: there might be a timeline there. This isn't
strictly a correctness issue as long as the caller behaves correctly
(does not assume that any timelines are persistent until the creation is
acked), but it's a more defensive position.
- If we shut down while the task in Tenant::attach is running, it can
end up spawning rogue tasks. Fix this by holding a GateGuard through
here, and in upsert_location shutting down a tenant after calling
tenant_spawn if we can't insert it into tenants_map. This fixes the
expected behavior that after shutdown_all_tenants returns, no tenant
tasks are running.
- Add `test_create_churn_during_restart`, which runs tenant & timeline
creations across pageserver restarts.
- Update a couple of tests that covered cancellation, to reflect the
cleaner errors we now return.
2024-01-25 12:35:52 +00:00
Arpad Müller
b92be77e19 Make RemoteStorage not use async_trait (#6464)
Makes the `RemoteStorage` trait no longer based on `async_trait`.

To avoid recursion in async (not supported by Rust), we made
`GenericRemoteStorage` generic over the "Unreliable" variant. That allows
the unreliable wrapper to never contain/call itself.

related earlier work: #6305
2024-01-24 21:27:54 +01:00
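
A sketch of the shape of that change, with simplified names: making the enum generic over the unreliable variant means the wrapper wraps a concrete storage, so the type can never contain itself:

```rust
// Simplified sketch, not the real definitions.
struct S3Storage;
struct LocalFsStorage;

/// A test-only wrapper that injects failures around some inner storage.
struct UnreliableWrapper<S> {
    inner: S,
    fail_every: u64,
}

/// Generic over the unreliable variant: `U` is instantiated with a wrapper
/// around one of the *concrete* storages, never with `GenericRemoteStorage`
/// itself, so there is no type-level recursion.
enum GenericRemoteStorage<U> {
    AwsS3(S3Storage),
    LocalFs(LocalFsStorage),
    Unreliable(U),
}

type StorageForTests = GenericRemoteStorage<UnreliableWrapper<S3Storage>>;
```
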
Arthur Petukhovsky
8cb8c8d7b5 Allow remove_wal.rs to run on inactive timelines (#6462)
Temporarily enable it on staging to help with
https://github.com/neondatabase/neon/issues/6403
It can also be deployed to prod if it works well on staging.
2024-01-24 16:48:56 +00:00
Conrad Ludgate
210700d0d9 proxy: add newtype wrappers for string based IDs (#6445)
## Problem

Too many string-based IDs; it's easy to mix up ID types.

## Summary of changes

Add a bunch of `SmolStr` wrappers that provide convenience methods but
are type-safe.
2024-01-24 16:38:10 +00:00
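
The general shape of such a newtype, as a sketch (the real proxy code defines its own set of IDs and helper methods):

```rust
use smol_str::SmolStr;

/// Cheap-to-clone, type-safe wrapper: an EndpointId can no longer be passed
/// where, say, a ProjectId is expected, even though both are just strings.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct EndpointId(SmolStr);

impl From<&str> for EndpointId {
    fn from(s: &str) -> Self {
        EndpointId(SmolStr::from(s))
    }
}

impl std::fmt::Display for EndpointId {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(&self.0)
    }
}
```
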
Joonas Koivunen
a0a3ba85e7 fix(page_service): walredo logging problem (#6460)
Fixes #6459 by formatting the full causes of an error into the log, while keeping
the top-level string for the end user.

Changes user visible error detail from:

```
-DETAIL:  page server returned error: Read error: Failed to reconstruct a page image:
+DETAIL:  page server returned error: Read error
```

However on pageserver logs:

```
-ERROR page_service_conn_main{...}: error reading relation or page version: Read error: Failed to reconstruct a page image:
+ERROR page_service_conn_main{...}: error reading relation or page version: Read error: reconstruct a page image: launch walredo process: spawn process: Permission denied (os error 13)
```
2024-01-24 15:47:17 +00:00
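
The difference comes down to which `Display` form of the `anyhow::Error` ends up where; a minimal illustration with a made-up error chain:

```rust
use anyhow::Context;

fn reconstruct() -> anyhow::Result<()> {
    Err(std::io::Error::from(std::io::ErrorKind::PermissionDenied))
        .context("spawn process")
        .context("launch walredo process")
        .context("reconstruct a page image")
}

fn main() {
    let err = reconstruct().unwrap_err();
    // Sent to the end user: only the top-level message.
    println!("{err}"); // reconstruct a page image
    // Written to the pageserver log: the full cause chain, joined with ": ".
    println!("{err:#}"); // reconstruct a page image: launch walredo process: spawn process: permission denied
}
```
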
Arpad Müller
d820aa1d08 Disable initdb cancellation (#6451)
## Problem

The initdb cancellation added in #5921 is not sufficient to reliably
abort the entire initdb process. Initdb also spawns children. The tests
added by #6310 (#6385) and #6436 now do initdb cancellations on a more
regular basis.

In #6385, I attempted to issue `killpg` (after giving it a new process
group ID) to kill not just the initdb but all its spawned subprocesses,
but this didn't work. Initdb doesn't take *that* long in the end either,
so we just wait until it concludes.

## Summary of changes

* revert initdb cancellation support added in #5921
* still return `Err(Cancelled)` upon cancellation, but this is just to
not have to remove the cancellation infrastructure
* fixes to the `test_tenant_delete_races_timeline_creation` test to make
it reliably pass

Fixes #6385
2024-01-24 13:06:05 +01:00
Christian Schwarz
996abc9563 pagebench-based GetPage@LSN performance test (#6214) 2024-01-24 12:51:53 +01:00
John Spray
a72af29d12 control_plane/attachment_service: implement PlacementPolicy::Detached (#6458)
## Problem

The API for detaching things wasn't implement yet, but one could hit
this case indirectly from tests when using attach-hook, and find tenants
unexpectedly attached again because their policy remained Single.

## Summary of changes

Add PlacementPolicy::Detached, and:
- add the behavior for it in schedule()
- in tenant_migrate, refuse if the policy is detached
- automatically set this policy in attach-hook if the caller has
specified pageserver=null.
2024-01-24 12:49:30 +01:00
Sasha Krassovsky
4f51824820 Fix creating publications for all tables 2024-01-23 22:41:00 -08:00
Christian Schwarz
743f6dfb9b fix(attachment_service): corrupted attachments.json when parallel requests (#6450)
The pagebench integration PR (#6214) issues attachment requests in
parallel.
We observed corrupted attachments.json from time to time, especially in
the test cases with high tenant counts.

The atomic overwrite added in #6444 exposed the root cause cleanly:
the `.commit()` calls of two request handlers could interleave or
be reordered.
See also:
https://github.com/neondatabase/neon/pull/6444#issuecomment-1906392259

This PR makes changes to the `persistence` module to fix the above race:
- mpsc queue for PendingWrites
- one writer task performs the writes in mpsc queue order
- request handlers that need to do writes do so using the
  new `mutating_transaction` function.

`mutating_transaction`, while holding the lock, does the modifications,
serializes the post-modification state, and pushes that as a
`PendingWrite` into the mpsc queue.
It then releases the lock and `await`s the completion of the write.
The writer task executes the `PendingWrite`s in queue order.
Once the write has been executed, it wakes the writing tokio task.
2024-01-23 19:14:32 +00:00
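
A rough sketch of that structure with hypothetical types (the real module persists attachments.json and does an atomic overwrite): request handlers mutate and serialize the state under a lock, enqueue the write, and await a completion signal from the single writer task:

```rust
use tokio::sync::{mpsc, oneshot, Mutex};

struct PendingWrite {
    serialized_state: Vec<u8>,
    done: oneshot::Sender<()>,
}

#[derive(serde::Serialize)]
struct State { /* ... attachments ... */ }

struct Persistence {
    state: Mutex<State>,
    writes: mpsc::UnboundedSender<PendingWrite>,
}

impl Persistence {
    /// Called by request handlers that need to modify + persist the state.
    async fn mutating_transaction(&self, modify: impl FnOnce(&mut State)) -> anyhow::Result<()> {
        let (done_tx, done_rx) = oneshot::channel();
        {
            // Mutate and serialize while holding the lock, so writes are
            // enqueued in the same order the modifications happened.
            let mut state = self.state.lock().await;
            modify(&mut *state);
            let serialized_state = serde_json::to_vec(&*state)?;
            self.writes
                .send(PendingWrite { serialized_state, done: done_tx })
                .map_err(|_| anyhow::anyhow!("persistence writer task exited"))?;
        }
        // Lock released; wait for the writer task to complete our write.
        done_rx.await?;
        Ok(())
    }
}

/// The single writer task: executes writes strictly in queue order.
async fn writer_task(mut rx: mpsc::UnboundedReceiver<PendingWrite>) {
    while let Some(write) = rx.recv().await {
        // e.g. atomic tempfile + rename of attachments.json
        tokio::fs::write("attachments.json", &write.serialized_state).await.ok();
        let _ = write.done.send(());
    }
}
```
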