Commit Graph

7054 Commits

Author SHA1 Message Date
Christian Schwarz
82d20b52b3 make noise when dropping an IoConcurrency with unfinished requests 2025-01-20 19:12:00 +01:00
Christian Schwarz
3b1328423e basebackup: fetch all SLRUs of one basebackup using the same IoConcurrency 2025-01-20 16:58:14 +01:00
Christian Schwarz
2eb235e923 doc string explaining why we're deadlock free right now and why it's so brittle 2025-01-17 18:33:34 +01:00
Christian Schwarz
40ab9c2c5e we can avoid adding the Arc<Mutex<>> around EphemeralLayer if we instead extend the lifetime of the InMemoryLayer for the spawned IO future; plus it's semantically more similar to what we now do for Delta and Image layers 2025-01-17 18:16:17 +01:00
Christian Schwarz
c43400389f delta & image layer spawned IOs: keep layer resident until IO is done 2025-01-17 18:00:13 +01:00
Christian Schwarz
65932512c1 run tests with futures-unordered 2025-01-16 20:03:01 +01:00
Christian Schwarz
1866f261e0 make mypy pass 2025-01-16 20:01:42 +01:00
Christian Schwarz
7c662b771a Merge branch 'problame/hung-shutdown/fix' into vlad/read-path-concurrent-io 2025-01-16 19:22:38 +01:00
Christian Schwarz
8f40bd4eb3 there is no Error Fe message -,- 2025-01-16 19:21:44 +01:00
Christian Schwarz
d2f8342080 Merge branch 'problame/hung-shutdown/fix' into vlad/read-path-concurrent-io 2025-01-16 18:16:36 +01:00
Christian Schwarz
92e4dd7ffa script: template NEON_REPO_DIR 2025-01-16 18:14:34 +01:00
Christian Schwarz
0c3ab9c494 move test message tag to 99 and represent Fe message tag as enum, like we do for Be message 2025-01-16 18:07:56 +01:00
Christian Schwarz
c19a16792a address nit ; https://github.com/neondatabase/neon/pull/10386#discussion_r1918782034 2025-01-16 17:54:14 +01:00
Christian Schwarz
cf75eb7d86 Revert "hacky experiment: what if we had more walredo procs => doesn't move the needle on throughput"
This reverts commit 9fffe6e60d.
2025-01-16 16:46:49 +01:00
Christian Schwarz
6ededa17e2 Revert "experiment: buffered socket with 128k buffer size; not super needle-moving"
This reverts commit 7e13e5fc4a.
2025-01-16 16:42:10 +01:00
Christian Schwarz
7e13e5fc4a experiment: buffered socket with 128k buffer size; not super needle-moving 2025-01-16 16:42:01 +01:00
Christian Schwarz
45358bcb65 in the deepl_layers_with_delta script, make the stack height an argument 2025-01-16 16:41:15 +01:00
Christian Schwarz
9fffe6e60d hacky experiment: what if we had more walredo procs => doesn't move the needle on throughput 2025-01-16 13:58:23 +01:00
Christian Schwarz
2ff0a4ae82 extract the l0stack generator into a reusable python module 2025-01-16 13:24:34 +01:00
Christian Schwarz
66c0df8109 doc comment on BatchedFeMessage explaining WeakHandle; https://github.com/neondatabase/neon/pull/10386#discussion_r1916968951 2025-01-15 21:50:00 +01:00
Christian Schwarz
9fe77c527f inline get_impl; https://github.com/neondatabase/neon/pull/10386#discussion_r1916939623 2025-01-15 21:47:39 +01:00
Christian Schwarz
7fb4595c7e fix: WeakHandle was holding on to the Timeline allocation
This made test_timeline_deletion_with_files_stuck_in_upload_queue fail
because the RemoteTimelineClient was being kept alive.

The fix is to stop keeping the timeline alive from WeakHandle.
2025-01-15 21:46:37 +01:00
Christian Schwarz
350dc251df test case demonstrates the issue: we hold Timeline object alive
--- STDERR:              pageserver tenant::timeline::handle::tests::test_weak_handles ---
thread 'tenant::timeline::handle::tests::test_weak_handles' panicked at pageserver/src/tenant/timeline/handle.rs:1131:9:
assertion `left == right` failed
  left: 3
 right: 2
2025-01-15 21:46:30 +01:00
Christian Schwarz
5b77a6d3ce address clippy 2025-01-15 19:38:21 +01:00
Christian Schwarz
8c5005ff59 rename IoConcurrency::{todo=>serial} and remove deprecation warning 2025-01-15 19:38:05 +01:00
Christian Schwarz
f8218ac5fc Revert "investigation: add log_if_slow => shows that the io_futures are slow"
This reverts commit e81fa7137e.
2025-01-15 19:34:37 +01:00
Christian Schwarz
40470c66cd remove opportunistic poll; removing it seems slightly beneficial for perf
esp before I remembered to configure pipelining, the unpipelined
configuration achieved ~10% higher tput.

In any case, it makes sense to not do the opportunistic polling because
it registers the wrong waker.
2025-01-15 19:34:05 +01:00
Christian Schwarz
9b9479881a extend script with instructions to configure batching 2025-01-15 19:30:15 +01:00
Christian Schwarz
af11b201bd now the issue is no longer reproducible, maybe it was the barriers? 2025-01-15 19:10:45 +01:00
Christian Schwarz
8fafff37c5 remove the whole barriers business 2025-01-15 19:00:00 +01:00
Christian Schwarz
e81fa7137e investigation: add log_if_slow => shows that the io_futures are slow 2025-01-15 18:56:07 +01:00
Christian Schwarz
e60738f029 it's reproducible before the merge, so, continuing to investigate and fix here 2025-01-15 18:43:01 +01:00
Christian Schwarz
f75b07a160 I find that if I ever go beyond queue-depth=4, something in the pageserver locks up. 2025-01-15 18:31:40 +01:00
Christian Schwarz
a5524fcf4d add comment to use queue-depthed pagebench to the script 2025-01-15 18:31:29 +01:00
Christian Schwarz
351da2349e Merge branch 'problame/hung-shutdown/fix' into vlad/read-path-concurrent-io 2025-01-15 17:09:02 +01:00
Christian Schwarz
c545d227b9 review doc comment 2025-01-15 16:24:39 +01:00
Christian Schwarz
a4fc6a92c9 fix cargo doc 2025-01-15 16:10:04 +01:00
Christian Schwarz
2205736262 doc comment & one fixup 2025-01-15 14:27:08 +01:00
Christian Schwarz
5f9ddbae2f Merge branch 'problame/hung-shutdown/demo-hypothesis' into problame/hung-shutdown/fix 2025-01-15 00:25:11 +01:00
Christian Schwarz
173f18832c fixup 2025-01-15 00:24:59 +01:00
Christian Schwarz
23bd5833e1 Merge branch 'problame/hung-shutdown/demo-hypothesis' into problame/hung-shutdown/fix 2025-01-15 00:21:54 +01:00
Christian Schwarz
dedd524d7e refinements 2025-01-15 00:21:28 +01:00
Christian Schwarz
0340f00228 post-merge fix the handling of the new pagestream Test message, so that the regression test now passes
BUILD_TYPE=debug DEFAULT_PG_VERSION=16 poetry run pytest ./test_runner/regress/test_page_service_batching_regressions.py --timeout=0 --pdb
2025-01-14 23:56:35 +01:00
Christian Schwarz
366ff9ffcc Merge branch 'problame/hung-shutdown/demo-hypothesis' into problame/hung-shutdown/fix 2025-01-14 23:51:53 +01:00
Christian Schwarz
a8f9b564be fix cd pageserver && cargo clippy --features testing build 2025-01-14 23:50:22 +01:00
Christian Schwarz
5450e54dab bump ci 2025-01-14 22:47:16 +01:00
Christian Schwarz
53b05c4ba0 cleanups to make CI pass (well, fail because the bug isn't fixed yet) 2025-01-14 22:45:09 +01:00
Christian Schwarz
1f7d173235 Merge remote-tracking branch 'origin/main' into problame/hung-shutdown/demo-hypothesis 2025-01-14 22:33:20 +01:00
Christian Schwarz
8454e19a0f address warnings and such 2025-01-14 22:28:08 +01:00
Christian Schwarz
45e08d0aa5 it repros 2025-01-14 22:16:27 +01:00