Christian Schwarz
c4f03a225c
latency recorder sketches
2025-01-23 08:51:24 +01:00
Christian Schwarz
214ce815bc
sketch cancellation
2025-01-22 19:19:42 +01:00
Christian Schwarz
f5b2eee23d
prototype IoConcurrency propagation
2025-01-22 19:13:34 +01:00
Christian Schwarz
4b2a91cb5a
sketch propagation through request context
2025-01-22 19:13:34 +01:00
Christian Schwarz
a6660a2883
without request context
2025-01-22 18:32:02 +01:00
Christian Schwarz
b0b9206908
Merge remote-tracking branch 'origin/main' into vlad/read-path-concurrent-io
...
Conflicts:
pageserver/src/tenant/timeline.rs
test_runner/fixtures/neon_fixtures.py
2025-01-22 14:31:12 +01:00
Christian Schwarz
4298e77f7a
run unit tests in both modes
2025-01-22 14:30:01 +01:00
Christian Schwarz
dbb88cc59e
test_get_vectored: don't parametrize inside the test, instead, use spawn_for_test like we do in all the other tests
...
This is a remnant from the early times of this PR.
2025-01-22 12:55:16 +01:00
Christian Schwarz
c5af3c576e
the previous patch didn't cover test_version_mismatch; this one is far more universal
2025-01-22 01:03:34 +01:00
Christian Schwarz
3526d9aad3
pass forward compatibility
2025-01-22 00:40:19 +01:00
Christian Schwarz
a501095c5a
fixup(commit b2dbc47b31 initial logical size calculation wasn't polled to completion; fix that, to make tests pass)
...
(requires previous commit)
2025-01-21 23:22:31 +01:00
Christian Schwarz
728052bd2e
pausable_failpoint: add ability to provide a cancel flag, similar to what we have for sleep
2025-01-21 23:16:53 +01:00
Christian Schwarz
361210f8dc
actually enable concurrent IO (and batching!) by default in test suite (still need to figure out how to not make compat test break)
...
https://neondb.slack.com/archives/C059ZC138NR/p1737490501941309
2025-01-21 21:27:21 +01:00
Christian Schwarz
925dd17fb8
Revert "debug why CI tests don't run with sidecar-task"
...
This reverts commit a528b325ee.
2025-01-21 20:51:48 +01:00
Christian Schwarz
fe615520dd
remove the timing histograms for traversal and walredo, since their meaning and utility is dubious with concurrent IO; https://github.com/neondatabase/neon/pull/9353#discussion_r1924181713
...
The issue is that get_vectored_reconstruct_data latency means something
very different now with concurrent IO than what it did before, because
all the time we spend on the data blocks is no longer part of the
get_vectored_reconstruct_data().await wall clock time
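The effect described above can be illustrated with a small sketch. It is a Python asyncio analogy, not the pageserver's Rust code: `plan_reads`, `read_block`, and the 50 ms delay are made up for illustration. The planning step only spawns the IOs, so a timer wrapped around it no longer includes the time spent on the data blocks.

```python
import asyncio
import time


async def read_block(block_id: int) -> bytes:
    # Stand-in for a data block read; the 50 ms delay is an arbitrary assumption.
    await asyncio.sleep(0.05)
    return bytes([block_id])


async def plan_reads(block_ids: list[int]) -> list[asyncio.Task]:
    # "Traversal": decide which blocks to read and spawn the IOs,
    # but do not wait for them here.
    return [asyncio.create_task(read_block(b)) for b in block_ids]


async def measure() -> tuple[float, float]:
    start = time.monotonic()
    ios = await plan_reads([1, 2, 3])
    plan_seconds = time.monotonic() - start  # excludes the block reads
    await asyncio.gather(*ios)
    total_seconds = time.monotonic() - start  # includes the block reads
    return plan_seconds, total_seconds


plan_seconds, total_seconds = asyncio.run(measure())
print(f"plan: {plan_seconds:.3f}s, plan + execute: {total_seconds:.3f}s")
```

In this model the first number stays near zero regardless of how slow the reads are, which is why a histogram over the planning call alone is hard to compare with its pre-concurrency meaning.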
GET_RECONSTRUCT_DATA_TIME: all 3 dashboards that use it are in my /personal/christian folder. I guess I'm free to break them 😄
https://github.com/search?q=repo%3Aneondatabase%2Fgrafana-dashboard-export%20pageserver_getpage_get_reconstruct_data_seconds&type=code
RECONSTRUCT_TIME
Used in a couple of dashboards I think nobody uses
- Timeline Inspector
- Sharding WAL streaming
- Pageserver
- walredo time throwaway
Vlad agrees with removing them for now.
Maybe in the future we'll add some back
pageserver_getpage_get_reconstruct_data_seconds -> pageserver_getpage_io_plan_seconds
pageserver_getpage_reconstruct_data_seconds -> pageserver_getpage_io_execute_seconds
2025-01-21 20:32:15 +01:00
Christian Schwarz
d6cdf1b13f
debug assertion on correct record order: https://github.com/neondatabase/neon/pull/9353#discussion_r1923801121
2025-01-21 20:21:35 +01:00
Christian Schwarz
b2dbc47b31
initial logical size calculation wasn't polled to completion; fix that, to make tests pass
...
(see prev commit for stack trace)
CI test failures
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9353/12892883355/index.html#suites/a1c2be32556270764423c495fad75d47/92cacda354b63fd7/
2025-01-21 20:09:39 +01:00
Christian Schwarz
a69dcadba4
test failure
...
christian@neon-hetzner-dev-christian:[~/src/neon-work-1]: NEON_PAGESERVER_USE_ONE_RUNTIME=current_thread DEFAULT_PG_VERSION=14 BUILD_TYPE=release poetry run pytest -k 'test_ancestor_detach_branched_from[release-pg14-False-True-after]'
2025-01-21T18:42:38.794431Z WARN initial_size_calculation{tenant_id=cb106e50ddedc20995b0b1bb065ebcd9 shard_id=0000 timeline_id=e362ff10e7c4e116baee457de5c766d9}:logical_size_calculation_task: dropping ValuesReconstructState while some IOs have not been completed num_active_ios=1 sidecar_task_id=None backtrace= 0: <pageserver::tenant::storage_layer::ValuesReconstructState as core::ops::drop::Drop>::drop
at /home/christian/src/neon-work-1/pageserver/src/tenant/storage_layer.rs:553:24
1: core::ptr::drop_in_place<pageserver::tenant::storage_layer::ValuesReconstructState>
at /home/christian/.rustup/toolchains/1.84.0-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:521:1
2: core::ptr::drop_in_place<pageserver::tenant::timeline::Timeline::get::{{closure}}>
at /home/christian/src/neon-work-1/pageserver/src/tenant/timeline.rs:1042:5
3: core::ptr::drop_in_place<pageserver::pgdatadir_mapping::<impl pageserver::tenant::timeline::Timeline>::get_current_logical_size_non_incremental::{{closure}}>
at /home/christian/src/neon-work-1/pageserver/src/pgdatadir_mapping.rs:1001:67
4: core::ptr::drop_in_place<pageserver::tenant::timeline::Timeline::calculate_logical_size::{{closure}}>
at /home/christian/src/neon-work-1/pageserver/src/tenant/timeline.rs:3100:18
5: core::ptr::drop_in_place<pageserver::tenant::timeline::Timeline::logical_size_calculation_task::{{closure}}::{{closure}}::{{closure}}>
at /home/christian/src/neon-work-1/pageserver/src/tenant/timeline.rs:3050:22
6: core::ptr::drop_in_place<pageserver::tenant::timeline::Timeline::logical_size_calculation_task::{{closure}}::{{closure}}>
at /home/christian/src/neon-work-1/pageserver/src/tenant/timeline.rs:3060:5
2025-01-21 19:43:25 +01:00
Christian Schwarz
c4e0ba38b8
drain spawned IOs if traversal fails; https://github.com/neondatabase/neon/pull/9353/files#r1923589380
2025-01-21 19:33:13 +01:00
Christian Schwarz
14e4fcdb2a
delta_layer: remove ignore_key_with_err optimization; https://github.com/neondatabase/neon/pull/9353#discussion_r1920573294
...
it would cause an assertion failure because we wouldn't be consuming all IOs
2025-01-21 19:11:25 +01:00
Christian Schwarz
1862fdf9e2
clean up doc comment
2025-01-21 19:01:22 +01:00
Christian Schwarz
24e0a3f941
undo the WIP benchmarks, will clean those up and commit in a future PR
2025-01-21 19:00:20 +01:00
Christian Schwarz
a528b325ee
debug why CI tests don't run with sidecar-task
2025-01-21 18:53:44 +01:00
Christian Schwarz
a3c756334b
lift noise reporting to ValuesReconstructData::drop, it's actually better there
...
For all high-rooted long-lived IoConcurrency's, IoConcurrency::drop will
never run.
What we actually care about is that we leave no dangling IOs after
get_vectored_impl, which lives much shorter than a high-rooted
IoConcurrency.
However, lifetime of `ValuesReconstructData` is generally == lifetime of
get_vectored_impl.
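A minimal sketch of the idea, as a Python analogy rather than the actual Rust `Drop` impl: the check for dangling IOs is attached to a short-lived per-request object, so it runs at the end of every request even if the long-lived owner of the IO machinery is never dropped. The class name, its methods, and the context-manager shape are assumptions for illustration.

```python
import logging

logger = logging.getLogger("sketch")


class ReconstructState:
    """Hypothetical per-request state; lives about as long as one read."""

    def __init__(self) -> None:
        self.num_active_ios = 0

    def io_started(self) -> None:
        self.num_active_ios += 1

    def io_completed(self) -> None:
        self.num_active_ios -= 1

    def __enter__(self) -> "ReconstructState":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Analogue of a Drop impl: runs when the request scope ends.
        if self.num_active_ios > 0:
            logger.warning(
                "dropping state while some IOs have not been completed "
                "num_active_ios=%d",
                self.num_active_ios,
            )
        return False  # never swallow exceptions


def leaky_request() -> int:
    with ReconstructState() as state:
        state.io_started()
        state.io_started()
        state.io_completed()
        # One IO is still outstanding when the scope ends, so __exit__ warns.
        return state.num_active_ios
```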
2025-01-21 18:52:15 +01:00
Christian Schwarz
93625344eb
refactor: wrap the oneshot's into a properly named abstraction (OnDiskValueIo, etc)
2025-01-21 17:45:36 +01:00
Christian Schwarz
7ca9112ec1
fix noise
2025-01-21 17:08:03 +01:00
Christian Schwarz
4e72b22b41
make noise from IoConcurrency::drop instead of the task, for more context
2025-01-21 14:51:54 +01:00
Christian Schwarz
4014c390e2
initial logical size calculation can also reasonably use the sidecar because it's concurrency-limited
2025-01-21 12:50:33 +01:00
Christian Schwarz
bca4263eb8
inspect_image_layer can also have an IoConcurrency root, it's tests only
2025-01-21 12:40:22 +01:00
Christian Schwarz
a958febd7a
reference issue that will remove hard-coded sequential()
2025-01-21 12:36:23 +01:00
Christian Schwarz
fc27da43ff
one more test can do without it
2025-01-21 12:25:30 +01:00
Christian Schwarz
cf2f0c27aa
IoConcurrency roots for scan() and tests
2025-01-21 12:21:46 +01:00
Christian Schwarz
f54c5d5596
turns out create_image_layers is easy
2025-01-21 10:47:49 +01:00
Christian Schwarz
ce5452d2e5
followup 0a37164c29: also rename IoConcurrency::serial()
2025-01-21 00:47:37 +01:00
Christian Schwarz
af6c9ffac7
Ok, I now understand why it deadlocked in mode=sidecar-task
...
The reason is that even in mode=`sidecar-task`, there
are a bunch of places that are serial. Those places obviously deadlock.
2025-01-21 00:41:45 +01:00
Christian Schwarz
081ff26519
fixup 40ab9c2c5e: deadlock
...
Reproduced by
test_runner/regress/test_branching.py::test_branching_with_pgbench[debug-pg16-flat-1-10]
It kinda makes sense that this deadlocks in `sequential` mode.
However, it also deadlocks in `sidecar-task` mode.
I don't understand why.
2025-01-20 23:46:56 +01:00
Christian Schwarz
0a37164c29
replace env var with config variable; add test suite fixture env var to override default
2025-01-20 23:46:56 +01:00
Christian Schwarz
0eff09e35f
Merge remote-tracking branch 'origin/main' into vlad/read-path-concurrent-io
2025-01-20 19:47:03 +01:00
Christian Schwarz
cdad6b2de5
we don't need the CancellationToken, the ios_rx.recv() will fail at the same time
2025-01-20 19:13:40 +01:00
Christian Schwarz
82d20b52b3
make noise when dropping an IoConcurrency with unfinished requests
2025-01-20 19:12:00 +01:00
Christian Schwarz
3b1328423e
basebackup: fetch all SLRUs of one basebackup using the same IoConcurrency
2025-01-20 16:58:14 +01:00
Christian Schwarz
2eb235e923
doc string explaining why we're deadlock free right now and why it's so brittle
2025-01-17 18:33:34 +01:00
Christian Schwarz
40ab9c2c5e
we can avoid adding the Arc<Mutex<>> around EphemeralLayer if we instead extend the lifetime of the InMemoryLayer for the spawned IO future; plus it's semantically more similar to what we now do for Delta and Image layers
2025-01-17 18:16:17 +01:00
Christian Schwarz
c43400389f
delta & image layer spawned IOs: keep layer resident until IO is done
2025-01-17 18:00:13 +01:00
Christian Schwarz
65932512c1
run tests with futures-unordered
2025-01-16 20:03:01 +01:00
Christian Schwarz
1866f261e0
make mypy pass
2025-01-16 20:01:42 +01:00
Christian Schwarz
7c662b771a
Merge branch 'problame/hung-shutdown/fix' into vlad/read-path-concurrent-io
2025-01-16 19:22:38 +01:00
Christian Schwarz
8f40bd4eb3
there is no Error Fe message -,-
2025-01-16 19:21:44 +01:00
Christian Schwarz
d2f8342080
Merge branch 'problame/hung-shutdown/fix' into vlad/read-path-concurrent-io
2025-01-16 18:16:36 +01:00
Christian Schwarz
92e4dd7ffa
script: template NEON_REPO_DIR
2025-01-16 18:14:34 +01:00
Christian Schwarz
0c3ab9c494
move test message tag to 99 and represent Fe message tag as enum, like we do for Be message
2025-01-16 18:07:56 +01:00
Christian Schwarz
c19a16792a
address nit; https://github.com/neondatabase/neon/pull/10386#discussion_r1918782034
2025-01-16 17:54:14 +01:00
Christian Schwarz
cf75eb7d86
Revert "hacky experiment: what if we had more walredo procs => doesn't move the needle on throughput"
...
This reverts commit 9fffe6e60d.
2025-01-16 16:46:49 +01:00
Christian Schwarz
6ededa17e2
Revert "experiment: buffered socket with 128k buffer size; not super needle-moving"
...
This reverts commit 7e13e5fc4a.
2025-01-16 16:42:10 +01:00
Christian Schwarz
7e13e5fc4a
experiment: buffered socket with 128k buffer size; not super needle-moving
2025-01-16 16:42:01 +01:00
Christian Schwarz
45358bcb65
in the deep_layers_with_delta script, make the stack height an argument
2025-01-16 16:41:15 +01:00
Christian Schwarz
9fffe6e60d
hacky experiment: what if we had more walredo procs => doesn't move the needle on throughput
2025-01-16 13:58:23 +01:00
Christian Schwarz
2ff0a4ae82
extract the l0stack generator into a reusable python module
2025-01-16 13:24:34 +01:00
Christian Schwarz
66c0df8109
doc comment on BatchedFeMessage explaining WeakHandle; https://github.com/neondatabase/neon/pull/10386#discussion_r1916968951
2025-01-15 21:50:00 +01:00
Christian Schwarz
9fe77c527f
inline get_impl; https://github.com/neondatabase/neon/pull/10386#discussion_r1916939623
2025-01-15 21:47:39 +01:00
Christian Schwarz
7fb4595c7e
fix: WeakHandle was holding on to the Timeline allocation
...
This made test_timeline_deletion_with_files_stuck_in_upload_queue fail
because the RemoteTimelineClient was being kept alive.
The fix is to stop keeping the timeline alive from WeakHandle.
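The property the fix restores can be sketched with Python's `weakref` module. This is an analogy, not the Rust `WeakHandle`: the class bodies are hypothetical, and the immediate release after `del` relies on CPython's reference counting.

```python
import weakref


class Timeline:
    """Stand-in for a large object that must be freed on deletion."""


class WeakHandle:
    def __init__(self, timeline: Timeline) -> None:
        # Only a weak reference: the handle must not extend the lifetime.
        self._timeline = weakref.ref(timeline)

    def upgrade(self):
        # Returns the Timeline, or None once it has been dropped.
        return self._timeline()


timeline = Timeline()
handle = WeakHandle(timeline)
alive_before = handle.upgrade() is not None
del timeline  # drop the last strong reference
alive_after = handle.upgrade() is not None
```

If `WeakHandle` stored the object itself instead of a weak reference, `alive_after` would stay true, which corresponds to the kind of leak the failing test exposed.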
2025-01-15 21:46:37 +01:00
Christian Schwarz
350dc251df
test case demonstrates the issue: we hold the Timeline object alive
...
--- STDERR: pageserver tenant::timeline::handle::tests::test_weak_handles ---
thread 'tenant::timeline::handle::tests::test_weak_handles' panicked at pageserver/src/tenant/timeline/handle.rs:1131:9:
assertion `left == right` failed
left: 3
right: 2
2025-01-15 21:46:30 +01:00
Christian Schwarz
5b77a6d3ce
address clippy
2025-01-15 19:38:21 +01:00
Christian Schwarz
8c5005ff59
rename IoConcurrency::{todo=>serial} and remove deprecation warning
2025-01-15 19:38:05 +01:00
Christian Schwarz
f8218ac5fc
Revert "investigation: add log_if_slow => shows that the io_futures are slow"
...
This reverts commit e81fa7137e.
2025-01-15 19:34:37 +01:00
Christian Schwarz
40470c66cd
remove opportunistic poll; removing it seems slightly beneficial for perf
...
esp before I remembered to configure pipelining, the unpipelined
configuration achieved ~10% higher tput.
Either way, it makes sense not to do the opportunistic polling because
it registers the wrong waker.
2025-01-15 19:34:05 +01:00
Christian Schwarz
9b9479881a
extend script with instructions to configure batching
2025-01-15 19:30:15 +01:00
Christian Schwarz
af11b201bd
now the issue is no longer reproducible, maybe it was the barriers?
2025-01-15 19:10:45 +01:00
Christian Schwarz
8fafff37c5
remove the whole barriers business
2025-01-15 19:00:00 +01:00
Christian Schwarz
e81fa7137e
investigation: add log_if_slow => shows that the io_futures are slow
2025-01-15 18:56:07 +01:00
Christian Schwarz
e60738f029
it's reproducible before the merge, so, continuing to investigate and fix here
2025-01-15 18:43:01 +01:00
Christian Schwarz
f75b07a160
I find that if I ever go beyond queue-depth=4, something in the pageserver locks up.
2025-01-15 18:31:40 +01:00
Christian Schwarz
a5524fcf4d
add comment to use queue-depthed pagebench to the script
2025-01-15 18:31:29 +01:00
Christian Schwarz
351da2349e
Merge branch 'problame/hung-shutdown/fix' into vlad/read-path-concurrent-io
2025-01-15 17:09:02 +01:00
Christian Schwarz
c545d227b9
review doc comment
2025-01-15 16:24:39 +01:00
Christian Schwarz
a4fc6a92c9
fix cargo doc
2025-01-15 16:10:04 +01:00
Christian Schwarz
2205736262
doc comment & one fixup
2025-01-15 14:27:08 +01:00
Christian Schwarz
5f9ddbae2f
Merge branch 'problame/hung-shutdown/demo-hypothesis' into problame/hung-shutdown/fix
2025-01-15 00:25:11 +01:00
Christian Schwarz
173f18832c
fixup
2025-01-15 00:24:59 +01:00
Christian Schwarz
23bd5833e1
Merge branch 'problame/hung-shutdown/demo-hypothesis' into problame/hung-shutdown/fix
2025-01-15 00:21:54 +01:00
Christian Schwarz
dedd524d7e
refinements
2025-01-15 00:21:28 +01:00
Christian Schwarz
0340f00228
post-merge fix the handling of the new pagestream Test message, so that the regression test now passes
...
non-package-mode-py3.10christian@neon-hetzner-dev-christian:[~/src/neon]: BUILD_TYPE=debug DEFAULT_PG_VERSION=16 poetry run pytest ./test_runner/regress/test_page_service_batching_regressions.py --timeout=0 --pdb
2025-01-14 23:56:35 +01:00
Christian Schwarz
366ff9ffcc
Merge branch 'problame/hung-shutdown/demo-hypothesis' into problame/hung-shutdown/fix
2025-01-14 23:51:53 +01:00
Christian Schwarz
a8f9b564be
fix cd pageserver && cargo clippy --features testing build
2025-01-14 23:50:22 +01:00
Christian Schwarz
5450e54dab
bump ci
2025-01-14 22:47:16 +01:00
Christian Schwarz
53b05c4ba0
cleanups to make CI pass (well, fail because the bug isn't fixed yet)
2025-01-14 22:45:09 +01:00
Christian Schwarz
1f7d173235
Merge remote-tracking branch 'origin/main' into problame/hung-shutdown/demo-hypothesis
2025-01-14 22:33:20 +01:00
Christian Schwarz
8454e19a0f
address warnings and such
2025-01-14 22:28:08 +01:00
Christian Schwarz
45e08d0aa5
it repros
2025-01-14 22:16:27 +01:00
Christian Schwarz
9a02bc0cfd
try to repro root cause hypothesis for https://github.com/neondatabase/neon/issues/10309
...
This approach doesn't work because it slows down all the responses.
The workload() thread gets stuck in auth, probably with zero pipeline depth:
0x00007fa28fe48e63 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007fa28fe48e63 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000561a285bf44e in WaitEventSetWaitBlock (set=0x561a292723e8, cur_timeout=9999, occurred_events=0x7ffd1f11c970, nevents=1)
at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/ipc/latch.c:1535
#2 0x0000561a285bf338 in WaitEventSetWait (set=0x561a292723e8, timeout=9999, occurred_events=0x7ffd1f11c970, nevents=1, wait_event_info=117440512)
at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/ipc/latch.c:1481
#3 0x00007fa2904a7345 in call_PQgetCopyData (shard_no=0, buffer=0x7ffd1f11cad0) at /home/christian/src/neon//pgxn/neon/libpagestore.c:703
#4 0x00007fa2904a7aec in pageserver_receive (shard_no=0) at /home/christian/src/neon//pgxn/neon/libpagestore.c:899
#5 0x00007fa2904af471 in prefetch_read (slot=0x561a292863b0) at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:644
#6 0x00007fa2904af26b in prefetch_wait_for (ring_index=0) at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:596
#7 0x00007fa2904b489d in neon_read_at_lsnv (rinfo=..., forkNum=MAIN_FORKNUM, base_blockno=0, request_lsns=0x7ffd1f11cd60, buffers=0x7ffd1f11cd30, nblocks=1, mask=0x0)
at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:3024
#8 0x00007fa2904b4f34 in neon_read_at_lsn (rinfo=..., forkNum=MAIN_FORKNUM, blkno=0, request_lsns=..., buffer=0x7fa28b969000) at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:3104
#9 0x00007fa2904b515d in neon_read (reln=0x561a292ef448, forkNum=MAIN_FORKNUM, blkno=0, buffer=0x7fa28b969000) at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:3146
#10 0x0000561a285f1ed5 in smgrread (reln=0x561a292ef448, forknum=MAIN_FORKNUM, blocknum=0, buffer=0x7fa28b969000) at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/smgr/smgr.c:567
#11 0x0000561a285a528a in ReadBuffer_common (smgr=0x561a292ef448, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=0, mode=RBM_NORMAL, strategy=0x0, hit=0x7ffd1f11cf1b)
at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/buffer/bufmgr.c:1134
#12 0x0000561a285a47e3 in ReadBufferExtended (reln=0x7fa28ce1c888, forkNum=MAIN_FORKNUM, blockNum=0, mode=RBM_NORMAL, strategy=0x0)
at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/buffer/bufmgr.c:781
#13 0x0000561a285a46ad in ReadBuffer (reln=0x7fa28ce1c888, blockNum=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/buffer/bufmgr.c:715
#14 0x0000561a2811d511 in _bt_getbuf (rel=0x7fa28ce1c888, blkno=0, access=1) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/nbtree/nbtpage.c:852
#15 0x0000561a2811d1b2 in _bt_metaversion (rel=0x7fa28ce1c888, heapkeyspace=0x7ffd1f11d9f0, allequalimage=0x7ffd1f11d9f1) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/nbtree/nbtpage.c:747
#16 0x0000561a28126220 in _bt_first (scan=0x561a292d0348, dir=ForwardScanDirection) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/nbtree/nbtsearch.c:1465
#17 0x0000561a28121a07 in btgettuple (scan=0x561a292d0348, dir=ForwardScanDirection) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/nbtree/nbtree.c:246
#18 0x0000561a28111afa in index_getnext_tid (scan=0x561a292d0348, direction=ForwardScanDirection) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/index/indexam.c:583
#19 0x0000561a28111d14 in index_getnext_slot (scan=0x561a292d0348, direction=ForwardScanDirection, slot=0x561a292d01a8) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/index/indexam.c:675
#20 0x0000561a2810fbcc in systable_getnext (sysscan=0x561a292d0158) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/index/genam.c:512
#21 0x0000561a287a1ee1 in SearchCatCacheMiss (cache=0x561a292a0f80, nkeys=1, hashValue=3028463561, hashIndex=1, v1=94670359561576, v2=0, v3=0, v4=0)
at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/catcache.c:1440
#22 0x0000561a287a1d8a in SearchCatCacheInternal (cache=0x561a292a0f80, nkeys=1, v1=94670359561576, v2=0, v3=0, v4=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/catcache.c:1360
#23 0x0000561a287a1a4f in SearchCatCache (cache=0x561a292a0f80, v1=94670359561576, v2=0, v3=0, v4=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/catcache.c:1214
#24 0x0000561a287be060 in SearchSysCache (cacheId=10, key1=94670359561576, key2=0, key3=0, key4=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/syscache.c:817
#25 0x0000561a287be66f in GetSysCacheOid (cacheId=10, oidcol=1, key1=94670359561576, key2=0, key3=0, key4=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/syscache.c:1055
#26 0x0000561a286319a5 in get_role_oid (rolname=0x561a29270568 "cloud_admin", missing_ok=true) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/adt/acl.c:5251
#27 0x0000561a283d42ca in check_hba (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/libpq/hba.c:2493
#28 0x0000561a283d5537 in hba_getauthmethod (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/libpq/hba.c:3067
#29 0x0000561a283c6fd7 in ClientAuthentication (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/libpq/auth.c:395
#30 0x0000561a287dc943 in PerformAuthentication (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/init/postinit.c:247
#31 0x0000561a287dd9cd in InitPostgres (in_dbname=0x561a29270588 "postgres", dboid=0, username=0x561a29270568 "cloud_admin", useroid=0, load_session_libraries=true, override_allow_connections=false,
out_dbname=0x0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/init/postinit.c:929
#32 0x0000561a285fa10b in PostgresMain (dbname=0x561a29270588 "postgres", username=0x561a29270568 "cloud_admin") at /home/christian/src/neon//vendor/postgres-v16/src/backend/tcop/postgres.c:4293
#33 0x0000561a28524ce4 in BackendRun (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/postmaster/postmaster.c:4465
#34 0x0000561a285245da in BackendStartup (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/postmaster/postmaster.c:4193
#35 0x0000561a285209c4 in ServerLoop () at /home/christian/src/neon//vendor/postgres-v16/src/backend/postmaster/postmaster.c:1782
#36 0x0000561a2852030f in PostmasterMain (argc=3, argv=0x561a291c5fc0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/postmaster/postmaster.c:1466
#37 0x0000561a283dd987 in main (argc=3, argv=0x561a291c5fc0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/main/main.c:238
2025-01-14 20:42:01 +01:00
Christian Schwarz
9e1cd986d7
address "why mut" nits; https://github.com/neondatabase/neon/pull/10353#discussion_r1913685991 https://github.com/neondatabase/neon/pull/10353#discussion_r1913683065 https://github.com/neondatabase/neon/pull/10353#discussion_r1913683392
2025-01-14 15:30:33 +01:00
Christian Schwarz
47544dcc0b
simplify SmgrOpTimerState variant names, and add some doc comments; https://github.com/neondatabase/neon/pull/10353#discussion_r1913676569 and https://github.com/neondatabase/neon/pull/10353#discussion_r1913676824
2025-01-14 15:28:09 +01:00
Christian Schwarz
4e094d9638
rearrange code & inline HandleInner::shutdown() to minimize the diff
2025-01-14 15:12:46 +01:00
Christian Schwarz
c8bee86586
in some early WIP commit we had removed the loop{} inside get(); re-establish it one level down
2025-01-14 15:12:03 +01:00
Christian Schwarz
768a867dcf
doc comment fix
2025-01-14 14:54:15 +01:00
Christian Schwarz
3b65465e10
turns out with the switch to sync Mutex there's no reason for upgrade() to be async either
2025-01-14 14:53:41 +01:00
Christian Schwarz
e4ea706424
turns out PerTimelineState::shutdown() doesn't need to be async
2025-01-14 14:48:03 +01:00
Christian Schwarz
7034f54a9e
remove the earlier-commented-out assertions on arc reference counts, they were too whiteboxy to begin with
2025-01-14 14:45:25 +01:00
Christian Schwarz
d68c5ddf7e
avoid the tokio::sync::Mutex by wrapping the GateGuard into an Arc
2025-01-14 14:23:28 +01:00
Christian Schwarz
b95365b45d
Revert "experiment: what if we make Handle !Send so it can't be held across await points"
...
This reverts commit b44070d0c7.
2025-01-14 13:32:12 +01:00
Christian Schwarz
b44070d0c7
experiment: what if we make Handle !Send so it can't be held across await points
...
Result: the whole point of having a Handle at hand is to be holding a GateGuard
while performing a Timeline operation. Don't do it.
2025-01-14 13:30:48 +01:00
Christian Schwarz
6b22acba9b
avoid cloning the Arc<Timeline> on every handle upgrade/downgrade, by wrapping it in yet another Arc
2025-01-14 12:28:37 +01:00
Christian Schwarz
22058d17d1
it turns out PerTimelineState need not store a Types::Timeline at all
2025-01-14 12:22:00 +01:00
Christian Schwarz
a8d096b72c
Revert "WIP experiment: avoid upgrading"
...
This reverts commit f6eb6fff9f.
2025-01-14 12:18:02 +01:00
Christian Schwarz
f6eb6fff9f
WIP experiment: avoid upgrading
2025-01-14 12:14:31 +01:00
Christian Schwarz
c868ceded0
WeakHandle should store weak ref to the GateGuard
2025-01-14 12:08:53 +01:00
Christian Schwarz
e82aa9419e
convert handles to named structs
2025-01-14 12:02:25 +01:00
Christian Schwarz
bfe9efefb8
test tenant::timeline::handle::tests::test_timeline_shutdown hangs
2025-01-14 11:56:08 +01:00
Christian Schwarz
62f63275b2
fix test_connection_handler_exit
2025-01-14 11:55:45 +01:00
Christian Schwarz
5762fdbf68
test failure at: 5b45f03aa2/pageserver/src/tenant/timeline/handle.rs (L958)
2025-01-14 11:39:46 +01:00
Christian Schwarz
5b45f03aa2
tests: comment out the strong_count / weak_count assertions
2025-01-14 11:39:19 +01:00
Christian Schwarz
3fefa5b415
fix warnings
2025-01-14 11:31:58 +01:00
Christian Schwarz
6007a94f91
it compiles
2025-01-14 11:30:43 +01:00
Christian Schwarz
9e03dda0c3
handle downgrade during batching
2025-01-13 15:49:45 +01:00
Christian Schwarz
8a0a0d06a8
renames
2025-01-13 14:38:19 +01:00
Christian Schwarz
dda31b9cb6
adjust shutdown
2025-01-13 14:38:19 +01:00
Christian Schwarz
9591d8789c
WIP
2025-01-13 14:27:32 +01:00
Christian Schwarz
ee851d1127
Merge remote-tracking branch 'origin/main' into problame/throttle-before-batching
2025-01-10 20:38:08 +01:00
Christian Schwarz
d6a2b62cfb
grand refactor of SmgrOpTimer states
2025-01-10 20:29:44 +01:00
Christian Schwarz
ad5120197c
self-review
2025-01-10 20:28:20 +01:00
Christian Schwarz
4d496a29c2
following up to the last commit, the observation points that we use to calculate the various latency metrics are different, adjust for that
2025-01-10 20:26:09 +01:00
Christian Schwarz
8793e28ccb
throttling cancel-sensitivity
2025-01-10 20:25:24 +01:00
Christian Schwarz
aa8da1e621
move throttle into pagestream_read_message
2025-01-10 15:35:07 +01:00
Christian Schwarz
c36cf79ab5
page_service client: address todo / unused warning
2025-01-08 18:00:38 +01:00
Christian Schwarz
f6f947e4ec
Revert "debug cruft, likely will revert but this proved useful, esp log_if_slow"
...
This reverts commit 0fd67b27e5.
Bunch of conflicts.
2025-01-07 12:26:04 +01:00
Christian Schwarz
987829e5c2
fix build failures on Linux
2025-01-07 12:13:37 +01:00
Christian Schwarz
c73e8c34c8
Merge remote-tracking branch 'origin/main' into vlad/read-path-concurrent-io
2025-01-07 12:06:52 +01:00
Christian Schwarz
4637389882
remove the "parallel" mode, as we won't ever enable this in practice and benchmarks have shown very limited upside over futures-unordered
2024-12-21 20:44:13 +01:00
Christian Schwarz
d844ed098a
fix some minor warnings
2024-12-21 20:37:21 +01:00
Christian Schwarz
d776ee66d7
avoid Arc<Gate> by having clonable GateGuard
2024-12-21 20:33:04 +01:00
Christian Schwarz
dc58846f0c
with the new approach we don't even need the no-slots patch of tokio-epoll-uring
2024-12-21 20:33:04 +01:00
Christian Schwarz
3396574d21
cargo fmt
2024-12-21 20:32:53 +01:00
Christian Schwarz
97a01bdcb4
pagebench: option for queue depth
2024-12-17 21:01:22 +01:00
Christian Schwarz
29fecda704
avoid killing walredo
2024-12-17 21:01:22 +01:00
Christian Schwarz
dc1d53f7be
name delta stack height variable
2024-12-17 21:01:22 +01:00
Christian Schwarz
8e6c01ddea
WIP fix: one task per connection to drive all the IO futures
2024-12-17 21:01:22 +01:00
Christian Schwarz
e209fd8d77
diagnosis: with the debug cruft from previous commit, we see "thread_local_system" hang forever
...
The reason is likely that one spawned IO future kicks off
thread_local_system launch, then returns Pending.
Another IO future observes the once cell already locked and waits
for the first future to finish.
But that never happens.
It's a sort of priority inversion.
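The suspected hang can be reproduced in a toy model. The sketch below is an assumption-laden Python illustration, not the real tokio-epoll-uring code: generators stand in for futures, each `yield` stands in for returning Pending, and `OnceCell`, `io_future`, `poll`, and `run` are hypothetical names.

```python
from typing import Generator, Optional


class OnceCell:
    """Hypothetical async once-cell: the first caller initializes, others wait."""

    def __init__(self) -> None:
        self.initializing = False
        self.value: Optional[str] = None


def io_future(cell: OnceCell) -> Generator[None, None, str]:
    # Each `yield` models returning Pending to whoever polls this future.
    if cell.value is None and not cell.initializing:
        cell.initializing = True
        yield  # initialization is in flight; needs another poll to finish
        cell.value = "system"
        cell.initializing = False
    while cell.value is None:
        yield  # someone else is initializing; wait for them
    return cell.value


def poll(fut: Generator[None, None, str]) -> Optional[str]:
    try:
        next(fut)
        return None  # Pending
    except StopIteration as done:
        return done.value  # Ready


def run(poll_first: bool, max_polls: int = 100) -> Optional[str]:
    cell = OnceCell()
    first, second = io_future(cell), io_future(cell)
    poll(first)  # first future starts initialization, returns Pending
    for _ in range(max_polls):
        if poll_first:
            poll(first)  # a driver that keeps polling every spawned future
        result = poll(second)
        if result is not None:
            return result
    return None  # second future never became ready


stuck = run(poll_first=False)
progress = run(poll_first=True)
```

With `poll_first=False` the second future waits on a cell that nobody finishes initializing. With `poll_first=True`, a driver that keeps polling all spawned futures lets the first one complete, which is the direction the later "one task per connection" fix takes.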
2024-12-17 18:48:32 +01:00
Christian Schwarz
0fd67b27e5
debug cruft, likely will revert but this proved useful, esp log_if_slow
2024-12-17 17:32:06 +01:00
Christian Schwarz
d962b44c20
fix the script (it stopped making changes beyond 6); this now creates 18-tall delta stacks for each page
2024-12-13 12:43:40 +01:00
Christian Schwarz
31fec1fb4b
add the script that I used to generate the delta stack
...
non-package-mode-py3.10christian@neon-hetzner-dev-christian:[~/src/neon/test_runner]: poetry run python3 deep_layers_with_delta.py
2024-12-12 20:25:16 +01:00
Christian Schwarz
87755bf80e
concurrent-futures: poll before pushing into FuturesUnordered (this will do io_uring submission in most cases)
2024-12-12 19:37:23 +01:00
Christian Schwarz
1f7f947119
results
...
-------------------- Benchmark results --------------------
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_tenants: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pgbench_scale: 136
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.duration: 20 s
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_clients: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.config: 0
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.page_cache_size: 134217728 byte
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.max_file_descriptors: 500000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.request_count: 47653
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_mean: 0.417 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p95: 0.588 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99: 0.670 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.9: 0.991 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.99: 1.896 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.time: 20.1797
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_cpu_seconds_total: 15.1300
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.1: 2209
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.4: 9683
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.8: 16795
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.16: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.32: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.64: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.128: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.256: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.512: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.1024: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.9223372036854775807: 50022
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_tenants: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pgbench_scale: 136
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.duration: 20 s
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_clients: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.config: 0
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.page_cache_size: 134217728 byte
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.max_file_descriptors: 500000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.request_count: 46773
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_mean: 0.425 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p95: 0.632 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99: 0.837 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.9: 1.234 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.99: 2.347 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.time: 20.2244
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_cpu_seconds_total: 15.9300
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.1: 2072
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.4: 9362
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.8: 16558
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.16: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.32: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.64: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.128: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.256: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.512: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.1024: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.9223372036854775807: 49119
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_tenants: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pgbench_scale: 136
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.duration: 20 s
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_clients: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.config: 0
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.page_cache_size: 134217728 byte
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.max_file_descriptors: 500000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.request_count: 47861
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_mean: 0.416 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p95: 0.609 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99: 0.713 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.9: 1.135 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.99: 1.940 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.time: 20.1969
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_cpu_seconds_total: 15.7000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.1: 2282
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.4: 9597
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.8: 16926
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.16: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.32: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.64: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.128: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.256: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.512: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.1024: 50189
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.counters.pageserver_layers_visited_per_vectored_read_global_buckets.9223372036854775807: 50189
============================================================================================================================================================================================= 3 passed, 236 deselected in 101.20s (0:01:41) ==============================================================================================================================================================================================
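A note on reading the `pageserver_layers_visited_per_vectored_read_global_buckets.*` counters above: the values rise and then plateau (for the serial run, 2209, 9683, 16795, then 50022 from bucket 16 onward), which is what cumulative, Prometheus-style histogram buckets look like. Assuming that interpretation holds, the minimal sketch below converts them into per-bucket counts. The helper name `per_bucket_counts` and the shortened sample lines are illustrative only and not part of the test suite.

```python
import re

# Matches the tail of a result line, e.g. "..._buckets.16: 50022".
BUCKET_RE = re.compile(r"_buckets\.(\d+): (\d+)$")


def per_bucket_counts(lines):
    """Turn cumulative bucket lines into (upper_bound, count_in_bucket) pairs.

    Assumes each bucket value counts all reads that visited at most
    `upper_bound` layers (cumulative semantics).
    """
    cumulative = []
    for line in lines:
        m = BUCKET_RE.search(line.strip())
        if m:
            cumulative.append((int(m.group(1)), int(m.group(2))))
    cumulative.sort()

    counts = []
    prev = 0
    for upper, total in cumulative:
        counts.append((upper, total - prev))
        prev = total
    return counts


# Values copied from the `direct-serial` run above (metric prefix shortened).
serial_lines = [
    "layers_visited_per_vectored_read_global_buckets.1: 2209",
    "layers_visited_per_vectored_read_global_buckets.4: 9683",
    "layers_visited_per_vectored_read_global_buckets.8: 16795",
    "layers_visited_per_vectored_read_global_buckets.16: 50022",
    "layers_visited_per_vectored_read_global_buckets.32: 50022",
    "layers_visited_per_vectored_read_global_buckets.9223372036854775807: 50022",
]

serial_counts = per_bucket_counts(serial_lines)
for upper, count in serial_counts:
    print(f"<= {upper}: {count}")
```

Under that assumption, roughly two thirds of the reads in the serial run (33227 of 50022) visited between 9 and 16 layers, and none visited more than 16.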
2024-12-12 14:51:03 +01:00
Christian Schwarz
8b477ce4ee
super hacky way to get layer visit buckets
2024-12-12 14:50:47 +01:00
Christian Schwarz
02d0d89069
run bench on hetzner box
...
christian@neon-hetzner-dev-christian:[~/src/neon]: NEON_ENV_BUILDER_USE_OVERLAYFS_FOR_SNAPSHOTS=true DEFAULT_PG_VERSION=16 BUILD_TYPE=release poetry run pytest --alluredir ~/tmp/alluredir --clean-alluredir test_runner/performance -k 'test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant' --maxfail=1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Benchmark results ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_tenants: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pgbench_scale: 136
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.duration: 20 s
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_clients: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.config: 0
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.page_cache_size: 134217728 byte
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.max_file_descriptors: 500000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.request_count: 46336
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_mean: 0.429 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p95: 0.607 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99: 0.705 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.9: 1.138 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-serial-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.99: 2.059 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_tenants: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pgbench_scale: 136
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.duration: 20 s
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_clients: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.config: 0
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.page_cache_size: 134217728 byte
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.max_file_descriptors: 500000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.request_count: 48772
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_mean: 0.408 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p95: 0.592 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99: 0.701 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.9: 1.139 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-parallel-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.99: 2.231 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_tenants: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pgbench_scale: 136
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.duration: 20 s
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.n_clients: 1
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.config: 0
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.page_cache_size: 134217728 byte
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.pageserver_config_override.max_file_descriptors: 500000
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.request_count: 47102
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_mean: 0.422 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p95: 0.605 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99: 0.705 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.9: 1.124 ms
test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant[release-pg16-direct-futures-unordered-1-1-136-20].pageserver_max_throughput_getpage_at_latest_lsn.latency_percentiles.p99.99: 2.029 ms
============================================================================================================================================================================================= 3 passed, 236 deselected in 102.19s (0:01:42) ==============================================================================================================================================================================================
2024-12-12 14:18:51 +01:00
Christian Schwarz
309edebb90
repurpose test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant to measure latency improvement in unbatchable-pagestream but parallelizable workload (multiple layers visited)
2024-12-12 14:18:42 +01:00
Christian Schwarz
80aebce3d6
bench results on my box
...
christian@neon-hetzner-dev-christian:[~/src/neon]: DEFAULT_PG_VERSION=16 BUILD_TYPE=release poetry run pytest --alluredir ~/tmp/alluredir --clean-alluredir test_runner/performance/pageserver/test_page_service_batching.py -k 'test_throughput' --maxfail=1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Benchmark results ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.time: 1.0999
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 1.0450
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.time: 1.2530
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 1.3367
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].counters.time: 1.0713
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 1.0550
test_throughput[release-pg16-50-pipelining_config2-5-futures-unordered-direct-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.2825
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.5882
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 298.1176
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.5882
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.3012
test_throughput[release-pg16-50-pipelining_config3-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4734
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.3162
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.8000
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 298.5333
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.8000
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.3227
test_throughput[release-pg16-50-pipelining_config4-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4442
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.2842
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.7647
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 298.1176
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.7647
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.3135
test_throughput[release-pg16-50-pipelining_config5-5-futures-unordered-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4740
==================================================================================================================================================================================================== 6 passed, 9 deselected in 55.08s ====================================================================================================================================================================================================
2024-12-12 13:57:24 +01:00
Christian Schwarz
71b6aa2ab7
implement futuresunordered mode
2024-12-12 13:55:35 +01:00
Christian Schwarz
e051e916b6
results on my hetzner box
...
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Benchmark results ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.time: 1.0955
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 1.0250
test_throughput[release-pg16-50-pipelining_config0-5-serial-direct-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.time: 1.1962
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 1.2700
test_throughput[release-pg16-50-pipelining_config1-5-parallel-direct-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.2611
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.5000
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 298.0556
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.5000
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.2850
test_throughput[release-pg16-50-pipelining_config2-5-serial-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4775
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.3033
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.6875
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 298.0625
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.6875
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.3075
test_throughput[release-pg16-50-pipelining_config3-5-parallel-direct-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4777
==================================================================================================================================================================================================== 4 passed, 6 deselected in 38.06s ====================================================================================================================================================================================================
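Reading aid for the numbers above: the reported `perfmetric.batching_factor` appears consistent with `pageserver_batch_size_histo_sum / pageserver_batch_size_histo_count`. This is inferred from the printed values, not confirmed from the benchmark source. A minimal sketch that recomputes it, plus the wall-clock ratio between the serial and pipelined `serial-direct` runs (the helper names are made up for illustration):

```python
def batching_factor(histo_sum: float, histo_count: float) -> float:
    """Average number of getpage requests per batch (assumed definition)."""
    return histo_sum / histo_count


def speedup(baseline_time: float, candidate_time: float) -> float:
    """How many times faster the candidate run finished than the baseline."""
    return baseline_time / candidate_time


# Values copied from the results above (config0 vs config2, serial-direct).
serial_factor = batching_factor(6403.0, 6403.0)
pipelined_factor = batching_factor(6401.5, 298.0556)
time_ratio = speedup(1.0955, 0.2611)
```

Under that assumption, non-pipelined mode always yields a factor of 1.0 because every request forms its own batch.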
2024-12-12 13:37:13 +01:00
Christian Schwarz
7f55a32edb
parametrization over direct io mode (only direct io for now)
2024-12-12 13:36:05 +01:00
Christian Schwarz
3779370f08
results on my hetzner box
...
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Benchmark results ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].counters.time: 0.7328
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 0.8850
test_throughput[release-pg16-50-pipelining_config0-5-serial-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].counters.time: 0.7545
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_sum: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].counters.pageserver_batch_size_histo_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].counters.compute_getpage_count: 6,403.0000
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].counters.pageserver_cpu_seconds_total: 0.9667
test_throughput[release-pg16-50-pipelining_config1-5-parallel-100-128-batchable {'mode': 'serial'}].perfmetric.batching_factor: 1.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.1824
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.1111
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 297.8889
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.1111
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.2196
test_throughput[release-pg16-50-pipelining_config2-5-serial-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4883
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].tablesize_mib: 50.0000 MiB
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].effective_io_concurrency: 100.0000
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].readhead_buffer_size: 128.0000
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].config: 0.0000
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.time: 0.2743
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_sum: 6,401.6667
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_batch_size_histo_count: 298.4444
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.compute_getpage_count: 6,401.6667
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].counters.pageserver_cpu_seconds_total: 0.3350
test_throughput[release-pg16-50-pipelining_config3-5-parallel-100-128-batchable {'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].perfmetric.batching_factor: 21.4501
test_latency[release-pg16-pipelining_config0-serial-{'mode': 'serial'}].latency_mean: 0.145 ms
test_latency[release-pg16-pipelining_config0-serial-{'mode': 'serial'}].latency_percentiles.p95: 0.178 ms
test_latency[release-pg16-pipelining_config0-serial-{'mode': 'serial'}].latency_percentiles.p99: 0.199 ms
test_latency[release-pg16-pipelining_config0-serial-{'mode': 'serial'}].latency_percentiles.p99.9: 0.265 ms
test_latency[release-pg16-pipelining_config0-serial-{'mode': 'serial'}].latency_percentiles.p99.99: 0.366 ms
test_latency[release-pg16-pipelining_config1-parallel-{'mode': 'serial'}].latency_mean: 0.168 ms
test_latency[release-pg16-pipelining_config1-parallel-{'mode': 'serial'}].latency_percentiles.p95: 0.201 ms
test_latency[release-pg16-pipelining_config1-parallel-{'mode': 'serial'}].latency_percentiles.p99: 0.224 ms
test_latency[release-pg16-pipelining_config1-parallel-{'mode': 'serial'}].latency_percentiles.p99.9: 0.317 ms
test_latency[release-pg16-pipelining_config1-parallel-{'mode': 'serial'}].latency_percentiles.p99.99: 0.416 ms
test_latency[release-pg16-pipelining_config2-serial-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_mean: 0.149 ms
test_latency[release-pg16-pipelining_config2-serial-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p95: 0.184 ms
test_latency[release-pg16-pipelining_config2-serial-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99: 0.205 ms
test_latency[release-pg16-pipelining_config2-serial-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.9: 0.289 ms
test_latency[release-pg16-pipelining_config2-serial-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.99: 0.359 ms
test_latency[release-pg16-pipelining_config3-parallel-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_mean: 0.180 ms
test_latency[release-pg16-pipelining_config3-parallel-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p95: 0.219 ms
test_latency[release-pg16-pipelining_config3-parallel-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99: 0.244 ms
test_latency[release-pg16-pipelining_config3-parallel-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.9: 0.341 ms
test_latency[release-pg16-pipelining_config3-parallel-{'max_batch_size': 1, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.99: 0.522 ms
test_latency[release-pg16-pipelining_config4-serial-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_mean: 0.154 ms
test_latency[release-pg16-pipelining_config4-serial-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p95: 0.189 ms
test_latency[release-pg16-pipelining_config4-serial-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99: 0.211 ms
test_latency[release-pg16-pipelining_config4-serial-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.9: 0.268 ms
test_latency[release-pg16-pipelining_config4-serial-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.99: 0.307 ms
test_latency[release-pg16-pipelining_config5-parallel-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_mean: 0.167 ms
test_latency[release-pg16-pipelining_config5-parallel-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p95: 0.211 ms
test_latency[release-pg16-pipelining_config5-parallel-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99: 0.238 ms
test_latency[release-pg16-pipelining_config5-parallel-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.9: 0.338 ms
test_latency[release-pg16-pipelining_config5-parallel-{'max_batch_size': 32, 'execution': 'concurrent-futures', 'mode': 'pipelined'}].latency_percentiles.p99.99: 0.407 ms
===================================================================================================================================================================================================== 10 passed in 120.65s (0:02:00) =====================================================================================================================================================================================================
2024-12-12 13:32:12 +01:00
Christian Schwarz
659da352a2
DO NOT MERGE trim down benchmark to what's relevant
2024-12-12 13:31:55 +01:00
Christian Schwarz
0e8ac43450
hacky parametrization of relevant benchmarks
2024-12-12 13:28:00 +01:00
Christian Schwarz
410283c9b1
serial mode without tasks
2024-12-12 12:26:15 +01:00
Christian Schwarz
4dc7434d0e
Merge remote-tracking branch 'origin/main' into vlad/read-path-concurrent-io
2024-12-10 13:22:03 +01:00
Vlad Lazar
c72e87f98d
Merge branch 'main' into vlad/read-path-concurrent-io
...
Fix merge conflict with https://github.com/neondatabase/neon/pull/9631 .
2024-11-11 11:29:38 +01:00
Vlad Lazar
dba696885d
review: do some testing of parallel IO
2024-11-04 14:42:49 +01:00
Vlad Lazar
657526a5be
review: make import scope tighter
2024-11-04 14:24:30 +01:00
Vlad Lazar
2508c64646
review: comment on end state of io concurrency
2024-11-04 14:14:21 +01:00
Vlad Lazar
34a43b0789
review: improve doc comment for planner
2024-11-04 14:12:44 +01:00
Vlad Lazar
9f95864c8a
review: remove matches! usage
2024-11-04 14:08:26 +01:00
Vlad Lazar
c446e3d3ab
chore: fixup comment
2024-11-04 14:07:20 +01:00
Vlad Lazar
6599034410
chore: clippy warn
2024-11-04 14:07:20 +01:00
Vlad Lazar
eafae92795
pageserver: make io serial by default
...
One can configure this via the NEON_PAGESERVER_VALUE_RECONSTRUCT_IO_CONCURRENCY
env var. A config option is possible as well, but it's more work and this is
enough for experimentation.
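A minimal sketch of the "serial unless the env var says otherwise" behaviour. The accepted values `serial` and `parallel` are an assumption borrowed from the benchmark test IDs elsewhere in this log; the real parsing lives in the pageserver and may differ:

```python
import os

ENV_VAR = "NEON_PAGESERVER_VALUE_RECONSTRUCT_IO_CONCURRENCY"
# Assumed value names, not confirmed by this commit message.
ASSUMED_MODES = ("serial", "parallel")


def io_concurrency_mode(environ=os.environ) -> str:
    """Return the configured mode, defaulting to serial when unset."""
    raw = environ.get(ENV_VAR)
    if raw is None:
        return "serial"
    if raw not in ASSUMED_MODES:
        raise ValueError(f"invalid {ENV_VAR}: {raw!r}")
    return raw
```

Passing the environment as a parameter keeps the helper easy to exercise without mutating the process environment.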
2024-11-04 14:07:20 +01:00
Vlad Lazar
1cbbc35957
pageserver: fixup some tests using low level read apis
2024-11-04 14:07:20 +01:00
Vlad Lazar
cc57b34771
pageserver: reinstate inspect_image_layers
2024-11-04 14:07:20 +01:00
Vlad Lazar
6da0b5fac1
pageserver: further simplify read path error handling
2024-11-04 14:07:20 +01:00
Vlad Lazar
11990ef2fe
pageserver: use correct will_init for streaming planner
...
`BlobMeta::will_init` is not actually used on these code paths,
but let's be kind to our future selves and make sure it's correct.
2024-11-04 14:07:20 +01:00
Vlad Lazar
0a840fbdbd
pageserver: purge read path caching support
...
We now only store indices in the page cache.
This commit removes any caching support from the read path.
2024-11-04 14:07:20 +01:00
Vlad Lazar
0ad736e2a6
pageserver: clean up comms between pending IOs and awaiter
...
Previously, each pending IO sent a raw buffer containing exactly what it
read from the layer file for the key. This made the awaiter code
confusing because on-disk images in layer files don't keep the enum wrapper,
but the ones in delta layers do.
This commit introduces a type to make this a bit easier and cleans up
the IO awaiting code a bit. We also avoid a rather silly
serialize/deserialize dance.
2024-11-04 14:07:20 +01:00
Vlad Lazar
bde1694cb9
pageserver: remove unused on_key_error method
...
It's obvious the method is unused, but let's break down error handling
of the read path. Before this patch set, all IO was done sequentially
for a given read. If one IO failed, then the error would stop the
processing of the read path.
Now that we are doing IO concurrently when serving a read request
it's not trivial to implement the same error handling approach.
As of this commit, one IO failure does not stop any other IO requests.
When waiting for the IOs to complete, we stop waiting on the first
failure, but we do not signal any other pending IOs to complete and
they will just fail silently.
Long term, we need a better approach for this. Two broad ideas:
1. Introduce some synchronization between pending IO tasks such
that new IOs are not issued after the first failure
2. Cancel any pending IOs when the first error is discovered
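The two ideas can be sketched roughly as follows. This is an illustration in Python's asyncio with simulated IO, not the pageserver's Rust implementation; `IoError`, `do_io`, and both driver functions are hypothetical names:

```python
import asyncio


class IoError(Exception):
    """Hypothetical error type standing in for a failed layer-file read."""


async def do_io(key, should_fail, completed):
    """Simulated IO: failures surface quickly, successes take longer."""
    if should_fail:
        await asyncio.sleep(0.001)
        raise IoError(f"read for key {key} failed")
    await asyncio.sleep(0.1)
    completed.append(key)
    return key


async def stop_issuing_after_failure(keys, failing):
    """Idea 1: a shared flag prevents new IOs once one has failed."""
    failed = asyncio.Event()
    completed, issued, tasks = [], [], []

    async def guarded(key):
        try:
            return await do_io(key, key in failing, completed)
        except IoError:
            failed.set()
            raise

    for key in keys:
        if failed.is_set():
            break  # do not issue further IOs
        issued.append(key)
        tasks.append(asyncio.create_task(guarded(key)))
        await asyncio.sleep(0.02)  # stand-in for planning the next IO
    await asyncio.gather(*tasks, return_exceptions=True)
    return issued


async def cancel_on_first_error(keys, failing):
    """Idea 2: on the first error, cancel every IO still in flight."""
    completed = []
    tasks = [asyncio.create_task(do_io(k, k in failing, completed)) for k in keys]
    try:
        await asyncio.gather(*tasks)
        return None, completed
    except IoError as err:
        for task in tasks:
            task.cancel()
        # Wait for the cancellations to take effect before returning.
        await asyncio.gather(*tasks, return_exceptions=True)
        return err, completed
```

Idea 1 only limits how much extra work is started; IOs already in flight still run to completion. Idea 2 stops in-flight work as well, at the cost of every IO needing to be cancellation-safe.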
2024-11-04 14:07:20 +01:00
Vlad Lazar
e3aa85d722
pageserver: do concurrent IO on the read path
...
Previously, the read path would wait for all IO in one layer visit to
complete before visiting the next layer (if a subsequent visit is
required). IO within one layer visit was also sequential.
With this patch we gain the ability to issue IO concurrently within one
layer visit **and** to move on to the next layer without waiting for IOs
from the previous visit to complete.
This is a slightly cleaned up version of the work done at the Lisbon
hackathon.
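To illustrate the difference in shape, here is a small Python asyncio sketch with simulated IO. The layer names, `read_blob`, and the plan structure are invented for the example and do not mirror the pageserver's Rust types:

```python
import asyncio


async def read_blob(layer: str, key: int):
    """Stand-in for one IO against a layer file (hypothetical)."""
    await asyncio.sleep(0.01)
    return (layer, key)


async def sequential_read(plan):
    """Old shape: every IO finishes before the next one is issued."""
    results = []
    for layer, keys in plan:
        for key in keys:
            results.append(await read_blob(layer, key))
    return results


async def concurrent_read(plan):
    """New shape: issue IOs per layer visit, move on, await at the end."""
    pending = []
    for layer, keys in plan:
        for key in keys:
            pending.append(asyncio.create_task(read_blob(layer, key)))
        # No await here: the next layer visit starts while IOs are in flight.
    return list(await asyncio.gather(*pending))


def run_demo():
    plan = [("delta-2", [1, 2]), ("delta-1", [2]), ("image-0", [1, 2])]
    return asyncio.run(sequential_read(plan)), asyncio.run(concurrent_read(plan))
```

Both variants return the same results in the same order; only the amount of overlap between IOs differs.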
2024-11-04 14:07:17 +01:00