rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-16 09:52:54 +00:00

Author	SHA1	Message	Date
Christian Schwarz	cf75eb7d86	Revert "hacky experiment: what if we had more walredo procs => doesn't move the needle on throughput" This reverts commit `9fffe6e60d`.	2025-01-16 16:46:49 +01:00
Christian Schwarz	6ededa17e2	Revert "experiment: buffered socket with 128k buffer size; not super needle-moving" This reverts commit `7e13e5fc4a`.	2025-01-16 16:42:10 +01:00
Christian Schwarz	7e13e5fc4a	experiment: buffered socket with 128k buffer size; not super needle-moving	2025-01-16 16:42:01 +01:00
Christian Schwarz	45358bcb65	in the deepl_layers_with_delta script, make the stack height an argument	2025-01-16 16:41:15 +01:00
Christian Schwarz	9fffe6e60d	hacky experiment: what if we had more walredo procs => doesn't move the needle on throughput	2025-01-16 13:58:23 +01:00
Christian Schwarz	2ff0a4ae82	extract the l0stack generator into a reusable python module	2025-01-16 13:24:34 +01:00
Christian Schwarz	5b77a6d3ce	address clippy	2025-01-15 19:38:21 +01:00
Christian Schwarz	8c5005ff59	rename IoConcurrency::{todo=>serial} and remove deprecation warning	2025-01-15 19:38:05 +01:00
Christian Schwarz	f8218ac5fc	Revert "investigation: add log_if_slow => shows that the io_futures are slow" This reverts commit `e81fa7137e`.	2025-01-15 19:34:37 +01:00
Christian Schwarz	40470c66cd	remove opportunistic poll, it seems slightly beneficial for perf; esp. before I remembered to configure pipelining, the unpipelined configuration achieved ~10% higher tput. In any case, it makes sense not to do the opportunistic polling because it registers the wrong waker.	2025-01-15 19:34:05 +01:00
Christian Schwarz	9b9479881a	extend script with instructions to configure batching	2025-01-15 19:30:15 +01:00
Christian Schwarz	af11b201bd	now the issue is no longer reproducible, maybe it was the barriers?	2025-01-15 19:10:45 +01:00
Christian Schwarz	8fafff37c5	remove the whole barriers business	2025-01-15 19:00:00 +01:00
Christian Schwarz	e81fa7137e	investigation: add log_if_slow => shows that the io_futures are slow	2025-01-15 18:56:07 +01:00
Christian Schwarz	e60738f029	it's reproducible before the merge, so, continuing to investigate and fix here	2025-01-15 18:43:01 +01:00
Christian Schwarz	f75b07a160	I find that if I ever go beyond queue-depth=4, something in the pageserver locks up.	2025-01-15 18:31:40 +01:00
Christian Schwarz	a5524fcf4d	add comment to use queue-depthed pagebench to the script	2025-01-15 18:31:29 +01:00
Christian Schwarz	351da2349e	Merge branch 'problame/hung-shutdown/fix' into vlad/read-path-concurrent-io	2025-01-15 17:09:02 +01:00
Christian Schwarz	c545d227b9	review doc comment	2025-01-15 16:24:39 +01:00
Christian Schwarz	a4fc6a92c9	fix cargo doc	2025-01-15 16:10:04 +01:00
Christian Schwarz	2205736262	doc comment & one fixup	2025-01-15 14:27:08 +01:00
Christian Schwarz	5f9ddbae2f	Merge branch 'problame/hung-shutdown/demo-hypothesis' into problame/hung-shutdown/fix	2025-01-15 00:25:11 +01:00
Christian Schwarz	173f18832c	fixup	2025-01-15 00:24:59 +01:00
Christian Schwarz	23bd5833e1	Merge branch 'problame/hung-shutdown/demo-hypothesis' into problame/hung-shutdown/fix	2025-01-15 00:21:54 +01:00
Christian Schwarz	dedd524d7e	refinements	2025-01-15 00:21:28 +01:00
Christian Schwarz	0340f00228	post-merge fix the handling of the new pagestream Test message, so that the regression test now passes: `BUILD_TYPE=debug DEFAULT_PG_VERSION=16 poetry run pytest ./test_runner/regress/test_page_service_batching_regressions.py --timeout=0 --pdb`	2025-01-14 23:56:35 +01:00
Christian Schwarz	366ff9ffcc	Merge branch 'problame/hung-shutdown/demo-hypothesis' into problame/hung-shutdown/fix	2025-01-14 23:51:53 +01:00
Christian Schwarz	a8f9b564be	fix `cd pageserver && cargo clippy --features testing` build	2025-01-14 23:50:22 +01:00
Christian Schwarz	5450e54dab	bump ci	2025-01-14 22:47:16 +01:00
Christian Schwarz	53b05c4ba0	cleanups to make CI pass (well, fail because the bug isn't fixed yet)	2025-01-14 22:45:09 +01:00
Christian Schwarz	1f7d173235	Merge remote-tracking branch 'origin/main' into problame/hung-shutdown/demo-hypothesis	2025-01-14 22:33:20 +01:00
Christian Schwarz	8454e19a0f	address warnings and such	2025-01-14 22:28:08 +01:00
Christian Schwarz	45e08d0aa5	it repros	2025-01-14 22:16:27 +01:00
Erik Grinaker	6debb49b87	pageserver: coalesce index uploads when possible (#10248) ## Problem With upload queue reordering in #10218, we can easily get into a situation where multiple index uploads are queued back to back, which can't be parallelized. This will happen e.g. when multiple layer flushes enqueue layer/index/layer/index/... and the layers skip the queue and are uploaded in parallel. These index uploads will incur serial S3 roundtrip latencies, and may block later operations. Touches #10096. ## Summary of changes When multiple back-to-back index uploads are ready to upload, only upload the most recent index and drop the rest.	2025-01-14 21:10:17 +00:00
Christian Schwarz	9a02bc0cfd	try to repro root cause hypothesis for https://github.com/neondatabase/neon/issues/10309 This approach here doesn't work because it slows down all the responses. the workload() thread gets stuck in auth, prob with zero pipeline depth 0x00007fa28fe48e63 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) bt #0 0x00007fa28fe48e63 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x0000561a285bf44e in WaitEventSetWaitBlock (set=0x561a292723e8, cur_timeout=9999, occurred_events=0x7ffd1f11c970, nevents=1) at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/ipc/latch.c:1535 #2 0x0000561a285bf338 in WaitEventSetWait (set=0x561a292723e8, timeout=9999, occurred_events=0x7ffd1f11c970, nevents=1, wait_event_info=117440512) at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/ipc/latch.c:1481 #3 0x00007fa2904a7345 in call_PQgetCopyData (shard_no=0, buffer=0x7ffd1f11cad0) at /home/christian/src/neon//pgxn/neon/libpagestore.c:703 #4 0x00007fa2904a7aec in pageserver_receive (shard_no=0) at /home/christian/src/neon//pgxn/neon/libpagestore.c:899 #5 0x00007fa2904af471 in prefetch_read (slot=0x561a292863b0) at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:644 #6 0x00007fa2904af26b in prefetch_wait_for (ring_index=0) at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:596 #7 0x00007fa2904b489d in neon_read_at_lsnv (rinfo=..., forkNum=MAIN_FORKNUM, base_blockno=0, request_lsns=0x7ffd1f11cd60, buffers=0x7ffd1f11cd30, nblocks=1, mask=0x0) at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:3024 #8 0x00007fa2904b4f34 in neon_read_at_lsn (rinfo=..., forkNum=MAIN_FORKNUM, blkno=0, request_lsns=..., buffer=0x7fa28b969000) at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:3104 #9 0x00007fa2904b515d in neon_read (reln=0x561a292ef448, forkNum=MAIN_FORKNUM, blkno=0, buffer=0x7fa28b969000) at /home/christian/src/neon//pgxn/neon/pagestore_smgr.c:3146 #10 0x0000561a285f1ed5 in smgrread 
(reln=0x561a292ef448, forknum=MAIN_FORKNUM, blocknum=0, buffer=0x7fa28b969000) at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/smgr/smgr.c:567 #11 0x0000561a285a528a in ReadBuffer_common (smgr=0x561a292ef448, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=0, mode=RBM_NORMAL, strategy=0x0, hit=0x7ffd1f11cf1b) at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/buffer/bufmgr.c:1134 #12 0x0000561a285a47e3 in ReadBufferExtended (reln=0x7fa28ce1c888, forkNum=MAIN_FORKNUM, blockNum=0, mode=RBM_NORMAL, strategy=0x0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/buffer/bufmgr.c:781 #13 0x0000561a285a46ad in ReadBuffer (reln=0x7fa28ce1c888, blockNum=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/storage/buffer/bufmgr.c:715 #14 0x0000561a2811d511 in _bt_getbuf (rel=0x7fa28ce1c888, blkno=0, access=1) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/nbtree/nbtpage.c:852 #15 0x0000561a2811d1b2 in _bt_metaversion (rel=0x7fa28ce1c888, heapkeyspace=0x7ffd1f11d9f0, allequalimage=0x7ffd1f11d9f1) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/nbtree/nbtpage.c:747 #16 0x0000561a28126220 in _bt_first (scan=0x561a292d0348, dir=ForwardScanDirection) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/nbtree/nbtsearch.c:1465 #17 0x0000561a28121a07 in btgettuple (scan=0x561a292d0348, dir=ForwardScanDirection) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/nbtree/nbtree.c:246 #18 0x0000561a28111afa in index_getnext_tid (scan=0x561a292d0348, direction=ForwardScanDirection) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/index/indexam.c:583 #19 0x0000561a28111d14 in index_getnext_slot (scan=0x561a292d0348, direction=ForwardScanDirection, slot=0x561a292d01a8) at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/index/indexam.c:675 #20 0x0000561a2810fbcc in systable_getnext (sysscan=0x561a292d0158) 
at /home/christian/src/neon//vendor/postgres-v16/src/backend/access/index/genam.c:512 #21 0x0000561a287a1ee1 in SearchCatCacheMiss (cache=0x561a292a0f80, nkeys=1, hashValue=3028463561, hashIndex=1, v1=94670359561576, v2=0, v3=0, v4=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/catcache.c:1440 #22 0x0000561a287a1d8a in SearchCatCacheInternal (cache=0x561a292a0f80, nkeys=1, v1=94670359561576, v2=0, v3=0, v4=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/catcache.c:1360 #23 0x0000561a287a1a4f in SearchCatCache (cache=0x561a292a0f80, v1=94670359561576, v2=0, v3=0, v4=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/catcache.c:1214 #24 0x0000561a287be060 in SearchSysCache (cacheId=10, key1=94670359561576, key2=0, key3=0, key4=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/syscache.c:817 #25 0x0000561a287be66f in GetSysCacheOid (cacheId=10, oidcol=1, key1=94670359561576, key2=0, key3=0, key4=0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/cache/syscache.c:1055 #26 0x0000561a286319a5 in get_role_oid (rolname=0x561a29270568 "cloud_admin", missing_ok=true) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/adt/acl.c:5251 #27 0x0000561a283d42ca in check_hba (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/libpq/hba.c:2493 #28 0x0000561a283d5537 in hba_getauthmethod (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/libpq/hba.c:3067 #29 0x0000561a283c6fd7 in ClientAuthentication (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/libpq/auth.c:395 #30 0x0000561a287dc943 in PerformAuthentication (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/init/postinit.c:247 #31 0x0000561a287dd9cd in InitPostgres (in_dbname=0x561a29270588 "postgres", dboid=0, username=0x561a29270568 "cloud_admin", 
useroid=0, load_session_libraries=true, override_allow_connections=false, out_dbname=0x0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/utils/init/postinit.c:929 #32 0x0000561a285fa10b in PostgresMain (dbname=0x561a29270588 "postgres", username=0x561a29270568 "cloud_admin") at /home/christian/src/neon//vendor/postgres-v16/src/backend/tcop/postgres.c:4293 #33 0x0000561a28524ce4 in BackendRun (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/postmaster/postmaster.c:4465 #34 0x0000561a285245da in BackendStartup (port=0x561a29268de0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/postmaster/postmaster.c:4193 #35 0x0000561a285209c4 in ServerLoop () at /home/christian/src/neon//vendor/postgres-v16/src/backend/postmaster/postmaster.c:1782 #36 0x0000561a2852030f in PostmasterMain (argc=3, argv=0x561a291c5fc0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/postmaster/postmaster.c:1466 #37 0x0000561a283dd987 in main (argc=3, argv=0x561a291c5fc0) at /home/christian/src/neon//vendor/postgres-v16/src/backend/main/main.c:238	2025-01-14 20:42:01 +01:00
Erik Grinaker	e58e29e639	pageserver: limit number of upload queue tasks (#10384) ## Problem The upload queue can currently schedule an arbitrary number of tasks. This can both spawn an unbounded number of Tokio tasks, and also significantly slow down upload queue scheduling as it's quadratic in number of operations. Touches #10096. ## Summary of changes Limit the number of inprogress tasks to the remote storage upload concurrency. While this concurrency limit is shared across all tenants, there's certainly no point in scheduling more than this -- we could even consider setting the limit lower, but don't for now to avoid artificially constraining tenants.	2025-01-14 18:01:14 +00:00
Heikki Linnakangas	d36112d20f	Simplify compute dockerfile by setting PATH just once (#10357) By setting PATH in the 'pg-build' layer, all the extension build layers will inherit. No need to pass PG_CONFIG to all the various make invocations either: once pg_config is in PATH, the Makefiles will pick it up from there.	2025-01-14 17:02:35 +00:00
Erik Grinaker	ffaa52ff5d	pageserver: reorder upload queue when possible (#10218) ## Problem The upload queue currently sees significant head-of-line blocking. For example, index uploads act as upload barriers, and for every layer flush we schedule a layer and index upload, which effectively serializes layer uploads. Resolves #10096. ## Summary of changes Allow upload queue operations to bypass the queue if they don't conflict with preceding operations, increasing parallelism. NB: the upload queue currently schedules an explicit barrier after every layer flush as well (see #8550). This must be removed to enable parallelism. This will require a better mechanism for compaction backpressure, see e.g. #8390 or #5415.	2025-01-14 16:31:59 +00:00
John Spray	aa7323a384	storage controller: quality of life improvements for AZ handling (#10379) ## Problem Since https://github.com/neondatabase/neon/pull/9916, the preferred AZ of a tenant is much more impactful, and we would like to make it more visible in tooling. ## Summary of changes - Include AZ in node describe API - Include AZ info in node & tenant outputs in CLI - Add metrics for per-node shard counts, labelled by AZ - Add a CLI for setting preferred AZ on a tenant - Extend AZ-setting API+CLI to handle None for clearing preferred AZ	2025-01-14 15:30:43 +00:00
Christian Schwarz	2466a2f977	page_service: throttle individual requests instead of the batched request (#10353) ## Problem Before this PR, the pagestream throttle was applied weighted on a per-batch basis. This had several problems: 1. The throttle occurrence counters were only bumped by `1` instead of `batch_size`. 2. The throttle wait time aggregator metric only counted one wait time, irrespective of `batch_size`. That makes sense in some ways of looking at it but not in others. 3. If the last request in the batch runs into the throttle, the other requests in the batch are also throttled, i.e., over-throttling happens (theoretical, didn't measure it in practice). ## Solution It occurred to me that we can simply push the throttling upwards into `pagestream_read_message`. This has the added benefit that in pipeline mode, the `executor` stage will, if it is idle, steal whatever requests already made it into the `spsc_fold` and execute them; before this change, that was not the case - the throttling happened in the `executor` stage instead of the `batcher` stage. ## Code Changes There are four changes in this PR: 1. Lifting up the throttling into the `pagestream_read_message` method. 2. Move the throttling metrics out of the `Throttle` type into `SmgrOpMetrics`. Unlike the other smgr metrics, throttling is per-tenant, hence the Arc. 3. Refactor the `SmgrOpTimer` implementation to account for the new observation states, and simplify its design. 4. Drive-by-fix flush time metrics. It was using the same `now` in the `observe_guard` every time. The `SmgrOpTimer` is now a state machine. Each observation point moves the state machine forward. If a timer object is dropped early some "pair"-like metrics still require an increment or observation. That's done in the Drop implementation, by driving the state machine to completion.	2025-01-14 15:28:01 +00:00
Alex Chi Z.	9bdb14c1c0	fix(pageserver): ensure initial image layers have correct key ranges (#10374) ## Problem Discovered during the relation dir refactor work. If we do not create images as in this patch, we would get two sets of image layers: ``` 0000...METADATA_KEYS 0000...REL_KEYS ``` They overlap at the same LSN and would cause data loss for relation keys. This doesn't happen in prod because initial image layer generation is never called, but it is better to fix it to avoid future issues with the reldir refactors. ## Summary of changes * Consolidate create_image_layers call into a single one. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-14 15:27:48 +00:00
Christian Schwarz	9e1cd986d7	address "why mut" nits; https://github.com/neondatabase/neon/pull/10353#discussion_r1913685991 https://github.com/neondatabase/neon/pull/10353#discussion_r1913683065 https://github.com/neondatabase/neon/pull/10353#discussion_r1913683392	2025-01-14 15:30:33 +01:00
Christian Schwarz	47544dcc0b	simplify SmgrOpTimerState variant names, and add some doc comments; https://github.com/neondatabase/neon/pull/10353#discussion_r1913676569 and https://github.com/neondatabase/neon/pull/10353#discussion_r1913676824	2025-01-14 15:28:09 +01:00
Christian Schwarz	4e094d9638	rearrange code & inline HandleInner::shutdown() to minimize the diff	2025-01-14 15:12:46 +01:00
Christian Schwarz	c8bee86586	in some early WIP commit we had removed the loop{} inside get(); re-establish it one level down	2025-01-14 15:12:03 +01:00
Christian Schwarz	768a867dcf	doc comment fix	2025-01-14 14:54:15 +01:00
Christian Schwarz	3b65465e10	turns out with the switch to sync Mutex there's no reason for upgrade() to be async either	2025-01-14 14:53:41 +01:00
Christian Schwarz	e4ea706424	turns out PerTimelineState::shutdown() doesn't need to be async	2025-01-14 14:48:03 +01:00
Christian Schwarz	7034f54a9e	remove the earlier-commented-out assertions on arc reference counts, they were too whiteboxy to begin with	2025-01-14 14:45:25 +01:00
Christian Schwarz	d68c5ddf7e	avoid the tokio::sync::Mutex by wrapping the GateGuard into an Arc	2025-01-14 14:23:28 +01:00
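
The index-upload coalescing described in commit 6debb49b87 above ("only upload the most recent index and drop the rest") can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `UploadOp` enum, its payloads, and `coalesce_index_uploads` are hypothetical names, not the pageserver's actual upload queue types, and the real change also has to consider which operations are ready to upload, which this sketch ignores.

```rust
// Hypothetical stand-in for an upload queue entry; NOT the pageserver's real type.
#[derive(Debug, Clone, PartialEq)]
enum UploadOp {
    /// Upload of a layer file, identified here by a made-up name.
    Layer(String),
    /// Upload of the index, identified here by a made-up version number.
    Index(u64),
}

/// Collapse every run of back-to-back index uploads into its most recent entry.
/// Layer uploads are kept as-is and keep their relative order.
fn coalesce_index_uploads(queue: Vec<UploadOp>) -> Vec<UploadOp> {
    let mut out: Vec<UploadOp> = Vec::with_capacity(queue.len());
    for op in queue {
        let prev_is_index = matches!(out.last(), Some(UploadOp::Index(_)));
        if prev_is_index && matches!(op, UploadOp::Index(_)) {
            // The previous entry is an older index upload that the
            // newer one supersedes, so drop it.
            out.pop();
        }
        out.push(op);
    }
    out
}
```

Under these assumptions, a queue of layer `a`, index 1, index 2, index 3, layer `b`, index 4 coalesces to layer `a`, index 3, layer `b`, index 4: the run of three index uploads collapses to the newest one, while the index upload separated from it by a layer upload is left alone.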

7037 Commits