mirror of
https://github.com/neondatabase/neon.git
synced 2026-01-08 05:52:55 +00:00
Story on why: The apply_wal_records() function spawned the special postgres process to perform WAL redo. That was done in a blocking fashion: it launched the process, wrote the command to its stdin, then read the result from its stdout. I wanted to also read the child process's stderr and forward it to the page server's log (which is just the page server's stderr ATM). That has classic potential for deadlock: the child process might block trying to write to stderr/stdout if the parent isn't reading them. So the parent needs to perform the reads and writes on the child's stdin/stdout/stderr in an async fashion, and I refactored the code in walredo.c into async style.

But then it started to hang. It took me a while to figure out why; async makes for really ugly stack traces, and it's hard to see what's going on. The call path goes like this:

Page service -> get_page_at_lsn() in page cache -> apply_wal_records()

The page service is written in async style, and I refactored apply_wal_records() to also be async. BUT get_page_at_lsn() acquires a lock in a blocking fashion. The lock-up happened like this:

- A GetPage@LSN request arrives. The async handler thread calls get_page_at_lsn(), which acquires a lock. While holding the lock, it calls apply_wal_records().
- apply_wal_records() launches the child process and waits on it using async functions.
- More GetPage@LSN requests arrive. They also call get_page_at_lsn(), but because the lock is already held, they all block.

The subsequent GetPage@LSN calls that block waiting on the lock use up all the async handler threads. With all the threads locked up, there is no one left to make progress on the apply_wal_records() call, so it never releases the lock. Deadlock.

So my lesson here is that mixing async and blocking styles is painful. Googling around, this is a well-known problem; there are long philosophical discussions on "what color is your function".
My plan to fix that is to move the WAL redo into a separate thread or thread pool, and have the GetPage@LSN handlers communicate with it using channels. Having a separate thread pool for it makes sense anyway in the long run: we'll want to keep the postgres process around, rather than launch it anew every time we need to reconstruct a page. Also, when we're not busy reconstructing pages that are needed right now by GetPage@LSN calls, we want to proactively apply incoming WAL records from a "backlog".

Solution: Launch a dedicated thread for WAL redo at startup. It has an event loop, where it listens on a channel for requests to apply WAL. When a page server thread needs some WAL to be applied, it sends the request on the channel and waits for the response. When it's done, the WAL redo thread puts the new page image in the page cache and wakes up the requesting thread using a condition variable. This also needed locking changes in the page cache: each cache entry now has a reference counter and a dedicated Mutex to protect just that entry.
25 lines
872 B
TOML
[package]
name = "pageserver"
version = "0.1.0"
authors = ["Stas Kelvich <stas@zenith.tech>"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
crossbeam-channel = "0.5.0"
rand = "0.8.3"
regex = "1.4.5"
bytes = "1.0.1"
byteorder = "1.4.3"
futures = "0.3.13"
lazy_static = "1.4.0"
log = "0.4.14"
stderrlog = "0.5.1"
rust-s3 = { git = "https://github.com/hlinnaka/rust-s3", features = ["no-verify-ssl"] }
tokio = { version = "1.3.0", features = ["full"] }
tokio-stream = { version = "0.1.4" }
tokio-postgres = { git = "https://github.com/kelvich/rust-postgres", branch = "replication_rebase" }
postgres-protocol = { git = "https://github.com/kelvich/rust-postgres", branch = "replication_rebase" }
postgres = { git = "https://github.com/kelvich/rust-postgres", branch = "replication_rebase" }