rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-03 19:42:55 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	1367332447	Separate walkeeper and pageserver sources into different directories. The integration tests, which depend on both walkeeper and pageserver, are moved into yet another directory, 'integration_tests'.	2021-04-06 13:15:26 +03:00
Konstantin Knizhnik	13f507f0b4	Calculate records CRC in wal decoder	2021-04-02 10:30:56 +03:00
Konstantin Knizhnik	02ca245081	Port wal_acceptor to rust	2021-04-02 10:30:56 +03:00
anastasia	853c130ff0	[issue #7 ] CLI parse subcommands	2021-03-29 15:59:28 +03:00
Stas Kelvich	8a80d055b9	daemon mode for pageserver	2021-03-29 15:59:28 +03:00
Heikki Linnakangas	4c0be32bf5	Implement Text User Interface to show log streams in multiple "windows". Switch to 'slog' crate for logging; it gives us the flexibility that we need for the widget to scroll logs on TUI.	2021-03-29 15:59:28 +03:00
Stas Kelvich	9e89c1e2cd	add CLI options to pageserver	2021-03-29 15:59:28 +03:00
Heikki Linnakangas	303a546aba	Refactor locking in page cache, and use async I/O for WAL redo. Story on why: The apply_wal_records() function spawned the special postgres process to perform WAL redo. That was done in a blocking fashion: it launches the process, then it writes the command to its stdin, then it reads the result from its stdout. I wanted to also read the child process's stderr, and forward it to the page server's log (which is just the page server's stderr ATM). That has classic potential for deadlock: the child process might block trying to write to stderr/stdout, if the parent isn't reading it. So the parent needs to perform the read/write with the child's stdin/stdout/stderr in an async fashion. So I refactored the code in walredo.c into async style. But it started to hang. It took me a while to figure it out; async makes for really ugly stacktraces, it's hard to figure out what's going on. The call path goes like this: Page service -> get_page_at_lsn() in page cache -> apply_wal_records(). The page service is written in async style. And I refactored apply_wal_records() to also be async. BUT, get_page_at_lsn() acquires a lock, in a blocking fashion. The lock-up happened like this: (1) A GetPage@LSN request arrives. The async handler thread calls get_page_at_lsn(), which acquires a lock. While holding the lock, it calls apply_wal_records(). (2) apply_wal_records() launches the child process, and waits on it using async functions. (3) More GetPage@LSN requests arrive. They also call get_page_at_lsn(). But because the lock is already held, they all block. The subsequent GetPage@LSN calls that block waiting on the lock use up all the async handler threads. All the threads are locked up, so there is no one left to make progress on the apply_wal_records() call, so it never releases the lock. Deadlock. So my lesson here is that mixing async and blocking styles is painful. Googling around, this is a well known problem, there are long philosophical discussions on "what color is your function". My plan to fix that is to move the WAL redo into a separate thread or thread pool, and have the GetPage@LSN handlers communicate with it using channels. Having a separate thread pool for it makes sense anyway in the long run. We'll want to keep the postgres process around, rather than launch it separately every time we need to reconstruct a page. Also, when we're not busy reconstructing pages that are needed right now by GetPage@LSN calls, we want to proactively apply incoming WAL records from a "backlog". Solution: Launch a dedicated thread for WAL redo at startup. It has an event loop, where it listens on a channel for requests to apply WAL. When a page server thread needs some WAL to be applied, it sends the request on the channel, and waits for response. After it's done, the WAL redo process puts the new page image in the page cache, and wakes up the requesting thread using a condition variable. This also needed locking changes in the page cache. Each cache entry now has a reference counter and a dedicated Mutex to protect just the entry.	2021-03-29 15:59:28 +03:00
Heikki Linnakangas	e7694a1d5a	page server: use logger module	2021-03-29 15:59:28 +03:00
Stas Kelvich	851e910324	WIP: local control plane	2021-03-29 15:59:28 +03:00
Heikki Linnakangas	8bb282dcad	page server: Restore base backup from S3 at page server startup. This includes a "launch.sh" script that I've been using to initialize and launch the Postgres + Page Server combination.	2021-03-29 15:59:28 +03:00
Heikki Linnakangas	af7ebb6395	Implement WAL redo. When a page is requested from the page cache (GetPage@LSN), launch postgres in special "WAL redo" mode to reconstruct that page version. Plus a bunch of misc fixes to the WAL decoding code.	2021-03-29 15:59:27 +03:00
Stas Kelvich	626b4e9987	basic support of postgres backend protocol	2021-03-29 15:59:27 +03:00
Heikki Linnakangas	3058021ca7	WIP: beginnings of page server page cache	2021-03-29 15:57:15 +03:00
Heikki Linnakangas	9a9480e8c9	Add WIP support for decoding WAL records.	2021-03-29 15:57:15 +03:00
Stas Kelvich	c856a2f2d2	pageserver stub	2021-03-29 15:57:15 +03:00

16 Commits
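
The design described in commit 303a546aba (a dedicated WAL redo thread that listens on a channel and wakes the requester through a condition variable) might be sketched roughly as follows. This is not the code from that commit: it uses only the Rust standard library, and the names `CacheEntry`, `RedoRequest`, `spawn_wal_redo_thread`, and `get_page` are hypothetical. The `apply_wal_records` function here is a stand-in that merely concatenates the record bytes, where the real one drives a postgres process in WAL redo mode.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical cache entry: its own Mutex protects just this entry,
// and the Condvar signals that the page image has been filled in.
struct CacheEntry {
    page_image: Mutex<Option<Vec<u8>>>,
    ready: Condvar,
}

// Hypothetical request sent to the redo thread. The Arc plays the role
// of the per-entry reference counter mentioned in the commit message.
struct RedoRequest {
    entry: Arc<CacheEntry>,
    wal_records: Vec<Vec<u8>>,
}

// Stand-in for real WAL redo (assumption): concatenates the record bytes.
fn apply_wal_records(records: &[Vec<u8>]) -> Vec<u8> {
    records.concat()
}

// Launch the dedicated redo thread and return the sending half of its channel.
fn spawn_wal_redo_thread() -> Sender<RedoRequest> {
    let (tx, rx) = channel::<RedoRequest>();
    thread::spawn(move || {
        // Event loop: ends when every Sender has been dropped.
        for req in rx {
            let image = apply_wal_records(&req.wal_records);
            let mut slot = req.entry.page_image.lock().unwrap();
            *slot = Some(image);
            // Wake the thread(s) waiting on this entry.
            req.entry.ready.notify_all();
        }
    });
    tx
}

// Called by a request handler: enqueue the work, then block on the
// entry's condition variable without holding any cache-wide lock.
fn get_page(redo: &Sender<RedoRequest>, wal_records: Vec<Vec<u8>>) -> Vec<u8> {
    let entry = Arc::new(CacheEntry {
        page_image: Mutex::new(None),
        ready: Condvar::new(),
    });
    redo.send(RedoRequest {
        entry: Arc::clone(&entry),
        wal_records,
    })
    .expect("WAL redo thread has exited");

    let mut slot = entry.page_image.lock().unwrap();
    while slot.is_none() {
        slot = entry.ready.wait(slot).unwrap();
    }
    let image = slot.take().unwrap();
    image
}
```

A caller would create the channel once at startup with `spawn_wal_redo_thread()` and pass the returned `Sender` to each handler, which then calls `get_page`. Because a waiting handler holds only the lock of its own entry, and the condition variable releases that lock while waiting, other requests are not blocked behind an in-progress redo.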