Switch over to a newer version of rust-postgres PR752. A few
minor changes are required:
- PgLsn::UNDEFINED -> PgLsn::from(0)
- PgTimestamp -> SystemTime
Our builds can be a little inconsistent, because Cargo doesn't deal well
with workspaces where different crates have dependencies that select
different features: the resolved feature set can depend on which crates you
are building, which causes rebuilds. As a workaround, copy what
other big Rust projects do: add a workspace_hack crate.
This crate just pins down a set of dependencies and features that
satisfies all of the workspace crates.
The benefits are:
- running `cargo build` from one of the workspace subdirectories now
works without rebuilding anything.
- running `cargo install` works (without rebuilding anything).
- making small dependency changes is much less likely to trigger large
dependency rebuilds.
A few things that Eric commented on in PR #96:
- Use thiserror to simplify the implementation of FilePathError (see the
  sketch after this list)
- Add unit tests
- Fix a few complaints from clippy
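As an illustration of the thiserror change, the FilePathError type might look
roughly like this (the variants here are assumptions, not the actual code):

    use thiserror::Error;

    // Sketch only: the real FilePathError variants may differ.
    #[derive(Error, Debug)]
    pub enum FilePathError {
        #[error("invalid relation data file name: '{0}'")]
        InvalidFileName(String),
        // thiserror generates the From impl and Display forwarding for us.
        #[error(transparent)]
        InvalidNumber(#[from] std::num::ParseIntError),
    }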
The local fork of rust-s3 has some code to support Google Cloud, but
that PR no longer applies upstream, and will need significant changes
before it can be re-submitted.
In the meantime, we might as well just use the most similar upstream
release. The benefit of switching is that it fixes a feature-resolution
bug that was causing us to build 24 more crates than needed (mostly
async-std and its dependencies).
Since we are now calling the syscall directly, read_pidfile now just parses
the pid as an integer.
We also verify the pid is >= 1, because calling kill on 0 or negative
values signals whole process groups instead of a single process, which goes
straight to crazytown.
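Roughly, the new read_pidfile ends up looking something like this (a sketch;
the real signature and error messages may differ):

    use std::path::Path;
    use anyhow::{bail, Context, Result};

    // Read a pidfile, parse it as an integer, and reject pids < 1 so we can
    // never pass 0 or a negative value (i.e. a process group) to kill().
    fn read_pidfile(path: &Path) -> Result<i32> {
        let contents = std::fs::read_to_string(path)
            .with_context(|| format!("failed to read pidfile {:?}", path))?;
        let pid: i32 = contents
            .trim()
            .parse()
            .with_context(|| format!("failed to parse pidfile {:?}", path))?;
        if pid < 1 {
            bail!("pidfile {:?} contained an invalid pid: {}", path, pid);
        }
        Ok(pid)
    }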
If there isn't any version specified for a dependency crate, Cargo may
choose a newer version. This could happen when Cargo.lock is updated
("cargo update") but can also happen unexpectedly when adding or
changing other dependencies. This can allow API-breaking changes to be
picked up, breaking the build.
To prevent this, specify versions for all dependencies. Cargo is still
allowed to pick newer versions that are (hopefully) non-breaking, by
analyzing the semver version number.
There are two special cases here:
1. serde_derive::{Serialize, Deserialize} isn't really used any more. It
was only a separate crate in the past because of compiler limitations.
Nowadays, people turn on the "derive" feature of the serde crate and
use serde::{Serialize, Deserialize} (see the sketch after this list).
2. parse_duration is unmaintained and has an open security issue (GitHub
issue 87). That issue probably isn't critical for us because of where we
use that crate, but it's probably still better to pin the version so we
can't get hit with an API-breaking change at an awkward time.
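To illustrate point 1: with the "derive" feature enabled on serde itself, the
derive macros come straight from the serde crate, so serde_derive is no longer
needed as a direct dependency (the struct below is just a made-up example):

    // Cargo.toml: serde = { version = "1.0", features = ["derive"] }
    use serde::{Deserialize, Serialize};

    // No direct dependency on serde_derive anymore.
    #[derive(Serialize, Deserialize)]
    struct BranchInfo {
        name: String,
        timeline_id: String,
    }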
Remove 'async' usage as much as feasible. Async code is harder to debug,
and mixing async and non-async code is a recipe for confusion and bugs.
There are a couple of exceptions:
- The code in walredo.rs, which needs to read and write to the child
process simultaneously, still uses async. It's more convenient there.
The 'async' usage is carefully limited to just the functions that
communicate with the child process.
- Code in walreceiver.rs that uses tokio-postgres to do streaming
replication. We have to use async there, because tokio-postgres is
async. Most rust-postgres functionality has non-async wrappers, but
not the new replication client code. The async usage is very limited
here, too: we use just block_on to call the tokio-postgres functions.
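As a sketch of that pattern (not the actual walreceiver code), keep one small
tokio runtime around and wrap the handful of async tokio-postgres calls in
block_on, so everything else stays synchronous:

    use tokio::runtime::Runtime;
    use tokio_postgres::{Client, NoTls};

    struct WalReceiverConn {
        runtime: Runtime,
        client: Client,
    }

    impl WalReceiverConn {
        fn connect(connstr: &str) -> anyhow::Result<WalReceiverConn> {
            let runtime = Runtime::new()?;
            let (client, connection) =
                runtime.block_on(tokio_postgres::connect(connstr, NoTls))?;
            // tokio-postgres needs the Connection object to be polled for the
            // Client to make progress, so park it on the runtime.
            runtime.spawn(async move {
                if let Err(e) = connection.await {
                    eprintln!("walreceiver connection error: {}", e);
                }
            });
            Ok(WalReceiverConn { runtime, client })
        }

        fn query_one_blocking(&self, sql: &str) -> anyhow::Result<tokio_postgres::Row> {
            // Only this line touches async; callers remain ordinary blocking code.
            Ok(self.runtime.block_on(self.client.query_one(sql, &[]))?)
        }
    }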
The code in 'page_service.rs' now launches a dedicated thread for each
connection.
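In sketch form, with a placeholder handler (not the real page_service code):

    use std::net::{TcpListener, TcpStream};
    use std::thread;

    fn page_service_main(addr: &str) -> std::io::Result<()> {
        let listener = TcpListener::bind(addr)?;
        for stream in listener.incoming() {
            let stream = stream?;
            // One dedicated thread per page service connection.
            thread::spawn(move || {
                if let Err(e) = handle_connection(stream) {
                    eprintln!("page service connection error: {}", e);
                }
            });
        }
        Ok(())
    }

    fn handle_connection(_stream: TcpStream) -> std::io::Result<()> {
        // ... speak the page server protocol with this client ...
        Ok(())
    }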
This replaces tokio::sync::watch::channel with std::sync::mpsc in
'seqwait.rs', to make that non-async. It's not a drop-in replacement,
though: std::sync::mpsc doesn't support multiple consumers, so we cannot
share a channel between multiple waiters. So this removes the code to
check if an existing channel can be reused, and creates a new one for
each waiter. That created another problem: BTreeMap cannot hold
duplicates, so I replaced that with BinaryHeap.
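The heap entries look roughly like this (names are assumptions): each waiter
carries its own channel, ordering is by sequence number only (so duplicates
are fine), and the comparison is reversed because BinaryHeap is a max-heap:

    use std::cmp::Ordering;
    use std::sync::mpsc;

    // One pending waiter, with its own private wake-up channel.
    struct Waiter {
        num: u64,               // wake this waiter once advance() reaches num
        wake: mpsc::Sender<()>, // created fresh for every wait call
    }

    impl Ord for Waiter {
        fn cmp(&self, other: &Self) -> Ordering {
            // Reversed, so the smallest outstanding number pops first.
            other.num.cmp(&self.num)
        }
    }
    impl PartialOrd for Waiter {
        fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
            Some(self.cmp(other))
        }
    }
    impl PartialEq for Waiter {
        fn eq(&self, other: &Self) -> bool {
            self.num == other.num
        }
    }
    impl Eq for Waiter {}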
Similarly, the tokio::{mpsc, oneshot} channels used between WAL redo
manager and PageCache are replaced with std::sync::mpsc. (There is no
separate 'oneshot' channel in the standard library.)
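Emulating a oneshot with std::sync::mpsc just means creating a channel and
sending exactly one message on it; a toy example:

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        let (reply_tx, reply_rx) = mpsc::channel::<u64>();
        thread::spawn(move || {
            // The responding side sends a single value and then drops the sender.
            let _ = reply_tx.send(42);
        });
        // The requesting side blocks until that one reply arrives.
        assert_eq!(reply_rx.recv().unwrap(), 42);
    }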
Fixes github issue #58, and coincidentally also issue #66.
After the rocksdb patch (commit 6aa38d3f7d), the CacheEntry struct was
used only momentarily in the communication between the page_cache and
the walredo modules. It was in fact not stored in any cache anymore.
For clarity, refactor the communication.
There is now a WalRedoManager struct with a `request_redo` function
that can be used to request WAL replay of a particular page. It sends
a request to a queue like before, but the queue has been replaced with
tokio::sync::mpsc. Previously, the resulting page image was stored
directly in the CacheEntry, and the requestor was notified using a
condition variable. Now, the requestor includes a 'oneshot' channel in
the request, and the WAL redo manager sends the response there.
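The shape of the interaction is roughly this (field names and types are made
up for illustration):

    use tokio::sync::{mpsc, oneshot};

    struct WalRedoRequest {
        // ... page tag, LSN, WAL records to apply ...
        response: oneshot::Sender<Vec<u8>>, // reconstructed page image
    }

    pub struct WalRedoManager {
        requests: mpsc::UnboundedSender<WalRedoRequest>,
    }

    impl WalRedoManager {
        pub async fn request_redo(&self) -> anyhow::Result<Vec<u8>> {
            let (tx, rx) = oneshot::channel();
            self.requests
                .send(WalRedoRequest { response: tx })
                .map_err(|_| anyhow::anyhow!("WAL redo thread has exited"))?;
            // The WAL redo manager sends the page image back on the oneshot
            // channel once it has replayed the records.
            Ok(rx.await?)
        }
    }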
When calling into the page cache, it was possible to wait on a blocking
mutex, which can stall the async executor.
Replace that blocking wait with SeqWait::wait_for(lsn).await so that the
executor can go on with other work while we wait.
Change walreceiver_works to an AtomicBool to avoid the awkwardness of
taking the lock, then dropping it while we call wait_for and then
acquiring it again to do real work.
SeqWait adds a way to .await the arrival of some sequence number.
It provides wait_for(num) which is an async fn, and advance(num) which
is synchronous.
This should be useful in solving the page cache deadlocks, and may be
useful in other areas too.
This implementation still uses a Mutex internally, but only for a brief
critical section. If we find this code broadly useful and start to care
more about executor stalls due to unfair thread scheduling, there might
be ways to make it lock-free.
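A minimal sketch of the idea (not the actual implementation, which has more
bookkeeping and error handling):

    use std::sync::Mutex;
    use tokio::sync::oneshot;

    pub struct SeqWait {
        inner: Mutex<SeqWaitInner>,
    }

    struct SeqWaitInner {
        current: u64,
        waiters: Vec<(u64, oneshot::Sender<()>)>,
    }

    impl SeqWait {
        // Async: completes once the counter has advanced to `num` or beyond.
        pub async fn wait_for(&self, num: u64) {
            let rx = {
                let mut inner = self.inner.lock().unwrap();
                if inner.current >= num {
                    return;
                }
                let (tx, rx) = oneshot::channel();
                inner.waiters.push((num, tx));
                rx
            }; // the Mutex guard is dropped here, before we await
            let _ = rx.await;
        }

        // Synchronous: advance the counter and wake up satisfied waiters.
        pub fn advance(&self, num: u64) {
            let mut inner = self.inner.lock().unwrap();
            inner.current = num;
            let waiters = std::mem::take(&mut inner.waiters);
            for (n, tx) in waiters {
                if n <= num {
                    let _ = tx.send(());
                } else {
                    inner.waiters.push((n, tx));
                }
            }
        }
    }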
This replaces the page server's "datadir" concept. The Page Server now
always works with a "Zenith Repository". When you initialize a new
repository with "zenith init", it runs initdb and loads an initial
basebackup of the freshly-created cluster into the repository, on "main"
branch. A repository can hold multiple "timelines", which can be given
human-friendly names, making them "branches". One page server simultaneously
serves all timelines stored in the repository, and you can have multiple
Postgres compute nodes connected to the page server, as long as they all
operate on different timelines.
There is a new command "zenith branch", which can be used to fork off
new branches from existing branches.
The repository uses the directory layout described as Repository format
v1 in https://github.com/zenithdb/rfcs/pull/5. It is *highly* inefficient:
- we never create new snapshots. So in practice, it's really just a base
backup of the initial empty cluster, and everything else is reconstructed
by redoing all WAL
- when you create a new timeline, the base snapshot and *all* WAL are copied
  from the ancestor timeline to the new one. There are no smarts about
  referencing the old snapshots/WAL from the ancestor timeline.
To support all this, this commit includes a bunch of other changes:
- Implement "basebackup" funtionality in page server. When you initialize
a new compute node with "zenith pg create", it connects to the page
server, and requests a base backup of the Postgres data directory on
that timeline. (the base backup excludes user tables, so it's not
as bad as it sounds).
- Have page server's WAL receiver write the WAL into the timeline dir. This
  allows running a Page Server and Compute Nodes without a WAL safekeeper,
  until we get around to integrating it properly into the system. (Even
  after we integrate the WAL safekeeper, this is perhaps how things will
  operate when you want to run the system on your laptop.)
- restore_datadir.rs was renamed to restore_local_repo.rs, and heavily
modified to use the new format. It now also restores all WAL.
- Page server no longer scans and restores everything into memory at startup.
Instead, when the first request is made for a timeline, the timeline is
slurped into memory at that point.
- The responsibility for telling page server to "callmemaybe" was moved
into Postgres libpqpagestore code. Also, WAL producer connstring cannot
be specified in the pageserver's command line anymore.
- Having multiple "system identifiers" in the same page server is no
longer supported. I repurposed much of that code to support multiple
timelines, instead.
- Implemented very basic, incomplete support for PostgreSQL's Extended
Query Protocol in page_service.rs. Turns out that rust-postgres'
copy_out() function always uses the extended query protocol to send
out the command, and I'm using that to stream the base backup from the
page server.
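On the client side, fetching the base backup then looks roughly like this
(the connection string and the command text here are guesses, not the real
protocol):

    use std::io::Read;
    use postgres::{Client, NoTls};

    fn fetch_basebackup(connstr: &str, timeline: &str) -> anyhow::Result<Vec<u8>> {
        let mut client = Client::connect(connstr, NoTls)?;
        // copy_out() issues the command via the extended query protocol and
        // returns a reader over the CopyData stream coming back.
        let command = format!("basebackup {}", timeline);
        let mut reader = client.copy_out(command.as_str())?;
        let mut buf = Vec::new();
        reader.read_to_end(&mut buf)?;
        Ok(buf)
    }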
TODO: I haven't fixed the WAL safekeeper for this scheme, so all the
integration tests involving safekeepers are failing. My plan is to modify
the safekeeper to know about Zenith timelines, too, and modify it to work
with the same Zenith repository format. It only needs to care about the
'.zenith/timelines/<timeline>/wal' directories.
This is a first attempt at a new error-handling strategy:
- Use anyhow::Error as the first choice for easy error handling
- Use thiserror to generate local error types for anything that
  needs it (where no suitable error type is available to us) or that will
  be inspected or matched on by higher layers.
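In practice the split looks something like this (the error type and functions
below are made-up examples):

    use anyhow::{Context, Result};
    use thiserror::Error;

    // A local, inspectable error type for the cases higher layers care about.
    #[derive(Error, Debug)]
    pub enum TimelineError {
        #[error("timeline {0} not found")]
        NotFound(String),
    }

    // Everything else just returns anyhow::Result and attaches context.
    fn load_timeline(name: &str) -> Result<Vec<u8>> {
        if name.is_empty() {
            return Err(TimelineError::NotFound(name.to_string()).into());
        }
        std::fs::read(name).with_context(|| format!("could not read timeline {}", name))
    }

    fn report(err: &anyhow::Error) {
        // Callers that need to can still match on the specific error type.
        if let Some(TimelineError::NotFound(t)) = err.downcast_ref::<TimelineError>() {
            eprintln!("no such timeline: {}", t);
        }
    }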
Using the hash should allow us to change the remote repo and propagate
that change to user builds without that change becoming visible at a
random time.
It's unfortunate that we can't declare this dependency once in the
top-level Cargo.toml; that feature request is rust-lang RFC 2906.
Story on why:
The apply_wal_records() function spawned the special postgres process
to perform WAL redo. That was done in a blocking fashion: it launches
the process, then it writes the command to its stdin, then it reads
the result from its stdout. I wanted to also read the child process's
stderr, and forward it to the page server's log (which is just the
page server's stderr ATM). That has classic potential for deadlock:
the child process might block trying to write to stderr/stdout, if the
parent isn't reading it. So the parent needs to perform the read/write
with the child's stdin/stdout/stderr in an async fashion. So I
refactored the code in walredo.rs into async style. But it started to
hang. It took me a while to figure it out; async makes for really ugly
stack traces, and it's hard to figure out what's going on. The call path
goes like this: Page service -> get_page_at_lsn() in page cache ->
apply_wal_records(). The page service is written in async style, and I
refactored apply_wal_records() to also be async. BUT,
get_page_at_lsn() acquires a lock, in a blocking fashion.
The lock-up happened like this:
- a GetPage@LSN request arrives. The async handler thread calls
get_page_at_lsn(), which acquires a lock. While holding the lock,
it calls apply_wal_records().
- apply_wal_records() launches the child process, and waits on it
using async functions
- more GetPage@LSN requests arrive. They also call get_page_at_lsn().
But because the lock is already held, they all block
The subsequent GetPage@LSN calls that block waiting on the lock use up
all the async handler threads. All the threads are locked up, so there
is no one left to make progress on the apply_wal_records() call, so it
never releases the lock. Deadlock.

So my lesson here is that mixing async and blocking styles is painful.
Googling around, this is a well-known problem; there are long philosophical
discussions on "what color is your function".

My plan to fix this is to move the WAL redo into a separate thread or
thread pool, and have the GetPage@LSN handlers communicate with it using
channels. Having a separate thread pool for it makes sense anyway in the
long run. We'll want to keep the postgres process around, rather than
launch it separately every time we need to reconstruct a page. Also, when
we're not busy reconstructing pages that are needed right now by
GetPage@LSN calls, we want to proactively apply incoming WAL records from
a "backlog".
Solution:
Launch a dedicated thread for WAL redo at startup. It has an event loop,
where it listens on a channel for requests to apply WAL. When a page
server thread needs some WAL to be applied, it sends the request on
the channel, and waits for a response. After it's done, the WAL redo process
puts the new page image in the page cache, and wakes up the requesting
thread using a condition variable.
This also needed locking changes in the page cache. Each cache entry now
has a reference counter and a dedicated Mutex to protect just the entry.
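Schematically, the new flow is something like this (the real types carry more
state; treat this as a sketch):

    use std::sync::{mpsc, Arc, Condvar, Mutex};
    use std::thread;

    struct CacheEntry {
        // The reconstructed page image, once the WAL redo thread has produced it.
        page_image: Mutex<Option<Vec<u8>>>,
        ready: Condvar,
    }

    struct RedoRequest {
        entry: Arc<CacheEntry>,
        // ... page tag, LSN, WAL records to apply ...
    }

    fn start_walredo_thread() -> mpsc::Sender<RedoRequest> {
        let (tx, rx) = mpsc::channel::<RedoRequest>();
        thread::spawn(move || {
            // Event loop: apply WAL for each request, store the result in the
            // cache entry, and wake up whoever is waiting for it.
            for req in rx {
                let page = apply_wal_records(&req); // talks to the postgres WAL-redo child
                *req.entry.page_image.lock().unwrap() = Some(page);
                req.entry.ready.notify_all();
            }
        });
        tx
    }

    // The requesting page server thread sends a RedoRequest, then waits here.
    fn wait_for_page(entry: &CacheEntry) -> Vec<u8> {
        let mut guard = entry.page_image.lock().unwrap();
        while guard.is_none() {
            guard = entry.ready.wait(guard).unwrap();
        }
        guard.clone().unwrap()
    }

    fn apply_wal_records(_req: &RedoRequest) -> Vec<u8> {
        unimplemented!() // stand-in for the real WAL redo call
    }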
When a page is requested from the page cache (GetPage@LSN), launch postgres
in special "WAL redo" mode to reconstruct that page version.
Plus a bunch of misc fixes to the WAL decoding code.