rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-19 14:10:37 +00:00

Author	SHA1	Message	Date
Stas Kelvich	8c07a36fda	Remove last_valid_lsn tracking in wal_receiver. There are two main reasons for that: a) Latest unfinished record may disappear after compute node restart, so let's try not to leak the volatile part of the WAL into the repository. Always use last_valid_record instead. That change requires different getPage@LSN logic in postgres -- we need to ask for LSNs that point to some complete record instead of GetFlushRecPtr(), which can point into the middle of a record. That was already done by @knizhnik to deal with the same problem during the work on `postgres --sync-safekeepers`. Postgres will use LSNs aligned on a 0x8 boundary in get_page requests, so we also need to be sure that last_valid_record is aligned. b) Switch to get_last_record_lsn() in basebackup@no_lsn. When a compute node is running without safekeepers and streams WAL directly to the pageserver, it is important to match the basebackup LSN and the LSN of replication start. Before this commit basebackup@no_lsn was waiting for last_valid_lsn and walreceiver started replication with last_record_lsn, which can be less. So replication was failing since the compute node doesn't have the requested WAL.	2021-09-02 12:06:12 +03:00
Heikki Linnakangas	cff671c1bd	Remove duplicated LSN fields from the page cache. Having multiple copies of the same values is a source of confusion. Commit `da9bf5dc63` fixed one race condition caused by that, for example. See also discussion at https://github.com/zenithdb/zenith/issues/57#issuecomment-824393470 This changes SeqWait.advance() to return the old number, and not panic if you try to move the value backwards. The caller should check for that and act accordingly.	2021-04-27 10:32:39 +03:00
Heikki Linnakangas	3b9e7fc5e6	Use explicit threads. Remove 'async' usage as much as feasible. Async code is harder to debug, and mixing async and non-async code is a recipe for confusion and bugs. There are a couple of exceptions: - The code in walredo.rs, which needs to read and write to the child process simultaneously, still uses async. It's more convenient there. The 'async' usage is carefully limited to just the functions that communicate with the child process. - Code in walreceiver.rs that uses tokio-postgres to do streaming replication. We have to use async there, because tokio-postgres is async. Most rust-postgres functionality has non-async wrappers, but not the new replication client code. The async usage is very limited here, too: we use just block_on to call the tokio-postgres functions. The code in 'page_service.rs' now launches a dedicated thread for each connection. This replaces tokio::sync::watch::channel with std::sync::mpsc in 'seqwait.rs', to make that non-async. It's not a drop-in replacement, though: std::sync::mpsc doesn't support multiple consumers, so we cannot share a channel between multiple waiters. So this removes the code to check if an existing channel can be reused, and creates a new one for each waiter. That created another problem: BTreeMap cannot hold duplicates, so I replaced that with BinaryHeap. Similarly, the tokio::{mpsc, oneshot} channels used between WAL redo manager and PageCache are replaced with std::sync::mpsc. (There is no separate 'oneshot' channel in the standard library.) Fixes github issue #58, and coincidentally also issue #66.	2021-04-26 13:07:51 +03:00
Eric Seppanen	f62ce4bcf7	make seqwait generic. SeqWait can use any type that is Ord + Debug + Copy. Debug is not strictly necessary, but allows us to keep the panic message if a caller wants the sequence number to go backwards.	2021-04-25 19:37:02 -07:00
Eric Seppanen	fe79082e29	require documentation in seqwait.rs	2021-04-23 15:01:22 -07:00
Eric Seppanen	8beaf76c85	SeqWait: don't do wakeups under the lock. Clippy pointed out that `drop(waiters)` didn't do anything, because there was a misplaced ";" causing `waiters` to be a unit type `()`. This change makes it do what was intended: the lock should be dropped first, then the wakeups should be processed.	2021-04-23 14:16:34 -07:00
Konstantin Knizhnik	9e7c45cb72	Merge with master	2021-04-22 09:45:13 +03:00
Eric Seppanen	8060e17b50	add SeqWait. SeqWait adds a way to .await the arrival of some sequence number. It provides wait_for(num) which is an async fn, and advance(num) which is synchronous. This should be useful in solving the page cache deadlocks, and may be useful in other areas too. This implementation still uses a Mutex internally, but only for a brief critical section. If we find this code broadly useful and start to care more about executor stalls due to unfair thread scheduling, there might be ways to make it lock-free.	2021-04-21 18:02:13 -07:00

8 Commits
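
The commit messages above describe how SeqWait evolved: a generic sequence-number waiter, one std::sync::mpsc channel per waiter, waiters kept in a BinaryHeap, wakeups performed after the lock is released, and advance() returning the old value instead of panicking. The following is a minimal sketch of that design reconstructed from the messages alone. It is not the code from seqwait.rs: the `Waiter` and `Inner` types and all field names are assumptions made for illustration, and the Debug bound is dropped because this sketch never panics.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;
use std::sync::mpsc::{channel, Sender};
use std::sync::Mutex;

/// One blocked caller: the number it waits for and the channel that wakes it.
/// (Hypothetical type, not taken from seqwait.rs.)
struct Waiter<T: Ord> {
    wake_num: T,
    wake_channel: Sender<()>,
}

// Waiters compare by `wake_num` only. The order is reversed so that the
// max-heap `BinaryHeap` yields the smallest `wake_num` first. Unlike a map
// keyed by number, a heap can hold several waiters for the same number.
impl<T: Ord> PartialEq for Waiter<T> {
    fn eq(&self, other: &Self) -> bool {
        self.wake_num == other.wake_num
    }
}
impl<T: Ord> Eq for Waiter<T> {}
impl<T: Ord> PartialOrd for Waiter<T> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl<T: Ord> Ord for Waiter<T> {
    fn cmp(&self, other: &Self) -> Ordering {
        other.wake_num.cmp(&self.wake_num)
    }
}

struct Inner<T: Ord> {
    current: T,
    waiters: BinaryHeap<Waiter<T>>,
}

/// Blocking (non-async) sequence-number waiter.
pub struct SeqWait<T: Ord + Copy> {
    inner: Mutex<Inner<T>>,
}

impl<T: Ord + Copy> SeqWait<T> {
    pub fn new(start: T) -> Self {
        SeqWait {
            inner: Mutex::new(Inner { current: start, waiters: BinaryHeap::new() }),
        }
    }

    /// Block the calling thread until the sequence number is >= `num`.
    pub fn wait_for(&self, num: T) {
        let rx = {
            let mut inner = self.inner.lock().unwrap();
            if inner.current >= num {
                return;
            }
            // A fresh channel per waiter: mpsc has a single consumer.
            let (tx, rx) = channel();
            inner.waiters.push(Waiter { wake_num: num, wake_channel: tx });
            rx
        }; // the lock is released here, before blocking
        let _ = rx.recv();
    }

    /// Move the sequence number forward and wake satisfied waiters.
    /// Returns the old value; a backwards move changes nothing.
    pub fn advance(&self, num: T) -> T {
        let (old, to_wake) = {
            let mut inner = self.inner.lock().unwrap();
            let old = inner.current;
            if num <= old {
                return old;
            }
            inner.current = num;
            let mut to_wake = Vec::new();
            while inner.waiters.peek().map_or(false, |w| w.wake_num <= num) {
                let waiter = inner.waiters.pop().unwrap();
                to_wake.push(waiter.wake_channel);
            }
            (old, to_wake)
        }; // drop the lock first, then process the wakeups
        for tx in to_wake {
            let _ = tx.send(());
        }
        old
    }
}
```

In this sketch a caller would typically share an `Arc<SeqWait<u64>>` between threads: one thread blocks in `wait_for(lsn)` while another calls `advance(lsn)` as new records arrive, and the caller of `advance` compares the returned old value with the new one to detect an attempt to move backwards.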