Commit Graph

24 Commits

Author SHA1 Message Date
Eric Seppanen
d744ddee7c bin_ser: preserve IO errors on deserialization
We're starting to deserialize directly from the TcpStream now, which
means that a socket error gets logged as "deserialize error". That's not
very helpful; preserve the io::Error so it can be logged.
2021-05-19 14:36:41 -07:00
Eric Seppanen
513696a485 break wal_service into multiple pieces
The pieces are:
base Connection
SendWal
ReplicationHandler

There are lots of other changes here:
- Put the replication reader in a background thread; this gets rid
  of some hacks with nonblocking mode.
- Stop manually buffering input data; use BufReader instead.
- Use BytesMut a lot less; use Read/Write traits where possible.
2021-05-19 14:36:41 -07:00
Eric Seppanen
71e93faed7 fix endian typos in BeSer
Cut/paste error: BeSer was using the little-endian config in two places.

Add better unit tests so this can't happen again.
2021-05-13 19:04:17 -07:00
Eric Seppanen
e5df42feef add workspace_hack dependency to zenith_utils
I didn't think this mattered, but it does: if you add a dependency to
zenith_utils, but forget to request a feature you need, the crate will
build from the workspace root, but not by itself.

It's probably better to pull in the whole dependency tree.

This leaves one problem unsolved: the missing feature above will now be
a latent bug. If that feature gets removed later by other crates, and
then the workspace_hack Cargo.toml is updated, this missing feature will
become a build failure.
2021-05-10 18:21:45 -07:00
Eric Seppanen
60d66267a9 add serde support to Lsn type
A serialized Lsn and a serialized u64 should be identical.
2021-05-10 16:21:05 -07:00
Eric Seppanen
36c12247b9 add bin_ser module
This module adds two traits that implement bincode-based serialization.
BeSer implements methods for big-endian encoding/decoding.
LeSer implements methods for little-endian encoding/decoding.

Right now, the BeSer and LeSer methods have the same names, meaning you
can't `use` them both at the same time. This is intended to be a safety
mechanism: mixing big-endian and little-endian encoding in the same file
is error-prone. There are ways around this, but the easiest fix is to
put the big-endian code and little-endian code in different files or
submodules.
2021-05-10 16:21:05 -07:00
anastasia
1591f058c6 implement Debug for Lsn type 2021-05-05 16:38:32 +03:00
Heikki Linnakangas
96beffb3c5 Add tests for the Lsn::fetch_max function. 2021-04-27 13:43:39 +03:00
Heikki Linnakangas
cff671c1bd Remove duplicated LSN fields from the page cache.
Having multiple copies of the same values is a source of confusion.
Commit da9bf5dc63 fixed one race condition caused by that, for example.
See also discussion at
https://github.com/zenithdb/zenith/issues/57#issuecomment-824393470

This changes SeqWait.advance() to return the old number, and not panic if
you try to move the value backwards. The caller should check for that and
act accordingly.
2021-04-27 10:32:39 +03:00
Konstantin Knizhnik
3b09a74f58 Implement offloading of old WAL files to S3 in walkeeper 2021-04-26 16:23:00 +03:00
Heikki Linnakangas
bc652e965e Save old 'async' version of SeqWait, in case we need it later.
It is currently unused, and is not built as part of 'cargo build', but
seems like a shame to throw it away completely.
2021-04-26 13:30:10 +03:00
Heikki Linnakangas
3b9e7fc5e6 Use explicit threads.
Remove 'async' usage a much as feasible. Async code is harder to debug,
and mixing async and non-async code is a recipe for confusion and bugs.

There are a couple of exceptions:

- The code in walredo.rs, which needs to read and write to the child
  process simultaneously, still uses async. It's more convenient there.
  The 'async' usage is carefully limited to just the functions that
  communicate with the child process.

- Code in walreceiver.rs that uses tokio-postgres to do streaming
  replication. We have to use async there, because tokio-postgres is
  async. Most rust-postgres functionality has non-async wrappers, but
  not the new replication client code. The async usage is very limited
  here, too: we use just block_on to call the tokio-postgres functions.

The code in 'page_service.rs' now launches a dedicated thread for each
connection.

This replaces tokio::sync::channel with std::sync:mpsc in
'seqwait.rs', to make that non-async. It's not a drop-in replacement,
though: std::sync::mpsc doesn't support multiple consumers, so we cannot
share a channel between multiple waiters. So this removes the code to
check if an existing channel can be reused, and creates a new one for
each waiter. That created another problem: BTreeMap cannot hold
duplicates, so I replaced that with BinaryHeap.

Similarly, the tokio::{mpsc, oneshot} channels used between WAL redo
manager and PageCache are replaced with std::sync::mpsc. (There is no
separate 'oneshot' channel in the standard library.)

Fixes github issue #58, and coincidentally also issue #66.
2021-04-26 13:07:51 +03:00
Eric Seppanen
96b6f350a7 add test cases for Lsn math and AtomicLsn 2021-04-25 19:37:02 -07:00
Eric Seppanen
648755a25e add Lsn::block_offset, remaining_in_block, calc_padding
Replace open-coded math with member fns.
2021-04-25 19:37:02 -07:00
Eric Seppanen
07d0241076 add AtomicLsn
AtomicLsn is a wrapper around AtomicU64 that has load() and store()
members that are cheap (on x86, anyway) and can be safely used in any
context.

This commit uses AtomicLsn in the page cache, and fixes up some
downstream code that manually implemented LSN formatting.

There's also a bugfix to the logging in wait_lsn, which prints the
wrong lsn value.
2021-04-25 19:37:02 -07:00
Eric Seppanen
d760446053 remove Lsn::sub in favor of sub_checked
There is only one place doing subtraction, and it had a manually
implemented check.
2021-04-25 19:37:02 -07:00
Eric Seppanen
f62ce4bcf7 make seqwait generic
SeqWait can use any type that is Ord + Debug + Copy. Debug is not
strictly necessary, but allows us to keep the panic message if a caller
wants the sequence number to go backwards.
2021-04-25 19:37:02 -07:00
Eric Seppanen
3d3eb0ed16 add Lsn type
This type is a zero-cost wrapper for a u64, meant to help code
communicate with precision what that value means.

It implements Display and Debug. Display "{}" will format as
"1234ABCD:5678CDEF" while Debug will format as Lsn{1234567890}.
2021-04-25 19:37:02 -07:00
Eric Seppanen
fe79082e29 require documentation in seqwait.rs 2021-04-23 15:01:22 -07:00
Eric Seppanen
8beaf76c85 SeqWait: don't do wakeups under the lock
Clippy pointed out that `drop(waiters)` didn't do anything, because
there was a misplaced ";" causing `waiters` to be a unit type `()`.

This change makes it do what was intended: the lock should be dropped
first, then the wakeups should be processed.
2021-04-23 14:16:34 -07:00
Konstantin Knizhnik
9e7c45cb72 Merge with master 2021-04-22 09:45:13 +03:00
Eric Seppanen
8060e17b50 add SeqWait
SeqWait adds a way to .await the arrival of some sequence number.
It provides wait_for(num) which is an async fn, and advance(num) which
is synchronous.

This should be useful in solving the page cache deadlocks, and may be
useful in other areas too.

This implementation still uses a Mutex internally, but only for a brief
critical section. If we find this code broadly useful and start to care
more about executor stalls due to unfair thread scheduling, there might
be ways to make it lock-free.
2021-04-21 18:02:13 -07:00
Eric Seppanen
92e4f4b3b6 cargo fmt 2021-04-20 17:59:56 -07:00
Eric Seppanen
f387769203 add zenith_utils crate
This is a place for code that's shared between other crates in this
repository.
2021-04-20 11:11:29 -07:00