Commit Graph

266 Commits

Heikki Linnakangas
8e57c2e413 Provide more context to a panic.
I just bumped into this panic, but couldn't reproduce it. Not sure what
happened, but let's provide more context.
2021-05-05 15:47:11 +03:00
Heikki Linnakangas
4dd63821bd Improve trace log messages in page server 2021-05-05 10:39:28 +03:00
Heikki Linnakangas
eeec1a3dcb Refactor the way truncations are handled.
Currently, truncation is implemented in the RocksDB repository by storing
a special sentinel entry for each page that was truncated away. Hide that
implementation detail better in the abstract Repository interface, so
that the caller doesn't need to construct the special sentinel WAL record.

While we're at it, refactor the CacheEntryContent struct to an enum.
2021-05-05 10:39:28 +03:00
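A rough sketch of the shape of the change described above; the method name (put_truncation), the enum variants, and the RelTag type are illustrative assumptions, not the actual code.

```rust
// Hypothetical sketch: the repository hides the truncation sentinel behind a
// dedicated method, so callers no longer build a special WAL record themselves.
trait Timeline {
    /// Remember that all blocks >= nblocks of this relation were truncated
    /// away at the given LSN. (Illustrative signature.)
    fn put_truncation(&self, rel: RelTag, lsn: u64, nblocks: u32);
}

// CacheEntryContent as an enum instead of a struct with optional fields.
enum CacheEntryContent {
    PageImage(Vec<u8>), // a materialized page image
    WalRecord(Vec<u8>), // a WAL record to replay on top of an older image
    Truncation,         // sentinel: the page was truncated away
}

// Assumed placeholder identifying a relation fork.
struct RelTag {
    spcnode: u32,
    dbnode: u32,
    relnode: u32,
    forknum: u8,
}
```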
Heikki Linnakangas
b484b896b6 Refactor the functionality in page_cache.rs.
This moves things around:

- The PageCache is split into two structs: Repository and Timeline. A
  Repository holds multiple Timelines. In order to get a page version,
  you must first get a reference to the Repository, then the Timeline
  in the repository, and finally call the get_page_at_lsn() function
  on the Timeline object. This sounds complicated, but because each
  connection from a compute node, and each WAL receiver, only deals
  with one timeline at a time, the callers can get the reference to
  the Timeline object once and hold onto it. The Timeline corresponds
  most closely to the old PageCache object.

- Repository and Timeline are now abstract traits, so that we can
  support multiple implementations. I don't actually expect us to have
  multiple implementations for long. We have the RocksDB
  implementation now, but as soon as we have a different
  implementation that's usable, I expect that we will retire the
  RocksDB implementation. But I think this abstraction works as good
  documentation in any case: it's now easier to see what the interface
  for storing and loading pages from the repository is, by looking at
  the Repository/Timeline traits. The abstract traits are in
  repository.rs, and the RocksDB implementation of them is in
  repository/rocksdb.rs.

- page_cache.rs is now a "switchboard" to get a handle to the
  repository. Currently, the page server can only handle one
  repository at a time, so there isn't much there, but in the future
  we might do multi-tenancy there.
2021-05-05 10:37:36 +03:00
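A minimal sketch of the Repository/Timeline split described in the commit above. Only get_page_at_lsn is named in the commit message; the other method names, the error and identifier types, and the RelTag placeholder are assumptions for illustration.

```rust
use std::sync::Arc;

// Illustrative error/identifier types; the real code likely uses its own.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
type ZTimelineId = u64; // assumed identifier type
type Lsn = u64;

/// A Repository holds multiple Timelines.
trait Repository: Send + Sync {
    /// Get a handle to one timeline; callers hold onto it for the
    /// lifetime of a compute-node connection or WAL receiver.
    fn get_timeline(&self, timelineid: ZTimelineId) -> Result<Arc<dyn Timeline>>;
}

/// A Timeline corresponds most closely to the old PageCache object.
trait Timeline: Send + Sync {
    /// Look up (or reconstruct) the page image as of the given LSN.
    fn get_page_at_lsn(&self, rel: RelTag, blknum: u32, lsn: Lsn) -> Result<Vec<u8>>;
}

struct RelTag { spcnode: u32, dbnode: u32, relnode: u32, forknum: u8 }

// page_cache.rs acts as a "switchboard": with only one repository,
// it can simply hand out a reference to it.
fn get_repository() -> Arc<dyn Repository> {
    unimplemented!("the single RocksDB-backed repository would be returned here")
}
```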
Eric Seppanen
2e0d45d092 Switch to upstream rust-s3
The local fork of rust-s3 has some code to support Google Cloud, but
that PR no longer applies upstream, and will need significant changes
before it can be re-submitted.

In the meantime, we might as well just use the most similar upstream
release. The benefit of switching is that it fixes a feature-resolution
bug that was causing us to build 24 more crates than needed (mostly
async-std and its dependencies).
2021-05-04 12:02:00 -07:00
Eric Seppanen
ce646ea845 use tokio::try_join instead of futures::try_join
We don't use the `futures` crate much. Remove one of only two references
to it (tokio has the identical macro).
2021-05-03 18:46:10 -07:00
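For reference, a minimal sketch of the tokio form of the macro; the async functions here are placeholders.

```rust
// tokio::try_join! awaits both futures concurrently and returns the first
// error if either of them fails.
async fn fetch_a() -> Result<u32, std::io::Error> { Ok(1) }
async fn fetch_b() -> Result<u32, std::io::Error> { Ok(2) }

async fn run() -> Result<(), std::io::Error> {
    let (a, b) = tokio::try_join!(fetch_a(), fetch_b())?;
    println!("a = {}, b = {}", a, b);
    Ok(())
}
```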
Eric Seppanen
a3818dee58 pin dependencies to versions
If there isn't any version specified for a dependency crate, Cargo may
choose a newer version. This could happen when Cargo.lock is updated
("cargo update") but can also happen unexpectedly when adding or
changing other dependencies. This can allow API-breaking changes to be
picked up, breaking the build.

To prevent this, specify versions for all dependencies. Cargo is still
allowed to pick newer versions that are (hopefully) non-breaking, by
analyzing the semver version number.

There are two special cases here:

1. serde_derive::{Serialize, Deserialize} isn't really used any more. It
was only a separate crate in the past because of compiler limitations.
Nowadays, people turn on the "derive" feature of the serde crate and
use serde::{Serialize, Deserialize}.

2. parse_duration is unmaintained and has an open security issue (GitHub
issue 87). That issue probably isn't critical for us because of where we
use that crate, but it's probably still better to pin the version so we
can't get hit with an API-breaking change at an awkward time.
2021-05-03 14:02:10 -07:00
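On the first point above, a small sketch of the current serde idiom: enable the "derive" feature of the serde crate and import the derive macros from serde itself rather than from serde_derive. The struct and its fields are made up for illustration.

```rust
// With `serde = { version = "1", features = ["derive"] }` in Cargo.toml,
// the derive macros come straight from the serde crate:
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct ExampleConfig {
    listen_addr: String, // illustrative fields only
    gc_horizon: u64,
}
```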
anastasia
1cdeba9db7 [issue #18] log module name and position in the file 2021-05-03 15:17:51 +03:00
Konstantin Knizhnik
651a8139f5 Fix bug in transaction_id_set_status_bit 2021-04-30 19:24:00 +03:00
Konstantin Knizhnik
eea6f0898e Restore CLOG from snapshot 2021-04-30 14:22:47 +03:00
Heikki Linnakangas
086c0ad829 Remove unused 'apply_pending' field. 2021-04-30 12:44:06 +03:00
Eric Seppanen
b77597bd99 remove old Cargo.lock files
When using a cargo workspace (defined by the root Cargo.toml), there is
one shared Cargo.lock file at the root.
2021-04-29 10:31:01 -07:00
anastasia
1369145e83 code cleanup 2021-04-29 18:41:42 +03:00
anastasia
b49164a1d4 cargo fmt 2021-04-29 18:41:42 +03:00
anastasia
e7b112aacc Refactor pg_constants. Move them to postgres_ffi/ 2021-04-29 18:41:42 +03:00
Eric Seppanen
975b2d12dc cargo fmt 2021-04-28 10:01:58 -07:00
Heikki Linnakangas
41a3772e90 Replace pgbuild.sh with a Makefile
This allows building both Zenith and PostgreSQL in one command. The
command is 'make'.

Reviewed-by: Arseny Sher <sher-ars@yandex.ru>
2021-04-28 16:54:45 +03:00
anastasia
421d586953 code cleanup for XLogRecord decoding 2021-04-28 13:56:27 +03:00
anastasia
ef37eb96b9 refactor XLogRecord reading 2021-04-28 13:56:27 +03:00
anastasia
d311f708b6 handle subtrans in COMMIT/ABORT records 2021-04-28 13:56:27 +03:00
Heikki Linnakangas
c7f54af1f1 Refactor page_cache <-> walredo interface.
Make the caller of request_redo() responsible for gathering the WAL records
to redo, and for storing the reconstructed page image back in the page
cache. This leaves the WAL redo manager purely responsible for dealing with
the postgres child process, removing its dependency on the PageCache.
2021-04-27 21:43:56 +03:00
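A hedged sketch of what that division of labor could look like: the caller gathers the base image and WAL records, and the redo manager only reconstructs the page. The parameter names and placeholder types are assumptions.

```rust
// Illustrative only: the caller collects the inputs, asks the redo manager to
// replay them, and stores the resulting page image back in the cache itself.
trait WalRedoManager {
    /// Replay `records` on top of `base_img` (if any) and return the
    /// reconstructed page image. Knows nothing about the page cache.
    fn request_redo(
        &self,
        rel: RelTag,
        blknum: u32,
        lsn: Lsn,
        base_img: Option<Vec<u8>>,
        records: Vec<WalRecord>,
    ) -> Result<Vec<u8>, WalRedoError>;
}

// Placeholder types for the sketch.
struct RelTag { spcnode: u32, dbnode: u32, relnode: u32, forknum: u8 }
struct WalRecord { lsn: Lsn, rec: Vec<u8> }
type Lsn = u64;
#[derive(Debug)]
struct WalRedoError;
```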
Heikki Linnakangas
cff671c1bd Remove duplicated LSN fields from the page cache.
Having multiple copies of the same values is a source of confusion.
Commit da9bf5dc63 fixed one race condition caused by that, for example.
See also discussion at
https://github.com/zenithdb/zenith/issues/57#issuecomment-824393470

This changes SeqWait.advance() to return the old number, and not panic if
you try to move the value backwards. The caller should check for that and
act accordingly.
2021-04-27 10:32:39 +03:00
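A sketch of how a caller might use the changed advance() contract; the SeqWait stand-in here is a minimal assumption, not the real implementation.

```rust
// Illustrative: advance() returns the previous value instead of panicking
// when asked to move backwards, so the caller decides how to react.
fn update_last_valid_lsn(seqwait: &SeqWait, new_lsn: u64) {
    let old = seqwait.advance(new_lsn);
    if new_lsn < old {
        // Moving backwards is unexpected; log it rather than panic.
        eprintln!("attempted to move last_valid_lsn backwards: {} -> {}", old, new_lsn);
    }
}

// Minimal stand-in for the real SeqWait, just enough for the example.
struct SeqWait {
    current: std::sync::Mutex<u64>,
}

impl SeqWait {
    /// Advance the counter and return the old value; does not panic if
    /// `new` is behind the current value.
    fn advance(&self, new: u64) -> u64 {
        let mut cur = self.current.lock().unwrap();
        let old = *cur;
        if new > old {
            *cur = new;
        }
        old
    }
}

fn main() {
    let seqwait = SeqWait { current: std::sync::Mutex::new(100) };
    update_last_valid_lsn(&seqwait, 90);  // logs instead of panicking
    update_last_valid_lsn(&seqwait, 200);
}
```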
Eric Seppanen
4acdcbe90f clippy cleanup #3
Fix issues raised by clippy. Mostly trivial ones, though some allow
4-5 lines of code to be reduced to 1.
2021-04-26 12:35:35 -07:00
Eric Seppanen
fdf6829de5 cargo fmt 2021-04-26 09:36:22 -07:00
anastasia
b361558a8a fix typo in transaction replay code 2021-04-26 18:35:26 +03:00
Konstantin Knizhnik
c59830fd01 Do not restart wal-redo-postgres 2021-04-26 17:57:29 +03:00
Heikki Linnakangas
f617115467 Remove obsolete comment on async usage in the page cache 2021-04-26 14:12:57 +03:00
Heikki Linnakangas
3b9e7fc5e6 Use explicit threads.
Remove 'async' usage as much as feasible. Async code is harder to debug,
and mixing async and non-async code is a recipe for confusion and bugs.

There are a couple of exceptions:

- The code in walredo.rs, which needs to read and write to the child
  process simultaneously, still uses async. It's more convenient there.
  The 'async' usage is carefully limited to just the functions that
  communicate with the child process.

- Code in walreceiver.rs that uses tokio-postgres to do streaming
  replication. We have to use async there, because tokio-postgres is
  async. Most rust-postgres functionality has non-async wrappers, but
  not the new replication client code. The async usage is very limited
  here, too: we use just block_on to call the tokio-postgres functions.

The code in 'page_service.rs' now launches a dedicated thread for each
connection.

This replaces tokio::sync::channel with std::sync::mpsc in
'seqwait.rs', to make that non-async. It's not a drop-in replacement,
though: std::sync::mpsc doesn't support multiple consumers, so we cannot
share a channel between multiple waiters. So this removes the code to
check if an existing channel can be reused, and creates a new one for
each waiter. That created another problem: BTreeMap cannot hold
duplicates, so I replaced that with BinaryHeap.

Similarly, the tokio::{mpsc, oneshot} channels used between WAL redo
manager and PageCache are replaced with std::sync::mpsc. (There is no
separate 'oneshot' channel in the standard library.)

Fixes github issue #58, and coincidentally also issue #66.
2021-04-26 13:07:51 +03:00
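A small sketch of the per-waiter channel pattern this change implies: std::sync::mpsc has no oneshot type, so a fresh channel per request or waiter plays that role, and the consumer runs on a dedicated thread. All names are illustrative.

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative: a fresh std::sync::mpsc channel per request stands in for
// tokio's oneshot channel between the page cache and the WAL redo manager.
struct RedoRequest {
    page_no: u32,
    // The requestor keeps the Receiver and waits on it; the worker
    // sends exactly one response through this Sender.
    response: mpsc::Sender<Vec<u8>>,
}

fn main() {
    let (request_tx, request_rx) = mpsc::channel::<RedoRequest>();

    // Dedicated worker thread instead of an async task.
    let worker = thread::spawn(move || {
        for req in request_rx {
            let page_image = vec![0u8; 8192]; // pretend we replayed WAL here
            // Ignore the error if the requestor went away.
            let _ = req.response.send(page_image);
        }
    });

    // A requestor creates its own response channel for each request.
    let (resp_tx, resp_rx) = mpsc::channel();
    request_tx.send(RedoRequest { page_no: 1, response: resp_tx }).unwrap();
    let page = resp_rx.recv().unwrap();
    println!("got {} bytes back", page.len());

    drop(request_tx); // close the queue so the worker loop ends
    worker.join().unwrap();
}
```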
Konstantin Knizhnik
abcecc992e [refer #67] Replace File.write with File.write_all 2021-04-26 09:30:03 +03:00
Eric Seppanen
648755a25e add Lsn::block_offset, remaining_in_block, calc_padding
Replace open-coded math with member fns.
2021-04-25 19:37:02 -07:00
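A hedged sketch of what such helpers might look like, assuming an 8 kB WAL block size; the constant and the exact semantics are assumptions.

```rust
// Illustrative sketch of LSN block-arithmetic helpers replacing open-coded math.
const BLOCK_SIZE: u64 = 8192; // assumed WAL block size

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Lsn(u64);

impl Lsn {
    /// Byte offset of this LSN within its block.
    fn block_offset(self) -> u64 {
        self.0 % BLOCK_SIZE
    }

    /// Bytes remaining until the end of the current block.
    fn remaining_in_block(self) -> u64 {
        BLOCK_SIZE - self.block_offset()
    }

    /// Padding needed to round this LSN up to the given power-of-two alignment.
    fn calc_padding(self, alignment: u64) -> u64 {
        debug_assert!(alignment.is_power_of_two());
        (alignment - (self.0 & (alignment - 1))) & (alignment - 1)
    }
}

fn main() {
    let lsn = Lsn(0x2000 + 13);
    assert_eq!(lsn.block_offset(), 13);
    assert_eq!(lsn.remaining_in_block(), 8192 - 13);
    assert_eq!(lsn.calc_padding(8), 3); // 13 rounds up to 16
}
```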
Eric Seppanen
1c775bdcac Drop LSNs from PageCacheStats
There's no clear way to sum LSNs across timelines, so just remove them
for now.
2021-04-25 19:37:02 -07:00
Eric Seppanen
07d0241076 add AtomicLsn
AtomicLsn is a wrapper around AtomicU64 that has load() and store()
members that are cheap (on x86, anyway) and can be safely used in any
context.

This commit uses AtomicLsn in the page cache, and fixes up some
downstream code that manually implemented LSN formatting.

There's also a bugfix to the logging in wait_lsn, which printed the
wrong LSN value.
2021-04-25 19:37:02 -07:00
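A minimal sketch of an AtomicLsn wrapper in the spirit described above; the memory orderings and the Lsn representation are assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Lsn(u64);

/// Illustrative wrapper: cheap atomic load/store of an Lsn, usable from
/// any thread without taking a lock.
struct AtomicLsn(AtomicU64);

impl AtomicLsn {
    fn new(lsn: Lsn) -> Self {
        AtomicLsn(AtomicU64::new(lsn.0))
    }

    fn load(&self) -> Lsn {
        Lsn(self.0.load(Ordering::Acquire))
    }

    fn store(&self, lsn: Lsn) {
        self.0.store(lsn.0, Ordering::Release);
    }
}

fn main() {
    let last_valid = AtomicLsn::new(Lsn(0));
    last_valid.store(Lsn(0x16B9188));
    assert_eq!(last_valid.load(), Lsn(0x16B9188));
}
```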
Eric Seppanen
d760446053 remove Lsn::sub in favor of sub_checked
There is only one place doing subtraction, and it had a manually
implemented check.
2021-04-25 19:37:02 -07:00
Eric Seppanen
01e239afa3 apply Lsn type everywhere
Use the `Lsn` type everywhere that I can find u64 being used to
represent an LSN.
2021-04-25 19:37:02 -07:00
Konstantin Knizhnik
da9bf5dc63 Store atomic last_valid_lsn after seqwait_lsn.advance 2021-04-25 14:11:31 +03:00
Eric Seppanen
1cb9b5523b cargo fmt 2021-04-24 16:03:44 -07:00
Konstantin Knizhnik
968cd8f20c Do not delete versions in GC 2021-04-24 23:52:50 +03:00
Konstantin Knizhnik
3e007b0eb9 Do not delete versions in GC 2021-04-24 22:32:22 +03:00
Heikki Linnakangas
5e0cc89de8 Re-group functions in page_cache.rs, and add comments. 2021-04-24 17:54:31 +03:00
Heikki Linnakangas
0fc05569e0 Improve comments in page_cache.rs.
Explain the mix of async and other functions in the page cache.
2021-04-24 17:54:28 +03:00
Heikki Linnakangas
021462da3e Refactor put_wal_record() so that it doesn't need to be marked 'async'.
It was only marked as async because it calls relsize_get(), but
relsize_get() will in fact never block when it's called with the max
LSN value, like put_wal_record() does. Refactor to avoid marking
put_wal_record() as 'async'.
2021-04-24 17:54:26 +03:00
Heikki Linnakangas
93d7d2ae2a Refactor pagecache <-> Wal redo communication
After the rocksdb patch (commit 6aa38d3f7d), the CacheEntry struct was
used only momentarily in the communication between the page_cache and
the walredo modules. It was in fact not stored in any cache anymore.
For clarity, refactor the communication.

There is now a WalRedoManager struct with a `request_redo` function
that can be used to request WAL replay of a particular page. It sends
a request to a queue like before, but the queue has been replaced with
tokio::sync::mpsc. Previously, the resulting page image was stored
directly in the CacheEntry, and the requestor was notified using a
condition variable. Now, the requestor includes a 'oneshot' channel in
the request, and the WAL redo manager sends the response there.
2021-04-24 12:24:04 +03:00
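A hedged sketch of the request/response shape this describes, using tokio's mpsc and oneshot channels; the struct and field names are illustrative.

```rust
use tokio::sync::{mpsc, oneshot};

// Illustrative: each redo request carries a oneshot sender; the WAL redo
// manager replies on it with the reconstructed page image.
struct RedoRequest {
    blknum: u32,
    response: oneshot::Sender<Vec<u8>>,
}

struct WalRedoManager {
    requests: mpsc::Sender<RedoRequest>,
}

impl WalRedoManager {
    /// Ask the redo task to reconstruct a page and wait for the answer.
    async fn request_redo(&self, blknum: u32) -> Option<Vec<u8>> {
        let (tx, rx) = oneshot::channel();
        self.requests.send(RedoRequest { blknum, response: tx }).await.ok()?;
        rx.await.ok()
    }
}
```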
Konstantin Knizhnik
499b4f7eba Log garbage collection statistics 2021-04-23 18:02:58 +03:00
Konstantin Knizhnik
52ee3a2bac Support CREATE DATABASE command 2021-04-23 17:03:56 +03:00
anastasia
b64bd2a8af handle XLOG_DBASE_CREATE in waldecoder 2021-04-23 14:06:09 +03:00
anastasia
573f1ada83 [issue #56] Fix race at postgres instance + walreceiver start. Uses postgres/vendor issue_56_rebased branch. 2021-04-23 13:35:30 +03:00
Konstantin Knizhnik
59b23fef64 Wait for WAL receiver to start 2021-04-23 12:40:29 +03:00
Konstantin Knizhnik
ee87e6aad3 Sum log files in case of test failure 2021-04-22 22:14:41 +03:00
Konstantin Knizhnik
ff3488fadd Fix bug in do_gc 2021-04-22 19:37:33 +03:00
Konstantin Knizhnik
4a0a9e748c Enable garbage collector 2021-04-22 17:52:15 +03:00