Commit Graph

22 Commits

Author SHA1 Message Date
Konstantin Knizhnik
f82c3eb5e2 Enable wal proposer test 2021-04-30 15:18:32 +03:00
Konstantin Knizhnik
eea6f0898e Restore CLOG from snapshot 2021-04-30 14:22:47 +03:00
Eric Seppanen
6c7ea82a61 Disable test_embedded_wal_proposer without compiler warning 2021-04-29 15:10:04 -07:00
Konstantin Knizhnik
68aa2febc9 Disable test_embedded_wal_proposer test 2021-04-29 19:27:17 +03:00
Konstantin Knizhnik
f491a22d85 Add test for embedded WAL acceptor 2021-04-29 14:41:13 +03:00
Konstantin Knizhnik
26115818b7 Test for embedded wal acceptor 2021-04-29 10:48:56 +03:00
Eric Seppanen
975b2d12dc cargo fmt 2021-04-28 10:01:58 -07:00
anastasia
ab61ce2267 Fix merge conflict, add more comments to test_acceptors_unavailability 2021-04-28 17:24:31 +03:00
Konstantin Knizhnik
14168c7aa7 Increase downtime timeout to avoid address already in use error and fix checking for elapsed time 2021-04-28 17:24:31 +03:00
anastasia
7a8501d12f [issue #73] fix race in test_acceptors_unavailability test 2021-04-28 17:24:31 +03:00
Eric Seppanen
92e4f4b3b6 cargo fmt 2021-04-20 17:59:56 -07:00
Heikki Linnakangas
f69db17409 Make WAL safekeeper work with zenith timelines 2021-04-20 19:11:29 +03:00
Heikki Linnakangas
3600b33f1c Implement "timelines" in page server
This replaces the page server's "datadir" concept. The Page Server now
always works with a "Zenith Repository". When you initialize a new
repository with "zenith init", it runs initdb and loads an initial
basebackup of the freshly-created cluster into the repository, on "main"
branch. Repository can hold multiple "timelines", which can be given
human-friendly names, making them "branches". One page server simultaneously
serves all timelines stored in the repository, and you can have multiple
Postgres compute nodes connected to the page server, as long they all
operate on a different timeline.

There is a new command "zenith branch", which can be used to fork off
new branches from existing branches.

The repository uses the directory layout desribed as Repository format
v1 in https://github.com/zenithdb/rfcs/pull/5. It it *highly* inefficient:
- we never create new snapshots. So in practice, it's really just a base
  backup of the initial empty cluster, and everything else is reconstructed
  by redoing all WAL

- when you create a new timeline, the base snapshot and *all* WAL is copied
  from the new timeline to the new one. There is no smarts about
  referencing the old snapshots/wal from the ancestor timeline.

To support all this, this commit includes a bunch of other changes:

- Implement "basebackup" funtionality in page server. When you initialize
  a new compute node with "zenith pg create", it connects to the page
  server, and requests a base backup of the Postgres data directory on
  that timeline. (the base backup excludes user tables, so it's not
  as bad as it sounds).

- Have page server's WAL receiver write the WAL into timeline dir. This
  allows running a Page Server and Compute Nodes without a WAL safekeeper,
  until we get around to integrate that properly into the system. (Even
  after we integrate WAL safekeeper, this is perhaps how this will operate
  when you want to run the system on your laptop.)

- restore_datadir.rs was renamed to restore_local_repo.rs, and heavily
  modified to use the new format. It now also restores all WAL.

- Page server no longer scans and restores everything into memory at startup.
  Instead, when the first request is made for a timeline, the timeline is
  slurped into memory at that point.

- The responsibility for telling page server to "callmemaybe" was moved
  into Postgres libpqpagestore code. Also, WAL producer connstring cannot
  be specified in the pageserver's command line anymore.

- Having multiple "system identifiers" in the same page server is no
  longer supported. I repurposed much of that code to support multiple
  timelines, instead.

- Implemented very basic, incomplete, support for PostgreSQL's Extended
  Query Protocol in page_service.rs. Turns out that rust-postgres'
  copy_out() function always uses the extended query protocol to send
  out the command, and I'm using that to stream the base backup from the
  page server.

TODO: I haven't fixed the WAL safekeeper for this scheme, so all the
integration tests involving safekeepers are failing. My plan is to modify
the safekeeper to know about Zenith timelines, too, and modify it to work
with the same Zenith repository format. It only needs to care about the
'.zenith/timelines/<timeline>/wal' directories.
2021-04-20 19:11:27 +03:00
Eric Seppanen
533087fd5d cargo fmt 2021-04-19 23:26:13 -07:00
Konstantin Knizhnik
8879f747ee Add multitenancy test for wal_acceptor 2021-04-19 14:30:42 +03:00
Eric Seppanen
d1d6c968d5 control_plane: add error handling to reading pid files
print file errors to stderr; propagate the io::Error to the caller.
This error isn't handled very gracefully in WalAcceptorNode::drop(),
but there aren't any good options there since drop can't fail.
2021-04-13 14:30:48 -07:00
Stas Kelvich
c5f379bff3 [WIP] Implement CLI pg part 2021-04-13 18:58:22 +03:00
Stas Kelvich
6264dc6aa3 Move control_plane code out of lib.rs and split up control plane
into compute and storage parts.

Before that code was concentrated in lib.rs which was unhandy to
open by name.
2021-04-10 13:56:19 +03:00
Stas Kelvich
59163cf3b3 Rework controle_plane code to reuse it in CLI.
Move all paths from control_plane to local_env which can set them
for testing environment or for local installation.
2021-04-10 12:09:20 +03:00
Konstantin Knizhnik
dc16386639 Fix wal_accetor tests 2021-04-06 15:45:21 +03:00
Stas Kelvich
c0fcbbbe0c Cargo fmt pass over a codebase 2021-04-06 14:42:13 +03:00
Heikki Linnakangas
1367332447 Separate walkeeper and pageserver sources into different directories.
The integration tests, which depend on both walkeeper and pageserver,
are moved into yet another directory, 'integration_tests'.
2021-04-06 13:15:26 +03:00