96 Commits

Author SHA1 Message Date
Max Sharnoff
5eb1738e8b Rework walkeeper protocol to use libpq (#366)
Most of the work here was done on the postgres side. There's more
information in the commit message there.
 (see: 04cfa326a5)

On the WAL acceptor side, we're now expecting 'START_WAL_PUSH' to
initialize the WAL keeper protocol. Everything else is mostly the same,
with the only real difference being that protocol messages are now
discrete CopyData messages sent over the postgres protocol.

For the sake of documentation, the full set of these messages is:

  <- recv: START_WAL_PUSH query
  <- recv: server info from postgres   (type `ServerInfo`)
  -> send: walkeeper info              (type `SafeKeeperInfo`)
  <- recv: vote info                   (type `RequestVote`)

  if node id mismatch:
    -> send: self node id (type `NodeId`); exit

  -> send: confirm vote (with node id) (type `NodeId`)

  loop:
    <- recv: info and maybe WAL block  (type `SafeKeeperRequest` + bytes)
         (break loop if done)
    -> send: confirm receipt           (type `SafeKeeperResponse`)
2021-08-13 11:25:16 -07:00
Dmitry Rodionov
ce5333656f Introduce authentication v0.1.
Current state with authentication.
Page server validates JWT token passed as a password during connection
phase and later when performing an action such as create branch tenant
parameter of an operation is validated to match one submitted in token.
To allow access from console there is dedicated scope: PageServerApi,
this scope allows access to all tenants. See code for access validation in:
PageServerHandler::check_permission.

Because we are in progress of refactoring of communication layer
involving wal proposer protocol, and safekeeper<->pageserver. Safekeeper
now doesn’t check token passed from compute, and uses “hardcoded” token
passed via environment variable to communicate with pageserver.

Compute postgres now takes token from environment variable and passes it
as a password field in pageserver connection. It is not passed through
settings because then user will be able to retrieve it using pg_settings
or SHOW ..

I’ve added basic test in test_auth.py. Probably after we add
authentication to remaining network paths we should enable it by default
and switch all existing tests to use it.
2021-08-11 20:05:54 +03:00
Arseny Sher
5f0fd093d7 Revert "Walkeeper safe info (#408)"
Temporary revert commit 0ee2e16b17 as it leads to
safekeeper state deserialization failure. Let's sort that out and get it back.
2021-08-11 16:26:35 +03:00
Konstantin Knizhnik
0ee2e16b17 Walkeeper safe info (#408)
* Align prev record CRC on 8-bytes boundary

* Upadate safekeeper in-memory status on receiving message from WAL proposer
2021-08-11 09:14:05 +03:00
Heikki Linnakangas
e59e0ae2dc Clarify the terms "WAL service", "safekeeper", "proposer" 2021-08-05 10:27:56 +03:00
Arseny Sher
cc3ac2b74c Allow safekeeper to stream till real end of wal.
Otherwise it prematurely terminates, e.g. in test_compute_restart.

ref #388
2021-08-04 18:03:43 +03:00
Arseny Sher
b77fade7b8 Look up wal directory properly in all find_end_of_wal callers.
ref #388
2021-08-04 14:15:07 +03:00
Stas Kelvich
56565c0f58 look up WAL in right directory 2021-08-04 14:15:07 +03:00
Dmitry Ivanov
ed634ec320 Extract message processing function from PostgresBackend's event loop
This patch has been extracted from #348, where it became unnecessary
after we had decided that we didn't want to measure anything inside
PostgresBackend.

IMO the change is good enough to make its way into the codebase,
even though it brings nothing "new" to the code.
2021-08-04 10:49:02 +03:00
Dmitry Rodionov
767590bbd5 support tenants
this patch adds support for tenants. This touches mostly pageserver.
Directory layout on disk is changed to contain new layer of indirection.
Now path to particular repository has the following structure: <pageserver workdir>/tenants/<tenant
id>. Tenant id has the same format as timeline id. Tenant id is included in
pageserver commands when needed. Also new commands are available in
pageserver: tenant_list, tenant_create. This is also reflected CLI.
During init default tenant is created and it's id is saved in CLI config,
so following commands can use it without extra options. Tenant id is also included in
compute postgres configuration, so it can be passed via ServerInfo to
safekeeper and in connection string to pageserver.
For more info see docs/multitenancy.md.
2021-07-22 20:54:20 +03:00
Dmitry Ivanov
8b656bad5f Add a missing [cfg(test)]
We don't always need to compile tests.
2021-07-22 16:46:27 +03:00
Dmitry Ivanov
97329d4906 Add a test for EOF in walkeeper's background thread
It would be nice to have a proper Timeline mock api,
but this time we'll get by with what we have.
2021-07-22 12:12:55 +03:00
Dmitry Ivanov
6a3b9b1d46 Fix accidental busyloop in walkeeper's background thread
It used to be the case that walkeeper's background thread
failed to recognize the end of stream (EOF) signaled by the
`Ok(None)` result of `FeMessage::read`.
2021-07-22 12:12:55 +03:00
Arseny Sher
fe17188464 Alternative way to truncate behind-the-vcl part of log.
Which is important to do before bumping epoch.
2021-07-21 17:27:05 +03:00
Arseny Sher
51b50f5cf5 Fix truncating the wal after VCL. 2021-07-21 17:27:05 +03:00
Arseny Sher
9e3fe2b4d4 Truncate not matching part of log.
ref #296
2021-07-21 17:27:05 +03:00
Arseny Sher
eb1618f2ed TLA+ specification of proposer-acceptor consensus protocol.
And .cfg file for running TLC.

ref #293
2021-07-21 17:27:05 +03:00
Stas Kelvich
bf45bef284 md5 auth for postgres_backend.rs 2021-07-19 14:52:41 +03:00
Dmitry Rodionov
ed0fcfa9b7 replace parse_duration crate because of unpatched known vulnerability
resolves #87
2021-07-16 14:30:27 +03:00
Heikki Linnakangas
befefe8d84 Run 'cargo fmt'.
Fixes a few formatting discrepancies had crept in recently.
2021-07-14 22:03:14 +03:00
Dmitry Rodionov
75e717fe86 allow both domains and ip addresses in connection options for
pageserver and wal keeper. Also updated PageServerNode definition in
control plane to account for that. resolves #303
2021-07-09 16:46:21 +03:00
Stas Kelvich
4987d5ee1f reduce lodding in wal_acceptor 2021-07-09 16:45:48 +03:00
Konstantin Knizhnik
226204094a Fix recall parmeter handling in walkeeper 2021-06-25 09:43:55 +03:00
Arseny Sher
f923464b93 Remove pq_protocol.rs.
I forgot to do that in b2f51026aa.
2021-06-16 18:52:36 +03:00
Stas Kelvich
19602dc88a add wal_acceptor binary in Dockerfile 2021-06-14 11:58:53 +03:00
Stas Kelvich
c3011359ab remove --systemid from walkeeper 2021-06-14 11:58:53 +03:00
Arseny Sher
b2f51026aa Consolidate PG proto parsing-deparsing and backend code.
Now postgres_backend communicates with the client, passing queries to the
provided handler; we have two currently, for wal_acceptor and pageserver.

Now BytesMut is again used for writing data to avoid manual message length
calculation.

ref #118
2021-06-08 17:31:40 +03:00
anastasia
0c74f6fa4e Update README about source tree layout 2021-06-01 19:38:42 +03:00
Heikki Linnakangas
fc01fae9b4 Remove leftover references to safekeeper_proxy.
We don't use it anymore. The WAL proposer is now a background worker that
runs as part of the primary Postgres server.
2021-06-01 18:50:24 +03:00
Konstantin Knizhnik
1aceea1bdd Shutdown socket in ReplicationConn 2021-05-31 21:37:07 +03:00
Konstantin Knizhnik
e0cc4dee4f [refer #182] Make walkeeper periodically send callme requests to pageserver 2021-05-31 21:37:07 +03:00
Heikki Linnakangas
6b615cbde1 Remove Copy marker from large ServerInfo struct.
We don't want to encourage passing it by value. Doesn't matter much in
practice, but let's be tidy.

Per discussion at https://github.com/zenithdb/zenith/pull/195#issuecomment-849897327
2021-05-27 23:16:54 +03:00
Heikki Linnakangas
6a9c036ac1 Revert all changes related to storing and restoring non-rel data in page server
This includes the following commits:

35a1c3d521 Specify right LSN in test_createdb.py
d95e1da742 Fix issue with propagation of CREATE DATABASE to the branch
8465738aa5 [refer #167] Fix handling of pg_filenode.map files in page server
86056abd0e Fix merge conflict: set initial WAL position to second segment because of pg_resetwal
2bf2dd1d88 Add nonrelfile_utils.rs file
20b6279beb Fix restoring non-relational data during compute node startup
06f96f9600 Do not transfer WAL to computation nodes: use pg_resetwal for node startup

As well as some older changes related to storing CLOG and MultiXact data as
"pseudorelation" in the page server.

With this revert, we go back to the situtation that when you create a
new compute node, we ship *all* the WAL from the beginning of time to
the compute node. Obviously we need a better solution, like the code
that this reverts. But per discussion with Konstantin and Stas, this
stuff was still half-baked, and it's better for it to live in a branch
for now, until it's more complete and has gone through some review.
2021-05-24 16:05:45 +03:00
Eric Seppanen
4aabc9a682 easy clippy cleanups
Various things that clippy complains about, and are really easy to
fix.
2021-05-23 13:17:15 -07:00
Konstantin Knizhnik
86056abd0e Fix merge conflict: set initial WAL position to second segment because of pg_resetwal 2021-05-20 15:26:39 +03:00
Konstantin Knizhnik
3645133700 Fix conflicts with main branch 2021-05-20 14:39:27 +03:00
Eric Seppanen
4c35b22626 Remove FIXME about buffer pools
If I'm not going to do anything about it soon, it's not worth keeping
this comment.
2021-05-19 14:36:41 -07:00
Eric Seppanen
9fe3b73e13 walkeeper replication: remove the lock from the send stream.
I originally thought there would be multiple threads sending here, but
that's not currently the case, so remove the lock.
2021-05-19 14:36:41 -07:00
Eric Seppanen
e0146304e6 timeline: make SharedState and some constructors private
This was pointed out in code review: no need for these to be public.
2021-05-19 14:36:41 -07:00
Eric Seppanen
fbb04c592a wal_service: change error message at thread exit
Because many errors are propagated to this point, use a better message
than "socket error".
2021-05-19 14:36:41 -07:00
Eric Seppanen
8f43d7637c wal_service: move code around some more
Move ReceiveWalConn into its own file. Shuffle constants around so they
are close to the protocol they're associated with, or move them into
postgres_ffi if they seem to be global constants.
2021-05-19 14:36:41 -07:00
Eric Seppanen
cf30303d8f extract protocol peek code; rename Connection -> ReceiveWalConn
It may be more robust to use the TcpStream::peek function, so do all
protocol peeking before creating the protocol object. This reveals the
next cleanup step: rename Connection, since it's no longer the parent of
SendWalConn. Now we peek at the first bytes and choose which kind of
connection object to create.
2021-05-19 14:36:41 -07:00
Eric Seppanen
3296b7d770 wal_service: permit I/O errors while reading control file
I'm not sure why ignoring this error is a good idea, but the
test_embedded_wal_proposer test fails if we propagate the error upward.
2021-05-19 14:36:41 -07:00
Eric Seppanen
2148ae78ab wal_service: remove manual output buffering
Serialize objects directly to the stream. This allows us to remove a
bunch of buffer management code, along with the NewSerializer trait that
was a temporary bridge between the old code and the new.
2021-05-19 14:36:41 -07:00
Eric Seppanen
78dcf2207e replace manual deserialization with serde + BeSer
This struct is a little awkward, because in other places it is
serialized/deserialized as little-endian, but here it's big-endian.
2021-05-19 14:36:41 -07:00
Eric Seppanen
74b78608d9 split timeline code into a separate file 2021-05-19 14:36:41 -07:00
Eric Seppanen
a11558b84f break wal_service into multiple files
+ misc cleanups
2021-05-19 14:36:41 -07:00
Eric Seppanen
513696a485 break wal_service into multiple pieces
The pieces are:
base Connection
SendWal
ReplicationHandler

There are lots of other changes here:
- Put the replication reader in a background thread; this gets rid
  of some hacks with nonblocking mode.
- Stop manually buffering input data; use BufReader instead.
- Use BytesMut a lot less; use Read/Write traits where possible.
2021-05-19 14:36:41 -07:00
Eric Seppanen
cedc2eb5c2 wal_service: add BufReader
If we try to read a few bytes at a time, we will perform a lot more
syscalls than necessary. Wrap the socket in a BufReader, which will
buffer bytes as needed.
2021-05-19 14:36:41 -07:00
Stas Kelvich
746f667311 Refactor CLI and CLI<->pageserver interfaces to support remote pageserver
This patch started as an effort to support CLI working against remote
pageserver, but turned into a pretty big refactoring.

* CLI now does not look into repository files directly. New commands
'branch_create' and 'identify_system' were introduced into page_service to
support that.
* Branch management that was scattered between local_env and
zenith/main.rs is moved into pageserver/branches.rs. That code could better fit
in Repository/Timeline impl, but I'll leave that for a different patch.
* All tests-related code from local_env went into integration_tests/src/lib.rs as an
extension to PostgresNode trait.
* Paths-generating functions were concentrated around corresponding config
types (LocalEnv and PageserverConf).
2021-05-17 19:17:51 +03:00