Commit Graph

64 Commits

Author SHA1 Message Date
Patrick Insinger
c33faf98d1 zenith_utils - box BidiStream::Tls variant
Clippy warns that one variant is 40 bytes and the other is 568 bytes.
Box the larger variant to avoid this warning
2021-09-07 15:12:39 +03:00
Dmitry Rodionov
95453bc4af fix clippy warnings 2021-09-07 15:12:39 +03:00
Kirill Bulatov
3a37877edc Fix some typos 2021-09-07 15:12:39 +03:00
Heikki Linnakangas
2145ec5fe8 Fix infinite loop with forced repository checkpoint.
To fix, break out of the loop when you reach an in-memory layer that was
created after the checkpoint started. To do that, add a "generation"
counter into the layer map.

Fixes https://github.com/zenithdb/zenith/issues/494
2021-09-07 15:12:39 +03:00
Stas Kelvich
59c19d6e18 Rework basebackup.
* add lsn argument
* do not expose wait_lsn, wait inside list_nonrels()
* fix parameters parsing
* expose get_last_record_rlsn() to atomically read (last,prev) pair

More work is needed to correctly handle basebackup@old_lsn but current
approach already allows to fix test_restart_compute
2021-09-02 12:06:12 +03:00
Stas Kelvich
8c07a36fda Remove last_valid_lsn tracking in wal_receiver.
There are two main reasons for that:

a) Latest unfinished record may disapper after compute node restart, so let's
    try not leak volatile part of the WAL into the repository. Always use
    last_valid_record instead.

    That change requires different getPage@LSN logic in postgres -- we need
    to ask LSN's that point to some complete record instead of GetFlushRecPtr()
    that can point in the middle of the record. That was already done by @knizhnik
    to deal with the same problem during the work on `postgres --sync-safekeepers`.

    Postgres will use LSN's aligned on 0x8 boundary in get_page requests, so we
    also need to be sure that last_valid_record is aligned.

b) Switch to get_last_record_lsn() in basebackup@no_lsn. When compute node
    is running without safekeepers and streams WAL directly
    to pageserver it is important to match basebackup LSN and LSN of replication
    start. Before this commit basebackup@no_lsn was waiting for last_valid_lsn
    and walreceiver started replication with last_record_lsn, which can be less.
    So replication was failing since compute node doesn't have requested WAL.
2021-09-02 12:06:12 +03:00
Kirill Bulatov
212920e47e Collect and expose I/O disk write metrics 2021-09-02 11:33:00 +03:00
Patrick Insinger
5ac3cb1c72 TLS for postgres_backend and proxy
Add TLS support to `postgres_backend`.
Implement this support in `proxy`.
Other applications must opt-in and provide a `rustls::ServerConfig`.
2021-09-01 10:29:19 -07:00
Konstantin Knizhnik
beaa2cd0a2 Handle COPY error 2021-08-26 13:53:10 +03:00
Dmitry Rodionov
b135723994 review adjustments 2021-08-24 19:05:00 +03:00
Dmitry Rodionov
23b5249512 translate pageserver api to http 2021-08-24 19:05:00 +03:00
Max Sharnoff
39bb6fb19c Marginally improve walkeeper error visibility (#440)
Adds a warning if a postgres query fails, and some additional context to
errors generated inside `ReceiveWalConn::run`
2021-08-19 08:46:18 -07:00
Heikki Linnakangas
2450f82de5 Introduce a new "layered" repository implementation.
This replaces the RocksDB based implementation with an approach using
"snapshot files" on disk, and in-memory btreemaps to hold the recent
changes.

This make the repository implementation a configuration option. You can
choose 'layered' or 'rocksdb' with "zenith init --repository-format=<format>"
The unit tests have been refactored to exercise both implementations.
'layered' is now the default.

Push/pull is not implemented. The 'test_history_inmemory' test has been
commented out accordingly. It's not clear how we will implement that
functionality; probably by copying the snapshot files directly.
2021-08-16 10:06:48 +03:00
Max Sharnoff
5eb1738e8b Rework walkeeper protocol to use libpq (#366)
Most of the work here was done on the postgres side. There's more
information in the commit message there.
 (see: 04cfa326a5)

On the WAL acceptor side, we're now expecting 'START_WAL_PUSH' to
initialize the WAL keeper protocol. Everything else is mostly the same,
with the only real difference being that protocol messages are now
discrete CopyData messages sent over the postgres protocol.

For the sake of documentation, the full set of these messages is:

  <- recv: START_WAL_PUSH query
  <- recv: server info from postgres   (type `ServerInfo`)
  -> send: walkeeper info              (type `SafeKeeperInfo`)
  <- recv: vote info                   (type `RequestVote`)

  if node id mismatch:
    -> send: self node id (type `NodeId`); exit

  -> send: confirm vote (with node id) (type `NodeId`)

  loop:
    <- recv: info and maybe WAL block  (type `SafeKeeperRequest` + bytes)
         (break loop if done)
    -> send: confirm receipt           (type `SafeKeeperResponse`)
2021-08-13 11:25:16 -07:00
Heikki Linnakangas
70cb399d59 Add convenience function to create a RowDescriptor message for an int8 col.
Makes the code to construct a result set a bit more terse and readable.
2021-08-11 20:17:33 +03:00
Dmitry Rodionov
ce5333656f Introduce authentication v0.1.
Current state with authentication.
Page server validates JWT token passed as a password during connection
phase and later when performing an action such as create branch tenant
parameter of an operation is validated to match one submitted in token.
To allow access from console there is dedicated scope: PageServerApi,
this scope allows access to all tenants. See code for access validation in:
PageServerHandler::check_permission.

Because we are in progress of refactoring of communication layer
involving wal proposer protocol, and safekeeper<->pageserver. Safekeeper
now doesn’t check token passed from compute, and uses “hardcoded” token
passed via environment variable to communicate with pageserver.

Compute postgres now takes token from environment variable and passes it
as a password field in pageserver connection. It is not passed through
settings because then user will be able to retrieve it using pg_settings
or SHOW ..

I’ve added basic test in test_auth.py. Probably after we add
authentication to remaining network paths we should enable it by default
and switch all existing tests to use it.
2021-08-11 20:05:54 +03:00
anastasia
5dd9a66f9e Move postgres backend messages to trace level 2021-08-10 14:26:28 +03:00
Stas Kelvich
fa04096733 cargo fmt pass 2021-08-04 23:51:02 +03:00
Dmitry Ivanov
754892402c Enable full feature set for hyper in zenith_utils
Server functionality requires not only the "server" feature flag, but
also either "http1" or "http2" (or both). To make things simpler
(and prevent analogous problems), enable all features.
2021-08-04 21:41:17 +03:00
Dmitry Ivanov
ed634ec320 Extract message processing function from PostgresBackend's event loop
This patch has been extracted from #348, where it became unnecessary
after we had decided that we didn't want to measure anything inside
PostgresBackend.

IMO the change is good enough to make its way into the codebase,
even though it brings nothing "new" to the code.
2021-08-04 10:49:02 +03:00
Dmitry Ivanov
cb1b4a12a6 Add some prometheus metrics to pageserver
The metrics are served by an http endpoint, which
is meant to be spawned in a new thread.

In the future the endpoint will provide more APIs,
but for the time being, we won't bother with proper routing.
2021-08-03 21:42:24 +03:00
Max Sharnoff
3f4815efa2 Correct LeSer doc: "Big Endian" -> "Little Endian" (#362) 2021-07-23 12:38:37 -07:00
Dmitry Ivanov
8b656bad5f Add a missing [cfg(test)]
We don't always need to compile tests.
2021-07-22 16:46:27 +03:00
Dmitry Ivanov
6a3b9b1d46 Fix accidental busyloop in walkeeper's background thread
It used to be the case that walkeeper's background thread
failed to recognize the end of stream (EOF) signaled by the
`Ok(None)` result of `FeMessage::read`.
2021-07-22 12:12:55 +03:00
Stas Kelvich
79d9314ba6 terminate socket explicitly 2021-07-19 14:52:41 +03:00
Stas Kelvich
2b33894e7b few more review fixes 2021-07-19 14:52:41 +03:00
Stas Kelvich
a118557331 review fixes 2021-07-19 14:52:41 +03:00
Stas Kelvich
1b6d99db7c unfreeze client session upon callback 2021-07-19 14:52:41 +03:00
Stas Kelvich
605b90c6c7 do an actual proxy pass 2021-07-19 14:52:41 +03:00
Stas Kelvich
dab34c3dd6 distinguish between new and old users 2021-07-19 14:52:41 +03:00
Stas Kelvich
bf45bef284 md5 auth for postgres_backend.rs 2021-07-19 14:52:41 +03:00
Heikki Linnakangas
befefe8d84 Run 'cargo fmt'.
Fixes a few formatting discrepancies had crept in recently.
2021-07-14 22:03:14 +03:00
Konstantin Knizhnik
ad92b66eed Fix TimestampTz type to i64 to be compatbile with Postgres 2021-07-14 15:55:12 +03:00
Dmitry Rodionov
75e717fe86 allow both domains and ip addresses in connection options for
pageserver and wal keeper. Also updated PageServerNode definition in
control plane to account for that. resolves #303
2021-07-09 16:46:21 +03:00
Eric Seppanen
d2d5a01522 minor clippy fixes 2021-06-15 10:52:11 -07:00
Arseny Sher
b2f51026aa Consolidate PG proto parsing-deparsing and backend code.
Now postgres_backend communicates with the client, passing queries to the
provided handler; we have two currently, for wal_acceptor and pageserver.

Now BytesMut is again used for writing data to avoid manual message length
calculation.

ref #118
2021-06-08 17:31:40 +03:00
Konstantin Knizhnik
874d82fd4c Fix tests in lsn.rs after changing wal_seg_size type 2021-05-20 14:45:09 +03:00
Konstantin Knizhnik
06f96f9600 Do not transfer WAL to computation nodes: use pg_resetwal for node startup 2021-05-20 14:13:47 +03:00
Eric Seppanen
1ec157653e bin_ser: expand serialize error type, add serialized_size 2021-05-19 14:36:41 -07:00
Eric Seppanen
858ca3a4ce bin_ser: simplify ser_into_slice
The conversion of &mut [u8] into Write is a little tricky.

Also, remove an unused generic parameter.
2021-05-19 14:36:41 -07:00
Eric Seppanen
d744ddee7c bin_ser: preserve IO errors on deserialization
We're starting to deserialize directly from the TcpStream now, which
means that a socket error gets logged as "deserialize error". That's not
very helpful; preserve the io::Error so it can be logged.
2021-05-19 14:36:41 -07:00
Eric Seppanen
513696a485 break wal_service into multiple pieces
The pieces are:
base Connection
SendWal
ReplicationHandler

There are lots of other changes here:
- Put the replication reader in a background thread; this gets rid
  of some hacks with nonblocking mode.
- Stop manually buffering input data; use BufReader instead.
- Use BytesMut a lot less; use Read/Write traits where possible.
2021-05-19 14:36:41 -07:00
Eric Seppanen
71e93faed7 fix endian typos in BeSer
Cut/paste error: BeSer was using the little-endian config in two places.

Add better unit tests so this can't happen again.
2021-05-13 19:04:17 -07:00
Eric Seppanen
e5df42feef add workspace_hack dependency to zenith_utils
I didn't think this mattered, but it does: if you add a dependency to
zenith_utils, but forget to request a feature you need, the crate will
build from the workspace root, but not by itself.

It's probably better to pull in the whole dependency tree.

This leaves one problem unsolved: the missing feature above will now be
a latent bug. If that feature gets removed later by other crates, and
then the workspace_hack Cargo.toml is updated, this missing feature will
become a build failure.
2021-05-10 18:21:45 -07:00
Eric Seppanen
60d66267a9 add serde support to Lsn type
A serialized Lsn and a serialized u64 should be identical.
2021-05-10 16:21:05 -07:00
Eric Seppanen
36c12247b9 add bin_ser module
This module adds two traits that implement bincode-based serialization.
BeSer implements methods for big-endian encoding/decoding.
LeSer implements methods for little-endian encoding/decoding.

Right now, the BeSer and LeSer methods have the same names, meaning you
can't `use` them both at the same time. This is intended to be a safety
mechanism: mixing big-endian and little-endian encoding in the same file
is error-prone. There are ways around this, but the easiest fix is to
put the big-endian code and little-endian code in different files or
submodules.
2021-05-10 16:21:05 -07:00
anastasia
1591f058c6 implement Debug for Lsn type 2021-05-05 16:38:32 +03:00
Heikki Linnakangas
96beffb3c5 Add tests for the Lsn::fetch_max function. 2021-04-27 13:43:39 +03:00
Heikki Linnakangas
cff671c1bd Remove duplicated LSN fields from the page cache.
Having multiple copies of the same values is a source of confusion.
Commit da9bf5dc63 fixed one race condition caused by that, for example.
See also discussion at
https://github.com/zenithdb/zenith/issues/57#issuecomment-824393470

This changes SeqWait.advance() to return the old number, and not panic if
you try to move the value backwards. The caller should check for that and
act accordingly.
2021-04-27 10:32:39 +03:00
Konstantin Knizhnik
3b09a74f58 Implement offloading of old WAL files to S3 in walkeeper 2021-04-26 16:23:00 +03:00