Commit Graph

703 Commits

Author SHA1 Message Date
Arseny Sher
a8a2f62bc3 imactive 2022-01-11 18:14:59 +03:00
Konstantin Knizhnik
26060dd68e Disable writing WAL to files at pageserver 2021-08-31 11:13:55 +03:00
Konstantin Knizhnik
73d823e53c Make it possible for WAL decoder to skip continuation records 2021-08-31 10:59:26 +03:00
Konstantin Knizhnik
112909c5e4 Handle wal records larger than WAL segment size in find_end_of_wal 2021-08-30 17:32:40 +03:00
Konstantin Knizhnik
07adc9dbda Fix unit test for find_end_of_wal 2021-08-27 14:59:07 +03:00
Konstantin Knizhnik
c05cedc626 Do not check cont record for second segment because it contains a dummy checkpoint record 2021-08-27 12:48:28 +03:00
Konstantin Knizhnik
815528e0ce Use last record LSN as flush position reported by safekeepers to walproposer to prevent moving VCL backward on compute node restart 2021-08-26 18:08:29 +03:00
Konstantin Knizhnik
a2e135b404 Maintain safe LSN position at safekeepers 2021-08-25 10:24:45 +03:00
Stas Kelvich
72de70a8cc Change test_restart_compute to expose safekeeper problems 2021-08-25 00:42:08 +03:00
Konstantin Knizhnik
4051c5d4ff Undo some redundant fixes 2021-08-20 12:31:53 +03:00
Konstantin Knizhnik
f86bf26466 Restore including postgresql.conf in basebackup 2021-08-20 11:23:57 +03:00
Konstantin Knizhnik
3ca4b638ac Merge with main 2021-08-20 10:55:34 +03:00
Konstantin Knizhnik
d61699b0f8 [refer #439] Fix submodule version 2021-08-19 19:56:49 +03:00
Konstantin Knizhnik
ead94feb05 [refer #439] Correctly handle LSN parameter in BASEBACKUP command 2021-08-19 19:53:22 +03:00
Max Sharnoff
39bb6fb19c Marginally improve walkeeper error visibility (#440)
Adds a warning if a postgres query fails, and some additional context to
errors generated inside `ReceiveWalConn::run`
2021-08-19 08:46:18 -07:00
Dmitry Rodionov
82725725fd update README to match required Rust version and new python package installation process 2021-08-19 17:42:52 +03:00
Alexey Kondratov
1c3d51ed92 Add Docker images building doc and refactor the overall docs reference 2021-08-19 15:12:35 +03:00
Alexey Kondratov
04a309f562 Build zenithdb/zenith:latest in CI (zenithdb/console#18) 2021-08-19 15:12:35 +03:00
anastasia
20e6cd7724 Update test_twophase - check that we correctly restore files at compute node start. 2021-08-19 12:15:09 +03:00
Heikki Linnakangas
9fed5c8fb7 Add test for page server restart. 2021-08-18 20:19:07 +03:00
Dmitry Rodionov
4bce65ff9a bump rust version in ci to 1.52.1 2021-08-17 20:31:28 +03:00
Heikki Linnakangas
3319befc30 Revert a bunch of commits that I pushed by accident
This reverts commits:
  e35a5aa550
  a389c2ed7f
  11ebcb531f
  8d2b61f4d1
  882f549236
  ddb7155bbe

Those were follow-up work on top of PR
https://github.com/zenithdb/zenith/pull/430, but they were still very
much not ready.
2021-08-17 19:20:27 +03:00
Heikki Linnakangas
ddb7155bbe WIP Store base images in separate ImageLayers 2021-08-17 18:55:04 +03:00
Heikki Linnakangas
882f549236 WIP: store base images separately 2021-08-17 18:54:53 +03:00
Heikki Linnakangas
8d2b61f4d1 Move code to handle snapshot filenames 2021-08-17 18:54:53 +03:00
Heikki Linnakangas
11ebcb531f Add Gauge for # of layers 2021-08-17 18:54:53 +03:00
Heikki Linnakangas
a389c2ed7f WIP: Track oldest open layer 2021-08-17 18:54:53 +03:00
Heikki Linnakangas
e35a5aa550 WIP: track mem usage 2021-08-17 18:54:53 +03:00
Heikki Linnakangas
45f641cabb Handle last "open" layer specially in LayerMap.
There can be only one "open" layer for each segment. That's the last one,
implemented by InMemoryLayer. That's the only one where new records can
be appended to. Much of the code needed to distinguish between the last
open layer and other layers anyway, so make the distinction explicit
in LayerMap.
2021-08-17 18:54:51 +03:00
Heikki Linnakangas
48f4a7b886 Refactor get_page_at_lsn() logic to layered_repository.rs
There was a lot of duplicated code between the get_page_at_lsn()
implementations in InMemoryLayer and SnapshotLayer. Move the code for
requesting WAL redo from the Layer trait into LayeredTimeline. The
get-function in Layer now just returns the WAL records and base image
to the caller, and the caller is responsible for performing the WAL
redo on them.
2021-08-17 18:54:48 +03:00
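The split of responsibilities described in commit 48f4a7b886 can be sketched roughly as follows. This is a minimal illustration in Python, not the repository's Rust code: the class name `LayerSketch`, the method `get_reconstruct_data`, and the `redo` callback are all hypothetical names chosen for this example.

```python
class LayerSketch:
    """Hypothetical stand-in for a layer: it only stores data, it never replays WAL."""

    def __init__(self, base_image, records):
        self.base_image = base_image
        # records: list of (lsn, wal_record) pairs, assumed sorted by LSN
        self.records = records

    def get_reconstruct_data(self, lsn):
        # Return the base image and the WAL records up to `lsn`;
        # applying them is the caller's job.
        needed = [rec for rec_lsn, rec in self.records if rec_lsn <= lsn]
        return self.base_image, needed


def get_page_at_lsn(layer, lsn, redo):
    """Caller-side logic: fetch the data from the layer, then perform WAL redo once."""
    base_image, records = layer.get_reconstruct_data(lsn)
    if not records:
        return base_image
    return redo(base_image, records)
```

With this shape, every layer type shares the same redo path, which is the duplication the commit message says it removed.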
Heikki Linnakangas
91f72fabc9 Work with smaller segments.
Split each relish into fixed-sized 10 MB segments. Separate layers are
created for each segment. This reduces the write amplification if you
have a large relation and update only parts of it; the downside is
that you have a lot more files. The 10 MB is just a guess, we should
do some modeling and testing in the future to figure out the optimal
size.

Each segment tracks the size of the segment separately. To figure out
the total size of a relish, you need to loop through the segments to
find the highest segment that's in use. That's a bit inefficient, but
will do for now. We might want to add a cache or something later.
2021-08-17 18:54:41 +03:00
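The segment arithmetic in commit 91f72fabc9 can be illustrated with a small sketch. It assumes the default PostgreSQL block size of 8192 bytes, which the commit message does not state, and the function names and the `segment_sizes` mapping are hypothetical.

```python
BLOCK_SIZE = 8192                    # assumption: default PostgreSQL page size
SEGMENT_BYTES = 10 * 1024 * 1024     # the 10 MB segment size from the commit message
BLOCKS_PER_SEGMENT = SEGMENT_BYTES // BLOCK_SIZE


def segment_for_block(blkno):
    """Map a relish block number to (segment number, block offset within the segment)."""
    return divmod(blkno, BLOCKS_PER_SEGMENT)


def relish_size_in_blocks(segment_sizes):
    """Total relish size, given a mapping of segment number -> blocks used in it.

    Loops over the segments to find the highest one in use, as the commit
    message describes; no cache is involved.
    """
    highest = None
    for segno, used in segment_sizes.items():
        if used > 0 and (highest is None or segno > highest):
            highest = segno
    if highest is None:
        return 0
    return highest * BLOCKS_PER_SEGMENT + segment_sizes[highest]
```

Under these assumptions a segment holds 1280 blocks, so an update to one block touches only the layer for its 10 MB segment rather than the whole relation.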
anastasia
cbeb67067c Issue #367.
Change CLI so that we always create node from scratch at 'pg start'.
This operation preserves the previously existing config.

Add new flag '--config-only' to 'pg create'.
If this flag is passed, don't perform basebackup, just fill initial postgresql.conf for the node.
2021-08-17 18:12:31 +03:00
anastasia
921ec390bc cargo fmt 2021-08-16 19:41:07 +03:00
Heikki Linnakangas
f37cb21305 Update Cargo.lock for addition of 'bincode'
Commit 5eb1738e8b added a dependency to the 'bincode' crate. 'cargo build'
adds it to Cargo.lock automatically, so let's remember it.
2021-08-16 19:24:26 +03:00
Heikki Linnakangas
7ee8de3725 Add metrics to WAL redo.
Track the time spent on replaying WAL records by the special Postgres
process, the time spent waiting for access to the Postgres process (since
there is only one per tenant), and the number of records replayed.
2021-08-16 15:49:17 +03:00
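The three measurements named in commit 7ee8de3725 (wait time, replay time, record count) could be collected along these lines. This is only a sketch: a `threading.Lock` stands in for the single per-tenant Postgres process, and `WalRedoMetrics`, `WalRedoManagerSketch`, and `apply_record` are hypothetical names.

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class WalRedoMetrics:
    wait_seconds: float = 0.0      # time spent waiting for access to the redo process
    replay_seconds: float = 0.0    # time spent replaying WAL records
    records_replayed: int = 0


class WalRedoManagerSketch:
    def __init__(self, apply_record):
        self._apply_record = apply_record
        self._lock = threading.Lock()   # stands in for the one-per-tenant process
        self.metrics = WalRedoMetrics()

    def request_redo(self, base_image, records):
        wait_start = time.perf_counter()
        with self._lock:
            replay_start = time.perf_counter()
            page = base_image
            for record in records:
                page = self._apply_record(page, record)
            replay_end = time.perf_counter()
            # Update the counters while still holding the lock, so
            # concurrent requesters do not race on them.
            self.metrics.wait_seconds += replay_start - wait_start
            self.metrics.replay_seconds += replay_end - replay_start
            self.metrics.records_replayed += len(records)
        return page
```

Separating the wait time from the replay time makes it possible to tell contention on the shared process apart from slow replay.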
Heikki Linnakangas
047a05efb2 Minor formatting and comment fixes. 2021-08-16 15:48:59 +03:00
Dmitry Rodionov
0c4ab80eac try to be more intelligent in WalAcceptor.start, added a bunch of typing sugar to wal acceptor fixtures 2021-08-16 14:27:44 +03:00
Heikki Linnakangas
2450f82de5 Introduce a new "layered" repository implementation.
This replaces the RocksDB based implementation with an approach using
"snapshot files" on disk, and in-memory btreemaps to hold the recent
changes.

This makes the repository implementation a configuration option. You can
choose 'layered' or 'rocksdb' with "zenith init --repository-format=<format>"
The unit tests have been refactored to exercise both implementations.
'layered' is now the default.

Push/pull is not implemented. The 'test_history_inmemory' test has been
commented out accordingly. It's not clear how we will implement that
functionality; probably by copying the snapshot files directly.
2021-08-16 10:06:48 +03:00
Max Sharnoff
5eb1738e8b Rework walkeeper protocol to use libpq (#366)
Most of the work here was done on the postgres side. There's more
information in the commit message there.
 (see: 04cfa326a5)

On the WAL acceptor side, we're now expecting 'START_WAL_PUSH' to
initialize the WAL keeper protocol. Everything else is mostly the same,
with the only real difference being that protocol messages are now
discrete CopyData messages sent over the postgres protocol.

For the sake of documentation, the full set of these messages is:

  <- recv: START_WAL_PUSH query
  <- recv: server info from postgres   (type `ServerInfo`)
  -> send: walkeeper info              (type `SafeKeeperInfo`)
  <- recv: vote info                   (type `RequestVote`)

  if node id mismatch:
    -> send: self node id (type `NodeId`); exit

  -> send: confirm vote (with node id) (type `NodeId`)

  loop:
    <- recv: info and maybe WAL block  (type `SafeKeeperRequest` + bytes)
         (break loop if done)
    -> send: confirm receipt           (type `SafeKeeperResponse`)
2021-08-13 11:25:16 -07:00
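The message sequence listed in commit 5eb1738e8b can be read as a small state machine on the WAL acceptor side. The sketch below models it over plain Python tuples instead of CopyData messages. The payload layouts, the `vote_matches` predicate (the commit message does not say what counts as a node id mismatch), and the choice to echo the proposed node id when confirming the vote are all assumptions made for illustration.

```python
def run_wal_push(incoming, my_node_id, vote_matches):
    """Consume (kind, payload) messages and return the list of messages sent back."""
    sent = []
    msgs = iter(incoming)

    def expect(kind):
        got, payload = next(msgs)
        if got != kind:
            raise ValueError(f"expected {kind}, got {got}")
        return payload

    expect("START_WAL_PUSH")                       # <- recv: START_WAL_PUSH query
    expect("ServerInfo")                           # <- recv: server info from postgres
    sent.append(("SafeKeeperInfo", {"node_id": my_node_id}))   # -> send: walkeeper info
    vote = expect("RequestVote")                   # <- recv: vote info

    if not vote_matches(vote, my_node_id):
        sent.append(("NodeId", my_node_id))        # -> send: self node id; exit
        return sent

    sent.append(("NodeId", vote["node_id"]))       # -> send: confirm vote (assumed payload)

    for kind, request in msgs:                     # loop over WAL blocks
        if kind != "SafeKeeperRequest":
            raise ValueError(f"unexpected message {kind}")
        if request.get("done"):
            break                                  # break loop if done
        sent.append(("SafeKeeperResponse", {"received_bytes": len(request["wal"])}))
    return sent
```

A caller would pass whatever comparison the real protocol uses as `vote_matches`; the example only fixes the order of the exchange.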
Heikki Linnakangas
6e22a8f709 Refactor WAL redo to not use a separate thread.
My main motivation is to make it easier to attribute time spent in WAL
redo to the request that needed the WAL redo. With this patch, the WAL
redo is performed by the requester thread, so it shows up in stack traces
and in 'perf' report as part of the requester's call stack. This is also
slightly simpler (fewer lines of code) and should be a bit faster too.
2021-08-13 17:23:36 +03:00
Heikki Linnakangas
f8de71eab0 Update vendor/postgres to fix race condition leading to CRC errors.
Fixes https://github.com/zenithdb/zenith/issues/413
2021-08-13 14:02:26 +03:00
Heikki Linnakangas
8517d9696d Move gc_iteration() function to Repository trait.
The upcoming layered storage implementation handles GC as a
repository-wide operation because it needs to pay attention to the branch
points of all timelines.
2021-08-12 23:46:01 +03:00
Heikki Linnakangas
97f9021c88 Fix JWT token encoding issue in test.
On my laptop, the server was receiving the token as a string with extra
b'...' escaping, e.g. as "b'eyJ0....0ifQA'" instead of just "eyJ0....0ifQA".
That was causing the test to fail.

I'm using Python 3.9, while the CI is using Python 3.8. I suspect that's
why. My version of pyjwt might be different too.

See also https://github.com/jpadilla/pyjwt/issues/391.
2021-08-12 20:46:14 +03:00
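The symptom in commit 97f9021c88 is what Python produces when a `bytes` token is formatted as text without decoding. One plausible cause, hedged here because the commit message itself is unsure, is that the token encoder returns `bytes` in some library versions and `str` in others. A defensive helper (the name `normalize_token` is hypothetical) might look like this:

```python
def normalize_token(token):
    """Return the token as str whether the encoder produced bytes or str."""
    if isinstance(token, bytes):
        return token.decode("ascii")
    return token


raw = b"eyJ0"
broken = str(raw)             # formatting bytes as text keeps the b'...' wrapper
fixed = normalize_token(raw)  # decoding yields the bare token
```

Here `broken` is the string `b'eyJ0'`, quotes and prefix included, while `fixed` is `eyJ0`.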
Heikki Linnakangas
0a92b31496 If a pg_regress test fails in CI, save regression.diffs 2021-08-12 18:39:23 +03:00
anastasia
6c3726913f Introduce check for physical relishes.
They represent files and use RelationSizeEntry to track existing and dropped files.
They can be both blocky and non-blocky.
get_relish_size() and get_rel_exists() functions work with physical relishes, not only with blocky ones.
2021-08-12 14:42:21 +03:00
anastasia
1bfade8adc Issue #330. Use put_unlink for twophase relishes.
Follow PostgreSQL logic: remove Twophase files when prepared transaction is committed/aborted.

Always store Twophase segments as materialized page images (no wal records).
2021-08-12 14:42:21 +03:00
anastasia
4eebe22fbb cargo fmt 2021-08-12 14:42:21 +03:00
Heikki Linnakangas
20d5e757ca Remove now-unused get_next_tag function.
The only caller was removed by commit c99a211b01.
2021-08-11 22:16:38 +03:00
Heikki Linnakangas
70cb399d59 Add convenience function to create a RowDescriptor message for an int8 col.
Makes the code to construct a result set a bit more terse and readable.
2021-08-11 20:17:33 +03:00
Dmitry Rodionov
ce5333656f Introduce authentication v0.1.
Current state of authentication:
The page server validates the JWT token passed as a password during the
connection phase. Later, when performing an action such as creating a
branch, the tenant parameter of the operation is validated to match the
one submitted in the token. To allow access from the console there is a
dedicated scope, PageServerApi, which allows access to all tenants. See
the access validation code in PageServerHandler::check_permission.

Because we are in the process of refactoring the communication layer
involving the wal proposer protocol and safekeeper<->pageserver, the
safekeeper currently doesn’t check the token passed from compute, and
uses a “hardcoded” token passed via an environment variable to
communicate with the pageserver.

Compute postgres now takes the token from an environment variable and
passes it as the password field in the pageserver connection. It is not
passed through settings because then the user would be able to retrieve
it using pg_settings or SHOW ..

I’ve added a basic test in test_auth.py. Probably after we add
authentication to the remaining network paths we should enable it by
default and switch all existing tests to use it.
2021-08-11 20:05:54 +03:00
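The validation rule described in commit ce5333656f can be sketched as below. Only the `PageServerApi` scope name and the `check_permission` function name come from the commit message; the `Claims` shape and the `"Tenant"` scope string are assumptions, and the real check lives in Rust in `PageServerHandler::check_permission`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claims:
    """Hypothetical decoded JWT payload."""
    scope: str
    tenant_id: Optional[str] = None


def check_permission(claims, tenant_id):
    """Raise PermissionError unless the token allows acting on `tenant_id`."""
    if claims.scope == "PageServerApi":
        return  # console scope: access to all tenants
    if claims.scope == "Tenant":  # assumed name for the per-tenant scope
        if claims.tenant_id is not None and claims.tenant_id == tenant_id:
            return
        raise PermissionError("tenant in token does not match the requested tenant")
    raise PermissionError(f"unknown scope: {claims.scope}")
```

The check runs per operation, not only at connection time, so a token issued for one tenant cannot be reused to create a branch for another.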