Commit Graph

75 Commits

Author SHA1 Message Date
Heikki Linnakangas
feae7f39c1 Support read-only nodes
Change 'zenith.signal' file to a human-readable format, similar to
backup_label. It can contain a "PREV LSN: %X/%X" line, or a special
value to indicate that it's OK to start with invalid LSN ('none'), or
that it's a read-only node and generating WAL is forbidden
('invalid').

The 'zenith pg create' and 'zenith pg start' commands now take a node
name parameter, separate from the branch name. If the node name is not
given, it defaults to the branch name, so this doesn't break existing
scripts.

If you pass "foo@<lsn>" as the branch name, a read-only node anchored
at that LSN is created. The anchoring is performed by setting the
'recovery_target_lsn' option in the postgresql.conf file, and putting
the server into standby mode with 'standby.signal'.

We no longer store the synthetic checkpoint record in the WAL segment.
The postgres startup code has been changed to use the copy of the
checkpoint record in the pg_control file, when starting in zenith
mode.
2021-10-19 09:48:12 +03:00
Heikki Linnakangas
c2b468c958 Separate node name from the branch name in ComputeControlPlane
This is in preparation for supporting read-only nodes. You can launch
multiple read-only nodes on the same brach, so we need an identifier
for each node, separate from the branch name.
2021-10-19 09:48:10 +03:00
Max Sharnoff
84f7dcd052 Fix clippy errors on nightly (2021-09-29) (#691)
Most of the changes are for the new if-then-panic lint added in
https://github.com/rust-lang/rust-clippy/pull/7669.
2021-10-01 15:45:42 -07:00
Heikki Linnakangas
d4eed61f57 Refactor code for parsing and creating postgresql.conf.
There's surely more that could be done, but this makes it a bit more
readable at least.
2021-09-23 19:34:27 +03:00
Kirill Bulatov
7dda9f2894 Fix clippy lints and enable clippy checking in CI 2021-09-16 15:09:16 +03:00
Dmitry Rodionov
4ebe643d0c Support parallel test running for python tests
Support is done via pytest-xdist plugin.
To use the feature add -n<concurrency> to pytest invocation
e.g. pytest -n8 to run 8 tests in parallel.

Changes in code are mostly about ports assigning. Previously port for
pageserver was hardcoded without the ability to override through zenith
cli and ports for started compute nodes were calculated twice, in zenith
cli and in test code. Now zenith cli supports port arguments for
pageserver and compute nodes to be passed explicitly.

Tests are modified in such a way that each worker gets a non overlapping
port range which can be configured and now contains 100 ports. These
ports are distributed to test services (pageserver, wal acceptors,
compute nodes) so they can work independently.
2021-09-15 14:02:15 +03:00
Arseny Sher
0aec60938a Make flush_lsn reported by safekeepers point to record boundary.
Otherwise we produce corrupted record holes in WAL during compute node restart
in case there was an unfinished record from the old compute, as these reports
advance commit_lsn -- reliably persisted part of WAL.

ref #549.

Mostly by @knizhnik. I adjusted to make sure proposer always starts streaming
since record beginning so we don't need special quirks for decoding in
safekeeper.
2021-09-11 06:10:10 +03:00
anastasia
3d17255400 Add comment to 'pg stop' changes 2021-09-08 14:12:00 +03:00
anastasia
5488ce8834 Change CLI command 'pg stop' to avoid races in tests.
Stop postgres immediately only when destroy option is used. Otherwise, use default shutdown mode (fast).
2021-09-08 14:12:00 +03:00
Stas Kelvich
ed4eed0a19 Make use of postgres --sync-safekeepers in tests and CLI.
Change control plane code to call `postgres --sync-safekeepers` before
compute node start when safekeepers are enabled. Now `pg create` will
create an empty data directory with the proper config file. Subsequent
`pg start` will run `sync-safekeepers` and will call basebackup with
the resulting LSN. Also change few tests to accommodate this new behavior.
2021-09-06 13:06:20 +03:00
Alexey Kondratov
7e7b31a626 Extract basebackup directly from the CopyOutReader
Do not fetch it into the intermediate buffer.
2021-08-27 19:46:51 +03:00
Konstantin Knizhnik
beaa2cd0a2 Handle COPY error 2021-08-26 13:53:10 +03:00
Dmitry Rodionov
23b5249512 translate pageserver api to http 2021-08-24 19:05:00 +03:00
anastasia
cbeb67067c Issue #367.
Change CLI so that we always create node from scratch at 'pg start'.
This operation preserve previously existing config

Add new flag '--config-only' to 'pg create'.
If this flag is passed, don't perform basebackup, just fill initial postgresql.conf for the node.
2021-08-17 18:12:31 +03:00
Dmitry Rodionov
ce5333656f Introduce authentication v0.1.
Current state with authentication.
Page server validates JWT token passed as a password during connection
phase and later when performing an action such as create branch tenant
parameter of an operation is validated to match one submitted in token.
To allow access from console there is dedicated scope: PageServerApi,
this scope allows access to all tenants. See code for access validation in:
PageServerHandler::check_permission.

Because we are in progress of refactoring of communication layer
involving wal proposer protocol, and safekeeper<->pageserver. Safekeeper
now doesn’t check token passed from compute, and uses “hardcoded” token
passed via environment variable to communicate with pageserver.

Compute postgres now takes token from environment variable and passes it
as a password field in pageserver connection. It is not passed through
settings because then user will be able to retrieve it using pg_settings
or SHOW ..

I’ve added basic test in test_auth.py. Probably after we add
authentication to remaining network paths we should enable it by default
and switch all existing tests to use it.
2021-08-11 20:05:54 +03:00
anastasia
14b6796915 Send pgdata subdirs with basebackup. Fix for 1e6267a. 2021-07-25 17:46:47 +03:00
anastasia
1e6267a35f Get rid of snapshot directory + related code cleanup and refactoring.
- Add new subdir postgres_ffi/samples/ for config file samples.
- Don't copy wal to the new branch on zenith init or zenith branch.
- Import_timeline_wal on zenith init.
2021-07-23 13:21:45 +03:00
Dmitry Rodionov
767590bbd5 support tenants
this patch adds support for tenants. This touches mostly pageserver.
Directory layout on disk is changed to contain new layer of indirection.
Now path to particular repository has the following structure: <pageserver workdir>/tenants/<tenant
id>. Tenant id has the same format as timeline id. Tenant id is included in
pageserver commands when needed. Also new commands are available in
pageserver: tenant_list, tenant_create. This is also reflected CLI.
During init default tenant is created and it's id is saved in CLI config,
so following commands can use it without extra options. Tenant id is also included in
compute postgres configuration, so it can be passed via ServerInfo to
safekeeper and in connection string to pageserver.
For more info see docs/multitenancy.md.
2021-07-22 20:54:20 +03:00
Stas Kelvich
a17b2a4364 reflect postgres superuser changes in pageserver->compute connstring 2021-07-21 17:22:22 +03:00
Konstantin Knizhnik
eb0a56eb22 Replay non-relational WAL records on page server 2021-07-16 18:43:07 +03:00
Heikki Linnakangas
befefe8d84 Run 'cargo fmt'.
Fixes a few formatting discrepancies had crept in recently.
2021-07-14 22:03:14 +03:00
Dmitry Rodionov
75e717fe86 allow both domains and ip addresses in connection options for
pageserver and wal keeper. Also updated PageServerNode definition in
control plane to account for that. resolves #303
2021-07-09 16:46:21 +03:00
Arseny Sher
7c5532303e Preserve wal acceptor logs in CI.
And generally make removal of everything-but-logs a bit simpler, with files
staying in place.

Also renames postgres log from 'log' to 'pg.log'.
2021-06-16 14:45:43 +03:00
Arseny Sher
37b0236e9a Move wal acceptor tests to python.
Includes fixtures for wal acceptors and associated setup.

Nothing really new here, but surprisingly this caught some issues in
walproposer.

ref #182
2021-06-15 15:14:27 +03:00
Stas Kelvich
bf56ea8c43 Locate postgres binary and libs for 'postgres --wal-redo'
based on POSTGRES_DISTRIB_DIR.
2021-06-09 20:17:27 +03:00
Dmitry Ivanov
bb1446e33a Change behavior of ComputeControlPlane::new_node() (#235)
Previously, transaction commit could happen regardless of whether
pageserver has caught up or not. This patch aims to fix that.

There are two notable changes:

1. ComputeControlPlane::new_node() now sets the
`synchronous_standby_names = 'pageserver'` parameter to delay
transaction commit until pageserver acting as a standby has
fetched and ack'd a relevant portion of WAL.

2. pageserver now has to:
    - Specify the `application_name = pageserver` which matches the
    one in `synchronous_standby_names`.
    - Properly reply with the ack'd LSNs.

This means that some tests don't need sleeps anymore.

TODO: We should probably make this behavior configurable.

Fixes #187.
2021-06-09 11:24:55 +03:00
Heikki Linnakangas
fc01fae9b4 Remove leftover references to safekeeper_proxy.
We don't use it anymore. The WAL proposer is now a background worker that
runs as part of the primary Postgres server.
2021-06-01 18:50:24 +03:00
anastasia
5a73a6fdfc add -w flag to wait till pg_ctl actually finishes what was asked 2021-05-28 20:33:16 +03:00
Stas Kelvich
4608b1ec70 Set wal_log_hints=on
That is mandatory to correctly maintain visibility map (see issue#192).
It also makes sense to check that wal_log_hints is enabled at the pageserver side,
but for now let just check that tests will pass with this on.
2021-05-28 11:38:46 +03:00
Heikki Linnakangas
6a9c036ac1 Revert all changes related to storing and restoring non-rel data in page server
This includes the following commits:

35a1c3d521 Specify right LSN in test_createdb.py
d95e1da742 Fix issue with propagation of CREATE DATABASE to the branch
8465738aa5 [refer #167] Fix handling of pg_filenode.map files in page server
86056abd0e Fix merge conflict: set initial WAL position to second segment because of pg_resetwal
2bf2dd1d88 Add nonrelfile_utils.rs file
20b6279beb Fix restoring non-relational data during compute node startup
06f96f9600 Do not transfer WAL to computation nodes: use pg_resetwal for node startup

As well as some older changes related to storing CLOG and MultiXact data as
"pseudorelation" in the page server.

With this revert, we go back to the situtation that when you create a
new compute node, we ship *all* the WAL from the beginning of time to
the compute node. Obviously we need a better solution, like the code
that this reverts. But per discussion with Konstantin and Stas, this
stuff was still half-baked, and it's better for it to live in a branch
for now, until it's more complete and has gone through some review.
2021-05-24 16:05:45 +03:00
Stas Kelvich
6ad6e5bd84 Add --destroy flag to "pg stop" CLI command 2021-05-21 00:26:31 +03:00
Stas Kelvich
d534aeb9e1 Properly propagate control plane errors to CLI.
That allows to show decent error whenever we try to start already
started postgres.
2021-05-21 00:26:31 +03:00
Konstantin Knizhnik
20b6279beb Fix restoring non-relational data during compute node startup 2021-05-20 14:14:52 +03:00
Konstantin Knizhnik
06f96f9600 Do not transfer WAL to computation nodes: use pg_resetwal for node startup 2021-05-20 14:13:47 +03:00
Stas Kelvich
d59cb2ca7a clean up some leftovers after 746f66731 2021-05-19 22:17:48 +03:00
Stas Kelvich
58f34a8d76 Rework pg subcommand in CLI.
1. Create data directory on start
2. Remove distinct pg names, now pg name == branch name.
2021-05-19 22:17:48 +03:00
Stas Kelvich
746f667311 Refactor CLI and CLI<->pageserver interfaces to support remote pageserver
This patch started as an effort to support CLI working against remote
pageserver, but turned into a pretty big refactoring.

* CLI now does not look into repository files directly. New commands
'branch_create' and 'identify_system' were introduced into page_service to
support that.
* Branch management that was scattered between local_env and
zenith/main.rs is moved into pageserver/branches.rs. That code could better fit
in Repository/Timeline impl, but I'll leave that for a different patch.
* All tests-related code from local_env went into integration_tests/src/lib.rs as an
extension to PostgresNode trait.
* Paths-generating functions were concentrated around corresponding config
types (LocalEnv and PageserverConf).
2021-05-17 19:17:51 +03:00
anastasia
38c4b6f02f Move postgres code related to zenith pageserver to contrib/zenith.
- vendor/postgres changes
- Respective changes in RUST code: upload shared library, use new GUC names.
- Add contrib build to Makefile.
2021-05-13 16:23:21 +03:00
Eric Seppanen
d25656797c switch pageserver to blocking postgres interface 2021-05-10 15:24:55 -07:00
Eric Seppanen
49530145d8 cargo fmt 2021-05-02 11:03:58 -07:00
Arseny Sher
da96965897 Remove assert(is_ok) before unwrap.
It only hides the error.
2021-05-02 17:19:09 +03:00
Stas Kelvich
3762b53986 show branch name in "zenith pg list" 2021-05-01 03:32:48 +03:00
Eric Seppanen
fdf6829de5 cargo fmt 2021-04-26 09:36:22 -07:00
Konstantin Knizhnik
c59830fd01 Do not restart wal-redo-postgres 2021-04-26 17:57:29 +03:00
Konstantin Knizhnik
636194406f Dump log files in case of regress_tests failure 2021-04-26 17:04:26 +03:00
Konstantin Knizhnik
3b09a74f58 Implement offloading of old WAL files to S3 in walkeeper 2021-04-26 16:23:00 +03:00
Konstantin Knizhnik
5292b502f3 Check regression test exit status 2021-04-26 11:06:31 +03:00
Konstantin Knizhnik
59b23fef64 Wait for WAL receiver to start 2021-04-23 12:40:29 +03:00
Konstantin Knizhnik
0eaff5aa7f Fix pageserver.log path 2021-04-23 11:37:28 +03:00
Konstantin Knizhnik
db5712f28b Dump pageserver.log in case of test errors 2021-04-23 09:41:08 +03:00