Commit Graph

985 Commits

Author SHA1 Message Date
Heikki Linnakangas
66ec135676 Refactor pytest fixtures
Instead of having a lot of separate fixtures for setting up the page
server, the compute nodes, the safekeepers etc., have one big ZenithEnv
object that encapsulates the whole environment. Every test either uses
a shared "zenith_simple_env" fixture, which contains the default setup
of a pageserver with no authentication, and no safekeepers. Tests that
want to use safekeepers or authentication set up a custom test-specific
ZenithEnv fixture.

Gathering information about the whole environment into one object makes
some things simpler. For example, when a new compute node is created,
you no longer need to pass the 'wal_acceptors' connection string as
argument to the 'postgres.create_start' function. The 'create_start'
function fetches that information directly from the ZenithEnv object.
2021-10-25 14:14:47 +03:00
Heikki Linnakangas
28af3e5008 Remove some unnecessary fixture arguments 2021-10-25 14:14:45 +03:00
Heikki Linnakangas
f337d73a6c Rearrange output dirs a bit
Each test now gets its own test output directory, like
'test_output/test_foobar', even when TEST_SHARED_FIXTURES is used.
When TEST_SHARED_FIXTURES is not used, the zenith repo for each test
is created under a 'repo' subdir inside the test output dir, e.g.
'test_output/test_foobar/repo'
2021-10-25 14:14:43 +03:00
Heikki Linnakangas
57ce541521 Remove unnecessary 'pg_bin' object from 'postgres' fixture.
It was only used in check_restored_datadir_content(), and that function
can construct it easily from the other information it has.
2021-10-25 14:14:41 +03:00
Heikki Linnakangas
e14f24034f Turn a few path-fixtures to global variables
This way, they're readily accessible from the classes and functions
that are not themselves fixtures
2021-10-25 14:14:38 +03:00
Kirill Bulatov
04fb0a0342 Add core relish backup and restore functionality 2021-10-22 22:22:38 +03:00
Heikki Linnakangas
8c42dcc041 Fix safekeeper -D option.
The -D option to specify working directory was broken:

    $ mkdir foobar
    $ ./target/debug/safekeeper -D foobar
    Error: failed to open "foobar/safekeeper.log"

    Caused by:
        No such file or directory (os error 2)

This was because we both chdir'd into to specified directory, and also
prepended the directory to all the paths. So in the above example, it
actually tried to create the log file in "foobar/foobar/safekepeer.log"
Change it to work the same way as in the pageserver: chdir to the
specified directory, and leave 'workdir' always set to ".".

We wouldn't necessarily need the 'workdir' variable in the config at all,
and could assume that the current working directory is always the
safekeeper data directory, but I'd like to keep this consistent with the
the pageserver. The page server doesn't assume that for the sake of unit
tests. We don't currently have unit tests in the safekeeper that write
to disk but we might want to in the future.
2021-10-22 08:39:58 +03:00
Alexey Kondratov
9070a4dc02 Turn off back pressure by default 2021-10-22 01:40:43 +03:00
Egor Suvorov
86a28458c6 test_runner: use Python 3.7 in CI and improve its support (#775)
* We actually need Python 3.7 because of dataclasses
* Rerun 'pipenv lock' under Python 3.7 and add 'pipenv' to dev deps
* Update docs on developing for Python 3.7
* CircleCI: use Python 3.7 via Docker image instead of Orb
2021-10-21 20:01:29 +03:00
Egor Suvorov
c058d04250 Rename WalAcceptor to Safekeeper in most places (#741) 2021-10-21 18:26:43 +03:00
Konstantin Knizhnik
c310932121 Implement backpressure for compute node to avoid WAL overflow
Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>
2021-10-21 18:15:50 +03:00
Egor Suvorov
ff563ff080 test_runner: fix mypy errors and force it on CI (#774)
* Fix bugs found by mypy
* Add some missing types and runtime checks, remove unused code
* Make ZenithPageserver start right away for better type safety
* Add `types-*` packages to Pipfile
* Pin mypy version and run it on CircleCI
2021-10-21 13:51:54 +03:00
anastasia
7f9d2a7d05 Change 'zenith tenant list' API to return tenant state added in 0dc7a3fc 2021-10-21 11:04:22 +03:00
Arthur Petukhovsky
13f4e173c9 Wait for safekeepers to catch up in test_restarts_under_load (#776) 2021-10-20 14:42:53 +03:00
Dmitry Ivanov
85116a8375 [proxy] Prevent TLS stream from hanging
This change causes writer halves of a TLS stream to always flush after a
portion of bytes has been written by `std::io::copy`. Furthermore, some
cosmetic and minor functional changes are made to facilitate debug.
2021-10-20 14:15:49 +03:00
Egor Suvorov
e42c884c2b test_runner/README: add note on capturing logs (#778)
Became actual after #674
2021-10-20 01:55:49 +03:00
Egor Suvorov
eb706bc9f4 Force yapf (Python code formatter) in CI (#772)
* Add yapf run to CircleCI
* Pin yapf version
* Enable `SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES` setting
* Reformat all existing code with slight manual adjustments
* test_runner/README: note that yapf is forced
2021-10-19 20:13:47 +03:00
Dmitry Rodionov
798df756de suppress FileNotFound exception instead of missing_ok=True because the latter is added in python 3.8 and we claim to support >3.6 2021-10-19 17:13:42 +03:00
Dmitry Rodionov
732d13fe06 use cached-property package because python<3.8 doesnt have cached_property in functools 2021-10-19 17:13:42 +03:00
Heikki Linnakangas
feae7f39c1 Support read-only nodes
Change 'zenith.signal' file to a human-readable format, similar to
backup_label. It can contain a "PREV LSN: %X/%X" line, or a special
value to indicate that it's OK to start with invalid LSN ('none'), or
that it's a read-only node and generating WAL is forbidden
('invalid').

The 'zenith pg create' and 'zenith pg start' commands now take a node
name parameter, separate from the branch name. If the node name is not
given, it defaults to the branch name, so this doesn't break existing
scripts.

If you pass "foo@<lsn>" as the branch name, a read-only node anchored
at that LSN is created. The anchoring is performed by setting the
'recovery_target_lsn' option in the postgresql.conf file, and putting
the server into standby mode with 'standby.signal'.

We no longer store the synthetic checkpoint record in the WAL segment.
The postgres startup code has been changed to use the copy of the
checkpoint record in the pg_control file, when starting in zenith
mode.
2021-10-19 09:48:12 +03:00
Heikki Linnakangas
c2b468c958 Separate node name from the branch name in ComputeControlPlane
This is in preparation for supporting read-only nodes. You can launch
multiple read-only nodes on the same brach, so we need an identifier
for each node, separate from the branch name.
2021-10-19 09:48:10 +03:00
Heikki Linnakangas
e272a380b4 On new repo, start writing WAL only after the initial checkpoint record.
Previously, the first WAL record on the 'main' branch overwrote the
initial checkpoint record, with invalid 'xl_prev'. That's harmless, but
also pretty ugly. I bumped into this while I was trying to tighen up the
checks for when a valid 'prev_lsn' is required. With this patch, the
first WAL record gets a valid 'xl_prev' value. It doesn't matter much
currently, but let's be tidy.
2021-10-19 09:48:04 +03:00
anastasia
0dc7a3fc15 Change tenant_mgr to use TenantState.
It allows to avoid locking entire TENANTS list while one tenant is bootstrapping
and prepares the code for remote storage integration.
2021-10-18 15:40:06 +03:00
Egor Suvorov
a1bc0ada59 Dockerfile: remove wal_acceptor alias for safekeeper (#743) 2021-10-18 14:56:30 +03:00
Kirill Bulatov
e9b5224a8a Fix toml serde gotchas 2021-10-18 14:14:27 +03:00
Heikki Linnakangas
bdd039a9ee S3 DELETE call returns 204, not 200.
According to the S3 API docs, the DELETE call returns code "204 No content"
on success.
2021-10-17 16:21:58 +03:00
Heikki Linnakangas
b405eef324 Avoid writing the metadata file when it hasn't changed. 2021-10-17 14:54:39 +03:00
Kirill Bulatov
ba557d126b React on sigint 2021-10-15 21:24:24 +03:00
Patrick Insinger
2dde20a227 Bump MSRV to 1.55 2021-10-15 09:10:08 -07:00
Kirill Bulatov
4ade0bb41c Refactor upload/download_relish function signatures.
This makes them more generic, by taking any Read / Write trait
implementation, instead of operating directly on a a file.
2021-10-15 11:34:15 +03:00
Stas Kelvich
100da024b6 expose pageserver http socket in docker 2021-10-15 00:26:38 +03:00
Arseny Sher
de744a44dd Add /timeline http request to safekeeper returning its status.
Which is mainly generational state (terms) and useful LSNs.

Also add /status basic healthcheck request which is now used in tests to
determine the safekeeper is up; this fixes #726.

ref #115
2021-10-14 19:02:38 +03:00
Heikki Linnakangas
0e026371ec Optimize WAL decoding slightly.
This adds a fast-path for the common case that the record doesn't
cross a page boundary. We now split off a new Bytes directly from the
original input buffer in that case, instead of copying the record to a
new BytesMut. Shaves about 5% of the page server's CPU time on my
laptop, in the 'test_bulk_insert' test.
2021-10-14 14:21:23 +03:00
Arthur Petukhovsky
4b87acb1f6 Use logging in python tests (#674)
* Use logging in python tests

* Use f-strings for logs

* Don't log test output while running

* Use only pytest logging handler

* Add more info about pytest logging
2021-10-14 13:10:09 +03:00
Dmitry Ivanov
43957f4401 [cross-repo-ci] Use solely commit hash to test PRs in CI
See #744 for the discussion.
2021-10-13 17:16:02 +03:00
Heikki Linnakangas
8a4f092e82 Skip syncing the temp initdb installation.
Doesn't make much difference on my laptop with SSD, but every little
helps, and with a slower disk it might be noticeable.
2021-10-13 16:59:00 +03:00
Egor Suvorov
6b6b3f68be Safekeeper metrics refactor (#747) 2021-10-13 16:28:24 +03:00
Arseny Sher
96f1175a80 Cleanup hardcoded oids. 2021-10-13 10:52:47 +03:00
Patrick Insinger
1c29de81de pageserver - remove lsn from WALRecord 2021-10-13 00:03:42 -07:00
Egor Suvorov
f658263543 Revert "Dockerfile: remove wal_acceptor alias for safekeeper"
This reverts commit 64ca947722.
2021-10-12 19:05:58 +00:00
Egor Suvorov
64ca947722 Dockerfile: remove wal_acceptor alias for safekeeper 2021-10-12 19:05:16 +00:00
Egor Suvorov
23f4c0a742 Rename wal_acceptor binary to safekeeper (#740), stage 1/2
* Rename wal_acceptor binary to safekeeper
* Rename wal_acceptor.pid and wal_acceptor.log to safekeeper.pid and safekeeper.log
* Change some mentions of WAL acceptor to safekeeper
* Dockerfile: alias wal_acceptor to safekeeper temporarily until internal scripts are updated
2021-10-12 22:03:06 +03:00
Dmitry Ivanov
7c5b99683c Speed up builds by passing make jobserver to cargo
This change brings the following improvements to our build system:

* Now BUILD_TYPE also affects rust apps.
* From now on, cargo will respect `-jN` passed via `make`. However, note
  that `rustc` may spawn multiple threads depending on compile flags.
* Cargo is able to cooperate with make to better schedule parallel jobs,
  which leads to better build times (-20s in release mode on my machine).
2021-10-12 21:02:39 +03:00
Patrick Insinger
160c4aff61 pageserver - use write guard for checkpointing 2021-10-12 10:02:15 -07:00
Patrick Insinger
6e5ca5dc5c pageserver - create TimelineWriter 2021-10-12 10:02:15 -07:00
Egor Suvorov
f3445949d1 Wal acceptor: report socket bind errors better when daemonizing (#738)
Fixes #664
2021-10-12 16:51:28 +03:00
Heikki Linnakangas
95a85312f5 Simplify code to build walredo messages.
No need to use BytesMut in these functions. Plain Vec is simpler. And
should be marginally faster too; I saw BytesMut functions previously
in 'perf' profile, consuming around 5% of the overall pageserver CPU
time. That's gone with this patch, although I don't see any discernible
difference in the overall performance test results.
2021-10-12 10:16:26 +03:00
Heikki Linnakangas
934fb8592f Detect when a checkpoint is modified in a smarter way.
Previously, the WAL receiver we would make a decoded copy of the current
Checkpoint before each WAL record, and compare it with the Checkpoint
after the record has been processed. If it has changed, the checkpoint
relish is updated in the repository. That's somewhat expensive, the
Checkpoint::encode() function is visible in 'perf' profile. Change that
so that we set a flag whenever the Checkpoint struct is modified, so that
we dont need to compare the whole struct anymore.
2021-10-12 09:09:10 +03:00
Dmitry Ivanov
bb239b4f69 [Makefile] Set default build type to debug 2021-10-11 17:08:31 +03:00
Dmitry Ivanov
1cd7900790 [Makefile] Make build type detection more precise
Previously, typos like `BUILD_TYPE=rlease` would silently
lead to building debug binaries. The current approach is also
more future-proof, since we might add `profile`, `valgrind`
as well as other build types.
2021-10-11 17:03:51 +03:00