Commit Graph

85 Commits

Author SHA1 Message Date
Konstantin Knizhnik
f7878c5157 Bump postgres version 2021-09-24 19:10:18 +03:00
Konstantin Knizhnik
b51d3f6b2b Revert "Save page received from page server in local file cache"
This reverts commit 137472db91.
2021-09-24 18:04:21 +03:00
Konstantin Knizhnik
137472db91 Save page received from page server in local file cache 2021-09-24 15:15:54 +03:00
Arthur Petukhovsky
8ebf2fe550 Add test for acceptor restarts under load (#591)
In this test, safekeepers are restarted one by one while bank transactions
are executed and validated in the background. Bank transactions consist of
balance transfers and log writes. In the end, the balance sum should remain
the same, and there should be progress from every client when 2 of 3
safekeeper nodes are up.
2021-09-22 11:59:20 +03:00
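A minimal sketch of the end-of-test invariant described in the commit above, assuming a hypothetical accounts/log schema and a DB-API cursor (none of these names come from the test itself):

```python
def check_invariants(cur, expected_total, n_clients):
    # The sum of all balances must stay constant, no matter which
    # safekeeper was restarted while the transfers ran.
    cur.execute("SELECT SUM(balance) FROM accounts")
    assert cur.fetchone()[0] == expected_total

    # Every client must have made progress, i.e. written at least one log row.
    cur.execute("SELECT COUNT(DISTINCT client_id) FROM log")
    assert cur.fetchone()[0] == n_clients
```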
Heikki Linnakangas
49c8c03465 Add performance test for bulk INSERT 2021-09-21 13:25:46 +03:00
Dmitry Rodionov
b7aac87ec1 fix port distribution so services do not use ephemeral ports 2021-09-20 18:44:42 +03:00
Heikki Linnakangas
c2af6d98db Don't print 'pg_controldata' output after every startup in tests.
It's not interesting for most tests, and clutters the output. If there
are individual tests where it is worthwhile, let's add pg_controldata calls
to those tests, but I don't think it's needed for now.
2021-09-17 20:04:29 +03:00
Heikki Linnakangas
540973eac4 Don't get confused on request of latest page version with very old LSN.
If the 'latest' flag in the client request is true, the client wants the
latest page version regardless of the LSN in the request. The LSN is just
a hint in that case, indicating that the page hasn't been modified since
that LSN. The LSN can be very old, so it's possible that the page
server has already garbage collected away the layer at that LSN. We tried
to fetch the old layer and errored out if that happened. To fix, always
fetch the data as of last-record-LSN, if 'latest' is set in the client
request. We now only use the LSN to wait if the requested LSN hasn't been
received and processed yet.

Fixes https://github.com/zenithdb/zenith/issues/567
2021-09-17 18:56:05 +03:00
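A hedged sketch of the rule described in the commit above; the real logic is in the Rust pageserver and also waits for the requested LSN to arrive before reading, which is omitted here:

```python
def resolve_read_lsn(latest: bool, request_lsn: int, last_record_lsn: int) -> int:
    # Pick the LSN to read the page at (illustrative only).
    if latest:
        # 'latest' means the request LSN is only a hint that the page is
        # unmodified since then; the layer at that (possibly very old) LSN
        # may already be garbage collected, so read at the last-record-LSN.
        return last_record_lsn
    # Otherwise the client asked for a specific historic page version.
    return request_lsn
```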
Dmitry Ivanov
7b3fb760fa [test_runner] psql should be oblivious to user's preferences
This makes psql ignore $HOME/.psqlrc
2021-09-17 14:16:23 +03:00
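One common way to achieve this is psql's -X (--no-psqlrc) flag; a hedged sketch of invoking psql that way from a test (whether the test runner uses exactly this flag is an assumption):

```python
import subprocess

def run_psql(dsn: str, sql: str) -> str:
    # -X skips ~/.psqlrc; -qAt gives quiet, unaligned, tuples-only output.
    return subprocess.run(
        ["psql", "-X", "-qAt", "-c", sql, dsn],
        check=True, capture_output=True, text=True,
    ).stdout
```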
Dmitry Rodionov
01ef2baef0 show more context for zenith cli run errors 2021-09-15 14:02:15 +03:00
Dmitry Rodionov
9563336d9a Bring back check for interfering processes, add more comments and
descriptive errors
2021-09-15 14:02:15 +03:00
Dmitry Rodionov
4ebe643d0c Support parallel test running for python tests
Support is done via the pytest-xdist plugin.
To use the feature, add -n<concurrency> to the pytest invocation,
e.g. pytest -n8 to run 8 tests in parallel.

Changes in code are mostly about port assignment. Previously the port for
the pageserver was hardcoded without the ability to override it through the
zenith cli, and ports for started compute nodes were calculated twice, in the
zenith cli and in test code. Now the zenith cli supports passing port arguments
for the pageserver and compute nodes explicitly.

Tests are modified so that each worker gets a non-overlapping
port range, which can be configured and now contains 100 ports. These
ports are distributed to test services (pageserver, wal acceptors,
compute nodes) so they can work independently.
2021-09-15 14:02:15 +03:00
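A hedged sketch of handing each pytest-xdist worker its own 100-port window, using the plugin's worker_id fixture; the fixture and constant names are illustrative, not the repo's actual ones:

```python
import pytest

PORTS_PER_WORKER = 100   # the window size described above
BASE_PORT = 15000        # assumption: any free base port works

@pytest.fixture(scope="session")
def worker_port_range(worker_id: str) -> range:
    # worker_id is 'master' when xdist is not used, else 'gw0', 'gw1', ...
    index = 0 if worker_id == "master" else int(worker_id.lstrip("gw"))
    start = BASE_PORT + index * PORTS_PER_WORKER
    return range(start, start + PORTS_PER_WORKER)
```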
Dmitry Rodionov
b4ecae33e4 add incremental tracking of logical timeline size
In order to avoid problems with synchronizing disk and memory, the logical
size is not stored in metadata on disk. It is calculated at timeline
"start" by scanning the contents of the layered repo, and the size is then
maintained via an atomic variable.

This patch also adds a new endpoint to the pageserver http api: branch detail.
It allows retrieving a particular branch's info by its name. Size info
is also added to the endpoint's response and used in tests.
2021-09-07 18:25:15 +03:00
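A hedged Python rendering of the incremental size tracking described above; the real counter is an atomic variable in the Rust pageserver, and all names here are illustrative:

```python
import threading

class TimelineLogicalSize:
    def __init__(self, initial_bytes: int):
        # initial_bytes is computed once at timeline "start" by scanning
        # the contents of the layered repo.
        self._bytes = initial_bytes
        self._lock = threading.Lock()

    def add(self, delta_bytes: int) -> None:
        # Applied incrementally as WAL extends or truncates relations.
        with self._lock:
            self._bytes += delta_bytes

    def get(self) -> int:
        with self._lock:
            return self._bytes
```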
anastasia
6f0c065743 preserve filediff artifacts in CI 2021-09-07 16:58:21 +03:00
anastasia
94c50e3e90 Fix check_restored_datadir_content(). Call 'basebackup' command directly, instead of relying on CLI 2021-09-07 16:58:21 +03:00
anastasia
eb3fd7a8da print diff for mismatching files in check_restored_datadir_content() 2021-09-06 18:21:23 +03:00
anastasia
1e172230ce Add test function to compare files in compute nodes to catch bugs in SLRU replay.
Compare files in an existing compute node's pgdata with a fresh basebackup at the same LSN. We expect the content to be identical, except for tmp files.
Use it after some tests.
2021-09-06 18:21:23 +03:00
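A hedged sketch of such a comparison helper; the actual test function may skip a different set of files:

```python
import filecmp
import os

def assert_datadirs_match(pgdata: str, restored: str) -> None:
    # Compare a compute node's pgdata with a restored basebackup taken at the same LSN.
    mismatch = []
    for root, _dirs, files in os.walk(pgdata):
        for name in files:
            if name.startswith("pgsql_tmp") or name.endswith(".tmp"):
                continue  # tmp files are expected to differ or be absent
            left = os.path.join(root, name)
            rel = os.path.relpath(left, pgdata)
            right = os.path.join(restored, rel)
            if not (os.path.exists(right) and filecmp.cmp(left, right, shallow=False)):
                mismatch.append(rel)
    assert not mismatch, f"files differ from basebackup: {mismatch}"
```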
Stas Kelvich
ed4eed0a19 Make use of postgres --sync-safekeepers in tests and CLI.
Change control plane code to call `postgres --sync-safekeepers` before
compute node start when safekeepers are enabled. Now `pg create` will
create an empty data directory with the proper config file. Subsequent
`pg start` will run `sync-safekeepers` and will call basebackup with
the resulting LSN. Also change a few tests to accommodate this new behavior.
2021-09-06 13:06:20 +03:00
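A hedged sketch of that `pg start` sequence: run `postgres --sync-safekeepers`, capture the LSN it reports, then take the basebackup at that LSN. How the data directory is passed (PGDATA here) is an assumption:

```python
import os
import subprocess

def sync_safekeepers(datadir: str) -> str:
    env = dict(os.environ, PGDATA=datadir)  # assumption: datadir via PGDATA
    out = subprocess.run(
        ["postgres", "--sync-safekeepers"],
        env=env, check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()  # LSN to pass to the subsequent basebackup
```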
Konstantin Knizhnik
b227c63edf Set proper xl_prev in basebackup, when possible.
In passing, fix two minor issues with basebackup:
* check that we can't create branches with pre-initdb LSNs
* normalize branch LSNs that are pointing to the segment boundary

patch by @knizhnik
closes #506
2021-09-03 14:58:59 +03:00
anastasia
fabf5ec664 Don't use term 'snapshot' to describe layers 2021-09-03 11:00:38 +03:00
Heikki Linnakangas
c6678c5dea Include # of bytes written in pgbench benchmark result
Now that the page server collects this metric (since commit 212920e47e),
let's include it in the performance test results.

The new metric looks like this:

    performance/test_perf_pgbench.py .         [100%]
    --------------- Benchmark results ----------------
    test_pgbench.init: 6.784 s
    test_pgbench.pageserver_writes: 466 MB    <---- THIS IS NEW
    test_pgbench.5000_xacts: 8.196 s
    test_pgbench.size: 163 MB

    =============== 1 passed in 21.00s ===============
2021-09-03 09:00:26 +03:00
Kirill Bulatov
0e4cbe0165 Fix some typos 2021-09-02 17:27:18 +03:00
Stas Kelvich
ddd2c83c64 Change test_restart_compute to expose safekeeper problems.
Make this test look like 'test_compute_restart.sh' by @ololobus, which
was surprisingly good for checking safekeeper behavior. This test adds
an intermediate compute node start with a bulk select that causes a lot of
FPIs, and the select itself doesn't wait for all that WAL to be replicated.
So if we kill the compute node right after that, we end up with lagging
safekeepers with VCL != flush_lsn, and starting a new node from that state
takes special care.

Also, run and print `pg_controldata` output after each compute node start
to eyeball lsn/checkpoint info of basebackup.

This commit only adds test without fixing the problem.
2021-09-02 12:06:12 +03:00
Kirill Bulatov
291c2c9a1b Test readme typo fix 2021-09-02 11:33:00 +03:00
anastasia
27442c3daa Add test for DROP DATABASE command 2021-08-30 17:29:29 +03:00
Heikki Linnakangas
074bd3bb12 Add basic performance test framework.
This provides a pytest fixture to record metrics from pytest tests. The
recorded metrics are printed out at the end of the tests.

As a starter, this includes one small test, using pgbench. It prints out
three metrics: the initialization time, runtime of 5000 xacts, and the
repository size after the tests.
2021-08-27 21:00:45 +03:00
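A hedged sketch of a metrics-recording fixture in this spirit; the actual fixture name and report format in the repo may differ:

```python
import time
from contextlib import contextmanager

import pytest

class Benchmarker:
    def __init__(self):
        self.results = {}

    @contextmanager
    def record_duration(self, name: str):
        start = time.monotonic()
        yield
        self.results[name] = time.monotonic() - start

@pytest.fixture
def benchmarker(request):
    bench = Benchmarker()
    yield bench
    # Print recorded metrics at the end of the test, as in the report above.
    for name, seconds in bench.results.items():
        print(f"{request.node.name}.{name}: {seconds:.3f} s")
```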
Alexey Kondratov
e1d8f97b9e Mention pipenv run as an option to run pytest 2021-08-27 19:46:51 +03:00
Heikki Linnakangas
5998744bcc Remove rocksdb implementation.
The layered storage format is good enough that we don't need the rocksdb
implementation anymore. There are a lot of known issues but we'll keep
working on them.
2021-08-25 18:37:22 +03:00
Dmitry Rodionov
23b5249512 translate pageserver api to http 2021-08-24 19:05:00 +03:00
Heikki Linnakangas
81dd4bc41e Fix decoding XLOG_HEAP_DELETE and XLOG_HEAP_UPDATE records.
Because the t_cid field was missing from the XlHeapDelete struct that
corresponds to the PostgreSQL xl_heap_delete struct, the check for the
XLH_DELETE_ALL_VISIBLE_CLEARED flag did not work correctly.

Decoding XlHeapUpdate struct was also missing the t_cid field, but that
didn't cause any immediate problems because in that struct, the t_cid
field is after all the fields that the page server cares about. But fix
that too, as it was an accident waiting to happen.

The bug was mostly hidden by the VM page handling in zenith_wallog_page,
where it forcibly generates a FPW record whenever a VM page is evicted:

    else if (forknum == VISIBILITYMAP_FORKNUM && !RecoveryInProgress())
    {
        /*
         * Always WAL-log vm.
         * We should never miss clearing visibility map bits.
         *
         * TODO Is it too bad for performance?
         * Hopefully we do not evict actively used vm too often.
         */
        XLogRecPtr recptr;
        recptr = log_newpage_copy(&reln->smgr_rnode.node, forknum, blocknum, buffer, false);
        XLogFlush(recptr);
        lsn = recptr;
    }

But that was just hiding the issue: it's still visible if you had a
read-only node relying on the data in the page server, or you killed and
restarted the primary node, or you started a branch. In the included test
case, I used a new branch to expose this.

Fixes https://github.com/zenithdb/zenith/issues/461
2021-08-24 15:59:25 +03:00
anastasia
20e6cd7724 Update test_twophase - check that we correctly restore files at compute node start. 2021-08-19 12:15:09 +03:00
Heikki Linnakangas
9fed5c8fb7 Add test for page server restart. 2021-08-18 20:19:07 +03:00
Heikki Linnakangas
91f72fabc9 Work with smaller segments.
Split each relish into fixed-size 10 MB segments. Separate layers are
created for each segment. This reduces the write amplification if you
have a large relation and update only parts of it; the downside is
that you have a lot more files. The 10 MB is just a guess, we should
do some modeling and testing in the future to figure out the optimal
size.

Each segment tracks its size separately. To figure out
the total size of a relish, you need to loop through the segments to
find the highest segment that's in use. That's a bit inefficient, but
will do for now. We might want to add a cache or something later.
2021-08-17 18:54:41 +03:00
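A hedged sketch of the segment arithmetic implied above (8 kB blocks, so 1280 blocks per 10 MB segment); function names are illustrative:

```python
BLOCK_SIZE = 8192
SEG_SIZE_BLOCKS = (10 * 1024 * 1024) // BLOCK_SIZE  # 1280 blocks per segment

def segment_of_block(blknum: int) -> int:
    # Which fixed-size segment a block of a relish falls into.
    return blknum // SEG_SIZE_BLOCKS

def relish_size_in_blocks(segment_sizes: dict) -> int:
    # segment_sizes maps segment number -> number of blocks used in it.
    if not segment_sizes:
        return 0
    highest = max(segment_sizes)  # the "find the highest segment in use" step
    return highest * SEG_SIZE_BLOCKS + segment_sizes[highest]
```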
anastasia
cbeb67067c Issue #367.
Change the CLI so that we always create the node from scratch at 'pg start'.
This operation preserves the previously existing config.

Add a new flag '--config-only' to 'pg create'.
If this flag is passed, don't perform a basebackup, just fill in the initial postgresql.conf for the node.
2021-08-17 18:12:31 +03:00
Dmitry Rodionov
0c4ab80eac try to be more intelligent in WalAcceptor.start, added a bunch of typing sugar to wal acceptor fixtures 2021-08-16 14:27:44 +03:00
Heikki Linnakangas
2450f82de5 Introduce a new "layered" repository implementation.
This replaces the RocksDB based implementation with an approach using
"snapshot files" on disk, and in-memory btreemaps to hold the recent
changes.

This makes the repository implementation a configuration option. You can
choose 'layered' or 'rocksdb' with "zenith init --repository-format=<format>".
The unit tests have been refactored to exercise both implementations.
'layered' is now the default.

Push/pull is not implemented. The 'test_history_inmemory' test has been
commented out accordingly. It's not clear how we will implement that
functionality; probably by copying the snapshot files directly.
2021-08-16 10:06:48 +03:00
Heikki Linnakangas
97f9021c88 Fix JWT token encoding issue in test.
On my laptop, the server was receiving the token as a string with extra
b'...' escaping, e.g. as "b'eyJ0....0ifQA'" instead of just "eyJ0....0ifQA".
That was causing the test to fail.

I'm using Python 3.9, while the CI is using Python 3.8. I suspect that's
why. My version of pyjwt might be different too.

See also https://github.com/jpadilla/pyjwt/issues/391.
2021-08-12 20:46:14 +03:00
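A hedged sketch of the pyjwt difference behind this: versions before 2.0 return bytes from jwt.encode() while newer ones return str, so normalizing avoids the "b'...'" form seen above (the algorithm used here is an assumption):

```python
import jwt

def encode_token(payload: dict, key: str) -> str:
    token = jwt.encode(payload, key, algorithm="HS256")
    # pyjwt < 2.0 returns bytes, >= 2.0 returns str; normalize to str so the
    # server never receives a "b'...'"-wrapped value.
    return token.decode("utf-8") if isinstance(token, bytes) else token
```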
Dmitry Rodionov
ce5333656f Introduce authentication v0.1.
Current state of authentication:
The page server validates the JWT token passed as a password during the
connection phase, and later, when performing an action such as branch creation,
the tenant parameter of the operation is validated to match the one submitted
in the token. To allow access from the console there is a dedicated scope,
PageServerApi, which allows access to all tenants. See the access validation
code in PageServerHandler::check_permission.

Because we are in the middle of refactoring the communication layer
involving the wal proposer protocol and safekeeper<->pageserver, the
safekeeper doesn't check the token passed from compute yet, and uses a
"hardcoded" token passed via an environment variable to communicate with
the pageserver.

Compute postgres now takes the token from an environment variable and passes
it as the password field in the pageserver connection. It is not passed
through settings because then the user would be able to retrieve it using
pg_settings or SHOW.

I've added a basic test in test_auth.py. Probably after we add
authentication to the remaining network paths we should enable it by default
and switch all existing tests to use it.
2021-08-11 20:05:54 +03:00
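A hedged sketch of the compute-to-pageserver connection scheme described above: the JWT comes from an environment variable and travels as the password field. The variable name and connection parameters here are assumptions:

```python
import os
import psycopg2

def connect_to_pageserver(host: str, port: int):
    token = os.environ["ZENITH_AUTH_TOKEN"]  # hypothetical variable name
    return psycopg2.connect(host=host, port=port, user="zenith_admin",
                            password=token, dbname="postgres")
```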
anastasia
949ac54401 Add test of clog (pg_xact) truncation 2021-08-11 05:49:24 +03:00
Stas Kelvich
02b9be488b Disable GC test.
The current GC test is flaky and overly strict. Since we are migrating to the layered repo format
with a different GC implementation, let's just silence this test for now.
2021-08-04 18:33:33 +03:00
Alexey Kondratov
bcaa59c0b9 Test compute restart with AND without safekeepers 2021-08-04 00:05:19 +03:00
Heikki Linnakangas
9ff122835f Refactor ObjectTags, introducing a new concept called "relish"
This clarifies - I hope - the abstractions between Repository and
ObjectRepository. The ObjectTag struct was a mix of objects that could
be accessed directly through the public Timeline interface, and also
objects that were created and used internally by the ObjectRepository
implementation and not supposed to be accessed directly by the
callers.  With the RelishTag separate from ObjectTag, the distinction
is more clear: RelishTag is used in the public interface, and
ObjectTag is used internally between object_repository.rs and
object_store.rs, and it contains the internal metadata object types.

One awkward thing with the ObjectTag struct was that the Repository
implementation had to distinguish between ObjectTags for relations,
and track the size of the relation, while others were used to store
"blobs".  With the RelishTags, some relishes are considered
"non-blocky", and the Repository implementation is expected to track
their sizes, while others are stored as blobs. I'm not 100% happy with
how RelishTag captures that either: it just knows that some relish
kinds are blocky and some non-blocky, and there's an is_block()
function to check that.  But this does enable size-tracking for SLRUs,
allowing us to treat them more like relations.

This changes the way SLRUs are stored in the repository. Each SLRU
segment, e.g. "pg_clog/0000", "pg_clog/0001", are now handled as a
separate relish.  This removes the need for the SLRU-specific
put_slru_truncate() function in the Timeline trait. SLRU truncation is
now handled by calling put_unlink() on the segment. This is more in
line with how PostgreSQL stores SLRUs and handles their truncation.

The SLRUs are "blocky", so they are accessed one 8k page at a time,
and repository tracks their size. I considered an alternative design
where we would treat each SLRU segment as non-blocky, and just store
the whole file as one blob. Each SLRU segment is up to 256 kB in size,
which isn't that large, so that might've worked fine, too. One reason
I didn't do that is that it seems better to have the WAL redo
routines be as close as possible to the PostgreSQL routines. It
doesn't matter much in the repository, though; we have to track the
size for relations anyway, so there's not much difference in whether
we also do it for SLRUs.

While working on this, I noticed that the CLOG and MultiXact redo code
did not handle wraparound correctly. We need to fix that, but for now,
I just commented them out with a FIXME comment.
2021-08-03 14:01:05 +03:00
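A hedged Python rendering of the blocky/non-blocky distinction described above (the real types are Rust enums, and the kind names here are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RelishTag:
    kind: str  # e.g. "relation", "slru_segment", "twophase_file"

    def is_block(self) -> bool:
        # Blocky relishes are accessed one 8k page at a time and the
        # repository tracks their size; the rest are stored as single blobs.
        return self.kind in ("relation", "slru_segment")
```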
Heikki Linnakangas
acc0f41985 Don't try to launch duplicate WAL redo thread if tenant already exists.
The codepath for tenant_create command first launched the WAL redo
thread, and then called branches::create_repo() which checked if the
tenant's directory already exists. That's problematic, because
launching the WAL redo thread will run initdb if the directory doesn't
already exist. Race condition: If the tenant already exists, it will
have a WAL redo thread already running, and the old and new WAL redo
thread might try to run initdb at the same time, causing all kinds of
weird failures.

The test_pageserver_api test was failing 100% repeatably on my laptop
because of this. I'm not sure why this doesn't occur on the CI:

    Jul 31 18:05:48.877 INFO running initdb in "./tenants/5227e4eb90894775ac6b8a8c76f24b2e/wal-redo-datadir", location: pageserver::walredo, pageserver/src/walredo.rs:483
    thread 'WAL redo thread' panicked at 'initdb failed: The files belonging to this database system will be owned by user "heikki".
    This user must also own the server process.

    The database cluster will be initialized with locale "C".
    The default database encoding has accordingly been set to "SQL_ASCII".
    The default text search configuration will be set to "english".

    Data page checksums are disabled.

    creating directory ./tenants/0305b1326f3ea33add0929d516da7cb6/wal-redo-datadir ... ok
    creating subdirectories ... ok
    selecting dynamic shared memory implementation ... posix
    selecting default max_connections ... 100
    selecting default shared_buffers ... 128MB
    selecting default time zone ... Europe/Helsinki
    creating configuration files ... ok
    running bootstrap script ...
    stderr:
    2021-07-31 15:05:48.875 GMT [282569] LOG:  could not open configuration file "/home/heikki/git-sandbox/zenith/test_output/test_tenant_list/repo/./tenants/0305b1326f3ea33add0929d516da7cb6/wal-redo-datadir/postgresql.conf": No such file or directory
    2021-07-31 15:05:48.875 GMT [282569] FATAL:  configuration file "/home/heikki/git-sandbox/zenith/test_output/test_tenant_list/repo/./tenants/0305b1326f3ea33add0929d516da7cb6/wal-redo-datadir/postgresql.conf" contains errors
    child process exited with exit code 1
    initdb: removing data directory "./tenants/0305b1326f3ea33add0929d516da7cb6/wal-redo-datadir"
2021-07-31 18:13:21 +03:00
Dmitry Rodionov
767590bbd5 support tenants
This patch adds support for tenants. This touches mostly the pageserver.
The directory layout on disk is changed to contain a new layer of indirection.
Now the path to a particular repository has the following structure: <pageserver workdir>/tenants/<tenant
id>. The tenant id has the same format as a timeline id. The tenant id is included in
pageserver commands when needed. New commands are also available in the
pageserver: tenant_list, tenant_create. This is also reflected in the CLI.
During init a default tenant is created and its id is saved in the CLI config,
so the following commands can use it without extra options. The tenant id is also included in
the compute postgres configuration, so it can be passed via ServerInfo to
the safekeeper and in the connection string to the pageserver.
For more info see docs/multitenancy.md.
2021-07-22 20:54:20 +03:00
Stas Kelvich
791312824d set superuser name in python tests too 2021-07-21 17:22:22 +03:00
Konstantin Knizhnik
eb0a56eb22 Replay non-relational WAL records on page server 2021-07-16 18:43:07 +03:00
Dmitry Rodionov
75e717fe86 Allow both domains and IP addresses in connection options for
the pageserver and WAL keeper. Also updated the PageServerNode definition in
the control plane to account for that. Resolves #303
2021-07-09 16:46:21 +03:00
Heikki Linnakangas
ced338fd20 Handle relation DROPs in page server.
Add back code to parse transaction commit and abort records, and in
particular the list of dropped relations in them. Add 'put_unlink'
function to the Timeline trait and implementation. We had the code to
handle dropped relations in the GC code and elsewhere in ObjectRepository
already, but there was nothing to create the RelationSizeEntry::Unlink
tombstone entries until now. Also add a test to check that GC correctly
removes all page versions of a dropped relation.

Implements https://github.com/zenithdb/zenith/issues/232, except for the
"orphaned" rels.

Reviewed-by: Konstantin Knizhnik
2021-06-29 00:27:10 +03:00
Heikki Linnakangas
ec44f4b299 Add test for Garbage Collection.
This exposes a command in the page server to run GC immediately on a given
timeline. It's just for testing purposes.
2021-06-28 17:07:28 +03:00
Dmitry Ivanov
257ade0688 Extract PostgreSQL connection logic into PgProtocol
This patch aims to:

* Unify connection & querying logic of ZenithPageserver and Postgres.
* Mitigate changes to transaction machinery introduced in `psycopg2 >= 2.9`.

Now it's possible to acquire db connection using the corresponding
method:

```python
pg = postgres.create_start('main')
conn = pg.connect()
...
conn.close()
```

This pattern can be further improved with the help of `closing`:

```python
from contextlib import closing

pg = postgres.create_start('main')

with closing(pg.connect()) as conn:
    ...
```

All connections produced by this method will have autocommit
enabled by default.
2021-06-17 20:19:04 +03:00