Instead of having a lot of separate fixtures for setting up the page
server, the compute nodes, the safekeepers etc., have one big ZenithEnv
object that encapsulates the whole environment. Every test either uses
a shared "zenith_simple_env" fixture, which contains the default setup
of a pageserver with no authentication, and no safekeepers. Tests that
want to use safekeepers or authentication set up a custom test-specific
ZenithEnv fixture.
Gathering information about the whole environment into one object makes
some things simpler. For example, when a new compute node is created,
you no longer need to pass the 'wal_acceptors' connection string as
argument to the 'postgres.create_start' function. The 'create_start'
function fetches that information directly from the ZenithEnv object.
* Add yapf run to CircleCI
* Pin yapf version
* Enable `SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES` setting
* Reformat all existing code with slight manual adjustments
* test_runner/README: note that yapf is forced
* Use logging in python tests
* Use f-strings for logs
* Don't log test output while running
* Use only pytest logging handler
* Add more info about pytest logging
Make this test look like 'test_compute_restart.sh' by @ololobus, which
was surprisingly good for checking safekeepers behavior. This test adds
an intermediate compute node start with bulk select that causes a lot of
FPI's and select itself wouldn't wait for all that WAL to be replicated.
So if we kill compute node right after that we end up with lagging safekeepers
with VCL != flush_lsn. And starting new node from that state takes special
care.
Also, run and print `pg_controldata` output after each compute node start
to eyeball lsn/checkpoint info of basebackup.
This commit only adds test without fixing the problem.
this patch adds support for tenants. This touches mostly pageserver.
Directory layout on disk is changed to contain new layer of indirection.
Now path to particular repository has the following structure: <pageserver workdir>/tenants/<tenant
id>. Tenant id has the same format as timeline id. Tenant id is included in
pageserver commands when needed. Also new commands are available in
pageserver: tenant_list, tenant_create. This is also reflected CLI.
During init default tenant is created and it's id is saved in CLI config,
so following commands can use it without extra options. Tenant id is also included in
compute postgres configuration, so it can be passed via ServerInfo to
safekeeper and in connection string to pageserver.
For more info see docs/multitenancy.md.
This patch aims to:
* Unify connection & querying logic of ZenithPagerserver and Postgres.
* Mitigate changes to transaction machinery introduced in `psycopg2 >= 2.9`.
Now it's possible to acquire db connection using the corresponding
method:
```python
pg = postgres.create_start('main')
conn = pg.connect()
...
conn.close()
```
This pattern can be further improved with the help of `closing`:
```python
from contextlib import closing
pg = postgres.create_start('main')
with closing(pg.connect()) as conn:
...
```
All connections produced by this method will have autocommit
enabled by default.
Previously, transaction commit could happen regardless of whether
pageserver has caught up or not. This patch aims to fix that.
There are two notable changes:
1. ComputeControlPlane::new_node() now sets the
`synchronous_standby_names = 'pageserver'` parameter to delay
transaction commit until pageserver acting as a standby has
fetched and ack'd a relevant portion of WAL.
2. pageserver now has to:
- Specify the `application_name = pageserver` which matches the
one in `synchronous_standby_names`.
- Properly reply with the ack'd LSNs.
This means that some tests don't need sleeps anymore.
TODO: We should probably make this behavior configurable.
Fixes#187.