neon/test_runner
Heikki Linnakangas 6127b6638b Major storage format rewrite
Major changes and new concepts:

Simplify Repository to a value-store
------------------------------------

Move the responsibility of tracking relation metadata, like which relations
exist and what are their sizes, from Repository to a new module,
pgdatadir_mapping.rs. The interface to Repository now consists of simple
key-value PUT/GET operations.

It's still not any old key-value store though. A Repository is still
responsible for handling branching, and every GET operation comes with
an LSN.
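
To illustrate what a GET "with an LSN" means, here is a minimal in-memory sketch in Python. It is not the actual Repository interface (which is Rust code); `VersionedStore`, `put` and `get` are hypothetical names, keys are plain strings, and branching is left out entirely. It assumes a GET returns the latest value written at or before the requested LSN.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional


class VersionedStore:
    """Hypothetical store: every key keeps a history of (LSN, value) pairs."""

    def __init__(self) -> None:
        # Parallel per-key lists, both ordered by LSN.
        self._lsns: Dict[str, List[int]] = defaultdict(list)
        self._values: Dict[str, List[bytes]] = defaultdict(list)

    def put(self, key: str, lsn: int, value: bytes) -> None:
        # Assumption: writes for a key arrive in strictly increasing LSN order.
        lsns = self._lsns[key]
        if lsns and lsn <= lsns[-1]:
            raise ValueError("LSNs must be increasing per key")
        lsns.append(lsn)
        self._values[key].append(value)

    def get(self, key: str, lsn: int) -> Optional[bytes]:
        # Return the newest value written at or before `lsn`, or None.
        lsns = self._lsns.get(key)
        if not lsns:
            return None
        idx = bisect.bisect_right(lsns, lsn)
        if idx == 0:
            return None
        return self._values[key][idx - 1]
```

Reading the same key at two different LSNs can therefore return two different values, which is what sets this apart from an ordinary key-value store.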

Key
---

The key to the Repository key-value store is a Key struct, which consists
of a few integer fields. It's wide enough to store a full RelFileNode,
fork and block number, and to distinguish those from metadata keys.

See pgdatadir_mapping.rs for how relation blocks and metadata keys are
mapped to the Key struct.
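
As a rough illustration only, a key made of a few integer fields could look like the Python sketch below. The field names, the `kind` tag and its values are assumptions made for this example; the real layout is defined in pgdatadir_mapping.rs and may differ.

```python
from typing import NamedTuple

# Hypothetical tags that separate relation block keys from metadata keys.
KIND_REL_BLOCK = 0
KIND_METADATA = 1


class Key(NamedTuple):
    """Hypothetical key: a tag plus RelFileNode, fork and block number."""
    kind: int
    spcnode: int   # tablespace OID (part of RelFileNode)
    dbnode: int    # database OID (part of RelFileNode)
    relnode: int   # relation file number (part of RelFileNode)
    forknum: int
    blknum: int


def rel_block_key(spcnode: int, dbnode: int, relnode: int,
                  forknum: int, blknum: int) -> Key:
    """Build the key for one block of one relation fork."""
    return Key(KIND_REL_BLOCK, spcnode, dbnode, relnode, forknum, blknum)
```

Because tuples compare field by field, the blocks of one relation fork sort next to each other in this sketch, which is convenient when layer files hold ranges of keys.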

Store arbitrary key-ranges in the layer files
---------------------------------------------

The concept of a "segment" is gone. Each layer file can store an arbitrary
range of Keys.

TODO:

- Deleting keys, to reclaim space. This isn't visible to Postgres: dropping
  or truncating a relation works as you would expect if you look at it from
  the compute node. If you drop a relation, for example, the relation is
  removed from the metadata entry, so that it appears to be gone. However,
  the layered repository implementation never reclaims the storage.

- Tracking "logical database size", for disk space quotas. That ought to
  be reimplemented now in pgdatadir_mapping.rs, or perhaps in walingest.rs.

- LSM compaction. The logic for checkpointing and creating image layers is
  very dumb. AFAIK the *read* code could deal with a full-fledged LSM tree
  now consisting of the delta and image layers. But there's no code to
  take a bunch of delta layers and compact them, and the heuristics for
  when to create image layers are pretty dumb.

- The code to track the layers is inefficient. All layers are just stored in
  a vector, and whenever we need to find a layer, we do a linear search in
  it.
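
The linear search mentioned in the last item can be pictured with the following Python sketch. The `Layer` fields and `find_layer` are hypothetical and use integer keys for brevity; the sketch assumes each layer covers a half-open key range and LSN range, and that the layer with the highest starting LSN not above the requested LSN is the one wanted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    """Hypothetical layer descriptor covering a key range and an LSN range."""
    key_start: int  # inclusive
    key_end: int    # exclusive
    lsn_start: int  # inclusive
    lsn_end: int    # exclusive
    name: str


def find_layer(layers: List[Layer], key: int, lsn: int) -> Optional[Layer]:
    """Scan every layer: O(n) work for each lookup."""
    best: Optional[Layer] = None
    for layer in layers:
        if not (layer.key_start <= key < layer.key_end):
            continue  # layer does not cover this key
        if layer.lsn_start > lsn:
            continue  # layer only holds data newer than the requested LSN
        if best is None or layer.lsn_start > best.lsn_start:
            best = layer
    return best
```

An index sorted by key and LSN would avoid visiting every layer on each lookup, which is the inefficiency this item points at.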
2022-03-09 11:36:39 +02:00

Zenith test runner

This directory contains integration tests.

Prerequisites:

  • Correctly configured Python, see /docs/sourcetree.md
  • Zenith and Postgres binaries
    • See the root README.md for build directions
    • Tests can be run from the git tree; or see the environment variables below to run from other directories.
  • The zenith git repo, including the postgres submodule (for some tests, e.g. pg_regress)

Test Organization

The tests are divided into a few batches, such that each batch takes roughly the same amount of time. The batches can be run in parallel, to minimize total runtime. Currently, there are only two batches:

  • test_batch_pg_regress: Runs PostgreSQL regression tests
  • test_others: All other tests

Running the tests

There is a wrapper script to invoke pytest: ./scripts/pytest. It accepts all the arguments that are accepted by pytest. Depending on your installation options, pytest might be invoked directly.

Test state (postgres data, pageserver state, and log files) will be stored under a directory test_output.

You can run all the tests with:

./scripts/pytest

If you want to run all the tests in a particular file:

./scripts/pytest test_pgbench.py

If you want to run all tests that have the string "bench" in their names:

./scripts/pytest -k bench

Useful environment variables:

  • ZENITH_BIN: The directory where zenith binaries can be found.
  • POSTGRES_DISTRIB_DIR: The directory where the postgres distribution can be found.
  • TEST_OUTPUT: The directory where test state and test output files should go.
  • TEST_SHARED_FIXTURES: Try to re-use a single pageserver for all the tests.
  • ZENITH_PAGESERVER_OVERRIDES: A ;-separated set of configs that will be passed as --pageserver-config-override=${value} parameter values when the zenith CLI is invoked.
  • FORCE_MOCK_S3: Initializes every test's pageserver with a mock S3 used as remote storage.
  • RUST_LOG: Logging configuration to pass into the Zenith CLI.
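
As an illustration of the ZENITH_PAGESERVER_OVERRIDES format, the sketch below turns a ;-separated value into one --pageserver-config-override flag per entry. The helper name `override_args` is hypothetical, and trimming whitespace and skipping empty entries are assumptions of this sketch, not documented behaviour of the test fixtures.

```python
import os
from typing import List, Mapping


def override_args(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Build CLI flags from ZENITH_PAGESERVER_OVERRIDES (hypothetical helper)."""
    raw = environ.get("ZENITH_PAGESERVER_OVERRIDES", "")
    # One flag per non-empty entry between the ';' separators.
    return [
        f"--pageserver-config-override={entry.strip()}"
        for entry in raw.split(";")
        if entry.strip()
    ]
```

With the made-up value `a=1;b=2`, this yields two flags, `--pageserver-config-override=a=1` and `--pageserver-config-override=b=2`.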

Let stdout, stderr and INFO log messages go to the terminal instead of capturing them:

./scripts/pytest -s --log-cli-level=INFO ...

(Note that many tests capture subprocess outputs separately, so this may not show much.)

Exit after the first test failure:

./scripts/pytest -x ...

(There are many more pytest options; run pytest -h to see them.)

Writing a test

Every test needs a Zenith Environment, or ZenithEnv, to operate in. A Zenith Environment is like a little cloud-in-a-box, and consists of a Pageserver, 0-N Safekeepers, and compute Postgres nodes. The connections between them can be configured to use JWT authentication tokens, and some other configuration options can be tweaked too.

The easiest way to get access to a Zenith Environment is by using the zenith_simple_env fixture. The 'simple' env may be shared across multiple tests, so don't shut down the nodes or make other destructive changes in that environment. Also don't assume that there are no tenants or branches or data in the cluster. For convenience, there is a branch called empty, though. The convention is to create a test-specific branch of that and load any test data there, instead of the 'main' branch.

For more complicated cases, you can build a custom Zenith Environment with the zenith_env_builder fixture:

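# Import path assumes the fixtures package in this directory.
from fixtures.zenith_fixtures import ZenithEnvBuilder


def test_foobar(zenith_env_builder: ZenithEnvBuilder):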
    # Prescribe the environment.
    # We want to have 3 safekeeper nodes, and use JWT authentication in the
    # connections to the page server
    zenith_env_builder.num_safekeepers = 3
    zenith_env_builder.set_pageserver_auth(True)

    # Now create the environment. This initializes the repository, and starts
    # up the page server and the safekeepers
    env = zenith_env_builder.init_start()

    # Run the test
    ...

For more information about pytest fixtures, see https://docs.pytest.org/en/stable/fixture.html

At the end of a test, all the nodes in the environment are automatically stopped, so you don't need to worry about cleaning up. Logs and test data are preserved for analysis in a directory under ../test_output/<testname>.

Before submitting a patch

Ensure that you pass all obligatory checks.

Also consider:

  • Writing a couple of docstrings to clarify the reasoning behind a new test.
  • Adding more type hints to your code to avoid Any, especially:
    • For fixture parameters, whose types are not automatically deduced.
    • For function arguments and return values.