Files
neon/test_runner/performance/pageserver
John Spray 842c3d8c10 tests: simplify code around unstable test_basebackup_with_high_slru_count (#8477)
## Problem

In `test_basebackup_with_high_slru_count`, the pageserver is sometimes
mysteriously hanging on startup, having been started+stopped earlier in
the test setup while populating template tenant data.

- #7586 

We can't see why this is hanging in this particular test. The test does
some weird stuff though, like attaching a load of broken tenants and
then doing a SIGQUIT kill of a pageserver.

## Summary of changes

- Attach tenants normally instead of doing a failpoint dance to attach
them as broken
- Shut the pageserver down gracefully during init instead of using
immediate mode
- Remove the "sequential" variant of the unstable test, as this is going
away soon anyway
- Log before trying to acquire lock file, so that if it hangs we have a
clearer sense of if that's really where it's hanging. It seems like it
is, but that code does a non-blocking flock so it's surprising.
2024-07-24 11:26:24 +01:00
..

How to reproduce benchmark results / run these benchmarks interactively.

  1. Get an EC2 instance with Instance Store. Use the same instance type as used for the benchmark run.
  2. Mount the Instance Store => neon.git/scripts/ps_ec2_setup_instance_store
  3. Use a pytest command line (see other READMEs further up in the pytest hierarchy).

For tests that take a long time to set up / consume a lot of storage space, we use the test suite's repo_dir snapshotting functionality (from_repo_dir). It supports mounting snapshots using overlayfs, which improves iteration time.

Here's a full command line.

RUST_BACKTRACE=1 NEON_ENV_BUILDER_USE_OVERLAYFS_FOR_SNAPSHOTS=1 DEFAULT_PG_VERSION=15 BUILD_TYPE=release \
    ./scripts/pytest test_runner/performance/pageserver/pagebench/test_pageserver_max_throughput_getpage_at_latest_lsn.py