mirror of
https://github.com/neondatabase/neon.git
synced 2026-01-09 14:32:57 +00:00
## Problem

In `test_basebackup_with_high_slru_count`, the pageserver is sometimes mysteriously hanging on startup, having been started+stopped earlier in the test setup while populating template tenant data.

- #7586

We can't see why this is hanging in this particular test. The test does some weird stuff though, like attaching a load of broken tenants and then doing a SIGQUIT kill of a pageserver.

## Summary of changes

- Attach tenants normally instead of doing a failpoint dance to attach them as broken
- Shut the pageserver down gracefully during init instead of using immediate mode
- Remove the "sequential" variant of the unstable test, as this is going away soon anyway
- Log before trying to acquire lock file, so that if it hangs we have a clearer sense of if that's really where it's hanging. It seems like it is, but that code does a non-blocking flock so it's surprising.
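
The lock-file item above is surprising because a non-blocking `flock` is expected to return immediately, whether or not the lock is free. The following is a minimal Python sketch of that pattern on a Unix system. It is not the pageserver's actual (Rust) code; the function name `try_lock_file` and the log messages are hypothetical. Logging before the attempt shows whether execution ever reaches the lock call.

```python
import fcntl
import logging
import os

log = logging.getLogger("lockfile_sketch")


def try_lock_file(path: str):
    """Try to take an exclusive, non-blocking flock on `path`.

    Returns the open file descriptor on success (keep it open to hold the
    lock), or None if another open file description already holds the lock.
    """
    # Log *before* the attempt, so a hang at this step shows up in the logs.
    log.info("acquiring lock file %s", path)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes flock raise BlockingIOError instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        log.warning("lock file %s is held elsewhere", path)
        return None
    log.info("acquired lock file %s", path)
    return fd
```

If the first log line appears and neither of the other two follows, the hang is between them, which is what the added logging in this change is meant to narrow down.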
How to reproduce benchmark results / run these benchmarks interactively.
- Get an EC2 instance with Instance Store. Use the same instance type as used for the benchmark run.
- Mount the Instance Store => `neon.git/scripts/ps_ec2_setup_instance_store`
- Use a pytest command line (see other READMEs further up in the pytest hierarchy).

For tests that take a long time to set up / consume a lot of storage space,
we use the test suite's repo_dir snapshotting functionality (from_repo_dir).
It supports mounting snapshots using overlayfs, which improves iteration time.
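
To illustrate why an overlay mount helps here: with overlayfs, a snapshot can serve as a read-only lower layer while all writes land in a separate upper layer, so presumably the snapshot does not have to be copied for every run. The sketch below only builds a generic Linux overlayfs `mount` command line. It is not the test suite's implementation, and the function name and directory layout are hypothetical.

```python
from pathlib import Path


def overlay_mount_argv(snapshot: Path, scratch: Path, target: Path) -> list[str]:
    """Build a `mount` command that puts a writable layer over a snapshot.

    All three paths are hypothetical; the test suite picks its own layout.
    """
    upper = scratch / "upper"  # receives every write made through `target`
    work = scratch / "work"    # overlayfs-internal dir, same filesystem as `upper`
    options = f"lowerdir={snapshot},upperdir={upper},workdir={work}"
    # Running this needs root (or equivalent privileges); here we only build it.
    return ["mount", "-t", "overlay", "overlay", "-o", options, str(target)]
```

Discarding the `upper` directory afterwards returns the mount point to the pristine snapshot state without touching the snapshot itself.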
Here's a full command line.
RUST_BACKTRACE=1 NEON_ENV_BUILDER_USE_OVERLAYFS_FOR_SNAPSHOTS=1 DEFAULT_PG_VERSION=15 BUILD_TYPE=release \
./scripts/pytest test_runner/performance/pageserver/pagebench/test_pageserver_max_throughput_getpage_at_latest_lsn.py
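
For repeated runs it can be convenient to launch the same command from a script. The following is a minimal sketch that reuses exactly the environment variables and test path from the command line above. The helper names `benchmark_invocation` and `run_benchmark` are hypothetical, and `repo_root` is assumed to be the path of a neon.git checkout.

```python
import os
import subprocess

# Environment variables and argv copied from the command line above.
BENCH_ENV = {
    "RUST_BACKTRACE": "1",
    "NEON_ENV_BUILDER_USE_OVERLAYFS_FOR_SNAPSHOTS": "1",
    "DEFAULT_PG_VERSION": "15",
    "BUILD_TYPE": "release",
}
BENCH_ARGV = [
    "./scripts/pytest",
    "test_runner/performance/pageserver/pagebench/"
    "test_pageserver_max_throughput_getpage_at_latest_lsn.py",
]


def benchmark_invocation(base_env=None):
    """Return (env, argv) for the benchmark without running anything."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(BENCH_ENV)  # benchmark settings override inherited values
    return env, list(BENCH_ARGV)


def run_benchmark(repo_root: str) -> subprocess.CompletedProcess:
    """Run the benchmark from `repo_root`; raises if pytest exits non-zero."""
    env, argv = benchmark_invocation()
    return subprocess.run(argv, cwd=repo_root, env=env, check=True)
```

Keeping the invocation separate from the launcher makes it possible to print or inspect the exact command before committing to a long benchmark run.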