Files
neon/test_runner/performance/pageserver
Peter Bendel 56da624870 allow storage_controller error during pagebench (#8109)
## Problem

`test_pageserver_max_throughput_getpage_at_latest_lsn` is a pagebench
testcase which creates several tenants/timelines to verify pageserver
performance.


The test swaps environments around in the tenant duplication stage, so
the storage controller uses two separate db instances (one in the
duplication stage and another one in the benchmarking stage).
In the benchmarking stage, the storage controller starts without any
knowledge of nodes, but with knowledge of tenants (via
attachments.json). When we re-attach and attempt to update the scheduler
stats, the scheduler rightfully complains
about the node not being known. The setup should preserve the storage
controller across the two envs, but i think it's fine to just allow list
the error in this case.

## Summary of changes

add the error message 

`2024-06-19T09:38:27.866085Z ERROR Scheduler missing node 1``

to the list of allowed errors for storage_controller
2024-06-19 13:04:29 +00:00
..

How to reproduce benchmark results / run these benchmarks interactively.

  1. Get an EC2 instance with Instance Store. Use the same instance type as used for the benchmark run.
  2. Mount the Instance Store => neon.git/scripts/ps_ec2_setup_instance_store
  3. Use a pytest command line (see other READMEs further up in the pytest hierarchy).

For tests that take a long time to set up / consume a lot of storage space, we use the test suite's repo_dir snapshotting functionality (from_repo_dir). It supports mounting snapshots using overlayfs, which improves iteration time.

Here's a full command line.

RUST_BACKTRACE=1 NEON_ENV_BUILDER_USE_OVERLAYFS_FOR_SNAPSHOTS=1 DEFAULT_PG_VERSION=15 BUILD_TYPE=release \
    ./scripts/pytest test_runner/performance/pageserver/pagebench/test_pageserver_max_throughput_getpage_at_latest_lsn.py