tests: allow-list a controller heartbeat error (#8471)

## Problem

`test_change_pageserver` stops pageservers in a way that can overlap
with the controller's heartbeats: the controller can get a heartbeat
success and then immediately find the node unavailable. This particular
situation triggers an error log that isn't in our current allow-list of
messages for offline nodes.

Example:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8339/10048487700/index.html#testresult/19678f27810231df/retries

## Summary of changes

- Add the message to the allow-list
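
As a minimal sketch of what the new allow-list entry covers, the snippet below checks the pattern from this change against a sample log line. The sample line, the URL in it, and the `is_allowed` helper are hypothetical and only for illustration. It also assumes the pattern is applied as a regex with `re.match`, which may differ from how the test fixtures actually apply it.

```python
import re

# Pattern added to the allow-list by this change (copied from the diff).
ALLOWED = re.compile(
    r".*Failed to update node .+ after heartbeat round.*error sending request for url.*"
)

# Hypothetical log line, shaped like the controller's error message.
# The node id and URL are made up for illustration.
SAMPLE = (
    "ERROR Failed to update node 1 after heartbeat round: "
    "error sending request for url (http://127.0.0.1:1234/v1/status)"
)


def is_allowed(line: str) -> bool:
    """Return True if the line matches the allow-listed pattern (hypothetical helper)."""
    return ALLOWED.match(line) is not None


if __name__ == "__main__":
    print(is_allowed(SAMPLE))
```

The pattern is deliberately narrow: it only matches heartbeat-round failures caused by a request error, so other failures in the same code path would still fail a test.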
John Spray
2024-07-23 21:09:05 +01:00
committed by GitHub
parent d47c94b336
commit 9e23410074
2 changed files with 3 additions and 0 deletions

@@ -828,6 +828,8 @@ impl Service {
);
}
Err(err) => {
// Transition to active involves reconciling: if a node responds to a heartbeat then
// becomes unavailable again, we may get an error here.
tracing::error!(
"Failed to update node {} after heartbeat round: {}",
node_id,

@@ -102,6 +102,7 @@ DEFAULT_STORAGE_CONTROLLER_ALLOWED_ERRORS = [
# failing to connect to them.
".*Call to node.*management API.*failed.*receive body.*",
".*Call to node.*management API.*failed.*ReceiveBody.*",
".*Failed to update node .+ after heartbeat round.*error sending request for url.*",
# Many tests will start up with a node offline
".*startup_reconcile: Could not scan node.*",
# Tests run in dev mode