tests: accommodate some messages that can fail tests (#8144)

## Problem

- `test_storage_controller_many_tenants` can fail with warnings in the
storage controller about tenant creation holding a lock for too long,
because this test stresses the machine running the test with many
concurrent timeline creations
- `test_tenant_delete_smoke` can fail when synthetic remote storage
errors show up

## Summary of changes

- tolerate warnings about slow timeline creation in
`test_storage_controller_many_tenants`
- tolerate both possible errors during `error_tolerant_delete`
John Spray
2024-06-24 18:03:53 +01:00
committed by GitHub
parent 3d760938e1
commit 1ea5d8b132
2 changed files with 16 additions and 3 deletions

@@ -48,7 +48,16 @@ def test_storage_controller_many_tenants(
     # We will intentionally stress reconciler concurrency, which triggers a warning when lots
     # of shards are hitting the delayed path.
-    env.storage_controller.allowed_errors.append(".*Many shards are waiting to reconcile")
+    env.storage_controller.allowed_errors.extend(
+        [
+            # We will intentionally stress reconciler concurrency, which triggers a warning when lots
+            # of shards are hitting the delayed path.
+            ".*Many shards are waiting to reconcile",
+            # We will create many timelines concurrently, so they might get slow enough to trip the warning
+            # that timeline creation is holding a lock too long.
+            ".*Shared lock by TimelineCreate.*was held.*",
+        ]
+    )
     for ps in env.pageservers:
         # This can happen because when we do a loop over all pageservers and mark them offline/active,
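
The entries added to `allowed_errors` above are regular expressions. As a rough illustration of how such an allow-list can filter log output, here is a minimal sketch. It assumes each pattern is applied to a whole log line with `re.match`; the actual matching logic of the test fixtures is not shown in this commit. The helper `unexpected_lines` and the sample log lines are hypothetical.

```python
import re

# The two patterns from the hunk above.
ALLOWED = [
    ".*Many shards are waiting to reconcile",
    ".*Shared lock by TimelineCreate.*was held.*",
]


def unexpected_lines(log_lines, allowed_patterns):
    """Return the log lines that no allow-list pattern matches.

    Hypothetical helper: assumes patterns are anchored at the start of the
    line (re.match), which is why each pattern begins with ".*".
    """
    compiled = [re.compile(p) for p in allowed_patterns]
    return [
        line
        for line in log_lines
        if not any(c.match(line) for c in compiled)
    ]


# Made-up log lines, for illustration only.
SAMPLE_LOG = [
    "WARN Many shards are waiting to reconcile",
    "WARN Shared lock by TimelineCreate on tenant X was held for 3s",
    "ERROR something else went wrong",
]

remaining = unexpected_lines(SAMPLE_LOG, ALLOWED)
print(remaining)
```

Only the line that matches neither pattern is left over, so a test using this kind of check would still fail on genuinely unexpected errors.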

@@ -31,8 +31,12 @@ def error_tolerant_delete(ps_http, tenant_id):
             if e.status_code == 500:
                 # This test uses failure injection, which can produce 500s as the pageserver expects
                 # the object store to always be available, and the ListObjects during deletion is generally
-                # an infallible operation
-                assert "simulated failure of remote operation" in e.message
+                # an infallible operation. This can show up as a clear simulated error, or as a general
+                # error during delete_objects()
+                assert (
+                    "simulated failure of remote operation" in e.message
+                    or "failed to delete" in e.message
+                )
             else:
                 raise
         else:
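
The hunk above shows only the `except` branch of `error_tolerant_delete`. The following self-contained sketch shows how a retry loop around that branch might behave. It assumes the delete is retried until it succeeds, which this diff does not show. `FakeApiException`, `FlakyClient`, `tolerant_delete`, and the `max_attempts` bound are hypothetical stand-ins, not the real pageserver test API.

```python
class FakeApiException(Exception):
    """Hypothetical stand-in for the HTTP client's exception type."""

    def __init__(self, message, status_code):
        super().__init__(message)
        self.message = message
        self.status_code = status_code


# The two message fragments tolerated after this commit.
TOLERATED = ("simulated failure of remote operation", "failed to delete")


class FlakyClient:
    """Hypothetical client that fails with the queued messages, then succeeds."""

    def __init__(self, failures):
        self.failures = list(failures)
        self.calls = 0

    def tenant_delete(self, tenant_id):
        self.calls += 1
        if self.failures:
            raise FakeApiException(self.failures.pop(0), 500)


def tolerant_delete(client, tenant_id, max_attempts=10):
    # Retry on injected 500s; re-raise anything else.
    for _ in range(max_attempts):
        try:
            client.tenant_delete(tenant_id)
        except FakeApiException as e:
            if e.status_code == 500:
                # Either form of injected failure is acceptable.
                assert any(t in e.message for t in TOLERATED), e.message
            else:
                raise
        else:
            return  # Success, stop retrying.
    raise RuntimeError(f"delete did not succeed after {max_attempts} attempts")


client = FlakyClient(
    [
        "simulated failure of remote operation",
        "failed to delete 3 objects",
    ]
)
tolerant_delete(client, "tenant-a")
print(client.calls)
```

The bounded loop is a choice made for this sketch so that a persistent failure surfaces as an error instead of hanging.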