tests: sync_after_each_test -> sync_between_tests (#9239)

## Problem

We are seeing frequent pageserver startup timeouts while it calls
syncfs(). There is an existing fixture that syncs _after_ tests, but not
before the first one. We hypothesize that some failures are happening on
the first test in a job.

## Summary of changes

- Extend the existing `sync_after_each_test` fixture to sync between all
tests, including before running the first test (see the sketch below).
That should remove any ambiguity about whether the sync is happening on
the correct node.

This is an alternative to https://github.com/neondatabase/neon/pull/8957
-- I didn't realize until I saw Alexander's comment on that PR that we
have an existing hook that syncs filesystems and can be extended.
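
For orientation, here is roughly what the fixture looks like after this change. This is a sketch reconstructed from the diff below, not a copy of the file: the `log` object stands in for the test suite's logger, and the imports are assumptions about the surrounding module.

```python
import logging
import os
import time

import pytest

log = logging.getLogger(__name__)  # stand-in for the test suite's logger


@pytest.fixture(scope="function", autouse=True)
def sync_between_tests():
    # Only benchmark jobs set SYNC_BETWEEN_TESTS=true; regress tests and
    # local runs skip the sync entirely.
    enabled = os.environ.get("SYNC_BETWEEN_TESTS") == "true"
    if enabled:
        start = time.time()
        os.sync()  # flush dirty pages left behind by whatever ran before this test
        log.info(f"called sync before test elapsed={time.time() - start}")
    yield
    if enabled:
        start = time.time()
        os.sync()  # flush dirty pages written by the test itself
        log.info(f"called sync after test elapsed={time.time() - start}")
```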
Author: John Spray
Date: 2024-10-02 17:44:25 +01:00
Committed by: GitHub
Parent: 700885471f
Commit: d54624153d
2 changed files with 18 additions and 14 deletions


```diff
@@ -341,7 +341,7 @@ jobs:
           PERF_TEST_RESULT_CONNSTR: "${{ secrets.PERF_TEST_RESULT_CONNSTR }}"
           TEST_RESULT_CONNSTR: "${{ secrets.REGRESS_TEST_RESULT_CONNSTR_NEW }}"
           PAGESERVER_VIRTUAL_FILE_IO_ENGINE: tokio-epoll-uring
-          SYNC_AFTER_EACH_TEST: true
+          SYNC_BETWEEN_TESTS: true
           # XXX: no coverage data handling here, since benchmarks are run on release builds,
           # while coverage is currently collected for the debug ones
```
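
To reproduce the benchmark jobs' behaviour locally, the same switch can be flipped by exporting the variable before invoking pytest. A hypothetical sketch (the `test_runner/performance` path comes from the fixture's own comment; CI sets the variable via the workflow env block rather than via subprocess, this just shows how the gate is controlled):

```python
import os
import subprocess

# Copy the current environment, enable the sync fixture, then run the
# benchmark suite with that environment.
env = dict(os.environ, SYNC_BETWEEN_TESTS="true")
subprocess.run(["pytest", "test_runner/performance"], env=env, check=True)
```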


```diff
@@ -340,23 +340,27 @@ def neon_with_baseline(request: FixtureRequest) -> PgCompare:
 @pytest.fixture(scope="function", autouse=True)
-def sync_after_each_test():
-    # The fixture calls `sync(2)` after each test if `SYNC_AFTER_EACH_TEST` env var is `true`
+def sync_between_tests():
+    # The fixture calls `sync(2)` after each test if `SYNC_BETWEEN_TESTS` env var is `true`
     #
-    # In CI, `SYNC_AFTER_EACH_TEST` is set to `true` only for benchmarks (`test_runner/performance`)
+    # In CI, `SYNC_BETWEEN_TESTS` is set to `true` only for benchmarks (`test_runner/performance`)
     # that are run on self-hosted runners because some of these tests are pretty write-heavy
     # and create issues to start the processes within 10s
-    key = "SYNC_AFTER_EACH_TEST"
+    key = "SYNC_BETWEEN_TESTS"
     enabled = os.environ.get(key) == "true"
+    if enabled:
+        start = time.time()
+        # we only run benches on unices, the method might not exist on windows
+        os.sync()
+        elapsed = time.time() - start
+        log.info(f"called sync before test {elapsed=}")
     yield
-    if not enabled:
-        # regress test, or running locally
-        return
-    start = time.time()
-    # we only run benches on unices, the method might not exist on windows
-    os.sync()
-    elapsed = time.time() - start
-    log.info(f"called sync after test {elapsed=}")
+    if enabled:
+        start = time.time()
+        # we only run benches on unices, the method might not exist on windows
+        os.sync()
+        elapsed = time.time() - start
+        log.info(f"called sync after test {elapsed=}")
```
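
The change leans on standard pytest semantics: for a function-scoped autouse fixture, everything before the `yield` runs ahead of every test, including the very first one in a job, and everything after the `yield` runs as teardown. A minimal, self-contained illustration (hypothetical test file, with list appends standing in for the real `os.sync()` calls):

```python
import pytest

calls = []


@pytest.fixture(scope="function", autouse=True)
def sync_between_tests():
    calls.append("sync before")  # runs before every test, including the first
    yield
    calls.append("sync after")   # runs after every test


def test_first():
    assert calls == ["sync before"]


def test_second():
    assert calls == ["sync before", "sync after", "sync before"]
```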