
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-05 21:20:37 +00:00

Author	SHA1	Message	Date
John Spray	0159ae9536	safekeeper: eviction metrics (#8348) ## Problem Follow up to https://github.com/neondatabase/neon/pull/8335, to improve observability of how many evict/restores we are doing. ## Summary of changes - Add `safekeeper_eviction_events_started_total` and `safekeeper_eviction_events_completed_total`, with a "kind" label of evict or restore. This gives us rates, and also the ability to calculate how many are in progress. - Generalize SafekeeperMetrics test type to use the same helpers as pageserver, and enable querying any metric. - Read the new metrics at the end of the eviction test.	2024-07-11 17:05:35 +01:00
Arseny Sher	af40bf3c2e	Fix term/epoch confusion in python tests. Call epoch last_log_term and add separate term field.	2024-05-31 12:58:59 +03:00
Arseny Sher	3797566c36	safekeeper: test pull_timeline with WAL gc. Do pull_timeline while WAL is being removed. To this end: extract pausable_failpoint to utils and sprinkle pull_timeline with it; add a 'checkpoint' sk http endpoint to force WAL removal. After fixing the check of the pull file status code, the test currently fails, which is expected.	2024-05-25 06:06:32 +03:00
Joonas Koivunen	d9dcbffac3	python: allow using allowed_errors.py (#7719) See #7718. Fix it by renaming all `types.py` to `common_types.py`. Additionally, add an advert for using `allowed_errors.py` to test any added regex.	2024-05-13 15:16:23 +03:00
Heikki Linnakangas	74d09b78c7	Keep walproposer alive until shutdown checkpoint is safe on safekeepers. The walproposer pretends to be a walsender in many ways. It has a WalSnd slot, it claims to be a walsender by calling MarkPostmasterChildWalSender() etc. But one difference from real walsenders was that the postmaster still treated it as a bgworker rather than a walsender. The difference is that at shutdown, walsenders are not killed until the very end, after the checkpointer process has written the shutdown checkpoint and exited. As a result, the walproposer always got killed before the shutdown checkpoint was written, so the shutdown checkpoint never made it to safekeepers. That's fine in principle, we don't require a clean shutdown after all. But it also feels a bit silly not to stream the shutdown checkpoint. It could be useful for initializing hot standby mode in a read replica, for example. Change postmaster to treat background workers that have called MarkPostmasterChildWalSender() as walsenders. That unfortunately requires another small change in postgres core. After doing that, walproposers stay alive longer. However, it also means that the checkpointer will wait for the walproposer to switch to WALSNDSTATE_STOPPING state, when the checkpointer sends the PROCSIG_WALSND_INIT_STOPPING signal. We don't have the machinery in walproposer to receive and handle that signal reliably. Instead, we mark walproposer as being in WALSNDSTATE_STOPPING always. In commit `568f91420a`, I assumed that shutdown will wait for all the remaining WAL to be streamed to safekeepers, but before this commit that was not true, and the test became flaky. This should make it stable again. Some tests wrongly assumed that no WAL could have been written between pg_current_wal_flush_lsn and quick pg stop after it. Fix them by introducing flush_ep_to_pageserver which first stops the endpoint and then waits till all committed WAL reaches the pageserver.
In passing, extract safekeeper http client to its own module.	2024-03-11 23:29:32 +04:00

5 Commits
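
Commit 0159ae9536 notes that the started/completed counter pair makes it possible to calculate how many evictions or restores are in progress. A minimal sketch of that arithmetic follows. It assumes the counter values have already been scraped into plain dictionaries keyed by the "kind" label; the helper name `in_progress` and the sample values are hypothetical and not taken from the repository.

```python
from typing import Dict, Mapping


def in_progress(
    started: Mapping[str, float], completed: Mapping[str, float]
) -> Dict[str, float]:
    """Return started minus completed for each 'kind' label.

    `started` stands in for safekeeper_eviction_events_started_total and
    `completed` for safekeeper_eviction_events_completed_total, each keyed
    by the "kind" label (evict or restore). A kind with no completed
    sample yet is treated as zero completions.
    """
    return {kind: started[kind] - completed.get(kind, 0.0) for kind in started}


# Hypothetical sample values, for illustration only.
started = {"evict": 12.0, "restore": 5.0}
completed = {"evict": 10.0, "restore": 5.0}
print(in_progress(started, completed))
```

Because both series are monotonically increasing counters, the difference is only meaningful when both are read from the same scrape, and both reset together if the process restarts.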