libs: move metric collection for pageserver and safekeeper into a background task (#12525)

## Problem

Metrics collection in the safekeeper and pageserver can time out. We've
seen this in both hadron and neon.

## Summary of changes

This PR moves metrics collection in the pageserver (PS) and safekeeper
(SK) into a background task, so that a metrics request always returns
some metrics, even if they are somewhat delayed. Reducing the metrics
collection time itself is left to future work.
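The pattern described above can be illustrated with a minimal Python sketch. It is not the pageserver or safekeeper code, which this page does not show. The `BackgroundMetricsCollector` class, its `collect` callback and `interval_s` parameter are hypothetical. A slow collection function runs on a daemon thread, and readers receive the last completed snapshot without waiting on collection.

```python
import threading
from typing import Callable, Optional


class BackgroundMetricsCollector:
    """Hypothetical sketch: periodically run a (possibly slow) collect
    function off the request path and cache the most recent result."""

    def __init__(self, collect: Callable[[], str], interval_s: float = 10.0) -> None:
        self._collect = collect
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._latest: Optional[str] = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                snapshot = self._collect()  # may be slow; runs on the background thread
            except Exception:
                snapshot = None  # on failure, keep serving the previous snapshot
            if snapshot is not None:
                with self._lock:
                    self._latest = snapshot
            # Event.wait returns early (True) as soon as stop() sets the event.
            self._stop.wait(self._interval_s)

    def latest(self) -> Optional[str]:
        """Return the last completed snapshot, or None before the first one finishes."""
        with self._lock:
            return self._latest
```

The trade-off matches the PR's goal. A request never blocks on a slow collection, but the returned metrics may be up to one collection interval (plus collection time) old.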

---------

Co-authored-by: Chen Luo <chen.luo@databricks.com>
Vlad Lazar
2025-07-10 12:58:22 +01:00
committed by GitHub
parent bdca5b500b
commit ffeede085e
15 changed files with 217 additions and 10 deletions


@@ -1002,7 +1002,7 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
     def get_metrics_str(self) -> str:
         """You probably want to use get_metrics() instead."""
-        res = self.get(f"http://localhost:{self.port}/metrics")
+        res = self.get(f"http://localhost:{self.port}/metrics?use_latest=true")
         self.verbose_error(res)
         return res.text


@@ -143,7 +143,7 @@ class SafekeeperHttpClient(requests.Session, MetricsGetter):
     def get_metrics_str(self) -> str:
         """You probably want to use get_metrics() instead."""
-        request_result = self.get(f"http://localhost:{self.port}/metrics")
+        request_result = self.get(f"http://localhost:{self.port}/metrics?use_latest=true")
         request_result.raise_for_status()
         return request_result.text
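Both test clients now request `/metrics?use_latest=true`. The server-side handling of that parameter is not part of these hunks. The following Python sketch shows how a handler might branch on it. It assumes that `use_latest=true` means "serve the most recent background snapshot" and that anything else means "collect synchronously". The function `render_metrics` and its parameters are hypothetical, and which behavior the real servers use by default is not shown here.

```python
from typing import Callable, Optional
from urllib.parse import parse_qs


def render_metrics(
    query_string: str,
    latest_snapshot: Optional[str],
    collect_now: Callable[[], str],
) -> str:
    """Hypothetical /metrics handler logic (assumed semantics, see lead-in)."""
    params = parse_qs(query_string)
    use_latest = params.get("use_latest", ["false"])[0].lower() == "true"
    if use_latest and latest_snapshot is not None:
        # Serve the cached snapshot without waiting on collection.
        return latest_snapshot
    # Caller did not ask for the cached snapshot, or none exists yet: collect inline.
    return collect_now()
```

For example, `render_metrics("use_latest=true", cached, collect)` returns `cached` whenever a snapshot exists, and falls back to `collect()` before the first background collection completes.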