neon/metrics at cbb599f353a7489e18201dbbcf8e7d596f9bfb66 - neon

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-09 14:32:57 +00:00

Files

Christian Schwarz e49602ecf5 feat(metrics): per-timeline metric for on-demand downloads, remove calls_started histogram (#6834 )

refs #6737 

# Problem

Before this PR, on-demand downloads weren't  measured per tenant_id.
This makes root-cause analysis of latency spikes harder, requiring us to
resort to log scraping for

```
{neon_service="pageserver"} |= `downloading on-demand` |= `$tenant_id`
```

which can be expensive when zooming out in Grafana.

Context: https://neondb.slack.com/archives/C033RQ5SPDH/p1707809037868189

# Solution / Changes

- Remove the calls_started histogram
- I did the dilegence, there are only 2 dashboards using this histogram,
    and in fact only one uses it as a histogram, the other just as a
    a counter.
- [Link
1](8115b54d9f/neonprod/dashboards/hkXNF7oVz/dashboard-Z31XmM24k.yaml (L1454)):
`Pageserver Thrashing` dashboard, linked from playbook, will fix.
- [Link
2](8115b54d9f/neonprod/dashboards/CEllzAO4z/dashboard-sJqfNFL4k.yaml (L599)):
one of my personal dashboards, unused for a long time, already broken in
other ways, no need to fix.
- replace `pageserver_remote_timeline_client_calls_unfinished` gauge
with a counter pair
- Required `Clone`-able `IntCounterPair`, made the necessary changes in
the `libs/metrics` crate
-  fix tests to deal with the fallout

A subsequent PR will remove a timeline-scoped metric to compensate.

Note that we don't need additional global counters for the per-timeline
counters affected by this PR; we can use the `remote_storage` histogram
for those, which, conveniently, also include the secondary-mode
downloads, which aren't covered by the remote timeline client metrics
(should they?).

2024-02-20 17:52:23 +01:00

src

feat(metrics): per-timeline metric for on-demand downloads, remove calls_started histogram (#6834 )

2024-02-20 17:52:23 +01:00

Cargo.toml

feat: expose locked memory in pageserver /metrics (#6669 )

2024-02-07 19:39:52 +00:00