neon/pageserver at ed31dd2a3c9cdb6dce6ea26e50a42be477e2a3a2 - neon

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-07 05:22:56 +00:00

Christian Schwarz ed31dd2a3c pageserver: better observability for slow wait_lsn (#11176)

# Problem

We leave too few observability breadcrumbs in the case where wait_lsn is
exceptionally slow.

# Changes

- refactor: extract the monitoring logic out of `log_slow` into
  `monitor_slow_future`
- add global + per-timeline counter for time spent waiting for wait_lsn
  - It is updated while we're still waiting, similar to what we do for
    page_service response flush.
- add per-timeline counter pair for started & finished wait_lsn count
- add slow-logging to leave breadcrumbs in logs, not just metrics
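
As a rough illustration of the two metric shapes listed above, the following is a minimal sketch using only the standard library. All names here (`WaitLsnMetrics`, `WaitGuard`, `tick`) are hypothetical and are not taken from the pageserver code; the real change uses the project's own metrics types. The point is only that the time counter is bumped while the wait is still pending, and that the started/finished pair lets a reader derive the number of in-flight waits.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical per-timeline metrics (illustrative names, not the pageserver's).
#[derive(Default)]
pub struct WaitLsnMetrics {
    /// Counter pair: incremented when a wait starts / finishes.
    pub started: AtomicU64,
    pub finished: AtomicU64,
    /// Total time spent waiting, in microseconds, across all waits.
    pub wait_micros: AtomicU64,
}

impl WaitLsnMetrics {
    /// Waits currently in flight, derived from the counter pair.
    pub fn in_flight(&self) -> u64 {
        // Read `finished` first so a concurrent finish cannot make the
        // difference negative; saturate as an extra safety net.
        let finished = self.finished.load(Ordering::Relaxed);
        let started = self.started.load(Ordering::Relaxed);
        started.saturating_sub(finished)
    }
}

/// Tracks one wait: counts the start on creation and the finish on drop.
pub struct WaitGuard<'a> {
    metrics: &'a WaitLsnMetrics,
}

impl<'a> WaitGuard<'a> {
    pub fn start(metrics: &'a WaitLsnMetrics) -> Self {
        metrics.started.fetch_add(1, Ordering::Relaxed);
        WaitGuard { metrics }
    }

    /// Called periodically while the wait is still pending, so the time
    /// counter grows before the wait completes.
    pub fn tick(&self, since_last_tick: Duration) {
        self.metrics
            .wait_micros
            .fetch_add(since_last_tick.as_micros() as u64, Ordering::Relaxed);
    }
}

impl Drop for WaitGuard<'_> {
    fn drop(&mut self) {
        self.metrics.finished.fetch_add(1, Ordering::Relaxed);
    }
}
```

Because the finish is counted in `Drop`, a wait that is cancelled or returns early still closes its half of the pair in this sketch.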

For the slow-logging, we need to avoid flooding the logs during a
broker or network outage/blip.
The solution is a "log-streak-level" concurrency limit per timeline.
At any given time, there is at most one slow wait_lsn that is logging
the "still running" and "completed" sequence of logs.
Other concurrent slow wait_lsn calls don't log at all.
This leaves at least one breadcrumb in each timeline's logs if some
wait_lsn was exceptionally slow during a given period.
The full degree of slowness can then be determined by looking at the
per-timeline metric.
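
The per-timeline limit described above can be sketched as a single-permit gate with a non-blocking acquire. This is an illustration under assumptions, not the actual implementation: the commit uses a Semaphore, whereas this sketch substitutes a standard-library `AtomicBool` so it stays self-contained, and the names `LogStreakGate` and `LogStreakPermit` are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical per-timeline gate holding a single "log streak" permit.
/// Stand-in for a one-permit semaphore.
#[derive(Default)]
pub struct LogStreakGate {
    taken: AtomicBool,
}

/// Held by the one slow wait that is allowed to log its
/// "still running" / "completed" sequence.
pub struct LogStreakPermit<'a> {
    gate: &'a LogStreakGate,
}

impl LogStreakGate {
    /// Non-blocking: returns `None` if another slow wait already owns the
    /// streak, in which case the caller stays silent.
    pub fn try_acquire(&self) -> Option<LogStreakPermit<'_>> {
        self.taken
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .ok()
            .map(|_| LogStreakPermit { gate: self })
    }
}

impl Drop for LogStreakPermit<'_> {
    /// Releasing the permit lets the next slow wait start a new streak.
    fn drop(&mut self) {
        self.gate.taken.store(false, Ordering::Release);
    }
}
```

A slow wait would call `try_acquire` once it has been classified as slow, log only if it got `Some(permit)`, and keep the permit until it has logged its completion.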

# Performance

Reran the `bench_log_slow` benchmark and saw no difference, so existing
call sites are fine.

We do use a Semaphore, but only `try_acquire` it _after_ the wait has
already been determined to be slow, so no baseline overhead is
anticipated.

# Refs

- https://github.com/neondatabase/cloud/issues/23486#issuecomment-2711587222

2025-03-13 15:03:53 +00:00

benches

add benchmark demonstrating metrics/prometheus crate multicore scalability pitfalls & workarounds (#11019)

2025-03-11 07:22:56 +00:00

client

pageserver: https for management API (#11025)

2025-03-10 15:07:59 +00:00

compaction

utils: explicit OTEL export config and OTEL enablement via common entry point (#11139)

2025-03-12 11:07:49 +00:00

ctl

Migrate the last crates to edition 2024 (#10998)