## Problem
We don't have good observability for memory usage. This would be useful
e.g. to debug OOM incidents or optimize performance or resource usage.
We would also like to use continuous profiling with e.g. [Grafana Cloud
Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/)
(see https://github.com/neondatabase/cloud/issues/14888).
This PR is intended as a proof of concept, to try it out in staging and
drive further discussions about profiling more broadly.
Touches https://github.com/neondatabase/neon/issues/9534.
Touches https://github.com/neondatabase/cloud/issues/14888.
Depends on #9779.
Depends on #9780.
## Summary of changes
Adds a HTTP route `/profile/heap` that takes a heap profile and returns
it. Query parameters:
* `format`: output format (`jemalloc` or `pprof`; default `pprof`).
Unlike CPU profiles (see #9764), heap profiles are not symbolized and
require the original binary to translate addresses to function names. To
make this work with Grafana, we'll probably have to symbolize the
process server-side -- this is left as future work, as is other output
formats like SVG.
Heap profiles don't work on macOS due to limitations in jemalloc.
## Problem
To add Safekeeper heap profiling in #9778, we need to switch to an
allocator that supports it. Pageserver and proxy already use jemalloc.
Touches #9534.
## Summary of changes
Use jemalloc in Safekeeper.
## Problem
We don't have a convenient way to gather CPU profiles from a running
binary, e.g. during production incidents or end-to-end benchmarks, nor
during microbenchmarks (particularly on macOS).
We would also like to have continuous profiling in production, likely
using [Grafana Cloud
Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/).
We may choose to use either eBPF profiles or pprof profiles for this
(pending testing and discussion with SREs), but pprof profiles appear
useful regardless for the reasons listed above. See
https://github.com/neondatabase/cloud/issues/14888.
This PR is intended as a proof of concept, to try it out in staging and
drive further discussions about profiling more broadly.
Touches #9534.
Touches https://github.com/neondatabase/cloud/issues/14888.
## Summary of changes
Adds a HTTP route `/profile/cpu` that takes a CPU profile and returns
it. Defaults to a 5-second pprof Protobuf profile for use with e.g.
`pprof` or Grafana Alloy, but can also emit an SVG flamegraph. Query
parameters:
* `format`: output format (`pprof` or `svg`)
* `frequency`: sampling frequency in microseconds (default 100)
* `seconds`: number of seconds to profile (default 5)
Also integrates pprof profiles into Criterion benchmarks, such that
flamegraph reports can be taken with `cargo bench ... --profile-duration
<seconds>`. Output under `target/criterion/*/profile/flamegraph.svg`.
Example profiles:
* pprof profile (use [`pprof`](https://github.com/google/pprof)):
[profile.pb.gz](https://github.com/user-attachments/files/17756788/profile.pb.gz)
* Web interface: `pprof -http :6060 profile.pb.gz`
* Interactive flamegraph:
[profile.svg.gz](https://github.com/user-attachments/files/17756782/profile.svg.gz)
## Problem
We don't have any benchmarks for Safekeeper WAL ingestion.
## Summary of changes
Add some basic benchmarks for WAL ingestion, specifically for
`SafeKeeper::process_msg()` (single append) and `WalAcceptor` (pipelined
batch ingestion). Also add some baseline file write benchmarks.