pageserver,safekeeper: disable heap profiling (#10268)

## Problem

Since enabling continuous profiling in staging, we've seen frequent seg
faults. This is suspected to be because jemalloc and pprof-rs take a
stack trace at the same time, and the handlers aren't signal safe.
jemalloc does this probabilistically on every allocation, regardless of
whether someone is taking a heap profile, which means that any CPU
profile has a chance to cause a seg fault.

Touches #10225.

## Summary of changes

For now, just disable heap profiles -- CPU profiles are more important,
and we need to be able to take them without risking a crash.
This commit is contained in:
Erik Grinaker
2025-01-03 16:21:31 +01:00
committed by GitHub
parent e9d30edc7f
commit b33299dc37
2 changed files with 12 additions and 8 deletions

View File

@@ -53,10 +53,12 @@ project_build_tag!(BUILD_TAG);
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
/// Configure jemalloc to sample allocations for profiles every 1 MB (1 << 20).
#[allow(non_upper_case_globals)]
#[export_name = "malloc_conf"]
pub static malloc_conf: &[u8] = b"prof:true,prof_active:true,lg_prof_sample:20\0";
// Configure jemalloc to sample allocations for profiles every 1 MB (1 << 20).
// TODO: disabled because concurrent CPU profiles cause seg faults. See:
// https://github.com/neondatabase/neon/issues/10225.
//#[allow(non_upper_case_globals)]
//#[export_name = "malloc_conf"]
//pub static malloc_conf: &[u8] = b"prof:true,prof_active:true,lg_prof_sample:20\0";
const PID_FILE_NAME: &str = "pageserver.pid";

View File

@@ -51,10 +51,12 @@ use utils::{
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
/// Configure jemalloc to sample allocations for profiles every 1 MB (1 << 20).
#[allow(non_upper_case_globals)]
#[export_name = "malloc_conf"]
pub static malloc_conf: &[u8] = b"prof:true,prof_active:true,lg_prof_sample:20\0";
// Configure jemalloc to sample allocations for profiles every 1 MB (1 << 20).
// TODO: disabled because concurrent CPU profiles cause seg faults. See:
// https://github.com/neondatabase/neon/issues/10225.
//#[allow(non_upper_case_globals)]
//#[export_name = "malloc_conf"]
//pub static malloc_conf: &[u8] = b"prof:true,prof_active:true,lg_prof_sample:20\0";
const PID_FILE_NAME: &str = "safekeeper.pid";
const ID_FILE_NAME: &str = "safekeeper.id";