mirror of
https://github.com/neondatabase/neon.git
synced 2025-12-23 06:09:59 +00:00
# TLDR

All changes are no-op except some metrics.

## Summary of changes I

### Pageserver

Added a new global counter metric `pageserver_pagestream_handler_results_total` that categorizes pagestream request results according to their outcomes:

1. Success
2. Internal errors
3. Other errors

Internal errors include:

1. Page reconstruction error: This probably indicates a pageserver bug or corruption.
2. LSN timeout error: Could indicate overload or bugs with PS's ability to reach other components.
3. Misrouted request error: Indicates bugs in the Storage Controller/HCC.

Other errors include transient errors that are expected during normal operation, and errors indicating bugs in other parts of the system (e.g., malformed requests, errors due to cancelled operations during PS shutdown, etc.).

## Summary of changes II

This PR adds a pageserver endpoint and its counterpart in the storage controller to list the visible size of all tenant shards. This will be a prerequisite of the tenant rebalance command.

## Problem III

We need a way to download WAL segments/layerfiles from S3 and replay WAL records. We cannot access production S3 from our laptops directly, and we also can't transfer any user data out of production systems for GDPR compliance, so we need another solution.

## Summary of changes III

This PR adds a couple of tools to support the debugging workflow in production:

1. A new `pagectl download-remote-object` command that can be used to download remote storage objects, assuming the correct access is set up.

## Summary of changes IV

This PR adds a command to list all visible delta and image layers from index_part. This is useful for debugging compaction issues, as index_part often contains a lot of covered layers due to PITR.

---------

Co-authored-by: William Huang <william.huang@databricks.com>
Co-authored-by: Chen Luo <chen.luo@databricks.com>
Co-authored-by: Vlad Lazar <vlad@neon.tech>
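
For illustration only, the three-way categorization behind `pageserver_pagestream_handler_results_total` can be sketched as below. This is a minimal std-only sketch, not the pageserver's actual code: the `PagestreamError` variants, the `Outcome` enum, the `ResultCounters` struct, and the label strings are all hypothetical names chosen for this example, and plain atomics stand in for whatever metrics library the real counter uses.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical error type; the variant names are assumptions for this sketch.
#[allow(dead_code)]
#[derive(Debug)]
enum PagestreamError {
    Reconstruct, // page reconstruction failed
    LsnTimeout,  // timed out waiting for an LSN
    Misrouted,   // request reached the wrong shard
    BadRequest,  // malformed request
    Shutdown,    // operation cancelled during shutdown
}

/// The three outcome categories named in the description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Success,
    InternalError,
    OtherError,
}

impl Outcome {
    /// Hypothetical label value for the counter.
    #[allow(dead_code)]
    fn as_label(self) -> &'static str {
        match self {
            Outcome::Success => "success",
            Outcome::InternalError => "internal_error",
            Outcome::OtherError => "other_error",
        }
    }
}

/// Map a request result to its outcome category.
fn classify<T>(result: &Result<T, PagestreamError>) -> Outcome {
    match result {
        Ok(_) => Outcome::Success,
        Err(PagestreamError::Reconstruct)
        | Err(PagestreamError::LsnTimeout)
        | Err(PagestreamError::Misrouted) => Outcome::InternalError,
        Err(PagestreamError::BadRequest) | Err(PagestreamError::Shutdown) => Outcome::OtherError,
    }
}

/// Stand-in for a labelled global counter: one atomic per category.
#[derive(Default)]
struct ResultCounters {
    success: AtomicU64,
    internal_error: AtomicU64,
    other_error: AtomicU64,
}

impl ResultCounters {
    fn slot(&self, outcome: Outcome) -> &AtomicU64 {
        match outcome {
            Outcome::Success => &self.success,
            Outcome::InternalError => &self.internal_error,
            Outcome::OtherError => &self.other_error,
        }
    }

    /// Classify the result, bump the matching counter, and return the category.
    fn observe<T>(&self, result: &Result<T, PagestreamError>) -> Outcome {
        let outcome = classify(result);
        self.slot(outcome).fetch_add(1, Ordering::Relaxed);
        outcome
    }

    /// Read the current value for one category.
    fn get(&self, outcome: Outcome) -> u64 {
        self.slot(outcome).load(Ordering::Relaxed)
    }
}
```

A handler would call `observe` once per finished request. Keeping the mapping in a single exhaustive `match` means that adding a new error variant forces an explicit decision about which category it belongs to.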