mirror of
https://github.com/neondatabase/neon.git
synced 2026-05-20 14:40:37 +00:00
## Problem

While running tenant split tests I ran into a situation where PG got stuck completely. This seems to be a general problem that was not found in the previous chaos testing fixes.

What happened is that if PG gets throttled by PS, and SC decided to move some tenant away, then PG reconfiguration could be blocked forever because it cannot talk to the old PS anymore to refresh the throttling stats, and reconfiguration cannot proceed because it's being throttled. Neon has considered the case that configuration could be blocked if the PG storage is full, but forgot the backpressure case.

## Summary of changes

The PR fixes this problem by simply skipping throttling while PS is being configured, i.e., `max_cluster_size < 0`. An alternative fix is to set those throttle knobs to -1 (e.g., max_replication_apply_lag), however these knobs were labeled with PGC_POSTMASTER so their values cannot be changed unless we restart PG.

## How is this tested?

Tested manually.

Co-authored-by: Chen Luo <chen.luo@databricks.com>
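The fix described under "Summary of changes" can be illustrated with a minimal sketch. It is not the code from the PR: the `ThrottleState` struct, its fields, and `should_throttle` are hypothetical names chosen for illustration. The only detail taken from the description is that a negative `max_cluster_size` marks a reconfiguration in progress, during which throttling is skipped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the settings consulted before delaying a backend. */
typedef struct
{
    int     max_cluster_size;  /* negative while reconfiguration is in progress */
    int64_t max_apply_lag;     /* throttle limit in bytes; <= 0 disables the check */
    int64_t current_apply_lag; /* last known lag in bytes; may be stale */
} ThrottleState;

/* Returns true if the backend should be delayed by backpressure. */
bool
should_throttle(const ThrottleState *st)
{
    /*
     * During reconfiguration the lag statistics cannot be refreshed from the
     * old page server, so throttling on them could block forever. Skip it.
     */
    if (st->max_cluster_size < 0)
        return false;

    /* A non-positive limit means the knob is disabled. */
    if (st->max_apply_lag <= 0)
        return false;

    return st->current_apply_lag > st->max_apply_lag;
}
```

The check on `max_cluster_size` comes first so that a stale lag value can never hold up a backend while reconfiguration is under way.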
The neon extension consists of several parts:

### shared preload library `neon.so`

- implements the storage manager API and network communication with the remote page server.
- walproposer: implements the broadcast protocol between postgres and the WAL safekeepers.
- control plane connector: captures updates to roles/databases using ProcessUtility_hook and sends them to the control plane.
- remote extension server: requests compute_ctl to download extension files.
- file_cache: a local file cache used to temporarily store relation pages in the local file system for better performance.
- relsize_cache: a relation size cache for better neon performance.

### SQL functions in `neon--*.sql`

Utility functions that expose neon-specific information to users and to metrics collection. This extension is created in all databases in the cluster by default.
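To make the idea of a relation size cache concrete, here is a minimal sketch of a fixed-size, direct-mapped cache that maps a relation identifier to its size in blocks. It does not reflect how relsize_cache is implemented in the extension: the type names, the function names, the slot count, and the use of a plain integer as the relation identifier are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RELSIZE_CACHE_SLOTS 64 /* hypothetical capacity */

typedef struct
{
    bool     valid;
    uint32_t rel_id;  /* hypothetical relation identifier */
    uint32_t nblocks; /* cached relation size in blocks */
} RelSizeEntry;

typedef struct
{
    RelSizeEntry slots[RELSIZE_CACHE_SLOTS];
} RelSizeCache;

/* Mark every slot as empty. */
void
relsize_cache_init(RelSizeCache *cache)
{
    memset(cache, 0, sizeof(*cache));
}

/* Multiplicative hash of the identifier, reduced to a slot index. */
static uint32_t
relsize_slot(uint32_t rel_id)
{
    return (rel_id * 2654435761u) % RELSIZE_CACHE_SLOTS;
}

/* On a hit, store the cached size in *nblocks and return true. */
bool
relsize_cache_get(const RelSizeCache *cache, uint32_t rel_id, uint32_t *nblocks)
{
    const RelSizeEntry *e = &cache->slots[relsize_slot(rel_id)];

    if (e->valid && e->rel_id == rel_id)
    {
        *nblocks = e->nblocks;
        return true;
    }
    return false;
}

/* Insert or overwrite the entry; a colliding relation is evicted. */
void
relsize_cache_set(RelSizeCache *cache, uint32_t rel_id, uint32_t nblocks)
{
    RelSizeEntry *e = &cache->slots[relsize_slot(rel_id)];

    e->valid = true;
    e->rel_id = rel_id;
    e->nblocks = nblocks;
}
```

In this sketch a caller would try `relsize_cache_get` first and, on a miss, ask the remote page server for the size and record it with `relsize_cache_set`. A direct-mapped layout keeps lookups cheap at the cost of evicting entries on collision.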