rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-30 11:30:37 +00:00

Author	SHA1	Message	Date
Konstantin Knizhnik	dd1440960a	Update pgxn/neon/relperst_cache.c Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	6164f5eaeb	Update pgxn/neon/relperst_cache.c Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	b36c02dda5	Update pgxn/neon/pagestore_smgr.c Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	9955d02a01	Update pgxn/neon/pagestore_smgr.c Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	657c63b9cb	Update pgxn/neon/pagestore_smgr.c Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	4885621e55	Update pgxn/neon/pagestore_smgr.c Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	4580391963	Update pgxn/neon/pagestore_smgr.c Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-29 08:04:11 +03:00
Kosntantin Knizhnik	28ce584d01	Rename relkind to relpersistence	2025-07-29 08:04:11 +03:00
Kosntantin Knizhnik	ae7b92abeb	Undo check for INIT_FORKNUM	2025-07-29 08:04:11 +03:00
Kosntantin Knizhnik	3c54a235dd	Add test_unlogged_build.py	2025-07-29 08:04:11 +03:00
Kosntantin Knizhnik	de33affb1f	Fix merge conflicts	2025-07-29 08:04:11 +03:00
Kosntantin Knizhnik	eabac14080	Fix merge conflicts	2025-07-29 08:04:11 +03:00
Kosntantin Knizhnik	8e150568ec	Handle init fork in special way	2025-07-29 08:04:11 +03:00
Kosntantin Knizhnik	1c0f4d6f97	Replace spinlock with LWLock	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	67c31b61e8	Fix warning	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	9d12eea25a	Fix merge problems	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	c1362cbf71	Fix empty list check	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	902ea0ccd9	Address review comments	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	fb6d7c4676	Fix merge conflict	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	5d93a8cc71	Update pgxn/neon/relkind_cache.c Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	c3fdab3886	Update pgxn/neon/pagestore_client.h Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	1e4783f3f9	Update pgxn/neon/pagestore_client.h Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	20dea3aafb	Move lwlock to pagestore_smgr	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	ca13e7ad7a	Do not return from TRY/CATCH in determine_entry_relkind	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	87c9b067c2	Remove obsolete comment	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	e9df43abda	Change return type of determine_entry_relkind to RelKind	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	840c73e3c4	Rename safe_mdexists to determine_entry_relkind and do unpin instead of unlock in it	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	a9e940e236	Add assertion to store_cached_relkind	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	02ecb1ebbf	Update pgxn/neon/pagestore_client.h Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	2c0a87af68	Update pgxn/neon/relkind_cache.c Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	a9d4cbe242	Unpin entry in case of mdexists error	2025-07-29 08:04:11 +03:00
Kosntantin Knizhnik	d5d41241fa	Fix incorrect unpin condition in get_cached_relkind	2025-07-29 08:04:11 +03:00
Kosntantin Knizhnik	2e34fe03c7	Replace flags with enum	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	510c891ae5	Add comments	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	ac233dc9aa	Fix access to uninitialized flag	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	c083765840	Address review comments	2025-07-29 08:04:11 +03:00
Konstantin Knizhnik	883379f936	Add cache for relation kind	2025-07-29 08:04:11 +03:00
Heikki Linnakangas	40cae8cc36	Fix misc typos and some cosmetic code cleanup (#12695)	2025-07-28 16:21:35 +00:00
Heikki Linnakangas	02fc8b7c70	Add compatibility macros for MyProcNumber and PGIOAlignedBlock (#12715) There were a few uses of these already, so collect them in the compatibility header to avoid the repetition and scattered #ifdefs. The definition of MyProcNumber is a little different from what was used before, but the end result is the same. (PGPROC->pgprocno values were just assigned sequentially to all PGPROC array members, see InitProcGlobal(). That's a bit silly, which is why it was removed in v17.)	2025-07-28 15:05:36 +00:00
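The commit message does not show the new macro itself. The following minimal sketch only illustrates the claim in the parenthetical remark: if pre-v17 `pgprocno` values are assigned sequentially to the PGPROC array members, a process number derived from the array position equals `pgprocno`. `DemoPGPROC`, `demo_all_procs` and the helper functions are hypothetical stand-ins for PGPROC and `ProcGlobal->allProcs`, not neon or PostgreSQL code.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for PostgreSQL's PGPROC. */
typedef struct DemoPGPROC
{
    int pgprocno; /* pre-v17 field, assigned sequentially */
} DemoPGPROC;

#define DEMO_NPROCS 8

/* Hypothetical stand-in for ProcGlobal->allProcs. */
static DemoPGPROC demo_all_procs[DEMO_NPROCS];

/* Mimics the sequential numbering described in the commit: index == pgprocno. */
static void
demo_init_proc_global(void)
{
    for (int i = 0; i < DEMO_NPROCS; i++)
        demo_all_procs[i].pgprocno = i;
}

/* Array-position-based process number, computed by pointer subtraction. */
static int
demo_my_proc_number(const DemoPGPROC *my_proc)
{
    return (int) (my_proc - &demo_all_procs[0]);
}
```

Under sequential assignment both definitions agree for every array member, which is why a slightly different definition can give the same end result.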
Tristan Partin	b623fbae0c	Cancel PG query if stuck at refreshing configuration (#12717) ## Problem While configuring or reconfiguring PG due to PageServer movements, PG may get stuck if the PageServer is moved again after the spec has been fetched from the StorageController. ## Summary of changes To fix this issue, this PR introduces two changes: 1. Fail the PG query directly if it has requested a configuration refresh a certain number of times without success. 2. Introduce a new state `RefreshConfiguration` in compute tools to differentiate it from `RefreshConfigurationPending`. If the compute tool is already in the `RefreshConfiguration` state, it will not accept new configuration requests. ## How is this tested? Chaos testing. Co-authored-by: Chen Luo <chen.luo@databricks.com>	2025-07-25 00:01:59 +00:00
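Change 1 amounts to a bounded retry: request a refresh up to a fixed number of times, then give up and fail the query. The sketch below shows only that control flow under stated assumptions. The callback type, the limit, and every `demo_*` name are hypothetical; the real code would raise a Postgres error rather than return an enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of waiting for a configuration refresh. */
typedef enum
{
    DEMO_CONFIG_REFRESHED,
    DEMO_CONFIG_GAVE_UP /* caller should fail the query with an error */
} DemoRefreshOutcome;

/*
 * Hypothetical callback standing in for "ask compute_ctl for a new
 * configuration and check whether it has been applied"; not a neon API.
 */
typedef bool (*DemoTryRefreshFn) (void *ctx);

/* Ask for a refresh at most max_attempts times, then give up. */
static DemoRefreshOutcome
demo_refresh_with_limit(DemoTryRefreshFn try_refresh, void *ctx,
                        int max_attempts, int *attempts_made)
{
    int attempts = 0;

    while (attempts < max_attempts)
    {
        attempts++;
        if (try_refresh(ctx))
        {
            *attempts_made = attempts;
            return DEMO_CONFIG_REFRESHED;
        }
    }
    *attempts_made = attempts;
    return DEMO_CONFIG_GAVE_UP;
}

/* Stub that succeeds on its succeed_on-th call (never, if succeed_on <= 0). */
typedef struct
{
    int calls;
    int succeed_on;
} DemoStub;

static bool
demo_stub_refresh(void *ctx)
{
    DemoStub *stub = ctx;

    stub->calls++;
    return stub->succeed_on > 0 && stub->calls >= stub->succeed_on;
}

/* Usage helper: attempts made before success, or -1 if the query would fail. */
static int
demo_attempts_until_refreshed(int succeed_on, int max_attempts)
{
    DemoStub stub = {0, succeed_on};
    int attempts = 0;

    if (demo_refresh_with_limit(demo_stub_refresh, &stub, max_attempts,
                                &attempts) == DEMO_CONFIG_GAVE_UP)
        return -1;
    return attempts;
}
```

The point of the cap is that a query stuck behind an unreachable PageServer eventually returns an error instead of waiting indefinitely.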
Tristan Partin	11527b9df7	[BRC-2951] Enforce PG backpressure parameters at the shard level (#12694) ## Problem Currently PG backpressure parameters are enforced globally. With tenant splitting, this makes it hard to balance small and large tenants. For large tenants with more shards, we need to increase the allowed lag because each shard receives only total/shard_count of the data, but doing so could be suboptimal for small tenants with fewer shards. ## Summary of changes This PR enforces these parameters at the shard level, i.e., PG computes the actual lag limit by multiplying the configured value by the shard count. ## How is this tested? Added a regression test. Co-authored-by: Chen Luo <chen.luo@databricks.com>	2025-07-24 18:41:29 +00:00
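The per-shard rule reduces to one multiplication: the configured value is read as a per-shard limit and scaled by the shard count. In this minimal sketch, `demo_effective_lag_limit` is a hypothetical name. Treating a shard count of 0 as a single unsharded shard is an assumption, and so is saturating on overflow.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: turn a per-shard backpressure limit (e.g. a lag in
 * bytes) into the limit PG enforces for the whole tenant.
 */
static uint64_t
demo_effective_lag_limit(uint64_t per_shard_limit, uint32_t shard_count)
{
    /* Assumption: shard_count 0 means an unsharded tenant, i.e. one shard. */
    uint64_t shards = shard_count == 0 ? 1 : shard_count;

    /* Saturate rather than wrap around if the product would overflow. */
    if (per_shard_limit > UINT64_MAX / shards)
        return UINT64_MAX;
    return per_shard_limit * shards;
}
```

For example, a 1GB per-shard flush-lag limit on an 8-shard tenant allows 8GB of total lag, while a single-shard tenant keeps the 1GB limit.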
Tristan Partin	89554af1bd	[BRC-1778] Have PG signal compute_ctl to refresh configuration if it suspects that it is talking to the wrong PSs (#12712) ## Problem This is a follow-up to TODO, as part of the effort to rewire the compute reconfiguration/notification mechanism to make it more robust. Please refer to that commit or ticket BRC-1778 for full context of the problem. ## Summary of changes The previous change added a mechanism in `compute_ctl` that makes it possible to refresh the configuration of PG on demand by having `compute_ctl` download a new config from the control plane/HCC. This change wires that mechanism up with PG so that PG signals `compute_ctl` to refresh its configuration when it suspects that it could be talking to incorrect pageservers due to a stale configuration. PG becomes suspicious that it is talking to the wrong pageservers in the following situations: 1. It cannot connect to a pageserver (e.g., it gets a network-level connection refused error). 2. It can connect to a pageserver, but the pageserver does not return any data for the GetPage request. 3. It can connect to a pageserver, but the pageserver returns a malformed response. 4. It can connect to a pageserver, but there is an error receiving the GetPage response for any other reason. This change also includes a minor tweak to `compute_ctl`'s config refresh behavior. Upon receiving a request to refresh PG configuration, `compute_ctl` will download a config, but it will not attempt to apply it if it is the same as the config it is replacing. This optimization is added because the act of reconfiguring itself requires working pageserver connections. In many failure situations, PG is likely to detect an issue with a pageserver before the control plane can detect the issue, migrate tenants, and update the compute config. In this case, even the latest compute config won't point PG to working pageservers, causing the configuration attempt to hang and hurting PG's time-to-recovery. With this change, `compute_ctl` only attempts reconfiguration if the refreshed config points PG to different pageservers. ## How is this tested? The new code paths are exercised in all existing tests because this mechanism is on by default. Explicitly tested in `test_runner/regress/test_change_pageserver.py`. Co-authored-by: William Huang <william.huang@databricks.com>	2025-07-24 16:44:45 +00:00
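The `compute_ctl` tweak rests on one check: apply the refreshed config only if it points PG at different pageservers. `compute_ctl` itself is Rust, so the C sketch below only shows the comparison logic. It assumes the relevant part of the config is one pageserver connection string per shard, in shard order. The function name and connection strings are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: returns true if the refreshed config points PG at a
 * different set of pageservers than the current one. Assumes both lists hold
 * one connection string per shard, in the same shard order.
 */
static bool
demo_pageservers_changed(const char *const *old_ps, size_t n_old,
                         const char *const *new_ps, size_t n_new)
{
    if (n_old != n_new)
        return true; /* shard count changed, e.g. after a split */
    for (size_t i = 0; i < n_old; i++)
    {
        if (strcmp(old_ps[i], new_ps[i]) != 0)
            return true;
    }
    return false; /* identical: reconfiguring would not help, so skip it */
}
```

Skipping the unchanged case avoids starting a reconfiguration that would itself hang on the same broken pageserver connections.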
Erik Grinaker	d793088225	pgxn: set `MACOSX_DEPLOYMENT_TARGET` (#12723) ## Problem Compiling `neon-pg-ext-v17` results in these linker warnings for `libcommunicator.a`: ``` $ make -j`nproc` -s neon-pg-ext-v17 Installing PostgreSQL v17 headers Compiling PostgreSQL v17 Compiling neon-specific Postgres extensions for v17 ld: warning: object file (/Users/erik.grinaker/Projects/neon/target/debug/libcommunicator.a[1159](25ac62e5b3c53843-curve25519.o)) was built for newer 'macOS' version (15.5) than being linked (15.0) ld: warning: object file (/Users/erik.grinaker/Projects/neon/target/debug/libcommunicator.a[1160](0bbbd18bda93c05b-aes_nohw.o)) was built for newer 'macOS' version (15.5) than being linked (15.0) ld: warning: object file (/Users/erik.grinaker/Projects/neon/target/debug/libcommunicator.a[1161](00c879ee3285a50d-montgomery.o)) was built for newer 'macOS' version (15.5) than being linked (15.0) [...] ``` ## Summary of changes Set `MACOSX_DEPLOYMENT_TARGET` to the current local SDK version (15.5 in this case), which links against object files for that version.	2025-07-24 14:48:35 +00:00
Tristan Partin	9b2e6f862a	Set an upper limit on PG backpressure throttling (#12675) ## Problem A tenant split test revealed another bug with PG backpressure throttling: in some cases, PS may never report its progress back to SK (e.g., observed when aborting a tenant shard split, where the old shard needs to re-establish its SK connection and re-ingest WAL from a much older LSN). In this case, PG may get stuck forever. ## Summary of changes As a general precaution, since the PS feedback mechanism may not always be reliable, this PR uses the previously introduced WAL write rate limit mechanism to slow down writes instead of pausing them completely. The idea is to introduce a new `databricks_effective_max_wal_bytes_per_second`, which is set to `databricks_max_wal_mb_per_second` when there is no PS backpressure and to `10KB` when there is. This way, PG can still write to SK, though at a very low speed. The PR also fixes the problem that the current WAL rate limiting mechanism is too coarse-grained and cannot enforce limits < 1MB. This is because it always resets the rate limiter after 1 second, even if PG could have written more data in the past second. The fix is to introduce a `batch_end_time_us` which records the expected end time of the current batch. For example, if PG writes 10MB of data in a single batch, and the max WAL write rate is set to `1MB/s`, then `batch_end_time_us` will be set to 10 seconds later. ## How is this tested? Tweaked the existing test, and also did manual testing on dev. I set `max_replication_flush_lag` to 1GB, and loaded 500GB pgbench tables. PG is expected to be throttled periodically because PS accumulates 4GB of data before flushing. Results: when PG is throttled: ``` 9500000 of 3300000000 tuples (0%) done (elapsed 10.36 s, remaining 3587.62 s) 9600000 of 3300000000 tuples (0%) done (elapsed 124.07 s, remaining 42523.59 s) 9700000 of 3300000000 tuples (0%) done (elapsed 255.79 s, remaining 86763.97 s) 9800000 of 3300000000 tuples (0%) done (elapsed 315.89 s, remaining 106056.52 s) 9900000 of 3300000000 tuples (0%) done (elapsed 412.75 s, remaining 137170.58 s) ``` when PS just flushed: ``` 18100000 of 3300000000 tuples (0%) done (elapsed 433.80 s, remaining 78655.96 s) 18200000 of 3300000000 tuples (0%) done (elapsed 433.85 s, remaining 78231.71 s) 18300000 of 3300000000 tuples (0%) done (elapsed 433.90 s, remaining 77810.62 s) 18400000 of 3300000000 tuples (0%) done (elapsed 433.96 s, remaining 77395.86 s) 18500000 of 3300000000 tuples (0%) done (elapsed 434.03 s, remaining 76987.27 s) 18600000 of 3300000000 tuples (0%) done (elapsed 434.08 s, remaining 76579.59 s) 18700000 of 3300000000 tuples (0%) done (elapsed 434.13 s, remaining 76177.12 s) 18800000 of 3300000000 tuples (0%) done (elapsed 434.19 s, remaining 75779.45 s) 18900000 of 3300000000 tuples (0%) done (elapsed 434.84 s, remaining 75489.40 s) 19000000 of 3300000000 tuples (0%) done (elapsed 434.89 s, remaining 75097.90 s) 19100000 of 3300000000 tuples (0%) done (elapsed 434.94 s, remaining 74712.56 s) 19200000 of 3300000000 tuples (0%) done (elapsed 498.93 s, remaining 85254.20 s) 19300000 of 3300000000 tuples (0%) done (elapsed 498.97 s, remaining 84817.95 s) 19400000 of 3300000000 tuples (0%) done (elapsed 623.80 s, remaining 105486.76 s) 19500000 of 3300000000 tuples (0%) done (elapsed 745.86 s, remaining 125476.51 s) ``` Co-authored-by: Chen Luo <chen.luo@databricks.com>	2025-07-23 22:37:27 +00:00
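Here is a minimal sketch of the `batch_end_time_us` idea: each batch pushes the end time forward by `bytes / rate` seconds, and a writer sleeps until that time instead of resetting the limiter every second. This is what allows sub-1MB/s limits. The field name `batch_end_time_us` and the 10KB backpressure floor come from the description above. The `demo_*` functions are hypothetical, and "10KB" is assumed to mean 10 * 1024 bytes. The design choice of starting a new batch at the later of "now" and the previous end time is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_USECS_PER_SEC UINT64_C(1000000)

/* Hypothetical limiter state: when the WAL written so far is "paid for". */
typedef struct
{
    uint64_t batch_end_time_us;
} DemoWalRateLimiter;

/*
 * Hypothetical effective limit: the configured MB/s normally, 10KB/s
 * (assumed to be 10 * 1024 bytes) under PS backpressure.
 * Assumes a positive configured limit.
 */
static uint64_t
demo_effective_max_wal_bytes_per_second(uint64_t max_wal_mb_per_second,
                                        bool ps_backpressure)
{
    if (ps_backpressure)
        return 10 * 1024;
    return max_wal_mb_per_second * 1024 * 1024;
}

/* Record a batch of WAL bytes written at now_us, limited to bytes_per_sec. */
static void
demo_account_batch(DemoWalRateLimiter *rl, uint64_t now_us, uint64_t bytes,
                   uint64_t bytes_per_sec)
{
    uint64_t start = rl->batch_end_time_us > now_us ? rl->batch_end_time_us : now_us;
    uint64_t duration_us;

    assert(bytes_per_sec > 0);
    /* Split the division so bytes * 1e6 cannot overflow for large batches. */
    duration_us = (bytes / bytes_per_sec) * DEMO_USECS_PER_SEC +
        (bytes % bytes_per_sec) * DEMO_USECS_PER_SEC / bytes_per_sec;
    rl->batch_end_time_us = start + duration_us;
}

/* How long a writer should sleep before writing more WAL. */
static uint64_t
demo_wait_time_us(const DemoWalRateLimiter *rl, uint64_t now_us)
{
    return rl->batch_end_time_us > now_us ? rl->batch_end_time_us - now_us : 0;
}

/* Usage helper: wait time right after one batch on a fresh limiter at t = 0. */
static uint64_t
demo_wait_after_batch(uint64_t bytes, uint64_t bytes_per_sec)
{
    DemoWalRateLimiter rl = {0};

    demo_account_batch(&rl, 0, bytes, bytes_per_sec);
    return demo_wait_time_us(&rl, 0);
}
```

With 10MB written at 1MB/s, the helper reports a 10,000,000 us wait, which matches the 10-second example in the description.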
Tristan Partin	12e87d7a9f	Add neon.lakebase_mode boolean GUC (#12714) This GUC will become useful for temporarily disabling Lakebase-specific features during the code merge. Signed-off-by: Tristan Partin <tristan.partin@databricks.com>	2025-07-23 22:37:20 +00:00
Tristan Partin	fc242afcc2	PG ignores PageserverFeedback from unknown shards (#12671) ## Problem When testing tenant splits, I found that PG can get backpressure throttled indefinitely if the split is aborted afterwards. It turns out that each PageServer activates the new shards separately, even before the split is committed, and they may start sending PageserverFeedback to PG directly. As a result, if the split is aborted, no one resets the pageserver feedback in PG, and thus PG will be backpressure throttled forever unless it's restarted manually. ## Summary of changes This PR fixes this problem by having `walprop_pg_process_safekeeper_feedback` simply ignore all pageserver feedback from unknown shards. The source of truth here is defined by the shard map, which is guaranteed to be reloaded only after the split is committed. Co-authored-by: Chen Luo <chen.luo@databricks.com>	2025-07-22 21:41:56 +00:00
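The filtering rule can be sketched as a membership check against the currently loaded shard map before any feedback is stored. This is not `walprop_pg_process_safekeeper_feedback`. All types and fields below are hypothetical. The sketch assumes a shard is identified by its (number, count) pair, so children of an uncommitted 2 -> 4 split carry count 4 and are rejected while the map still says 2. It also assumes count 0 denotes an unsharded tenant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_SHARDS 32

/* Hypothetical shard identity carried in each feedback message. */
typedef struct
{
    uint8_t shard_number;
    uint8_t shard_count;
} DemoShardId;

typedef struct
{
    DemoShardId shard;
    uint64_t last_received_lsn;
} DemoPsFeedback;

/* Hypothetical shard map as currently loaded by PG. */
typedef struct
{
    uint8_t shard_count;
    uint64_t last_received_lsn[DEMO_MAX_SHARDS];
} DemoShardMap;

/* Store feedback only if it comes from a shard in the current shard map. */
static bool
demo_process_feedback(DemoShardMap *map, const DemoPsFeedback *fb)
{
    /* Assumption: shard_count 0 means an unsharded tenant with shard 0 only. */
    unsigned int nshards = map->shard_count == 0 ? 1 : map->shard_count;

    if (fb->shard.shard_count != map->shard_count ||
        fb->shard.shard_number >= nshards ||
        fb->shard.shard_number >= DEMO_MAX_SHARDS)
        return false; /* unknown shard, e.g. a child of an uncommitted split */

    map->last_received_lsn[fb->shard.shard_number] = fb->last_received_lsn;
    return true;
}
```

Because rejected feedback is never stored, an aborted split leaves no stale per-shard lag behind that could keep PG throttled.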
Heikki Linnakangas	51ffeef93f	Fix postgres version compatibility macros (#12658) The argument to BufTagInit was called 'spcOid', and it was also setting a field called 'spcOid'. The field name was erroneously expanded with the macro argument as well. It happened to work so far, because all the users of the macro pass a variable called 'spcOid' for the 'spcOid' argument, but as soon as you try to pass anything else, it fails. The same applies to 'dbOid' and 'relNumber'. Rename the arguments to avoid the name collision. Also, while we're at it, add parens around the arguments in a few macros, to make them safer if you pass something non-trivial as the argument.	2025-07-22 16:52:57 +00:00
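The following self-contained illustration of the name-collision bug uses a hypothetical `DemoBufTag` rather than neon's actual `BufTagInit`. The broken pattern appears only in a comment because it does not compile once the argument is anything other than a same-named variable. The fixed macro uses distinct parameter names and parenthesizes each argument.

```c
#include <assert.h>

/* Hypothetical stand-in for a buffer tag; field names follow the commit message. */
typedef struct
{
    unsigned int spcOid;
    unsigned int dbOid;
    unsigned int relNumber;
} DemoBufTag;

/*
 * Broken pattern (comment only): the parameter shares the field's name, so
 * the preprocessor rewrites the member name as well.
 *
 *   #define BAD_INIT(a, spcOid) ((a).spcOid = (spcOid))
 *   BAD_INIT(tag, 1663)   expands to   ((tag).1663 = (1663))   -> compile error
 *
 * It only "works" when the caller happens to pass a variable named spcOid.
 */

/* Fixed pattern: parameter names differ from field names; arguments parenthesized. */
#define DemoBufTagInit(a, xx_spcOid, xx_dbOid, xx_relNumber) \
    do { \
        (a).spcOid = (xx_spcOid); \
        (a).dbOid = (xx_dbOid); \
        (a).relNumber = (xx_relNumber); \
    } while (0)

static DemoBufTag
demo_make_tag(unsigned int spc, unsigned int db, unsigned int rel)
{
    DemoBufTag tag;

    /* Variables with other names now work; the broken macro would fail here. */
    DemoBufTagInit(tag, spc, db, rel);
    return tag;
}
```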
Heikki Linnakangas	8bb45fd5da	Introduce built-in Prometheus exporter to the Postgres extension (#12591) Currently, the exporter exposes the same LFC metrics that are exposed by the "autoscaling" sql_exporter in the docker image. With this, we can remove the dedicated sql_exporter instance. (Actually doing the removal is left as a TODO until this is rolled out to production and we have changed autoscaling-agent to fetch the metrics from this new endpoint.) The exporter runs as a Postgres background worker process. This is extracted from the Rust communicator rewrite project, which will use the same worker process for much more, to handle the communications with the pageservers. For now, though, it merely handles the metrics requests. In the future, we will add more metrics, and perhaps even APIs to control the running Postgres instance. The exporter listens on a Unix Domain socket within the Postgres data directory. A Unix Domain socket is a bit unconventional, but it has some advantages: - Permissions are taken care of. Only processes that can access the data directory, and therefore already have full access to the running Postgres instance, can connect to it. - No need to allocate and manage a new port number for the listener. It has some downsides too: it's not immediately accessible from the outside world, and the functions to work with Unix Domain sockets are lower-level than those for TCP sockets (see the symlink hack in `postgres_metrics_client.rs`, for example). To expose the metrics from the local Unix Domain Socket to the autoscaling agent, introduce a new '/autoscaling_metrics' endpoint in the compute_ctl's HTTP server. Currently it merely forwards the request to the Postgres instance, but we could add rate limiting and access control there in the future. --------- Co-authored-by: Conrad Ludgate <conrad@neon.tech>	2025-07-22 12:00:20 +00:00
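One concrete way in which Unix Domain sockets are "lower-level" is that the socket path must fit in the fixed-size `sun_path` field of `struct sockaddr_un`: about 108 bytes on Linux and 104 on macOS. A data directory nested deeply enough can exceed that. The commit does not say what the symlink hack in `postgres_metrics_client.rs` works around, so treat the connection to that limit as an assumption. The helper names and socket file name below are hypothetical. The calls used are the standard POSIX sockets API.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Fill a sockaddr_un for a pathname socket, refusing paths that do not fit
 * in sun_path (about 108 bytes on Linux, 104 on macOS).
 */
static int
demo_fill_unix_addr(struct sockaddr_un *addr, const char *path)
{
    size_t len = strlen(path);

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    /* sun_path must hold the path plus its terminating NUL. */
    if (len >= sizeof(addr->sun_path))
        return -1;
    memcpy(addr->sun_path, path, len + 1);
    return 0;
}

/* Usage helper: does this socket path fit in a sockaddr_un? */
static int
demo_path_fits(const char *path)
{
    struct sockaddr_un addr;

    return demo_fill_unix_addr(&addr, path) == 0;
}
```

A caller would pass the filled address to `connect()` on an `AF_UNIX`/`SOCK_STREAM` socket. If the path is too long, a common workaround is to connect through a shorter path, such as a symlink or a path relative to the current directory.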
Vlad Lazar	88bc06f148	communicator: debug log more fields of the get page response (#12644) It's helpful to correlate requests and responses in local investigations where the issue is reproducible. Hence, log the rel, fork and block of the get page response.	2025-07-22 11:25:11 +00:00
Tristan Partin	b7bc3ce61e	Skip PG throttle during configuration (#12670) ## Problem While running tenant split tests, I ran into a situation where PG got stuck completely. This seems to be a general problem that was not found in the previous chaos testing fixes. What happened is that if PG gets throttled by PS, and SC decides to move some tenant away, then PG reconfiguration could be blocked forever because it cannot talk to the old PS anymore to refresh the throttling stats, and reconfiguration cannot proceed because it's being throttled. Neon had already considered the case where configuration could be blocked if PG storage is full, but not the backpressure case. ## Summary of changes The PR fixes this problem by simply skipping throttling while PS is being configured, i.e., `max_cluster_size < 0`. An alternative fix is to set those throttle knobs to -1 (e.g., max_replication_apply_lag); however, these knobs are defined as PGC_POSTMASTER, so their values cannot be changed without restarting PG. ## How is this tested? Tested manually. Co-authored-by: Chen Luo <chen.luo@databricks.com>	2025-07-21 20:50:02 +00:00
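The fix adds an early exit to the throttling decision: while configuration is in progress (signalled by `max_cluster_size < 0`, per the description), never throttle. In this minimal sketch, `demo_should_throttle` is hypothetical. Treating a `max_lag` of 0 as "limit disabled" is an assumption of the sketch, not something the commit states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical throttle decision. max_cluster_size < 0 is the
 * "being configured" signal from the commit; max_lag == 0 is assumed
 * here to mean the limit is disabled.
 */
static bool
demo_should_throttle(int max_cluster_size, uint64_t current_lag, uint64_t max_lag)
{
    if (max_cluster_size < 0)
        return false; /* configuration in progress: never block on backpressure */
    if (max_lag == 0)
        return false; /* assumed: no limit configured */
    return current_lag > max_lag;
}
```

Checking the configuration signal first breaks the cycle described above: reconfiguration can proceed even though the stale throttling stats would otherwise keep PG blocked.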


430 Commits