rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-07 05:22:56 +00:00

Author	SHA1	Message	Date
Erik Grinaker	6c8a144e25	Pass stripe size during shard map updates	2025-07-23 16:35:19 +02:00
Erik Grinaker	464ed0cbc7	rustfmt	2025-07-23 09:41:01 +02:00
Erik Grinaker	f55ccd2c17	Fix lints	2025-07-23 08:17:06 +02:00
Erik Grinaker	c9758dc46b	Fix communicator build	2025-07-23 08:06:20 +02:00
Heikki Linnakangas	a7a6df3d6f	fix datatype used in test mock function	2025-07-23 01:44:45 +03:00
Heikki Linnakangas	bfb4b0991d	Refactor the way lfc_get_stats() is implemented This reduces the boilerplate a little, and makes it more straightforward to dispatch the call to the old or the new communicator	2025-07-23 01:40:42 +03:00
Heikki Linnakangas	c18f4a52f8	refactor metrics to use 'measured' crate	2025-07-23 00:56:21 +03:00
Heikki Linnakangas	48535798ba	Merge remote-tracking branch 'origin/main' into communicator-rewrite	2025-07-23 00:00:10 +03:00
Heikki Linnakangas	51ffeef93f	Fix postgres version compatibility macros (#12658 ) The argument to BufTagInit was called 'spcOid', and it was also setting a field called 'spcOid'. The field name would erroneously also be expanded with the macro arg. It happened to work so far, because all the users of the macro pass a variable called 'spcOid' for the 'spcOid' argument, but as soon as you try to pass anything else, it fails. And same story for 'dbOid' and 'relNumber'. Rename the arguments to avoid the name collision. Also while we're at it, add parens around the arguments in a few macros, to make them safer if you pass something non-trivial as the argument.	2025-07-22 16:52:57 +00:00
Heikki Linnakangas	8bb45fd5da	Introduce built-in Prometheus exporter to the Postgres extension (#12591 ) Currently, the exporter exposes the same LFC metrics that are exposed by the "autoscaling" sql_exporter in the docker image. With this, we can remove the dedicated sql_exporter instance. (Actually doing the removal is left as a TODO until this is rolled out to production and we have changed autoscaling-agent to fetch the metrics from this new endpoint.) The exporter runs as a Postgres background worker process. This is extracted from the Rust communicator rewrite project, which will use the same worker process for much more, to handle the communications with the pageservers. For now, though, it merely handles the metrics requests. In the future, we will add more metrics, and perhaps even APIs to control the running Postgres instance. The exporter listens on a Unix Domain socket within the Postgres data directory. A Unix Domain socket is a bit unconventional, but it has some advantages: - Permissions are taken care of. Only processes that can access the data directory, and therefore already have full access to the running Postgres instance, can connect to it. - No need to allocate and manage a new port number for the listener It has some downsides too: it's not immediately accessible from the outside world, and the functions to work with Unix Domain sockets are more low-level than TCP sockets (see the symlink hack in `postgres_metrics_client.rs`, for example). To expose the metrics from the local Unix Domain Socket to the autoscaling agent, introduce a new '/autoscaling_metrics' endpoint in the compute_ctl's HTTP server. Currently it merely forwards the request to the Postgres instance, but we could add rate limiting and access control there in the future. --------- Co-authored-by: Conrad Ludgate <conrad@neon.tech>	2025-07-22 12:00:20 +00:00
Vlad Lazar	88bc06f148	communicator: debug log more fields of the get page response (#12644 ) It's helpful to correlate requests and responses in local investigations where the issue is reproducible. Hence, log the rel, fork and block of the get page response.	2025-07-22 11:25:11 +00:00
Tristan Partin	b7bc3ce61e	Skip PG throttle during configuration (#12670 ) ## Problem While running tenant split tests I ran into a situation where PG got stuck completely. This seems to be a general problem that was not found in the previous chaos testing fixes. What happened is that if PG gets throttled by PS, and SC decided to move some tenant away, then PG reconfiguration could be blocked forever because it cannot talk to the old PS anymore to refresh the throttling stats, and reconfiguration cannot proceed because it's being throttled. Neon has considered the case that configuration could be blocked if the PG storage is full, but forgot the backpressure case. ## Summary of changes The PR fixes this problem by simply skipping throttling while PS is being configured, i.e., `max_cluster_size < 0`. An alternative fix is to set those throttle knobs to -1 (e.g., max_replication_apply_lag), however these knobs were labeled with PGC_POSTMASTER so their values cannot be changed unless we restart PG. ## How is this tested? Tested manually. Co-authored-by: Chen Luo <chen.luo@databricks.com>	2025-07-21 20:50:02 +00:00
Ruslan Talpa	0dbe551802	proxy: subzero integration in auth-broker (embedded data-api) (#12474 ) ## Problem We want to have the data-api served by the proxy directly instead of relying on a 3rd party to run a deployment for each project/endpoint. ## Summary of changes With the changes below, the proxy (auth-broker) becomes also a "rest-broker", that can be thought of as a "Multi-tenant" data-api which provides an automated REST api for all the databases in the region. The core of the implementation (that leverages the subzero library) is in proxy/src/serverless/rest.rs and this is the only place that has "new logic". --------- Co-authored-by: Ruslan Talpa <ruslan.talpa@databricks.com> Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Conrad Ludgate <conrad@neon.tech>	2025-07-21 18:16:28 +00:00
Heikki Linnakangas	dc35bda074	WIP: Implement LFC prewarming This doesn't pass the tests yet; the immediate issue is that we're missing some stats that the tests depend on. And there's a lot more cleanup, commenting etc. to do. But this is roughly how it should look.	2025-07-20 01:23:34 +03:00
Heikki Linnakangas	e2c3c2eccb	Merge remote-tracking branch 'origin/main' into HEAD	2025-07-20 00:58:57 +03:00
Victor Polevoy	cb50291dcd	Fetches the SLRU segment via the new communicator. The fetch is done not into a buffer as earlier, but directly into the file.	2025-07-18 10:02:31 +02:00
Konstantin Knizhnik	7fef4435c1	Store stripe_size in shared memory (#12560 ) ## Problem See https://databricks.slack.com/archives/C09254R641L/p1752004515032899 stripe_size GUC update may be delayed at different backends and so cause inconsistency with connection strings (shard map). ## Summary of changes Postmaster should store stripe_size in shared memory as well as connection strings. It should be also enforced that stripe size is defined prior to connection strings in postgresql.conf --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Kosntantin Knizhnik <konstantin.knizhnik@databricks.com>	2025-07-17 20:32:34 +00:00
Konstantin Knizhnik	43fd5b218b	Refactor shmem initialization in Neon extension (#12630 ) ## Problem Initialization of shared memory in an extension is complex and non-portable. In the neon extension this boilerplate code is duplicated in several files. ## Summary of changes Perform all initialization in one place - neon.c. All other modules provide ShmemRequest() and ShmemInit() functions which are called from neon.c --------- Co-authored-by: Kosntantin Knizhnik <konstantin.knizhnik@databricks.com> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-17 20:20:38 +00:00
Heikki Linnakangas	10a7d49726	Use XLogRecPtr for LSNs in C generated code. This hopefully silences the static assertion Erik is seeing: ``` pgxn/neon/communicator_new.c:1352:9: error: static assertion failed due to requirement '__builtin_types_compatible_p(unsigned long long, unsigned long)': (r->lsn) does not have type XLogRecPtr 1352 | LSN_FORMAT_ARGS(r->lsn)); | ^~~~~~~~~~~~~~~~~~~~~~~ ```	2025-07-17 13:37:45 +03:00
Erik Grinaker	edcdd6ca9c	Merge branch 'main' into communicator-rewrite	2025-07-17 10:59:37 +02:00
Tristan Partin	9e154a8130	PG: smooth max wal rate (#12514 ) ## Problem We were only resetting the limit in the wal proposer. If backends are back pressured, it might take a while for the wal proposer to receive a new WAL to reset the limit. ## Summary of changes Backend also checks the time and resets the limit. ## How is this tested? pgbench has more smooth tps Signed-off-by: Tristan Partin <tristan.partin@databricks.com> Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com>	2025-07-16 16:11:25 +00:00
Alexey Kondratov	dd7fff655a	feat(compute): Introduce privileged_role_name parameter (#12539 ) ## Problem Currently `neon_superuser` is hardcoded in many places. It makes it harder to reuse the same code in different envs. ## Summary of changes Parametrize `neon_superuser` in `compute_ctl` via `--privileged-role-name` and in `neon` extensions via `neon.privileged_role_name`, so it's now possible to use different 'superuser' role names if needed. Everything still defaults to `neon_superuser`, so no control plane code changes are needed and I intentionally do not touch regression and migrations tests. Postgres PRs: - https://github.com/neondatabase/postgres/pull/674 - https://github.com/neondatabase/postgres/pull/675 - https://github.com/neondatabase/postgres/pull/676 - https://github.com/neondatabase/postgres/pull/677 Cloud PR: - https://github.com/neondatabase/cloud/pull/31138	2025-07-15 20:22:57 +00:00
Heikki Linnakangas	62af2a14e2	Improve comments a little	2025-07-15 16:06:49 +03:00
Matthias van de Meent	3e6fdb0aa6	Add and use [U]INT64_[HEX_]FORMAT for various [u]int64 needs (#12592 ) We didn't consistently apply these, and it wasn't consistently solved. With this patch we should have a more consistent approach to this, and have less issues porting changes to newer versions. This also removes some potentially buggy casts to `long` from `uint64` - they could've truncated the value in systems where `long` only has 32 bits.	2025-07-14 16:47:07 +00:00
Heikki Linnakangas	69dbad700c	Merge remote-tracking branch 'origin/main' into HEAD	2025-07-12 16:43:57 +03:00
Matthias van de Meent	4566b12a22	NEON: Finish Zenith->Neon rename (#12566 ) Even though we're now part of Databricks, let's at least make this part consistent. ## Summary of changes - PG14: https://github.com/neondatabase/postgres/pull/669 - PG15: https://github.com/neondatabase/postgres/pull/670 - PG16: https://github.com/neondatabase/postgres/pull/671 - PG17: https://github.com/neondatabase/postgres/pull/672 --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2025-07-11 18:56:39 +00:00
Heikki Linnakangas	3300207523	Update working set size estimate without lock (#12570 ) Update the WSS estimate before acquiring the lock, so that we don't need to hold the lock for so long. That seems safe to me, see added comment. I was planning to do this with the new rust-based communicator implementation anyway, but it might help a little with the current C implementation too. And more importantly, having this as a separate PR gives us a chance to review this aspect independently.	2025-07-11 16:05:22 +00:00
Heikki Linnakangas	a8db7ebffb	Minor refactor of the SQL functions to get working set size estimate (#12550 ) Split the functions into two: one internal function to calculate the estimate, and another (two functions) to expose it as SQL functions. This is in preparation of adding new communicator implementation. With that, the SQL functions will dispatch the call to the old or new implementation depending on which is being used.	2025-07-11 14:17:44 +00:00
Erik Grinaker	1637fbce25	Merge fix	2025-07-11 10:50:19 +02:00
Erik Grinaker	8cd5370c00	Merge branch 'main' into communicator-rewrite	2025-07-11 10:39:26 +02:00
Alex Chi Z.	b91f821e8b	fix(libpagestore): update the default stripe size (#12557 ) ## Problem Part of LKB-379 The pageserver connstrings are updated in the postmaster and then there's a hook to propagate it to the shared memory of all backends. However, the shard stripe doesn't. This would cause problems during shard splits: * the compute has active reads/writes * shard split happens and the cplane applies the new config (pageserver connstring + stripe size) * pageserver connstring will be updated immediately once the postmaster receives the SIGHUP, and it will be copied over to the shared memory of all other backends. * stripe size is a normal GUC and we don't have special handling around that, so if any active backend has ongoing txns the value won't be applied. * now it's possible for backends to issue requests based on the wrong stripe size; what's worse, if a request gets cached in the prefetch buffer, it will get stuck forever. ## Summary of changes To make sure it aligns with the current default in storcon. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-10 21:49:52 +00:00
Tristan Partin	1b7339b53e	PG: add max_wal_rate (#12470 ) ## Problem One PG tenant may write too fast and overwhelm the PS. The other tenants sharing the same PSs will get very little bandwidth. In one experiment with two tenants sharing the same PSs, one tenant ran a large ingestion that delivered hundreds of MB/s while the other only got < 10 MB/s. ## Summary of changes Rate limit how fast PG can generate WALs. The default is -1. We may scale the default value with the CPU count. Need to run some experiments to verify. ## How is this tested? CI. PGBench. No limit first. Then set to 1 MB/s and you can see the tps drop. Then reverted the change and tps increased again. pgbench -i -s 10 -p 55432 -h 127.0.0.1 -U cloud_admin -d postgres pgbench postgres -c 10 -j 10 -T 6000000 -P 1 -b tpcb-like -h 127.0.0.1 -U cloud_admin -p 55432 progress: 33.0 s, 986.0 tps, lat 10.142 ms stddev 3.856 progress: 34.0 s, 973.0 tps, lat 10.299 ms stddev 3.857 progress: 35.0 s, 1004.0 tps, lat 9.939 ms stddev 3.604 progress: 36.0 s, 984.0 tps, lat 10.183 ms stddev 3.713 progress: 37.0 s, 998.0 tps, lat 10.004 ms stddev 3.668 progress: 38.0 s, 648.9 tps, lat 12.947 ms stddev 24.970 progress: 39.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 40.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 41.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 42.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 43.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 44.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 45.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 46.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 47.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 48.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 49.0 s, 347.3 tps, lat 321.560 ms stddev 1805.633 progress: 50.0 s, 346.8 tps, lat 9.898 ms stddev 3.809 progress: 51.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 52.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 53.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 54.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 55.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 56.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 57.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 58.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 59.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 60.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 61.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 62.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 63.0 s, 494.5 tps, lat 276.504 ms stddev 1853.689 progress: 64.0 s, 488.0 tps, lat 20.530 ms stddev 71.981 progress: 65.0 s, 407.8 tps, lat 9.502 ms stddev 3.329 progress: 66.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 67.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 68.0 s, 504.5 tps, lat 71.627 ms stddev 397.733 progress: 69.0 s, 371.0 tps, lat 24.898 ms stddev 29.007 progress: 70.0 s, 541.0 tps, lat 19.684 ms stddev 24.094 progress: 71.0 s, 342.0 tps, lat 29.542 ms stddev 54.935 Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com>	2025-07-10 20:34:11 +00:00
Heikki Linnakangas	bceafc6c32	Update LFC cache hit/miss counters Fixes EXPLAIN (FILECACHE) option	2025-07-10 16:36:53 +03:00
Heikki Linnakangas	dcf8e0565f	Improve communicator README	2025-07-10 15:19:20 +03:00
Heikki Linnakangas	c14cf15b52	Tidy up the memory ordering instructions on request slot code I believe the explicit memory fence instructions are unnecessary. Performing a store with Release ordering makes all the previous non-atomic writes visible too. Per rust docs for Ordering::Release ( https://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html#variant.Release): > When coupled with a store, all previous operations become ordered > before any load of this value with Acquire (or stronger) > ordering. In particular, all previous writes become visible to all > threads that perform an Acquire (or stronger) load of this value. > > ... > > Corresponds to memory_order_release in C++20. The "all previous writes" means non-atomic writes too. It's not very clear from that text, but the C++20 docs that it links to is more explicit about it: > All memory writes (including non-atomic and relaxed atomic) that > happened-before the atomic store from the point of view of thread A, > become visible side-effects in thread B. That is, once the atomic > load is completed, thread B is guaranteed to see everything thread A > wrote to memory. In addition to removing the fence instructions, fix the comments on each atomic Acquire operation to point to the correct Release counterpart. We had such comments but they had gone out-of-date as code has moved.	2025-07-10 15:19:20 +03:00
Heikki Linnakangas	5da06d4129	Make start_neon_io_request() wakeup the communicator process All the callers did that previously. So rather than document that the caller needs to do it, just do it in start_neon_io_request() straight away. (We might want to revisit this if we get codepaths where the C code submits multiple IO requests as a batch. In that case, it would be more efficient to fill all the request slots first and only send one notification to the pipe for all of them)	2025-07-10 15:19:20 +03:00
Heikki Linnakangas	f30c59bec9	Improve comments on request slots	2025-07-10 15:19:20 +03:00
Heikki Linnakangas	47c099a0fb	Rename NeonIOHandle to NeonIORequestSlot All the code talks about "request slots", better to make the struct name reflect that. The "Handle" term was borrowed from Postgres v18 AIO implementation, from the similar handles or slots used to submit IO requests from backends to worker processes. But even though the idea is similar, it's a completely separate implementation and there's nothing else shared between them than the very high level design.	2025-07-10 14:52:16 +03:00
Heikki Linnakangas	b67e8f2edc	Move some code, just for more natural logical ordering	2025-07-10 14:49:29 +03:00
Heikki Linnakangas	b5b1db29bb	Implement shard map live-update	2025-07-10 12:25:15 +03:00
Dimitri Fontaine	1a45b2ec90	Review security model for executing Event Trigger code. (#12463 ) When a function is owned by a superuser (bootstrap user or otherwise), we consider it safe to run it. Only a superuser could have installed it, typically from CREATE EXTENSION script: we trust the code to execute. ## Problem This is intended to solve running pg_graphql Event Triggers graphql_watch_ddl and graphql_watch_drop which are executing the secdef function graphql.increment_schema_version(). ## Summary of changes Allow executing Event Trigger function owned by a superuser and with SECURITY DEFINER properties. The Event Trigger code runs with superuser privileges, and we consider that it's fine. --------- Co-authored-by: Tristan Partin <tristan.partin@databricks.com>	2025-07-10 08:06:33 +00:00
Heikki Linnakangas	ed4652b65b	Update the relsize cache rather than forget it at end of index build This greatly reduces the cases where we make a request to the pageserver with a very recent LSN. Those cases are slow because the pageserver needs to wait for the WAL to arrive. This speeds up the Postgres pg_regress and isolation tests greatly.	2025-07-09 17:21:06 +03:00
Heikki Linnakangas	60d87966b8	minor comment improvement	2025-07-09 16:39:40 +03:00
Heikki Linnakangas	8db138ef64	Plumb through the stripe size to the communicator	2025-07-09 16:18:26 +03:00
Heikki Linnakangas	1ee24602d5	Implement working set size estimation	2025-07-09 16:18:26 +03:00
Heikki Linnakangas	732bd26e70	cargo fmt	2025-07-09 16:18:26 +03:00
Konstantin Knizhnik	4ee0da0a20	Check prefetch response before assignment to slot (#12371 ) ## Problem See [Slack Channel](https://databricks.enterprise.slack.com/archives/C091LHU6NNB) Dropping a connection without resetting the prefetch state can cause a request/response mismatch, and the lack of a response correctness check in communicator_prefetch_lookupv can cause data corruption. ## Summary of changes 1. Validate response before assignment to prefetch slot. 2. Consume prefetch requests before sending any other requests. --------- Co-authored-by: Kosntantin Knizhnik <konstantin.knizhnik@databricks.com> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-07-09 12:49:21 +00:00
Heikki Linnakangas	d63f1d259a	avoid assertion failure about calling palloc() in critical section	2025-07-08 21:33:25 +03:00
Heikki Linnakangas	4053092408	Fix LSN tracking on "unlogged index builds" Fixes the test_gin_redo.py test failure, and probably some others	2025-07-08 17:22:24 +03:00
Heikki Linnakangas	ccf88e9375	Improve debug logging by printing IO request details	2025-07-08 17:16:09 +03:00
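
Commit c14cf15b52 argues that a store with Release ordering already makes all earlier non-atomic writes visible to a thread that performs an Acquire load of the same value, so no separate fence is needed. A minimal sketch of that pairing follows. The `Slot` type, its fields, and `publish_and_consume` are hypothetical names made up for illustration and are not taken from the Neon request slot code.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical slot: a plain (non-atomic) payload guarded by an atomic flag.
struct Slot {
    payload: UnsafeCell<u64>,
    ready: AtomicBool,
}

// SAFETY: the payload is written only before the Release store of `ready`
// and read only after an Acquire load has observed `ready == true`.
unsafe impl Sync for Slot {}

/// Writes `value` in a producer thread and reads it back in the caller,
/// synchronizing only through the Release store / Acquire load pair.
fn publish_and_consume(value: u64) -> u64 {
    let slot = Arc::new(Slot {
        payload: UnsafeCell::new(0),
        ready: AtomicBool::new(false),
    });

    let producer = {
        let slot = Arc::clone(&slot);
        thread::spawn(move || {
            // Non-atomic write to the payload.
            unsafe { *slot.payload.get() = value };
            // Release store: orders the payload write before this store.
            // No explicit fence is used.
            slot.ready.store(true, Ordering::Release);
        })
    };

    // Acquire load: pairs with the Release store above. Once it returns
    // true, the payload write is guaranteed to be visible here.
    while !slot.ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    let seen = unsafe { *slot.payload.get() };

    producer.join().unwrap();
    seen
}
```

Calling `publish_and_consume(42)` returns 42. Weakening either side to `Ordering::Relaxed` would remove the happens-before edge and turn the payload read into a data race.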

514 Commits
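
Commit 9e154a8130 describes the max WAL rate fix as letting each backend check the time and reset the limit itself, instead of waiting for the wal proposer to do it. The sketch below shows one way such a self-resetting budget could look. It assumes a simple fixed-window scheme; `WalRateLimiter`, its fields, and `try_consume` are hypothetical and do not reflect the actual Postgres-side implementation.

```rust
use std::time::{Duration, Instant};

/// Hypothetical fixed-window byte budget (illustrative only).
struct WalRateLimiter {
    max_bytes_per_window: u64,
    window: Duration,
    window_start: Instant,
    used: u64,
}

impl WalRateLimiter {
    fn new(max_bytes_per_window: u64, window: Duration, now: Instant) -> Self {
        WalRateLimiter {
            max_bytes_per_window,
            window,
            window_start: now,
            used: 0,
        }
    }

    /// Returns true if `bytes` fit into the current window's budget.
    /// Every caller checks the clock and resets an expired window itself,
    /// so a throttled writer does not depend on another process to reset it.
    fn try_consume(&mut self, bytes: u64, now: Instant) -> bool {
        if now.duration_since(self.window_start) >= self.window {
            self.window_start = now;
            self.used = 0;
        }
        let wanted = self.used.saturating_add(bytes);
        if wanted > self.max_bytes_per_window {
            return false; // over budget: the caller would have to wait
        }
        self.used = wanted;
        true
    }
}
```

The clock is passed in as a parameter so the window logic can be exercised deterministically. With a budget of 100 bytes per second, two 60-byte writes at the same instant yield `true` then `false`, and a third 60-byte write one second later succeeds because the caller itself resets the window.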