Commit Graph

8620 Commits

Author SHA1 Message Date
Erik Grinaker
b29a63a3d2 pageserver: route gRPC requests to child shards 2025-07-23 16:35:19 +02:00
Erik Grinaker
6c8a144e25 Pass stripe size during shard map updates 2025-07-23 16:35:19 +02:00
Folke Behrens
c7761b689d otel: Use blocking reqwest in dedicated thread
OTel 0.28+ by default uses blocking operations in a dedicated thread.
2025-07-23 14:44:26 +02:00
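For context, a rough standard-library illustration of the pattern mentioned in the commit above (hypothetical code, not the opentelemetry-otlp API): telemetry items are queued from the rest of the program and exported by blocking calls on a dedicated thread, so the export path needs no async runtime.

```rust
// Sketch only: a dedicated exporter thread draining a channel with blocking I/O.
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<String>();

    // Dedicated exporter thread: blocking operations are fine here.
    let exporter = thread::spawn(move || {
        for span in rx {
            // A real exporter would POST the batch with a blocking HTTP client.
            println!("exporting span: {span}");
        }
    });

    for i in 0..3 {
        tx.send(format!("span-{i}")).expect("exporter thread alive");
    }
    drop(tx); // closing the channel lets the exporter thread finish
    exporter.join().expect("exporter thread panicked");
}
```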
Erik Grinaker
eaec6e2fb4 Fix notify_local shard count 2025-07-23 11:16:35 +02:00
Heikki Linnakangas
f7e403eea1 Fix broken link in doc comment 2025-07-23 11:37:27 +03:00
Erik Grinaker
464ed0cbc7 rustfmt 2025-07-23 09:41:01 +02:00
Erik Grinaker
f55ccd2c17 Fix lints 2025-07-23 08:17:06 +02:00
Erik Grinaker
c9758dc46b Fix communicator build 2025-07-23 08:06:20 +02:00
Erik Grinaker
78c5d70b4c cargo hakari generate 2025-07-23 07:58:20 +02:00
Heikki Linnakangas
fc35be0397 Remove the half-baked Adaptive Radix Tree implementation
We are committed to using the resizeable hash table for now. ART is a
great data structure, but it's more than we need at this point. Maybe later.
2025-07-23 01:49:56 +03:00
Heikki Linnakangas
a7a6df3d6f fix datatype used in test mock function 2025-07-23 01:44:45 +03:00
Heikki Linnakangas
bfb4b0991d Refactor the way lfc_get_stats() is implemented
This reduces the boilerplate a little, and makes it more
straightforward to dispatch the call to either the old or the new communicator.
2025-07-23 01:40:42 +03:00
Heikki Linnakangas
c18f4a52f8 refactor metrics to use 'measured' crate 2025-07-23 00:56:21 +03:00
Heikki Linnakangas
48535798ba Merge remote-tracking branch 'origin/main' into communicator-rewrite 2025-07-23 00:00:10 +03:00
Heikki Linnakangas
51ffeef93f Fix postgres version compatibility macros (#12658)
The argument to BufTagInit was called 'spcOid', and the macro also set
a field called 'spcOid'. The field name would erroneously be expanded
with the macro argument as well. It has happened to work so far because
all users of the macro pass a variable called 'spcOid' for the 'spcOid'
argument, but as soon as you try to pass anything else, it fails. The
same goes for 'dbOid' and 'relNumber'. Rename the arguments to avoid
the name collision.

Also while we're at it, add parens around the arguments in a few macros,
to make them safer if you pass something non-trivial as the argument.
2025-07-22 16:52:57 +00:00
Erik Grinaker
0fe07dec32 test_runner: allow stuck reconciliation errors (#12682)
This log message was added in #12589.

During chaos tests, reconciles may not succeed for some time, triggering
the log message.

Resolves [LKB-2467](https://databricks.atlassian.net/browse/LKB-2467).
2025-07-22 16:43:35 +00:00
HaoyuHuang
8de320ab9b Add a few compute_tool changes (#12677)
## Summary of changes
All changes are no-ops.
2025-07-22 16:22:18 +00:00
Folke Behrens
108f7ec544 Bump opentelemetry crates to 0.30 (#12680)
This rebuilds #11552 on top of the current Cargo.lock.

---------

Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
2025-07-22 16:05:35 +00:00
Tristan Partin
63d2b1844d Fix final pyright issues with neon_api.py (#8476)
Fix final pyright issues with neon_api.py

Signed-off-by: Tristan Partin <tristan.partin@databricks.com>
2025-07-22 16:04:52 +00:00
Dmitrii Kovalkov
133f16e9b5 storcon: finish safekeeper migration gracefully (#12528)
## Problem
We don't detect if safekeeper migration fails after committing the
membership configuration to the database. As a result, we might leave
stale timelines on excluded safekeepers and fail to notify
cplane/safekeepers about the new configuration.

- Implements solution proposed in
https://github.com/neondatabase/neon/pull/12432
- Closes: https://github.com/neondatabase/neon/issues/12192
- Closes: [LKB-944](https://databricks.atlassian.net/browse/LKB-944)

## Summary of changes
- Add a `sk_set_notified_generation` column to the `timelines` database table.
- Update `*_notified_generation` in the database during the finish state.
- Commit reconciliation requests to the database atomically with the
membership configuration.
- Reload pending ops and retry the "finish" step if we detect a
`*_notified_generation` mismatch.
- Add failpoints and a test that we handle failures well.
2025-07-22 14:58:20 +00:00
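A minimal sketch of the retry condition described above, with hypothetical types (the actual storcon schema and code differ): the generation we last notified cplane/safekeepers about is persisted next to the committed membership configuration, and the "finish" step is re-run whenever it lags behind.

```rust
// Sketch: detect an interrupted "finish" step by comparing persisted generations.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Generation(u32);

struct TimelineRow {
    committed_generation: Generation,
    // Mirrors the new `sk_set_notified_generation` column.
    sk_set_notified_generation: Generation,
}

fn needs_finish_retry(row: &TimelineRow) -> bool {
    // If the notified generation is behind what we committed, the finish step
    // (notifying cplane/safekeepers, cleaning up excluded safekeepers) may have
    // been interrupted and must be re-run.
    row.sk_set_notified_generation < row.committed_generation
}

fn main() {
    let row = TimelineRow {
        committed_generation: Generation(5),
        sk_set_notified_generation: Generation(4),
    };
    assert!(needs_finish_retry(&row));
    println!("finish step must be retried: {}", needs_finish_retry(&row));
}
```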
Alex Chi Z.
88391ce069 feat(pageserver): create image layers at L0-L1 boundary by default (#12669)
## Problem

Post LKB-198 rollout. We added a new strategy to generate image layers
at the L0-L1 boundary instead of at the latest LSN, to ensure that too
many L0 layers do not trigger image layer creation.

## Summary of changes

We already rolled it out to all users so we can remove the feature flag
now.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-07-22 14:29:26 +00:00
Heikki Linnakangas
8bb45fd5da Introduce built-in Prometheus exporter to the Postgres extension (#12591)
Currently, the exporter exposes the same LFC metrics that are exposed by
the "autoscaling" sql_exporter in the docker image. With this, we can
remove the dedicated sql_exporter instance. (Actually doing the removal
is left as a TODO until this is rolled out to production and we have
changed autoscaling-agent to fetch the metrics from this new endpoint.)

The exporter runs as a Postgres background worker process. This is
extracted from the Rust communicator rewrite project, which will use the
same worker process for much more, to handle the communications with the
pageservers. For now, though, it merely handles the metrics requests.

In the future, we will add more metrics, and perhaps even APIs to
control the running Postgres instance.

The exporter listens on a Unix Domain socket within the Postgres data
directory. A Unix Domain socket is a bit unconventional, but it has some
advantages:

- Permissions are taken care of. Only processes that can access the data
directory, and therefore already have full access to the running
Postgres instance, can connect to it.

- No need to allocate and manage a new port number for the listener.

It has some downsides too: it's not immediately accessible from the
outside world, and the functions for working with Unix Domain sockets are
more low-level than those for TCP sockets (see the symlink hack in
`postgres_metrics_client.rs`, for example).

To expose the metrics from the local Unix Domain Socket to the
autoscaling agent, introduce a new '/autoscaling_metrics' endpoint in
the compute_ctl's HTTP server. Currently it merely forwards the request
to the Postgres instance, but we could add rate limiting and access
control there in the future.

---------

Co-authored-by: Conrad Ludgate <conrad@neon.tech>
2025-07-22 12:00:20 +00:00
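A minimal sketch of serving Prometheus text metrics over a Unix Domain socket, using only the Rust standard library; the socket path and metric name are made up, and the real exporter runs inside a Postgres background worker rather than a standalone binary.

```rust
// Sketch: answer HTTP GETs on a Unix Domain socket with a Prometheus text payload.
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixListener;

fn main() -> std::io::Result<()> {
    // Hypothetical socket location; the real one lives inside the data directory.
    let socket_path = "/tmp/neon-metrics-demo.socket";
    let _ = std::fs::remove_file(socket_path); // bind() fails if the path already exists
    let listener = UnixListener::bind(socket_path)?;

    for stream in listener.incoming() {
        let mut stream = stream?;
        // Read and discard the request line and headers; a real server would parse them.
        {
            let mut reader = BufReader::new(&stream);
            let mut line = String::new();
            loop {
                line.clear();
                if reader.read_line(&mut line)? == 0 || line.trim().is_empty() {
                    break;
                }
            }
        }
        // Only processes with access to the socket's directory can connect,
        // so filesystem permissions double as access control.
        let body = "# TYPE lfc_hits counter\nlfc_hits 0\n";
        let response = format!(
            "HTTP/1.1 200 OK\r\nContent-Type: text/plain; version=0.0.4\r\nContent-Length: {}\r\n\r\n{}",
            body.len(),
            body
        );
        stream.write_all(response.as_bytes())?;
    }
    Ok(())
}
```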
Vlad Lazar
88bc06f148 communicator: debug log more fields of the get page response (#12644)
It's helpful to correlate requests and responses in local investigations
where the issue is reproducible. Hence, log the rel, fork and block of
the get page response.
2025-07-22 11:25:11 +00:00
Vlad Lazar
d91d018afa storcon: handle pageserver disk loss (#12667)
NB: effectively a no-op in the neon env since the handling is config-gated in storcon.

## Problem

When a pageserver suffers from a local disk/node failure and restarts,
the storage controller will receive a re-attach call and return all the
tenants the pageserver is supposed to attach, but the pageserver will not
act on any tenants that it doesn't know about locally. As a result, the
pageserver will not rehydrate any tenants from remote storage if it
restarted following a local disk loss, while the storage controller
still thinks that the pageserver has all the tenants attached. This
leaves the system in a bad state, and the symptom is that PG's
pageserver connections will fail with "tenant not found" errors.

## Summary of changes

Made a slight change to the storage controller's `re_attach` API:
* The pageserver will set an additional bit `empty_local_disk` in the
reattach request, indicating whether it has started with an empty disk
or does not know about any tenants.
* Upon receiving the reattach request, if this `empty_local_disk` bit is
set, the storage controller will go ahead and clear all observed
locations referencing the pageserver. The reconciler will then discover
the discrepancy between the intended state and observed state of the
tenant and take care of the situation.

To facilitate rollouts, this extra behavior in the `re_attach` API is
guarded by the `handle_ps_local_disk_loss` command line flag of the
storage controller.

---------

Co-authored-by: William Huang <william.huang@databricks.com>
2025-07-22 11:04:03 +00:00
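A minimal sketch of the re-attach handling described above, with hypothetical types rather than the storcon implementation: if the pageserver reports an empty local disk, drop every observed location that references it so the reconciler re-creates the attachments from the intent state.

```rust
// Sketch: clear observed locations for a pageserver that lost its local disk.
use std::collections::HashMap;

type NodeId = u64;
type TenantShardId = String;

struct ReAttachRequest {
    node_id: NodeId,
    // New field: true if the pageserver started with no local tenants on disk.
    empty_local_disk: bool,
}

struct Storcon {
    // Observed location of each tenant shard: which node we believe has it attached.
    observed: HashMap<TenantShardId, NodeId>,
    handle_ps_local_disk_loss: bool, // command line flag gating the behavior
}

impl Storcon {
    fn re_attach(&mut self, req: &ReAttachRequest) {
        if self.handle_ps_local_disk_loss && req.empty_local_disk {
            // Clear observed locations on the restarted node; the reconciler will
            // notice the intent/observed discrepancy and re-attach the tenants.
            self.observed.retain(|_, node| *node != req.node_id);
        }
        // ... normal re-attach response construction would follow here ...
    }
}

fn main() {
    let mut storcon = Storcon {
        observed: HashMap::from([("tenant-a".to_string(), 1), ("tenant-b".to_string(), 2)]),
        handle_ps_local_disk_loss: true,
    };
    storcon.re_attach(&ReAttachRequest { node_id: 1, empty_local_disk: true });
    assert!(!storcon.observed.contains_key("tenant-a"));
    assert!(storcon.observed.contains_key("tenant-b"));
}
```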
Folke Behrens
9c0efba91e Bump rand crate to 0.9 (#12674) 2025-07-22 09:31:39 +00:00
Konstantin Knizhnik
5464552020 Limit number of parallel config apply connections to 100 (#12663)
## Problem

See https://databricks.slack.com/archives/C092W8NBXC0/p1752924508578339

With a large number of databases and a large `max_connections`, we can
open too many connections for parallel config apply, which may cause a `Too
many open files` error.

## Summary of changes

Limit the maximum number of parallel config apply connections to 100.

---------

Co-authored-by: Kosntantin Knizhnik <konstantin.knizhnik@databricks.com>
2025-07-22 04:39:54 +00:00
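A sketch of the general idea (hypothetical code, not compute_ctl itself): cap the number of concurrent per-database config-apply connections with a semaphore so that a large database count cannot exhaust file descriptors.

```rust
// Sketch: bound concurrent config-apply work with a tokio semaphore.
use std::sync::Arc;
use tokio::sync::Semaphore;

const MAX_PARALLEL_APPLY_CONNECTIONS: usize = 100;

async fn apply_config_to_db(db: String) {
    // Placeholder for opening a connection and applying configuration.
    println!("applying config to {db}");
}

#[tokio::main]
async fn main() {
    let databases: Vec<String> = (0..1000).map(|i| format!("db{i}")).collect();
    let limiter = Arc::new(Semaphore::new(MAX_PARALLEL_APPLY_CONNECTIONS));

    let mut tasks = Vec::new();
    for db in databases {
        let limiter = Arc::clone(&limiter);
        tasks.push(tokio::spawn(async move {
            // Holding the permit bounds how many connections are open at once.
            let _permit = limiter.acquire_owned().await.expect("semaphore closed");
            apply_config_to_db(db).await;
        }));
    }
    for task in tasks {
        task.await.expect("task panicked");
    }
}
```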
Arpad Müller
80baeaa084 storcon: add force_upsert flag to timeline_import endpoint (#12622)
It is useful to have ability to update an existing timeline entry, as a
way to mirror legacy migrations to the storcon managed table.
2025-07-21 21:14:15 +00:00
Tristan Partin
b7bc3ce61e Skip PG throttle during configuration (#12670)
## Problem

While running tenant split tests I ran into a situation where PG got
stuck completely. This seems to be a general problem that was not found
in the previous chaos testing fixes.

What happened is that if PG gets throttled by PS, and SC decides to move
some tenant away, then PG reconfiguration could be blocked forever
because it cannot talk to the old PS anymore to refresh the throttling
stats, and reconfiguration cannot proceed while it's being throttled.
Neon has considered the case where configuration could be blocked if the
PG storage is full, but forgot the backpressure case.

## Summary of changes
The PR fixes this problem by simply skipping throttling while PG is
being configured, i.e., while `max_cluster_size < 0`. An alternative fix is to
set those throttle knobs to -1 (e.g., max_replication_apply_lag);
however, these knobs are labeled PGC_POSTMASTER, so their values
cannot be changed unless we restart PG.

## How is this tested?
Tested manually.

Co-authored-by: Chen Luo <chen.luo@databricks.com>
2025-07-21 20:50:02 +00:00
Ivan Efremov
050c9f704f proxy: expose session_id to clients and proxy latency to probes (#12656)
Implements #8728
2025-07-21 20:27:15 +00:00
Ruslan Talpa
0dbe551802 proxy: subzero integration in auth-broker (embedded data-api) (#12474)
## Problem
We want to have the data-api served by the proxy directly instead of
relying on a 3rd party to run a deployment for each project/endpoint.

## Summary of changes
With the changes below, the proxy (auth-broker) also becomes a
"rest-broker", which can be thought of as a multi-tenant data-api that
provides an automated REST API for all the databases in the region.

The core of the implementation (that leverages the subzero library) is
in proxy/src/serverless/rest.rs and this is the only place that has "new
logic".

---------

Co-authored-by: Ruslan Talpa <ruslan.talpa@databricks.com>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Conrad Ludgate <conrad@neon.tech>
2025-07-21 18:16:28 +00:00
Tristan Partin
187170be47 Add max_wal_rate test (#12621)
## Problem
Add a test for max_wal_rate

## Summary of changes
Test max_wal_rate

## How is this tested?
python test

Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com>
2025-07-21 17:58:03 +00:00
Vlad Lazar
30e1213141 pageserver: check env var for ip address before node registration (#12666)
Include the ip address (optionally read from an env var) in the
pageserver's registration request. Note that the ip address is ignored
by the storage controller at the moment, which makes it a no-op in the
neon env.
2025-07-21 15:32:28 +00:00
Vlad Lazar
25efbcc7f0 safekeeper: parallelise segment copy (#12664)
Parallelise segment copying on the SK. I'm not aware of the neon
deployment using this endpoint.
2025-07-21 14:47:58 +00:00
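A sketch of the general pattern (illustrative only, not the safekeeper's implementation): copy WAL segments concurrently with bounded parallelism instead of one at a time.

```rust
// Sketch: copy segments with up to 8 in flight at once.
use futures::stream::{self, StreamExt, TryStreamExt};

async fn copy_segment(name: String) -> Result<(), std::io::Error> {
    // Placeholder for downloading/uploading one segment.
    println!("copied {name}");
    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), std::io::Error> {
    let segments: Vec<String> = (0..16).map(|i| format!("{:024X}", i)).collect();
    stream::iter(segments)
        .map(copy_segment)
        .buffer_unordered(8) // bound the parallelism
        .try_collect::<Vec<_>>()
        .await?;
    Ok(())
}
```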
Conrad Ludgate
b2ecb10f91 [proxy] rework handling of notices in sql-over-http (#12659)
A replacement for #10254 which allows us to introduce notice messages
for sql-over-http in the future if we want to. This also removes the
`ParameterStatus` and `Notification` handling as there's nothing we
could/should do for those.
2025-07-21 12:50:13 +00:00
Erik Grinaker
5a48365fb9 pageserver/client_grpc: don't set stripe size for unsharded tenants (#12639)
## Problem

We've had bugs where the compute would use the stale default stripe size
from an unsharded tenant after the tenant was split with a new stripe size.

## Summary of changes

Never specify a stripe size for unsharded tenants, to guard against
misuse. Only specify it once tenants are sharded and the stripe size
can't change.

Also opportunistically changes `GetPageSplitter` to return
`anyhow::Result`, since we'll be using this in other code paths as well
(specifically during server-side shard splits).
2025-07-21 12:28:39 +00:00
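A simplified sketch of the invariant this change enforces (not the real `GetPageSplitter`): the stripe size is only meaningful for sharded tenants, so it is carried as an `Option` and its absence on a sharded tenant is an error rather than a silent default. The placement formula below is deliberately simplified; the real mapping also hashes the relation identity.

```rust
// Sketch: never fall back to a default stripe size for sharded tenants.
use anyhow::{bail, Result};

#[derive(Clone, Copy)]
struct ShardStripeSize(u32);

fn shard_for_block(
    shard_count: u8,
    stripe_size: Option<ShardStripeSize>,
    block_number: u32,
) -> Result<u8> {
    if shard_count <= 1 {
        // Unsharded tenant: every page lives on shard 0, no stripe size needed.
        return Ok(0);
    }
    let Some(stripe_size) = stripe_size else {
        // A default stripe size may be stale after a split, so refuse instead.
        bail!("sharded tenant requires an explicit stripe size");
    };
    Ok(((block_number / stripe_size.0) % shard_count as u32) as u8)
}

fn main() -> Result<()> {
    assert_eq!(shard_for_block(1, None, 12345)?, 0);
    assert_eq!(shard_for_block(4, Some(ShardStripeSize(2048)), 4096)?, 2);
    assert!(shard_for_block(4, None, 4096).is_err());
    Ok(())
}
```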
Erik Grinaker
194b9ffc41 pageserver: remove gRPC CheckRelExists (#12616)
## Problem

Postgres will often immediately follow a relation existence check with a
relation size query. This incurs two roundtrips, and may prevent
effective caching.

See [Slack
thread](https://databricks.slack.com/archives/C091SDX74SC/p1751951732136139).

Touches #11728.

## Summary of changes

For the gRPC API:

* Add an `allow_missing` parameter to `GetRelSize`, which returns
`missing=true` instead of a `NotFound` error.
* Remove `CheckRelExists`.

There are no changes to libpq behavior.
2025-07-21 11:43:26 +00:00
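A sketch of the client-side pattern this enables, with hypothetical request/response shapes rather than the actual proto definitions: one `GetRelSize` round trip with `allow_missing=true` replaces a `CheckRelExists` plus `GetRelSize` pair.

```rust
// Sketch: existence check and size query collapsed into a single request.
struct GetRelSizeRequest {
    rel: u32, // stand-in for the full relation tag
    allow_missing: bool,
}

struct GetRelSizeResponse {
    missing: bool,
    num_blocks: u32,
}

fn get_rel_size(req: GetRelSizeRequest) -> GetRelSizeResponse {
    // Stub server: pretend only relation 1663 exists and has 42 blocks.
    if req.rel == 1663 {
        GetRelSizeResponse { missing: false, num_blocks: 42 }
    } else if req.allow_missing {
        GetRelSizeResponse { missing: true, num_blocks: 0 }
    } else {
        panic!("NotFound"); // stands in for a gRPC NotFound status
    }
}

fn main() {
    let resp = get_rel_size(GetRelSizeRequest { rel: 9999, allow_missing: true });
    if resp.missing {
        println!("relation does not exist");
    } else {
        println!("relation has {} blocks", resp.num_blocks);
    }
}
```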
Dimitri Fontaine
1e30b31fa7 Cherry pick: pg hooks for online table. (#12654)
## Problem

## Summary of changes
2025-07-21 11:10:10 +00:00
Erik Grinaker
e181b996c3 utils: move ShardStripeSize into shard module (#12640)
## Problem

`ShardStripeSize` will be used in the compute spec and internally in the
communicator. It shouldn't require pulling in all of `pageserver_api`.

## Summary of changes

Move `ShardStripeSize` into `utils::shard`, along with other basic shard
types. Also remove the `Default` implementation, to discourage clients
from falling back to a default (it's generally a footgun).

The type is still re-exported from `pageserver_api::shard`, along with
all the other shard types.
2025-07-21 10:56:20 +00:00
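A sketch of the motivation for dropping the `Default` impl, with illustrative names rather than the exact `utils::shard` API: without a `Default`, callers must spell the stripe size out instead of silently inheriting a possibly stale value.

```rust
// Sketch: a stripe-size newtype with no Default implementation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ShardStripeSize(pub u32);

fn plan_split(stripe_size: ShardStripeSize) {
    println!("splitting with stripe size {} blocks", stripe_size.0);
}

fn main() {
    // The caller must state the value explicitly:
    plan_split(ShardStripeSize(2048));
    // `plan_split(Default::default())` no longer compiles, which is the point.
}
```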
Erik Grinaker
1406bdc6a8 pageserver: improve gRPC cancellation (#12635)
## Problem

The gRPC page service does not properly react to shutdown cancellation.
In particular, Tonic considers an open GetPage stream to be an in-flight
request, so it will wait for it to complete before shutting down.

Touches [LKB-191](https://databricks.atlassian.net/browse/LKB-191).

## Summary of changes

Properly react to the server's cancellation token and take out gate
guards in gRPC request handlers.

Also document cancellation handling. In particular, that Tonic will drop
futures when clients go away (e.g. on timeout or shutdown), so the read
path must be cancellation-safe. It is believed to be (modulo possible
logging noise), but this will be verified later.
2025-07-21 10:52:18 +00:00
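A sketch of the shutdown pattern described above (simplified; the real handlers also take out gate guards): each handler races its work against the server's cancellation token, so an open GetPage stream cannot hold up shutdown forever.

```rust
// Sketch: a streaming handler that stops when the server's token is cancelled.
use std::time::Duration;
use tokio_util::sync::CancellationToken;

async fn serve_getpage_stream(cancel: CancellationToken) -> Result<(), &'static str> {
    loop {
        tokio::select! {
            // Shutdown requested: stop serving instead of waiting for the client.
            _ = cancel.cancelled() => return Err("shutting down"),
            // Placeholder for receiving and answering one GetPage request.
            _ = tokio::time::sleep(Duration::from_millis(100)) => {
                println!("served one request");
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let cancel = CancellationToken::new();
    let handler = tokio::spawn(serve_getpage_stream(cancel.child_token()));

    tokio::time::sleep(Duration::from_millis(350)).await;
    cancel.cancel(); // begin shutdown

    let result = handler.await.expect("handler panicked");
    assert_eq!(result, Err("shutting down"));
}
```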
Heikki Linnakangas
dc35bda074 WIP: Implement LFC prewarming
This doesn't pass the tests yet; the immediate issue is that we're missing
some stats that the tests depend on. And there's a lot more cleanup,
commenting etc. to do. But this is roughly what it should look like.
2025-07-20 01:23:34 +03:00
Heikki Linnakangas
e2c3c2eccb Merge remote-tracking branch 'origin/main' into HEAD 2025-07-20 00:58:57 +03:00
Paul Banks
791b5d736b Fixes #10441: control_plane README incorrect neon init args (#12646)
## Problem

As reported in #10441, the `control_plane/README.md` incorrectly
specified that `--pg-version` should be passed to the `cargo neon
init` command. This is not the case and causes an invalid argument
error.

## Summary of changes

Fix the README

## Test Plan

I verified that the steps in the README now work locally. I connected to
the started postgres endpoint and executed some basic metadata queries.
2025-07-18 17:09:20 +00:00
Krzysztof Szafrański
96bcfba79e [proxy] Cache GetEndpointAccessControl errors (#12571)
Related to https://github.com/neondatabase/cloud/issues/19353
2025-07-18 10:17:58 +00:00
Shockingly Good
8e95455aef Update the postgres submodules (#12636)
Synchronises the main branch's postgres submodules with the
`neondatabase/postgres` repository state.
2025-07-18 08:21:22 +00:00
Victor Polevoy
cb50291dcd Fetches the SLRU segment via the new communicator.
The fetch is no longer done into a buffer, as before, but directly into
the file.
2025-07-18 10:02:31 +02:00
Alex Chi Z.
f3ef60d236 fix(storcon): use unified interface to handle 404 lsn lease (#12650)
## Problem

Close LKB-270. This is part of our series of efforts to make sure the
lsn_lease API prompts clients to retry. Follow-up to
https://github.com/neondatabase/neon/pull/12631.

Slack thread w/ Vlad:
https://databricks.slack.com/archives/C09254R641L/p1752677940697529

## Summary of changes

- Use the `tenant_remote_mutation` API for LSN leases. This makes it
consistent with new APIs added to storcon.
- For 404, we now always retry because we know the tenant is
to-be-attached and will eventually reach a point where we can find that
tenant on the intent pageserver.
- Using the `tenant_remote_mutation` API also protects us from the case
where the intent pageserver changes within the lease request. The
wrapper function will error with 503 if that happens.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2025-07-18 04:40:35 +00:00
HaoyuHuang
8f627ea0ab A few more SC changes (#12649)
## Problem

## Summary of changes
2025-07-17 23:17:01 +00:00
Arpad Müller
6a353c33e3 print more timestamps in find_lsn_for_timestamp (#12641)
Observability of `find_lsn_for_timestamp` is lacking, as well as how and
when we update gc space and time cutoffs. Log them.
2025-07-17 22:13:21 +00:00
Folke Behrens
64d0008389 proxy: Shorten the initial TTL of cancel keys (#12647)
## Problem

A high rate of short-lived connections means that there a lot of cancel
keys in Redis with TTL=10min that could be avoided by having a much
shorter initial TTL.

## Summary of changes

* Introduce an initial TTL of 1min used with the SET command.
* Fix: don't delay repushing cancel data when expired.
* Prepare for exponentially increasing TTLs.

## Alternatives

A best-effort UNLINK command on connection termination would clean up
cancel keys right away. This needs a bigger refactor due to how batching
is handled.
2025-07-17 21:52:20 +00:00
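A sketch of one possible TTL schedule along these lines (illustrative only; the commit only prepares for exponential growth, and the exact schedule here is an assumption): cancel keys start with a 1-minute TTL and the TTL doubles on each repush, capped at the old 10-minute value.

```rust
// Sketch: exponentially growing cancel-key TTLs with a cap.
use std::time::Duration;

const INITIAL_TTL: Duration = Duration::from_secs(60);
const MAX_TTL: Duration = Duration::from_secs(600);

/// TTL to use for the n-th push of a cancel key (n = 0 for the initial SET).
fn cancel_key_ttl(push_count: u32) -> Duration {
    let ttl = INITIAL_TTL.saturating_mul(2u32.saturating_pow(push_count));
    ttl.min(MAX_TTL)
}

fn main() {
    assert_eq!(cancel_key_ttl(0), Duration::from_secs(60));
    assert_eq!(cancel_key_ttl(1), Duration::from_secs(120));
    assert_eq!(cancel_key_ttl(4), Duration::from_secs(600)); // capped
    for n in 0..5 {
        println!("push {n}: TTL {:?}", cancel_key_ttl(n));
    }
}
```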
Alexey Kondratov
53a05e8ccb fix(compute_ctl): Only offload LFC state if no prewarming is in progress (#12645)
## Problem

We currently offload LFC state unconditionally, which can cause
problems. Imagine a situation:
1. Endpoint started with `autoprewarm: true`.
2. While prewarming is not completed, we upload the new incomplete
state.
3. Compute gets interrupted and restarts.
4. We start again and try to prewarm with the state from step 2 instead of
the previous complete state.

During the orchestrated prewarming, it's probably not a big issue, but
it's still better not to interfere with the prewarm process.

## Summary of changes

Do not offload LFC state if we are currently prewarming or if any issue
occurred. While at it, also introduce a `Skipped` LFC prewarm status,
which is used when the corresponding LFC state is not present in the
endpoint storage. It's primarily needed to distinguish the first compute
start for a particular endpoint, as it's completely valid to not have an
LFC state yet.
2025-07-17 21:43:43 +00:00
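A sketch of the guard described above, with illustrative names rather than compute_ctl's actual types: offloading LFC state is skipped while prewarming is in progress or after it failed. Whether offload proceeds after a `Skipped` prewarm is an assumption made for this sketch.

```rust
// Sketch: decide whether it is safe to offload the LFC state.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LfcPrewarmStatus {
    NotRequested,
    InProgress,
    Completed,
    Failed,
    /// No LFC state was found in endpoint storage (e.g. the endpoint's first start).
    Skipped,
}

fn should_offload_lfc_state(status: LfcPrewarmStatus) -> bool {
    match status {
        // Don't overwrite a good saved state with an incomplete one, and don't
        // offload anything if prewarming ran into a problem.
        LfcPrewarmStatus::InProgress | LfcPrewarmStatus::Failed => false,
        LfcPrewarmStatus::NotRequested
        | LfcPrewarmStatus::Completed
        | LfcPrewarmStatus::Skipped => true,
    }
}

fn main() {
    assert!(!should_offload_lfc_state(LfcPrewarmStatus::InProgress));
    assert!(should_offload_lfc_state(LfcPrewarmStatus::Completed));
    assert!(should_offload_lfc_state(LfcPrewarmStatus::Skipped));
}
```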