rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-04 03:52:56 +00:00

Author	SHA1	Message	Date
Erik Grinaker	0fe07dec32	test_runner: allow stuck reconciliation errors (#12682 ) This log message was added in #12589. During chaos tests, reconciles may not succeed for some time, triggering the log message. Resolves [LKB-2467](https://databricks.atlassian.net/browse/LKB-2467).	2025-07-22 16:43:35 +00:00
HaoyuHuang	8de320ab9b	Add a few compute_tool changes (#12677 ) ## Summary of changes All changes are no-op.	2025-07-22 16:22:18 +00:00
Folke Behrens	108f7ec544	Bump opentelemetry crates to 0.30 (#12680 ) This rebuilds #11552 on top of the current Cargo.lock. --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2025-07-22 16:05:35 +00:00
Tristan Partin	63d2b1844d	Fix final pyright issues with neon_api.py (#8476 ) Fix final pyright issues with neon_api.py Signed-off-by: Tristan Partin <tristan.partin@databricks.com>	2025-07-22 16:04:52 +00:00
Dmitrii Kovalkov	133f16e9b5	storcon: finish safekeeper migration gracefully (#12528 ) ## Problem We don't detect if safekeeper migration fails after committing the membership configuration to the database. As a result, we might leave stale timelines on excluded safekeepers and fail to notify cplane/safekeepers about the new configuration. - Implements solution proposed in https://github.com/neondatabase/neon/pull/12432 - Closes: https://github.com/neondatabase/neon/issues/12192 - Closes: [LKB-944](https://databricks.atlassian.net/browse/LKB-944) ## Summary of changes - Add `sk_set_notified_generation` column to `timelines` database - Update `_notified_generation` in database during the finish state. - Commit reconciliation requests to database atomically with membership configuration. - Reload pending ops and retry "finish" step if we detect `_notified_generation` mismatch. - Add failpoints and test that we handle failures well	2025-07-22 14:58:20 +00:00
Alex Chi Z.	88391ce069	feat(pageserver): create image layers at L0-L1 boundary by default (#12669 ) ## Problem Post LKB-198 rollout. We added a new strategy to generate image layers at the L0-L1 boundary instead of the latest LSN to ensure too many L0 layers do not trigger image layer creation. ## Summary of changes We already rolled it out to all users so we can remove the feature flag now. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-22 14:29:26 +00:00
Heikki Linnakangas	8bb45fd5da	Introduce built-in Prometheus exporter to the Postgres extension (#12591 ) Currently, the exporter exposes the same LFC metrics that are exposed by the "autoscaling" sql_exporter in the docker image. With this, we can remove the dedicated sql_exporter instance. (Actually doing the removal is left as a TODO until this is rolled out to production and we have changed autoscaling-agent to fetch the metrics from this new endpoint.) The exporter runs as a Postgres background worker process. This is extracted from the Rust communicator rewrite project, which will use the same worker process for much more, to handle the communications with the pageservers. For now, though, it merely handles the metrics requests. In the future, we will add more metrics, and perhaps even APIs to control the running Postgres instance. The exporter listens on a Unix Domain socket within the Postgres data directory. A Unix Domain socket is a bit unconventional, but it has some advantages: - Permissions are taken care of. Only processes that can access the data directory, and therefore already have full access to the running Postgres instance, can connect to it. - No need to allocate and manage a new port number for the listener It has some downsides too: it's not immediately accessible from the outside world, and the functions to work with Unix Domain sockets are more low-level than TCP sockets (see the symlink hack in `postgres_metrics_client.rs`, for example). To expose the metrics from the local Unix Domain Socket to the autoscaling agent, introduce a new '/autoscaling_metrics' endpoint in the compute_ctl's HTTP server. Currently it merely forwards the request to the Postgres instance, but we could add rate limiting and access control there in the future. --------- Co-authored-by: Conrad Ludgate <conrad@neon.tech>	2025-07-22 12:00:20 +00:00
Vlad Lazar	88bc06f148	communicator: debug log more fields of the get page response (#12644 ) It's helpful to correlate requests and responses in local investigations where the issue is reproducible. Hence, log the rel, fork and block of the get page response.	2025-07-22 11:25:11 +00:00
Vlad Lazar	d91d018afa	storcon: handle pageserver disk loss (#12667 ) NB: effectively a no-op in the neon env since the handling is config gated in storcon ## Problem When a pageserver suffers from a local disk/node failure and restarts, the storage controller will receive a re-attach call and return all the tenants the pageserver is supposed to attach, but the pageserver will not act on any tenants that it doesn't know about locally. As a result, the pageserver will not rehydrate any tenants from remote storage if it restarted following a local disk loss, while the storage controller still thinks that the pageserver has all the tenants attached. This leaves the system in a bad state, and the symptom is that PG's pageserver connections will fail with "tenant not found" errors. ## Summary of changes Made a slight change to the storage controller's `re_attach` API: * The pageserver will set an additional bit `empty_local_disk` in the reattach request, indicating whether it has started with an empty disk or does not know about any tenants. * Upon receiving the reattach request, if this `empty_local_disk` bit is set, the storage controller will go ahead and clear all observed locations referencing the pageserver. The reconciler will then discover the discrepancy between the intended state and observed state of the tenant and take care of the situation. To facilitate rollouts this extra behavior in the `re_attach` API is guarded by the `handle_ps_local_disk_loss` command line flag of the storage controller. --------- Co-authored-by: William Huang <william.huang@databricks.com>	2025-07-22 11:04:03 +00:00
Folke Behrens	9c0efba91e	Bump rand crate to 0.9 (#12674 )	2025-07-22 09:31:39 +00:00
Konstantin Knizhnik	5464552020	Limit number of parallel config apply connections to 100 (#12663 ) ## Problem See https://databricks.slack.com/archives/C092W8NBXC0/p1752924508578339 With a large number of databases and a large `max_connections`, we can open too many connections for parallel config apply, which may cause a `Too many open files` error. ## Summary of changes Limit the maximum number of parallel config apply connections to 100. --------- Co-authored-by: Kosntantin Knizhnik <konstantin.knizhnik@databricks.com>	2025-07-22 04:39:54 +00:00
Arpad Müller	80baeaa084	storcon: add force_upsert flag to timeline_import endpoint (#12622 ) It is useful to have ability to update an existing timeline entry, as a way to mirror legacy migrations to the storcon managed table.	2025-07-21 21:14:15 +00:00
Tristan Partin	b7bc3ce61e	Skip PG throttle during configuration (#12670 ) ## Problem While running tenant split tests I ran into a situation where PG got stuck completely. This seems to be a general problem that was not found in the previous chaos testing fixes. What happened is that if PG gets throttled by PS, and SC decided to move some tenant away, then PG reconfiguration could be blocked forever because it cannot talk to the old PS anymore to refresh the throttling stats, and reconfiguration cannot proceed because it's being throttled. Neon has considered the case that configuration could be blocked if the PG storage is full, but forgot the backpressure case. ## Summary of changes The PR fixes this problem by simply skipping throttling while PS is being configured, i.e., `max_cluster_size < 0`. An alternative fix is to set those throttle knobs to -1 (e.g., max_replication_apply_lag), however these knobs were labeled with PGC_POSTMASTER so their values cannot be changed unless we restart PG. ## How is this tested? Tested manually. Co-authored-by: Chen Luo <chen.luo@databricks.com>	2025-07-21 20:50:02 +00:00
Ivan Efremov	050c9f704f	proxy: expose session_id to clients and proxy latency to probes (#12656 ) Implements #8728	2025-07-21 20:27:15 +00:00
Ruslan Talpa	0dbe551802	proxy: subzero integration in auth-broker (embedded data-api) (#12474 ) ## Problem We want to have the data-api served by the proxy directly instead of relying on a 3rd party to run a deployment for each project/endpoint. ## Summary of changes With the changes below, the proxy (auth-broker) becomes also a "rest-broker", that can be thought of as a "Multi-tenant" data-api which provides an automated REST api for all the databases in the region. The core of the implementation (that leverages the subzero library) is in proxy/src/serverless/rest.rs and this is the only place that has "new logic". --------- Co-authored-by: Ruslan Talpa <ruslan.talpa@databricks.com> Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Conrad Ludgate <conrad@neon.tech>	2025-07-21 18:16:28 +00:00
Tristan Partin	187170be47	Add max_wal_rate test (#12621 ) ## Problem Add a test for max_wal_rate ## Summary of changes Test max_wal_rate ## How is this tested? python test Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com>	2025-07-21 17:58:03 +00:00
Vlad Lazar	30e1213141	pageserver: check env var for ip address before node registration (#12666 ) Include the ip address (optionally read from an env var) in the pageserver's registration request. Note that the ip address is ignored by the storage controller at the moment, which makes it a no-op in the neon env.	2025-07-21 15:32:28 +00:00
Vlad Lazar	25efbcc7f0	safekeeper: parallelise segment copy (#12664 ) Parallelise segment copying on the SK. I'm not aware of the neon deployment using this endpoint.	2025-07-21 14:47:58 +00:00
Conrad Ludgate	b2ecb10f91	[proxy] rework handling of notices in sql-over-http (#12659 ) A replacement for #10254 which allows us to introduce notice messages for sql-over-http in the future if we want to. This also removes the `ParameterStatus` and `Notification` handling as there's nothing we could/should do for those.	2025-07-21 12:50:13 +00:00
Erik Grinaker	5a48365fb9	pageserver/client_grpc: don't set stripe size for unsharded tenants (#12639 ) ## Problem We've had bugs where the compute would use the stale default stripe size from an unsharded tenant after the tenant split with a new stripe size. ## Summary of changes Never specify a stripe size for unsharded tenants, to guard against misuse. Only specify it once tenants are sharded and the stripe size can't change. Also opportunistically changes `GetPageSplitter` to return `anyhow::Result`, since we'll be using this in other code paths as well (specifically during server-side shard splits).	2025-07-21 12:28:39 +00:00
Erik Grinaker	194b9ffc41	pageserver: remove gRPC `CheckRelExists` (#12616 ) ## Problem Postgres will often immediately follow a relation existence check with a relation size query. This incurs two roundtrips, and may prevent effective caching. See [Slack thread](https://databricks.slack.com/archives/C091SDX74SC/p1751951732136139). Touches #11728. ## Summary of changes For the gRPC API: * Add an `allow_missing` parameter to `GetRelSize`, which returns `missing=true` instead of a `NotFound` error. * Remove `CheckRelExists`. There are no changes to libpq behavior.	2025-07-21 11:43:26 +00:00
Dimitri Fontaine	1e30b31fa7	Cherry pick: pg hooks for online table. (#12654 ) ## Problem ## Summary of changes	2025-07-21 11:10:10 +00:00
Erik Grinaker	e181b996c3	utils: move `ShardStripeSize` into `shard` module (#12640 ) ## Problem `ShardStripeSize` will be used in the compute spec and internally in the communicator. It shouldn't require pulling in all of `pageserver_api`. ## Summary of changes Move `ShardStripeSize` into `utils::shard`, along with other basic shard types. Also remove the `Default` implementation, to discourage clients from falling back to a default (it's generally a footgun). The type is still re-exported from `pageserver_api::shard`, along with all the other shard types.	2025-07-21 10:56:20 +00:00
Erik Grinaker	1406bdc6a8	pageserver: improve gRPC cancellation (#12635 ) ## Problem The gRPC page service does not properly react to shutdown cancellation. In particular, Tonic considers an open GetPage stream to be an in-flight request, so it will wait for it to complete before shutting down. Touches [LKB-191](https://databricks.atlassian.net/browse/LKB-191). ## Summary of changes Properly react to the server's cancellation token and take out gate guards in gRPC request handlers. Also document cancellation handling. In particular, that Tonic will drop futures when clients go away (e.g. on timeout or shutdown), so the read path must be cancellation-safe. It is believed to be (modulo possible logging noise), but this will be verified later.	2025-07-21 10:52:18 +00:00
Paul Banks	791b5d736b	Fixes #10441 : control_plane README incorrect neon init args (#12646 ) ## Problem As reported in #10441 the `control_plane/README/md` incorrectly specified that `--pg-version` should be specified in the `cargo neon init` command. This is not the case and causes an invalid argument error. ## Summary of changes Fix the README ## Test Plan I verified that the steps in the README now work locally. I connected to the started postgres endpoint and executed some basic metadata queries.	2025-07-18 17:09:20 +00:00
Krzysztof Szafrański	96bcfba79e	[proxy] Cache GetEndpointAccessControl errors (#12571 ) Related to https://github.com/neondatabase/cloud/issues/19353	2025-07-18 10:17:58 +00:00
Shockingly Good	8e95455aef	Update the postgres submodules (#12636 ) Synchronises the main branch's postgres submodules with the `neondatabase/postgres` repository state.	2025-07-18 08:21:22 +00:00
Alex Chi Z.	f3ef60d236	fix(storcon): use unified interface to handle 404 lsn lease (#12650 ) ## Problem Close LKB-270. This is part of our series of efforts to make sure lsn_lease API prompts clients to retry. Follow up of https://github.com/neondatabase/neon/pull/12631. Slack thread w/ Vlad: https://databricks.slack.com/archives/C09254R641L/p1752677940697529 ## Summary of changes - Use `tenant_remote_mutation` API for LSN leases. Makes it consistent with new APIs added to storcon. - For 404, we now always retry because we know the tenant is to-be-attached and will eventually reach a point that we can find that tenant on the intent pageserver. - Using the `tenant_remote_mutation` API also prevents us from the case where the intent pageserver changes within the lease request. The wrapper function will error with 503 if such things happen. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-18 04:40:35 +00:00
HaoyuHuang	8f627ea0ab	A few more SC changes (#12649 ) ## Problem ## Summary of changes	2025-07-17 23:17:01 +00:00
Arpad Müller	6a353c33e3	print more timestamps in find_lsn_for_timestamp (#12641 ) Observability of `find_lsn_for_timestamp` is lacking, as well as how and when we update gc space and time cutoffs. Log them.	2025-07-17 22:13:21 +00:00
Folke Behrens	64d0008389	proxy: Shorten the initial TTL of cancel keys (#12647 ) ## Problem A high rate of short-lived connections means that there are a lot of cancel keys in Redis with TTL=10min that could be avoided by having a much shorter initial TTL. ## Summary of changes * Introduce an initial TTL of 1min used with the SET command. * Fix: don't delay repushing cancel data when expired. * Prepare for exponentially increasing TTLs. ## Alternatives A best-effort UNLINK command on connection termination would clean up cancel keys right away. This needs a bigger refactor due to how batching is handled.	2025-07-17 21:52:20 +00:00
Alexey Kondratov	53a05e8ccb	fix(compute_ctl): Only offload LFC state if no prewarming is in progress (#12645 ) ## Problem We currently offload LFC state unconditionally, which can cause problems. Imagine a situation: 1. Endpoint started with `autoprewarm: true`. 2. While prewarming is not completed, we upload the new incomplete state. 3. Compute gets interrupted and restarts. 4. We start again and try to prewarm with the state from 2. instead of the previous complete state. During the orchestrated prewarming, it's probably not a big issue, but it's still better not to interfere with the prewarm process. ## Summary of changes Do not offload LFC state if we are currently prewarming or any issue occurred. While on it, also introduce `Skipped` LFC prewarm status, which is used when the corresponding LFC state is not present in the endpoint storage. It's primarily needed to distinguish the first compute start for a particular endpoint, as it's completely valid not to have LFC state yet.	2025-07-17 21:43:43 +00:00
Vlad Lazar	62c0152e6b	pageserver: shut down compute connections at libpq level (#12642 ) ## Problem Previously, if a get page failure was caused by timeline shutdown, the pageserver would attempt to tear down the connection gracefully: `shutdown(SHUT_WR)` followed by `close()`. This triggers a code path on the compute where it has to tell an idle connection apart from a closed one. That code is bug prone, so we can just side-step the issue by shutting down the connection via a libpq error message. This surfaced as instability in test_shard_resolve_during_split_abort. It's a new test, but the issue existed for ages. ## Summary of Changes Send a libpq error message instead of doing graceful TCP connection shutdown. Closes LKB-648	2025-07-17 21:03:55 +00:00
Konstantin Knizhnik	7fef4435c1	Store stripe_size in shared memory (#12560 ) ## Problem See https://databricks.slack.com/archives/C09254R641L/p1752004515032899 stripe_size GUC update may be delayed at different backends and so cause inconsistency with connection strings (shard map). ## Summary of changes Postmaster should store stripe_size in shared memory as well as connection strings. It should be also enforced that stripe size is defined prior to connection strings in postgresql.conf --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Kosntantin Knizhnik <konstantin.knizhnik@databricks.com>	2025-07-17 20:32:34 +00:00
Konstantin Knizhnik	43fd5b218b	Refactor shmem initialization in Neon extension (#12630 ) ## Problem Initialization of shared memory in an extension is complex and non-portable. In the neon extension this boilerplate code is duplicated in several files. ## Summary of changes Perform all initialization in one place - neon.c. All other modules provide ShmemRequest() and ShmemInit() functions which are called from neon.c --------- Co-authored-by: Kosntantin Knizhnik <konstantin.knizhnik@databricks.com> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2025-07-17 20:20:38 +00:00
Alex Chi Z.	29ee273d78	fix(storcon): correctly converts 404 for tenant passthrough requests (#12631 ) ## Problem Follow up of https://github.com/neondatabase/neon/pull/12620 Discussions: https://databricks.slack.com/archives/C09254R641L/p1752677940697529 Both the original code and the code after the patch above convert 404s to 503s regardless of the type of 404. We should only do that for tenant not found errors. For other 404s like timeline not found, we should not prompt clients to retry. ## Summary of changes - Inspect the response body to figure out the type of 404. If it's a tenant not found error, return 503. - Otherwise, fall through and return 404 as-is. - Add `tenant_shard_remote_mutation` that manipulates a single shard. - Use `Service::tenant_shard_remote_mutation` for tenant shard passthrough requests. This protects us from another race where the attach state changes within the request. (This patch mainly addresses the case that the tenant is "not yet attached"). - TODO: lease API is still using the old code path. We should refactor it to use `tenant_remote_mutation`. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-17 19:42:48 +00:00
Conrad Ludgate	8b0f2efa57	experiment with an InfoMetrics metric family (#12612 ) Putting this in the neon codebase for now, to experiment. Can be lifted into measured at a later date. This metric family is like a MetricVec, but it only supports 1 label being set at a time. It is useful for reporting info, rather than reporting metrics. https://www.robustperception.io/exposing-the-software-version-to-prometheus/	2025-07-17 17:58:47 +00:00
quantumish	b309cbc6e9	Add resizable hashmap and RwLock implementations to `neon-shmem` (#12596 ) Second PR for the hashmap behind the updated LFC implementation ([see first here](https://github.com/neondatabase/neon/pull/12595)). This only adds the raw code for the hashmap/lock implementations and doesn't plug it into the crate (that's dependent on the previous PR and should probably be done when the full integration into the new communicator is merged alongside `communicator-rewrite` changes?). Some high level details: the communicator codebase expects to be able to store references to entries within this hashmap for arbitrary periods of time and so the hashmap cannot be allowed to move them during a rehash. As a result, this implementation has a slightly unusual structure where key-value pairs (and hash chains) are allocated in a separate region with a freelist. The core hashmap structure is then an array of "dictionary entries" that are just indexes into this region of key-value pairs. Concurrency support is very naive at the moment with the entire map guarded by one big `RwLock` (which is implemented on top of a `pthread_rwlock_t` since Rust doesn't guarantee that a `std::sync::RwLock` is safe to use in shared memory). This (along with a lot of other things) is being changed on the `quantumish/lfc-resizable-map` branch.	2025-07-17 17:40:53 +00:00
Aleksandr Sarantsev	f0c0733a64	storcon: Ignore stuck reconciles when considering optimizations (#12589 ) ## Problem The `keep_failing_reconciles` counter was introduced in #12391, but there is a special case: > if a reconciliation loop claims to have succeeded, but maybe_reconcile still thinks the tenant is in need of reconciliation, then that's a probable bug and we should activate a similar backoff to prevent flapping. This PR redefines "flapping" to include not just repeated failures, but also consecutive reconciliations of any kind (success or failure). ## Summary of Changes - Replace `keep_failing_reconciles` with a new `stuck_reconciles` metric - Replace `MAX_CONSECUTIVE_RECONCILIATION_ERRORS` with `MAX_CONSECUTIVE_RECONCILES`, and increasing that from 5 to 10 - Increment the consecutive reconciles counter for all reconciles, not just failures - Reset the counter in `reconcile_all` when no reconcile is needed for a shard - Improve and fix the related test --------- Co-authored-by: Aleksandr Sarantsev <aleksandr.sarantsev@databricks.com>	2025-07-17 14:52:57 +00:00
Vlad Lazar	8862e7c4bf	tests: use new snapshot in test_forward_compat (#12637 ) ## Problem The forward compatibility test is erroneously using the downloaded (old) compatibility data. This test is meant to test that old binaries can work with new data. Using the old compatibility data renders this test useless. ## Summary of changes Use new snapshot in test_forward_compat Closes LKB-666 Co-authored-by: William Huang <william.huang@databricks.com>	2025-07-17 13:20:40 +00:00
HaoyuHuang	b7fc5a2fe0	A few SC changes (#12615 ) ## Summary of changes A bunch of no-op changes. --------- Co-authored-by: Vlad Lazar <vlad@neon.tech>	2025-07-17 13:14:36 +00:00
Aleksandr Sarantsev	4559ba79b6	Introduce force flag for new deletion API (#12588 ) ## Problem The force deletion API should behave like the graceful deletion API - it needs to support cancellation, persistence, and be non-blocking. ## Summary of Changes - Added a `force` flag to the `NodeStartDelete` command. - Passed the `force` flag through the `start_node_delete` handler in the storage controller. - Handled the `force` flag in the `delete_node` function. - Set the tombstone after removing the node from memory. - Minor cleanup, like adding a `get_error_on_cancel` closure. --------- Co-authored-by: Aleksandr Sarantsev <aleksandr.sarantsev@databricks.com>	2025-07-17 11:51:31 +00:00
Alexander Bayandin	5dd24c7ad8	test_total_size_limit: support hosts with up to 256 GB of RAM (#12617 ) ## Problem `test_total_size_limit` fails on runners with 256 GB of RAM ## Summary of changes - Generate more data in `test_total_size_limit`	2025-07-17 08:57:36 +00:00
Alex Chi Z.	f2828bbe19	fix(pageserver): skip gc-compaction for metadata key ranges (#12618 ) ## Problem part of https://github.com/neondatabase/neon/issues/11318 ; it is not entirely safe to run gc-compaction over the metadata key range due to tombstones and implications of image layers (missing key in image layer == key not exist). The auto gc-compaction trigger already skips metadata key ranges (see `schedule_auto_compaction` call in `trigger_auto_compaction`). In this patch we enforce it directly in gc_compact_inner so that compactions triggered via HTTP API will also be subject to this restriction. ## Summary of changes Ensure gc-compaction only runs on rel key ranges. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-16 21:52:18 +00:00
Alexander Bayandin	fb796229bf	Fix `make neon-pgindent` (#12535 ) ## Problem `make neon-pgindent` doesn't work: - there's no `$(BUILD_DIR)/neon-v17` dir - `make -C ...` along with relative `BUILD_DIR` resolves to a path that doesn't exist ## Summary of changes - Fix the path to the neon extension for `make neon-pgindent` - Make `BUILD_DIR` absolute - Remove trailing slash from `POSTGRES_INSTALL_DIR` to avoid duplicated slashes in commands (doesn't break anything, it just makes it look nicer)	2025-07-16 21:20:44 +00:00
Dimitri Fontaine	267fb49908	Update Postgres branches. (#12628 ) ## Problem ## Summary of changes	2025-07-16 18:39:54 +00:00
Krzysztof Szafrański	e2982ed3ec	[proxy] Cache node info only for TTL, even if Redis is available (#12626 ) This PR simplifies our node info cache. Now we'll store entries for at most the TTL duration, even if Redis notifications are available. This will allow us to cache intermittent errors later (e.g. due to rate limits) with more predictable behavior. Related to https://github.com/neondatabase/cloud/issues/19353	2025-07-16 16:23:05 +00:00
Tristan Partin	9e154a8130	PG: smooth max wal rate (#12514 ) ## Problem We were only resetting the limit in the wal proposer. If backends are back pressured, it might take a while for the wal proposer to receive a new WAL to reset the limit. ## Summary of changes Backend also checks the time and resets the limit. ## How is this tested? pgbench has more smooth tps Signed-off-by: Tristan Partin <tristan.partin@databricks.com> Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com>	2025-07-16 16:11:25 +00:00
JC Grünhage	79d72c94e8	reformat cargo install invocations in build-tools image (#12629 ) ## Problem Same change with different formatting happened in multiple branches. ## Summary of changes Realign formatting with the other branch.	2025-07-16 16:02:07 +00:00
Alex Chi Z.	80e5771c67	fix(storcon): passthrough 404 as 503 during migrations (#12620 ) ## Problem close LKB-270, close LKB-253 We periodically saw the pageserver return 404, which storcon converted to a 500 for cplane, causing branch operations to fail. This happens because storcon is migrating tenants across pageservers and the request was forwarded from storcon to a pageserver while the tenant was not attached yet. Such operations should be retried from cplane and storcon should return 503 in such cases. ## Summary of changes - Refactor `tenant_timeline_lsn_lease` to have a single function process and passthrough such requests: `collect_tenant_shards` for collecting all shards and checking if they're consistent with the observed state, `process_result_and_passthrough_errors` to convert 404 into 503 if necessary. - `tenant_shard_node` also checks observed state now. Note that for passthrough shard0, we originally had a check to convert 404 to 503: ``` // Transform 404 into 503 if we raced with a migration if resp.status() == reqwest::StatusCode::NOT_FOUND { // Look up node again: if we migrated it will be different let new_node = service.tenant_shard_node(tenant_shard_id).await?; if new_node.get_id() != node.get_id() { // Rather than retry here, send the client a 503 to prompt a retry: this matches // the pageserver's use of 503, and all clients calling this API should retry on 503. return Err(ApiError::ResourceUnavailable( format!("Pageserver {node} returned 404, was migrated to {new_node}").into(), )); } } ``` However, this only checks the intent state. It is possible that the migration is in progress before/after the request is processed while the intent state stays the same throughout the API call, so the 404 is not handled by this branch. Also, I am not sure whether this new code is correct and would like a second pair of eyes on it: ``` // As a reconciliation is in flight, we do not have the observed state yet, and therefore we assume it is always inconsistent. 
Ok((node.clone(), false)) ``` --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-16 15:51:20 +00:00
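
The 404 handling described in the storcon entries above (#12620, refined in #12631) can be illustrated with a small sketch: a 404 that means "tenant not found" is turned into a retryable 503, while any other 404 (for example, timeline not found) is passed through unchanged. This is not the storage controller's code. The function names `is_tenant_not_found` and `passthrough_status`, the plain `u16` status codes, and the substring match on the response body are all assumptions made for illustration; the real error format is not shown in the log.

```rust
/// Hypothetical check: does the response body describe a missing tenant?
/// Assumption: the body contains the phrase "tenant not found"; the real
/// pageserver error format may differ.
fn is_tenant_not_found(body: &str) -> bool {
    body.to_ascii_lowercase().contains("tenant not found")
}

/// Hypothetical mapping of a pageserver response status to the status
/// returned to the client.
///
/// - 404 caused by a missing tenant becomes 503, prompting the client to retry
///   (the tenant may simply not be attached yet during a migration).
/// - Every other status, including other kinds of 404, is returned as-is.
fn passthrough_status(status: u16, body: &str) -> u16 {
    if status == 404 && is_tenant_not_found(body) {
        503
    } else {
        status
    }
}
```

Under these assumptions, `passthrough_status(404, "Tenant not found")` yields 503, whereas `passthrough_status(404, "Timeline not found")` stays 404, so clients only retry when a retry can plausibly succeed.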

8466 Commits