rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-24 16:40:38 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	e14bb4be39	Merge remote-tracking branch 'origin/main' into communicator-rewrite	2025-07-05 16:59:51 +03:00
Heikki Linnakangas	f3a6c0d8ff	cargo fmt	2025-07-05 16:26:24 +03:00
Heikki Linnakangas	17ec37aab2	Close gRPC getpage streams on shutdown Some tests were failing, because pageserver didn't shut down promptly. Tonic server graceful shutdown was a little too graceful; any open streams linger until they're closed. Check the cancellation token while waiting for next request, and close the stream if shutdown/cancellation was requested.	2025-07-05 16:26:24 +03:00
Heikki Linnakangas	d6ec1f1a1c	Skip legacy LFC initialization when communicator is used It clashes with the initialization of the LFC file	2025-07-05 16:26:24 +03:00
Erik Grinaker	6f3fb4433f	Add TODO	2025-07-05 14:15:34 +02:00
Erik Grinaker	d7678df445	Reap idle pool resources	2025-07-05 13:35:28 +02:00
Erik Grinaker	03d9f0ec41	Comment tweaks	2025-07-05 11:16:40 +02:00
Erik Grinaker	56845f2da2	Add `GetPageClass::is_bulk`	2025-07-05 11:15:28 +02:00
Heikki Linnakangas	b568189f7b	Build dummy libcommunicator into the 'neon' extension (#12266 ) This doesn't do anything interesting yet, but demonstrates linking Rust code to the neon Postgres extension, so that we can review and test drive just the build process changes independently.	2025-07-04 23:27:28 +00:00
Heikki Linnakangas	9a37bfdf63	Fix re-finding an entry in bucket chain	2025-07-05 00:44:46 +03:00
Arpad Müller	b94a5ce119	Don't await the walreceiver on timeline shutdown (#12402 ) Mostly a revert of https://github.com/neondatabase/neon/pull/11851 and https://github.com/neondatabase/neon/pull/12330 . Christian suggested reverting his PR to fix the issue https://github.com/neondatabase/neon/issues/12369 . Alternatives considered: 1. I have originally wanted to introduce cancellation tokens to `RequestContext`, but in the end I gave up on them because I didn't find a select-free way of preventing `test_layer_download_cancelled_by_config_location` from hanging. Namely if I put a select around the `get_or_maybe_download` invocation in `get_values_reconstruct_data`, it wouldn't hang, but if I put it around the `download_init_and_wait` invocation in `get_or_maybe_download`, the test would still hang. Not sure why, even though I made the attached child function of the `RequestContext` create a child token. 2. Introduction of a `download_cancel` cancellation token as a child of a timeline token, putting it into `RemoteTimelineClient` together with the main token, and then putting it into the whole `RemoteTimelineClient` read path. 3. Greater refactorings, like to make cancellation tokens follow a DAG structure so you can have tokens cancelled either by say timeline shutting down or a request ending. It doesn't just represent an effort that we don't have the engineering budget for, it also causes interesting questions like what to do about batching (do you cancel the entire request if only some requests get cancelled?). We might see a reemergence of https://github.com/neondatabase/neon/issues/11762, but given that we have https://github.com/neondatabase/neon/pull/11853 and https://github.com/neondatabase/neon/pull/12376 now, it is possible that it will not come back. Looking at some code, it might actually fix the locations where the error pops up. Let's see. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2025-07-04 20:12:10 +00:00
Heikki Linnakangas	4c916552e8	Reduce logging noise These are very useful while debugging, but also very noisy; let's dial it down a little.	2025-07-04 23:11:36 +03:00
Heikki Linnakangas	50fbf4ac53	Fix hash table initialization across forked processes attach_writer()/reader() are called from each forked process. It's too late to do initialization there, in fact we used to overwrite the contents of the hash table (or at least the freelist?) every time a new process attached to it. The initialization must be done earlier, in the HashMapInit() constructors.	2025-07-04 23:08:34 +03:00
Erik Grinaker	cb698a3951	Add dedicated client pools for bulk requests	2025-07-04 21:52:25 +02:00
Mikhail	7ed4530618	`offload_lfc_interval_seconds` in ComputeSpec (#12447 ) - Add ComputeSpec flag `offload_lfc_interval_seconds` controlling whether LFC should be offloaded to endpoint storage. Default value (None) means "don't offload". - Add glue code around it for `neon_local` and integration tests. - Add `autoprewarm` mode for `test_lfc_prewarm` testing `offload_lfc_interval_seconds` and `autoprewarm` flags in conjunction. - Rename `compute_ctl_lfc_prewarm_requests_total` and `compute_ctl_lfc_offload_requests_total` to `compute_ctl_lfc_prewarms_total` and `compute_ctl_lfc_offloads_total` to reflect we count prewarms and offloads, not `compute_ctl` requests of those. Don't count request in metrics if there is a prewarm/offload already ongoing. https://github.com/neondatabase/cloud/issues/19011 Resolves: https://github.com/neondatabase/cloud/issues/30770	2025-07-04 18:49:57 +00:00
Erik Grinaker	f6cc5cbd0c	Split out retry handler to separate module	2025-07-04 20:20:09 +02:00
Heikki Linnakangas	00affada26	Add request ID to all communicator log lines as context information	2025-07-04 20:34:26 +03:00
Heikki Linnakangas	90d3c09c24	Minor cleanup Tidy up and add some comments. Rename a few things for clarity.	2025-07-04 20:32:59 +03:00
Heikki Linnakangas	6c398aeae7	Fix dependency in Makefile	2025-07-04 20:24:21 +03:00
Heikki Linnakangas	3a44774227	impr(ci): Simplify build-macos workflow, prepare for rust communicator (#12357 ) Don't build walproposer-lib as a separate job. It only takes a few seconds, after you have built all its dependencies. Don't cache the Neon Pg extensions in the per-postgres-version caches. This is in preparation for the communicator project, which will introduce Rust parts to the Neon Pg extension, which complicates the build process. With that, the 'make neon-pg-ext' step requires some of the Rust bits to be built already, or it will build them on the spot, which in turn requires all the Rust sources to be present, and we don't want to repeat that part for each Postgres version anyway. To prepare for that, rely on "make all" to build the neon extension and the rust bits in the correct order instead. Building the neon extension doesn't currently take very long anyway after you have built Postgres itself, so you don't gain much by caching it. See https://github.com/neondatabase/neon/pull/12266. Add an explicit "rustup update" step to update the toolchain. It's not strictly necessary right now, because currently "make all" will only invoke "cargo build" once and the race condition described in the comment doesn't happen. But prepare for the future. To further simplify the build, get rid of the separate 'build-postgres' jobs too, and just build Postgres as a step in the main job. That makes the overall workflow run longer, because we no longer build all the postgres versions in parallel (although you still get intra-runner parallelism thanks to `make -j`), but that's acceptable. In the cache-hit case, it might even be a little faster because there is less overhead from launching jobs, and in the cache-miss case, it's maybe 5-10 minutes slower altogether. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-07-04 15:34:58 +00:00
Heikki Linnakangas	1856bbbb9f	Minor cleanup and commenting	2025-07-04 18:28:34 +03:00
Aleksandr Sarantsev	b2705cfee6	storcon: Make node deletion process cancellable (#12320 ) ## Problem The current deletion operation is synchronous and blocking, which is unsuitable for potentially long-running tasks like. In such cases, the standard HTTP request-response pattern is not a good fit. ## Summary of Changes - Added new `storcon_cli` commands: `NodeStartDelete` and `NodeCancelDelete` to initiate and cancel deletion asynchronously. - Added corresponding `storcon` HTTP handlers to support the new start/cancel deletion flow. - Introduced a new type of background operation: `Delete`, to track and manage the deletion process outside the request lifecycle. --------- Co-authored-by: Aleksandr Sarantsev <aleksandr.sarantsev@databricks.com>	2025-07-04 14:08:09 +00:00
Heikki Linnakangas	bd46dd60a0	Add a temporary timeout to handling an IO request in the communicator It's nicer to timeout in the communicator and return an error to the backend, than PANIC the backend.	2025-07-04 16:08:22 +03:00
Heikki Linnakangas	5f2d476a58	Add request ID to io-in-progress locking table, to ease debugging I also added INFO messages for when a backend blocks on the io-in-progress lock. It's probably too noisy for production, but useful now to get a picture of how much it happens.	2025-07-04 15:55:57 +03:00
Heikki Linnakangas	3231cb6138	Await the io-in-progress locking futures Otherwise they don't do anything. Oops.	2025-07-04 15:55:57 +03:00
Heikki Linnakangas	e558e0da5c	Assign request_id earlier, in the originating backend Makes it more useful for stitching together logs etc. for a specific request.	2025-07-04 15:55:55 +03:00
Heikki Linnakangas	70bf2e088d	Request multiple block numbers in a single GetPageV request That's how it was always intended to be used	2025-07-04 15:49:04 +03:00
Trung Dinh	225267b3ae	Make disk eviction run by default (#12464 ) ## Problem ## Summary of changes Provide a sane set of default values for disk_usage_based_eviction. Closes https://github.com/neondatabase/neon/issues/12301.	2025-07-04 12:06:10 +00:00
Vlad Lazar	d378726e38	pageserver: reset the broker subscription if it's been idle for a while (#12436 ) ## Problem I suspect that the pageservers get stuck on receiving broker updates. ## Summary of changes This is a an opportunistic (staging only) patch that resets the susbscription stream if it's been idle for a while. This won't go to prod in this form. I'll revert or update it before Friday.	2025-07-04 10:25:03 +00:00
Konstantin Knizhnik	436a117c15	Do not allocate anything in subtransaction memory context (#12176 ) ## Problem See https://github.com/neondatabase/neon/issues/12173 ## Summary of changes Allocate table in TopTransactionMemoryContext --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-07-04 10:24:39 +00:00
Heikki Linnakangas	da3f9ee72d	cargo fmt	2025-07-04 12:39:41 +03:00
Alex Chi Z.	cc699f6f85	fix(pageserver): do not log no-route-to-host errors (#12468 ) ## Problem close https://github.com/neondatabase/neon/issues/12344 ## Summary of changes Add `HostUnreachable` and `NetworkUnreachable` to expected I/O error. This was new in Rust 1.83. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-03 21:57:42 +00:00
Erik Grinaker	88d1127bf4	Tweak GetPageSplitter	2025-07-03 21:12:26 +02:00
David Freifeld	794bb7a9e8	Merge branch 'quantumish/comm-lfc-integration' into communicator-rewrite	2025-07-03 10:52:29 -07:00
Konstantin Knizhnik	495112ca50	Add GUC for dynamically enable compare local mode (#12424 ) ## Problem DEBUG_LOCAL_COMPARE mode allows to detect data corruption. But it requires rebuild of neon extension (and so requires special image) and significantly slowdown execution because always fetch pages from page server. ## Summary of changes Introduce new GUC `neon.debug_compare_local`, accepting the following values: " none", "prefetch", "lfc", "all" (by default it is definitely disabled). In mode less than "all", neon SMGR will not fetch page from PS if it is found in local caches. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-07-03 17:37:05 +00:00
Suhas Thalanki	46158ee63f	fix(compute): background installed extensions worker would collect data without waiting for interval (#12465 ) ## Problem The background installed extensions worker relied on `interval.tick()` to go to sleep for a period of time. This can lead to bugs due to the interval being updated at the end of the loop as the first tick is [instantaneous](https://docs.rs/tokio/latest/tokio/time/struct.Interval.html#method.tick). ## Summary of changes Changed it to a `tokio::time::sleep` to prevent this issue. Now it puts the thread to sleep and only wakes up after the specified duration	2025-07-03 17:10:30 +00:00
Alex Chi Z.	305fe61ac1	fix(pageserver): also print open layer size in backpressure (#12440 ) ## Problem Better investigate memory usage during backpressure ## Summary of changes Print open layer size if backpressure is activated Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-03 16:37:11 +00:00
Vlad Lazar	f95fdf5b44	pageserver: fix duplicate tombstones in ancestor detach (#12460 ) ## Problem Ancestor detach from a previously detached parent when there were no writes panics since it tries to upload the tombstone layer twice. ## Summary of Changes If we're gonna copy the tombstone from the ancestor, don't bother creating it. Fixes https://github.com/neondatabase/neon/issues/12458	2025-07-03 16:35:46 +00:00
Erik Grinaker	42e4e5a418	Add GetPage request splitting	2025-07-03 18:31:12 +02:00
Arpad Müller	a852bc5e39	Add new activating scheduling policy for safekeepers (#12441 ) When deploying new safekeepers, we don't immediately want to send traffic to them. Maybe they are not ready yet by the time the deploy script is registering them with the storage controller. For pageservers, the storcon solves the problem by not scheduling stuff to them unless there has been a positive heartbeat response. We can't do the same for safekeepers though, otherwise a single down safekeeper would mean we can't create new timelines in smaller regions where there is only three safekeepers in total. So far we have created safekeepers as `pause` but this adds a manual step to safekeeper deployment which is prone to oversight. We want things to be automatted. So we introduce a new state `activating` that acts just like `pause`, except that we automatically transition the policy to `active` once we get a positive heartbeat from the safekeeper. For `pause`, we always keep the safekeeper paused.	2025-07-03 16:27:43 +00:00
Aleksandr Sarantsev	b96983a31c	storcon: Ignore keep-failing reconciles (#12391 ) ## Problem Currently, if `storcon` (storage controller) reconciliations repeatedly fail, the system will indefinitely freeze optimizations. This can result in optimization starvation for several days until the reconciliation issues are manually resolved. To mitigate this, we should detect persistently failing reconciliations and exclude them from influencing the optimization decision. ## Summary of Changes - A tenant shard reconciliation is now considered "keep-failing" if it fails 5 consecutive times. These failures are excluded from the optimization readiness check. - Added a new metric: `storage_controller_keep_failing_reconciles` to monitor such cases. - Added a warning log message when a reconciliation is marked as "keep-failing". --------- Co-authored-by: Aleksandr Sarantsev <aleksandr.sarantsev@databricks.com>	2025-07-03 16:21:36 +00:00
Heikki Linnakangas	96a817fa2b	Fix the case that storage auth token is _not_ used I broke that in previous commit while fixing the case of using a token.	2025-07-03 18:39:06 +03:00
Heikki Linnakangas	e7b057f2e8	Fix passing storage JWT token to the communicator process Makes the 'test_compute_auth_to_pageserver' test pass	2025-07-03 18:14:22 +03:00
Dmitrii Kovalkov	3ed28661b1	storcon: remote feature testing safekeeper quorum checks (#12459 ) ## Problem Previous PR didn't fix the creation of timeline in neon_local with <3 safekeepers because there is one more check down the stack. - Closes: https://github.com/neondatabase/neon/issues/12298 - Follow up on https://github.com/neondatabase/neon/pull/12378 ## Summary of changes - Remove feature `testing` safekeeper quorum checks from storcon --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2025-07-03 15:02:30 +00:00
Conrad Ludgate	03e604e432	Nightly lints and small tweaks (#12456 ) Let chains available in 1.88 :D new clippy lints coming up in future releases.	2025-07-03 14:47:12 +00:00
HaoyuHuang	4db934407a	SK changes #1 (#12448 ) ## TLDR This PR is a no-op. The changes are disabled by default. ## Problem I. Currently we don't have a way to detect disk I/O failures from WAL operations. II. We observe that the offloader fails to upload a segment due to race conditions on XLOG SWITCH and PG start streaming WALs. wal_backup task continously failing to upload a full segment while the segment remains partial on the disk. The consequence is that commit_lsn for all SKs move forward but backup_lsn stays the same. Then, all SKs run out of disk space. III. We have discovered SK bugs where the WAL offload owner cannot keep up with WAL backup/upload to S3, which results in an unbounded accumulation of WAL segment files on the Safekeeper's disk until the disk becomes full. This is a somewhat dangerous operation that is hard to recover from because the Safekeeper cannot write its control files when it is out of disk space. There are actually 2 problems here: 1. A single problematic timeline can take over the entire disk for the SK 2. Once out of disk, it's difficult to recover SK IV. Neon reports certain storage errors as "critical" errors using a marco, which will increment a counter/metric that can be used to raise alerts. However, this metric isn't sliced by tenant and/or timeline today. We need the tenant/timeline dimension to better respond to incidents and for blast radius analysis. ## Summary of changes I. The PR adds a `safekeeper_wal_disk_io_errors ` which is incremented when SK fails to create or flush WALs. II. To mitigate this issue, we will re-elect a new offloader if the current offloader is lagging behind too much. Each SK makes the decision locally but they are aware of each other's commit and backup lsns. The new algorithm is - determine_offloader will pick a SK. say SK-1. - Each SK checks -- if commit_lsn - back_lsn > threshold, -- -- remove SK-1 from the candidate and call determine_offloader again. SK-1 will step down and all SKs will elect the same leader again. After the backup is caught up, the leader will become SK-1 again. This also helps when SK-1 is slow to backup. I'll set the reelect backup lag to 4 GB later. Setting to 128 MB in dev to trigger the code more frequently. III. This change addresses problem no. 1 by having the Safekeeper perform a timeline disk utilization check check when processing WAL proposal messages from Postgres/compute. The Safekeeper now rejects the WAL proposal message, effectively stops writing more WAL for the timeline to disk, if the existing WAL files for the timeline on the SK disk exceeds a certain size (the default threshold is 100GB). The disk utilization is calculated based on a `last_removed_segno` variable tracked by the background task removing WAL files, which produces an accurate and conservative estimate (>= than actual disk usage) of the actual disk usage. IV. * Add a new metric `hadron_critical_storage_event_count` that has the `tenant_shard_id` and `timeline_id` as dimensions. * Modified the `crtitical!` marco to include tenant_id and timeline_id as additional arguments and adapted existing call sites to populate the tenant shard and timeline ID fields. The `critical!` marco invocation now increments the `hadron_critical_storage_event_count` with the extra dimensions. (In SK there isn't the notion of a tenant-shard, so just the tenant ID is recorded in lieu of tenant shard ID.) I considered adding a separate marco to avoid merge conflicts, but I think in this case (detecting critical errors) conflicts are probably more desirable so that we can be aware whenever Neon adds another `critical!` invocation in their code. --------- Co-authored-by: Chen Luo <chen.luo@databricks.com> Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com> Co-authored-by: William Huang <william.huang@databricks.com>	2025-07-03 14:32:53 +00:00
Heikki Linnakangas	956c2f4378	cargo fmt	2025-07-03 16:16:42 +03:00
Heikki Linnakangas	3293e4685e	Fix cases where pageserver gets stuck waiting for LSN The compute might make a request with an LSN that it hasn't even flushed yet.	2025-07-03 16:14:45 +03:00
Erik Grinaker	6f8650782f	Client tweaks	2025-07-03 14:54:23 +02:00
Erik Grinaker	14214eb853	Add client shard routing	2025-07-03 14:42:35 +02:00

1 2 3 4 5 ...

8404 Commits