
rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2025-12-22 21:59:59 +00:00

Author	SHA1	Message	Date
Christian Schwarz	7458d031b1	clippy: fix unfounded warning on macOS (#12501 ) Before this PR, macOS builds would get the clippy warning ``` warning: `tokio_epoll_uring::thread_local_system` does not refer to an existing function ``` The reason is that the `thread_local_system` function is only defined on Linux. Add `allow-invalid = true` to make macOS clippy pass, and manually verify that on Linux builds clippy still fails when we use it. refs - https://databricks.slack.com/archives/C09254R641L/p1751917655527099 Co-authored-by: Christian Schwarz <Christian Schwarz>	2025-07-08 13:59:45 +00:00
Aleksandr Sarantsev	38384c37ac	Make node deletion context-aware (#12494 ) ## Problem Deletion process does not calculate preferred nodes correctly - it doesn't consider current tenant-shard layout among all pageservers. ## Summary of changes - Added a schedule context calculation for node deletion Co-authored-by: Aleksandr Sarantsev <aleksandr.sarantsev@databricks.com>	2025-07-08 13:15:14 +00:00
Christian Schwarz	2b2a547671	fix(tests): periodic and immediate gc is effectively a no-op in tests (#12431 ) The introduction of the default lease deadline feature 9 months ago made it so that after PS restart, `.timeline_gc()` calls in Python tests are no-ops for 10 minutes after pageserver startup: the `gc_iteration()` bails with `Skipping GC because lsn lease deadline is not reached`. I did some impact analysis in the following PR. About 30 Python tests are affected: - https://github.com/neondatabase/neon/pull/12411 Rust tests that don't explicitly enable periodic GC or invoke GC manually are unaffected because we disable periodic GC by default in the `TenantHarness`'s tenant config. Two tests explicitly did `start_paused=true` + `tokio::time::advance()`, but it would add cognitive and code bloat to each existing and future test case that uses TenantHarness if we took that route. So, this PR sets the default lease deadline in both Python and Rust tests to zero. The tests that exercise the feature were identified because they failed with this change: - Python test `test_readonly_node_gc` + `test_lsn_lease_size` - Rust test `test_lsn_lease`. To accomplish the above, I changed the code that computes the initial lease deadline to respect the pageserver.toml's default tenant config, which it didn't before (which I would consider a bug). The Python test harness and the Rust TenantHarness test harness then simply set the default tenant config field to zero. Drive-by: - `test_lsn_lease_size` was writing a lot of data unnecessarily; reduce the amount and speed up the test refs - PR that introduced default lease deadline: https://github.com/neondatabase/neon/pull/9055/files - fixes https://databricks.atlassian.net/browse/LKB-92 --------- Co-authored-by: Christian Schwarz <Christian Schwarz>	2025-07-08 12:56:22 +00:00
a-masterov	59e393aef3	Enable parallel execution of extension tests (#12118 ) ## Problem Extension tests were previously run sequentially, resulting in unnecessary wait time and underutilization of available CPU cores. ## Summary of changes Tests are now executed in a customizable number of parallel threads using separate database branches. This reduces overall test time by approximately 50% (e.g., on my laptop, the parallel run takes 173s, while the sequential one takes 340s) and increases the load on the pageserver, providing better test coverage. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Alexey Masterov <alexey.masterov@databricks.com>	2025-07-08 11:28:39 +00:00
Peter Bendel	f51ed4a2c4	"disable" disk eviction in pagebench periodic benchmark (#12487 ) ## Problem https://github.com/neondatabase/neon/pull/12464 introduced new defaults for pageserver disk-based eviction, which activated disk-based eviction for the periodic pagebench benchmark. This caused the test case to fail. ## Summary of changes Override the new defaults during test case execution. ## Test run https://github.com/neondatabase/neon/actions/runs/16120217757/job/45483869734 Test run was successful, so merging this now	2025-07-08 09:38:06 +00:00
Mikhail	4f16ab3f56	add lfc offload and prewarm error metrics (#12486 ) Add `compute_ctl_lfc_prewarm_errors_total` and `compute_ctl_lfc_offload_errors_total` metrics. Add comments in `test_lfc_prewarm`. Correction PR for https://github.com/neondatabase/neon/pull/12447 https://github.com/neondatabase/cloud/issues/19011	2025-07-08 09:34:01 +00:00
Dmitrii Kovalkov	18796fd1dd	tests: more allowed errors for test_safekeeper_migration (#12495 ) ## Problem Pageserver now writes errors in the log during the safekeeper migration. Some errors are added to allowed errors, but "timeline not found in global map" is not. - Will be properly fixed in https://github.com/neondatabase/neon/issues/12191 ## Summary of changes Add "timeline not found in global map" error in a list of allowed errors in `test_safekeeper_migration_simple`	2025-07-08 09:15:29 +00:00
Aleksandr Sarantsev	2f3fc7cb57	Fix keep-failing reconciles test & add logs (#12497 ) ## Problem Test is flaky due to the following warning in the logs: ``` Keeping extra secondaries: can't determine which of [NodeId(1), NodeId(2)] to remove (some nodes offline?) ``` Some nodes being offline is expected behavior in this test. ## Summary of changes - Added `Keeping extra secondaries` to the list of allowed errors - Improved logging for better debugging experience Co-authored-by: Aleksandr Sarantsev <aleksandr.sarantsev@databricks.com>	2025-07-08 08:51:50 +00:00
Folke Behrens	e65d5f7369	proxy: Remove the endpoint filter cache (#12488 ) ## Problem The endpoint filter cache is still unused because it's not yet reliable enough to be used. It only consumes a lot of memory. ## Summary of changes Remove the code. Needs a new design. neondatabase/cloud#30634	2025-07-07 17:46:33 +00:00
Conrad Ludgate	55aef2993d	introduce a JSON serialization lib (#12417 ) See #11992 and #11961 for some examples of use cases. This introduces a JSON serialization lib, designed for more flexibility than serde_json offers. ## Dynamic construction Sometimes you have dynamic values you want to serialize that are not already in a serde-aware model like a struct or a Vec etc. To achieve this with serde, you need to implement a lot of different traits on a lot of different new-types. Because of this, it's often easier to give in and pull all the data into a serde-aware model (serde_json::Value or some intermediate struct), but that is often not very efficient. This crate allows full control over the JSON encoding without needing to implement any extra traits. Just call the relevant functions, and it will guarantee a correctly encoded JSON value. ## Async construction Similar to the above, sometimes the values arrive asynchronously. Often collecting those values in memory is more expensive than writing them as JSON, since the overheads of `Vec` and `String` are much higher; however, there are exceptions. Serializing to JSON all in one go is also more CPU intensive and can cause lag spikes, whereas serializing values incrementally spreads out the CPU load and reduces lag.	2025-07-07 15:12:02 +00:00
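The incremental approach described above can be illustrated with a minimal, std-only Rust sketch. It is not the API of the crate added in #12417. `JsonWriter`, `string`, `string_array`, and `finish` are hypothetical names. The sketch encodes values straight into one output buffer as an iterator yields them, with string escaping following RFC 8259, instead of first building a `serde_json::Value`.

```rust
use std::fmt::Write;

/// Hypothetical incremental JSON writer (not the crate's real API):
/// values are encoded directly into `out` as they are produced.
#[derive(Default)]
pub struct JsonWriter {
    out: String,
}

impl JsonWriter {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends a JSON string literal, escaping characters as RFC 8259 requires.
    pub fn string(&mut self, s: &str) -> &mut Self {
        self.out.push('"');
        for c in s.chars() {
            match c {
                '"' => self.out.push_str("\\\""),
                '\\' => self.out.push_str("\\\\"),
                '\n' => self.out.push_str("\\n"),
                '\r' => self.out.push_str("\\r"),
                '\t' => self.out.push_str("\\t"),
                // Other control characters must use the \uXXXX form.
                c if (c as u32) < 0x20 => {
                    // Writing into a String cannot fail.
                    write!(self.out, "\\u{:04x}", c as u32).unwrap();
                }
                c => self.out.push(c),
            }
        }
        self.out.push('"');
        self
    }

    /// Writes a JSON array of strings, encoding each item as the iterator
    /// yields it, so the items never need to be collected into a `Vec` first.
    pub fn string_array<'a, I>(&mut self, items: I) -> &mut Self
    where
        I: IntoIterator<Item = &'a str>,
    {
        self.out.push('[');
        for (i, item) in items.into_iter().enumerate() {
            if i > 0 {
                self.out.push(',');
            }
            self.string(item);
        }
        self.out.push(']');
        self
    }

    /// Consumes the writer and returns the encoded JSON text.
    pub fn finish(self) -> String {
        self.out
    }
}
```

Each item is escaped and appended as it arrives, so memory use is bounded by the output buffer rather than by an intermediate tree of values.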
Erik Grinaker	1eef961f09	pageserver: add gRPC error logging (#12445 ) ## Problem We don't log gRPC request errors on the server. Touches #11728. ## Summary of changes Automatically log non-OK gRPC response statuses in the observability middleware, and add corresponding logging for the `get_pages` stream. Also adds the peer address and gRPC method to the gRPC tracing span. Example output: ``` 2025-07-02T20:18:16.813718Z WARN grpc:pageservice{peer=127.0.0.1:56698 method=CheckRelExists tenant_id=c7b45faa1924b1958f05c5fdee8b0d04 timeline_id=4a36ee64fd2f97781b9dcc2c3cddd51b shard_id=0000}: request failed with NotFound: Tenant c7b45faa1924b1958f05c5fdee8b0d04 not found ```	2025-07-07 12:24:06 +00:00
Dmitrii Kovalkov	fc10bb9438	storage: rename term -> last_log_term in TimelineMembershipSwitchResponse (#12481 ) ## Problem Names are not consistent between safekeeper migration RFC and the actual implementation. It's not used anywhere in production yet, so it's safe to rename. We don't need to worry about backward compatibility. - Follow up on https://github.com/neondatabase/neon/pull/12432 ## Summary of changes - rename term -> last_log_term in TimelineMembershipSwitchResponse - add missing fields to TimelineMembershipSwitchResponse in python	2025-07-07 09:22:03 +00:00
Dmitrii Kovalkov	4b5c75b52f	docs: revise safekeeper migration rfc (#12432 ) ## Problem The safekeeper migration code/logic slightly diverges from the initial RFC. This PR aims to address these differences. - Part of https://github.com/neondatabase/neon/issues/12192 ## Summary of changes - Adjust the RFC to reflect that we implemented the safekeeper reconciler with in-memory queue. - Add `sk_set_notified_generation` field to the `timelines` table in the RFC to address the "finish migration atomically" problem. - Describe how we are going to make the timeline migration handler fully retriable with in-memory reconciler queue. - Unify type/field/method names in the code and RFC. - Fix typos	2025-07-07 07:25:15 +00:00
Peter Bendel	ca9d8761ff	Move some perf benchmarks from hetzner to aws arm github runners (#12393 ) ## Problem We want to move some benchmarks from hetzner runners to aws graviton runners ## Summary of changes Adjust the runner labels for some workflows. Adjust the pagebench number of clients to match the latency knee at 8 cores of the new instance type. Add `--security-opt seccomp=unconfined` to docker run command to bypass IO_URING EPERM error. ## New runners https://us-east-2.console.aws.amazon.com/ec2/home?region=us-east-2#Instances:instanceState=running;search=:github-unit-perf-runner-arm;v=3;$case=tags:true%5C,client:false;$regex=tags:false%5C,client:false;sort=tag:Name ## Important Notes I added the run-benchmarks label to get this tested before we merge it. [See](https://github.com/neondatabase/neon/actions/runs/15974141360) I also tested a run of pagebench with the new setup from this branch, see https://github.com/neondatabase/neon/actions/runs/15972523054 - Update: the benchmarking workflow had failures, [see](https://github.com/neondatabase/neon/actions/runs/15974141360/job/45055897591) - changed docker run command to avoid io_uring EPERM error, new run [see](https://github.com/neondatabase/neon/actions/runs/15997965633/job/45125689920?pr=12393) Update: the pagebench test run on the new runner [completed successfully](https://github.com/neondatabase/neon/actions/runs/15972523054/job/45046772556) Update 2025-07-07: the latest runs with instance store ext4 have been successful and resolved the direct I/O issues we had been seeing before in some runs. We only had one perf testcase failing (shard split) that had been flaky before. So I think we can merge this now. ## Follow up if this is merged and works successfully we must create a separate issue to de-provision the hetzner unit-perf runners defined [here](`91a41729af/ansible/inventory/hosts_metal (L111)`)	2025-07-07 06:44:41 +00:00
Heikki Linnakangas	b568189f7b	Build dummy libcommunicator into the 'neon' extension (#12266 ) This doesn't do anything interesting yet, but demonstrates linking Rust code to the neon Postgres extension, so that we can review and test drive just the build process changes independently.	2025-07-04 23:27:28 +00:00
Arpad Müller	b94a5ce119	Don't await the walreceiver on timeline shutdown (#12402 ) Mostly a revert of https://github.com/neondatabase/neon/pull/11851 and https://github.com/neondatabase/neon/pull/12330 . Christian suggested reverting his PR to fix the issue https://github.com/neondatabase/neon/issues/12369 . Alternatives considered: 1. I originally wanted to introduce cancellation tokens to `RequestContext`, but in the end I gave up on them because I didn't find a select-free way of preventing `test_layer_download_cancelled_by_config_location` from hanging. Namely if I put a select around the `get_or_maybe_download` invocation in `get_values_reconstruct_data`, it wouldn't hang, but if I put it around the `download_init_and_wait` invocation in `get_or_maybe_download`, the test would still hang. Not sure why, even though I made the attached child function of the `RequestContext` create a child token. 2. Introduction of a `download_cancel` cancellation token as a child of a timeline token, putting it into `RemoteTimelineClient` together with the main token, and then putting it into the whole `RemoteTimelineClient` read path. 3. Greater refactorings, like making cancellation tokens follow a DAG structure so that tokens can be cancelled either by, say, the timeline shutting down or a request ending. Not only does this represent effort that we don't have the engineering budget for, it also raises interesting questions like what to do about batching (do you cancel the entire request if only some requests get cancelled?). We might see a reemergence of https://github.com/neondatabase/neon/issues/11762, but given that we have https://github.com/neondatabase/neon/pull/11853 and https://github.com/neondatabase/neon/pull/12376 now, it is possible that it will not come back. Looking at some code, it might actually fix the locations where the error pops up. Let's see. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2025-07-04 20:12:10 +00:00
Mikhail	7ed4530618	`offload_lfc_interval_seconds` in ComputeSpec (#12447 ) - Add ComputeSpec flag `offload_lfc_interval_seconds` controlling whether LFC should be offloaded to endpoint storage. Default value (None) means "don't offload". - Add glue code around it for `neon_local` and integration tests. - Add `autoprewarm` mode for `test_lfc_prewarm` testing `offload_lfc_interval_seconds` and `autoprewarm` flags in conjunction. - Rename `compute_ctl_lfc_prewarm_requests_total` and `compute_ctl_lfc_offload_requests_total` to `compute_ctl_lfc_prewarms_total` and `compute_ctl_lfc_offloads_total` to reflect that we count prewarms and offloads, not `compute_ctl` requests for them. Don't count a request in the metrics if a prewarm/offload is already ongoing. https://github.com/neondatabase/cloud/issues/19011 Resolves: https://github.com/neondatabase/cloud/issues/30770	2025-07-04 18:49:57 +00:00
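Here is a minimal sketch of the two behaviours in this commit, assuming a simplified spec struct. `None` disables periodic offload, and the counter is only incremented when an offload actually starts, not for every request. `LfcSpec`, `offload_period`, and `OffloadState` are hypothetical names. The atomic counter only stands in for the `compute_ctl_lfc_offloads_total` metric.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical slice of the compute spec; `None` means "don't offload".
pub struct LfcSpec {
    pub offload_lfc_interval_seconds: Option<u64>,
}

/// Offload period, or `None` when periodic LFC offload is disabled.
pub fn offload_period(spec: &LfcSpec) -> Option<Duration> {
    spec.offload_lfc_interval_seconds.map(Duration::from_secs)
}

/// Hypothetical offload bookkeeping; `offloads_total` stands in for the
/// `compute_ctl_lfc_offloads_total` metric.
#[derive(Default)]
pub struct OffloadState {
    ongoing: AtomicBool,
    offloads_total: AtomicU64,
}

impl OffloadState {
    /// Starts an offload unless one is already running. Only offloads that
    /// actually start are counted, so rejected requests don't inflate the metric.
    pub fn try_start(&self) -> bool {
        let started = self
            .ongoing
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok();
        if started {
            self.offloads_total.fetch_add(1, Ordering::Relaxed);
        }
        started
    }

    /// Marks the running offload as finished.
    pub fn finish(&self) {
        self.ongoing.store(false, Ordering::Release);
    }

    pub fn offloads_total(&self) -> u64 {
        self.offloads_total.load(Ordering::Relaxed)
    }
}
```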
Heikki Linnakangas	3a44774227	impr(ci): Simplify build-macos workflow, prepare for rust communicator (#12357 ) Don't build walproposer-lib as a separate job. It only takes a few seconds, after you have built all its dependencies. Don't cache the Neon Pg extensions in the per-postgres-version caches. This is in preparation for the communicator project, which will introduce Rust parts to the Neon Pg extension, which complicates the build process. With that, the 'make neon-pg-ext' step requires some of the Rust bits to be built already, or it will build them on the spot, which in turn requires all the Rust sources to be present, and we don't want to repeat that part for each Postgres version anyway. To prepare for that, rely on "make all" to build the neon extension and the rust bits in the correct order instead. Building the neon extension doesn't currently take very long anyway after you have built Postgres itself, so you don't gain much by caching it. See https://github.com/neondatabase/neon/pull/12266. Add an explicit "rustup update" step to update the toolchain. It's not strictly necessary right now, because currently "make all" will only invoke "cargo build" once and the race condition described in the comment doesn't happen. But prepare for the future. To further simplify the build, get rid of the separate 'build-postgres' jobs too, and just build Postgres as a step in the main job. That makes the overall workflow run longer, because we no longer build all the postgres versions in parallel (although you still get intra-runner parallelism thanks to `make -j`), but that's acceptable. In the cache-hit case, it might even be a little faster because there is less overhead from launching jobs, and in the cache-miss case, it's maybe 5-10 minutes slower altogether. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-07-04 15:34:58 +00:00
Aleksandr Sarantsev	b2705cfee6	storcon: Make node deletion process cancellable (#12320 ) ## Problem The current deletion operation is synchronous and blocking, which is unsuitable for potentially long-running tasks. In such cases, the standard HTTP request-response pattern is not a good fit. ## Summary of Changes - Added new `storcon_cli` commands: `NodeStartDelete` and `NodeCancelDelete` to initiate and cancel deletion asynchronously. - Added corresponding `storcon` HTTP handlers to support the new start/cancel deletion flow. - Introduced a new type of background operation: `Delete`, to track and manage the deletion process outside the request lifecycle. --------- Co-authored-by: Aleksandr Sarantsev <aleksandr.sarantsev@databricks.com>	2025-07-04 14:08:09 +00:00
Trung Dinh	225267b3ae	Make disk eviction run by default (#12464 ) ## Problem ## Summary of changes Provide a sane set of default values for disk_usage_based_eviction. Closes https://github.com/neondatabase/neon/issues/12301.	2025-07-04 12:06:10 +00:00
Vlad Lazar	d378726e38	pageserver: reset the broker subscription if it's been idle for a while (#12436 ) ## Problem I suspect that the pageservers get stuck on receiving broker updates. ## Summary of changes This is an opportunistic (staging only) patch that resets the subscription stream if it's been idle for a while. This won't go to prod in this form. I'll revert or update it before Friday.	2025-07-04 10:25:03 +00:00
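The idle-reset idea can be sketched as a small watchdog that the subscription loop consults after every received message or poll. This is a hypothetical std-only illustration, not the patch itself. `IdleWatchdog` is an invented name, and the PR does not say which timeout it uses. Time is passed in explicitly so the logic can be tested without sleeping.

```rust
use std::time::{Duration, Instant};

/// Hypothetical idle tracker for a streaming subscription: once no message
/// has arrived for longer than `idle_timeout`, the caller should drop the
/// stream and subscribe again.
pub struct IdleWatchdog {
    idle_timeout: Duration,
    last_message: Instant,
}

impl IdleWatchdog {
    pub fn new(idle_timeout: Duration, now: Instant) -> Self {
        IdleWatchdog { idle_timeout, last_message: now }
    }

    /// Record that a message was received at `now`.
    pub fn on_message(&mut self, now: Instant) {
        self.last_message = now;
    }

    /// Returns true if the stream has been idle for longer than the timeout.
    /// The timer restarts so the fresh subscription gets a full window.
    pub fn should_reset(&mut self, now: Instant) -> bool {
        if now.saturating_duration_since(self.last_message) > self.idle_timeout {
            self.last_message = now;
            true
        } else {
            false
        }
    }
}
```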
Konstantin Knizhnik	436a117c15	Do not allocate anything in subtransaction memory context (#12176 ) ## Problem See https://github.com/neondatabase/neon/issues/12173 ## Summary of changes Allocate table in TopTransactionMemoryContext --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-07-04 10:24:39 +00:00
Alex Chi Z.	cc699f6f85	fix(pageserver): do not log no-route-to-host errors (#12468 ) ## Problem close https://github.com/neondatabase/neon/issues/12344 ## Summary of changes Add `HostUnreachable` and `NetworkUnreachable` to expected I/O error. This was new in Rust 1.83. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-03 21:57:42 +00:00
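The following sketch shows how such an "expected error" check can look with `std::io::ErrorKind`. `HostUnreachable` and `NetworkUnreachable` are the two kinds named in the PR, and both became stable in Rust 1.83. The other kinds listed and the function name `is_expected_io_error` are illustrative assumptions, not the pageserver's actual list.

```rust
use std::io;

/// Hypothetical helper: I/O errors that are expected during normal network
/// churn and therefore should not be logged as errors.
pub fn is_expected_io_error(e: &io::Error) -> bool {
    matches!(
        e.kind(),
        // Illustrative connection-level kinds (an assumption, not Neon's list).
        io::ErrorKind::ConnectionRefused
            | io::ErrorKind::ConnectionReset
            | io::ErrorKind::ConnectionAborted
            | io::ErrorKind::BrokenPipe
            // The two kinds added by the PR; stable since Rust 1.83.
            | io::ErrorKind::HostUnreachable
            | io::ErrorKind::NetworkUnreachable
    )
}
```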
Konstantin Knizhnik	495112ca50	Add GUC for dynamically enabling compare local mode (#12424 ) ## Problem DEBUG_LOCAL_COMPARE mode makes it possible to detect data corruption, but it requires rebuilding the neon extension (and therefore a special image) and significantly slows down execution because it always fetches pages from the pageserver. ## Summary of changes Introduce a new GUC `neon.debug_compare_local`, accepting the following values: "none", "prefetch", "lfc", "all" (disabled by default). In modes below "all", the neon SMGR does not fetch a page from the PS if it is found in local caches. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-07-03 17:37:05 +00:00
Suhas Thalanki	46158ee63f	fix(compute): background installed extensions worker would collect data without waiting for interval (#12465 ) ## Problem The background installed extensions worker relied on `interval.tick()` to go to sleep for a period of time. Because the interval was updated at the end of the loop and the first tick of an interval is [instantaneous](https://docs.rs/tokio/latest/tokio/time/struct.Interval.html#method.tick), this could lead to bugs. ## Summary of changes Changed it to a `tokio::time::sleep` to prevent this issue. Now the worker sleeps and only wakes up after the specified duration.	2025-07-03 17:10:30 +00:00
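The behavioural difference can be modelled on a virtual clock without depending on tokio. One reading of "updated at the end of the loop" is that a fresh `Interval` is built on every iteration. Each interval's first `tick()` completes immediately, so such a loop never waits. The sketch below compares that with sleeping a full period before each collection. `collection_times` is a hypothetical helper. Placing the sleep before the collection is an assumption based on the commit title.

```rust
/// Hypothetical model of the worker loop on a virtual clock (seconds).
///
/// `recreate_interval = true` models building a fresh tokio `Interval` at the
/// end of every iteration: the first `tick()` of each fresh interval completes
/// immediately, so the loop never waits between collections.
/// `recreate_interval = false` models calling `sleep(period)` before each
/// collection.
pub fn collection_times(period: u64, work: u64, runs: usize, recreate_interval: bool) -> Vec<u64> {
    let mut now = 0;
    let mut times = Vec::with_capacity(runs);
    for _ in 0..runs {
        if !recreate_interval {
            now += period; // sleep for a full period first
        }
        times.push(now); // collect installed-extensions data
        now += work; // time spent collecting
    }
    times
}
```

With a 60-second period and 1 second of work per run, the recreated-interval loop collects at t = 0, 1, 2. The sleep-based loop collects at t = 60, 121, 182.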
Alex Chi Z.	305fe61ac1	fix(pageserver): also print open layer size in backpressure (#12440 ) ## Problem To better investigate memory usage during backpressure ## Summary of changes Print open layer size if backpressure is activated Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-03 16:37:11 +00:00
Vlad Lazar	f95fdf5b44	pageserver: fix duplicate tombstones in ancestor detach (#12460 ) ## Problem Ancestor detach from a previously detached parent when there were no writes panics since it tries to upload the tombstone layer twice. ## Summary of Changes If we're gonna copy the tombstone from the ancestor, don't bother creating it. Fixes https://github.com/neondatabase/neon/issues/12458	2025-07-03 16:35:46 +00:00
Arpad Müller	a852bc5e39	Add new activating scheduling policy for safekeepers (#12441 ) When deploying new safekeepers, we don't immediately want to send traffic to them. Maybe they are not ready yet by the time the deploy script is registering them with the storage controller. For pageservers, the storcon solves the problem by not scheduling stuff to them unless there has been a positive heartbeat response. We can't do the same for safekeepers though, otherwise a single down safekeeper would mean we can't create new timelines in smaller regions where there are only three safekeepers in total. So far we have created safekeepers as `pause` but this adds a manual step to safekeeper deployment which is prone to oversight. We want things to be automated. So we introduce a new state `activating` that acts just like `pause`, except that we automatically transition the policy to `active` once we get a positive heartbeat from the safekeeper. For `pause`, we always keep the safekeeper paused.	2025-07-03 16:27:43 +00:00
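The policy transitions described above can be sketched as a small state machine. This is a simplified, hypothetical copy of the enum. The storage controller's real type may have more variants and different names. The sketch only captures three rules: `Activating` schedules like `Pause`, it flips to `Active` after the first positive heartbeat, and `Pause` never changes on its own.

```rust
/// Simplified, hypothetical copy of the safekeeper scheduling policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SkSchedulingPolicy {
    Active,
    Pause,
    Activating,
}

impl SkSchedulingPolicy {
    /// New timelines may only be placed on `Active` safekeepers;
    /// `Activating` behaves exactly like `Pause` for scheduling.
    pub fn schedulable(self) -> bool {
        matches!(self, SkSchedulingPolicy::Active)
    }

    /// Policy after a heartbeat: only `Activating` moves to `Active` on a
    /// positive response; `Pause` stays paused until an operator changes it.
    pub fn after_heartbeat(self, heartbeat_ok: bool) -> Self {
        match (self, heartbeat_ok) {
            (SkSchedulingPolicy::Activating, true) => SkSchedulingPolicy::Active,
            (policy, _) => policy,
        }
    }
}
```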
Aleksandr Sarantsev	b96983a31c	storcon: Ignore keep-failing reconciles (#12391 ) ## Problem Currently, if `storcon` (storage controller) reconciliations repeatedly fail, the system will indefinitely freeze optimizations. This can result in optimization starvation for several days until the reconciliation issues are manually resolved. To mitigate this, we should detect persistently failing reconciliations and exclude them from influencing the optimization decision. ## Summary of Changes - A tenant shard reconciliation is now considered "keep-failing" if it fails 5 consecutive times. These failures are excluded from the optimization readiness check. - Added a new metric: `storage_controller_keep_failing_reconciles` to monitor such cases. - Added a warning log message when a reconciliation is marked as "keep-failing". --------- Co-authored-by: Aleksandr Sarantsev <aleksandr.sarantsev@databricks.com>	2025-07-03 16:21:36 +00:00
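Here is a minimal sketch of the keep-failing rule, assuming failures are counted per tenant shard and reset by any successful reconcile. `ReconcileFailures` and its methods are hypothetical names, and shard IDs are plain strings. Only the threshold of 5 consecutive failures comes from the PR.

```rust
use std::collections::HashMap;

/// Threshold from the PR: 5 consecutive failures mark a shard "keep-failing".
const KEEP_FAILING_THRESHOLD: u32 = 5;

/// Hypothetical tracker of consecutive reconcile failures per tenant shard.
#[derive(Default)]
pub struct ReconcileFailures {
    consecutive: HashMap<String, u32>,
}

impl ReconcileFailures {
    /// Record a reconcile outcome; a success resets the shard's counter.
    pub fn record(&mut self, shard: &str, ok: bool) {
        if ok {
            self.consecutive.remove(shard);
        } else {
            *self.consecutive.entry(shard.to_string()).or_insert(0) += 1;
        }
    }

    pub fn is_keep_failing(&self, shard: &str) -> bool {
        self.consecutive.get(shard).copied().unwrap_or(0) >= KEEP_FAILING_THRESHOLD
    }

    /// Failing reconciles block optimizations, except for keep-failing shards,
    /// which are ignored so they cannot starve optimization indefinitely.
    pub fn ready_for_optimization(&self) -> bool {
        self.consecutive.values().all(|&n| n >= KEEP_FAILING_THRESHOLD)
    }
}
```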
Dmitrii Kovalkov	3ed28661b1	storcon: remove feature testing safekeeper quorum checks (#12459 ) ## Problem Previous PR didn't fix the creation of timeline in neon_local with <3 safekeepers because there is one more check down the stack. - Closes: https://github.com/neondatabase/neon/issues/12298 - Follow up on https://github.com/neondatabase/neon/pull/12378 ## Summary of changes - Remove feature `testing` safekeeper quorum checks from storcon --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2025-07-03 15:02:30 +00:00
Conrad Ludgate	03e604e432	Nightly lints and small tweaks (#12456 ) Let chains are available in 1.88 :D Also addresses new clippy lints coming up in future releases.	2025-07-03 14:47:12 +00:00
HaoyuHuang	4db934407a	SK changes #1 (#12448 ) ## TLDR This PR is a no-op. The changes are disabled by default. ## Problem I. Currently we don't have a way to detect disk I/O failures from WAL operations. II. We observe that the offloader fails to upload a segment due to race conditions on XLOG SWITCH and PG starting to stream WAL. The wal_backup task continuously fails to upload a full segment while the segment remains partial on the disk. The consequence is that commit_lsn for all SKs moves forward but backup_lsn stays the same. Then, all SKs run out of disk space. III. We have discovered SK bugs where the WAL offload owner cannot keep up with WAL backup/upload to S3, which results in an unbounded accumulation of WAL segment files on the Safekeeper's disk until the disk becomes full. This is a somewhat dangerous situation that is hard to recover from because the Safekeeper cannot write its control files when it is out of disk space. There are actually 2 problems here: 1. A single problematic timeline can take over the entire disk for the SK 2. Once out of disk, it's difficult to recover SK IV. Neon reports certain storage errors as "critical" errors using a macro, which will increment a counter/metric that can be used to raise alerts. However, this metric isn't sliced by tenant and/or timeline today. We need the tenant/timeline dimension to better respond to incidents and for blast radius analysis. ## Summary of changes I. The PR adds a `safekeeper_wal_disk_io_errors` metric which is incremented when SK fails to create or flush WALs. II. To mitigate this issue, we will re-elect a new offloader if the current offloader is lagging behind too much. Each SK makes the decision locally but they are aware of each other's commit and backup lsns. The new algorithm is - determine_offloader will pick an SK, say SK-1. - Each SK checks -- if commit_lsn - backup_lsn > threshold, -- -- remove SK-1 from the candidates and call determine_offloader again.
SK-1 will step down and all SKs will agree on the same new leader. After the backup is caught up, the leader will become SK-1 again. This also helps when SK-1 is slow to back up. I'll set the reelect backup lag to 4 GB later. It is set to 128 MB in dev to trigger the code more frequently. III. This change addresses problem no. 1 by having the Safekeeper perform a timeline disk utilization check when processing WAL proposal messages from Postgres/compute. The Safekeeper now rejects the WAL proposal message, effectively stopping further WAL writes for the timeline, if the existing WAL files for the timeline on the SK disk exceed a certain size (the default threshold is 100GB). The disk utilization is calculated based on a `last_removed_segno` variable tracked by the background task removing WAL files, which produces an accurate and conservative estimate (>= actual disk usage) of the actual disk usage. IV. * Add a new metric `hadron_critical_storage_event_count` that has the `tenant_shard_id` and `timeline_id` as dimensions. * Modified the `critical!` macro to include tenant_id and timeline_id as additional arguments and adapted existing call sites to populate the tenant shard and timeline ID fields. The `critical!` macro invocation now increments the `hadron_critical_storage_event_count` with the extra dimensions. (In SK there isn't the notion of a tenant-shard, so just the tenant ID is recorded in lieu of tenant shard ID.) I considered adding a separate macro to avoid merge conflicts, but I think in this case (detecting critical errors) conflicts are probably more desirable so that we can be aware whenever Neon adds another `critical!` invocation in their code. --------- Co-authored-by: Chen Luo <chen.luo@databricks.com> Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com> Co-authored-by: William Huang <william.huang@databricks.com>	2025-07-03 14:32:53 +00:00
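The re-election step from part II can be sketched as follows. The real `determine_offloader` uses its own selection rule. Here it is replaced by an assumed rule: highest flush LSN wins, with ties broken by lowest id. `Peer` and `elect_offloader` are hypothetical names. Every safekeeper evaluates the same function on the same shared commit and backup LSNs, so they all reach the same answer without extra coordination. The PR mentions a threshold of 4 GB in production and 128 MB in dev.

```rust
/// Simplified view of a safekeeper as seen by its peers (illustrative fields).
#[derive(Debug, Clone, Copy)]
pub struct Peer {
    pub id: u64,
    pub flush_lsn: u64,
}

/// Stand-in election rule (an assumption, not Neon's): highest flush LSN wins,
/// ties are broken by the lowest id.
fn determine_offloader(candidates: &[Peer]) -> Option<u64> {
    candidates
        .iter()
        .max_by(|a, b| a.flush_lsn.cmp(&b.flush_lsn).then(b.id.cmp(&a.id)))
        .map(|p| p.id)
}

/// If the timeline's backup lags commit_lsn by more than `reelect_lag` bytes,
/// exclude the elected offloader and elect again among the rest. Once the
/// backup catches up, the first pick is elected again.
pub fn elect_offloader(peers: &[Peer], commit_lsn: u64, backup_lsn: u64, reelect_lag: u64) -> Option<u64> {
    let first = determine_offloader(peers)?;
    if commit_lsn.saturating_sub(backup_lsn) <= reelect_lag {
        return Some(first);
    }
    let rest: Vec<Peer> = peers.iter().copied().filter(|p| p.id != first).collect();
    // Keep the original pick if nobody else is available.
    determine_offloader(&rest).or(Some(first))
}
```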
Ruslan Talpa	95e1011cd6	subzero pre-integration refactor (#12416 ) ## Problem integrating subzero requires a bit of refactoring. To make the integration PR a bit more manageable, the refactoring is done in this separate PR. ## Summary of changes * move common types/functions used in sql_over_http to errors.rs and http_util.rs * add the "Local" auth backend to proxy (similar to local_proxy), useful in local testing * change the Connect and Send type for the http client to allow for custom body when making post requests to local_proxy from the proxy --------- Co-authored-by: Ruslan Talpa <ruslan.talpa@databricks.com>	2025-07-03 11:04:08 +00:00
Conrad Ludgate	1bc1eae5e8	fix redis credentials check (#12455 ) ## Problem `keep_connection` does not exit, so it was never setting `credentials_refreshed`. ## Summary of changes Set `credentials_refreshed` to true when we first establish a connection, and after we re-authenticate the connection.	2025-07-03 09:51:35 +00:00
Matthias van de Meent	e12d4f356a	Work around Clap's incorrect usage of Display for default_value_t (#12454 ) ## Problem #12450 ## Summary of changes Instead of `#[arg(default_value_t = typed_default_value)]`, we use `#[arg(default_value = "str that deserializes into the value")]`, because apparently you can't convince clap to _not_ deserialize from the Display implementation of an imported enum.	2025-07-03 09:41:09 +00:00
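clap's `default_value_t` renders the typed default with `Display`. It then feeds that string back through the argument's value parser, so it only works when the `Display` output round-trips. The std-only sketch below reproduces that kind of mismatch without clap. `Mode` is a hypothetical enum whose `Display` is human-readable, while its `FromStr` only accepts kebab-case names.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical imported enum whose `Display` is meant for humans.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    FastPath,
    SlowPath,
}

impl fmt::Display for Mode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Human-readable text, e.g. for log messages.
        f.write_str(match self {
            Mode::FastPath => "fast path",
            Mode::SlowPath => "slow path",
        })
    }
}

impl FromStr for Mode {
    type Err = String;

    // The CLI parser only accepts kebab-case names.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "fast-path" => Ok(Mode::FastPath),
            "slow-path" => Ok(Mode::SlowPath),
            other => Err(format!("unknown mode: {other}")),
        }
    }
}

/// What `default_value_t` effectively relies on: Display, then parse back.
pub fn round_trips(m: Mode) -> bool {
    m.to_string().parse::<Mode>() == Ok(m)
}
```

Passing an already-parseable string, as in `default_value = "fast-path"`, sidesteps the `Display` round trip. That is the workaround this commit applies.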
Folke Behrens	3415b90e88	proxy/logging: Add "ep" and "query_id" to list of extracted fields (#12437 ) Extract two more interesting fields from spans: ep (endpoint) and query_id. Useful for reliable filtering in logging.	2025-07-03 08:09:10 +00:00
Conrad Ludgate	e01c8f238c	[proxy] update noisy error logging (#12438 ) Health checks for pg-sni-router open a TCP connection and immediately close it again. This is noisy. We will filter out any EOF errors on the first message. "acquired permit" debug log is incorrect since it logs when we timed out as well. This fixes the debug log.	2025-07-03 07:46:48 +00:00
Conrad Ludgate	45607cbe0c	[local_proxy]: ignore TLS for endpoint (#12316 ) ## Problem When local proxy is configured with TLS, the certificate does not match the endpoint string. This currently returns an error. ## Summary of changes I don't think this code is necessary anymore, taking the prefix from the hostname is good enough (and is equivalent to what `endpoint_sni` was doing) and we ignore checking the domain suffix.	2025-07-03 07:35:57 +00:00
Tristan Partin	8b4fbefc29	Patch pgaudit to disable logging in parallel workers (#12325 ) We want to turn logging in parallel workers off to reduce log amplification in queries which use parallel workers. Part-of: https://github.com/neondatabase/cloud/issues/28483 Signed-off-by: Tristan Partin <tristan.partin@databricks.com>	2025-07-02 19:54:47 +00:00
Alex Chi Z.	a9a51c038b	rfc: storage feature flags (#11805 ) ## Problem Part of https://github.com/neondatabase/neon/issues/11813 ## Summary of changes --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-02 17:41:36 +00:00
Alexey Kondratov	44121cc175	docs(compute): RFC for compute rolling restart with prewarm (#11294 ) ## Problem Neon currently implements several features that guarantee high uptime of compute nodes: 1. Storage high-availability (HA), i.e. each tenant shard has a secondary pageserver location, so we can quickly switch over compute to it in case of primary pageserver failure. 2. Fast compute provisioning, i.e. we have a fleet of pre-created empty computes, that are ready to serve workload, so restarting unresponsive compute is very fast. 3. Preemptive NeonVM compute provisioning in case of k8s node unavailability. This helps us to be well within the uptime SLO of 99.95% most of the time. Problems begin when we go up to multi-TB workloads and 32-64 CU computes. During restart, compute loses all caches: LFC, shared buffers, file system cache. Depending on the workload, it can take a lot of time to warm up the caches, during which performance is degraded and might even be unacceptable for certain workloads. The latter means that although the current approach works well for small to medium workloads, we still have to do some additional work to avoid performance degradation after restart of large instances. [Rendered version](https://github.com/neondatabase/neon/blob/alexk/pg-prewarm-rfc/docs/rfcs/2025-03-17-compute-prewarm.md) Part of https://github.com/neondatabase/cloud/issues/19011	2025-07-02 17:16:00 +00:00
Dmitry Savelev	0429a0db16	Switch the billing metrics storage format to ndjson. (#12427 ) ## Problem The billing team wants to change the billing events pipeline and use a common events format in S3 buckets across different event producers. ## Summary of changes Change the events storage format for billing events from JSON to NDJSON. Also partition files by hours, rather than days. Resolves: https://github.com/neondatabase/cloud/issues/29995	2025-07-02 16:30:47 +00:00
Conrad Ludgate	d6beb3ffbb	[proxy] rewrite pg-text to json routines (#12413) We would like to move towards an arena system for JSON-encoding the responses. This change pushes an "out" parameter into the pg-text to json routines to make swapping in an arena system easier in the future (see #11992). It also removes the redundant `column: &[Type]` argument and rewrites the pg_array parser. --- I rewrote the pg_array parser because, while making these changes, I found it hard to reason about. I went back to the specification and rewrote it from scratch. There are four separate routines: 1. pg_array_parse - checks for any prelude (multidimensional array ranges) 2. pg_array_parse_inner - only deals with the arrays themselves 3. pg_array_parse_item - parses a single item from the array; this might be quoted, unquoted, or another nested array. 4. pg_array_parse_quoted - parses a quoted string, following the relevant string escaping rules.	2025-07-02 12:46:11 +00:00
Arpad Müller	efd7e52812	Don't error if timeline offload is already in progress (#12428) Don't print errors like: ``` Compaction failed 1 times, retrying in 2s: Failed to offload timeline: Unexpected offload error: Timeline deletion is already in progress ``` Print it at info log level instead. https://github.com/neondatabase/cloud/issues/30666	2025-07-02 12:06:55 +00:00
Ivan Efremov	0f879a2e8f	[proxy]: Fix redis IRSA expiration failure errors (#12430) Relates to [#30688](https://github.com/neondatabase/cloud/issues/30688)	2025-07-02 08:55:44 +00:00
Dmitrii Kovalkov	8e7ce42229	tests: start primary compute on not-readonly branches (#12408) ## Problem https://github.com/neondatabase/neon/pull/11712 changed how computes are started in the test: the LSN is specified, which makes them read-only static replicas. The LSN used is the pageserver's `last_record_lsn`. This works fine with read-only branches (because their `last_record_lsn` is equal to `start_lsn` and always valid). But with writable timelines, the `last_record_lsn` on the pageserver might be stale. In particular, in this test, after the `detach_branch` operation the tenant is reset on the pageserver. This causes `last_record_lsn` to go back to `disk_consistent_lsn`, effectively rolling back some recent writes. A primary compute, by contrast, starts at the safekeepers' commit LSN, which is the correct one, and waits until the pageserver catches up to this LSN after the reset. - Closes: https://github.com/neondatabase/neon/issues/12365 ## Summary of changes - Start `primary` compute for writable timelines.	2025-07-02 05:41:17 +00:00
Alex Chi Z.	5ec8881c0b	feat(pageserver): resolve feature flag based on remote size (#12400) ## Problem Part of #11813 ## Summary of changes * Compute tenant remote size in the housekeeping loop. * Add a new `TenantFeatureResolver` struct to cache the tenant-specific properties. * Evaluate feature flag based on the remote size. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-01 18:11:24 +00:00
Alex Chi Z.	b254dce8a1	feat(pageserver): report compaction progress (#12401) ## Problem close https://github.com/neondatabase/neon/issues/11528 ## Summary of changes Gives us better observability of compaction progress: - Image creation: number of partitions processed / total partitions - Gc-compaction: index of the item in the queue / total items for a full compaction - Shard ancestor compaction: layers to rewrite / total layers Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-01 17:00:27 +00:00
Alex Chi Z.	3815e3b2b5	feat(pageserver): reduce lock contention in l0 compaction (#12360) ## Problem L0 compaction currently holds the read lock over a long region when it doesn't need to. ## Summary of changes This patch splits the one long contention region into two shorter phases: a read lock at the beginning to gather the layers to compact, and several short read locks when querying the image coverage. Co-Authored-By: Chen Luo --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-01 16:58:41 +00:00
Suhas Thalanki	bbcd70eab3	Dynamic Masking Support for `anon` v2 (#11733) ## Problem This PR adds dynamic masking support for `anon` v2, which currently supports only static masking. ## Summary of changes Added a security definer function that sets the dynamic masking GUC to `true` with superuser permissions. Added a security definer function that adds `anon` to `session_preload_libraries` if it's not already present. Related to: https://github.com/neondatabase/cloud/issues/20456	2025-07-01 16:50:27 +00:00
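The four-routine pg_array parser described in #12413 can be illustrated as a small recursive-descent parser over PostgreSQL's array text format. The following is a hypothetical, simplified sketch, not the proxy's actual code: the `Item` type and the function names are assumptions, only the comma delimiter is handled, and a dimension prelude such as `[1:2]=` is skipped rather than validated.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical parsed element: NULL, a text value, or a nested array.
#[derive(Debug, PartialEq)]
pub enum Item {
    Null,
    Text(String),
    Array(Vec<Item>),
}

/// Entry point (roughly the role of pg_array_parse): skip an optional
/// dimension prelude like "[1:2]=" and parse the outermost array.
pub fn parse_array(input: &str) -> Result<Vec<Item>, String> {
    let mut s = input.trim();
    if s.starts_with('[') {
        let brace = s.find('{').ok_or("missing '{' after dimensions")?;
        if !s[..brace].trim_end().ends_with('=') {
            return Err("expected '=' after dimensions".into());
        }
        s = &s[brace..];
    }
    let mut chars = s.chars().peekable();
    let items = parse_inner(&mut chars)?;
    // Anything other than whitespace after the closing '}' is an error.
    if chars.any(|c| !c.is_whitespace()) {
        return Err("trailing characters after array".into());
    }
    Ok(items)
}

/// Parses "{item,item,...}" (roughly the role of pg_array_parse_inner).
fn parse_inner(chars: &mut Peekable<Chars<'_>>) -> Result<Vec<Item>, String> {
    if chars.next() != Some('{') {
        return Err("expected '{'".into());
    }
    let mut items = Vec::new();
    skip_ws(chars);
    if chars.peek() == Some(&'}') {
        chars.next();
        return Ok(items); // empty array "{}"
    }
    loop {
        items.push(parse_item(chars)?);
        skip_ws(chars);
        match chars.next() {
            Some(',') => continue,
            Some('}') => return Ok(items),
            other => return Err(format!("unexpected {:?} in array", other)),
        }
    }
}

/// Parses one element: nested array, quoted string, or unquoted
/// value (roughly the role of pg_array_parse_item).
fn parse_item(chars: &mut Peekable<Chars<'_>>) -> Result<Item, String> {
    skip_ws(chars);
    match chars.peek().copied() {
        Some('{') => Ok(Item::Array(parse_inner(chars)?)),
        Some('"') => {
            chars.next(); // consume the opening quote
            Ok(Item::Text(parse_quoted(chars)?))
        }
        _ => {
            // Unquoted: read up to ',' or '}', honouring backslash escapes.
            let mut text = String::new();
            while let Some(c) = chars.peek().copied() {
                if c == ',' || c == '}' {
                    break;
                }
                chars.next();
                if c == '\\' {
                    text.push(chars.next().ok_or("dangling backslash")?);
                } else {
                    text.push(c);
                }
            }
            // Simplification: trailing whitespace is always trimmed.
            let text = text.trim_end().to_string();
            if text.is_empty() {
                return Err("empty unquoted element".into());
            }
            // An unquoted NULL (any case) is SQL NULL; a quoted "NULL" is text.
            if text.eq_ignore_ascii_case("NULL") {
                Ok(Item::Null)
            } else {
                Ok(Item::Text(text))
            }
        }
    }
}

/// Reads a quoted string after its opening '"', handling backslash
/// escapes (roughly the role of pg_array_parse_quoted).
fn parse_quoted(chars: &mut Peekable<Chars<'_>>) -> Result<String, String> {
    let mut text = String::new();
    loop {
        match chars.next() {
            Some('"') => return Ok(text),
            Some('\\') => text.push(chars.next().ok_or("dangling backslash")?),
            Some(c) => text.push(c),
            None => return Err("unterminated quoted string".into()),
        }
    }
}

fn skip_ws(chars: &mut Peekable<Chars<'_>>) {
    while chars.peek().map_or(false, |c| c.is_whitespace()) {
        chars.next();
    }
}
```

For example, `parse_array(r#"{"a b",NULL,"NULL"}"#)` yields a text item, a NULL, and the literal text `NULL`. Keeping quoted-string handling in its own function isolates the escaping rules from the delimiter and nesting logic, which is the separation the commit message describes.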

8220 Commits