rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-07 05:22:56 +00:00

Author	SHA1	Message	Date
quantumish	44ec0e6f06	Document problems and pitfalls with fine-grained hashmap impl	2025-08-12 13:42:48 -07:00
quantumish	d33071a386	Fix concurrency bugs in resizing (WIP)	2025-08-12 10:34:29 -07:00
David Freifeld	add1a0ad78	Add initial work in implementing incremental resizing (WIP)	2025-07-14 09:02:56 -07:00
David Freifeld	282b90df28	Fix freelist bug, clean up interface to BucketIdx	2025-07-11 12:47:27 -07:00
David Freifeld	7f655ddb9a	Heavily WIP rewrite of hashmap for concurrency + perf	2025-07-10 17:34:30 -07:00
David Freifeld	da6f4dbafe	Merge branch 'communicator-rewrite' into quantumish/lfc-resizable-map	2025-07-10 17:33:56 -07:00
Heikki Linnakangas	e14bb4be39	Merge remote-tracking branch 'origin/main' into communicator-rewrite	2025-07-05 16:59:51 +03:00
Heikki Linnakangas	9a37bfdf63	Fix re-finding an entry in bucket chain	2025-07-05 00:44:46 +03:00
Heikki Linnakangas	50fbf4ac53	Fix hash table initialization across forked processes attach_writer()/reader() are called from each forked process. It's too late to do initialization there, in fact we used to overwrite the contents of the hash table (or at least the freelist?) every time a new process attached to it. The initialization must be done earlier, in the HashMapInit() constructors.	2025-07-04 23:08:34 +03:00
Mikhail	7ed4530618	`offload_lfc_interval_seconds` in ComputeSpec (#12447 ) - Add ComputeSpec flag `offload_lfc_interval_seconds` controlling whether LFC should be offloaded to endpoint storage. Default value (None) means "don't offload". - Add glue code around it for `neon_local` and integration tests. - Add `autoprewarm` mode for `test_lfc_prewarm` testing `offload_lfc_interval_seconds` and `autoprewarm` flags in conjunction. - Rename `compute_ctl_lfc_prewarm_requests_total` and `compute_ctl_lfc_offload_requests_total` to `compute_ctl_lfc_prewarms_total` and `compute_ctl_lfc_offloads_total` to reflect we count prewarms and offloads, not `compute_ctl` requests of those. Don't count request in metrics if there is a prewarm/offload already ongoing. https://github.com/neondatabase/cloud/issues/19011 Resolves: https://github.com/neondatabase/cloud/issues/30770	2025-07-04 18:49:57 +00:00
Aleksandr Sarantsev	b2705cfee6	storcon: Make node deletion process cancellable (#12320 ) ## Problem The current deletion operation is synchronous and blocking, which is unsuitable for potentially long-running tasks like. In such cases, the standard HTTP request-response pattern is not a good fit. ## Summary of Changes - Added new `storcon_cli` commands: `NodeStartDelete` and `NodeCancelDelete` to initiate and cancel deletion asynchronously. - Added corresponding `storcon` HTTP handlers to support the new start/cancel deletion flow. - Introduced a new type of background operation: `Delete`, to track and manage the deletion process outside the request lifecycle. --------- Co-authored-by: Aleksandr Sarantsev <aleksandr.sarantsev@databricks.com>	2025-07-04 14:08:09 +00:00
Heikki Linnakangas	da3f9ee72d	cargo fmt	2025-07-04 12:39:41 +03:00
Alex Chi Z.	cc699f6f85	fix(pageserver): do not log no-route-to-host errors (#12468 ) ## Problem close https://github.com/neondatabase/neon/issues/12344 ## Summary of changes Add `HostUnreachable` and `NetworkUnreachable` to expected I/O error. This was new in Rust 1.83. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-03 21:57:42 +00:00
David Freifeld	794bb7a9e8	Merge branch 'quantumish/comm-lfc-integration' into communicator-rewrite	2025-07-03 10:52:29 -07:00
Arpad Müller	a852bc5e39	Add new activating scheduling policy for safekeepers (#12441 ) When deploying new safekeepers, we don't immediately want to send traffic to them. Maybe they are not ready yet by the time the deploy script is registering them with the storage controller. For pageservers, the storcon solves the problem by not scheduling stuff to them unless there has been a positive heartbeat response. We can't do the same for safekeepers though, otherwise a single down safekeeper would mean we can't create new timelines in smaller regions where there is only three safekeepers in total. So far we have created safekeepers as `pause` but this adds a manual step to safekeeper deployment which is prone to oversight. We want things to be automatted. So we introduce a new state `activating` that acts just like `pause`, except that we automatically transition the policy to `active` once we get a positive heartbeat from the safekeeper. For `pause`, we always keep the safekeeper paused.	2025-07-03 16:27:43 +00:00
Conrad Ludgate	03e604e432	Nightly lints and small tweaks (#12456 ) Let chains available in 1.88 :D new clippy lints coming up in future releases.	2025-07-03 14:47:12 +00:00
HaoyuHuang	4db934407a	SK changes #1 (#12448 ) ## TLDR This PR is a no-op. The changes are disabled by default. ## Problem I. Currently we don't have a way to detect disk I/O failures from WAL operations. II. We observe that the offloader fails to upload a segment due to race conditions on XLOG SWITCH and PG start streaming WALs. wal_backup task continously failing to upload a full segment while the segment remains partial on the disk. The consequence is that commit_lsn for all SKs move forward but backup_lsn stays the same. Then, all SKs run out of disk space. III. We have discovered SK bugs where the WAL offload owner cannot keep up with WAL backup/upload to S3, which results in an unbounded accumulation of WAL segment files on the Safekeeper's disk until the disk becomes full. This is a somewhat dangerous operation that is hard to recover from because the Safekeeper cannot write its control files when it is out of disk space. There are actually 2 problems here: 1. A single problematic timeline can take over the entire disk for the SK 2. Once out of disk, it's difficult to recover SK IV. Neon reports certain storage errors as "critical" errors using a marco, which will increment a counter/metric that can be used to raise alerts. However, this metric isn't sliced by tenant and/or timeline today. We need the tenant/timeline dimension to better respond to incidents and for blast radius analysis. ## Summary of changes I. The PR adds a `safekeeper_wal_disk_io_errors ` which is incremented when SK fails to create or flush WALs. II. To mitigate this issue, we will re-elect a new offloader if the current offloader is lagging behind too much. Each SK makes the decision locally but they are aware of each other's commit and backup lsns. The new algorithm is - determine_offloader will pick a SK. say SK-1. - Each SK checks -- if commit_lsn - back_lsn > threshold, -- -- remove SK-1 from the candidate and call determine_offloader again. SK-1 will step down and all SKs will elect the same leader again. After the backup is caught up, the leader will become SK-1 again. This also helps when SK-1 is slow to backup. I'll set the reelect backup lag to 4 GB later. Setting to 128 MB in dev to trigger the code more frequently. III. This change addresses problem no. 1 by having the Safekeeper perform a timeline disk utilization check check when processing WAL proposal messages from Postgres/compute. The Safekeeper now rejects the WAL proposal message, effectively stops writing more WAL for the timeline to disk, if the existing WAL files for the timeline on the SK disk exceeds a certain size (the default threshold is 100GB). The disk utilization is calculated based on a `last_removed_segno` variable tracked by the background task removing WAL files, which produces an accurate and conservative estimate (>= than actual disk usage) of the actual disk usage. IV. * Add a new metric `hadron_critical_storage_event_count` that has the `tenant_shard_id` and `timeline_id` as dimensions. * Modified the `crtitical!` marco to include tenant_id and timeline_id as additional arguments and adapted existing call sites to populate the tenant shard and timeline ID fields. The `critical!` marco invocation now increments the `hadron_critical_storage_event_count` with the extra dimensions. (In SK there isn't the notion of a tenant-shard, so just the tenant ID is recorded in lieu of tenant shard ID.) I considered adding a separate marco to avoid merge conflicts, but I think in this case (detecting critical errors) conflicts are probably more desirable so that we can be aware whenever Neon adds another `critical!` invocation in their code. --------- Co-authored-by: Chen Luo <chen.luo@databricks.com> Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com> Co-authored-by: William Huang <william.huang@databricks.com>	2025-07-03 14:32:53 +00:00
Erik Grinaker	14214eb853	Add client shard routing	2025-07-03 14:42:35 +02:00
Erik Grinaker	d4b4724921	Sanity-check Pageserver URLs	2025-07-03 14:18:14 +02:00
Erik Grinaker	52c586f678	Restructure shard management	2025-07-03 11:51:19 +02:00
David Freifeld	b018b0ff60	Fix cargo clippy lints for neon-shmem	2025-07-02 13:01:26 -07:00
David Freifeld	86fb7b966a	Update `integrated_cache.rs` to use new hashmap API	2025-07-02 12:18:37 -07:00
David Freifeld	0c099b0944	Merge branch 'quantumish/lfc-resizable-map' into quantumish/comm-lfc-integration	2025-07-02 12:05:24 -07:00
David Freifeld	2fe27f510d	Make neon-shmem tests thread-safe and report errno in panics	2025-07-02 11:57:49 -07:00
David Freifeld	19b5618578	Switch to neon_shmem::sync lock_api and integrate into hashmap	2025-07-02 11:44:38 -07:00
David Freifeld	9d3e07ef2c	Add initial prototype of shmem sync primitives	2025-06-30 17:07:07 -07:00
Suhas Thalanki	5f3532970e	[compute] fix: background worker that collects installed extension metrics now updates collection interval (#12277 ) ## Problem Previously, the background worker that collects the list of installed extensions across DBs had a timeout set to 1 hour. This cause a problem with computes that had a `suspend_timeout` > 1 hour as this collection was treated as activity, preventing compute shutdown. Issue: https://github.com/neondatabase/cloud/issues/30147 ## Summary of changes Passing the `suspend_timeout` as part of the `ComputeSpec` so that any updates to this are taken into account by the background worker and updates its collection interval.	2025-06-30 22:12:37 +00:00
Erik Grinaker	c3cb1ab98d	Merge branch 'main' into communicator-rewrite	2025-06-30 21:07:01 +02:00
Erik Grinaker	d0a4ae3e8f	pageserver: add gRPC LSN lease support (#12384 ) ## Problem The gRPC API does not provide LSN leases. ## Summary of changes * Add LSN lease support to the gRPC API. * Use gRPC LSN leases for static computes with `grpc://` connstrings. * Move `PageserverProtocol` into the `compute_api::spec` module and reuse it.	2025-06-30 12:44:17 +00:00
Erik Grinaker	a384d7d501	pageserver: assert no changes to shard identity (#12379 ) ## Problem Location config changes can currently result in changes to the shard identity. Such changes will cause data corruption, as seen with #12217. Resolves #12227. Requires #12377. ## Summary of changes Assert that the shard identity does not change on location config updates and on (re)attach. This is currently asserted with `critical!`, in case it misfires in production. Later, we should reject such requests with an error and turn this into a proper assertion.	2025-06-30 12:36:45 +00:00
Erik Grinaker	a5b0fc560c	Fix/allow remaining clippy lints	2025-06-30 12:36:20 +02:00
Erik Grinaker	67b04f8ab3	Fix a bunch of linter warnings	2025-06-30 11:10:02 +02:00
Erik Grinaker	1d43f3bee8	pageserver: fix stripe size persistence in legacy HTTP handlers (#12377 ) ## Problem Similarly to #12217, the following endpoints may result in a stripe size mismatch between the storage controller and Pageserver if an unsharded tenant has a different stripe size set than the default. This can lead to data corruption if the tenant is later manually split without specifying an explicit stripe size, since the storage controller and Pageserver will apply different defaults. This commonly happens with tenants that were created before the default stripe size was changed from 32k to 2k. * `PUT /v1/tenant/config` * `PATCH /v1/tenant/config` These endpoints are no longer in regular production use (they were used when cplane still managed Pageserver directly), but can still be called manually or by tests. ## Summary of changes Retain the current shard parameters when updating the location config in `PUT \| PATCH /v1/tenant/config`. Also opportunistically derive `Copy` for `ShardParameters`.	2025-06-30 09:08:44 +00:00
Dmitrii Kovalkov	c746678bbc	storcon: implement safekeeper_migrate handler (#11849 ) This PR implements a safekeeper migration algorithm from RFC-035 https://github.com/neondatabase/neon/blob/main/docs/rfcs/035-safekeeper-dynamic-membership-change.md#change-algorithm - Closes: https://github.com/neondatabase/neon/issues/11823 It is not production-ready yet, but I think it's good enough to commit and start testing. There are some known issues which will be addressed in later PRs: - https://github.com/neondatabase/neon/issues/12186 - https://github.com/neondatabase/neon/issues/12187 - https://github.com/neondatabase/neon/issues/12188 - https://github.com/neondatabase/neon/issues/12189 - https://github.com/neondatabase/neon/issues/12190 - https://github.com/neondatabase/neon/issues/12191 - https://github.com/neondatabase/neon/issues/12192 ## Summary of changes - Implement `tenant_timeline_safekeeper_migrate` handler to drive the migration - Add possibility to specify number of safekeepers per timeline in tests (`timeline_safekeeper_count`) - Add `term` and `flush_lsn` to `TimelineMembershipSwitchResponse` - Implement compare-and-swap (CAS) operation over timeline in DB for updating membership configuration safely. - Write simple test to verify that migration code works	2025-06-30 08:30:05 +00:00
Heikki Linnakangas	a352d290eb	Plumb through both libpq and grpc connection strings to the compute Add a new 'pageserver_connection_info' field in the compute spec. It replaces the old 'pageserver_connstring' field with a more complicated struct that includes both libpq and grpc URLs, for each shard (or only one of the the URLs, depending on the configuration). It also includes a flag suggesting which one to use; compute_ctl now uses it to decide which protocol to use for the basebackup. This is compatible with everything that's in production, because the control plane never used the 'pageserver_connstring' field. That was added a long time ago with the idea that it would replace the code that digs the 'neon.pageserver_connstring' GUC from the list of Postgres settings, but we never got around to do that in the control plane. Hence, it was only used with neon_local. But the plan now is to pass the 'pageserver_connection_info' from the control plane, and once that's fully deployed everywhere, the code to parse 'neon.pageserver_connstring' in compute_ctl can be removed. The 'grpc' flag on an endpoint in endpoint config is now more of a suggestion. Compute_ctl gets both URLs, so it can choose to use libpq or grpc as it wishes. It currently always obeys the 'prefer_grpc' flag that's part of the connection info though. Postgres however uses grpc iff the new rust-based communicator is enabled. TODO/plan for the control plane: - Start to pass `pageserver_connection_info` in the spec file. - Also keep the current `neon.pageserver_connstring` setting for now, for backwards compatibility with old computes After that, the `pageserver_connection_info.prefer_grpc` flag in the spec file can be used to control whether compute_ctl uses grpc or libpq. The actual compute's grpc usage will be controlled by the `neon.enable_new_communicator` GUC. It can be set separately from 'prefer_grpc'. Later: - Once all old computes are gone, remove the code to pass `neon.pageserver_connstring`	2025-06-29 18:16:49 +03:00
David Freifeld	74330920ee	Simplify API, squash bugs, and expand hashmap test suite	2025-06-27 17:11:22 -07:00
David Freifeld	c3c136ef3a	Remove statistics utilities from neon_shmem crate	2025-06-27 17:10:52 -07:00
Christian Schwarz	e33e109403	fix(pageserver): buffered writer cancellation error handling (#12376 ) ## Problem The problem has been well described in already-commited PR #11853. tl;dr: BufferedWriter is sensitive to cancellation, which the previous approach was not. The write path was most affected (ingest & compaction), which was mostly fixed in #11853: it introduced `PutError` and mapped instances of `PutError` that were due to cancellation of underlying buffered writer into `CreateImageLayersError::Cancelled`. However, there is a long tail of remaining errors that weren't caught by #11853 that result in `CompactionError::Other`s, which we log with great noise. ## Solution The stack trace logging for CompactionError::Other added in #11853 allows us to chop away at that long tail using the following pattern: - look at the stack trace - from leaf up, identify the place where we incorrectly map from the distinguished variant X indicating cancellation to an `anyhow::Error` - follow that anyhow further up, ensuring it stays the same anyhow all the way up in the `CompactionError::Other` - since it stayed one anyhow chain all the way up, root_cause() will yield us X - so, in `log_compaction_error`, add an additional `downcast_ref` check for X This PR specifically adds checks for - the flush task cancelling (FlushTaskError, BlobWriterError) - opening of the layer writer (GateError) That should cover all the reports in issues - https://github.com/neondatabase/cloud/issues/29434 - https://github.com/neondatabase/neon/issues/12162 ## Refs - follow-up to #11853 - fixup of / fixes https://github.com/neondatabase/neon/issues/11762 - fixes https://github.com/neondatabase/neon/issues/12162 - refs https://github.com/neondatabase/cloud/issues/29434	2025-06-27 15:26:00 +00:00
Dmitrii Kovalkov	6fa1562b57	pageserver: increase default max_size_entries limit for basebackup cache (#12343 ) ## Problem Some pageservers hit `max_size_entries` limit in staging with only ~25 MiB storage used by basebackup cache. The limit is too strict. It should be safe to relax it. - Part of https://github.com/neondatabase/cloud/issues/29353 ## Summary of changes - Increase the default `max_size_entries` from 1000 to 10000	2025-06-27 09:18:18 +00:00
David Freifeld	78b6da270b	Sketchily integrate hashmap rewrite with `integrated_cache`	2025-06-26 16:45:48 -07:00
David Freifeld	47664e40d4	Initial work in visualizing properties of hashmap	2025-06-26 16:00:33 -07:00
David Freifeld	b1e3161d4e	Satisfy `cargo clippy` lints, simplify shrinking API	2025-06-26 14:32:32 -07:00
David Freifeld	1e74b52f7e	Merge branch 'quantumish/lfc-resizable-map' into communicator-rewrite	2025-06-26 10:26:22 -07:00
Alex Chi Z.	33c0d5e2f4	fix(pageserver): make posthog config parsing more robust (#12356 ) ## Problem In our infra config, we have to split server_api_key and other fields in two files: the former one in the sops file, and the latter one in the normal config. It creates the situation that we might misconfigure some regions that it only has part of the fields available, causing storcon/pageserver refuse to start. ## Summary of changes Allow PostHog config to have part of the fields available. Parse it later. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-06-26 15:49:08 +00:00
Dmitrii Kovalkov	605fb04f89	pageserver: use bounded sender for basebackup cache (#12342 ) ## Problem Basebackup cache now uses unbounded channel for prepare requests. In theory it can grow large if the cache is hung and does not process the requests. - Part of https://github.com/neondatabase/cloud/issues/29353 ## Summary of changes - Replace an unbounded channel with a bounded one, the size is configurable. - Add `pageserver_basebackup_cache_prepare_queue_size` to observe the size of the queue. - Refactor a bit to move all metrics logic to `basebackup_cache.rs`	2025-06-26 13:26:24 +00:00
Alex Chi Z.	6f70885e11	fix(pageserver): allow refresh_interval to be empty (#12349 ) ## Problem Fix for https://github.com/neondatabase/neon/pull/12324 ## Summary of changes Need `serde(default)` to allow this field not present in the config, otherwise there will be a config deserialization error. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-06-25 22:15:03 +00:00
Alex Chi Z.	6c77638ea1	feat(storcon): retrieve feature flag and pass to pageservers (#12324 ) ## Problem part of https://github.com/neondatabase/neon/issues/11813 ## Summary of changes It costs $$$ to directly retrieve the feature flags from the pageserver. Therefore, this patch adds new APIs to retrieve the spec from the storcon and updates it via pageserver. * Storcon retrieves the feature flag and send it to the pageservers. * If the feature flag gets updated outside of the normal refresh loop of the pageserver, pageserver won't fetch the flags on its own as long as the last updated time <= refresh_period. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-06-25 14:58:18 +00:00
Conrad Ludgate	27ca1e21be	[console_redirect_proxy]: fix channel binding (#12238 ) ## Problem While working more on TLS to compute, I realised that Console Redirect -> pg-sni-router -> compute would break if channel binding was set to prefer. This is because the channel binding data would differ between Console Redirect -> pg-sni-router vs pg-sni-router -> compute. I also noticed that I actually disabled channel binding in #12145, since `connect_raw` would think that the connection didn't support TLS. ## Summary of changes Make sure we specify the channel binding. Make sure that `connect_raw` can see if we have TLS support.	2025-06-25 13:41:30 +00:00
David Freifeld	1fb3639170	Properly change type of HashMapInit in .with_hasher()	2025-06-25 03:03:19 -07:00
David Freifeld	00dfaa2eb4	Add Criterion microbenchmarks for rehashing and insertions	2025-06-24 16:30:59 -07:00

1 2 3 4 5 ...

1212 Commits