The LSN lease API actually accepts a `tenant_shard_id`, not a `tenant_id`,
but we were putting the `Display` of the `tenant_shard_id` into the
`tenant_id` field. This PR fixes that.
Refs
- fixes https://databricks.atlassian.net/browse/LKB-2930
On December 8th, 2023, an engineering escalation (INC-110) was opened
after it was found that BYPASSRLS was being applied to all roles.
PR that introduced the issue:
https://github.com/neondatabase/neon/pull/5657
Subsequent commit on main:
ad99fa5f03
NOBYPASSRLS and INHERIT are the defaults for a Postgres role, but
because it isn't easy to know if a Postgres cluster is affected by the
issue, we need to keep the migration around for a long time, if not
indefinitely, so any cluster can be fixed.
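For reference, a minimal Rust-embedded SQL sketch of what such a corrective
migration could look like; the role filter and the exact statement are
assumptions, not the shipped migration:

```rust
// Hypothetical sketch, not the shipped migration: reset every non-superuser
// role that ended up with BYPASSRLS back to the Postgres defaults.
const FIX_BYPASSRLS_AND_INHERIT: &str = r#"
DO $$
DECLARE
    role_name text;
BEGIN
    FOR role_name IN
        SELECT rolname FROM pg_roles
        WHERE rolbypassrls AND NOT rolsuper
    LOOP
        EXECUTE format('ALTER ROLE %I NOBYPASSRLS INHERIT', role_name);
    END LOOP;
END
$$;
"#;
```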
Branching is the gift that keeps on giving...
Signed-off-by: Tristan Partin <tristan.partin@databricks.com>
## Problem
With gRPC `GetPageRequest` batches, we'll have non-trivial
fragmentation/reassembly logic in several places in the stack
(concurrent reads, shard splits, LFC hits, etc.). If we included the
block numbers with the pages in `GetPageResponse`, we could better verify
and observe that the final responses are correct.
Touches #11735.
Requires #12480.
## Summary of changes
Add a `Page` struct with a `block_number` for `GetPageResponse`, along with
the `RelTag` for completeness, and verify them in the rich gRPC client.
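Roughly, the shape being added looks like this (illustrative field names, not
the exact proto/Rust definitions):

```rust
// Illustrative only: the real types live in the page_api proto / client crate.
struct RelTag {
    spcnode: u32,
    dbnode: u32,
    relnode: u32,
    forknum: u8,
}

struct Page {
    rel: RelTag,       // relation the page belongs to, for completeness
    block_number: u32, // block number, verified against the request
    image: Vec<u8>,    // the 8KiB page image itself
}
```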
## Problem
Part of LKB-379
The pageserver connstrings are updated in the postmaster, and a hook then
propagates them to the shared memory of all backends. However, the shard
stripe size is not propagated this way. This causes problems during
shard splits:
* the compute has active reads/writes
* shard split happens and the cplane applies the new config (pageserver
connstring + stripe size)
* the pageserver connstring will be updated immediately once the postmaster
receives the SIGHUP, and it will be copied over to the shared memory of
all other backends.
* stripe size is a normal GUC and we don't have special handling around
that, so if any active backend has ongoing txns the value won't be
applied.
* now it's possible for backends to issue requests based on the wrong
stripe size; what's worse, if a request gets cached in the prefetch
buffer, it will get stuck forever.
## Summary of changes
Change the default stripe size to make sure it aligns with the current
default in storcon.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
`GetPageRequest::request_id` is supposed to be a unique ID for a
request. It's not, because we may retry the request using the same ID.
This causes assertion failures and confusion.
Touches #11735.
Requires #12480.
## Summary of changes
Extend the request ID with a retry attempt, and handle it in the gRPC
client and server.
## Problem
One PG tenant may write too fast and overwhelm the PS, so the other tenants
sharing the same PSs get very little bandwidth.
We ran an experiment with two tenants sharing the same PSs: one tenant
ran a large ingestion that delivered hundreds of MB/s while the other
only got < 10 MB/s.
## Summary of changes
Rate limit how fast PG can generate WAL. The default is -1. We may
scale the default value with the CPU count; we need to run some experiments
to verify.
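The mechanism is conceptually a token bucket. A simplified sketch (not the
actual compute-side implementation), assuming a value <= 0 means "no limit":

```rust
// Simplified token-bucket sketch of WAL throttling; `limit_bytes_per_sec <= 0`
// is assumed to mean "no limit" (matching the -1 default above).
use std::time::{Duration, Instant};

struct WalRateLimiter {
    limit_bytes_per_sec: i64,
    available: f64, // bytes currently available in the bucket
    last_refill: Instant,
}

impl WalRateLimiter {
    fn new(limit_bytes_per_sec: i64) -> Self {
        Self {
            limit_bytes_per_sec,
            available: limit_bytes_per_sec.max(0) as f64,
            last_refill: Instant::now(),
        }
    }

    /// Returns how long the caller should wait before generating `bytes` of WAL.
    fn throttle(&mut self, bytes: usize) -> Duration {
        if self.limit_bytes_per_sec <= 0 {
            return Duration::ZERO; // limiting disabled
        }
        let rate = self.limit_bytes_per_sec as f64;
        let now = Instant::now();
        // Refill the bucket based on elapsed time, capped at one second's worth.
        self.available =
            (self.available + now.duration_since(self.last_refill).as_secs_f64() * rate).min(rate);
        self.last_refill = now;
        self.available -= bytes as f64;
        if self.available >= 0.0 {
            Duration::ZERO
        } else {
            Duration::from_secs_f64(-self.available / rate)
        }
    }
}
```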
## How is this tested?
CI.
pgbench: first with no limit, then with the limit set to 1 MB/s (the tps
drops), then with the change reverted (the tps increases again).

pgbench -i -s 10 -p 55432 -h 127.0.0.1 -U cloud_admin -d postgres

pgbench postgres -c 10 -j 10 -T 6000000 -P 1 -b tpcb-like -h 127.0.0.1 -U cloud_admin -p 55432
progress: 33.0 s, 986.0 tps, lat 10.142 ms stddev 3.856
progress: 34.0 s, 973.0 tps, lat 10.299 ms stddev 3.857
progress: 35.0 s, 1004.0 tps, lat 9.939 ms stddev 3.604
progress: 36.0 s, 984.0 tps, lat 10.183 ms stddev 3.713
progress: 37.0 s, 998.0 tps, lat 10.004 ms stddev 3.668
progress: 38.0 s, 648.9 tps, lat 12.947 ms stddev 24.970
progress: 39.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 40.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 41.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 42.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 43.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 44.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 45.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 46.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 47.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 48.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 49.0 s, 347.3 tps, lat 321.560 ms stddev 1805.633
progress: 50.0 s, 346.8 tps, lat 9.898 ms stddev 3.809
progress: 51.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 52.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 53.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 54.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 55.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 56.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 57.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 58.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 59.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 60.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 61.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 62.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 63.0 s, 494.5 tps, lat 276.504 ms stddev 1853.689
progress: 64.0 s, 488.0 tps, lat 20.530 ms stddev 71.981
progress: 65.0 s, 407.8 tps, lat 9.502 ms stddev 3.329
progress: 66.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 67.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 68.0 s, 504.5 tps, lat 71.627 ms stddev 397.733
progress: 69.0 s, 371.0 tps, lat 24.898 ms stddev 29.007
progress: 70.0 s, 541.0 tps, lat 19.684 ms stddev 24.094
progress: 71.0 s, 342.0 tps, lat 29.542 ms stddev 54.935
Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com>
After https://github.com/neondatabase/neon/pull/12240 we observed
issues in our Go code: `ComputeStatus` is no longer stateless and thus no
longer deserializes as a plain string.
```
could not check compute activity: json: cannot unmarshal object into Go struct field
ComputeState.status of type computeclient.ComputeStatus
```
- Fix this by splitting the status into two.
- Update the compute OpenAPI spec to reflect the changes to `/terminate` in
the previous PR.
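For context, this is the general serde behavior that bit us: a unit-only enum
serializes as a plain string, while a variant that carries data serializes as
an object. A standalone illustration (not the actual `ComputeStatus`
definition):

```rust
// Illustration only; the real ComputeStatus is defined elsewhere in the repo.
use serde::Serialize;

#[derive(Serialize)]
#[serde(rename_all = "snake_case")]
enum UnitOnlyStatus {
    Running,
    Failed,
}

#[derive(Serialize)]
#[serde(rename_all = "snake_case")]
enum StatefulStatus {
    Running,
    Failed { error: String },
}

fn main() {
    // Prints: "running"
    println!("{}", serde_json::to_string(&UnitOnlyStatus::Running).unwrap());
    // Prints: {"failed":{"error":"boom"}} -- no longer a plain string, which
    // is what the Go client's string-typed field chokes on.
    println!(
        "{}",
        serde_json::to_string(&StatefulStatus::Failed { error: "boom".into() }).unwrap()
    );
}
```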
## Problem
If we have a catalog update AND a pageserver migration batched in a single
spec, we will not be able to apply the spec (run the SQL), because
the compute is not attached to the right pageserver and we cannot read
anything until we pick up the latest pageserver connstring.
This is not an issue for now because cplane always schedules shard splits /
pageserver migrations with `skip_pg_catalog_updates` (I suppose).
Context:
https://databricks.slack.com/archives/C09254R641L/p1752163559259399?thread_ts=1752160163.141149&cid=C09254R641L
With this fix, backpressure will likely not be able to affect
reconfigurations.
## Summary of changes
Do `pg_reload_conf` before we apply specs in SQL.
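A minimal sketch of the ordering; only `pg_reload_conf()` itself comes from
this PR, the surrounding function and the use of the `postgres` crate are
illustrative assumptions:

```rust
// Sketch: reload the config first so the compute picks up the latest
// pageserver connstring from the new spec, then run the catalog updates.
fn apply_spec_sql(
    client: &mut postgres::Client,
    catalog_sql: &[&str],
) -> Result<(), postgres::Error> {
    client.simple_query("SELECT pg_reload_conf();")?;
    for stmt in catalog_sql {
        client.simple_query(stmt)?;
    }
    Ok(())
}
```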
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
We need to benchmark the rich gRPC client
`client_grpc::PageserverClient` against the basic, no-frills
`page_api::Client` to determine how much overhead it adds.
Touches #11735.
Requires #12476.
## Summary of changes
Add a `pagebench --rich-client` parameter to use
`client_grpc::PageserverClient`. Also add a compression parameter to
the client.
## Problem
Sometimes we run out of free ports in `PortDistributor`. This particularly
affects failed tests, which we rerun automatically up to 3 times (using up
to 3x more ports).
## Summary of changes
- Cycle over the range of ports to reuse freed ports from previous tests
(see the sketch below)
Ref: LKB-62
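An illustration of the cycling idea only (the real `PortDistributor` lives in
the Python test framework; this is not that code):

```rust
// Cycle over the configured range so ports released by earlier tests can be
// handed out again instead of exhausting the range.
struct PortDistributor {
    range: std::ops::Range<u16>,
    next: u16,
}

impl PortDistributor {
    fn new(range: std::ops::Range<u16>) -> Self {
        let next = range.start;
        Self { range, next }
    }

    /// Returns the next free port, wrapping around to the start of the range
    /// instead of giving up when the end is reached.
    fn next_port(&mut self, is_free: impl Fn(u16) -> bool) -> Option<u16> {
        let len = self.range.len();
        for _ in 0..len {
            let port = self.next;
            self.next = if port + 1 >= self.range.end {
                self.range.start
            } else {
                port + 1
            };
            if is_free(port) {
                return Some(port);
            }
        }
        None // every port in the range is currently in use
    }
}
```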
## Problem
The communicator gRPC client must support changing the shard map on
splits.
Touches #11735.
Requires #12476.
## Summary of changes
* Wrap the shard set in an `ArcSwap` to allow swapping it out (see the
sketch after this list).
* Add a new `ShardSpec` parameter struct to pass validated shard info to
the client.
* Add `update_shards()` to change the shard set. In-flight requests are
allowed to complete using the old shards.
* Restructure `get_page` to use a stable view of the shard map, and
retry errors at the top (pre-split) level to pick up shard map changes.
* Also mark `tonic::Status::Internal` as non-retryable, so that we can
use it for client-side invariant checks without continually retrying
them.
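A condensed sketch of the structure (names simplified; this is not the actual
`client_grpc` code, and the routing shown is illustrative):

```rust
use std::sync::Arc;
use arc_swap::ArcSwap;

struct ShardSpec {
    shard_count: u32,
    stripe_size: u32,
    urls: Vec<String>, // one endpoint per shard
}

struct PageserverClient {
    shards: ArcSwap<ShardSpec>,
}

impl PageserverClient {
    /// Swap in a new shard map, e.g. after a shard split. In-flight requests
    /// holding the old Arc keep using it until they complete.
    fn update_shards(&self, new: ShardSpec) {
        self.shards.store(Arc::new(new));
    }

    fn get_page(&self, block_number: u32) {
        // Take a stable snapshot of the shard map for this attempt; a retry
        // at the top level would call load_full() again and see any new map.
        let shards: Arc<ShardSpec> = self.shards.load_full();
        // Simplified stripe routing, for illustration only.
        let shard = (block_number / shards.stripe_size) % shards.shard_count;
        let _endpoint = &shards.urls[shard as usize];
        // ... issue the gRPC request against `_endpoint` ...
    }
}
```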
# TLDR
All changes are no-ops except for some metrics.
## Summary of changes I
### Pageserver
Added a new global counter metric
`pageserver_pagestream_handler_results_total` that categorizes
pagestream request results according to their outcomes:
1. Success
2. Internal errors
3. Other errors
Internal errors include:
1. Page reconstruction error: This probably indicates a pageserver
bug/corruption
2. LSN timeout error: Could indicate overload or bugs with PS's ability
to reach other components
3. Misrouted request error: Indicates bugs in the Storage Controller/HCC
Other errors include transient errors that are expected during normal
operation or errors indicating bugs with other parts of the system
(e.g., malformed requests, errors due to cancelled operations during PS
shutdown, etc.)
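A sketch of how such a counter is typically registered and bumped; the label
values and helper names here are assumptions, and the pageserver uses its own
metrics helpers rather than this exact code:

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_counter_vec, IntCounterVec};

// One counter vec, labelled by outcome.
static PAGESTREAM_HANDLER_RESULTS: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "pageserver_pagestream_handler_results_total",
        "Pagestream request results, categorized by outcome",
        &["outcome"] // e.g. "success", "internal_error", "other_error"
    )
    .expect("failed to register metric")
});

fn record_pagestream_result(outcome: &str) {
    PAGESTREAM_HANDLER_RESULTS.with_label_values(&[outcome]).inc();
}
```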
## Summary of changes II
This PR adds a pageserver endpoint and its counterpart in the storage
controller to list the visible size of all tenant shards. This is a
prerequisite for the tenant rebalance command.
## Problem III
We need a way to download WAL
segments/layerfiles from S3 and replay WAL records. We cannot access
production S3 from our laptops directly, and we also can't transfer any
user data out of production systems for GDPR compliance, so we need
solutions.
## Summary of changes III
This PR adds a couple of tools to support the debugging
workflow in production:
1. A new `pagectl download-remote-object` command that can be used to
download remote storage objects assuming the correct access is set up.
## Summary of changes IV
This PR adds a command to list all visible delta and image layers from
index_part. This is useful for debugging compaction issues, as index_part
often contains a lot of covered layers due to PITR.
---------
Co-authored-by: William Huang <william.huang@databricks.com>
Co-authored-by: Chen Luo <chen.luo@databricks.com>
Co-authored-by: Vlad Lazar <vlad@neon.tech>
## Problem
close LKB-253
## Summary of changes
A 404 for timeline requests can happen when the tenant is intended to be
on a pageserver but is not attached yet. This patch adds handling for this
case in the LSN lease request path. In the future, we should extend this
handling to more operations.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Safekeeper and pageserver metrics collection might time out. We've seen
this in both hadron and neon.
## Summary of changes
This PR moves metrics collection in PS/SK to the background so that we
always get some metrics, even if they may be somewhat delayed. Reducing
metrics collection time is left to future work.
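Conceptually the change looks like this (a sketch, not the actual PS/SK code):

```rust
// A background thread refreshes a shared snapshot, and the /metrics handler
// serves whatever was collected last instead of blocking on a slow (or
// timing-out) collection.
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type MetricsSnapshot = Arc<Mutex<String>>;

fn spawn_metrics_collector(snapshot: MetricsSnapshot) {
    thread::spawn(move || loop {
        let text = collect_metrics(); // potentially slow
        *snapshot.lock().unwrap() = text;
        thread::sleep(Duration::from_secs(60));
    });
}

/// The scrape handler only reads the latest snapshot; it never waits on a
/// collection in progress.
fn serve_metrics(snapshot: &MetricsSnapshot) -> String {
    snapshot.lock().unwrap().clone()
}

fn collect_metrics() -> String {
    // Stand-in for the real, possibly slow, metrics gathering.
    String::new()
}
```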
---------
Co-authored-by: Chen Luo <chen.luo@databricks.com>
## Problem
The gRPC client pools don't reap idle resources.
Touches #11735.
Requires #12475.
## Summary of changes
Reap idle pool resources (channels/clients/streams) after 3 minutes of
inactivity.
Also restructure the `StreamPool` to use a mutex rather than atomics for
synchronization, for simplicity. This will be optimized later.
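Roughly, the reaping works like this (a simplified sketch, not the actual
pool types):

```rust
// Every pooled resource remembers when it was last used, and a periodic
// sweep drops anything idle for too long.
use std::collections::HashMap;
use std::time::{Duration, Instant};

const IDLE_TIMEOUT: Duration = Duration::from_secs(3 * 60);

struct Pool<T> {
    entries: HashMap<u64, (T, Instant)>, // resource + last-used timestamp
}

impl<T> Pool<T> {
    /// Mark a resource as used, postponing its reaping.
    fn touch(&mut self, id: u64) {
        if let Some((_, last_used)) = self.entries.get_mut(&id) {
            *last_used = Instant::now();
        }
    }

    /// Called periodically: drop resources idle for longer than IDLE_TIMEOUT.
    fn reap_idle(&mut self) {
        let now = Instant::now();
        self.entries
            .retain(|_, entry| now.duration_since(entry.1) < IDLE_TIMEOUT);
    }
}
```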
This PR introduces an `image_creation_timeout` to pageservers so that we
can force image creation after a certain period. This is set to 1
day on dev/staging for now, and will roll out to production 1-2 weeks
later.
The majority of the PR is boilerplate code to add the new knob. The specific
changes are:
1. During L0 compaction, check whether we should force a compaction if
min(LSN) of all delta layers < the force_image_creation LSN.
2. During image creation, check whether we should force a compaction if the
image's LSN < the force_image_creation LSN and there are newer deltas with
overlapping key ranges (a minimal sketch of this check follows the list).
3. Also tweak the image-creation check interval to make sure we honor
image_creation_timeout.
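A minimal sketch of the check from points 1 and 2, assuming the cutoff LSN
(`force_image_creation_lsn` here) is derived from `image_creation_timeout`;
names are illustrative, not the pageserver code:

```rust
// Force the work if data older than the cutoff is still sitting in delta
// layers.
fn should_force(min_delta_lsn: u64, force_image_creation_lsn: Option<u64>) -> bool {
    match force_image_creation_lsn {
        Some(cutoff) => min_delta_lsn < cutoff,
        None => false, // knob not configured: keep the existing behavior
    }
}
```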
Vlad's note: This should be a no-op. I added an extra PS config for the
large timeline
threshold to enable this.
---------
Co-authored-by: Chen Luo <chen.luo@databricks.com>
When a function is owned by a superuser (the bootstrap user or otherwise),
we consider it safe to run. Only a superuser could have installed it,
typically from a CREATE EXTENSION script, so we trust the code enough to
execute it.
## Problem
This is intended to allow running the pg_graphql event triggers
`graphql_watch_ddl` and `graphql_watch_drop`, which execute the SECURITY
DEFINER function `graphql.increment_schema_version()`.
## Summary of changes
Allow executing an event trigger function that is owned by a superuser and
has the SECURITY DEFINER property. The event trigger code runs with
superuser privileges, and we consider that acceptable.
---------
Co-authored-by: Tristan Partin <tristan.partin@databricks.com>
There are a couple of places that call `CompactionError::is_cancel` but
don't check the `::Other` variant, via downcasting, for the root cause being
cancellation. The only place that does this is `log_compaction_error`.
It's sad that we have to do it, but until we get around to cleaning up all
the culprits, a step forward is to unify the behavior so that all places
that inspect a `CompactionError` for a cancellation reason behave the same
way.
- moves the downcasting checks against the `::Other` variant from
`log_compaction_error` into `is_cancel()` and
- enforces via type system that `.is_cancel()` is used to check whether
a CompactionError is due to cancellation (matching on the
`CompactionError::ShuttingDown` will cause a compile-time error).
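Condensed sketch of the unified check; the concrete cancellation type here is
a stand-in, not the real set of root causes the pageserver checks for:

```rust
// Sketch only; the real CompactionError and the cancellation root causes it
// downcasts to are defined in the pageserver.
#[derive(Debug, thiserror::Error)]
enum CompactionError {
    #[error("shutting down")]
    ShuttingDown,
    #[error(transparent)]
    Other(#[from] anyhow::Error),
}

/// Stand-in for the concrete cancellation error types the real code checks.
#[derive(Debug, thiserror::Error)]
#[error("operation cancelled")]
struct Cancelled;

impl CompactionError {
    fn is_cancel(&self) -> bool {
        match self {
            CompactionError::ShuttingDown => true,
            // Walk the error chain of the `Other` variant looking for a
            // cancellation root cause.
            CompactionError::Other(err) => {
                err.chain().any(|cause| cause.downcast_ref::<Cancelled>().is_some())
            }
        }
    }
}
```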
I don't think there's a _serious_ case right now where matching instead
of using `is_cancel` causes problems. The worst case I could find is the
circuit breaker and `compaction_failed`, which don't really matter if we're
shutting down the timeline anyway. But it's unaesthetic and might cause
log/alert noise down the line, so this PR fixes that at least.
Refs
- https://databricks.atlassian.net/browse/LKB-182
- slack conversation about this PR:
https://databricks.slack.com/archives/C09254R641L/p1751284317955159
## Problem
close LKB-199
## Summary of changes
We always return the error as 500 to the cplane if an LSN lease request
fails. This causes issues for the cplane, as they don't retry on 500. This
patch correctly passes through the error and assigns the error code so
that cplane can know whether it is a retryable error. (TODO: look at the
cplane code and learn the retry logic.)
Note that this patch does not resolve LKB-253 -- we need to handle the
not-found error separately in the LSN lease path, e.g. wait until the tenant
gets attached, or return 503 so that cplane can retry.
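Illustratively, the intent is a mapping along these lines; the error variants
and the exact codes here are assumptions, not the handler's real types:

```rust
// Pass the failure reason through as a status code the cplane can use to
// decide whether to retry, instead of a blanket 500.
use http::StatusCode;

enum LeaseError {
    NotFound,         // tenant/timeline not found (see the LKB-253 caveat above)
    ShuttingDown,     // pageserver shutting down: retryable
    Internal(String), // everything else
}

fn lease_error_status(err: &LeaseError) -> StatusCode {
    match err {
        LeaseError::NotFound => StatusCode::NOT_FOUND,                // 404
        LeaseError::ShuttingDown => StatusCode::SERVICE_UNAVAILABLE,  // 503
        LeaseError::Internal(_) => StatusCode::INTERNAL_SERVER_ERROR, // 500
    }
}
```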
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>