rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2025-12-22 21:59:59 +00:00

Author	SHA1	Message	Date
Vlad Lazar	3e4cbaed67	storcon: validate intent state before applying optimization (#12593 ) ## Problem In the gap between picking an optimization and applying it, something might insert a change to the intent state that makes it incompatible. If the change is done via the `schedule()` method, we are covered by the increased sequence number, but otherwise we can panic if we violate the intent state invariants. ## Summary of Changes Validate the optimization right before applying it. Since we hold the service lock at that point, nothing else can sneak in. Closes LKB-65	2025-07-16 14:37:40 +00:00
Conrad Ludgate	c71aea0223	proxy: for json logging, only use callsite IDs if span name is duplicated (#12625 ) ## Problem We run multiple proxies, we get logs like ``` ... spans={"http_conn#22":{"conn_id": ... ... spans={"http_conn#24":{"conn_id": ... ``` these are the same span, and the difference is confusing. ## Summary of changes Introduce a counter per span name, rather than a global counter. If the counter is 0, no change to the span name is made. To follow up: see which span names are duplicated within the codebase in different callsites	2025-07-16 13:29:18 +00:00
Conrad Ludgate	87915df2fa	proxy: replace serde_json with our new json ser crate in the logging impl (#12602 ) This doesn't solve any particular problem, but it does simplify some of the code that was forced to round-trip through verbose Serialize impls.	2025-07-16 13:27:00 +00:00
Alexander Bayandin	caca08fe78	CI: rework and merge `lint-openapi-spec` and `validate-compute-manifest` jobs (#12575 ) ## Problem We have several linters that use Node.js, but they are currently set up differently, both locally and on CI. ## Summary of changes - Add Node.js to `build-tools` image - Move `compute/package.json` -> `build-tools/package.json` and add `redocly` to it `@redocly/cli` - Unify and merge into one job `lint-openapi-spec` and `validate-compute-manifest`	2025-07-16 11:08:27 +00:00
Alexander Bayandin	0c99f16c60	CI(run-python-test-set): don't collect code coverage for real (#12611 ) ## Problem neondatabase/neon#12601 did't compleatly disable writing `.profraw` files, but instead of `/tmp/coverage` it started to write into the current directory ## Summary of changes - Set `LLVM_PROFILE_FILE=/dev/null` to avoing writing `.profraw` at all	2025-07-16 08:26:52 +00:00
Alexey Kondratov	dd7fff655a	feat(compute): Introduce privileged_role_name parameter (#12539 ) ## Problem Currently `neon_superuser` is hardcoded in many places. It makes it harder to reuse the same code in different envs. ## Summary of changes Parametrize `neon_superuser` in `compute_ctl` via `--privileged-role-name` and in `neon` extensions via `neon.privileged_role_name`, so it's now possible to use different 'superuser' role names if needed. Everything still defaults to `neon_superuser`, so no control plane code changes are needed and I intentionally do not touch regression and migrations tests. Postgres PRs: - https://github.com/neondatabase/postgres/pull/674 - https://github.com/neondatabase/postgres/pull/675 - https://github.com/neondatabase/postgres/pull/676 - https://github.com/neondatabase/postgres/pull/677 Cloud PR: - https://github.com/neondatabase/cloud/pull/31138	2025-07-15 20:22:57 +00:00
quantumish	809633903d	Move `ShmemHandle` into separate module, tweak documentation (#12595 ) Initial PR for the hashmap behind the updated LFC implementation. This refactors `neon-shmem` so that the actual shared memory utilities are in a separate module within the crate. Beyond that, it slightly changes some of the docstrings so that they play nicer with `cargo doc`.	2025-07-15 17:40:40 +00:00
Arpad Müller	5c934efb29	Don't depend on the postgres_ffi just for one type (#12610 ) We don't want to depend on postgres_ffi in an API crate. If there is no such dependency, we can compile stuff like `storcon_cli` without needing a full working postgres build. Fixes regression of #12548 (before we could compile it).	2025-07-15 17:28:08 +00:00
Heikki Linnakangas	5c9c3b3317	Misc cosmetic cleanups (#12598 ) - Remove a few obsolete "allowed error messages" from tests. The pageserver doesn't emit those messages anymore. - Remove misplaced and outdated docstring comment from `test_tenants.py`. A docstring is supposed to be the first thing in a function, but we had added some code before it. And it was outdated, as we haven't supported running without safekeepers for a long time. - Fix misc typos in comments - Remove obsolete comment about backwards compatibility with safekeepers without `TIMELINE_STATUS` API. All safekeepers have it by now.	2025-07-15 14:36:28 +00:00
Alexander Bayandin	921a4f2009	CI(run-python-test-set): don't collect code coverage (#12601 ) ## Problem We don't use code coverage produced by `regress-tests` (neondatabase/neon#6798), so there's no need to collect it. Potentially, disabling it should reduce the load on disks and improve the stability of debug builds. ## Summary of changes - Disable code coverage collection for regression tests	2025-07-15 11:16:29 +00:00
dependabot[bot]	eb93c3e3c6	build(deps): bump aiohttp from 3.10.11 to 3.12.14 in the pip group across 1 directory (#12600 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-07-15 11:06:58 +00:00
Alexander Bayandin	7a7ab2a1d1	Move `build-tools.Dockerfile` -> `build-tools/Dockerfile` (#12590 ) ## Problem This is a prerequisite for neondatabase/neon#12575 to keep all things relevant to `build-tools` image in a single directory ## Summary of changes - Rename `build_tools/` to `build-tools/` - Move `build-tools.Dockerfile` to `build-tools/Dockerfile`	2025-07-15 10:45:49 +00:00
Krzysztof Szafrański	ff526a1051	[proxy] Recognize more cplane errors, use retry_delay_ms as TTL (#12543 ) ## Problem Not all cplane errors are properly recognized and cached/retried. ## Summary of changes Add more cplane error reasons. Also, use retry_delay_ms as cache TTL if present. Related to https://github.com/neondatabase/cloud/issues/19353	2025-07-15 07:42:48 +00:00
Heikki Linnakangas	9a2456bea5	Reduce noise from get_installed_extensions during e.g shut down (#12479 ) All Errors that can occur during get_installed_extensions() come from tokio-postgres functions, e.g. if the database is being shut down ("FATAL: terminating connection due to administrator command"). I'm seeing a lot of such errors in the logs with the regression tests, with very verbose stack traces. The compute_ctl stack trace is pretty useless for errors originating from the Postgres connection, the error message has all the information, so stop printing the stack trace. I changed the result type of the functions to return the originating tokio_postgres Error rather than anyhow::Error, so that if we introduce other error sources to the functions where the stack trace might be useful, we'll be forced to revisit this, probably by introducing a new Error type that separates postgres errors from other errors. But this will do for now.	2025-07-14 18:42:36 +00:00
Mikhail	a456e818af	LFC prewarm perftest: increase timeout for initialization job (#12594 ) Tests on https://github.com/neondatabase/neon/actions/runs/16268609007/job/45930162686 time out due to pgbench init job taking more than 30 minutes to run. Increase test timeout duration to 2 hours.	2025-07-14 17:37:47 +00:00
Matthias van de Meent	3e6fdb0aa6	Add and use [U]INT64_[HEX_]FORMAT for various [u]int64 needs (#12592 ) We didn't consistently apply these, and it wasn't consistently solved. With this patch we should have a more consistent approach to this, and have less issues porting changes to newer versions. This also removes some potentially buggy casts to `long` from `uint64` - they could've truncated the value in systems where `long` only has 32 bits.	2025-07-14 16:47:07 +00:00
Vlad Lazar	f8d3f86f58	pageserver: include records in get page debug handler (#12578 ) Include records and image in the debug get page handler. This endpoint does not update the metrics and does not support tracing. Note that this now returns individual bytes which need to be encoded properly for debugging. Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com>	2025-07-14 16:37:28 +00:00
HaoyuHuang	f67a8a173e	A few SK changes (#12577 ) # TLDR This PR is a no-op. ## Problem When a SK loses a disk, it must recover all WALs from the very beginning. This may take days/weeks to catch up to the latest WALs for all timelines it owns. ## Summary of changes When SK starts up, if it finds that it has 0 timelines, - it will ask SC for the timeline it owns. - Then, pulls the timeline from its peer safekeepers to restore the WAL redundancy right away. After pulling timeline is complete, it will become active and accepts new WALs. The current impl is a prototype. We can optimize the impl further, e.g., parallel pull timelines. --------- Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com>	2025-07-14 16:37:04 +00:00
Mikhail	2288efae66	Performance test for LFC prewarm (#12524 ) https://github.com/neondatabase/cloud/issues/19011 Measure relative performance for prewarmed and non-prewarmed endpoints. Add test that runs on every commit, and one performance test with a remote cluster.	2025-07-14 13:41:31 +00:00
a-masterov	4fedcbc0ac	Leverage the existing mechanism to retry 404 errors instead of implementing new code. (#12567 ) ## Problem In https://github.com/neondatabase/neon/pull/12513, the new code was implemented to retry 404 errors caused by the replication lag. However, this implemented the new logic, making the script more complicated, while we have an existing one in `neon_api.py`. ## Summary of changes The existing mechanism is used to retry 404 errors. --------- Co-authored-by: Alexey Masterov <alexey.masterov@databricks.com>	2025-07-14 13:25:25 +00:00
Erik Grinaker	eb830fa547	pageserver/client_grpc: use unbounded pools (#12585 ) ## Problem The communicator gRPC client currently uses bounded client/stream pools. This can artificially constrain clients, especially after we remove pipelining in #12584. [Benchmarks](https://github.com/neondatabase/neon/pull/12583) show that the cost of an idle server-side GetPage worker task is about 26 KB (2.5 GB for 100,000), so we can afford to scale out. In the worst case, we'll degenerate to the current libpq state with one stream per backend, but without the TCP connection overhead. In the common case we expect significantly lower stream counts due to stream sharing, driven e.g. by idle backends, LFC hits, read coalescing, sharding (backends typically only talk to one shard at a time), etc. Currently, Pageservers rarely serve more than 4000 backend connections, so we have at least 2 orders of magnitude of headroom. Touches #11735. Requires #12584. ## Summary of changes Remove the pool limits, and restructure the pools. We still keep a separate bulk pool for Getpage batches of >4 pages (>32 KB), with fewer streams per connection. This reduces TCP-level congestion and head-of-line blocking for non-bulk requests, and concentrates larger window sizes on a smaller set of streams/connections, presumably reducing memory usage. Apart from this, bulk requests don't have any latency penalty compared to other requests.	2025-07-14 13:22:38 +00:00
Erik Grinaker	a203f9829a	pageserver: add timeline_id span when freezing layers (#12572 ) ## Problem We don't log the timeline ID when rolling ephemeral layers during housekeeping. Resolves [LKB-179](https://databricks.atlassian.net/browse/LKB-179) ## Summary of changes Add a span with timeline ID when calling `maybe_freeze_ephemeral_layer` from the housekeeping loop. We don't instrument the function itself, since future callers may not have a span including the tenant_id already, but we don't want to duplicate the tenant_id for these spans.	2025-07-14 12:30:28 +00:00
Erik Grinaker	42ab34dc36	pageserver/client_grpc: don't pipeline GetPage requests (#12584 ) ## Problem The communicator gRPC client currently attempts to pipeline GetPage requests from multiple callers onto the same gRPC stream. This has a number of issues: * Head-of-line blocking: the request may block on e.g. layer download or LSN wait, delaying the next request. * Cancellation: we can't easily cancel in-progress requests (e.g. due to timeout or backend termination), so it may keep blocking the next request (even its own retry). * Complex stream scheduling: picking a stream becomes harder/slower, and additional Tokio tasks and synchronization is needed for stream management. Touches #11735. Requires #12579. ## Summary of changes This patch removes pipelining of gRPC stream requests, and instead prefers to scale out the number of streams to achieve the same throughput. Stream scheduling has been rewritten, and mostly follows the same pattern as the client pool with exclusive acquisition by a single caller. [Benchmarks](https://github.com/neondatabase/neon/pull/12583) show that the cost of an idle server-side GetPage worker task is about 26 KB (2.5 GB for 100,000), so we can afford to scale out. This has a number of advantages: * It (mostly) eliminates head-of-line blocking (except at the TCP level). * Cancellation becomes trivial, by closing the stream. * Stream scheduling becomes significantly simpler and cheaper. * Individual callers can still use client-side batching for pipelining.	2025-07-14 12:11:33 +00:00
Erik Grinaker	30b877074c	pagebench: add CPU profiling support (#12478 ) ## Problem The new communicator gRPC client has significantly worse Pagebench performance than a basic gRPC client. We need to find out why. ## Summary of changes Add a `pagebench --profile` flag which takes a client CPU profile of the benchmark and writes a flamegraph to `profile.svg`.	2025-07-14 11:44:53 +00:00
Erik Grinaker	f18cc808f0	pageserver/client_grpc: reap idle channels immediately (#12587 ) ## Problem It can take 3x the idle timeout to reap a channel. We have to wait for the idle timeout to trigger first for the stream, then the client, then the channel. Touches #11735. ## Summary of changes Reap empty channels immediately, and rely indirectly on the channel/stream timeouts. This can still lead to 2x the idle timeout for streams (first stream then client), but that's okay -- if the stream closes abruptly (e.g. due to timeout or error) we want to keep the client around in the pool for a while.	2025-07-14 10:47:26 +00:00
Erik Grinaker	d14d8271b8	pageserver/client_grpc: improve retry logic (#12579 ) ## Problem gRPC client retries currently include pool acquisition under the per-attempt timeout. If pool acquisition is slow (e.g. full pool), this will cause spurious timeout warnings, and the caller will lose its place in the pool queue. Touches #11735. ## Summary of changes Makes several improvements to retries and related logic: * Don't include pool acquisition time under request timeouts. * Move attempt timeouts out of `Retry` and into the closure. * Make `Retry` configurable, move constants into main module. * Don't backoff on the first retry, and reduce initial/max backoffs to 5ms and 5s respectively. * Add `with_retries` and `with_timeout` helpers. * Add slow logging for pool acquisition, and a `warn_slow` counterpart to `log_slow`. * Add debug logging for requests and responses at the client boundary.	2025-07-14 10:43:10 +00:00
Erik Grinaker	fecb707b19	pagebench: add `idle-streams` (#12583 ) ## Problem For the communicator scheduling policy, we need to understand the server-side cost of idle gRPC streams. Touches #11735. ## Summary of changes Add an `idle-streams` benchmark to `pagebench` which opens a large number of idle gRPC GetPage streams.	2025-07-14 09:41:58 +00:00
Folke Behrens	296c9190b2	proxy: Use EXPIRE command to refresh cancel entries (#12580 ) ## Problem When refreshing cancellation data we resend the entire value again just to reset the TTL, which causes unnecessary load in proxy, on network and possibly on redis side. ## Summary of changes * Switch from using SET with full value to using EXPIRE to reset TTL. * Add a tiny delay between retries to prevent busy loop. * Shorten CancelKeyOp variants: drop redundant suffix. * Retry SET when EXPIRE failed.	2025-07-13 22:49:23 +00:00
Folke Behrens	a5fe67f361	proxy: cancel maintain_cancel_key task immediately (#12586 ) ## Problem When a connection terminates its maintain_cancel_key task keeps running until the CANCEL_KEY_REFRESH sleep finishes and then it triggers another cancel key TTL refresh before exiting. ## Summary of changes * Check for cancellation while sleeping and interrupt sleep. * If cancelled, break the loop, don't send a refresh cmd.	2025-07-13 17:27:39 +00:00
Dmitrii Kovalkov	ee7bb1a667	storcon: validate new_sk_set before starting safekeeper migration (#12546 ) ## Problem We don't validate the validity of the `new_sk_set` before starting the migration. It is validated later, so the migration to an invalid safekeeper set will fail anyway. But at this point we might already commited an invalid `new_sk_set` to the database and there is no `abort` command yet (I ran into this issue in neon_local and ruined the timeline :) - Part of https://github.com/neondatabase/neon/issues/11669 ## Summary of changes - Add safekeeper count and safekeeper duplication checks before starting the migration - Test that we validate the `new_sk_set` before starting the migration - Add `force` option to the `TimelineSafekeeperMigrateRequest` to disable not-mandatory checks	2025-07-12 04:57:04 +00:00
Conrad Ludgate	9bba31bf68	proxy: encode json as we parse rows (#11992 ) Serialize query row responses directly into JSON. Some of this code should be using the `json::value_as_object/list` macros, but I've avoided it for now to minimize the size of the diff.	2025-07-11 19:39:08 +00:00
Folke Behrens	380d167b7c	proxy: For cancellation data replace HSET+EXPIRE/HGET with SET..EX/GET (#12553 ) ## Problem To store cancellation data we send two commands to redis because the redis server version doesn't support HSET with EX. Also, HSET is not really needed. ## Summary of changes * Replace the HSET + EXPIRE command pair with one SET .. EX command. * Replace HGET with GET. * Leave a workaround for old keys set with HSET. * Replace some anyhow errors with specific errors to surface the WRONGTYPE error from redis.	2025-07-11 19:35:42 +00:00
HaoyuHuang	cb991fba42	A few more PS changes (#12552 ) # TLDR Problem-I is a bug fix. The rest are no-ops. ## Problem I Page server checks image layer creation based on the elapsed time but this check depends on the current logical size, which is only computed on shard 0. Thus, for non-0 shards, the check will be ineffective and image creation will never be done for idle tenants. ## Summary of changes I This PR fixes the problem by simply removing the dependency on current logical size. ## Summary of changes II This PR adds a timeout when calling page server to split shard to make sure SC does not wait for the API call forever. Currently the PR doesn't adds any retry logic because it's not clear whether page server shard split can be safely retried if the existing operation is still ongoing or left the storage in a bad state. Thus it's better to abort the whole operation and restart. ## Problem III `test_remote_failures` requires PS to be compiled in the testing mode. For PS in dev/staging, they are compiled without this mode. ## Summary of changes III Remove the restriction and also increase the number of total failures allowed. ## Summary of changes IV remove test on PS getpage http route. --------- Co-authored-by: Chen Luo <chen.luo@databricks.com> Co-authored-by: Yecheng Yang <carlton.yang@databricks.com> Co-authored-by: Vlad Lazar <vlad@neon.tech>	2025-07-11 19:27:55 +00:00
Matthias van de Meent	4566b12a22	NEON: Finish Zenith->Neon rename (#12566 ) Even though we're now part of Databricks, let's at least make this part consistent. ## Summary of changes - PG14: https://github.com/neondatabase/postgres/pull/669 - PG15: https://github.com/neondatabase/postgres/pull/670 - PG16: https://github.com/neondatabase/postgres/pull/671 - PG17: https://github.com/neondatabase/postgres/pull/672 --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2025-07-11 18:56:39 +00:00
Alex Chi Z.	63ca084696	fix(pageserver): downgrade wal apply error during gc-compaction (#12518 ) ## Problem close LKB-162 close https://github.com/neondatabase/cloud/issues/30665, related to https://github.com/neondatabase/cloud/issues/29434 We see a lot of errors like: ``` 2025-05-22T23:06:14.928959Z ERROR compaction_loop{tenant_id=? shard_id=0304}:run:gc_compact_timeline{timeline_id=?}: error applying 4 WAL records 35/DC0DF0B8..3B/E43188C0 (8119 bytes) to key 000000067F0000400500006027000000B9D0, from base image with LSN 0/0 to reconstruct page image at LSN 61/150B9B20 n_attempts=0: apply_wal_records Caused by: 0: read walredo stdout 1: early eof ``` which is an acceptable form of error and we should downgrade it to warning. ## Summary of changes walredo error during gc-compaction is expected when the data below the gc horizon does not contain a full key history. This is possible in some rare cases of gc that is only able to remove data in the middle of the history but not all earlier history when a full keyspace gets deleted. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-11 18:37:55 +00:00
Arpad Müller	379259bdd7	storcon: don't error log on timeline delete if tenant migration is in progress (#12523 ) Fixes [LKB-61](https://databricks.atlassian.net/browse/LKB-61): `test_timeline_archival_chaos` being flaky with storcon error `Requested tenant is missing`. When a tenant migration is ongoing, and the attach request has been sent to the new location, but the attach hasn't finished yet, it is possible for the pageserver to return a 412 precondition failed HTTP error on timeline deletion, because it is being sent to the new location already. That one we would previously log via sth like: ``` ERROR request{method=DELETE path=/v1/tenant/1f544a11c90d1afd7af9b26e48985a4e/timeline/32818fb3ebf07cb7f06805429d7dee38 request_id=c493c04b-7f33-46d2-8a65-aac8a5516055}: Error processing HTTP request: InternalServerError(Error deleting timeline 32 818fb3ebf07cb7f06805429d7dee38 on 1f544a11c90d1afd7af9b26e48985a4e on node 2 (localhost): pageserver API: Precondition failed: Requested tenant is missing ``` This patch changes that and makes us return a more reasonable resource unavailable error. Not sure how scalable this is with tenants with a large number of shards, but that's a different discussion (we'd probably need a limited amount of per-storcon retries). example [link](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-12398/15981821532/index.html#/testresult/e7785dfb1238d92f).	2025-07-11 17:07:14 +00:00
Heikki Linnakangas	3300207523	Update working set size estimate without lock (#12570 ) Update the WSS estimate before acquring the lock, so that we don't need to hold the lock for so long. That seems safe to me, see added comment. I was planning to do this with the new rust-based communicator implementation anyway, but it might help a little with the current C implementation too. And more importantly, having this as a separate PR gives us a chance to review this aspect independently.	2025-07-11 16:05:22 +00:00
Tristan Partin	a0a7733b5a	Use relative paths in submodule URL references (#12559 ) This is a nifty trick from the hadron repo that seems to help with SSH key dance. Signed-off-by: Tristan Partin <tristan.partin@databricks.com>	2025-07-11 15:57:50 +00:00
Conrad Ludgate	f4245403b3	[proxy] allow testing query cancellation locally (#12568 ) ## Problem Canceelation requires redis, redis required control-plane. ## Summary of changes Make redis for cancellation not require control plane. Add instructions for setting up redis locally.	2025-07-11 15:13:36 +00:00
Heikki Linnakangas	a8db7ebffb	Minor refactor of the SQL functions to get working set size estimate (#12550 ) Split the functions into two: one internal function to calculate the estimate, and another (two functions) to expose it as SQL functions. This is in preparation of adding new communicator implementation. With that, the SQL functions will dispatch the call to the old or new implementation depending on which is being used.	2025-07-11 14:17:44 +00:00
Vlad Lazar	154f6dc59c	pageserver: log only on final shard resolution failure (#12565 ) This log is too noisy. Instead of warning on every retry, let's log only on the final failure.	2025-07-11 13:25:25 +00:00
Vlad Lazar	15f633922a	pageserver: use image consistent LSN for force image layer creation (#12547 ) This is a no-op for the neon deployment * Introduce the concept image consistent lsn: of the largest LSN below which all pages have been redone successfully * Use the image consistent LSN for forced image layer creations * Optionally expose the image consistent LSN via the timeline describe HTTP endpoint * Add a sharded timeline describe endpoint to storcon --------- Co-authored-by: Chen Luo <chen.luo@databricks.com>	2025-07-11 11:39:51 +00:00
Dmitrii Kovalkov	c34d36d8a2	storcon_cli: timeline-safekeeper-migrate and timeline-locate subcommands (#12548 ) ## Problem We have a `safekeeper_migrate` handler, but no subcommand in `storcon_cli`. Same for `/:timeline_id/locate` for identifying current set of safekeepers. - Closes: https://github.com/neondatabase/neon/issues/12395 ## Summary of changes - Add `timeline-safekeeper-migrate` and `timeline-locate` subcommands to `storcon_cli`	2025-07-11 10:49:37 +00:00
Tristan Partin	cec0543b51	Add background to compute migration 0002-alter_roles.sql (#11708 ) On December 8th, 2023, an engineering escalation (INC-110) was opened after it was found that BYPASSRLS was being applied to all roles. PR that introduced the issue: https://github.com/neondatabase/neon/pull/5657 Subsequent commit on main: `ad99fa5f03` NOBYPASSRLS and INHERIT are the defaults for a Postgres role, but because it isn't easy to know if a Postgres cluster is affected by the issue, we need to keep the migration around for a long time, if not indefinitely, so any cluster can be fixed. Branching is the gift that keeps on giving... Signed-off-by: Tristan Partin <tristan.partin@databricks.com> Signed-off-by: Tristan Partin <tristan.partin@databricks.com>	2025-07-10 22:58:54 +00:00
Erik Grinaker	8aa9540a05	pageserver/page_api: include block number and rel in gRPC `GetPageResponse` (#12542 ) ## Problem With gRPC `GetPageRequest` batches, we'll have non-trivial fragmentation/reassembly logic in several places of the stack (concurrent reads, shard splits, LFC hits, etc). If we included the block numbers with the pages in `GetPageResponse` we could have better verification and observability that the final responses are correct. Touches #11735. Requires #12480. ## Summary of changes Add a `Page` struct with`block_number` for `GetPageResponse`, along with the `RelTag` for completeness, and verify them in the rich gRPC client.	2025-07-10 22:35:14 +00:00
Alex Chi Z.	b91f821e8b	fix(libpagestore): update the default stripe size (#12557 ) ## Problem Part of LKB-379 The pageserver connstrings are updated in the postmaster and then there's a hook to propagate it to the shared memory of all backends. However, the shard stripe doesn't. This would cause problems during shard splits: * the compute has active reads/writes * shard split happens and the cplane applies the new config (pageserver connstring + stripe size) * pageserver connstring will be updated immediately once the postmaster receives the SIGHUP, and it will be copied over the the shared memory of all other backends. * stripe size is a normal GUC and we don't have special handling around that, so if any active backend has ongoing txns the value won't be applied. * now it's possible for backends to issue requests based on the wrong stripe size; what's worse, if a request gets cached in the prefetch buffer, it will get stuck forever. ## Summary of changes To make sure it aligns with the current default in storcon. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-07-10 21:49:52 +00:00
Erik Grinaker	44ea17b7b2	pageserver/page_api: add attempt to GetPage request ID (#12536 ) ## Problem `GetPageRequest::request_id` is supposed to be a unique ID for a request. It's not, because we may retry the request using the same ID. This causes assertion failures and confusion. Touches #11735. Requires #12480. ## Summary of changes Extend the request ID with a retry attempt, and handle it in the gRPC client and server.	2025-07-10 20:39:42 +00:00
Tristan Partin	1b7339b53e	PG: add max_wal_rate (#12470 ) ## Problem One PG tenant may write too fast and overwhelm the PS. The other tenants sharing the same PSs will get very little bandwidth. We had one experiment that two tenants sharing the same PSs. One tenant runs a large ingestion that delivers hundreds of MB/s while the other only get < 10 MB/s. ## Summary of changes Rate limit how fast PG can generate WALs. The default is -1. We may scale the default value with the CPU count. Need to run some experiments to verify. ## How is this tested? CI. PGBench. No limit first. Then set to 1 MB/s and you can see the tps drop. Then reverted the change and tps increased again. pgbench -i -s 10 -p 55432 -h 127.0.0.1 -U cloud_admin -d postgres pgbench postgres -c 10 -j 10 -T 6000000 -P 1 -b tpcb-like -h 127.0.0.1 -U cloud_admin -p 55432 progress: 33.0 s, 986.0 tps, lat 10.142 ms stddev 3.856 progress: 34.0 s, 973.0 tps, lat 10.299 ms stddev 3.857 progress: 35.0 s, 1004.0 tps, lat 9.939 ms stddev 3.604 progress: 36.0 s, 984.0 tps, lat 10.183 ms stddev 3.713 progress: 37.0 s, 998.0 tps, lat 10.004 ms stddev 3.668 progress: 38.0 s, 648.9 tps, lat 12.947 ms stddev 24.970 progress: 39.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 40.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 41.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 42.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 43.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 44.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 45.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 46.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 47.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 48.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 49.0 s, 347.3 tps, lat 321.560 ms stddev 1805.633 progress: 50.0 s, 346.8 tps, lat 9.898 ms stddev 3.809 progress: 51.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 52.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 53.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 54.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 55.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 56.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 57.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 58.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 59.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 60.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 61.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 62.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 63.0 s, 494.5 tps, lat 276.504 ms stddev 1853.689 progress: 64.0 s, 488.0 tps, lat 20.530 ms stddev 71.981 progress: 65.0 s, 407.8 tps, lat 9.502 ms stddev 3.329 progress: 66.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 67.0 s, 0.0 tps, lat 0.000 ms stddev 0.000 progress: 68.0 s, 504.5 tps, lat 71.627 ms stddev 397.733 progress: 69.0 s, 371.0 tps, lat 24.898 ms stddev 29.007 progress: 70.0 s, 541.0 tps, lat 19.684 ms stddev 24.094 progress: 71.0 s, 342.0 tps, lat 29.542 ms stddev 54.935 Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com>	2025-07-10 20:34:11 +00:00
Mikhail	3593fe195a	split TerminationPending into two values, keeping ComputeStatus stateless (#12506 ) After https://github.com/neondatabase/neon/pull/12240 we observed issues in our go code as `ComputeStatus` is not stateless, thus doesn't deserialize as string. ``` could not check compute activity: json: cannot unmarshal object into Go struct field ComputeState.status of type computeclient.ComputeStatus ``` - Fix this by splitting this status into two. - Update compute OpenApi spec to reflect changes to `/terminate` in previous PR	2025-07-10 19:28:10 +00:00
Mikhail	c5aaf1ae21	Qualify call to neon extension in compute_ctl's prewarming (#12554 ) https://github.com/neondatabase/cloud/issues/19011 Calls without `neon.` failed on staging. Also fix local tests to work with qualified calls	2025-07-10 18:37:54 +00:00

... 2 3 4 5 6 ...

8464 Commits