## Problem
Better investigate memory usage during backpressure
## Summary of changes
Print the open layer size when backpressure is activated.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Detaching an ancestor from a previously detached parent panics when there were no
writes, since it tries to upload the tombstone layer twice.
## Summary of Changes
If we are going to copy the tombstone from the ancestor, don't bother
creating it ourselves.
Fixes https://github.com/neondatabase/neon/issues/12458
## TLDR
This PR is a no-op. The changes are disabled by default.
## Problem
I. Currently we don't have a way to detect disk I/O failures from WAL
operations.
II.
We observed that the offloader fails to upload a segment due to a race
condition between XLOG SWITCH and Postgres starting to stream WAL. The
wal_backup task continuously fails to upload a full segment while the
segment remains partial on disk.
The consequence is that commit_lsn for all SKs moves forward but
backup_lsn stays the same. Then, all SKs run out of disk space.
III.
We have discovered SK bugs where the WAL offload owner cannot keep up
with WAL backup/upload to S3, which results in an unbounded accumulation
of WAL segment files on the Safekeeper's disk until the disk becomes
full. This is a dangerous situation that is hard to recover from, because
the Safekeeper cannot write its control files when it is out of disk
space. There are actually two problems here:
1. A single problematic timeline can take over the SK's entire disk.
2. Once out of disk, it's difficult to recover the SK.
IV.
Neon reports certain storage errors as "critical" errors using a macro,
which increments a counter/metric that can be used to raise alerts.
However, this metric isn't sliced by tenant and/or timeline today. We
need the tenant/timeline dimension to better respond to incidents and
for blast radius analysis.
## Summary of changes
I.
The PR adds a `safekeeper_wal_disk_io_errors` metric which is incremented
when the SK fails to create or flush WAL files.
II.
To mitigate this issue, we re-elect a new offloader if the current
offloader is lagging behind too much.
Each SK makes the decision locally, but they are aware of each other's
commit and backup LSNs.
The new algorithm is:
- `determine_offloader` picks an SK, say SK-1.
- Each SK checks whether `commit_lsn - backup_lsn > threshold`:
  - If so, it removes SK-1 from the candidates and calls
    `determine_offloader` again.

SK-1 will step down and all SKs will elect the same new leader.
After the backup has caught up, the leader will become SK-1 again.
This also helps when SK-1 is slow to back up.
I'll set the re-election backup lag to 4 GB later. It is set to 128 MB in
dev to trigger the code path more frequently.
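A minimal sketch of the local re-election check, in case the shape of the algorithm helps; `PeerInfo`, `determine_offloader`, and `RE_ELECT_BACKUP_LAG` are illustrative stand-ins, not the actual safekeeper code:

```rust
#[derive(Clone)]
struct PeerInfo {
    sk_id: u64,
    commit_lsn: u64,
    backup_lsn: u64,
}

// 128 MB in dev; the production value will be larger (e.g. 4 GB).
const RE_ELECT_BACKUP_LAG: u64 = 128 * 1024 * 1024;

// Deterministic choice so that every SK elects the same peer.
fn determine_offloader(candidates: &[PeerInfo]) -> Option<u64> {
    candidates.iter().map(|p| p.sk_id).min()
}

fn elect_offloader(peers: &[PeerInfo]) -> Option<u64> {
    let mut candidates: Vec<PeerInfo> = peers.to_vec();
    loop {
        let leader = determine_offloader(&candidates)?;
        let info = candidates.iter().find(|p| p.sk_id == leader)?;
        // If the current leader's backup lags too far behind its commit
        // LSN, drop it from the candidate set and elect again. Every SK
        // runs the same check on the same data, so they all agree.
        if info.commit_lsn.saturating_sub(info.backup_lsn) > RE_ELECT_BACKUP_LAG {
            candidates.retain(|p| p.sk_id != leader);
        } else {
            return Some(leader);
        }
    }
}
```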
III.
This change addresses problem no. 1 by having the Safekeeper perform a
timeline disk utilization check when processing WAL proposal messages
from Postgres/compute. The Safekeeper now rejects the WAL proposal
message, effectively stopping further WAL writes for the timeline to
disk, if the existing WAL files for the timeline on the SK disk exceed a
certain size (the default threshold is 100 GB). The disk utilization is
calculated based on a `last_removed_segno` variable tracked by the
background task removing WAL files, which produces a conservative
estimate (>= the actual disk usage).
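A minimal sketch of the conservative estimate, assuming the standard 16 MiB WAL segment size; `last_removed_segno` and the segment number of the flush LSN stand in for state the safekeeper already tracks, and the function names are made up for illustration:

```rust
const WAL_SEGMENT_SIZE: u64 = 16 * 1024 * 1024;
const MAX_TIMELINE_WAL_DISK_BYTES: u64 = 100 * 1024 * 1024 * 1024; // 100 GB default

/// Segments up to `last_removed_segno` are already deleted; everything
/// after that up to the last written segment is assumed to still be on
/// disk, which overestimates rather than underestimates usage.
fn timeline_wal_disk_usage(last_removed_segno: u64, flush_lsn_segno: u64) -> u64 {
    flush_lsn_segno.saturating_sub(last_removed_segno) * WAL_SEGMENT_SIZE
}

fn should_reject_wal_proposal(last_removed_segno: u64, flush_lsn_segno: u64) -> bool {
    timeline_wal_disk_usage(last_removed_segno, flush_lsn_segno) > MAX_TIMELINE_WAL_DISK_BYTES
}
```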
IV.
* Add a new metric `hadron_critical_storage_event_count` that has
`tenant_shard_id` and `timeline_id` as dimensions.
* Modify the `critical!` macro to take tenant_id and timeline_id as
additional arguments, and adapt existing call sites to populate the
tenant shard and timeline ID fields. The `critical!` macro invocation
now increments `hadron_critical_storage_event_count` with the extra
dimensions. (In the SK there is no notion of a tenant shard, so the
tenant ID is recorded in lieu of a tenant shard ID.)
I considered adding a separate macro to avoid merge conflicts, but I
think in this case (detecting critical errors) conflicts are probably
desirable, so that we become aware whenever Neon adds another
`critical!` invocation in their code.
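A minimal sketch of how such a macro can increment a labeled counter, using the `prometheus` and `once_cell` crates; the metric and macro names follow the description above, but the exact signatures are assumptions:

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_counter_vec, IntCounterVec};

static CRITICAL_STORAGE_EVENTS: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "hadron_critical_storage_event_count",
        "Critical storage errors, sliced by tenant shard and timeline",
        &["tenant_shard_id", "timeline_id"]
    )
    .unwrap()
});

// Increment the per-tenant/timeline counter, then log the error.
macro_rules! critical {
    ($tenant_shard_id:expr, $timeline_id:expr, $($arg:tt)*) => {{
        let tenant = $tenant_shard_id.to_string();
        let timeline = $timeline_id.to_string();
        CRITICAL_STORAGE_EVENTS
            .with_label_values(&[tenant.as_str(), timeline.as_str()])
            .inc();
        tracing::error!($($arg)*);
    }};
}
```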
---------
Co-authored-by: Chen Luo <chen.luo@databricks.com>
Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com>
Co-authored-by: William Huang <william.huang@databricks.com>
## Problem
The billing team wants to change the billing events pipeline and use a
common events format in S3 buckets across different event producers.
## Summary of changes
Change the events storage format for billing events from JSON to NDJSON.
Also partition files by hours, rather than days.
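For illustration, NDJSON is simply one JSON document per line; a minimal sketch of serializing events that way with `serde_json` (the `ConsumptionEvent` type and its fields are made up for the example):

```rust
use serde::Serialize;

#[derive(Serialize)]
struct ConsumptionEvent {
    tenant_id: String,
    metric: String,
    value: u64,
}

// One JSON object per line, no surrounding array or commas.
fn to_ndjson(events: &[ConsumptionEvent]) -> serde_json::Result<String> {
    let mut out = String::new();
    for event in events {
        out.push_str(&serde_json::to_string(event)?);
        out.push('\n');
    }
    Ok(out)
}
```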
Resolves: https://github.com/neondatabase/cloud/issues/29995
Don't print errors like:
```
Compaction failed 1 times, retrying in 2s: Failed to offload timeline: Unexpected offload error: Timeline deletion is already in progress
```
Print it at info log level instead.
https://github.com/neondatabase/cloud/issues/30666
## Problem
Part of #11813
## Summary of changes
* Compute tenant remote size in the housekeeping loop.
* Add a new `TenantFeatureResolver` struct to cache the tenant-specific
properties.
* Evaluate feature flag based on the remote size.
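A minimal sketch of the caching idea; the struct name follows the PR, but the fields and methods here are assumptions for illustration:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

pub struct TenantFeatureResolver {
    /// Cached remote size, refreshed by the housekeeping loop.
    remote_size_bytes: AtomicU64,
}

impl TenantFeatureResolver {
    pub fn new() -> Self {
        Self { remote_size_bytes: AtomicU64::new(0) }
    }

    /// Called from the housekeeping loop once the remote size is computed.
    pub fn update_remote_size(&self, bytes: u64) {
        self.remote_size_bytes.store(bytes, Ordering::Relaxed);
    }

    /// Evaluate a size-gated feature flag against the cached property.
    pub fn evaluate_size_gated_flag(&self, min_remote_size: u64) -> bool {
        self.remote_size_bytes.load(Ordering::Relaxed) >= min_remote_size
    }
}
```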
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
close https://github.com/neondatabase/neon/issues/11528
## Summary of changes
Gives us better observability of compaction progress.
- Image creation: number of partitions processed / total partitions
- Gc-compaction: index of the current item in the queue / total items for
a full compaction
- Shard ancestor compaction: layers to rewrite / total layers
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
L0 compaction currently holds the read lock over a long region even
though it doesn't need to.
## Summary of changes
This patch reduces the one long contention region into 2 short ones:
gather the layers to compact at the beginning, and several short read
locks when querying the image coverage.
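A minimal sketch of the locking pattern (a tokio `RwLock` is assumed); `LayerMap`, `Layer`, and `image_coverage` are placeholders rather than the actual pageserver types:

```rust
use tokio::sync::RwLock;

#[derive(Clone)]
struct Layer;

struct LayerMap {
    level0_deltas: Vec<Layer>,
}

impl LayerMap {
    fn image_coverage(&self, _layer: &Layer) -> bool {
        false // the real implementation consults the image layers
    }
}

async fn compact_level0(layer_map: &RwLock<LayerMap>) {
    // Short read lock #1: snapshot the set of layers to compact.
    let layers: Vec<Layer> = layer_map.read().await.level0_deltas.clone();

    for layer in &layers {
        // Short read lock per coverage query, instead of holding the
        // lock across the whole compaction pass.
        let _covered = layer_map.read().await.image_coverage(layer);
        // ... heavy work (reads, merges, writes) happens without the lock ...
    }
}
```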
Co-Authored-By: Chen Luo
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
If a hardlink operation inside `detach_ancestor` fails due to the layer
already existing, we delete the layer to make sure the source is one we
know about, and then retry.
But we deleted the wrong file, namely the one we wanted to use as the
source of the hardlink. As a result, the follow-up hardlink operation
failed. This PR corrects that mistake.
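A minimal sketch of the corrected retry with `std::fs::hard_link`: on `AlreadyExists`, remove the destination and retry, never the source we are linking from. The helper name is made up:

```rust
use std::{fs, io, path::Path};

fn hard_link_with_retry(src: &Path, dst: &Path) -> io::Result<()> {
    match fs::hard_link(src, dst) {
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => {
            // A stale file is in the way: delete the destination so the
            // resulting link is known to point at `src`.
            fs::remove_file(dst)?;
            fs::hard_link(src, dst)
        }
        other => other,
    }
}
```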
## Problem
The gRPC API does not provide LSN leases.
## Summary of changes
* Add LSN lease support to the gRPC API.
* Use gRPC LSN leases for static computes with `grpc://` connstrings.
* Move `PageserverProtocol` into the `compute_api::spec` module and
reuse it.
## Problem
Location config changes can currently result in changes to the shard
identity. Such changes will cause data corruption, as seen with #12217.
Resolves #12227.
Requires #12377.
## Summary of changes
Assert that the shard identity does not change on location config
updates and on (re)attach.
This is currently asserted with `critical!`, in case it misfires in
production. Later, we should reject such requests with an error and turn
this into a proper assertion.
Implicit mapping to an `anyhow::Error` when we do `?` is discouraged
because tooling to find those places isn't great.
As a drive-by, also make SplitImageLayerWriter::new infallible and sync.
I think we should also make ImageLayerWriter::new completely lazy,
then `BatchLayerWriter::new` infallible and async.
## Problem
Similarly to #12217, the following endpoints may result in a stripe size
mismatch between the storage controller and Pageserver if an unsharded
tenant has a different stripe size set than the default. This can lead
to data corruption if the tenant is later manually split without
specifying an explicit stripe size, since the storage controller and
Pageserver will apply different defaults. This commonly happens with
tenants that were created before the default stripe size was changed
from 32k to 2k.
* `PUT /v1/tenant/config`
* `PATCH /v1/tenant/config`
These endpoints are no longer in regular production use (they were used
when cplane still managed Pageserver directly), but can still be called
manually or by tests.
## Summary of changes
Retain the current shard parameters when updating the location config in
`PUT | PATCH /v1/tenant/config`.
Also opportunistically derive `Copy` for `ShardParameters`.
## Problem
`compute_ctl` should support gRPC base backups.
Requires #12111.
Requires #12243.
Touches #11926.
## Summary of changes
Support `grpc://` connstrings for `compute_ctl` base backups.
## Problem
The problem has been well described in the already-committed PR #11853.
tl;dr: BufferedWriter is sensitive to cancellation, which the previous
approach was not.
The write path was most affected (ingest & compaction), which was mostly
fixed in #11853:
it introduced `PutError` and mapped instances of `PutError` that were
due to cancellation of the underlying buffered writer into
`CreateImageLayersError::Cancelled`.
However, there is a long tail of remaining errors that weren't caught by
#11853 that result in `CompactionError::Other`s, which we log with great
noise.
## Solution
The stack trace logging for CompactionError::Other added in #11853
allows us to chop away at that long tail using the following pattern:
- look at the stack trace
- from leaf up, identify the place where we incorrectly map from the
distinguished variant X indicating cancellation to an `anyhow::Error`
- follow that anyhow further up, ensuring it stays the same anyhow all
the way up in the `CompactionError::Other`
- since it stayed one anyhow chain all the way up, root_cause() will
yield us X
- so, in `log_compaction_error`, add an additional `downcast_ref` check
for X
This PR specifically adds checks for
- the flush task cancelling (FlushTaskError, BlobWriterError)
- opening of the layer writer (GateError)
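A minimal sketch of the resulting check in the error logger; the enum shape and the stand-in error type are simplified assumptions, with one cancellation error standing in for the three listed above:

```rust
use thiserror::Error;

#[derive(Debug, Error)]
#[error("flush task cancelled")]
struct FlushTaskError;

#[derive(Debug, Error)]
enum CompactionError {
    #[error("cancelled")]
    Cancelled,
    #[error(transparent)]
    Other(#[from] anyhow::Error),
}

fn log_compaction_error(err: &CompactionError) {
    match err {
        CompactionError::Cancelled => tracing::info!("compaction cancelled"),
        CompactionError::Other(e) => {
            // If the root cause of the anyhow chain is really a
            // cancellation, log quietly instead of as a noisy error.
            if e.root_cause().downcast_ref::<FlushTaskError>().is_some() {
                tracing::info!("compaction stopped by shutdown: {e:#}");
            } else {
                tracing::error!("compaction failed: {e:?}");
            }
        }
    }
}
```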
That should cover all the reports in issues
- https://github.com/neondatabase/cloud/issues/29434
- https://github.com/neondatabase/neon/issues/12162
## Refs
- follow-up to #11853
- fixup of / fixes https://github.com/neondatabase/neon/issues/11762
- fixes https://github.com/neondatabase/neon/issues/12162
- refs https://github.com/neondatabase/cloud/issues/29434
We don't have cancellation support for timeline deletions. In other
words, timeline deletion might still go on in an older generation while
we are attaching it in a newer generation already, because the
cancellation simply hasn't reached the deletion code.
This has caused us to hit a situation with offloaded timelines in which
the timeline was in an unrecoverable state: always returning an accepted
response, but never a 404 like it should be.
The detailed description can be found in
[here](https://github.com/neondatabase/cloud/issues/30406#issuecomment-3008667859)
(private repo link).
TLDR:
1. we ask to delete the timeline on the old pageserver/generation, which
starts the deletion process in the background
2. the storcon migrates the tenant to a different pageserver
3. during attach, the new pageserver still finds an index part, so it
adds the timeline to `offloaded_timelines`
4. the timeline deletion finishes, removing the index part in S3
5. there is a retry of the timeline deletion endpoint, sent to the new
pageserver location. it is bound to fail however:
   - as the index part is gone, we print `Timeline already deleted in
remote storage`.
   - the problem is that we then return an accepted response code, and
not a 404.
   - this confuses the code calling us: it thinks the timeline is not
deleted, so it keeps retrying.
   - this state never gets recovered from until a reset/detach, because
the `offloaded_timelines` entry stays there.
This is where this PR fixes things: if no index part can be found, we
can safely assume that the timeline is gone in S3 (it's the last thing
to be deleted), so we can remove it from `offloaded_timelines` and
trigger a reupload of the manifest. Subsequent retries will pick that
up.
Why not improve the cancellation support? It is a more disruptive code
change that might have its own risks, so we don't do it for now.
Fixes https://github.com/neondatabase/cloud/issues/30406
## Problem
When resolving a shard during a split, we might have multiple attached
shards with the old shard count (i.e. not all of them are marked
in-progress and ignored). Hence, we may compute the desired shard number
based on the old shard count and misroute the request.
## Summary of Changes
Recompute the desired shard number every time the shard count changes
during the iteration.
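A minimal sketch of the fix; the routing helper and shard types here are simplified placeholders (the real mapping also involves key hashing and stripe layout), shown only to illustrate recomputing the target when the count changes:

```rust
struct ShardIdentity {
    count: u8,
    number: u8,
}

// Simplified stand-in for the real key-to-shard mapping; count and
// stripe_size are assumed non-zero.
fn key_to_shard_number(count: u8, stripe_size: u32, key_hash: u64) -> u8 {
    ((key_hash / stripe_size as u64) % count as u64) as u8
}

fn resolve_shard(shards: &[ShardIdentity], stripe_size: u32, key_hash: u64) -> Option<&ShardIdentity> {
    let mut last_count = 0u8;
    let mut want_number = 0u8;
    for shard in shards {
        // Shards with the old and new counts can coexist mid-split, so
        // recompute the desired shard number whenever the count changes
        // instead of assuming one count for the whole iteration.
        if shard.count != last_count {
            last_count = shard.count;
            want_number = key_to_shard_number(shard.count, stripe_size, key_hash);
        }
        if shard.number == want_number {
            return Some(shard);
        }
    }
    None
}
```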
## Problem
During shard splits we shut down the remote client early and allow the
parent shard to keep ingesting data. While ingesting data, the WAL
receiver task may wait for the current flush to complete in order to
apply backpressure. Notifications are delivered via
`Timeline::layer_flush_done_tx`.
When the remote client was being shut down, the flush loop exited
without delivering a notification. This left
`Timeline::wait_flush_completion` hanging indefinitely, which blocked the
shutdown of the WAL receiver task and, hence, the shard split.
## Summary of Changes
Deliver a final notification when the flush loop is shutting down
without the timeline cancellation token having fired. I tried writing a
test for this, but got stuck in failpoint hell and decided it's not
worth it.
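A minimal sketch of the idea, assuming a tokio watch channel carries the flush-completion notifications; the names and the shutdown signal are simplified stand-ins for the real flush loop:

```rust
use tokio::sync::{mpsc, watch};

enum FlushOutcome {
    Ok,
    RemoteClientShutDown,
}

async fn flush_frozen_layers(_seq: u64) -> FlushOutcome {
    // ... flush in-memory layers and schedule uploads; the remote client
    // may already be shut down during a shard split ...
    FlushOutcome::RemoteClientShutDown
}

async fn flush_loop(layer_flush_done_tx: watch::Sender<u64>, mut requests: mpsc::Receiver<u64>) {
    while let Some(seq) = requests.recv().await {
        match flush_frozen_layers(seq).await {
            FlushOutcome::Ok => {
                let _ = layer_flush_done_tx.send(seq);
            }
            FlushOutcome::RemoteClientShutDown => {
                // The fix: deliver a final notification before exiting so
                // that wait_flush_completion() wakes up even though the
                // timeline cancellation token never fired.
                let _ = layer_flush_done_tx.send(seq);
                return;
            }
        }
    }
}
```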
`test_sharding_autosplit`, which reproduces this reliably in CI, passed
with the proposed fix in
https://github.com/neondatabase/neon/pull/12304.
Closes https://github.com/neondatabase/neon/issues/12060
## Problem
gRPC base backups do not use the base backup cache.
Touches https://github.com/neondatabase/neon/issues/11728.
## Summary of changes
Integrate gRPC base backups with the base backup cache.
Also fixes a bug where the base backup cache did not differentiate
between primary/replica base backups (at least I think that's a bug?).
## Problem
In our infra config, we have to split `server_api_key` and the other
fields across two files: the former lives in the sops file, and the
latter in the normal config. This creates the situation where we might
misconfigure some regions so that only part of the fields are available,
causing storcon/pageserver to refuse to start.
## Summary of changes
Allow the PostHog config to have only part of the fields available;
parse it later.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
The basebackup cache now uses an unbounded channel for prepare requests.
In theory, it can grow large if the cache hangs and does not process the
requests.
- Part of https://github.com/neondatabase/cloud/issues/29353
## Summary of changes
- Replace an unbounded channel with a bounded one, the size is
configurable.
- Add `pageserver_basebackup_cache_prepare_queue_size` to observe the
size of the queue.
- Refactor a bit to move all metrics logic to `basebackup_cache.rs`
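A minimal sketch of the bounded queue with tokio's mpsc; the request type, capacity knob, and wiring are illustrative, not the actual pageserver code:

```rust
use tokio::sync::mpsc;

struct PrepareRequest; // tenant/timeline/LSN fields elided

fn spawn_prepare_queue(capacity: usize) -> mpsc::Sender<PrepareRequest> {
    let (tx, mut rx) = mpsc::channel::<PrepareRequest>(capacity);
    tokio::spawn(async move {
        while let Some(_req) = rx.recv().await {
            // ... prepare and store the basebackup in the cache ...
        }
    });
    tx
}

fn submit_prepare(tx: &mpsc::Sender<PrepareRequest>, req: PrepareRequest) {
    // try_send never blocks the caller; when the queue is full the
    // request is dropped instead of growing the queue without bound.
    if tx.try_send(req).is_err() {
        tracing::info!("basebackup prepare queue is full, dropping request");
    }
    // A gauge (e.g. pageserver_basebackup_cache_prepare_queue_size) can be
    // derived from max_capacity() - capacity() to observe the queue length.
}
```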
## Problem
Fix for https://github.com/neondatabase/neon/pull/12324
## Summary of changes
`serde(default)` is needed to allow this field to be absent from the
config; otherwise there is a config deserialization error.
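A minimal sketch of the attribute in question; the struct and field names are placeholders, not the actual pageserver config types:

```rust
use serde::Deserialize;

#[derive(Deserialize, Default)]
struct PostHogConfig {
    server_api_key: Option<String>,
    // ... other optional fields ...
}

#[derive(Deserialize)]
struct PageserverConfig {
    // Without #[serde(default)], configs that omit this section fail to
    // deserialize; with it, a missing section falls back to Default.
    #[serde(default)]
    posthog_config: PostHogConfig,
}
```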
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
gRPC base backups use gRPC compression. However, this has two problems:
* Base backup caching will cache compressed base backups (making gRPC
compression pointless).
* Tonic does not support varying the compression level, and the zstd
default level is 10% slower than the gzip fastest level.
Touches https://github.com/neondatabase/neon/issues/11728.
Touches https://github.com/neondatabase/cloud/issues/29353.
## Summary of changes
This patch adds a gRPC parameter `BaseBackupRequest::compression`
specifying the compression algorithm. It also moves compression into
`send_basebackup_tarball` to reduce code duplication.
A follow-up PR will integrate the base backup cache with gRPC.
## Problem
part of https://github.com/neondatabase/neon/issues/11813
## Summary of changes
It costs $$$ for each pageserver to retrieve the feature flags directly.
Therefore, this patch adds new APIs so that the feature flag spec is
retrieved by the storcon and pushed to the pageservers.
* Storcon retrieves the feature flags and sends them to the pageservers.
* If the feature flags get updated outside of the normal refresh loop of
the pageserver, the pageserver won't fetch them on its own as long as
the last update time is <= refresh_period.
Signed-off-by: Alex Chi Z <chi@neon.tech>
In `test_layer_download_cancelled_by_config_location`, we simulate hung
downloads via the `before-downloading-layer-stream-pausable` failpoint.
Then, we cancel a timeline via the `location_config` endpoint.
With the new default as of
https://github.com/neondatabase/neon/pull/11712, we would be creating
the timeline on safekeepers regardless of whether there have been writes,
and it turns out the test relied on the timeline not existing on
safekeepers, due to a cancellation bug:
* as established before, the test makes the read path hang
* the timeline cancellation function first cancels the walreceiver, and
only then cancels the timeline's token
* `WalIngest::new` is requesting a checkpoint, which hits the read path
* at cancellation time, we'd be hanging inside the read, not seeing the
cancellation of the walreceiver
* the test would time out due to the hang
This is probably also reproducible in the wild when there are S3
unavailabilities or bottlenecks, so we thought it worthwhile to fix the
hang. The approach chosen in the end involves the `tokio::select!`
macro.
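A minimal sketch of the pattern: race the read against the cancellation token so a hung read cannot block shutdown. The function names are placeholders, not the actual walingest code:

```rust
use tokio_util::sync::CancellationToken;

async fn fetch_checkpoint_bytes() -> anyhow::Result<Vec<u8>> {
    // ... issue the read that may stall on remote storage ...
    Ok(Vec::new())
}

async fn read_checkpoint_cancellable(cancel: &CancellationToken) -> anyhow::Result<Vec<u8>> {
    tokio::select! {
        // If the walreceiver/timeline is cancelled, bail out instead of
        // waiting on a read that may hang on S3.
        _ = cancel.cancelled() => anyhow::bail!("cancelled while reading checkpoint"),
        res = fetch_checkpoint_bytes() => res,
    }
}
```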
In PR 11712, we originally punted on the test due to the hang and opted
it out of the new default, but now we can use the new default.
Part of https://github.com/neondatabase/neon/issues/12299
This makes it possible for the compiler to validate that a match block
covers all PostgreSQL versions we support.
## Problem
We did not have a complete picture of which places we had to test
against PG versions, and what format those versions were in: the full PG
version ID format (major/minor/bugfix `MMmmbb`) as transferred in
protocol messages, or only the major release version (`MM`). This meant
type confusion was rampant.
With this change, it becomes easier to develop new version-dependent
features, by making type and niche confusion impossible.
## Summary of changes
Every use of `pg_version` is now typed as either `PgVersionId` (a u32,
valued in decimal `MMmmbb`) or `PgMajorVersion` (an enum with a value for
every major version we support, serialized and stored like a u32 with
the value of that major version).
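A minimal sketch of the two types; the variant set and helper are illustrative rather than the exact definitions from the PR:

```rust
/// Full version ID as transferred in protocol messages, e.g. 170004.
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct PgVersionId(pub u32);

/// Major version only; an enum, so matches must be exhaustive.
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum PgMajorVersion {
    V14,
    V15,
    V16,
    V17,
}

impl PgMajorVersion {
    /// The compiler rejects this match if a supported version is missing.
    pub fn major_number(self) -> u32 {
        match self {
            PgMajorVersion::V14 => 14,
            PgMajorVersion::V15 => 15,
            PgMajorVersion::V16 => 16,
            PgMajorVersion::V17 => 17,
        }
    }
}
```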
---------
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
The 1.88.0 stable release is near (this Thursday). We'd like to fix most
warnings beforehand so that the compiler upgrade doesn't require
approval from too many teams.
This is therefore a preparation PR (like similar PRs before it).
There are a lot of changes for this release, mostly because the
`uninlined_format_args` lint has been added to the `style` lint group.
One can read more about the lint
[here](https://rust-lang.github.io/rust-clippy/master/#/uninlined_format_args).
The PR is the result of `cargo +beta clippy --fix` and `cargo fmt`. One
remaining warning is left for the proxy team.
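For reference, the lint asks for format arguments to be inlined into the braces; a tiny illustrative example:

```rust
fn greet(name: &str) {
    // Before: clippy now warns on this under the `style` group.
    // println!("hello, {}", name);

    // After `cargo clippy --fix`: the argument is inlined into the braces.
    println!("hello, {name}");
}
```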
---------
Co-authored-by: Conrad Ludgate <conrad@neon.tech>
## Problem
Part of #11813
## Summary of changes
Add a test API to make it easier to manipulate the feature flags within
tests.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Part of #11813
## Summary of changes
The current interval is 30s and it costs a lot of $$$. This patch
reduces it to a 600s refresh interval (which means that it takes 10 min
for feature flags to propagate from the UI to the pageserver). In the
future we can let storcon retrieve the feature flags and push them to
pageservers.
We can consider creating a new release, or we can postpone this to the
week after next.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Previously, we couldn't read from an in-memory layer while a batch was
being written to it. Vice versa, we couldn't write to it while there
was an ongoing read.
## Summary of Changes
The goal of this change is to improve concurrency. Writes happened
through a `&mut self` method, so the enforcement was at the type system
level.
We attempt to improve by:
1. Adding interior mutability to EphemeralLayer. This involves wrapping
the buffered writer in a read-write lock.
2. Minimise the time that the read lock is held for. Only hold the read
lock while reading from the buffers (recently flushed or pending
flush). If we need to read from the file, drop the lock and allow IO
to be concurrent.
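A minimal sketch of the pattern, assuming a tokio `RwLock` wraps the buffered writer; the types and buffer layout are simplified placeholders:

```rust
use tokio::sync::RwLock;

struct BufferedWriter {
    pending: Vec<u8>,          // mutable tail, not yet flushed
    recently_flushed: Vec<u8>, // flushed but still buffered in memory
}

struct EphemeralLayer {
    writer: RwLock<BufferedWriter>,
}

async fn read_from_file(_offset: usize) -> Option<u8> {
    None // ... on-disk read elided ...
}

impl EphemeralLayer {
    async fn put(&self, batch: &[u8]) {
        // Writes now go through a shared reference plus a lock instead of
        // a `&mut self` method.
        self.writer.write().await.pending.extend_from_slice(batch);
    }

    async fn get(&self, offset: usize) -> Option<u8> {
        {
            // Hold the read lock only while reading the in-memory buffers.
            let guard = self.writer.read().await;
            if let Some(b) = guard
                .recently_flushed
                .get(offset)
                .or_else(|| guard.pending.get(offset))
            {
                return Some(*b);
            }
        }
        // Not in the buffers: drop the lock and read from the file so
        // file IO can proceed concurrently with writes.
        read_from_file(offset).await
    }
}
```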
The new benchmark variants with concurrent reads improve by 70 to
200 percent (against main).
Benchmark results are in this
[commit](891f094ce6).
## Future Changes
We can push the interior mutability into the buffered writer. The
mutable tail goes under a read lock, the flushed part goes into an
ArcSwap and then we can read from anything that is flushed _without_ any
locking.
gRPC base backups send a stream of fixed-size 64 KB chunks.
`pagebench basebackup` with compression enabled shows that this reduces
throughput:
* 64 KB: 55 RPS
* 128 KB: 69 RPS
* 256 KB: 73 RPS
* 1024 KB: 73 RPS
This patch sets the base backup chunk size to 256 KB.
## Problem
In #12217, we began passing the stripe size in reattach responses, and
persisting it in the on-disk state. This is necessary to ensure the
storage controller and Pageserver have a consistent view of the intended
stripe size of unsharded tenants, which will be used for splits that do
not specify a stripe size. However, for backwards compatibility, these
stripe sizes were optional.
## Summary of changes
Make the stripe sizes required for reattach responses and on-disk
location configs. These will always be provided by the previous
(current) release.
## Problem
Timeline GC is very aggressive with regards to layer map locking.
We've seen timelines with loads of layers in production that hold the
write lock for the layer map for 30 minutes at a time.
This blocks reads and the write path to some extent.
## Summary of changes
Determining the set of layers to GC is done under the read lock.
Applying the updates is done under the write lock.
Previously, everything was done under write lock.
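A minimal sketch of the two-phase pattern (collect candidates under the read lock, apply removals under the write lock); the types are placeholders, not the pageserver layer map:

```rust
use std::sync::RwLock;

struct Layer {
    garbage: bool,
}

struct LayerMap {
    layers: Vec<Layer>,
}

fn gc_timeline(layer_map: &RwLock<LayerMap>) {
    // Phase 1: decide what to collect while only holding the read lock,
    // so reads and the write path are not blocked during the long scan.
    let doomed: Vec<usize> = {
        let guard = layer_map.read().unwrap();
        guard
            .layers
            .iter()
            .enumerate()
            .filter(|(_, l)| l.garbage)
            .map(|(i, _)| i)
            .collect()
    };

    // Phase 2: apply the removals under a short write lock.
    let mut guard = layer_map.write().unwrap();
    for i in doomed.into_iter().rev() {
        guard.layers.remove(i);
    }
}
```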
This improves `pagebench getpage-latest-lsn` gRPC support by:
* Using `page_api::Client`.
* Removing `--protocol`, and using the `page-server-connstring` scheme
instead.
* Adding `--compression` to enable zstd compression.
## Problem
Pagebench does not support gRPC for `basebackup` benchmarks.
Requires #12243.
Touches #11728.
## Summary of changes
Add gRPC support via gRPC connstrings, e.g. `pagebench basebackup
--page-service-connstring grpc://localhost:51051`.
Also change `--gzip-probability` to `--no-compression`, since this must
be specified per-client for gRPC.
Introduce a separate `postgres_ffi_types` crate which contains a few
types and functions that were used in the API. `postgres_ffi_types` is a
much smaller crate than `postgres_ffi`, and it doesn't depend on bindgen
or the Postgres C headers.
Move NeonWalRecord and Value types to wal_decoder crate. They are only
used in the pageserver-safekeeper "ingest" API. The rest of the ingest
API types are defined in wal_decoder, so move these there as well.
## Problem
The gRPC page service should support compression.
Requires #12111.
Touches #11728.
Touches https://github.com/neondatabase/cloud/issues/25679.
## Summary of changes
Add support for gzip and zstd compression in the server, and a client
parameter to enable compression.
This will need further benchmarking under realistic network conditions.
## Problem
GC info is an input to updating layer visibility.
Currently, gc info is updated on timeline activation and visibility is
computed on tenant attach, so we ignore branch points and compute
visibility by taking all layers into account.
Side note: gc info is also updated when timelines are created and
dropped. That doesn't help because we create the timelines in
topological order from the root. Hence the root timeline goes first,
without context of where the branch points are.
The impact of this in prod is that shards need to rehydrate layers after
live migration since the non-visible ones were excluded from the
heatmap.
## Summary of Changes
Move the visibility calculation into tenant attachment instead of
activation.
## Problem
Full basebackups are used in tests, and may be useful for debugging as
well, so we should support them in the gRPC API.
Touches #11728.
## Summary of changes
Add `GetBaseBackupRequest::full` to generate full base backups.
The libpq implementation also allows specifying `prev_lsn` for full
backups, i.e. the end LSN of the previous WAL record. This is omitted in
the gRPC API, since it's not used by any tests, and presumably of
limited value since it's autodetected. We can add it later if we find
that we need it.
## Problem
When converting `proto::GetPageRequest` into `page_api::GetPageRequest`
and validating the request, errors are returned as `tonic::Status`. This
will tear down the GetPage stream, which is disruptive and unnecessary.
## Summary of changes
Emit invalid request errors as `GetPageResponse` with an appropriate
`status_code` instead.
Also move the conversion from `tonic::Status` to `GetPageResponse` out
into the stream handler.
## Problem
The cache was introduced as a hackathon project and the only supported
limit was the number of entries.
The basebackup entry size may vary. We need to have more control over
disk space usage to ship it to production.
- Part of https://github.com/neondatabase/cloud/issues/29353
## Summary of changes
- Store the size of entries in the cache and use it to enforce the
`max_total_size_bytes` limit.
- Add the size of the cache in bytes to metrics.
## Problem
Pageservers now expose a gRPC API on a separate address and port. This
must be registered with the storage controller such that it can be
plumbed through to the compute via cplane.
Touches #11926.
## Summary of changes
This patch registers the gRPC address and port with the storage
controller:
* Add gRPC address to `nodes` database table and `NodePersistence`, with
a Diesel migration.
* Add gRPC address in `NodeMetadata`, `NodeRegisterRequest`,
`NodeDescribeResponse`, and `TenantLocateResponseShard`.
* Add gRPC address flags to `storcon_cli node-register`.
These changes are backwards-compatible, since all structs will ignore
unknown fields during deserialization.
## Problem
The gRPC base backup implementation has a few issues: chunks are not
properly bounded, and it's not possible to omit the LSN.
Touches #11728.
## Summary of changes
* Properly bound chunks by using a limited writer.
* Use an `Option<Lsn>` rather than a `ReadLsn` (the latter requires an
LSN).
## Problem
In our environment, we don't always have access to the pagectl tool on
the pageserver. We have to download the page files to local env to
introspect them. Hence, it'll be useful to be able to parse the local
files using `pagectl`.
## Summary of changes
* Add `dump-layer-local` to `pagectl` that takes a local path as
argument and returns the layer content:
```
cargo run -p pagectl layer dump-layer-local ~/Desktop/000000067F000040490002800000FFFFFFFF-030000000000000000000000000000000002__00003E7A53EDE611-00003E7AF27BFD19-v1-00000001
```
* Bonus: Fix a bug in `pageserver/ctl/src/draw_timeline_dir.rs` in which
we don't filter out temporary files.
## Problem
The `page_api` domain types are missing a few derives.
## Summary of changes
Add `Clone`, `Copy`, and `Debug` derives for all types where
appropriate.