rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 05:52:55 +00:00

Author	SHA1	Message	Date
Erik Grinaker	8840f3858c	pageserver: return 503 during tenant shutdown (#9635 ) ## Problem Tenant operations may return `409 Conflict` if the tenant is shutting down. This status code is not retried by the control plane, causing user-facing errors during pageserver restarts. Operations should instead return `503 Service Unavailable`, which may be retried for idempotent operations. ## Summary of changes Convert `GetActiveTenantError::WillNotBecomeActive(TenantState::Stopping)` to `ApiError::ShuttingDown` rather than `ApiError::Conflict`. This error is returned by `Tenant::wait_to_become_active` in most (all?) tenant/timeline-related HTTP routes.	2024-11-05 13:16:55 +01:00
Arpad Müller	ee68bbf6f5	Add tenant config option to allow timeline_offloading (#9598 ) Allow us to enable timeline offloading for single tenants without having to enable it for the entire pageserver. Part of #8088.	2024-11-04 21:01:18 +01:00
John Spray	4534f5cdc6	pageserver: make local timeline deletion infallible (#9594 ) ## Problem In https://github.com/neondatabase/neon/pull/9589, timeline offload code is modified to return an explicit error type rather than propagating anyhow::Error. One of the 'Other' cases there is I/O errors from local timeline deletion, which shouldn't need to exist, because our policy is not to try and continue running if the local disk gives us errors. ## Summary of changes - Make `delete_local_timeline_directory` and use `.fatal_err(` on I/O errors --------- Co-authored-by: Erik Grinaker <erik@neon.tech>	2024-11-04 09:11:52 +00:00
Konstantin Knizhnik	8ac523d2ee	Do not assign page LSN to new (uninitialized) page in ClearVisibilityMapFlags redo handler (#9287 ) ## Problem https://neondb.slack.com/archives/C04DGM6SMTM/p1727872045252899 See https://github.com/neondatabase/neon/issues/9240 ## Summary of changes Add `!page_is_new` check before assigning page lsn. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-11-01 20:31:29 +02:00
John Spray	552088ac16	pageserver: fix spurious error logs in timeline lifecycle (#9589 ) ## Problem The final part of https://github.com/neondatabase/neon/issues/9543 will be a chaos test that creates/deletes/archives/offloads timelines while restarting pageservers and migrating tenants. Developing that test showed up a few places where we log errors during normal shutdown. ## Summary of changes - UninitializedTimeline's drop should log at info severity: this is a normal code path when some part of timeline creation encounters a cancellation `?` path. - When offloading and finding a `RemoteTimelineClient` in a non-initialized state, this is not an error and should not be logged as such. - The `offload_timeline` function returned an anyhow error, so callers couldn't gracefully pick out cancellation errors from real errors: update this to have a structured error type and use it throughout.	2024-10-31 14:44:59 +00:00
Erik Grinaker	f9d8256d55	pageserver: don't return option from `DeletionQueue::new` (#9588 ) `DeletionQueue::new()` always returns deletion workers, so the returned `Option` is redundant.	2024-10-31 10:51:58 +00:00
Vlad Lazar	411c3aa0d6	pageserver: lift decoding and interpreting of wal into wal_decoder (#9524 ) ## Problem Decoding and ingestion are still coupled in `pageserver::WalIngest`. ## Summary of changes A new type is added to `wal_decoder::models`, InterpretedWalRecord. This type contains everything that the pageserver requires in order to ingest a WAL record. The highlights are the `metadata_record` which is an optional special record type to be handled and `blocks` which stores key, value pairs to be persisted to storage. This type is produced by `wal_decoder::models::InterpretedWalRecord::from_bytes` from a raw PG wal record. The rest of this commit separates decoding and interpretation of the PG WAL record from its application in `WalIngest::ingest_record`. Related: https://github.com/neondatabase/neon/issues/9335 Epic: https://github.com/neondatabase/neon/issues/9329	2024-10-31 10:47:43 +00:00
Arpad Müller	65b69392ea	Disallow offloaded children during timeline deletion (#9582 ) If we delete a timeline that has childen, those children will have their data corrupted. Therefore, extend the already existing safety check to offloaded timelines as well. Part of #8088	2024-10-30 19:37:09 +01:00
Alex Chi Z.	8d70f88b37	refactor(pageserver): use JSON field encoding for consumption metrics cache (#9470 ) In https://github.com/neondatabase/neon/issues/9032, I would like to eventually add a `generation` field to the consumption metrics cache. The current encoding is not backward compatible and it is hard to add another field into the cache. Therefore, this patch refactors the format to store "field -> value", and it's easier to maintain backward/forward compatibility with this new format. ## Summary of changes * Add `NewRawMetric` as the new format. * Add upgrade path. When opening the disk cache, the codepath first inspects the `version` field, and decide how to decode. * Refactor metrics generation code and tests. * Add tests on upgrade / compatibility with the old format. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-30 18:13:11 +00:00
Arpad Müller	bcfe013094	Don't keep around the timeline's remote_client (#9583 ) Constructing a remote client is no big deal. Yes, it means an extra download from S3 but it's not that expensive. This simplifies code paths and scenarios to test. This unifies timelines that have been recently offloaded with timelines that have been offloaded in an earlier invocation of the process. Part of #8088	2024-10-30 18:44:29 +01:00
Arpad Müller	d0a02f3649	Disallow archived timelines to be detached or reparented (#9578 ) Disallow a request for timeline ancestor detach if either the to be detached timeline, or any of the to be reparented timelines are offloaded or archived. In theory we could support timelines that are archived but not offloaded, but archived timelines are at the risk of being offloaded, so we treat them like offloaded timelines. As for offloaded timelines, any code to "support" them would amount to unoffloading them, at which point we can just demand to have the timelines be unarchived. Part of #8088	2024-10-30 17:04:57 +01:00
John Spray	8e2e9f0fed	pageserver: generation-aware storage for TenantManifest (#9555 ) ## Problem When tenant manifest objects are written without a generation suffix, concurrently attached pageservers may stamp on each others writes of the manifest and cause undefined behavior. Closes: #9543 ## Summary of changes - Use download_generation_object helper when reading manifests, to search for the most recent generation - Use Tenant::generation as the generation suffix when writing manifests.	2024-10-29 23:24:04 +01:00
Alex Chi Z.	81f9aba005	fix(pagectl): layer parsing and image layer dump (#9571 ) This patch contains various improvements for the pagectl tool. ## Summary of changes * Rewrite layer name parsing: LayerName now supports all variants we use now. * Drop pagectl's own layer parsing function, use LayerName in the pageserver crate. * Support image layer dumping in the layer dump command using ImageLayer::dump, drop the original implementation. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-29 15:16:23 -04:00
Alex Chi Z.	88ff8a7803	feat(pageserver): support partial gc-compaction for lowest retain lsn (#9134 ) part of https://github.com/neondatabase/neon/issues/8921, https://github.com/neondatabase/neon/issues/9114 ## Summary of changes We start the partial compaction implementation with the image layer partial generation. The partial compaction API now takes a key range. We will only generate images for that key range for now, and remove layers fully included in the key range after compaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-10-29 18:25:32 +00:00
Konstantin Knizhnik	0c075fab3a	Add --replica parameter to basebackup (#9553 ) ## Problem See https://github.com/neondatabase/neon/pull/9458 This PR separates PS related changes in #9458 from compute_ctl changes to enforce that PS is deployed before compute. ## Summary of changes This PR adds handlings of `--replica` parameters of backebackup to page server. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-10-29 18:40:10 +02:00
John Spray	7a1331eee5	pageserver: make concurrent offloaded timeline operations safe wrt manifest uploads (#9557 ) ## Problem Uploads of the tenant manifest could race between different tasks, resulting in unexpected results in remote storage. Closes: https://github.com/neondatabase/neon/issues/9556 ## Summary of changes - Create a central function for uploads that takes a tokio::sync::Mutex - Store the latest upload in that Mutex, so that when there is lots of concurrency (e.g. archive 20 timelines at once) we can coalesce their manifest writes somewhat.	2024-10-29 13:54:48 +00:00
John Spray	4ef74215e1	pageserver: refactor generation-aware loading code into generic (#9545 ) ## Problem Indices used to be the only kind of object where we had to search across generations to find the most recent one. As of https://github.com/neondatabase/neon/issues/9543, manifests will need the same treatment. ## Summary of changes - Refactor download_index_part to a generic download_generation_object function, which will be usable for downloading manifest objects as well.	2024-10-29 13:00:03 +00:00
Arpad Müller	a73402e646	Offloaded timeline deletion (#9519 ) As pointed out in https://github.com/neondatabase/neon/pull/9489#discussion_r1814699683 , we currently didn't support deletion for offloaded timelines after the timeline has been loaded from the manifest instead of having been offloaded. This was because the upload queue hasn't been initialized yet. This PR thus initializes the timeline and shuts it down immediately. Part of #8088	2024-10-29 10:41:53 +00:00
Vlad Lazar	07b974480c	pageserver: move things around to prepare for decoding logic (#9504 ) ## Problem We wish to have high level WAL decoding logic in `wal_decoder::decoder` module. ## Summary of Changes For this we need the `Value` and `NeonWalRecord` types accessible there, so: 1. Move `Value` and `NeonWalRecord` to `pageserver::value` and `pageserver::record` respectively. 2. Get rid of `pageserver::repository` (follow up from (1)) 3. Move PG specific WAL record types to `postgres_ffi::walrecord`. In theory they could live in `wal_decoder`, but it would create a circular dependency between `wal_decoder` and `postgres_ffi`. Long term it makes sense for those types to be PG version specific, so that will work out nicely. 4. Move higher level WAL record types (to be ingested by pageserver) into `wal_decoder::models` Related: https://github.com/neondatabase/neon/issues/9335 Epic: https://github.com/neondatabase/neon/issues/9329	2024-10-29 10:00:34 +00:00
Arpad Müller	62f5d484d9	Assert the tenant to be active in `unoffload_timeline` (#9539 ) Currently, all callers of `unoffload_timeline` ensure that the tenant the unoffload operation is called on is active. We rely on it being active as we activate the timeline below and don't want to race with the activation code of the tenant (in the worst case, activating a timeline twice). Therefore, add this assertion. Part of #8088	2024-10-29 00:36:05 +00:00
Alex Chi Z.	57c21aff9f	refactor(pageserver): remove aux v1 configs (#9494 ) ## Problem Part of https://github.com/neondatabase/neon/issues/8623 ## Summary of changes Removed all aux-v1 config processing code. Note that we persisted it into the index part file, so we cannot really remove the field from index part. I also kept the config item within the tenant config, but we will not read it any more. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-28 19:51:14 +00:00
Arpad Müller	e7277885b3	Don't consider archived timelines for synthetic size calculation (#9497 ) Archived timelines should not count towards synthetic size. Closes #9384. Part of #8088.	2024-10-26 13:27:57 +00:00
Yuchen Liang	85b954f449	pageserver: add tokio-epoll-uring slots waiters queue depth metrics (#9482 ) In complement to https://github.com/neondatabase/tokio-epoll-uring/pull/56. ## Problem We want to make tokio-epoll-uring slots waiters queue depth observable via Prometheus. ## Summary of changes - Add `pageserver_tokio_epoll_uring_slots_submission_queue_depth` metrics as a `Histogram`. - Each thread-local tokio-epoll-uring system is given a `LocalHistogram` to observe the metrics. - Keep a list of `Arc<ThreadLocalMetrics>` used on-demand to flush data to the shared histogram. - Extend `Collector::collect` to report `pageserver_tokio_epoll_uring_slots_submission_queue_depth`. Signed-off-by: Yuchen Liang <yuchen@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-10-25 21:30:57 +01:00
Arpad Müller	76328ada05	Fix unoffload_timeline races with creation (#9525 ) This PR does two things: 1. Obtain a `TimelineCreateGuard` object in `unoffload_timeline`. This prevents two unoffload tasks from racing with each other. While they already obtain locks for `timelines` and `offloaded_timelines`, they aren't sufficient, as we have already constructed an entire timeline at that point. We shouldn't ever have two `Timeline` objects in the same process at the same time. 2. don't allow timeline creations for timelines that have been offloaded. Obviously they already exist, so we should not allow creation. the previous logic only looked at the timelines list. Part of #8088	2024-10-25 20:06:27 +00:00
John Spray	8297f7a181	pageserver: fix N^2 I/O when processing relation drops in transaction abort (#9507 ) ## Problem We have some known N^2 behaviors when it comes to large relation counts, due to the monolithic encoding and full rewrites of of RelDirectory each time a relation is added. Ordinarily our backpressure mechanisms give "slow but steady" performance when creating/dropping/truncating relations. However, in the case of a transaction abort, it is possible for a single WAL record to drop an unbounded number of relations. The results in an unavailable compute, as when it sends one of these records, it can stall the pageserver's ingest for many minutes, even though the compute only sent a small amount of WAL. Closes https://github.com/neondatabase/neon/issues/9505 ## Summary of changes - Rewrite relation-dropping code to do one read/modify/write cycle of RelDirectory, instead of doing it separately for each relation in a loop. - Add a test for the bug scenario encountered: `test_tx_abort_with_many_relations` The test has ~40s runtime on my workstation. About 1 second of that is the part where we wait for ingest to catch up after a rollback, the rest is the slowness of creating and truncating a large number of relations. --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-10-25 15:09:02 +01:00
Christian Schwarz	2090e928d1	refactor(timeline creation): idempotency checking (#9501 ) # Context In the PGDATA import code (https://github.com/neondatabase/neon/pull/9218) I add a third way to create timelines, namely, by importing from a copy of a vanilla PGDATA directory in object storage. For idempotency, I'm using the PGDATA object storage location specification, which is stored in the IndexPart for the entire lifespan of the timeline. When loading the timeline from remote storage, that value gets stored inside `struct Timeline` and timeline creation compares the creation argument with that value to determine idempotency of the request. # Changes This PR refactors the existing idempotency handling of Timeline bootstrap and branching such that we simply compare the `CreateTimelineIdempotency` struct, using the derive-generated `PartialEq` implementation. Also, by spelling idempotency out in the type names, I find it adds a lot of clarity. The pathway to idempotency via requester-provided idempotency key also becomes very straight-forward, if we ever want to do this in the future. # Refs * platform context: https://github.com/neondatabase/neon/pull/9218 * product context: https://github.com/neondatabase/cloud/issues/17507 * stacks on top of https://github.com/neondatabase/neon/pull/9366	2024-10-25 14:44:20 +01:00
Christian Schwarz	6f5c262684	pageserver: add testing API to scan layers for disposable keys (#9393 ) This PR adds a pageserver mgmt API to scan a layer file for disposable keys. It hooks it up to the sharding compaction test, demonstrating that we're not filtering out all disposable keys. This is extracted from PGDATA import (https://github.com/neondatabase/neon/pull/9218) where I do the filtering of layer files based on `is_key_disposable`.	2024-10-25 14:16:45 +02:00
Arpad Müller	4d9036bf1f	Support offloaded timelines during shard split (#9489 ) Before, we didn't copy over the `index-part.json` of offloaded timelines to the new shard's location, resulting in the new shard not knowing the timeline even exists. In #9444, we copy over the manifest, but we also need to do this for `index-part.json`. As the operations to do are mostly the same between offloaded and non-offloaded timelines, we can iterate over all of them in the same loop, after the introduction of a `TimelineOrOffloadedArcRef` type to generalize over the two cases. This is analogous to the deletion code added in #8907. The added test also ensures that the sharded archival config endpoint works, something that has not yet been ensured by tests. Part of #8088	2024-10-25 12:32:46 +02:00
Vlad Lazar	b3bedda6fd	pageserver/walingest: log on gappy rel extend (#9502 ) ## Problem https://github.com/neondatabase/neon/pull/9492 added a metric to track the total count of block gaps filled on rel extend. More context is needed to understand when this happens. The current theory is that it may only happen on pg 14 and pg 15 since they do not WAL log relation extends. ## Summary of Changes A rate limited log is added.	2024-10-25 11:15:53 +01:00
Christian Schwarz	b782b11b33	refactor(timeline creation): represent bootstrap vs branch using enum (#9366 ) # Problem Timeline creation can either be bootstrap or branch. The distinction is made based on whether the `ancestor_` fields are present or not. In the PGDATA import code (https://github.com/neondatabase/neon/pull/9218), I add a third variant to timeline creation. # Solution The above pushed me to refactor the code in Pageserver to distinguish the different creation requests through enum variants. There is no externally observable effect from this change. On the implementation level, a notable change is that the acquisition of the `TimelineCreationGuard` happens later than before. This is necessary so that we have everything in place to construct the `CreateTimelineIdempotency`. Notably, this moves the acquisition of the creation guard _after_ the acquisition of the `gc_cs` lock in the case of branching. This might appear as if we're at risk of holding `gc_cs` longer than before this PR, but, even before this PR, we were holding `gc_cs` until after the `wait_completion()` that makes the timeline creation durable in S3 returns. I don't see any deadlock risk with reversing the lock acquisition order. As a drive-by change, I found that the `create_timeline()` function in `neon_local` is unused, so I removed it. # Refs platform context: https://github.com/neondatabase/neon/pull/9218 * product context: https://github.com/neondatabase/cloud/issues/17507 * next PR stacked atop this one: https://github.com/neondatabase/neon/pull/9501	2024-10-25 10:04:27 +00:00
Vlad Lazar	5069123b6d	pageserver: refactor ingest inplace to decouple decoding and handling (#9472 ) ## Problem WAL ingest couples decoding of special records with their handling (updates to the storage engine mostly). This is a roadblock for our plan to move WAL filtering (and implicitly decoding) to safekeepers since they cannot do writes to the storage engine. ## Summary of changes This PR decouples the decoding of the special WAL records from their application. The changes are done in place and I've done my best to refrain from refactorings and attempted to preserve the original code as much as possible. Related: https://github.com/neondatabase/neon/issues/9335 Epic: https://github.com/neondatabase/neon/issues/9329	2024-10-24 17:12:47 +01:00
Alex Chi Z.	fb0406e9d2	refactor(pageserver): refactor split writers using batch layer writer (#9493 ) part of https://github.com/neondatabase/neon/issues/9114, https://github.com/neondatabase/neon/issues/8836, https://github.com/neondatabase/neon/issues/8362 The split layer writer code can be used in a more general way: the caller puts unfinished writers into the batch layer writer and let batch layer writer to ensure the atomicity of the layer produces. ## Summary of changes * Add batch layer writer, which atomically finishes the layers. `BatchLayerWriter::finish` is simply a copy-paste from previous split layer writers. * Refactor split writers to use the batch layer writer. * The current split writer tests cover all code path of batch layer writer. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-24 10:49:54 -04:00
Christian Schwarz	6f34f97573	refactor(pageserver(load_remote_timeline)) remove dead code handling absence of IndexPart (#9408 ) The code is dead at runtime since we're nowadays always running with remote storage and treat it as the source of truth during attach. Clean it up as a preliminary to https://github.com/neondatabase/neon/pull/9218. Related: https://github.com/neondatabase/neon/pull/9366	2024-10-24 09:00:22 +01:00
Vlad Lazar	ac1205c14c	pageserver: add metric for number of zeroed pages on rel extend (#9492 ) ## Problem Filling the gap in with zeroes is annoying for sharded ingest. We are not sure it even happens in reality. ## Summary of Changes Add one global counter which tracks how many such gap blocks we filled on relation extends. We can add more metrics once we understand the scope.	2024-10-23 19:58:28 +01:00
Arpad Müller	3a3bd34a28	Rename IndexPart::{from_s3_bytes,to_s3_bytes} (#9481 ) We support multiple storage backends now, so remove the `_s3_` from the name. Analogous to the names adopted for tenant manifests added in #9444.	2024-10-23 00:34:24 +02:00
Alex Chi Z.	64949a37a9	fix(pageserver): make delta split layer writer finish atomic (#9048 ) similar to https://github.com/neondatabase/neon/pull/8841, we make the delta layer writer atomic when finishing the layers. ## Summary of changes * `put_value` not taking discard fn anymore * `finish` decides what layers to keep --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-22 22:06:21 +00:00
Arpad Müller	6f8fcdf9ea	Timeline offloading persistence (#9444 ) Persist timeline offloaded state to S3. Right now, as of #8907, at each restart of the pageserver, all offloaded state is lost, so we load the full timeline again. As it starts with an empty local directory, we might potentially download some files again, leading to downloads that are ultimately wasteful. This patch adds support for persisting the offloaded state, allowing us to never load offloaded timelines in the first place. The persistence feature is facilitated via a new file in S3 that is tenant-global, which contains a list of all offloaded timelines. It is updated each time we offload or unoffload a timeline, and otherwise never touched. This choice means that tenants where no offloading is happening will not immediately get a manifest, keeping the change very minimal at the start. We leave generation support for future work. It is important to support generations, as in the worst case, the manifest might be overwritten by an older generation after a timeline has been unoffloaded (and unarchived), so the next pageserver process instantiation might wrongly believe that some timeline is still offloaded even though it should be active. Part of #9386, #8088	2024-10-22 20:52:30 +00:00
Arpad Müller	34b6bd416a	offloaded timeline list API (#9461 ) Add a way to list the offloaded timelines. Before, one had to look at logs to figure out if a timeline has been offloaded or not, or use the non-presence of a certain timeline in the list of normal timelines. Now, one can list them directly. Part of #8088	2024-10-21 16:33:05 +01:00
Yuchen Liang	49d5e56c08	pageserver: use direct IO for delta and image layer reads (#9326 ) Part of #8130 ## Problem Pageserver previously goes through the kernel page cache for all the IOs. The kernel page cache makes light-loaded pageserver have deceptive fast performance. Using direct IO would offer predictable latencies of our virtual file IO operations. In particular for reads, the data pages also have an extremely low temporal locality because the most frequently accessed pages are cached on the compute side. ## Summary of changes This PR enables pageserver to use direct IO for delta layer and image layer reads. We can ship them separately because these layers are write-once, read-many, so we will not be mixing buffered IO with direct IO. - implement `IoBufferMut`, an buffer type with aligned allocation (currently set to 512). - use `IoBufferMut` at all places we are doing reads on image + delta layers. - leverage Rust type system and use `IoBufAlignedMut` marker trait to guarantee that the input buffers for the IO operations are aligned. - page cache allocation is also made aligned. _* in-memory layer reads and the write path will be shipped separately._ ## Testing Integration test suite run with O_DIRECT enabled: https://github.com/neondatabase/neon/pull/9350 ## Performance We evaluated performance based on the `get-page-at-latest-lsn` benchmark. The results demonstrate a decrease in the number of IOps, no sigificant change in the latency mean, and an slight improvement on the p99.9 and p99.99 latencies. [Benchmark](https://www.notion.so/neondatabase/Benchmark-O_DIRECT-for-image-and-delta-layers-2024-10-01-112f189e00478092a195ea5a0137e706?pvs=4) ## Rollout We will add `virtual_file_io_mode=direct` region by region to enable direct IO on image + delta layers. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-10-21 11:01:25 -04:00
Alex Chi Z.	aca81f5fa4	fix(pageserver): make image split layer writer finish atomic (#8841 ) Part of https://github.com/neondatabase/neon/issues/8836 ## Summary of changes This pull request makes the image layer split writer atomic when finishing the layers. All the produced layers either finish at the same time, or discard at the same time. Note that this does not prevent atomicity when crash, but anyways, it will be cleaned up on pageserver restart. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-10-21 15:59:48 +01:00
Arpad Müller	71d09c78d4	Accept basebackup <tenant> <timeline> --gzip requests (#9456 ) In #9453, we want to remove the non-gzipped basebackup code in the computes, and always request gzipped basebackups. However, right now the pageserver's page service only accepts basebackup requests in the following formats: * `basebackup <tenant_id> <timeline_id>`, lsn is determined by the pageserver as the most recent one (`timeline.get_last_record_rlsn()`) * `basebackup <tenant_id> <timeline_id> <lsn>` * `basebackup <tenant_id> <timeline_id> <lsn> --gzip` We add a fourth case, `basebackup <tenant_id> <timeline_id> --gzip` to allow gzipping the request for the latest lsn as well.	2024-10-19 00:23:49 +02:00
Conrad Ludgate	b8304f90d6	2024 oct new clippy lints (#9448 ) Fixes new lints from `cargo +nightly clippy` (`clippy 0.1.83 (798fb83f 2024-10-16)`)	2024-10-18 10:27:50 +01:00
John Spray	24398bf060	pageserver: detect & warn on loading an old index which is probably the result of a bad generation (#9383 ) ## Problem The pageserver generally trusts the storage controller/control plane to give it valid generations. However, sometimes it should be obvious that a generation is bad, and for defense in depth we should detect that on the pageserver. This PR is part 1 of 2: 1. in this PR we detect and warn on such situations, but do not block starting up the tenant. Once we have confidence that the check is not firing unexpectedly in the field 2. part 2 of 2 will introduce a condition that refuses to start a tenant in this situtation, and a test for that (maybe, if we can figure out how to spoof an ancient mtime) Related: #6951 ## Summary of changes - When loading an index older than 2 weeks, log an INFO message noting that we will check for other indices - When loading an index older than 2 weeks _and_ a newer-generation index exists, log a warning.	2024-10-17 19:02:24 +01:00
Alex Chi Z.	63b3491c1b	refactor(pageserver): remove aux v1 code path (#9424 ) Part of the aux v1 retirement https://github.com/neondatabase/neon/issues/8623 ## Summary of changes Remove write/read path for aux v1, but keeping the config item and the index part field for now. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-17 17:22:44 +01:00
Erik Grinaker	4c9835f4a3	storage_controller: delete stale shards when deleting tenant (#9333 ) ## Problem Tenant deletion only removes the current shards from remote storage. Any stale parent shards (before splits) will be left behind. These shards are kept since child shards may reference data from the parent until new image layers are generated. ## Summary of changes * Document a special case for pageserver tenant deletion that deletes all shards in remote storage when given an unsharded tenant ID, as well as any unsharded tenant data. * Pass an unsharded tenant ID to delete all remote storage under the tenant ID prefix. * Split out `RemoteStorage::delete_prefix()` to delete a bucket prefix, with additional test coverage. * Add a `delimiter` argument to `asset_prefix_empty()` to support partial prefix matches (i.e. all shards starting with a given tenant ID).	2024-10-17 14:34:51 +00:00
Alex Chi Z.	f3a3eefd26	feat(pageserver): do space check before gc-compaction (#9250 ) part of https://github.com/neondatabase/neon/issues/9114 ## Summary of changes gc-compaction may take a lot of disk space, and if it does, the caller should do a partial gc-compaction. This patch adds space check for the compaction job. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-17 10:29:53 -04:00
Arpad Müller	35e7d91bc9	Add config variable for timeline offloading (#9421 ) Adds a configuration variable for timeline offloading support. The added pageserver-global config option controls whether the pageserver automatically offloads timelines during compaction. Therefore, already offloaded timelines are not affected by this, nor is the manual testing endpoint. This allows the rollout of timeline offloading to be driven by the storage team. Part of #8088	2024-10-17 12:07:58 +00:00
Arpad Müller	55b246085e	Activate timelines during unoffload (#9399 ) The current code has forgotten to activate timelines during unoffload, leading to inability to receive the basebackup, due to the timeline still being in loading state. ``` stderr: command failed: compute startup failed: failed to get basebackup@0/0 from pageserver postgresql://no_user@localhost:15014 Caused by: 0: db error: ERROR: Not found: Timeline 508546c79b2b16a84ab609fdf966e0d3/bfc18c24c4b837ecae5dbb5216c80fce is not active, state: Loading 1: ERROR: Not found: Timeline 508546c79b2b16a84ab609fdf966e0d3/bfc18c24c4b837ecae5dbb5216c80fce is not active, state: Loading ``` Therefore, also activate the timeline during unoffloading. Part of #8088	2024-10-16 16:47:17 +02:00
Arpad Müller	3140c14d60	Remove allow(clippy::unknown_lints) (#9416 ) the lint stabilized in 1.80.	2024-10-16 16:28:55 +02:00
John Spray	89a65a9e5a	pageserver: improve handling of archival_config calls during Timeline shutdown (#9415 ) ## Problem In test `test_timeline_offloading`, we see failures like: ``` PageserverApiException: queue is in state Stopped ``` Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/main/11356917668/index.html#testresult/ff0e348a78a974ee/retries ## Summary of changes - Amend code paths that handle errors from RemoteTimelineClient to check for cancellation and emit the Cancelled error variant in these cases (will give clients a 503 to retry) - Remove the implicit `#[from]` for the Other error case, to make it harder to add code that accidentally squashes errors into this (500-equivalent) error variant. This would be neater if we made RemoteTimelineClient return a structured error instead of anyhow::Error, but that's a bigger refactor. I'm not sure if the test really intends to hit this path, but the error handling fix makes sense either way.	2024-10-16 13:39:58 +01:00

1 2 3 4 5 ...

2518 Commits