rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-23 16:10:37 +00:00

Author	SHA1	Message	Date
Arpad Müller	920040e402	Update storage components to edition 2024 (#10919 ) Updates storage components to edition 2024. We like to stay on the latest edition if possible. There are no functional changes; however, some code changes had to be made to accommodate the edition's breaking changes. The PR has two commits: * the first commit updates storage crates to edition 2024 and appeases `cargo clippy` by changing code. I accidentally ran the formatter on some files that had other edits. * the second commit performs a `cargo fmt` I would recommend a closer review of the first commit and a less close review of the second one (as it just runs `cargo fmt`). part of https://github.com/neondatabase/neon/issues/10918	2025-02-25 23:51:37 +00:00
Alex Chi Z.	5c76e2a983	fix(storage-scrubber): ignore errors if index_part is not consistent (#10304 ) ## Problem Consider the pageserver doing the following sequence of operations: * upload X files * update index_part to add X and remove Y * delete Y files When the storage scrubber obtains the initial timeline snapshot before "update index_part" (that is, the old version that contains Y but not X), and then obtains the index_part file after it gets updated, it will report all Y files as missing. ## Summary of changes Do not report a layer file as missing if the index_part that was listed and the one that was downloaded are not the same (i.e. they have different last_modified times) Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-07 23:24:17 +00:00
Arpad Müller	d1ab7471e2	Fix desc_str for Azure container (#10021 ) Small logs fix I've noticed while working on https://github.com/neondatabase/cloud/issues/19963 .	2024-12-05 20:51:57 +00:00
Arpad Müller	9b6af2bcad	Add the ability to configure GenericRemoteStorage for the scrubber (#9652 ) Earlier work (#7547) has made the scrubber internally generic, but one could only configure it to use S3 storage. This is the final piece that lets the scrubber (most of it; snapshotting still requires S3) be configured via GenericRemoteStorage. I.e. you can now set an env var like: ``` REMOTE_STORAGE_CONFIG='remote_storage = { bucket_name = "neon-dev-safekeeper-us-east-2d", bucket_region = "us-east-2" }' ``` and the scrubber will read it instead.	2024-11-18 21:01:48 +00:00
Erik Grinaker	37158d0424	pageserver: use conditional GET for secondary tenant heatmaps (#9236 ) ## Problem Secondary tenant heatmaps were always downloaded, even when they hadn't changed. This can be avoided by using a conditional GET request passing the `ETag` of the previous heatmap. ## Summary of changes The `ETag` was already plumbed down into the heatmap downloader, and just needed further plumbing into the remote storage backends. * Add a `DownloadOpts` struct and pass it to `RemoteStorage::download()`. * Add an optional `DownloadOpts::etag` field, which uses a conditional GET and returns `DownloadError::Unmodified` on match.	2024-10-04 12:29:48 +02:00
Alex Chi Z.	ecfa3d9de9	fix(storage-scrubber): wrong trial condition (#8905 ) ref https://github.com/neondatabase/neon/issues/8872 ## Summary of changes We saw a stuck storage scrubber in staging caused by infinite retries. I believe we should use `min` instead of `max` here to avoid getting minutes or hours of retry backoff. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-03 21:39:56 +00:00
Alex Chi Z.	63a0d0d039	fix(storage-scrubber): make retry error into warnings (#8851 ) We get many HTTP connect timeout errors from scrubber logs, and it turned out that the scrubber is retrying, and this is not an actual error. In the future, we should revisit all places where we log errors in the storage scrubber, and only error when necessary (i.e., errors that might need manual fixing) Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-08-28 13:39:21 -04:00
Arpad Müller	4b26783c94	scrubber: remove _generic postfix and two unused functions (#8761 ) Removes the `_generic` postfix from the APIs that use `GenericRemoteStorage`, as `remote_storage` is the "default" now, and adds a `_s3` postfix to the remaining APIs using the S3 SDK (only in tenant snapshot). Also removes two unused functions: `list_objects_with_retries` and `stream_tenants`. Part of https://github.com/neondatabase/neon/issues/7547	2024-08-19 23:58:47 +02:00
Arpad Müller	3b8ca477ab	Migrate physical GC and scan_metadata to remote_storage (#8673 ) Migrates most of the remaining parts of the scrubber to remote_storage: * `pageserver_physical_gc` * `scan_metadata` for pageservers (safekeepers were done in #8595) * `download()` in `tenant_snapshot`. The main `tenant_snapshot` is not migrated as it uses version history to be able to work in the face of ongoing changes. Part of #7547	2024-08-19 16:39:44 +02:00
John Spray	c53799044d	pageserver: refine how we delete timelines after shard split (#8436 ) ## Problem Previously, when we do a timeline deletion, shards will delete layers that belong to an ancestor. That is not a correctness issue, because when we delete a timeline, we're always deleting it from all shards, and destroying data for that timeline is clearly fine. However, there exists a race where one shard might start doing this deletion while another shard has not yet received the deletion request, and might try to access an ancestral layer. This creates ambiguity over the "all layers referenced by my index should always exist" invariant, which is important to detecting and reporting corruption. Now that we have a GC mode for clearing up ancestral layers, we can rely on that to clean up such layers, and avoid deleting them right away. This makes things easier to reason about: there are now no cases where a shard will delete a layer that belongs to a ShardIndex other than itself. ## Summary of changes - Modify behavior of RemoteTimelineClient::delete_all - Add `test_scrubber_physical_gc_timeline_deletion` to exercise this case - Tweak AWS SDK config in the scrubber to enable retries. Motivated by seeing the test for this feature encounter some transient "service error" S3 errors (which are probably nothing to do with the changes in this PR)	2024-08-02 08:00:46 +01:00
Arpad Müller	939d50a41c	storage_scrubber: migrate FindGarbage to remote_storage (#8548 ) Uses the newly added APIs from #8541 named `stream_tenants_generic` and `stream_objects_with_retries` and extends them with `list_objects_with_retries_generic` and `stream_tenant_timelines_generic` to migrate the `find-garbage` command of the scrubber to `GenericRemoteStorage`. Part of https://github.com/neondatabase/neon/issues/7547	2024-07-31 18:24:42 +00:00
Yuchen Liang	85bef9f05d	feat(scrubber): post `scan_metadata` results to storage controller (#8502 ) Part of #8128, followup to #8480. closes #8421. Enable scrubber to optionally post metadata scan health results to storage controller. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-07-30 16:07:34 +01:00
Arpad Müller	9fabdda2dc	scrubber: add remote_storage based listing APIs and use them in find-large-objects (#8541 ) Add two new functions `stream_objects_with_retries` and `stream_tenants_generic` and use them in the `find-large-objects` subcommand, migrating it to `remote_storage`. Also adds the `size` field to the `ListingObject` struct. Part of #7547	2024-07-30 09:00:37 +00:00
Arpad Müller	204bb8faa3	Start using remote_storage in S3 scrubber for PurgeGarbage (#7932 ) Starts using the `remote_storage` crate in the S3 scrubber for the `PurgeGarbage` subcommand. The `remote_storage` crate is generic over various backends and thus using it gives us the ability to run the scrubber against all supported backends. Start with the `PurgeGarbage` subcommand as it doesn't use `stream_tenants`. Part of #7547.	2024-07-22 14:49:30 +01:00
John Spray	44781518d0	storage scrubber: GC ancestor shard layers (#8196 ) ## Problem After a shard split, the pageserver leaves the ancestor shard's content in place. It may be referenced by child shards, but eventually child shards will de-reference most ancestor layers as they write their own data and do GC. We would like to eventually clean up those ancestor layers to reclaim space. ## Summary of changes - Extend the physical GC command with `--mode=full`, which includes cleaning up unreferenced ancestor shard layers - Add test `test_scrubber_physical_gc_ancestors` - Remove colored log output: in testing this is irritating ANSI code spam in logs, and in interactive use doesn't add much. - Refactor storage controller API client code out of storcon_client into a `storage_controller/client` crate - During physical GC of ancestors, call into the storage controller to check that the latest shards seen in S3 reflect the latest state of the tenant, and there is no shard split in progress.	2024-07-19 19:07:59 +03:00
Alex Chi Z	b1fe8259b4	fix(storage-scrubber): use default AWS authentication (#8299 ) part of https://github.com/neondatabase/cloud/issues/14024 close https://github.com/neondatabase/neon/issues/7665 Things running in k8s container use this authentication: https://docs.aws.amazon.com/sdkref/latest/guide/feature-container-credentials.html while we did not configure the client to use it. This pull request simply uses the default s3 client credential chain for storage scrubber. It might break compatibility with minio. ## Summary of changes * Use default AWS credential provider chain. * Improvements for s3 errors, we now have detailed errors and correct backtrace on last trial of the operation. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-07-09 18:41:37 +01:00
Alex Chi Z	73fa3c014b	chore(storage-scrubber): allow disable file logging (#8297 ) part of https://github.com/neondatabase/cloud/issues/14024, k8s does not always have a volume available for logging, and I'm running into weird permission errors... While I could spend time figuring out how to create temp directories for logging, I think it would be better to just disable file logging as k8s containers are ephemeral and we cannot retrieve anything on the fs after the container gets removed. ## Summary of changes `PAGESERVER_DISABLE_FILE_LOGGING=1` -> file logging disabled Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-07-09 17:11:37 +01:00
Arpad Müller	e579bc0819	Add find-large-objects subcommand to scrubber (#8257 ) Adds a find-large-objects subcommand to the scrubber to allow listing layer objects larger than a specific size. To be used like: ``` AWS_PROFILE=dev REGION=us-east-2 BUCKET=neon-dev-storage-us-east-2 cargo run -p storage_scrubber -- find-large-objects --min-size 250000000 --ignore-deltas ``` Part of #5431	2024-07-04 15:07:16 +00:00
Arpad Müller	27518676d7	Rename S3 scrubber to storage scrubber (#8013 ) The S3 scrubber contains "S3" in its name, but we want to make it generic in terms of which storage is used (#7547). Therefore, rename it to "storage scrubber", following the naming scheme of already existing components "storage broker" and "storage controller". Part of #7547	2024-06-11 22:45:22 +00:00
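
The `min` versus `max` point in ecfa3d9de9 (#8905) can be illustrated with a minimal sketch. This is not the scrubber's actual code: the function name `backoff_delay` and its parameters are hypothetical, and the doubling schedule is an assumption made for illustration.

```rust
use std::time::Duration;

/// Hypothetical helper (not from the scrubber): exponential backoff
/// that doubles on every attempt and is capped at `max_delay`.
fn backoff_delay(attempt: u32, base: Duration, max_delay: Duration) -> Duration {
    // base * 2^attempt, saturating instead of overflowing on large attempts.
    let factor = 2u32.saturating_pow(attempt);
    let uncapped = base.saturating_mul(factor);
    // `min` makes `max_delay` an upper bound on the wait.
    // Using `max` here would turn the cap into a floor, so the delay
    // could grow to minutes or hours as attempts accumulate.
    std::cmp::min(uncapped, max_delay)
}
```

With a 100 ms base and a 5 s cap, early attempts wait 100 ms, 200 ms, 400 ms and so on, and every later attempt waits at most 5 s.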

19 Commits
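
The conditional download described in 37158d0424 (#9236) can be sketched roughly as follows. Only the names `DownloadOpts`, `etag` and `DownloadError::Unmodified` come from the commit message; `StoredObject`, the `NotFound` variant and the free function `download` are hypothetical stand-ins for a real storage backend, which would send the ETag to the server rather than compare it locally.

```rust
/// Options for a download; `etag` is the ETag of the copy the caller already holds.
#[derive(Debug, Default, Clone)]
struct DownloadOpts {
    etag: Option<String>,
}

#[derive(Debug, PartialEq)]
enum DownloadError {
    /// The remote object's ETag matches the one supplied by the caller.
    Unmodified,
    /// Hypothetical variant for a missing object.
    NotFound,
}

/// Hypothetical in-memory stand-in for a remote object.
struct StoredObject {
    etag: String,
    body: Vec<u8>,
}

/// Returns the body unless the caller's ETag matches the stored one.
fn download(obj: Option<&StoredObject>, opts: &DownloadOpts) -> Result<Vec<u8>, DownloadError> {
    let obj = obj.ok_or(DownloadError::NotFound)?;
    // Conditional GET: skip the transfer when nothing has changed.
    if opts.etag.as_deref() == Some(obj.etag.as_str()) {
        return Err(DownloadError::Unmodified);
    }
    Ok(obj.body.clone())
}
```

A caller such as the heatmap downloader would treat `DownloadError::Unmodified` as "keep the previous copy" rather than as a failure.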