rust/neon - neon - Gitea: Git with a cup of tea


mirror of https://github.com/neondatabase/neon.git synced 2026-05-19 14:10:37 +00:00

Author	SHA1	Message	Date
Joonas Koivunen	d9a57aeed9	storcon: deny external node configuration if an operation is ongoing (#8727) Per #8674, disallow node configuration while a drain or fill is ongoing. Implement it by adding an HTTP-only wrapper, `Service::external_node_configure`, which checks whether an operation exists before configuring. Additionally: - allow cancelling a drain/fill after a pageserver has restarted and transitioned to WarmingUp Fixes: #8674	2024-08-15 10:54:05 +01:00
Vlad Lazar	fef77b0cc9	safekeeper: consider partial uploads when pulling timeline (#8628) ## Problem The control file contains the id of the safekeeper that uploaded the (partial) segment it references. Previously, when a snapshot of the control file was sent to another sk, that segment would eventually be GC-ed by the receiving sk. This is incorrect because the original sk might still need it later. ## Summary of Changes When sending a snapshot and the control file contains an uploaded segment: * Create a copy of the segment in S3 with the destination sk in the object name * Tweak the streamed control file to point to the object created in the previous step Note that the snapshot endpoint now has to know the id of the requestor, so the API has been extended to include the node id of the destination sk. Closes https://github.com/neondatabase/neon/issues/8542	2024-08-15 09:02:33 +01:00
Joonas Koivunen	485d76ac62	timeline_detach_ancestor: adjust error handling (#8528) With the additional phases from #8430, `detach_ancestor::Error` became untenable. Split it up by phase, and introduce laundering for the remaining `anyhow::Error`s so that they propagate, most often, as `Error::ShuttingDown`. Additionally, complete FIXMEs. Cc: #6994	2024-08-14 10:16:18 +01:00
John Spray	4049d2b7e1	scrubber: fix spurious "Missed some shards" errors (#8661) ## Problem The storage scrubber was reporting warnings for lots of timelines like: ``` WARN Missed some shards at count ShardCount(0) tenant_id=25eb7a83d9a2f90ac0b765b6ca84cf4c ``` These were spurious: these tenants are fine. There was a bug in accumulating the ShardIndex for each tenant, whereby multiple timelines would lead us to add the same ShardIndex more than once. Closes: #8646 ## Summary of changes - Accumulate ShardIndex in a BTreeSet instead of a Vec - Extend the test to reproduce the issue	2024-08-14 09:29:06 +01:00
Konstantin Knizhnik	7a1736ddcf	Preserve HEAP_COMBOCID when restoring t_cid from WAL (#8503) ## Problem See https://github.com/neondatabase/neon/issues/8499 ## Summary of changes Save the HEAP_COMBOCID flag in WAL and do not clear it in the redo handlers. Related Postgres PRs: https://github.com/neondatabase/postgres/pull/457 https://github.com/neondatabase/postgres/pull/458 https://github.com/neondatabase/postgres/pull/459 --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-08-14 08:13:20 +03:00
Tristan Partin	c624317b0e	Decode the database name in SQL/HTTP connections. A `url::Url` does not return URL-decoded values for path components, so we must decode them ourselves. Link: https://docs.rs/url/2.5.2/url/struct.Url.html#method.path Link: https://docs.rs/url/2.5.2/url/struct.Url.html#method.path_segments Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-08-13 16:32:58 -05:00
Joonas Koivunen	6d6e2c6a39	feat(detach_ancestor): better retries with persistent gc blocking (#8430) With persistent GC blocking, we can now retry reparenting timelines that failed for whatever reason on previous attempts. Restructure detach_ancestor into three phases: - prepare (insert persistent GC blocking, copy the LSN prefix and layers) - detach and reparent (reparenting can fail, so we might need to retry this portion) - complete (remove persistent GC blocking) Cc: #6994	2024-08-13 18:51:51 +01:00
Joonas Koivunen	ae6e27274c	refactor(test): unify how we clear shared buffers (#8634) Unify it so that we can easily plug in LFC clearing as well. Private discussion reference: <https://neondb.slack.com/archives/C033A2WE6BZ/p1722942856987979>	2024-08-13 20:14:42 +03:00
Joonas Koivunen	8f170c5105	fix: make compaction more sensitive to cancellation (#8706) A few of the benchmarks started failing after #8655, where they wait for the compactor task. Reads done by image layer creation should already be cancellation-sensitive because vectored get does a check each time, but sprinkle in additional cancellation points: - at each partition - after each vectored read batch	2024-08-13 18:00:54 +01:00
Sasha Krassovsky	32aa1fc681	Add on-demand WAL download to slot funcs (#8705) ## Problem Currently, `pg_logical_slot_advance` can fail because the WAL is not available locally. ## Summary of changes Add on-demand WAL download to these slot functions, along with a test. Before this change, the test fails with ``` requested WAL segment pg_wal/000000010000000000000001 has already been removed ``` After the change, the test passes. Relies on: - https://github.com/neondatabase/postgres/pull/466 - https://github.com/neondatabase/postgres/pull/467 - https://github.com/neondatabase/postgres/pull/468	2024-08-12 20:54:42 -08:00
Joonas Koivunen	9dc9a9b2e9	test: do graceful shutdown by default (#8655) It should give us all possible allowed_errors more consistently. While getting the workflows to pass on https://github.com/neondatabase/neon/pull/8632, it was noticed that allowed_errors are rarely hit (1/4). This made me realize that we always do an immediate stop by default. Doing a graceful shutdown would have made the draining more apparent, and we likely would not have needed the #8632 hotfix. The downside is that we will see more timeouts if tests happen to leave pause failpoints in place that make the shutdown fail. The net outcome should, however, be positive: we could even detect overly slow shutdowns caused by a bug or deadlock.	2024-08-12 15:37:15 +03:00
Arseny Sher	a4eea5025c	Fix logical apply worker reporting of flush_lsn wrt sync replication. It should take the syncrep flush_lsn into account, because WAL not yet flushed according to syncrep is lost on endpoint restart, which makes replication miss some data if the slot has already been advanced too far. This commit adds a test reproducing the issue and bumps vendor/postgres to the commit with the actual fix.	2024-08-12 13:14:02 +03:00
Vlad Lazar	f5cef7bf7f	storcon: skip draining shard if its secondary is lagging too much (#8644) ## Problem Migrations of tenant shards with cold secondaries are holding up drains during production deployments. ## Summary of changes If a secondary location is lagging by more than 256MiB (configurable, but that's the default), then skip cutting over to the secondary as part of the node drain.	2024-08-09 15:45:07 +01:00
Alex Chi Z.	a155914c1c	fix(neon): disable create tablespace stmt (#8657) Part of https://github.com/neondatabase/neon/issues/8653 Disable the CREATE TABLESPACE statement. It turns out that adding a regress test mode flag requires much less effort than patching the test cases, and given that we might need to support tablespaces in the future, I decided to add a new flag, `regress_test_mode`, to change the behavior of CREATE TABLESPACE. Tested manually that without `regress_test_mode` set, CREATE TABLESPACE is rejected. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-08-09 09:18:55 +01:00
Conrad Ludgate	7e08fbd1b9	Revert "proxy: update tokio-postgres to allow arbitrary config params (#8076)" (#8654) This reverts #8076, which was reverted from the release branch long ago (releasing it would have been a breaking change for all users who currently set TimeZone options). It's causing conflicts now, so we should revert it here as well.	2024-08-09 09:09:29 +01:00
Joonas Koivunen	8561b2c628	fix: stop leaking BackgroundPurges (#8650) Avoid "leaking" the completions of BackgroundPurges by: 1. switching to TaskTracker, which provides close+wait 2. no longer using tokio::fs::remove_dir_all, which will consume two units of memory instead of one blocking task Additionally, use a more graceful shutdown in tests that actually do some background cleanup.	2024-08-08 12:02:53 +01:00
Konstantin Knizhnik	cbe8c77997	Use synchronous commit for logical replication worker (#8645) ## Problem See https://neondb.slack.com/archives/C03QLRH7PPD/p1723038557449239?thread_ts=1722868375.476789&cid=C03QLRH7PPD Logical replication subscriptions use `synchronous_commit=off` by default, which causes problems with the safekeeper. ## Summary of changes Set `synchronous_commit=on` for the logical replication subscription in test_subscriber_restart.py --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-08-08 10:23:57 +03:00
Joonas Koivunen	05dd1ae9e0	fix: drain completed page_service connections (#8632) We've noticed increased memory usage with the latest release. Drain the joinset of `page_service` connection handlers to avoid leaking them until shutdown. An alternative would be to use a TaskTracker. TaskTracker was not discussed in the original PR #8339 review, so we are not hotfixing it in here either.	2024-08-07 17:14:45 +00:00
Joonas Koivunen	a81fab4826	refactor(timeline_detach_ancestor): replace ordered reparented with a hashset (#8629) Earlier I thought we'd need an (ancestor_lsn, timeline_id)-ordered list of reparented timelines. It turns out we did not need it at all. Replace it with an unordered hashset. Additionally, refactor out the query for reparented direct children; it will later be used from more places. Split off from #8430. Cc: #6994	2024-08-07 18:19:00 +02:00
Yuchen Liang	ed5724d79d	scrubber: clean up `scan_metadata` before prod (#8565) Part of #8128. ## Problem Currently, the scrubber `scan_metadata` command returns an error code if the metadata on remote storage is corrupted with fatal errors. To safely deploy this command in a cronjob, we want to differentiate between failures while running the scrubber command and erroneous metadata. At the same time, we also want our regression tests to catch corrupted metadata using the scrubber command. ## Summary of changes - Return an error code only when the scrubber command fails - Use explicit checks on errors and warnings to determine metadata health in regression tests. Resolve conflict with the `tenant-snapshot` command (after shard split): [`test_scrubber_tenant_snapshot`](https://github.com/neondatabase/neon/blob/yuchen/scrubber-scan-cleanup-before-prod/test_runner/regress/test_storage_scrubber.py#L23) failed before applying `422a8443dd` - When taking a snapshot, the old `index_part.json` in the unsharded tenant directory is not kept. - The current `list_timeline_blobs` implementation considers a missing `index_part.json` a parse error. - During the scan, we only analyze shards with the highest shard count, so we will not get a parse error, but we do need to add the layers to the tenant object listing; otherwise we will get an "index is referencing a layer that is not in remote storage" error. - Action: Add s3_layers from `list_timeline_blobs` regardless of parsing errors Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-08-06 18:55:42 +01:00
John Spray	3727c6fbbe	pageserver: use layer visibility when composing heatmap (#8616) ## Problem Sometimes, a layer is Covered but hasn't yet been evicted from local disk (e.g. shortly after image layer generation). It is not a good use of resources to download these to a secondary location, as there's a good chance they will never be read. This follows the previous change that added layer visibility: - #8511 Part of epic: - https://github.com/neondatabase/neon/issues/8398 ## Summary of changes - When generating heatmaps, only include Visible layers - Update test_secondary_downloads to filter to visible layers when listing layers from an attached location	2024-08-06 17:15:40 +01:00
Joonas Koivunen	138f008bab	feat: persistent gc blocking (#8600) Currently, we do not have facilities to persistently block GC on a tenant for whatever reason. We could do a tenant configuration update, but that is risky for generation numbers and would also be transient. Introduce a `gc_block` facility in the tenant, which manages per-timeline blocking reasons. Additionally, add HTTP endpoints for enabling/disabling manual GC blocking for a specific timeline. For debugging, the individual tenant status now includes a string representation similar to the one logged when GC is skipped. Cc: #6994	2024-08-06 10:09:56 +01:00
Joonas Koivunen	c32807ac19	fix: allow awaiting logical size for root timelines (#8604) Currently, if `GET /v1/tenant/x/timeline/y?force-await-initial-logical-size=true` is requested for a root timeline created within the current pageserver session, the request handler panics on a debug assertion. These timelines always have an accurately calculated logical size (from the initdb import). The fix is to never attempt to prioritize timeline size calculation if we already have an exact value. Split off from #8528.	2024-08-05 21:21:33 +01:00
John Spray	0a667bc8ef	tests: add test_historic_storage_formats (#8423) ## Problem Currently, our backward compatibility tests only look one release back. That means, for example, that when we switch on image layer compression by default, we'll test reading of uncompressed layers for one release, and then stop doing it. When we make an index_part.json format change, we'll test against the old format for a week, then stop (unless we write separate unit tests for each old format). The reality in the field is that data in old formats will continue to exist for weeks/months/years. When we make major format changes, we should retain examples of the old format data, and continuously verify that the latest code can still read them. This test uses contents from a new path in the public S3 bucket, `compatibility-data-snapshots/`. It is populated by hand. The first important artifact is one from before we switch on compression, so that we will keep testing reads of uncompressed data. We will generate more artifacts ahead of other key changes, like when we update remote storage format for archival timelines. Closes: https://github.com/neondatabase/cloud/issues/15576	2024-08-02 18:28:23 +01:00
Arpad Müller	8c828c586e	Wait for completion of the upload queue in flush_frozen_layer (#8550) Make `flush_frozen_layer` add a barrier to the upload queue and wait for that barrier to be reached before letting the flush complete. This gives us backpressure and ensures that writes can't build up in an unbounded fashion. Fixes #7317	2024-08-02 13:07:12 +02:00
John Spray	c53799044d	pageserver: refine how we delete timelines after shard split (#8436) ## Problem Previously, when we did a timeline deletion, shards would delete layers that belong to an ancestor. That is not a correctness issue, because when we delete a timeline, we're always deleting it from all shards, and destroying data for that timeline is clearly fine. However, there exists a race where one shard might start doing this deletion while another shard has not yet received the deletion request, and might try to access an ancestral layer. This creates ambiguity over the "all layers referenced by my index should always exist" invariant, which is important to detecting and reporting corruption. Now that we have a GC mode for clearing up ancestral layers, we can rely on that to clean up such layers, and avoid deleting them right away. This makes things easier to reason about: there are now no cases where a shard will delete a layer that belongs to a ShardIndex other than itself. ## Summary of changes - Modify behavior of RemoteTimelineClient::delete_all - Add `test_scrubber_physical_gc_timeline_deletion` to exercise this case - Tweak AWS SDK config in the scrubber to enable retries. Motivated by seeing the test for this feature encounter some transient "service error" S3 errors (which are probably nothing to do with the changes in this PR)	2024-08-02 08:00:46 +01:00
Yuchen Liang	85bef9f05d	feat(scrubber): post `scan_metadata` results to storage controller (#8502) Part of #8128, followup to #8480. Closes #8421. Enable the scrubber to optionally post metadata scan health results to the storage controller. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-07-30 16:07:34 +01:00
Yuchen Liang	e374d6778e	feat(storcon): store scrubber metadata scan result (#8480) Part of #8128, followed by #8502. ## Problem Currently, we lack a mechanism to alert on an unhealthy `scan_metadata` status if we start running this scrubber command as part of a cronjob. With the storage controller client introduced to the storage scrubber in #8196, it is viable to set up alerts by storing the health status in the storage controller database. We intentionally do not store the full output in the database, as the JSON blobs could make the table really huge. Instead, we store only a health status and a timestamp recording the last time the metadata health status was posted for a tenant shard. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-07-30 14:32:00 +01:00
Joonas Koivunen	bdfc9ca7e9	test: deflake test_duplicate_creation (#8536) By including a comparison of `remote_consistent_lsn_visible`, we risk flakiness coming from outside timeline creation. Mask out `remote_consistent_lsn_visible` for the comparison. Evidence: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8489/10142336315/index.html#suites/ffbb7f9930a77115316b58ff32b7c719/89ff0270bf58577a	2024-07-29 13:41:06 +01:00
Alexander Bayandin	6cad0455b0	CI(test_runner): Upload all test artifacts if preserve_database_files is enabled (#7990) ## Problem There's a `NeonEnvBuilder#preserve_database_files` parameter that allows you to keep database files for debugging purposes (by default, files get cleaned up), but there's no way to get these files from a CI run. This PR adds handling of `NeonEnvBuilder#preserve_database_files` and adds the compressed test output directory to Allure reports (for tests with this parameter enabled). Ref https://github.com/neondatabase/neon/issues/6967 ## Summary of changes - Compress and add the whole test output directory to Allure reports - Currently works only with `neon_env_builder` fixture - Remove `preserve_database_files = True` from sharding tests as unneeded --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-27 20:01:10 +01:00
Christian Schwarz	8154e88732	refactor(layer load API): all errors are permanent (#8527) I am not aware of a case of "transient" VirtualFile errors as mentioned in https://github.com/neondatabase/neon/pull/5880 Private DM with Joonas discussing this: https://neondb.slack.com/archives/D049K7HJ9JM/p1721836424615799	2024-07-26 15:48:44 +01:00
Vlad Lazar	7a796a9963	storcon: introduce step down primitive (#8512) ## Problem We are missing the step-down primitive required to implement rolling restarts of the storage controller. ## Summary of changes Add a `/control/v1/step_down` endpoint which puts the storage controller into a state where it rejects all API requests apart from `/control/v1/step_down`, `/status` and `/metrics`. When it receives the request, the storage controller cancels all pending reconciles and waits for them to exit gracefully. The response contains a snapshot of the in-memory observed state. Related: * https://github.com/neondatabase/cloud/issues/14701 * https://github.com/neondatabase/neon/issues/7797 * https://github.com/neondatabase/neon/pull/8310	2024-07-26 14:54:09 +01:00
Vlad Lazar	cdaa2816e7	pageserver: make vectored get the default read path for the pageserver (#8384) ## Problem Vectored get is already enabled in all prod regions without validation. The pageserver defaults are out of sync, however. ## Summary of changes Update the pageserver defaults to match the prod config. This also means that when running tests locally, people don't have to use the env vars to get the prod config.	2024-07-26 14:19:52 +01:00
John Spray	65868258d2	tests: checkpoint instead of compact in test_sharding_split_compaction (#8473) ## Problem This test relies on writing image layers before the split. It can fail to do so durably if the image layers are written ahead of the remote consistent LSN, so we should have been doing a checkpoint rather than just a compaction.	2024-07-26 11:03:44 +01:00
John Spray	775c0c8892	tests: adjust threshold in test_partial_evict_tenant (#8509) ## Problem This test was destabilized by https://github.com/neondatabase/neon/pull/8431. The threshold is arbitrary, and failures are still quite close to it. At a high level, the test asserts that "eviction was approximately fair to these tenants", which still appears to be the case when the absolute difference between ratios is slightly higher at ~0.6-0.7. ## Summary of changes - Change threshold from 0.06 to 0.065. Based on the last ~10 failures, that should be sufficient.	2024-07-25 15:00:42 +01:00
John Spray	24ea9f9f60	tests: always scrub on test exit when using S3Storage (#8437) ## Problem Currently, tests may have a scrub during teardown if they ask for it, but most tests don't request it. To detect "unknown unknowns", let's run it at the end of every test where possible. This is similar to asserting that there are no errors in the log at the end of tests. ## Summary of changes - Remove explicit `enable_scrub_on_exit` - Always scrub if remote storage is an S3Storage.	2024-07-25 14:19:38 +01:00
Vlad Lazar	9c5ad21341	storcon: make heartbeats restart aware (#8222) ## Problem Re-attach blocks the pageserver HTTP server from starting up. Hence, it can't reply to heartbeats until that's done. This makes the storage controller mark the node offline (not good). We worked around this by setting the interval after which nodes are marked offline to 5 minutes. This isn't a long-term solution. ## Summary of changes * Introduce a new `NodeAvailability` state: `WarmingUp`. This state models the following time interval: * From receiving the re-attach request until the pageserver replies to the first heartbeat after re-attach * The heartbeat delta generator becomes aware of this state and uses a separate, longer interval * The `max-warming-up-interval` flag now models the longer timeout and `max-offline-interval` the shorter one, to match the names of the states Closes https://github.com/neondatabase/neon/issues/7552	2024-07-25 14:09:12 +01:00
John Spray	f5db655447	pageserver: simplify LayerAccessStats (#8431) ## Problem LayerAccessStats contains a lot of detail that we don't use: short histories of the most recent accesses, specifics on what kind of task accessed a layer, etc. This is all stored inside a Mutex, which is locked every time something accesses a layer. ## Summary of changes - Store timestamps at a very low resolution (to the nearest second), sufficient for use on the timescales of eviction. - Pack access time and last residence change time into a single u64 - Use the high bits of the u64 for other flags, including the new layer visibility concept. - Simplify the external-facing model for access stats to just include what we now track. Note that the `HistoryBufferWithDropCounter` is removed here because it is no longer used. I do not dislike this type; we just happen not to use it for anything else at present. Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-24 08:17:28 +01:00
Konstantin Knizhnik	563d73d923	Use smgrexists() instead of access() to enforce uniqueness of generated relfilenumber (#7992) ## Problem Postgres uses the `access()` function in `GetNewRelFileNumber` to check that the assigned relfilenumber is not used by any other relation. This check will not work in Neon, because we do not have all files in local storage. ## Summary of changes Use `smgrexists()` instead, which checks with the pageserver whether such a relfilenode is in use. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2024-07-23 18:41:55 +03:00
John Spray	80c8ceacbc	tests: make `test_scrubber_physical_gc_ancestors` more stable (#8453) ## Problem This test sometimes found that ancestors were getting cleaned up before it had done any compaction. Compaction was happening implicitly via Workload. Example: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8298/10032173390/index.html#testresult/fb04786402f80822/retries ## Summary of changes - Set upload=False when writing data after shard split, to avoid doing a checkpoint - Add a checkpoint_period & explicit wait for uploads so that we ensure data lands in S3 without doing a checkpoint	2024-07-23 12:57:57 +01:00
Vlad Lazar	35854928d9	pageserver: use identity file as node id authority and remove init command and config-override flags (#7766) Ansible will soon write the node id to `identity.toml` in the work dir for new pageservers. On the pageserver side, we read the node id from the identity file if it is present and use that as the source of truth. If the identity file is missing, cannot be read, or does not deserialise, start-up is aborted. This PR also removes the `--init` mode and the `--config-override` flag from the `pageserver` binary. neon_local already no longer uses these flags. Ansible still uses them until the linked change is merged and deployed, so this PR has to land simultaneously with or after the Ansible change. Related Ansible change: https://github.com/neondatabase/aws/pull/1322 Cplane change to remove config-override usages: https://github.com/neondatabase/cloud/pull/13417 Closes: https://github.com/neondatabase/neon/issues/7736 Overall plan: https://www.notion.so/neondatabase/Rollout-Plan-simplified-pageserver-initialization-f935ae02b225444e8a41130b7d34e4ea?pvs=4 Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-23 11:41:12 +01:00
Konstantin Knizhnik	a868e342d4	Change default version of Neon extension to 1.4	2024-07-22 17:58:07 +01:00
Yuchen Liang	595c450036	fix(scrubber): more robust metadata consistency checks (#8344) Part of #8128. ## Problem The scrubber uses the `scan_metadata` command to flag metadata inconsistencies. To trust it at scale, we need to make sure the errors we emit reflect real scenarios. One check performed in the scrubber is whether the layers listed in the latest `index_part.json` are present in the object listing. Currently, the scrubber does not robustly handle the case where objects are uploaded/deleted during the scan. ## Summary of changes Condition for success: An object in the index is (1) in the object listing we acquire from S3 or (2) found in a HeadObject request (new object). - Add `HeadObject` requests for the layers missing from the object listing. - Keep the order of first getting the object listing and then downloading the layers. - Update the check to only consider shards with the highest shard count. - Skip analyzing a timeline if the `deleted_at` tombstone is marked in `index_part.json`. - Add a new test to see if the scrubber actually detects the metadata inconsistency. _Misc_ - A timeline with no ancestor should always have some layers. - Removed experimental histograms _Caveat_ - Ancestor layers are not cleaned until #8308 is implemented. If ancestor layers reference non-existing layers in the index, the scrubber will emit false positives. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-07-22 14:53:33 +01:00
John Spray	8d948f2e07	tests: make test_change_pageserver more robust (#8442) ## Problem This test predates the storage controller. It stops pageservers and reconfigures computes, but that races with the storage controller's node failure detection, which can result in restarting nodes not getting the attachments they expect, and the test failing. ## Summary of changes - Configure the storage controller to use a compute notify hook that does nothing, so that it cannot interfere with the test's configuration of computes. - Instead of using the attach hook, just notify the storage controller that nodes are offline, and reconcile tenants so that they will automatically be attached to the other node.	2024-07-22 14:17:02 +01:00
John Spray	98af1e365b	pageserver: remove absolute-order disk usage eviction (#8454) ## Problem Deployed pageserver configurations are all like this: ``` disk_usage_based_eviction: max_usage_pct: 85 min_avail_bytes: 0 period: "10s" eviction_order: type: "RelativeAccessed" args: highest_layer_count_loses_first: true ``` But we're maintaining this optional absolute order eviction, with test cases etc. ## Summary of changes - Remove absolute order eviction. Make the default eviction policy the same as how we really deploy pageservers.	2024-07-22 13:15:55 +01:00
John Spray	a4fa250c92	tests: longer timeouts in test_timeline_deletion_with_files_stuck_in_upload_queue (#8438) ## Problem This test had two locations with 2-second timeouts, which is rather low when we run on a highly contended test machine running lots of tests in parallel. It usually passes, but today I've seen both of these locations time out on separate PRs. Example failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8432/10007868041/index.html#suites/837740b64a53e769572c4ed7b7a7eeeb/6c6a092be083d27c ## Summary of changes - Change 2-second timeouts to 20-second timeouts	2024-07-19 19:30:28 +02:00
John Spray	44781518d0	storage scrubber: GC ancestor shard layers (#8196) ## Problem After a shard split, the pageserver leaves the ancestor shard's content in place. It may be referenced by child shards, but eventually child shards will de-reference most ancestor layers as they write their own data and do GC. We would like to eventually clean up those ancestor layers to reclaim space. ## Summary of changes - Extend the physical GC command with `--mode=full`, which includes cleaning up unreferenced ancestor shard layers - Add test `test_scrubber_physical_gc_ancestors` - Remove colored log output: in testing this is irritating ANSI code spam in logs, and in interactive use doesn't add much. - Refactor storage controller API client code out of storcon_client into a `storage_controller/client` crate - During physical GC of ancestors, call into the storage controller to check that the latest shards seen in S3 reflect the latest state of the tenant, and there is no shard split in progress.	2024-07-19 19:07:59 +03:00
Christian Schwarz	16071e57c6	pageserver: remove obsolete cached_metric_collection_interval (#8370) We're removing the usage of this long-meaningless config field in https://github.com/neondatabase/aws/pull/1599 Once that PR has been deployed to staging and prod, we can merge this PR.	2024-07-19 17:01:02 +01:00
Arpad Müller	c96e8012ce	Enable zstd in tests (#8368) Successor of #8288; just enable zstd in tests. Also adds a test that creates easily compressible data. Part of #5431 --------- Co-authored-by: John Spray <john@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-07-18 19:09:57 +01:00
John Spray	9ded2556df	tests: increase test_pg_regress and test_isolation timeouts (#8418) ## Problem These tests time out ~1 in 50 runs when in debug mode. There is no indication of a real issue: they're just wrappers that contain large numbers of individual tests within one pytest case. ## Summary of changes - Bump pg_regress timeout from 600 to 900s - Bump test_isolation timeout from 300s (default) to 600s In future it would be nice to break out these tests to run individual cases (or batches thereof) as separate tests, rather than this monolith.	2024-07-18 10:23:17 +01:00
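
The scrubber fix in 4049d2b7e1 (#8661) switched from accumulating ShardIndex values in a Vec to a BTreeSet. The sketch below illustrates that idea and is not the scrubber's actual code. `ShardIndex` is a hypothetical stand-in with `shard_number`/`shard_count` fields, and `collect_vec`/`collect_set` are illustrative helpers. It assumes the completeness check compares the number of collected shard indices against the shard count.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the scrubber's ShardIndex (not the real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct ShardIndex {
    shard_number: u8,
    shard_count: u8,
}

/// Buggy pattern: one push per (shard, timeline) listing entry, so a shard
/// that holds several timelines is recorded several times.
fn collect_vec(listing: &[(ShardIndex, &str)]) -> Vec<ShardIndex> {
    listing.iter().map(|(shard, _timeline)| *shard).collect()
}

/// Fixed pattern: a BTreeSet keeps each ShardIndex exactly once, in sorted order.
fn collect_set(listing: &[(ShardIndex, &str)]) -> BTreeSet<ShardIndex> {
    listing.iter().map(|(shard, _timeline)| *shard).collect()
}

fn main() {
    // Hypothetical tenant with 2 shards, each holding the same 2 timelines.
    let s0 = ShardIndex { shard_number: 0, shard_count: 2 };
    let s1 = ShardIndex { shard_number: 1, shard_count: 2 };
    let listing = [(s0, "tl-a"), (s0, "tl-b"), (s1, "tl-a"), (s1, "tl-b")];

    let expected = s0.shard_count as usize;
    let as_vec = collect_vec(&listing);
    let as_set = collect_set(&listing);

    // The Vec holds 4 entries for a 2-shard tenant, so a count-based
    // completeness check misfires; the set holds exactly 2.
    println!("vec: {}, set: {}, expected: {}", as_vec.len(), as_set.len(), expected);
    assert_ne!(as_vec.len(), expected);
    assert_eq!(as_set.len(), expected);
}
```

A `HashSet` would deduplicate equally well. A `BTreeSet` also yields the indices in a deterministic order, which is convenient when they are logged or compared.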


863 Commits