neon/regress at 472cc17b7aba4f78bc7a71a2c04d2e7cb8b696d8

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 14:02:55 +00:00

Dmitry Rodionov 472cc17b7a propagate lock guard to background deletion task (#4495)

## Problem

1. During the rollout we got a panic: "timeline that we were deleting
was concurrently removed from 'timelines' map". It was caused by the lock
guard not being propagated to the background part of the deletion.
The existing test didn't catch it because the failpoint used for
verification was placed before the point where the background task is spawned.
2. While looking at the surrounding code, one more bug was detected. We
removed the timeline from the map before deletion finished, which breaks
client retry logic: the client sees a 404 before the actual deletion
is complete, which can lead it to stop its retry poll too early.

## Summary of changes

1. Carry the lock guard over to background deletion. Ensure the existing
test case fails without the patch applied (the second deletion becomes stuck
without it, which eventually leads to a test failure).
2. Move the delete_all call earlier so that removing the timeline from the
map is the last thing done during deletion.
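
To illustrate the two points above, here is a minimal sketch in Python with asyncio. It is not the pageserver code (which is Rust). The names `delete_timeline`, `delete_locks`, `events` and `DeletionInProgress` are hypothetical, and rejecting a concurrent second request is an assumption made only for this sketch. It shows the guard being taken in the foreground, released only when the background task ends, and the map entry being removed as the final step.

```python
import asyncio


class DeletionInProgress(Exception):
    """Hypothetical error: a delete was requested while the guard is held."""


async def delete_timeline(timelines, delete_locks, timeline_id, events):
    lock = delete_locks[timeline_id]
    if lock.locked():
        raise DeletionInProgress(timeline_id)
    await lock.acquire()  # the foreground part takes the guard

    async def background():
        try:
            events.append("delete_all")      # remove the data first
            await asyncio.sleep(0)           # yield, as real I/O would
            del timelines[timeline_id]       # map removal is the last step
            events.append("removed_from_map")
        finally:
            lock.release()                   # the guard is released only here

    # The guard is still held when the background task is spawned.
    return asyncio.create_task(background())


async def demo():
    timelines = {"tl-1": object()}
    delete_locks = {"tl-1": asyncio.Lock()}
    events = []
    task = await delete_timeline(timelines, delete_locks, "tl-1", events)
    still_listed = "tl-1" in timelines  # visible until deletion completes
    try:
        await delete_timeline(timelines, delete_locks, "tl-1", events)
        second_rejected = False
    except DeletionInProgress:
        second_rejected = True
    await task
    return still_listed, second_rejected, "tl-1" in timelines, events
```

Because the entry stays in the map until the background task finishes, a polling client in this sketch would not see "not found" before the deletion has actually completed.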

Additionally, I've added timeline_id to the `update_gc_info` span,
because `debug_assert_current_span_has_tenant_and_timeline_id` in
`download_remote_layer` was firing when `update_gc_info` led to
on-demand downloads via `find_lsn_for_timestamp` (caught by @problame).
This is not directly related to the PR but fixes possible flakiness.

Another smaller set of changes involves the deletion wrapper used in Python
tests. There is now a simpler wrapper, `timeline_delete_wait_completed`,
that waits for deletions to complete. Most of the
test_delete_timeline.py tests are negative tests, i.e., "does
ps_http.timeline_delete() fail in this and that scenario".
These can be left alone. In other places, where we actually do the deletions,
we need to use the helper that polls for completion.
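
A helper that polls for completion could look roughly like the sketch below. This is not the actual `timeline_delete_wait_completed` from the test fixtures. `FakePageserverClient`, `timeline_exists`, `delete_and_wait_sketch` and the `iterations`/`interval` parameters are hypothetical names used only to show the request-then-poll pattern.

```python
import time


class FakePageserverClient:
    """Stand-in for a pageserver HTTP client (hypothetical, for illustration)."""

    def __init__(self, polls_until_gone):
        self.polls_left = polls_until_gone
        self.delete_calls = 0

    def timeline_delete(self, tenant_id, timeline_id):
        # Only starts the deletion; the work continues in the background.
        self.delete_calls += 1

    def timeline_exists(self, tenant_id, timeline_id):
        # Reports the timeline as present until the fake deletion "finishes".
        if self.polls_left > 0:
            self.polls_left -= 1
            return True
        return False


def delete_and_wait_sketch(client, tenant_id, timeline_id,
                           iterations=20, interval=0.0):
    """Request deletion, then poll until the timeline is gone.

    Returns the number of polls it took, or raises TimeoutError.
    """
    client.timeline_delete(tenant_id, timeline_id)
    for attempt in range(1, iterations + 1):
        if not client.timeline_exists(tenant_id, timeline_id):
            return attempt
        time.sleep(interval)
    raise TimeoutError(
        f"timeline {timeline_id} still present after {iterations} polls"
    )
```

Negative tests that only check whether the delete request itself fails would keep calling the plain delete method. Tests that depend on the timeline actually being gone would go through the polling wrapper.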

Discussion
https://neondb.slack.com/archives/C03F5SM1N02/p1686668007396639

resolves #4496

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>

2023-06-15 17:30:12 +03:00

test_ancestor_branch.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_attach_tenant_config.py

test: fix flaky warning on attach (#4415)

2023-06-05 18:12:58 +03:00

test_auth.py

Make new tenant/timeline IDs mandatory in create APIs. (#4304)

2023-05-26 16:19:36 +03:00

test_backpressure.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_basebackup_error.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_branch_and_gc.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_branch_behind.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_branching.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_broken_timeline.py

refactor TenantState transitions (#4321)

2023-05-29 17:52:41 +03:00

test_build_info_metric.py

tests: use parse_metrics everywhere (#3737)

2023-03-03 14:53:27 +01:00

test_clog_truncate.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_close_fds.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_compatibility.py

propagate lock guard to background deletion task (#4495)

2023-06-15 17:30:12 +03:00

test_config.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_crafted_wal_end.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_createdropdb.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_createuser.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_ddl_forwarding.py

Fix flakiness of test_metric_collection (#4346)

2023-05-26 00:05:11 +03:00

test_disk_usage_eviction.py

feat: three phased startup order (#4399)

2023-06-07 14:29:23 +03:00

test_fsm_truncate.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_fullbackup.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_gc_aggressive.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_gc_cutoff.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_hot_standby.py

Un-xfail fixed tests on Postgres 15 (#4275)

2023-05-18 22:38:33 +01:00

test_import.py

propagate lock guard to background deletion task (#4495)

2023-06-15 17:30:12 +03:00

test_large_schema.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_layer_eviction.py

test_basic_eviction: avoid some sources of flakiness (#4504)

2023-06-14 19:04:22 +02:00

test_layer_writers_fail.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_logging.py

add libmetric metric for each logged log message (#4055)

2023-04-25 14:10:18 +02:00

test_lsn_mapping.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_metric_collection.py

Fix flakiness of test_metric_collection (#4346)

2023-05-26 00:05:11 +03:00

test_multixact.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_neon_cli.py

tests: make neon_fixtures a bit thinner by splitting out some pageserver related helpers (#3977)

2023-04-07 13:47:28 +03:00

test_neon_local_cli.py

Use compute_ctl to manage Postgres in tests. (#3886)

2023-06-06 14:59:36 +01:00

test_next_xid.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_normal_work.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_old_request_lsn.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_ondemand_download.py

test: Less flaky gc (#4416)

2023-06-05 15:43:52 +03:00

test_pageserver_api.py

test_runner: add --pg-version pytest argument (#4037)

2023-05-05 02:57:47 +03:00

test_pageserver_catchup.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_pageserver_restart.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_pageserver_restarts_under_workload.py

test: Less flaky gc (#4416)

2023-06-05 15:43:52 +03:00

test_parallel_copy.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_pg_regress.py

Un-xfail fixed tests on Postgres 15 (#4275)

2023-05-18 22:38:33 +01:00

test_pitr_gc.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_proxy.py

Remove timeout on test_close_on_connections_exit

2023-06-13 06:26:03 +04:00

test_read_trace.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_read_validation.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_readonly_node.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_recovery.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_remote_storage.py

propagate lock guard to background deletion task (#4495)

2023-06-15 17:30:12 +03:00

test_setup.py

Reorganize python tests.

2022-08-30 18:25:38 +03:00

test_sni_router.py

pg_sni_router: add session_id to more messages (#4403)

2023-06-02 14:59:10 +03:00

test_subxacts.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_tenant_conf.py

test_runner: update dependencies (#4328)

2023-05-24 12:47:01 +01:00

test_tenant_detach.py

map TenantState::Broken to TenantAttachmentStatus::Failed (#4371)

2023-06-07 18:25:30 +03:00

test_tenant_relocation.py

upload new timeline index part json before 201 or on retry (#4204)

2023-05-15 14:16:43 +03:00

test_tenant_size.py

propagate lock guard to background deletion task (#4495)

2023-06-15 17:30:12 +03:00

test_tenant_tasks.py

propagate lock guard to background deletion task (#4495)

2023-06-15 17:30:12 +03:00

test_tenants_with_remote_storage.py

Remove wait_for_sk_commit_lsn_to_reach_remote_storage.

2023-04-26 13:46:33 +04:00

test_tenants.py

propagate lock guard to background deletion task (#4495)

2023-06-15 17:30:12 +03:00

test_threshold_based_eviction.py

eviction: regression test + distinguish layer write from map insert (#4005)

2023-05-04 16:16:48 +02:00

test_timeline_delete.py

propagate lock guard to background deletion task (#4495)

2023-06-15 17:30:12 +03:00

test_timeline_size.py

propagate lock guard to background deletion task (#4495)

2023-06-15 17:30:12 +03:00

test_truncate.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_twophase.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_unlogged.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_vm_bits.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_wal_acceptor_async.py

Use compute_ctl to manage Postgres in tests. (#3886)

2023-06-06 14:59:36 +01:00

test_wal_acceptor.py

propagate lock guard to background deletion task (#4495)

2023-06-15 17:30:12 +03:00

test_wal_receiver.py

Use compute_ctl to manage Postgres in tests. (#3886)

2023-06-06 14:59:36 +01:00

test_wal_restore.py

Run regressions tests on both Postgres 14 and 15 (#4192)

2023-05-12 15:28:51 +01:00

test_walredo_not_left_behind_on_detach.py

Use compute_ctl to manage Postgres in tests. (#3886)

2023-06-06 14:59:36 +01:00