neon/regress at 9b8df2634f3a41a0da641aa2ab1e9cab86d1f430 - neon

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-18 05:30:37 +00:00

Files

Christian Schwarz b467d8067b fix(test_ondemand_download_timetravel): occasionally fails with WAL timeout during layer creation (#6818)

refs https://github.com/neondatabase/neon/issues/4112
amends https://github.com/neondatabase/neon/pull/6687

Since my last PR #6687 regarding this test, the type of flakiness that
has been observed has shifted to the beginning of the test, where we
create the layers:

```
timed out while waiting for remote_consistent_lsn to reach 0/411A5D8, was 0/411A5A0
```
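The two LSNs in this error are very close. PostgreSQL prints an LSN as two hex numbers separated by a slash: the high 32 bits and the low 32 bits. The sketch below computes the gap. `parse_lsn` is a hypothetical stand-alone helper, not the test suite's own LSN type.

```python
def parse_lsn(text: str) -> int:
    """Parse a PostgreSQL-style LSN string "HI/LO" (both hex) into a 64-bit int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


target = parse_lsn("0/411A5D8")   # LSN the test was waiting for
reached = parse_lsn("0/411A5A0")  # remote_consistent_lsn actually reached
gap = target - reached
print(f"gap: {gap} bytes (0x{gap:X})")  # gap: 56 bytes (0x38)
```

The missing piece is therefore only 56 bytes of WAL.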

[Example Allure Report](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6789/7932503173/index.html#/testresult/ddb877cfa4062f7d)

Analysis
--------

I suspect there was the following race condition:
- endpoints push out some tiny piece of WAL during their
  endpoints.stop_all()
- that WAL reaches the SK (it's just one SK according to logs)
- the SK sends it into the walreceiver connection
- the SK gets shut down
- the checkpoint is taken, with last_record_lsn = 0/411A5A0
- the PS's walreceiver_connection_handler processes the WAL that was
  sent into the connection by the SK; this advances
  last_record_lsn to 0/411A5D8
- we get current_lsn = 0/411A5D8
- nothing flushes a layer, so remote_consistent_lsn stays at 0/411A5A0
  and the wait times out
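The race above can be replayed in a deliberately simplified, single-threaded toy model. `ToyTimeline`, `wait_for_upload`, and the starting `remote_consistent_lsn` value are all hypothetical, and the model collapses "flush a layer" and "upload it" into a single `checkpoint()` step. It only illustrates the ordering problem. It is not pageserver code or the test fixture API.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ToyTimeline:
    """Hypothetical, heavily simplified model of one timeline on a pageserver."""

    last_record_lsn: int
    remote_consistent_lsn: int
    # WAL the safekeeper already pushed into the walreceiver connection
    # but the pageserver has not processed yet.
    in_flight: list[int] = field(default_factory=list)

    def checkpoint(self) -> None:
        # Flush everything ingested so far; the model collapses flush + upload
        # into one step, so remote_consistent_lsn catches up immediately.
        self.remote_consistent_lsn = self.last_record_lsn

    def process_in_flight(self) -> None:
        # Ingest WAL that was already in the connection, even though the
        # safekeeper has been shut down in the meantime.
        for end_lsn in self.in_flight:
            self.last_record_lsn = max(self.last_record_lsn, end_lsn)
        self.in_flight.clear()


def wait_for_upload(tl: ToyTimeline, target: int, timeout_s: float = 0.05) -> bool:
    """Poll remote_consistent_lsn until it reaches target or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if tl.remote_consistent_lsn >= target:
            return True
        time.sleep(0.01)  # nothing in this model ever flushes another layer
    return False


tl = ToyTimeline(last_record_lsn=0x411A5A0, remote_consistent_lsn=0x4000000)
tl.in_flight.append(0x411A5D8)  # tiny WAL from endpoints.stop_all()
tl.checkpoint()                 # taken with last_record_lsn = 0/411A5A0
tl.process_in_flight()          # last_record_lsn advances to 0/411A5D8
current_lsn = tl.last_record_lsn
print(wait_for_upload(tl, current_lsn))  # False, i.e. the timeout
```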

Changes
-------

There's no testing / debug interface to shut down / sever all
walreceiver connections, so this PR restarts the pageserver to
achieve it.
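The reordering can be sketched with a hypothetical toy model in which the connection is severed before the checkpoint is taken. `ToyTimeline` and its methods are illustrative names, not the fixture API. The model also assumes that WAL still sitting in a severed connection is never processed by the restarted process. Under that assumption, nothing can advance `last_record_lsn` behind the test's back, so the checkpoint covers everything that `current_lsn` will report.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTimeline:
    """Hypothetical, simplified model; not the real pageserver or its test API."""

    last_record_lsn: int
    remote_consistent_lsn: int
    in_flight: list[int] = field(default_factory=list)  # WAL still in the connection

    def restart(self) -> None:
        # Assumption: restarting severs the walreceiver connection, and WAL that
        # was still sitting in it is not processed by the restarted process.
        self.in_flight.clear()

    def checkpoint(self) -> None:
        # Flush and upload everything ingested so far (collapsed into one step).
        self.remote_consistent_lsn = self.last_record_lsn

    def process_in_flight(self) -> None:
        for end_lsn in self.in_flight:
            self.last_record_lsn = max(self.last_record_lsn, end_lsn)
        self.in_flight.clear()


tl = ToyTimeline(last_record_lsn=0x411A5A0, remote_consistent_lsn=0x4000000)
tl.in_flight.append(0x411A5D8)  # tiny WAL from endpoints.stop_all()
tl.restart()                    # sever the connection first
tl.checkpoint()
tl.process_in_flight()          # nothing left to ingest
current_lsn = tl.last_record_lsn
print(tl.remote_consistent_lsn == current_lsn)  # True
```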

Also, it moves the "wait for image layer uploads" step further up, so
that after this first restart, the pageserver really does _nothing_ by
itself. As a result, the original physical size mismatch issue quoted in
#6687 should be fixed. (My suspicion remains that it was caused by the
tiny chunk of endpoints.stop_all() WAL being ingested after the second
PS restart.)

2024-02-20 14:09:15 +01:00

data/extension_test/5670669815

Feat/postgres 16 (#4761)

2023-09-12 15:11:32 +02:00

test_ancestor_branch.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_attach_tenant_config.py

per-TenantShard read throttling (#6706)

2024-02-16 21:26:59 +01:00

test_auth.py

libs: add 'generations_api' auth scope (#6783)

2024-02-16 15:53:09 +00:00

test_backpressure.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_bad_connection.py

Add retry to fetching basebackup (#6537)

2024-02-01 20:50:04 +00:00

test_basebackup_error.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_branch_and_gc.py

build: back to opt-level=0 in debug builds, for faster compile times (#5751)

2023-11-20 15:41:37 +01:00

test_branch_behind.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_branching.py

pageserver: improved handling of concurrent timeline creations on the same ID (#6139)

2023-12-15 08:51:23 +00:00

test_broken_timeline.py

control_plane: generalize attachment_service to handle sharding (#6251)

2024-01-17 18:01:08 +00:00

test_build_info_metric.py

feat: add build_tag env support for set_build_info_metric (#5576)

2023-10-27 10:47:11 +01:00

test_change_pageserver.py

pageserver: fixes + test updates for sharding (#6186)

2023-12-20 12:26:20 +00:00

test_clog_truncate.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_close_fds.py

tests: enable multiple pageservers in neon_local and neon_fixture (#5231)

2023-09-08 16:19:57 +01:00

test_compatibility.py

test_create_snapshot: do not try to copy pg_dynshmem dir (#6796)

2024-02-18 12:16:07 +00:00

test_config.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_crafted_wal_end.py

test_runner: replace black with ruff format (#6268)

2024-01-05 15:35:07 +00:00

test_createdropdb.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_createuser.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_ddl_forwarding.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_disk_usage_eviction.py

tests: test_secondary_mode_eviction: avoid use of mocked statvfs (#6698)

2024-02-13 09:00:50 +02:00

test_download_extensions.py

Use test specific directory in test_remote_extensions (#5938)

2023-11-27 18:57:58 +00:00

test_duplicate_layers.py

tests: update for tenant generations (#5449)

2023-12-07 12:27:16 +00:00

test_fsm_truncate.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_fullbackup.py

tests: Remove unnecessary port config with VanillaPostgres class

2024-02-11 01:34:31 +02:00

test_gc_aggressive.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_hot_standby.py

Add large insertion and slow WAL sending to test_hot_standby.

2024-01-02 10:50:20 +04:00

test_import.py

tests: Remove obsolete allowlist entries

2024-02-11 01:34:31 +02:00

test_large_schema.py

tests: enable multiple pageservers in neon_local and neon_fixture (#5231)

2023-09-08 16:19:57 +01:00

test_layer_bloating.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_layer_eviction.py

test_runner: replace black with ruff format (#6268)

2024-01-05 15:35:07 +00:00

test_layer_writers_fail.py

Move tenant & timeline dir method to NeonPageserver and use them everywhere (#5262)

2023-09-15 11:17:18 +01:00

test_layers_from_future.py

test_runner: test_issue_5878 log allow list (#6259)

2024-01-03 14:22:17 +00:00

test_lfc_resize.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_local_file_cache.py

LFC fixes + statistics (#5727)

2023-11-23 08:59:19 +02:00

test_logging.py

tests: support for running on single pg version, use in one place (#6525)

2024-01-31 17:37:25 +02:00

test_logical_replication.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_lsn_mapping.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_migrations.py

Grant pg_monitor to neon_superuser (#6691)

2024-02-09 20:22:53 +00:00

test_multixact.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_neon_cli.py

tests: update for tenant generations (#5449)

2023-12-07 12:27:16 +00:00

test_neon_extension.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_neon_local_cli.py

fix(test suite): some tests leak child processes (#6497)

2024-01-26 18:23:53 +00:00

test_neon_superuser.py

Grant pg_monitor to neon_superuser (#6691)

2024-02-09 20:22:53 +00:00

test_next_xid.py

Fix calculation of maximal multixact in ingest_multixact_create_record (#6502)

2024-01-29 07:39:16 +02:00

test_normal_work.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_old_request_lsn.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_ondemand_download.py

fix(test_ondemand_download_timetravel): occasionally fails with WAL timeout during layer creation (#6818)

2024-02-20 14:09:15 +01:00

test_pageserver_api.py

test_runner: replace black with ruff format (#6268)

2024-01-05 15:35:07 +00:00

test_pageserver_catchup.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_pageserver_generations.py

storage controller: background reconcile, graceful shutdown, better logging (#6709)

2024-02-16 13:00:53 +00:00

test_pageserver_metric_collection.py

Use extend instead of groups of append calls in tests (#6109)

2023-12-12 18:00:37 +01:00

test_pageserver_reconnect.py

Implement lockless update of pageserver_connstring GUC in shared memory (#6314)

2024-01-23 07:55:05 +02:00

test_pageserver_restart.py

tests: add basic coverage for sharding (#6380)

2024-01-26 14:40:47 +00:00

test_pageserver_restarts_under_workload.py

pageserver: improve the shutdown log error (#5792)

2023-11-07 16:57:26 +00:00

test_pageserver_secondary.py

pageserver: remove heatmap file during tenant delete (#6806)

2024-02-19 14:01:36 +00:00

test_parallel_copy.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_pg_regress.py

tests: add basic coverage for sharding (#6380)

2024-01-26 14:40:47 +00:00

test_physical_replication.py

Track size of FSM fork while applying records at replica (#5901)

2023-12-05 18:49:24 +02:00

test_pitr_gc.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_proxy_allowed_ips.py

IP allowlist on the proxy side (#5906)

2023-11-30 13:14:33 +00:00

test_proxy_metric_collection.py

refactor(test_consumption_metrics): split for pageserver and proxy (#5324)

2023-09-16 18:05:35 +03:00

test_proxy_rate_limiter.py

Proxy control plane rate limiter (#5785)

2023-11-15 09:15:59 +00:00

test_proxy.py

http2 alpn (#6815)

2024-02-20 10:44:46 +00:00

test_read_trace.py

tests: enable multiple pageservers in neon_local and neon_fixture (#5231)

2023-09-08 16:19:57 +01:00

test_read_validation.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_readonly_node.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_recovery.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_remote_storage.py

tests: Remove obsolete allowlist entries

2024-02-11 01:34:31 +02:00

test_s3_restore.py

S3 restore test: Use a workaround to enable moto's self-copy support (#6594)

2024-02-02 23:45:57 +01:00

test_setup.py

python: more linting (#4734)

2023-07-18 12:56:40 +03:00

test_sharding_service.py

storage controller: debug observability endpoints and self-test (#6820)

2024-02-19 20:29:23 +00:00

test_sharding.py

storage controller: debug observability endpoints and self-test (#6820)

2024-02-19 20:29:23 +00:00

test_sni_router.py

tests: split neon_fixtures.py (#4871)

2023-08-03 17:20:24 +03:00

test_subxacts.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_tenant_conf.py

tests: allow-lists for occasional failures (#6074)

2023-12-08 17:32:16 +00:00

test_tenant_delete.py

Add test that runs the S3 scrubber (#6641)

2024-02-12 19:15:21 +01:00

test_tenant_detach.py

metrics: remove broken tenants (#6586)

2024-02-05 14:49:35 +02:00

test_tenant_relocation.py

tests: Remove obsolete allowlist entries

2024-02-11 01:34:31 +02:00

test_tenant_size.py

tests: use approximate equality in test_get_tenant_size_with_multiple_branches (#5411)

2023-09-29 09:15:43 +01:00

test_tenant_tasks.py

propagate lock guard to background deletion task (#4495)

2023-06-15 17:30:12 +03:00

test_tenants_with_remote_storage.py

tests: Remove obsolete allowlist entries

2024-02-11 01:34:31 +02:00

test_tenants.py

Add test for pageserver_directory_entries_count metric (#6767)

2024-02-16 14:53:36 +00:00

test_threshold_based_eviction.py

Use extend instead of groups of append calls in tests (#6109)

2023-12-12 18:00:37 +01:00

test_timeline_delete.py

test: shutdown endpoints before deletion (#6619)

2024-02-09 09:01:07 +00:00

test_timeline_size.py

tests: Remove unnecessary port config with VanillaPostgres class

2024-02-11 01:34:31 +02:00

test_truncate.py

python: more linting (#4734)

2023-07-18 12:56:40 +03:00

test_twophase.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_unlogged.py

Rename "Postgres nodes" in control_plane to endpoints.

2023-04-13 14:34:29 +03:00

test_vm_bits.py

tests: Remove "postgres is running on ... branch" messages

2024-02-11 01:34:31 +02:00

test_wal_acceptor_async.py

Make WAL segment init atomic.

2024-01-30 18:05:22 +04:00

test_wal_acceptor.py

tests: Remove obsolete allowlist entries

2024-02-11 01:34:31 +02:00

test_wal_receiver.py

Raise pageserver walreceiver timeouts.

2023-06-19 15:59:38 +04:00

test_wal_restore.py

Allow initdb preservation for broken tenants (#6790)

2024-02-19 17:27:02 +01:00

test_walredo_not_left_behind_on_detach.py

Move tenant & timeline dir method to NeonPageserver and use them everywhere (#5262)

2023-09-15 11:17:18 +01:00