rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-07 13:32:57 +00:00

Author	SHA1	Message	Date
Joonas Koivunen	799db161d3	tests: support for running on single pg version, use in one place (#6525 ) Some tests which are unit test alike do not need to run on different pg versions. Logging test is one of them which I found for unrelated reasons. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-01-31 17:37:25 +02:00
Arpad Müller	47380be12d	Remove version param from get_lsn_by_timestamp (#6551 ) This removes the last remnants of the version param added by #5608 , concluding the transition plan laid out in https://github.com/neondatabase/cloud/pull/7553#discussion_r1370473911 . It follows PR https://github.com/neondatabase/cloud/pull/9202, which we now assume has been deployed to all environments. Full history: * https://github.com/neondatabase/neon/pull/5608 * https://github.com/neondatabase/cloud/pull/7553 * https://github.com/neondatabase/neon/pull/6178 * https://github.com/neondatabase/cloud/pull/9202	2024-01-31 15:30:19 +01:00
John Spray	4010adf653	control_plane/attachment_service: complete APIs (#6394 ) Depends on: https://github.com/neondatabase/neon/pull/6468 ## Problem The sharding service will be used as a "virtual pageserver" by the control plane -- so it needs the set of pageserver APIs that the control plane uses, and to present them under identical URLs, including prefix (/v1). ## Summary of changes - Add missing APIs: - Tenant deletion - Timeline deletion - Node list (used in test now, later in tools) - `/location_config` API (for migrating tenants into the sharding service) - Rework attachment service URLs: - `/v1` prefix is used for pageserver-compatible APIs - `/upcall/v1` prefix is used for APIs that are called by the pageserver (re-attach and validate) - `/debug/v1` prefix is used for endpoints that are for testing - `/control/v1` prefix is used for new sharding service APIs that do not mimic a pageserver API, such as registering and configuring nodes. - Add test_sharding_service. The sharding service already had some collateral coverage from its use in general tests, but this is the first dedicated testing for it.	2024-01-31 12:23:06 +00:00
Sasha Krassovsky	e8c9a51273	Allow creating subscriptions as neon_superuser (#6484 ) ## Problem We currently can't create subscriptions in PG14 and PG15 because only superusers can, and PG16 requires adding roles to pg_create_subscription. ## Summary of changes I added changes to PG14 and PG15 that allow neon_superuser to bypass the superuser requirement. For PG16, I didn't do that but added a migration that adds neon_superuser to pg_create_subscription. Also added a test to make sure it works.	2024-01-30 22:32:33 -08:00
Arseny Sher	bc684e9d3b	Make WAL segment init atomic. Since fdatasync is used for flushing WAL, changing file size is unsafe. Make segment creation atomic by using tmp file + rename to avoid using partially initialized segments. fixes https://github.com/neondatabase/neon/issues/6402	2024-01-30 18:05:22 +04:00
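The "tmp file + rename" approach described in the commit above can be sketched as follows. This is a minimal illustration using only the Rust standard library, not the safekeeper's actual code: the function name `create_segment_atomically`, the `.tmp` suffix, and zero-filling the segment are assumptions made for the example, and the directory fsync assumes a Unix-like system.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Create `final_path` as a zero-filled file of `size` bytes so that a reader
/// never observes a partially initialized file under the final name.
/// (Hypothetical helper; name and `.tmp` suffix are assumptions.)
fn create_segment_atomically(final_path: &Path, size: usize) -> io::Result<()> {
    let tmp_path = final_path.with_extension("tmp");
    {
        // Fully initialize the file under a temporary name first.
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(&vec![0u8; size])?;
        // Flush data and metadata (including the file size) before publishing.
        tmp.sync_all()?;
    }
    // rename() atomically replaces the destination on POSIX filesystems.
    fs::rename(&tmp_path, final_path)?;
    // Sync the parent directory so the rename itself survives a crash
    // (works on Unix-like systems; skipped for bare relative file names).
    if let Some(parent) = final_path.parent().filter(|p| !p.as_os_str().is_empty()) {
        File::open(parent)?.sync_all()?;
    }
    Ok(())
}
```

Because the final name only appears once the file has its full size, later writes can be flushed with a data-only sync without also having to persist a size change.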
Arthur Petukhovsky	2ff1a5cecd	Patch safekeeper control file on HTTP request (#6455 ) Closes #6397	2024-01-29 18:20:57 +00:00
Konstantin Knizhnik	c1148dc9ac	Fix calculation of maximal multixact in ingest_multixact_create_record (#6502 ) ## Problem See https://neondb.slack.com/archives/C06F5UJH601/p1706373716661439 ## Summary of changes Use None instead of 0 as initial accumulator value for calculating maximal multixact XID. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-29 07:39:16 +02:00
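One way to see why `None` is a safer seed than `0` is the sketch below. It assumes the maximum is taken with a wraparound-aware (modulo 2^32) comparison, in the style PostgreSQL uses for normal XIDs; the names `xid_precedes` and `max_xid` are hypothetical and this is not the pageserver's actual code.

```rust
/// Wraparound-aware "a is older than b" for 32-bit transaction IDs
/// (modulo-2^32 comparison; an assumption about how XIDs are ordered here).
fn xid_precedes(a: u32, b: u32) -> bool {
    (a.wrapping_sub(b) as i32) < 0
}

/// Newest XID in `xids`, or `None` if the list is empty.
/// Seeding the fold with `None` means no real XID is ever compared
/// against an artificial starting value such as 0.
fn max_xid(xids: &[u32]) -> Option<u32> {
    xids.iter().copied().fold(None, |acc, xid| match acc {
        // Keep the current maximum unless `xid` is strictly newer.
        Some(max) if !xid_precedes(max, xid) => Some(max),
        _ => Some(xid),
    })
}
```

Under this comparison, a seed of 0 would look newer than any XID above 2^31, so a fold starting from 0 could return 0 instead of a real member XID.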
Christian Schwarz	3a36a0a227	fix(test suite): some tests leak child processes (#6497 )	2024-01-26 18:23:53 +00:00
John Spray	58f6cb649e	control_plane: database persistence for attachment_service (#6468 ) ## Problem Spun off from https://github.com/neondatabase/neon/pull/6394 -- this PR is just the persistence parts and the changes that enable it to work nicely ## Summary of changes - Revert #6444 and #6450 - In neon_local, start a vanilla postgres instance for the attachment service to use. - Adopt `diesel` crate for database access in attachment service. This uses raw SQL migrations as the source of truth for the schema, so it's a soft dependency: we can switch libraries pretty easily. - Rewrite persistence.rs to use postgres (via diesel) instead of JSON. - Preserve JSON read+write at startup and shutdown: this enables using the JSON format in compatibility tests, so that we don't have to commit to our DB schema yet. - In neon_local, run database creation + migrations before starting attachment service - Run the initial reconciliation in Service::spawn in the background, so that the pageserver + attachment service don't get stuck waiting for each other to start, when restarting both together in a test.	2024-01-26 17:20:44 +00:00
John Spray	55b7cde665	tests: add basic coverage for sharding (#6380 ) ## Problem The support for sharding in the pageserver was written before https://github.com/neondatabase/neon/pull/6205 landed, so when it landed we couldn't directly test sharding. ## Summary of changes - Add `test_sharding_smoke` which tests the basics of creating a sharding tenant, creating a timeline within it, checking that data within it is distributed. - Add modes to pg_regress tests for running with 4 shards as well as with 1.	2024-01-26 14:40:47 +00:00
Christian Schwarz	918b03b3b0	integrate tokio-epoll-uring as alternative VirtualFile IO engine (#5824 )	2024-01-26 09:25:07 +01:00
Christian Schwarz	fd4cce9417	test_pageserver_max_throughput_getpage_at_latest_lsn: remove n_tenants=100 combination (#6477 ) Need to fix the neon_local timeouts first (https://github.com/neondatabase/neon/issues/6473) and also not run them on every merge, but only nightly: https://github.com/neondatabase/neon/issues/6476	2024-01-25 18:17:53 +00:00
Joonas Koivunen	463b6a26b5	test: show relative order eviction with "fast growing tenant" (#6377 ) Refactor out test_disk_usage_eviction tenant creation and add a custom case with 4 tenants, 3 made with pgbench scale=1 and 1 made with pgbench scale=4. Because the tenants are created in order of scales [1, 1, 1, 4] this is simple enough to demonstrate the problem with using absolute access times, because on a disk usage based eviction run we will disproportionally target the first scale=1 tenant(s), and the later larger tenant does not lose anything. This test is not enough to show the difference between `relative_equal` and `relative_spare` (the fudge factor); much larger scale will be needed for "the large tenant", but that will make debug mode tests slower. Cc: #5304	2024-01-25 15:38:28 +02:00
John Spray	c9b1657e4c	pageserver: fixes for creation operations overlapping with shutdown/startup (#6436 ) ## Problem For #6423, creating a reproducer turned out to be very easy, as an extension to test_ondemand_activation. However, before I had diagnosed the issue, I was starting with a more brute force approach of running creation API calls in the background while restarting a pageserver, and that shows up a bunch of other interesting issues. In this PR: - Add the reproducer for #6423 by extending `test_ondemand_activation` (confirmed that this test fails if I revert the fix from https://github.com/neondatabase/neon/pull/6430) - In timeline creation, return 503 responses when we get an error and the tenant's cancellation token is set: this covers the cases where we get an anyhow::Error from something during timeline creation as a result of shutdown. - While waiting for tenants to become active during creation, don't .map_err() the result to a 500: instead let the `From` impl map the result to something appropriate (this includes mapping shutdown to 503) - During tenant creation, we were calling `Tenant::load_local` because no Preload object is provided. This is usually harmless because the tenant dir is empty, but if there are some half-created timelines in there, bad things can happen. Propagate the SpawnMode into Tenant::attach, so that it can properly skip _any_ attempt to load timelines if creating. - When we call upsert_location, there's a SpawnMode that tells us whether to load from remote storage or not. But if the operation is a retry and we already have the tenant, it is not correct to skip loading from remote storage: there might be a timeline there. This isn't strictly a correctness issue as long as the caller behaves correctly (does not assume that any timelines are persistent until the creation is acked), but it's a more defensive position. - If we shut down while the task in Tenant::attach is running, it can end up spawning rogue tasks. 
Fix this by holding a GateGuard through here, and in upsert_location shutting down a tenant after calling tenant_spawn if we can't insert it into tenants_map. This fixes the expected behavior that after shutdown_all_tenants returns, no tenant tasks are running. - Add `test_create_churn_during_restart`, which runs tenant & timeline creations across pageserver restarts. - Update a couple of tests that covered cancellation, to reflect the cleaner errors we now return.	2024-01-25 12:35:52 +00:00
Arpad Müller	d820aa1d08	Disable initdb cancellation (#6451 ) ## Problem The initdb cancellation added in #5921 is not sufficient to reliably abort the entire initdb process. Initdb also spawns children. The tests added by #6310 (#6385) and #6436 now do initdb cancellations on a more regular basis. In #6385, I attempted to issue `killpg` (after giving it a new process group ID) to kill not just the initdb but all its spawned subprocesses, but this didn't work. Initdb doesn't take that long in the end either, so we just wait until it concludes. ## Summary of changes * revert initdb cancellation support added in #5921 * still return `Err(Cancelled)` upon cancellation, but this is just to not have to remove the cancellation infrastructure * fixes to the `test_tenant_delete_races_timeline_creation` test to make it reliably pass Fixes #6385	2024-01-24 13:06:05 +01:00
Christian Schwarz	996abc9563	pagebench-based GetPage@LSN performance test (#6214 )	2024-01-24 12:51:53 +01:00
Sasha Krassovsky	4f51824820	Fix creating publications for all tables	2024-01-23 22:41:00 -08:00
Arpad Müller	faf275d4a2	Remove initdb on timeline delete (#6387 ) This PR: * makes `initdb.tar.zst` be deleted by default on timeline deletion (#6226), mirroring the safekeeper: https://github.com/neondatabase/neon/pull/6381 * adds a new `preserve_initdb_archive` endpoint for a timeline, to be used during the disaster recovery process, see reasoning [here](https://github.com/neondatabase/neon/issues/6226#issuecomment-1894574778) * makes the creation code look for `initdb-preserved.tar.zst` in addition to `initdb.tar.zst`. * makes the tests use the new endpoint fixes #6226	2024-01-23 18:22:59 +00:00
Konstantin Knizhnik	00d9bf5b61	Implement lockless update of pageserver_connstring GUC in shared memory (#6314 ) ## Problem There is a "neon.pageserver_connstring" GUC with the PGC_SIGHUP option, which allows changing it using pg_reload_conf(). It is used by the control plane to update the pageserver connection string if the pageserver has crashed, has been relocated, or new shards are added. It is copied to shared memory because the config cannot be loaded during query execution and we need to reestablish the connection to the pageserver. ## Summary of changes Copying the connection string to shared memory is done by the postmaster. Other backends check an update counter to determine if the connection URL has changed and the connection needs to be reestablished. We cannot use standard Postgres LW-locks, because the postmaster has no proc entry and so cannot wait on this primitive. This is why a lockless access algorithm is implemented, using two atomic counters to enforce consistent reading of the connection string value from shared memory. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-23 07:55:05 +02:00
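The two-counter scheme described in the commit above resembles a seqlock. The sketch below illustrates that idea in Rust with a single writer; it is not the extension's C code, and the type `SharedConnString`, its field names, and the 256-byte capacity are all assumptions made for the example.

```rust
use std::sync::atomic::{fence, AtomicU64, AtomicU8, AtomicUsize, Ordering};

const MAX_LEN: usize = 256; // assumed capacity of the shared buffer

/// Hypothetical shared-memory slot guarded by two update counters.
struct SharedConnString {
    begin_update: AtomicU64, // bumped before the writer touches the data
    end_update: AtomicU64,   // bumped after the writer is done
    len: AtomicUsize,
    bytes: [AtomicU8; MAX_LEN],
}

impl SharedConnString {
    fn new() -> Self {
        Self {
            begin_update: AtomicU64::new(0),
            end_update: AtomicU64::new(0),
            len: AtomicUsize::new(0),
            bytes: std::array::from_fn(|_| AtomicU8::new(0)),
        }
    }

    /// Single writer only (the role the postmaster plays in the commit text).
    fn store(&self, value: &str) {
        assert!(value.len() <= MAX_LEN);
        self.begin_update.fetch_add(1, Ordering::SeqCst);
        fence(Ordering::SeqCst);
        for (slot, b) in self.bytes.iter().zip(value.bytes()) {
            slot.store(b, Ordering::Relaxed);
        }
        self.len.store(value.len(), Ordering::Relaxed);
        fence(Ordering::SeqCst);
        self.end_update.fetch_add(1, Ordering::SeqCst);
    }

    /// Readers never block: they retry until both counters agree, which
    /// means no update overlapped with the copy. Returns (counter, value).
    fn load(&self) -> (u64, String) {
        loop {
            let end = self.end_update.load(Ordering::SeqCst);
            fence(Ordering::SeqCst);
            let len = self.len.load(Ordering::Relaxed);
            let buf: Vec<u8> = self.bytes[..len]
                .iter()
                .map(|b| b.load(Ordering::Relaxed))
                .collect();
            fence(Ordering::SeqCst);
            let begin = self.begin_update.load(Ordering::SeqCst);
            if begin == end {
                // A consistent snapshot is exactly what the writer stored.
                return (end, String::from_utf8(buf).expect("writer stores valid UTF-8"));
            }
            std::hint::spin_loop();
        }
    }
}
```

A reader can remember the returned counter and reconnect only when a later `load` returns a different one, which matches the "check update counter" step in the commit message.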
Sasha Krassovsky	71f495c7f7	Gate it behind feature flags	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	0a7e050144	Fix test one last time	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	55bfa91bd7	Fix test again again	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	d90b2b99df	Fix test again	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	27587e155d	Fix test	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	b2e7249979	Sleep	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	3c3b53f8ad	Update test	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	394ef013d0	Push the migrations test	2024-01-22 14:53:29 -08:00
Sasha Krassovsky	3f90b2d337	Fix test_ddl_forwarding	2024-01-22 14:53:29 -08:00
Heikki Linnakangas	e4898a6e60	Don't pass InvalidTransactionId to update_next_xid. (#6410 ) update_next_xid() doesn't have any special treatment for the invalid or other special XIDs, so it will treat InvalidTransactionId (0) as a regular XID. If old nextXid is smaller than 2^31, 0 will look like a very old XID, and nothing happens. But if nextXid is greater than 2^31 0 will look like a very new XID, and update_next_xid() will incorrectly bump up nextXID.	2024-01-20 18:04:16 +02:00
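The effect described in the commit above can be reproduced with a small model. This is a simplified sketch, not the pageserver's `update_next_xid`: the function bodies, the `update_next_xid_checked` wrapper, and the clamping to the first normal XID are assumptions chosen to illustrate the modulo-2^32 comparison.

```rust
const INVALID_TRANSACTION_ID: u32 = 0;
const FIRST_NORMAL_TRANSACTION_ID: u32 = 3;

/// Wraparound-aware "a is the same as or newer than b" (modulo 2^32).
fn xid_follows_or_equals(a: u32, b: u32) -> bool {
    (a.wrapping_sub(b) as i32) >= 0
}

/// Simplified model: advance `next_xid` past `xid` if `xid` is not older.
/// Returns true if `next_xid` was changed. No special-casing of XID 0.
fn update_next_xid(next_xid: &mut u32, xid: u32) -> bool {
    if xid_follows_or_equals(xid, *next_xid) {
        let mut new_next = xid.wrapping_add(1);
        if new_next < FIRST_NORMAL_TRANSACTION_ID {
            new_next = FIRST_NORMAL_TRANSACTION_ID;
        }
        *next_xid = new_next;
        true
    } else {
        false
    }
}

/// Caller-side guard in the spirit of the fix: never pass the invalid XID.
fn update_next_xid_checked(next_xid: &mut u32, xid: u32) -> bool {
    if xid == INVALID_TRANSACTION_ID {
        return false;
    }
    update_next_xid(next_xid, xid)
}
```

With `next_xid` at 1000, passing 0 to the unguarded function is a no-op because 0 looks old; with `next_xid` at 3,000,000,000 (above 2^31), 0 looks new and the unguarded function moves `next_xid`, while the guarded one leaves it alone.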
Arseny Sher	88df057531	Delete WAL segments from s3 when timeline is deleted. In the most straightforward way; safekeeper performs it in DELETE endpoint implementation, with no coordination between sks. delete_force endpoint in the code is renamed to delete as there is only one way to delete.	2024-01-19 20:11:24 +04:00
Alexander Bayandin	c65ac37a6d	zenbenchmark: attach perf results to allure report (#6395 ) ## Problem For PRs with `run-benchmarks` label, we don't upload results to the db, making it harder to debug such tests. The only way to see some numbers is by examining GitHub Action output which is really inconvenient. This PR adds zenbenchmark metrics to Allure reports. ## Summary of changes - Create a json file with zenbenchmark results and attach it to allure report	2024-01-18 20:59:43 +00:00
Joonas Koivunen	a584e300d1	test: figure out the relative eviction order assertions (#6375 ) I just failed to see this earlier on #6136. layer counts are used as an abstraction, and each of the two tenants lose proportionally about the same amount of layers. sadly there is no difference in between `relative_spare` and `relative_equal` as both of these end up evicting the exact same amount of layers, but I'll try to add later another test for those. Cc: #5304	2024-01-18 12:39:45 +02:00
John Spray	b6ec11ad78	control_plane: generalize attachment_service to handle sharding (#6251 ) ## Problem To test sharding, we need something to control it. We could write python code for doing this from the test runner, but this wouldn't be usable with neon_local run directly, and when we want to write tests with large number of shards/tenants, Rust is a better fit efficiently handling all the required state. This service enables automated tests to easily get a system with sharding/HA without the test itself having to set this all up by hand: existing tests can be run against sharded tenants just by setting a shard count when creating the tenant. ## Summary of changes Attachment service was previously a map of TenantId->TenantState, where the principal state stored for each tenant was the generation and the last attached pageserver. This enabled it to serve the re-attach and validate requests that the pageserver requires. In this PR, the scope of the service is extended substantially to do overall management of tenants in the pageserver, including tenant/timeline creation, live migration, evacuation of offline pageservers etc. This is done using synchronous code to make declarative changes to the tenant's intended state (`TenantState.policy` and `TenantState.intent`), which are then translated into calls into the pageserver by the `Reconciler`. Top level summary of modules within `control_plane/attachment_service/src`: - `tenant_state`: structure that represents one tenant shard. - `service`: implements the main high level such as tenant/timeline creation, marking a node offline, etc. - `scheduler`: for operations that need to pick a pageserver for a tenant, construct a scheduler and call into it. - `compute_hook`: receive notifications when a tenant shard is attached somewhere new. Once we have locations for all the shards in a tenant, emit an update to postgres configuration via the neon_local `LocalEnv`. - `http`: HTTP stubs. 
These mostly map to methods on `Service`, but are separated for readability and so that it'll be easier to adapt if/when we switch to another RPC layer. - `node`: structure that describes a pageserver node. The most important attribute of a node is its availability: marking a node offline causes tenant shards to reschedule away from it. This PR is a precursor to implementing the full sharding service for prod (#6342). What's the difference between this and a production-ready controller for pageservers? - JSON file persistence to be replaced with a database - Limited observability. - No concurrency limits. Marking a pageserver offline will try and migrate every tenant to a new pageserver concurrently, even if there are thousands. - Very simple scheduler that only knows to pick the pageserver with fewest tenants, and place secondary locations on a different pageserver than attached locations: it does not try to place shards for the same tenant on different pageservers. This matters little in tests, because picking the least-used pageserver usually results in round-robin placement. - Scheduler state is rebuilt exhaustively for each operation that requires a scheduler. - Relies on neon_local mechanisms for updating postgres: in production this would be something that flows through the real control plane. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-01-17 18:01:08 +00:00
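The "pick the pageserver with fewest tenants, and place secondary locations on a different pageserver" policy from the commit above can be sketched in a few lines. This is an illustration only, not the attachment service's scheduler: `NodeId`, `schedule`, and the tie-break by lowest node ID are assumptions.

```rust
use std::collections::BTreeMap;

type NodeId = u64; // hypothetical node identifier type

/// Pick the node with the fewest tenant shards, skipping `exclude`
/// (for example, the node that already holds the attached location).
/// Ties are broken by lowest node ID; returns None if nothing is eligible.
fn schedule(tenant_counts: &BTreeMap<NodeId, usize>, exclude: &[NodeId]) -> Option<NodeId> {
    tenant_counts
        .iter()
        .filter(|(id, _)| !exclude.contains(*id))
        .min_by_key(|(id, count)| (**count, **id))
        .map(|(id, _)| *id)
}
```

A caller would first call `schedule(&counts, &[])` for the attached location and then `schedule(&counts, &[attached])` for the secondary. Nothing here stops two shards of the same tenant from landing on the same node, which is the limitation the commit message calls out.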
John Spray	bf4e708646	pageserver: eviction for secondary mode tenants (#6225 ) Follows #6123 Closes: https://github.com/neondatabase/neon/issues/5342 The approach here is to avoid using `Layer` from secondary tenants, and instead make the eviction types (e.g. `EvictionCandidate`) have a variant that carries a Layer for attached tenants, and a different variant for secondary tenants. Other changes: - EvictionCandidate no longer carries a `Timeline`: this was only used for providing a witness reference to remote timeline client. - The types for returning eviction candidates are all in disk_usage_eviction_task.rs now, whereas some of them were in timeline.rs before. - The EvictionCandidate type replaces LocalLayerInfoForDiskUsageEviction type, which was basically the same thing.	2024-01-16 10:29:26 +00:00
Anna Khanova	3f2187eb92	Proxy relax sni check (#6323 ) ## Problem Using the same domain name () for serverless driver can help with connection caching. https://github.com/neondatabase/neon/issues/6290 ## Summary of changes Relax SNI check.	2024-01-16 08:42:13 +00:00
Arpad Müller	60ced06586	Fix timeline creation and tenant deletion race (#6310 ) Fixes the race condition between timeline creation and tenant deletion outlined in #6255. Related: #5914, which is a similar race condition about the uninit marker file. Fixes #6255	2024-01-13 09:15:58 +01:00
Christian Schwarz	42613d4c30	refactor(NeonEnv): shutdown of child processes (#6327 ) Also shuts down `Broker`, which, before this PR, we did start in `start()` but relied on the fixture to stop. Do it a bit earlier so that, after `NeonEnv.stop()` returns, there are no child processes using `repo_dir`. Also, drive-by-fixes inverted logic around `ps_assert_metric_no_errors`, missed during https://github.com/neondatabase/neon/pull/6295 --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-01-12 10:23:21 +01:00
Christian Schwarz	087526b81b	neon_local init: add `--force` mode that allows an empty dir (#6328 ) Need this in https://github.com/neondatabase/neon/pull/6214 refs https://github.com/neondatabase/neon/issues/5771	2024-01-11 18:11:44 +00:00
Vlad Lazar	da7a7c867e	pageserver: do not bump priority of background task for timeline status requests (#6301 ) ## Problem Previously, `GET /v1/tenant/:tenant_id/timeline` and `GET /v1/tenant/:tenant_id/timeline/:timeline_id` would bump the priority of the background task which computes the initial logical size by cancelling the wait on the synchronisation semaphore. However, the request would still return an approximate logical size. It's undesirable to force background work for a status request. ## Summary of changes This PR updates the priority used by the timeline status request such that they don't do priority boosting by default anymore. An optional query parameter, `force-await-initial-logical-size`, is added for both mentioned endpoints. When set to true, it will skip the concurrency limiting semaphore and wait for the background task to complete before returning the exact logical size. In order to exercise this behaviour in a test I had to add an extra failpoint. If you think it's too intrusive, it can be removed. Also fixed a small bug where the cancellation of a download is reported as an opaque download failure upstream. This caused `test_location_conf_churn` to fail at teardown due to a WARN log line. Closes https://github.com/neondatabase/neon/issues/6168	2024-01-11 15:55:32 +00:00
John Spray	4b9b4c2c36	pageserver: cleanup redundant create/attach code, fix detach while attaching (#6277 ) ## Problem The code for tenant create and tenant attach was just a special case of what upsert_location does. ## Summary of changes - Use `upsert_location` for create and attach APIs - Clean up error handling in upsert_location so that it can generate appropriate HTTP response codes - Update tests that asserted the old non-idempotent behavior of attach - Rework the `test_ignore_while_attaching` test, and fix tenant shutdown during activation, which this test was supposed to cover, but it was actually just waiting for activation to complete.	2024-01-09 10:37:54 +00:00
Christian Schwarz	90e0219b29	python tests: support overlayfs for NeonEnvBuilder.from_repo_dir (#6295 ) Part of #5771 Extracted from https://github.com/neondatabase/neon/pull/6214 This PR makes the test suite sensitive to the new env var `NEON_ENV_BUILDER_FROM_REPO_DIR_USE_OVERLAYFS`. If it is set, `NeonEnvBuilder.from_repo_dir` uses overlayfs to duplicate the snapshot repo dir contents. Since mounting requires root privileges, we use sudo to perform the mounts. That, and macOS support, is also why copytree remains the default. If we ever run on a filesystem with copy reflink support, we should consider that as an alternative. This PR can be tried on a Linux machine on the `test_backward_compatiblity` test, which uses `from_repo_dir`.	2024-01-09 10:15:46 +00:00
John Spray	b3a681d121	s3_scrubber: updates for sharding (#6281 ) This is a lightweight change to keep the scrubber providing sensible output when using sharding. - The timeline count was wrong when using sharding - When checking for tenant existence, we didn't re-use results between different shards in the same tenant Closes: https://github.com/neondatabase/neon/issues/5929	2024-01-08 09:19:10 +00:00
Alexander Bayandin	7de829e475	test_runner: replace black with ruff format (#6268 ) ## Problem `black` is slow sometimes, we can replace it with `ruff format` (a new feature in 0.1.2 [0]), which produces pretty similar to black style [1]. On my local machine (MacBook M1 Pro 16GB): ``` # `black` on main $ hyperfine "BLACK_CACHE_DIR=/dev/null poetry run black ." Benchmark 1: BLACK_CACHE_DIR=/dev/null poetry run black . Time (mean ± σ): 3.131 s ± 0.090 s [User: 5.194 s, System: 0.859 s] Range (min … max): 3.047 s … 3.354 s 10 runs ``` ``` # `ruff format` on the current PR $ hyperfine "RUFF_NO_CACHE=true poetry run ruff format" Benchmark 1: RUFF_NO_CACHE=true poetry run ruff format Time (mean ± σ): 300.7 ms ± 50.2 ms [User: 259.5 ms, System: 76.1 ms] Range (min … max): 267.5 ms … 420.2 ms 10 runs ``` ## Summary of changes - Replace `black` with `ruff format` everywhere - [0] https://docs.astral.sh/ruff/formatter/ - [1] https://docs.astral.sh/ruff/formatter/#black-compatibility	2024-01-05 15:35:07 +00:00
John Spray	3c560d27a8	pageserver: implement secondary-mode downloads (#6123 ) Follows on from #6050 , in which we upload heatmaps. Secondary locations will now poll those heatmaps and download layers mentioned in the heatmap. TODO: - [X] ~Unify/reconcile stats for behind-schedule execution with warn_when_period_overrun (https://github.com/neondatabase/neon/pull/6050#discussion_r1426560695)~ - [x] Give downloads their own concurrency config independent of uploads Deferred optimizations: - https://github.com/neondatabase/neon/issues/6199 - https://github.com/neondatabase/neon/issues/6200 Eviction will be the next PR: - #5342	2024-01-05 12:29:20 +00:00
Arthur Petukhovsky	f3b5db1443	Add API for safekeeper timeline copy (#6091 ) Implement API for cloning a single timeline inside a safekeeper. Also add API for calculating a sha256 hash of WAL, which is used in tests. `/copy` API works by copying objects inside S3 for all but the last segments, and the last segments are copied on-disk. A special temporary directory is created for a timeline, because copy can take a lot of time, especially for large timelines. After all file segments have been prepared, this directory is mounted to the main tree and the timeline is loaded to memory. Some caveats: - large timelines can take a lot of time to copy, because we need to copy many S3 segments - the caller should wait indefinitely for the HTTP call to finish and not close the HTTP connection, because closing it stops the process, which is not continued in the background - `until_lsn` must be a valid LSN, otherwise bad things can happen - API will return 200 if the specified `timeline_id` already exists, even if it's not a copy - each safekeeper will try to copy S3 segments, so it's better not to call this API in parallel on different safekeepers	2024-01-04 17:40:38 +00:00
John Spray	edc962f1d7	test_runner: test_issue_5878 log allow list (#6259 ) ## Problem https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6254/7388706419/index.html#suites/5a4b8734277a9878cb429b80c314f470/e54c4f6f6ed22672 ## Summary of changes Permit the log message: because the test helper's detach function increments the generation number, a detach/attach cycle can cause the error if the test runner node is slow enough for the opportunistic deletion queue flush on detach not to complete by the time we call attach.	2024-01-03 14:22:17 +00:00
Arseny Sher	65b4e6e7d6	Remove empty safekeeper init since truncateLsn. It has caveats such as creating half empty segment which can't be offloaded. Instead we'll pursue approach of pull_timeline, seeding new state from some peer.	2024-01-03 18:20:19 +04:00
John Spray	673a865055	tests: tolerate 304 when evicting layers (#6261 ) In tests that evict layers, explicit eviction can race with automatic eviction of the same layer and result in a 304	2024-01-03 11:50:58 +00:00
Arseny Sher	aaaa39d9f5	Add large insertion and slow WAL sending to test_hot_standby. To exercise MAX_SEND_SIZE sending from safekeeper; we've had a bug with WAL records torn across several XLogData messages. Add failpoint to safekeeper to slow down sending. Also check for corrupted WAL complaints in the standby log. Make the test a bit simpler in passing, e.g. we don't need explicit commits as autocommit is enabled by default. https://neondb.slack.com/archives/C05L7D1JAUS/p1703774799114719 https://github.com/neondatabase/cloud/issues/9057	2024-01-02 10:50:20 +04:00
Arseny Sher	e79a19339c	Add failpoint support to safekeeper. Just a copy paste from pageserver.	2024-01-02 10:50:20 +04:00

1017 Commits