rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-28 18:40:38 +00:00

Author	SHA1	Message	Date
John Spray	ce1673a8c4	tests: improve stability of tests using `wait_for_upload_queue_empty` (#6856 ) ## Problem PR #6834 introduced an assertion that the sets of metric labels on finished operations should equal those on started operations, which is not true if no operations have finished yet for a particular set of labels. ## Summary of changes - Instead of asserting out, wait and re-check in the case that finished metrics don't match started	2024-02-21 16:00:17 +00:00
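The wait-and-re-check approach from the commit above can be sketched as a small polling helper. This is only an illustration, not the fixture code from the repository: `wait_until`, `metrics_match`, and the label sets are hypothetical names chosen for the example.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], attempts: int = 20, interval: float = 0.01) -> None:
    """Poll `predicate` until it returns True, or raise after `attempts` tries."""
    for _ in range(attempts):
        if predicate():
            return
        time.sleep(interval)
    raise TimeoutError(f"condition not met after {attempts} attempts")


# Hypothetical metric snapshots: label sets for started and finished operations.
started = {("upload", "layer"), ("upload", "index")}
finished = {("upload", "layer")}
polls = {"count": 0}


def metrics_match() -> bool:
    # Simulate the outstanding operation finishing on the third poll.
    polls["count"] += 1
    if polls["count"] == 3:
        finished.add(("upload", "index"))
    return finished == started


# Instead of asserting immediately, keep re-checking until the sets agree.
wait_until(metrics_match)
```

An immediate `assert finished == started` would fail on the first two polls here, which is the flakiness the commit describes.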
Arpad Müller	4de2f0f3e0	Implement a sharded time travel recovery endpoint (#6821 ) The sharding service didn't have support for S3 disaster recovery. This PR adds a new endpoint to the attachment service, which is slightly different from the endpoint on the pageserver, in that it takes the shard count history of the tenant as json parameters: we need to do time travel recovery for both the shard count at the target time and the shard count at the current moment in time, as well as the past shard counts that either still reference. Fixes #6604, part of https://github.com/neondatabase/cloud/issues/8233 --------- Co-authored-by: John Spray <john@neon.tech>	2024-02-21 16:35:37 +01:00
Christian Schwarz	e49602ecf5	feat(metrics): per-timeline metric for on-demand downloads, remove calls_started histogram (#6834 ) refs #6737 # Problem Before this PR, on-demand downloads weren't measured per tenant_id. This makes root-cause analysis of latency spikes harder, requiring us to resort to log scraping for ``` {neon_service="pageserver"} \|= `downloading on-demand` \|= `$tenant_id` ``` which can be expensive when zooming out in Grafana. Context: https://neondb.slack.com/archives/C033RQ5SPDH/p1707809037868189 # Solution / Changes - Remove the calls_started histogram - I did the diligence, there are only 2 dashboards using this histogram, and in fact only one uses it as a histogram, the other just as a counter. - [Link 1](`8115b54d9f/neonprod/dashboards/hkXNF7oVz/dashboard-Z31XmM24k.yaml (L1454)`): `Pageserver Thrashing` dashboard, linked from playbook, will fix. - [Link 2](`8115b54d9f/neonprod/dashboards/CEllzAO4z/dashboard-sJqfNFL4k.yaml (L599)`): one of my personal dashboards, unused for a long time, already broken in other ways, no need to fix. - replace `pageserver_remote_timeline_client_calls_unfinished` gauge with a counter pair - Required `Clone`-able `IntCounterPair`, made the necessary changes in the `libs/metrics` crate - fix tests to deal with the fallout A subsequent PR will remove a timeline-scoped metric to compensate. Note that we don't need additional global counters for the per-timeline counters affected by this PR; we can use the `remote_storage` histogram for those, which, conveniently, also includes the secondary-mode downloads, which aren't covered by the remote timeline client metrics (should they?).	2024-02-20 17:52:23 +01:00
John Spray	4f7704af24	storage controller: fix spurious reconciles after pageserver restarts (#6814 ) ## Problem When investigating test failures (https://github.com/neondatabase/neon/issues/6813) I noticed we were doing a bunch of Reconciler runs right after splitting a tenant. It's because the splitting test does a pageserver restart, and there was a bug in /re-attach handling, where we would update the generation correctly in the database and intent state, but not observed state, thereby triggering a reconciliation on the next call to maybe_reconcile. This didn't break anything profound (underlying rules about generations were respected), but caused the storage controller to do an un-needed extra round of bumping the generation and reconciling. ## Summary of changes - Start adding metrics to the storage controller - Assert on the number of reconciles done in test_sharding_split_smoke - Fix /re-attach to update `observed` such that we don't spuriously re-reconcile tenants.	2024-02-19 17:44:20 +00:00
Arpad Müller	e0c12faabd	Allow initdb preservation for broken tenants (#6790 ) Oftentimes the tenants we want to (WAL) DR are the ones that the pageserver marks as broken. Therefore, we should allow initdb preservation also for broken tenants. Fixes #6781.	2024-02-19 17:27:02 +01:00
Vlad Lazar	587cb705b8	pageserver: roll open layer in timeline writer (#6661 ) ## Problem One WAL record can actually produce an arbitrary amount of key value pairs. This is problematic since it might cause our frozen layers to bloat past the max allowed size of S3 single shot uploads. [#6639](https://github.com/neondatabase/neon/pull/6639) introduced a "should roll" check after every batch of `ingest_batch_size` (100 WAL records by default). This helps, but the original problem still exists. ## Summary of changes This patch moves the responsibility of rolling the currently open layer to the `TimelineWriter`. Previously, this was done ad-hoc via calls to `check_checkpoint_distance`. The advantages of this approach are: * ability to split one batch over multiple open layers * less layer map locking * remove ad-hoc check_checkpoint_distance calls More specifically, we track the current size of the open layer in the writer. On each `put` check whether the current layer should be closed and a new one opened. Keeping track of the currently open layer results in less contention on the layer map lock. It only needs to be acquired on the first write and on writes that require a roll afterwards. Rolling the open layer can be triggered by: 1. The distance from the last LSN we rolled at. This bounds the amount of WAL that the safekeepers need to store. 2. The size of the currently open layer. 3. The time since the last roll. It helps safekeepers to regard pageserver as caught up and suspend activity. Closes #6624	2024-02-19 12:34:27 +00:00
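The three roll triggers listed in the commit above can be sketched as a single check evaluated on each `put`. This is a simplified model, not the `TimelineWriter` code: the class names, field names, and thresholds below are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RollPolicy:
    # Hypothetical thresholds; the real configuration names may differ.
    checkpoint_distance: int   # max LSN distance since the last roll
    max_layer_size: int        # max bytes in the open layer
    checkpoint_timeout: float  # max seconds since the last roll


@dataclass
class OpenLayerState:
    last_roll_lsn: int
    current_size: int
    last_roll_time: float


def should_roll(state: OpenLayerState, policy: RollPolicy, lsn: int, now: float) -> Optional[str]:
    """Return the reason for rolling the open layer, or None to keep writing."""
    if lsn - state.last_roll_lsn >= policy.checkpoint_distance:
        return "lsn_distance"  # bounds the WAL that safekeepers must retain
    if state.current_size >= policy.max_layer_size:
        return "layer_size"    # keeps frozen layers under the upload size limit
    if now - state.last_roll_time >= policy.checkpoint_timeout:
        return "timeout"       # lets safekeepers treat the pageserver as caught up
    return None
```

Because the writer tracks `current_size` itself in this model, the check needs no lock on the layer map unless a roll is actually required.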
Arpad Müller	f7516df6c1	Pass timestamp as a datetime (#6656 ) This saves some repetition. I did this in #6533 for `tenant_time_travel_remote_storage` already.	2024-02-07 12:56:53 +01:00
Arpad Müller	0ac2606c8a	S3 restore test: Use a workaround to enable moto's self-copy support (#6594 ) While working on https://github.com/getmoto/moto/pull/7303 I discovered that if you enable bucket encryption, moto allows self-copies. So we can un-ignore the test. I tried it out locally, it works great. Followup of #6533, part of https://github.com/neondatabase/cloud/issues/8233	2024-02-02 23:45:57 +01:00
Joonas Koivunen	caf868e274	test: assert we eventually free space (#6536 ) in `test_statvfs_pressure_{usage,min_avail_bytes}` we now race against initial logical size calculation on-demand downloading the layers. first wait out the initial logical sizes, then change the final asserts to be "eventual", which is not great but it is faster than failing and retrying. this issue seems to happen only in debug mode tests. Fixes: #6510	2024-02-02 19:46:47 +02:00
Arpad Müller	48b05b7c50	Add a time_travel_remote_storage http endpoint (#6533 ) Adds an endpoint to the pageserver to S3-recover an entire tenant to a specific given timestamp. Required input parameters: * `travel_to`: the target timestamp to recover the S3 state to * `done_if_after`: a timestamp that marks the beginning of the recovery process. retries of the query should keep this value constant. it must be after `travel_to`, and also after any changes we want to revert, and must represent a point in time before the endpoint is being called, all of these time points in terms of the time source used by S3. these criteria need to hold even in the face of clock differences, so I recommend waiting a specific amount of time, then taking `done_if_after`, then waiting some amount of time again, and only then issuing the request. Also important to note: the timestamps in S3 work at second accuracy, so one needs to add generous waits before and after for the process to work smoothly (at least 2-3 seconds). We ignore the added test for the mocked S3 for now due to a limitation in moto: https://github.com/getmoto/moto/issues/7300 . Part of https://github.com/neondatabase/cloud/issues/8233	2024-02-02 14:52:12 +01:00
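The recommended ordering for choosing `done_if_after` (wait, take the timestamp, wait again, then issue the request) can be sketched as follows. This is a minimal illustration under assumptions: the helper name, the 3-second default, and the ISO 8601 encoding of the parameters are not taken from the commit, and the actual HTTP call is left out.

```python
import time
from datetime import datetime, timezone
from typing import Callable, Dict


def plan_recovery_timestamps(
    travel_to: datetime,
    wait_seconds: float = 3.0,  # assumed margin; S3 timestamps have second accuracy
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> Dict[str, str]:
    """Pick `done_if_after` with waits on both sides, then return the parameters."""
    sleep(wait_seconds)          # let any changes we want to revert settle
    done_if_after = now()        # marks the beginning of the recovery process
    if done_if_after <= travel_to:
        raise ValueError("done_if_after must be after travel_to")
    sleep(wait_seconds)          # ensure the request is issued strictly later
    # Retries must reuse this same done_if_after value.
    return {
        "travel_to": travel_to.isoformat(),
        "done_if_after": done_if_after.isoformat(),
    }
```

The `sleep` and `now` parameters are injectable only so that the ordering can be exercised without real waiting.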
Arpad Müller	47380be12d	Remove version param from get_lsn_by_timestamp (#6551 ) This removes the last remnants of the version param added by #5608 , concluding the transition plan laid out in https://github.com/neondatabase/cloud/pull/7553#discussion_r1370473911 . It follows PR https://github.com/neondatabase/cloud/pull/9202, which we now assume has been deployed to all environments. Full history: * https://github.com/neondatabase/neon/pull/5608 * https://github.com/neondatabase/cloud/pull/7553 * https://github.com/neondatabase/neon/pull/6178 * https://github.com/neondatabase/cloud/pull/9202	2024-01-31 15:30:19 +01:00
John Spray	c9b1657e4c	pageserver: fixes for creation operations overlapping with shutdown/startup (#6436 ) ## Problem For #6423, creating a reproducer turned out to be very easy, as an extension to test_ondemand_activation. However, before I had diagnosed the issue, I was starting with a more brute force approach of running creation API calls in the background while restarting a pageserver, and that shows up a bunch of other interesting issues. In this PR: - Add the reproducer for #6423 by extending `test_ondemand_activation` (confirmed that this test fails if I revert the fix from https://github.com/neondatabase/neon/pull/6430) - In timeline creation, return 503 responses when we get an error and the tenant's cancellation token is set: this covers the cases where we get an anyhow::Error from something during timeline creation as a result of shutdown. - While waiting for tenants to become active during creation, don't .map_err() the result to a 500: instead let the `From` impl map the result to something appropriate (this includes mapping shutdown to 503) - During tenant creation, we were calling `Tenant::load_local` because no Preload object is provided. This is usually harmless because the tenant dir is empty, but if there are some half-created timelines in there, bad things can happen. Propagate the SpawnMode into Tenant::attach, so that it can properly skip _any_ attempt to load timelines if creating. - When we call upsert_location, there's a SpawnMode that tells us whether to load from remote storage or not. But if the operation is a retry and we already have the tenant, it is not correct to skip loading from remote storage: there might be a timeline there. This isn't strictly a correctness issue as long as the caller behaves correctly (does not assume that any timelines are persistent until the creation is acked), but it's a more defensive position. - If we shut down while the task in Tenant::attach is running, it can end up spawning rogue tasks. Fix this by holding a GateGuard through here, and in upsert_location shutting down a tenant after calling tenant_spawn if we can't insert it into tenants_map. This fixes the expected behavior that after shutdown_all_tenants returns, no tenant tasks are running. - Add `test_create_churn_during_restart`, which runs tenant & timeline creations across pageserver restarts. - Update a couple of tests that covered cancellation, to reflect the cleaner errors we now return.	2024-01-25 12:35:52 +00:00
Arpad Müller	d820aa1d08	Disable initdb cancellation (#6451 ) ## Problem The initdb cancellation added in #5921 is not sufficient to reliably abort the entire initdb process. Initdb also spawns children. The tests added by #6310 (#6385) and #6436 now do initdb cancellations on a more regular basis. In #6385, I attempted to issue `killpg` (after giving it a new process group ID) to kill not just the initdb but all its spawned subprocesses, but this didn't work. Initdb doesn't take that long in the end either, so we just wait until it concludes. ## Summary of changes * revert initdb cancellation support added in #5921 * still return `Err(Cancelled)` upon cancellation, but this is just to not have to remove the cancellation infrastructure * fixes to the `test_tenant_delete_races_timeline_creation` test to make it reliably pass Fixes #6385	2024-01-24 13:06:05 +01:00
Christian Schwarz	996abc9563	pagebench-based GetPage@LSN performance test (#6214 )	2024-01-24 12:51:53 +01:00
Arpad Müller	faf275d4a2	Remove initdb on timeline delete (#6387 ) This PR: * makes `initdb.tar.zst` be deleted by default on timeline deletion (#6226), mirroring the safekeeper: https://github.com/neondatabase/neon/pull/6381 * adds a new `preserve_initdb_archive` endpoint for a timeline, to be used during the disaster recovery process, see reasoning [here](https://github.com/neondatabase/neon/issues/6226#issuecomment-1894574778) * makes the creation code look for `initdb-preserved.tar.zst` in addition to `initdb.tar.zst`. * makes the tests use the new endpoint fixes #6226	2024-01-23 18:22:59 +00:00
Arseny Sher	88df057531	Delete WAL segments from s3 when timeline is deleted. In the most straightforward way; safekeeper performs it in DELETE endpoint implementation, with no coordination between sks. delete_force endpoint in the code is renamed to delete as there is only one way to delete.	2024-01-19 20:11:24 +04:00
John Spray	b6ec11ad78	control_plane: generalize attachment_service to handle sharding (#6251 ) ## Problem To test sharding, we need something to control it. We could write python code for doing this from the test runner, but this wouldn't be usable with neon_local run directly, and when we want to write tests with large number of shards/tenants, Rust is a better fit efficiently handling all the required state. This service enables automated tests to easily get a system with sharding/HA without the test itself having to set this all up by hand: existing tests can be run against sharded tenants just by setting a shard count when creating the tenant. ## Summary of changes Attachment service was previously a map of TenantId->TenantState, where the principal state stored for each tenant was the generation and the last attached pageserver. This enabled it to serve the re-attach and validate requests that the pageserver requires. In this PR, the scope of the service is extended substantially to do overall management of tenants in the pageserver, including tenant/timeline creation, live migration, evacuation of offline pageservers etc. This is done using synchronous code to make declarative changes to the tenant's intended state (`TenantState.policy` and `TenantState.intent`), which are then translated into calls into the pageserver by the `Reconciler`. Top level summary of modules within `control_plane/attachment_service/src`: - `tenant_state`: structure that represents one tenant shard. - `service`: implements the main high level operations such as tenant/timeline creation, marking a node offline, etc. - `scheduler`: for operations that need to pick a pageserver for a tenant, construct a scheduler and call into it. - `compute_hook`: receive notifications when a tenant shard is attached somewhere new. Once we have locations for all the shards in a tenant, emit an update to postgres configuration via the neon_local `LocalEnv`. - `http`: HTTP stubs. These mostly map to methods on `Service`, but are separated for readability and so that it'll be easier to adapt if/when we switch to another RPC layer. - `node`: structure that describes a pageserver node. The most important attribute of a node is its availability: marking a node offline causes tenant shards to reschedule away from it. This PR is a precursor to implementing the full sharding service for prod (#6342). What's the difference between this and a production-ready controller for pageservers? - JSON file persistence to be replaced with a database - Limited observability. - No concurrency limits. Marking a pageserver offline will try and migrate every tenant to a new pageserver concurrently, even if there are thousands. - Very simple scheduler that only knows to pick the pageserver with fewest tenants, and place secondary locations on a different pageserver than attached locations: it does not try to place shards for the same tenant on different pageservers. This matters little in tests, because picking the least-used pageserver usually results in round-robin placement. - Scheduler state is rebuilt exhaustively for each operation that requires a scheduler. - Relies on neon_local mechanisms for updating postgres: in production this would be something that flows through the real control plane. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-01-17 18:01:08 +00:00
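The "very simple scheduler" described in the commit above (pick the pageserver with the fewest tenants, and keep a secondary off the attached node) can be modelled in a few lines. This is a sketch for intuition only; `pick_pageserver` and `place_shards` are hypothetical names, and the real scheduler is written in Rust.

```python
from typing import Dict, FrozenSet, List


def pick_pageserver(tenant_counts: Dict[int, int], exclude: FrozenSet[int] = frozenset()) -> int:
    """Return the node id with the fewest tenants, skipping excluded nodes."""
    candidates = {node: count for node, count in tenant_counts.items() if node not in exclude}
    if not candidates:
        raise ValueError("no eligible pageserver")
    # Break ties by node id so the choice is deterministic.
    return min(candidates, key=lambda node: (candidates[node], node))


def place_shards(tenant_counts: Dict[int, int], shard_count: int) -> List[int]:
    """Place shards one at a time, updating the counts after each placement."""
    counts = dict(tenant_counts)
    placement = []
    for _ in range(shard_count):
        node = pick_pageserver(counts)
        counts[node] += 1
        placement.append(node)
    return placement


# Four shards across two empty nodes end up alternating between them.
round_robin = place_shards({1: 0, 2: 0}, 4)
```

With evenly loaded nodes, always choosing the least-used one alternates between them, which is why the commit notes that placement is effectively round-robin in tests.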
Vlad Lazar	da7a7c867e	pageserver: do not bump priority of background task for timeline status requests (#6301 ) ## Problem Previously, `GET /v1/tenant/:tenant_id/timeline` and `GET /v1/tenant/:tenant_id/timeline/:timeline_id` would bump the priority of the background task which computes the initial logical size by cancelling the wait on the synchronisation semaphore. However, the request would still return an approximate logical size. It's undesirable to force background work for a status request. ## Summary of changes This PR updates the priority used by the timeline status request such that they don't do priority boosting by default anymore. An optional query parameter, `force-await-initial-logical-size`, is added for both mentioned endpoints. When set to true, it will skip the concurrency limiting semaphore and wait for the background task to complete before returning the exact logical size. In order to exercise this behaviour in a test I had to add an extra failpoint. If you think it's too intrusive, it can be removed. Also fixed a small bug where the cancellation of a download is reported as an opaque download failure upstream. This caused `test_location_conf_churn` to fail at teardown due to a WARN log line. Closes https://github.com/neondatabase/neon/issues/6168	2024-01-11 15:55:32 +00:00
Alexander Bayandin	7de829e475	test_runner: replace black with ruff format (#6268 ) ## Problem `black` is slow sometimes, we can replace it with `ruff format` (a new feature in 0.1.2 [0]), which produces a style pretty similar to black's [1]. On my local machine (MacBook M1 Pro 16GB): ``` # `black` on main $ hyperfine "BLACK_CACHE_DIR=/dev/null poetry run black ." Benchmark 1: BLACK_CACHE_DIR=/dev/null poetry run black . Time (mean ± σ): 3.131 s ± 0.090 s [User: 5.194 s, System: 0.859 s] Range (min … max): 3.047 s … 3.354 s 10 runs ``` ``` # `ruff format` on the current PR $ hyperfine "RUFF_NO_CACHE=true poetry run ruff format" Benchmark 1: RUFF_NO_CACHE=true poetry run ruff format Time (mean ± σ): 300.7 ms ± 50.2 ms [User: 259.5 ms, System: 76.1 ms] Range (min … max): 267.5 ms … 420.2 ms 10 runs ``` ## Summary of changes - Replace `black` with `ruff format` everywhere - [0] https://docs.astral.sh/ruff/formatter/ - [1] https://docs.astral.sh/ruff/formatter/#black-compatibility	2024-01-05 15:35:07 +00:00
John Spray	3c560d27a8	pageserver: implement secondary-mode downloads (#6123 ) Follows on from #6050 , in which we upload heatmaps. Secondary locations will now poll those heatmaps and download layers mentioned in the heatmap. TODO: - [X] ~Unify/reconcile stats for behind-schedule execution with warn_when_period_overrun (https://github.com/neondatabase/neon/pull/6050#discussion_r1426560695)~ - [x] Give downloads their own concurrency config independent of uploads Deferred optimizations: - https://github.com/neondatabase/neon/issues/6199 - https://github.com/neondatabase/neon/issues/6200 Eviction will be the next PR: - #5342	2024-01-05 12:29:20 +00:00
John Spray	673a865055	tests: tolerate 304 when evicting layers (#6261 ) In tests that evict layers, explicit eviction can race with automatic eviction of the same layer and result in a 304	2024-01-03 11:50:58 +00:00
Arpad Müller	a89d6dc76e	Always send a json response for timeline_get_lsn_by_timestamp (#6178 ) As part of the transition laid out in [this](https://github.com/neondatabase/cloud/pull/7553#discussion_r1370473911) comment, don't read the `version` query parameter in `timeline_get_lsn_by_timestamp`, but always return the structured json response. Follow-up of https://github.com/neondatabase/neon/pull/5608	2023-12-19 11:29:16 +01:00
John Spray	56f7d55ba7	pageserver: basic cancel/timeout for remote storage operations (#6097 ) ## Problem Various places in remote storage were not subject to a timeout (thereby stuck TCP connections could hold things up), and did not respect a cancellation token (so things like timeline deletion or tenant detach would have to wait arbitrarily long). ## Summary of changes - Add download_cancellable and upload_cancellable helpers, and use them in all the places we wait for remote storage operations (with the exception of initdb downloads, where it would not have been safe). - Add a cancellation token arg to `download_retry`. - Use cancellation token args in various places that were missing one per #5066 Closes: #5066 Why is this only "basic" handling? - Doesn't express difference between shutdown and errors in return types, to avoid refactoring all the places that use an anyhow::Error (these should all eventually return a more structured error type) - Implements timeouts on top of remote storage, rather than within it: this means that operations hitting their timeout will lose their semaphore permit and thereby go to the back of the queue for their retry. - Doing a nicer job is tracked in https://github.com/neondatabase/neon/issues/6096	2023-12-15 17:43:02 +00:00
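The idea of wrapping a remote storage operation so that it respects both a timeout and a cancellation signal, layered on top of the operation instead of inside it, can be sketched with `asyncio`. This is an analogy in Python, not the Rust `download_cancellable`/`upload_cancellable` helpers; `run_cancellable` and `Cancelled` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable


class Cancelled(Exception):
    """Raised when the cancellation event fires before the operation finishes."""


async def run_cancellable(op: Awaitable[Any], timeout: float, cancel: asyncio.Event) -> Any:
    """Await `op`, giving up on timeout or when `cancel` is set."""
    op_task = asyncio.ensure_future(op)
    cancel_task = asyncio.ensure_future(cancel.wait())
    done, _pending = await asyncio.wait(
        {op_task, cancel_task}, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
    )
    if op_task in done:
        cancel_task.cancel()
        return op_task.result()
    # The operation did not finish: stop it and report why.
    op_task.cancel()
    cancel_task.cancel()
    if cancel_task in done:
        raise Cancelled()
    raise asyncio.TimeoutError()


async def demo() -> tuple:
    cancel = asyncio.Event()
    ok = await run_cancellable(asyncio.sleep(0, result="done"), 1.0, cancel)
    try:
        await run_cancellable(asyncio.sleep(10), 0.01, cancel)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    cancel.set()  # e.g. timeline deletion or tenant detach
    try:
        await run_cancellable(asyncio.sleep(10), 1.0, cancel)
        cancelled = False
    except Cancelled:
        cancelled = True
    return ok, timed_out, cancelled
```

Unlike the commit's helpers, which do not yet distinguish shutdown from errors in their return types, this sketch raises a separate exception for each case; that is the kind of more structured error handling the commit defers to follow-up work.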
John Spray	c4e0ef507f	pageserver: heatmap uploads (#6050 ) Dependency (commits inline): https://github.com/neondatabase/neon/pull/5842 ## Problem Secondary mode tenants need a manifest of what to download. Ultimately this will be some kind of heat-scored set of layers, but as a robust first step we will simply use the set of resident layers: secondary tenant locations will aim to match the on-disk content of the attached location. ## Summary of changes - Add heatmap types representing the remote structure - Add hooks to Tenant/Timeline for generating these heatmaps - Create a new `HeatmapUploader` type that is external to `Tenant`, and responsible for walking the list of attached tenants and scheduling heatmap uploads. Notes to reviewers: - Putting the logic for uploads (and later, secondary mode downloads) outside of `Tenant` is an opinionated choice, motivated by: - Enable future smarter scheduling of operations, e.g. uploading the stalest tenant first, rather than having all tenants compete for a fair semaphore on a first-come-first-served basis. Similarly for downloads, we may wish to schedule the tenants with the hottest un-downloaded layers first. - Enable accessing upload-related state without synchronization (it belongs to HeatmapUploader, rather than being some Mutex<>'d part of Tenant) - Avoid further expanding the scope of Tenant/Timeline types, which are already among the largest in the codebase - You might reasonably wonder how much of the uploader code could be a generic job manager thing. Probably some of it: but let's defer pulling that out until we have at least two users (perhaps secondary downloads will be the second one) to highlight which bits are really generic. Compromises: - Later, instead of using digests of heatmaps to decide whether anything changed, I would prefer to avoid walking the layers in tenants that don't have changes: tracking that will be a bit invasive, as it needs input from both remote_timeline_client and Layer.	2023-12-14 13:09:24 +00:00
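The compromise mentioned in the commit above, using digests of heatmaps to decide whether anything changed, can be illustrated with a small tracker. This is a sketch under assumptions: the heatmap shape, the use of SHA-256 over canonical JSON, and the `UploadTracker` class are invented for the example and are not the `HeatmapUploader` implementation.

```python
import hashlib
import json
from typing import Any, Dict


def heatmap_digest(heatmap: Dict[str, Any]) -> str:
    """Digest of a canonical JSON encoding, so equal content gives equal digests."""
    encoded = json.dumps(heatmap, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class UploadTracker:
    """Remembers the last digest per tenant to skip redundant uploads."""

    def __init__(self) -> None:
        self._last_digest: Dict[str, str] = {}

    def needs_upload(self, tenant_id: str, heatmap: Dict[str, Any]) -> bool:
        digest = heatmap_digest(heatmap)
        if self._last_digest.get(tenant_id) == digest:
            return False  # nothing changed since the last upload
        self._last_digest[tenant_id] = digest
        return True
```

Note that this still requires generating the full heatmap for every tenant on each pass, which is exactly the cost the commit would prefer to avoid later.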
John Spray	6a922b1a75	tests: start adding tests for secondary mode, live migration (#5842 ) These tests have been loitering on a branch of mine for a while: they already provide value even without all the secondary mode bits landed yet, and the Workload helper is handy for other tests too. - `Workload` is a re-usable test workload that replaces some of the arbitrary "write a few rows" SQL that I've found myself repeating, and adds a systematic way to append data and check that reads properly reflect the changes. This append+validate stuff is important when doing migrations, as we want to detect situations where we might be reading from a pageserver that has not properly seen latest changes. - test_multi_attach is a validation of how the pageserver handles attaching the same tenant to multiple pageservers, from a safety point of view. This is intentionally separate from the larger testing of migration, to provide an isolated environment for multi-attachment. - test_location_conf_churn is a pseudo-random walk through the various states that TenantSlot can be put into, with validation that attached tenants remain externally readable when they should, and as a side effect validating that the compute endpoint's online configuration changes work as expected. - test_live_migration is the reference implementation of how to drive a pair of pageservers through a zero-downtime migration of a tenant. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-12-11 16:55:43 +00:00
John Spray	e89e41f8ba	tests: update for tenant generations (#5449 ) ## Problem Some existing tests are written in a way that's incompatible with tenant generations. ## Summary of changes Update all the tests that need updating: this is things like calling through the NeonPageserver.tenant_attach helper to get a generation number, instead of calling directly into the pageserver API. There are various more subtle cases.	2023-12-07 12:27:16 +00:00
John Spray	da5e03b0d8	pageserver: add a /reset API for tenants (#6014 ) ## Problem Traditionally we would detach/attach directly with curl if we wanted to "reboot" a single tenant. That's kind of inconvenient these days, because one needs to know a generation number to issue an attach request. Closes: https://github.com/neondatabase/neon/issues/6011 ## Summary of changes - Introduce a new `/reset` API, which remembers the LocationConf from the current attachment so that callers do not have to work out the correct configuration/generation to use. - As an additional support tool, allow an optional `drop_cache` query parameter, for situations where we are concerned that some on-disk state might be bad and want to clear that as well as the in-memory state. One might wonder why I didn't call this "reattach" -- it's because there's already a PS->CP API of that name and it could get confusing.	2023-12-05 15:38:27 +00:00
Arpad Müller	b71b8ecfc2	Add existing_initdb_timeline_id param to timeline creation (#5912 ) This PR adds an `existing_initdb_timeline_id` option to timeline creation APIs, taking an optional timeline ID. Follow-up of #5390. If the `existing_initdb_timeline_id` option is specified via the HTTP API, the pageserver downloads the existing initdb archive from the given timeline ID and extracts it, instead of running initdb itself. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-11-30 22:32:04 +01:00
John Spray	c48cc020bd	pageserver: fix race between deletion completion and incoming requests (#5941 ) ## Problem This is a narrow race that can leave a stuck Stopping tenant behind, while emitting a log error "Missing InProgress marker during tenant upsert, this is a bug" - Deletion request 1 puts tenant into Stopping state, and fires off background part of DeleteTenantFlow - Deletion request 2 acquires a SlotGuard for the same tenant ID, leaves a TenantSlot::InProgress in place while it checks if the tenant's state is acceptable. - DeleteTenantFlow finishes, calls TenantsMap::remove, which removes the InProgress marker. - Deletion request 2 calls SlotGuard::revert, which upserts the old value (the Tenant in Stopping state), and emits the telltale log message. Closes: #5936 ## Summary of changes - Add a regression test which uses pausable failpoints to reproduce this scenario. - TenantsMap::remove is only called by DeleteTenantFlow. Its behavior is tweaked to express the different possible states, especially `InProgress` which carries a barrier. - In DeleteTenantFlow, if we see such a barrier result from remove(), wait for the barrier and then try removing again. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-29 09:32:26 +00:00
Joonas Koivunen	105edc265c	fix: remove layer_removal_cs (#5108 ) Quest: https://github.com/neondatabase/neon/issues/4745. Follow-up to #4938. - add in locks for compaction and gc, so we don't have multiple executions at the same time in tests - remove layer_removal_cs - remove waiting for uploads in eviction/gc/compaction - #4938 will keep the file resident until upload completes Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-11-28 19:15:21 +02:00
Joonas Koivunen	831fad46d5	tests: fix allowed_error for compaction detecting a shutdown (#5919 ) This has been causing flaky tests, [example evidence]. Follow-up to #5883 where I forgot to fix this. [example evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5917/6981540065/index.html#suites/9d2450a537238135fd4007859e09aca7/6fd3556a879fa3d1	2023-11-24 16:14:32 +00:00
Joonas Koivunen	044375732a	test: support validating allowed_errors against a logfile (#5905 ) this will make it easier to test if an added allowed_error does in fact match for example against a log file from an allure report. ``` $ python3 test_runner/fixtures/pageserver/allowed_errors.py --help usage: allowed_errors.py [-h] [-i INPUT] check input against pageserver global allowed_errors optional arguments: -h, --help show this help message and exit -i INPUT, --input INPUT Pageserver logs file. Reads from stdin if no file is provided. ``` Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-11-24 12:43:25 +00:00
Christian Schwarz	a0e61145c8	fix: cleanup of layers from the future can race with their re-creation (#5890 ) fixes https://github.com/neondatabase/neon/issues/5878 obsoletes https://github.com/neondatabase/neon/issues/5879 Before this PR, it could happen that `load_layer_map` schedules removal of the future image layer. Then a later compaction run could re-create the same image layer, scheduling a PUT. Due to lack of an upload queue barrier, the PUT and DELETE could be re-ordered. The result was IndexPart referencing a non-existent object. ## Summary of changes * Add support to `pagectl` / Python tests to decode `IndexPart` * Rust * new `pagectl` Subcommand * `IndexPart::{from,to}_s3_bytes()` methods to internalize knowledge about encoding of `IndexPart` * Python * new `NeonCli` subclass * Add regression test * Rust * Ability to force repartitioning; required to ensure image layer creation at last_record_lsn * Python * The regression test. * Fix the issue * Insert an `UploadOp::Barrier` after scheduling the deletions.	2023-11-23 13:33:41 +00:00
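Why a barrier fixes the PUT/DELETE re-ordering in the commit above can be shown with a toy model of an upload queue. This is a simplification and not the real `UploadOp` semantics: here, operations within a batch may execute in any order, batches run strictly one after another, and `schedule_batches` is a hypothetical helper.

```python
from typing import List

BARRIER = "barrier"


def schedule_batches(ops: List[str]) -> List[List[str]]:
    """Split queued operations into batches separated by barriers.

    Operations inside one batch may be re-ordered or run concurrently;
    a later batch only starts once the previous batch has fully completed.
    """
    batches: List[List[str]] = [[]]
    for op in ops:
        if op == BARRIER:
            if batches[-1]:
                batches.append([])
        else:
            batches[-1].append(op)
    return [batch for batch in batches if batch]


# Without a barrier, the DELETE and the later PUT of the same object share a batch,
# so the DELETE could land after the PUT and leave the index pointing at nothing.
unsafe = schedule_batches(["DELETE layer-A", "PUT layer-A"])

# With a barrier after the deletions, the PUT can only start once the DELETE is done.
safe = schedule_batches(["DELETE layer-A", BARRIER, "PUT layer-A"])
```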
Arpad Müller	e310533ed3	Support JWT key reload in pageserver (#5594 ) ## Problem For quickly rotating JWT secrets, we want to be able to reload the JWT public key file in the pageserver, and also support multiple JWT keys. See #4897. ## Summary of changes * Allow directories for the `auth_validation_public_key_path` config param instead of just files. for the safekeepers, all of their config options also support multiple JWT keys. * For the pageservers, make the JWT public keys easily globally swappable by using the `arc-swap` crate. * Add an endpoint to the pageserver, triggered by a POST to `/v1/reload_auth_validation_keys`, that reloads the JWT public keys from the pre-configured path (for security reasons, you cannot upload any keys yourself). Fixes #4897 --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-07 15:43:29 +01:00
John Spray	e5c81fef86	tests: minor improvements (#5674 ) Minor changes from while I have been working on HA tests: - Manual pytest executions came with some warnings from `log.warn()` usage - When something fails in a generations-enabled test, it is useful to have a log from the attachment service of what attached when, and with which generation. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-10-31 11:44:35 +00:00
Arpad Müller	a673e4e7a9	Optionally return json from get_lsn_by_timestamp (#5608 ) This does two things: first a minor refactor to not use HTTP/1.x style header names and also to not panic if some certain requests had no "Accept" header. As a second thing, it addresses the third bullet point from #3689: > Change `get_lsn_by_timestamp` API method to return LSN even if we only found commit before the specified timestamp. This is done by adding a version parameter to the `get_lsn_by_timestamp` API call and making its behaviour depend on the version number. Part of #3414 (but doesn't address it in its entirety). --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-10-25 18:46:34 +02:00
Arpad Müller	f842b22b90	Add endpoint for querying time info for lsn (#5497 ) ## Problem See #5468. ## Summary of changes Add a new `get_timestamp_of_lsn` endpoint, returning the timestamp associated with the given lsn. Fixes #5468. --------- Co-authored-by: Shany Pozin <shany@neon.tech>	2023-10-19 04:50:49 +02:00
Joonas Koivunen	4772cd6c93	fix: deny branching, starting compute from not yet uploaded timelines (#5484 ) Part of #5172. First commits show that we used to allow starting up a compute or creating a branch off a not yet uploaded timeline. This PR moves activation of a timeline to happen after initial layer file(s) (if any) and `index_part.json` have been uploaded. Simply moving activation to be after downloads have finished works because we now spawn a task per http request handler. Current behaviour of uploading on the timelines on next startup is kept, to be removed later as part of #5172. Adds: - `NeonCli.map_branch` and corresponding `neon_local` implementation: allow creating computes for timelines managed via pageserver http client/api - possibly duplicate tests (I did not want to search for them, will clean up in a follow-up if these are duplicated) Changes: - make `wait_until_tenant_state` return immediately on `Broken` and not wait more	2023-10-09 17:03:38 +03:00
John Spray	ace0c775fc	pageserver: prefer 503 to 500 for transient unavailability (#5439 ) ## Problem The 500 status code should only be used for bugs or unrecoverable failures: situations we did not expect. Currently, the pageserver is misusing this response code for some situations that are totally normal, like requests targeting tenants that are in the process of activating. The 503 response is a convenient catch-all for "I can't right now, but I will be able to". ## Summary of changes - Change some transient availability error conditions to return 503 instead of 500 - Update the HTTP client configuration in integration tests to retry on 503 After these changes, things like creating a tenant and then trying to create a timeline within it will no longer require carefully checking its status first, or retrying on 500s. Instead, a client which is properly configured to retry on 503 can quietly handle such situations.	2023-10-03 17:00:55 +01:00
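The client-side behaviour this commit relies on, retrying on 503 while treating 500 as a real failure, can be sketched as follows. This is an illustration under stated assumptions rather than the test suite's actual HTTP client configuration: `request_with_retry` and the `send` callable are hypothetical, and the linear backoff is an arbitrary choice.

```python
import time
from typing import Callable, Tuple


def request_with_retry(
    send: Callable[[], Tuple[int, str]],
    attempts: int = 5,
    backoff_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a status other than 503 or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status != 503:
            # Success, client error, or a genuine 500: hand it back unchanged.
            return status, body
        if attempt + 1 < attempts:
            sleep(backoff_s * (attempt + 1))
    return status, body


# Usage with a fake server that is "activating" for the first two calls.
responses = iter([(503, "activating"), (503, "activating"), (201, "created")])
calls = []


def fake_send() -> Tuple[int, str]:
    calls.append(1)
    return next(responses)


result = request_with_retry(fake_send, sleep=lambda _seconds: None)
```

A 500 response is returned after a single call, so a bug in the server still surfaces immediately instead of being hidden behind retries.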
John Spray	ba92668e37	pageserver: deletion queue & generation validation for deletions (#5207 ) ## Problem Pageservers must not delete objects or advertise updates to remote_consistent_lsn without checking that they hold the latest generation for the tenant in question (see [the RFC]( https://github.com/neondatabase/neon/blob/main/docs/rfcs/025-generation-numbers.md)) In this PR: - A new "deletion queue" subsystem is introduced, through which deletions flow - `RemoteTimelineClient` is modified to send deletions through the deletion queue: - For GC & compaction, deletions flow through the full generation verifying process - For timeline deletions, deletions take a fast path that bypasses generation verification - The `last_uploaded_consistent_lsn` value in `UploadQueue` is replaced with a mechanism that maintains a "projected" lsn (equivalent to the previous property), and a "visible" LSN (which is the one that we may share with safekeepers). - Until `control_plane_api` is set, all deletions skip generation validation - Tests are introduced for the new functionality in `test_pageserver_generations.py` Once this lands, if a pageserver is configured with the `control_plane_api` configuration added in https://github.com/neondatabase/neon/pull/5163, it becomes safe to attach a tenant to multiple pageservers concurrently. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-09-26 16:11:55 +01:00
Joonas Koivunen	55371af711	test: workaround known bad mock_s3 ListObjectsV2 response (#5330 ) this should allow test test_delete_tenant_exercise_crash_safety_failpoints with debug-pg16-Check.RETRY_WITH_RESTART-mock_s3-tenant-delete-before-remove-timelines-dir-True to pass more reliably.	2023-09-18 09:24:53 +02:00
bojanserafimov	c0ed362790	Measure pageserver wal recovery time and fix flush() method (#5240 )	2023-09-11 09:46:06 -04:00
Joonas Koivunen	ff87fc569d	test: Remote storage refactorings (#5243 ) Remote storage cleanup split from #5198: - pageserver, extensions, and safekeepers now have their separate remote storage - RemoteStorageKind has the configuration code - S3Storage has the cleanup code - with MOCK_S3, pageserver, extensions, safekeepers use different buckets - with LOCAL_FS, `repo_dir / "local_fs_remote_storage" / $user` is used as path, where $user is `pageserver`, `safekeeper` - no more `NeonEnvBuilder.enable_xxx_remote_storage` but one `enable_{pageserver,extensions,safekeeper}_remote_storage` Should not have any real changes. These will allow us to default to `LOCAL_FS` for pageserver on the next PR, remove `RemoteStorageKind.NOOP`, work towards #5172. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-09-08 13:54:23 +03:00
John Spray	61d661a6c3	pageserver: generation number fetch on startup and use in /attach (#5163 ) ## Problem - #5050 Closes: https://github.com/neondatabase/neon/issues/5136 ## Summary of changes - A new configuration property `control_plane_api` controls other functionality in this PR: if it is unset (default) then everything still works as it does today. - If `control_plane_api` is set, then on startup we call out to control plane `/re-attach` endpoint to discover our attachments and their generations. If an attachment is missing from the response we implicitly detach the tenant. - Calls to pageserver `/attach` API may include a `generation` parameter. If `control_plane_api` is set, then this parameter is mandatory. - RemoteTimelineClient's loading of index_part.json is generation-aware, and will try to load the index_part with the most recent generation <= its own generation. - The `neon_local` testing environment now includes a new binary `attachment_service` which implements the endpoints that the pageserver requires to operate. This is on by default if running `cargo neon` by hand. In `test_runner/` tests, it is off by default: existing tests continue to run in the legacy generation-less mode. Caveats: - The re-attachment during startup assumes that we are only re-attaching tenants that have previously been attached, and not totally new tenants -- this relies on the control plane's attachment logic to keep retrying so that we should eventually see the attach API call. That's important because the `/re-attach` API doesn't tell us which timelines we should attach -- we still use local disk state for that. Ref: https://github.com/neondatabase/neon/issues/5173 - Testing: generations are only enabled for one integration test right now (test_pageserver_restart), as a smoke test that all the machinery basically works. Writing fuller tests that stress tenant migration will come later, and involve extending our test fixtures to deal with multiple pageservers. 
- I'm not in love with "attachment_service" as a name for the neon_local component, but it's not very important because we can easily rename these test bits whenever we want. - Limited observability when in re-attach on startup: when I add generation validation for deletions in a later PR, I want to wrap up the control plane API calls in some small client class that will expose metrics for things like errors calling the control plane API, which will act as a strong red signal that something is not right. Co-authored-by: Christian Schwarz <christian@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-06 14:44:48 +01:00
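The generation-aware selection rule described in this commit ("the most recent generation <= its own generation") can be sketched as a small pure function. The object-key layout used here, `index_part.json-<hex generation>`, is an assumption made for illustration, and `pick_index_key` is a hypothetical helper, not the `RemoteTimelineClient` code path.

```python
from typing import Iterable, Optional


def pick_index_key(keys: Iterable[str], my_generation: int) -> Optional[str]:
    """Return the index key with the highest generation not exceeding ours."""
    best = None  # (generation, key) of the best candidate seen so far
    for key in keys:
        base, sep, suffix = key.rpartition("-")
        if not sep or base != "index_part.json":
            continue  # not a generation-suffixed index object (assumed layout)
        try:
            generation = int(suffix, 16)
        except ValueError:
            continue  # suffix is not a hex generation number
        if generation <= my_generation and (best is None or generation > best[0]):
            best = (generation, key)
    return best[1] if best else None


keys = [
    "index_part.json-00000001",
    "index_part.json-00000003",
    "index_part.json-00000005",
    "layer-xyz",
]
chosen = pick_index_key(keys, my_generation=4)
```

Here a pageserver running with generation 4 ignores the index written by generation 5, since that object belongs to a newer attachment. Un-suffixed legacy `index_part.json` objects are ignored in this sketch.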
Joonas Koivunen	d1fcdf75b3	test: enhanced logging for curious mock_s3 (#5134 ) Possible flakiness with mock_s3. Add logging in hopes this will happen again. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-08-29 14:48:50 +03:00
Dmitry Rodionov	9140a950f4	Resume tenant deletion on attach (#5039 ) I'm still a bit nervous about attach -> crash case. But it should work. (unlike case with timeline). Ideally would be cool to cover this with test. This continues tradition of adding bool flags for Tenant::set_stopping. Probably lifecycle project will help with fixing it.	2023-08-20 12:28:50 +03:00
Dmitry Rodionov	64fc7eafcd	Increase timeout once again. (#5021 ) When failpoint is early in deletion process it takes longer to complete after failpoint is removed. Example was: https://neon-github-public-dev.s3.amazonaws.com/reports/main/5889544346/index.html#suites/3556ed71f2d69272a7014df6dcb02317/49826c68ce8492b1	2023-08-17 15:37:28 +03:00
Dmitry Rodionov	d8b0a298b7	Do not attach deleted tenants (#5008 ) Rather temporary solution before proper: https://github.com/neondatabase/neon/issues/5006 It requires more plumbing so let's not attach deleted tenants first and then implement resume. Additionally fix `assert_prefix_empty`. It had a buggy prefix calculation, and since we always asserted for absence of stuff it worked. Here I started to assert for presence of stuff too and it failed. Added more "presence" asserts to other places to be confident that it works. Resolves [#5016](https://github.com/neondatabase/neon/issues/5016)	2023-08-17 13:46:49 +03:00
John Spray	5c836ee5b4	tests: extend timeout in timeline deletion test (#4992 ) ## Problem This was set to 5 seconds, which was very close to how long a compaction took on my workstation, and when deletion is blocked on compaction the test would fail. We will fix this to make compactions drop out on deletion, but for the moment let's stabilize the test. ## Summary of changes Change timeout on timeline deletion in `test_timeline_deletion_with_files_stuck_in_upload_queue` from 5 seconds to 30 seconds.	2023-08-15 20:14:03 +03:00
Dmitry Rodionov	52c2c69351	fsync directory before mark file removal (#4986 ) ## Problem Deletions can possibly be reordered. Use fsync to avoid the case when mark file doesn't exist but other tenant/timeline files do. See added comments. resolves #4987	2023-08-15 19:24:23 +03:00

71 Commits