rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-08 06:30:37 +00:00

Author	SHA1	Message	Date
Konstantin Knizhnik	ffdf7df2ea	Update pageserver/src/tenant/timeline.rs Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	74ab232afb	Update pageserver/src/tenant/timeline.rs Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	2cf02b381c	Update pageserver/src/keyspace.rs Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	7d4ebf8485	Update pageserver/src/keyspace.rs Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	9b9b125d13	Make clippy happy	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	43187715d6	Add KeySpaceRandomAccum	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	2af45505b8	test_runner/performance/test_gc_feedback.py	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	8b05a87f75	Add test that no redundant images are generated if they are wanted by GC	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	3bc4a7c1e2	Add test that no redundant images are generated if they are wanted by GC	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	3d0a51567f	Fix KeySpace.add_range	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	7e6dbc32d1	Update pageserver/src/tenant/timeline.rs Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	af75d59b4c	Update pageserver/src/tenant/timeline.rs Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	f838a11514	Update pageserver/src/keyspace.rs Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	f0fe03ea80	Make clippy happy	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	be22be7b24	Make clippy happy	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	451479305e	Use KeySpace for passing information about wanted image layers from GC to compaction task	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	5e690307fb	Avoid redundant generation of wanted image layers if such a layer already exists beyond GC cutoff horizon	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	3e6288d7d8	Update pageserver/src/tenant/timeline.rs Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	2d015a1464	Update pageserver/src/tenant/timeline.rs Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	fcb9bac847	Revert changes in key space partitioning	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	7a3d6531b8	Revert "fix KeySpace initialization in bench_layer_map.rs" This reverts commit 63b1fcb813ca5f40a2b1328d4cb6e21646fba69f.	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	3275305a30	Revert "Split keyspace in partitions without holes" This reverts commit 02c0e9082f804ccf201fe1cf07eb167b697ea9a3.	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	0deca452bf	Add comments	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	e8066631a6	Update pageserver/src/tenant/timeline.rs Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	e069c409ef	Update pageserver/src/tenant/timeline.rs Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	7f81d57d52	Update pageserver/src/tenant/timeline.rs Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	787c4a8bbb	Update pageserver/src/tenant/timeline.rs Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	6ec9922184	Make clippy happy	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	9b418a71ac	fix KeySpace initialization in bench_layer_map.rs	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	1bb8ca0806	Split keyspace in partitions without holes	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	a1c8e74fb9	Add test for GC of stairs layers	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	f9999c84d9	Rebase with main	2023-05-16 21:18:20 +03:00
Konstantin Knizhnik	c01c31d045	Add comment explaining wanted_image_layers	2023-05-16 21:18:19 +03:00
Konstantin Knizhnik	4da24ba34f	Pass set of wanted image layers from GC to compaction	2023-05-16 21:18:19 +03:00
Joonas Koivunen	4a76f2b8d6	upload new timeline index part json before 201 or on retry (#4204) Wait for the upload to complete before returning 201 Created on `branch_timeline` or when `bootstrap_timeline` happens. Should either of those waits fail, await the uploads again on the retried request. This should work as expected, assuming control-plane does not start to use timeline creation as a wait_for_upload mechanism. Fixes #3865, started from https://github.com/neondatabase/neon/pull/3857/files#r1144468177 Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-05-15 14:16:43 +03:00
Shany Pozin	9cd6f2ceeb	Remove duplicated logic in creating TenantConfOpt (#4230) ## Describe your changes Remove duplicated logic in creating TenantConfOpt in both TryFrom of TenantConfigRequest and TenantCreateRequest	2023-05-15 10:08:44 +03:00
Heikki Linnakangas	2855c73990	Fix race condition after attaching tenant with branches. (#4170) After tenant attach, there is a window where the child timeline is loaded and accepts GetPage requests, but its parent is not. If a GetPage request needs to traverse to the parent, it needs to wait for the parent timeline to become active, or it might miss some records on the parent timeline. It's also possible that the parent timeline is active, but it hasn't yet received all the WAL up to the branch point from the safekeeper. This happens if a pageserver crashes soon after creating a timeline, so that the WAL leading to the branch point has not yet been uploaded to remote storage. After restart, the WAL will be re-streamed and ingested from the safekeeper, but that takes a while. Because of that, it's not enough to check that the parent timeline is active; we also need to wait for the WAL to arrive on the parent timeline, just like at the beginning of GetPage handling. We probably should change the behavior at create_timeline so that a timeline can only be created after all the WAL up to the branch point has been uploaded to remote storage, but that's not currently the case and out of scope for this PR (see GitHub issue #4218). @NanoBjorn encountered this while working on tenant migration. After migrating a tenant with a parent and child branch, connecting to the child branch failed with an error like: ``` FATAL: "base/16385" is not a valid data directory DETAIL: File "base/16385/PG_VERSION" is missing. ``` This commit adds two tests that reproduce the bug, with slightly different symptoms.	2023-05-13 10:44:11 +03:00
Christian Schwarz	edcf4d61a4	distinguish imitated from real size::gather_input calls in metrics (#4224) Before this PR, the gather_inputs() calls made to imitate synthetic size calculation accesses were accounted towards the real logical size calculation metric. This PR forces all callers to declare the cause for making logical size calculations, making explicit which cause counts towards which metric. This is a follow-up to ``` commit `1d266a6365` Author: Christian Schwarz <christian@neon.tech> Date: Thu May 11 16:09:29 2023 +0200 logical size calculation metrics: differentiate regular vs imitated (#4197) ``` After merging this patch, I hope to be able to explain why we have ca. 30x more "logical size" ops in prod than "imitate logical size" for any given observation interval. refs https://github.com/neondatabase/neon/issues/4154	2023-05-12 17:57:33 +00:00
Christian Schwarz	a2a9c598be	add counter metric that increases whenever a background loop overruns its period (#4223) We already have the warn!() log line for this condition. This PR adds a corresponding metric on which we can have a dedicated alert. This is cheaper and more reliable than alerting on the logs, because we run into log rate limits from time to time these days. refs https://github.com/neondatabase/neon/issues/4222	2023-05-12 19:00:06 +03:00
Christian Schwarz	5869234290	logical size calculation: spawn with in_current_span (#4196) While investigating https://github.com/neondatabase/neon/issues/4154 I found that the `Calculating logical size for timeline` tracing events created from within the logical size computation code are not always attributable to the background task that caused them. My goal is to be able to distinguish in the logs whether a `Calculating logical size for timeline` was logged as part of a real synthetic size calculation VS an imitation by the eviction task. I want this distinction so I can prove my assumption that the disk IO peaks which we see every 24h on prod are due to eviction's imitate synthetic size calculations. The alternative here, which I would have preferred but which is more work: link RequestContexts into a child->parent list and dump this list when we log `Calculating logical size for timeline`. I would have preferred that over what we have in this PR because, technically, the on-demand logical size computation can outlive the caller that spawned it. This is against the idea of correctly nested spans. I guess in OpenTelemetry land, the correct modelling would be a link between the caller's span and the task_mgr task's span. Anyway, I think the case where we hang up on the spawned on-demand logical size calculation is quite rare. So, I'm willing to tolerate incorrectly nested spans for these edge cases. refs https://github.com/neondatabase/neon/issues/4154	2023-05-12 15:36:30 +02:00
Christian Schwarz	845e296562	eviction: add global histogram for iteration durations (#4212) I would like to know whether and by how much the eviction iterations spike in the $period-sized window that happens every $threshold, when all the timelines do the imitate accesses. refs https://github.com/neondatabase/neon/issues/4154	2023-05-11 18:02:19 +03:00
Christian Schwarz	1d266a6365	logical size calculation metrics: differentiate regular vs imitated (#4197) I want this distinction so I can prove my assumption that the disk IO peaks which we see every 24h on prod are due to eviction's imitate synthetic size calculations. refs https://github.com/neondatabase/neon/issues/4154	2023-05-11 17:09:29 +03:00
Christian Schwarz	80522a1b9d	replace has_in_progress_downloads with new attachment_status field (#4168) Control Plane currently [^1] polls for `has_in_progress_downloads == false` after /attach to determine that an attach operation succeeded. As pointed out in the OpenAPI spec as of neon#4151, polling for `has_in_progress_downloads` is incorrect. This patch changes the situation by - removing `has_in_progress_downloads` - adding a new field `attachment_status` - changing instructions for `/attach` to poll for `attachment_status == attached`. This makes the instructions in `/attach` actionable for Control Plane. NB that we don't expose the TenantState in the OpenAPI docs, even though we expose it in the endpoint. That is deliberate, because we don't want to commit to a fixed set of tenant states forever. Hence the separate `attachment_status` field, which exposes the bare minimum required to make /attach + subsequent polling 100% safe wrt split brain. It would have been nice to report failures explicitly, but the problem is that we lose that state when we restart. So, we return `attached` upon attach failure. The tenant is Broken in that case, so Control Plane's subsequent health check will fail, and Control Plane can then roll back the relocation operation. NB: the reliance on the subsequent health check is no change to what we had before this patch! NB: we can always add additional TenantAttachmentStatus variants in the future to communicate failure. This PR also moves the attach-marker file's creation to the API handler's synchronous part. That was done to avoid the need to distinguish * `Attaching but marker not yet written => AttachmentStatus::Maybe` from * `Attaching, marker written, but attach failed for other reason => AttachmentStatus::Attached` Coincidentally, this also adds more transactionality to the /attach API because we only return 202 once we've written the marker file.
But, in the end, it doesn't affect how the control plane interacts with us or how it needs to do retries. So, we don't mention any of this in the API docs. [^1]: The one-click tenant relocation PR cloud#4740, currently WIP, is the first real user.	2023-05-11 16:53:46 +03:00
Joonas Koivunen	ecced13d90	try: higher page_service timeouts to isolate an issue (#4206) See #4205.	2023-05-11 16:14:42 +03:00
Dmitry Rodionov	eb3a8be933	keep track of timeline deletion status in IndexPart to prevent timeline resurrection (#3919) Before this patch, the following sequence would lead to the resurrection of a deleted timeline: - create timeline - wait for its index part to reach s3 - delete timeline - wait an arbitrary amount of time, including 0 seconds - detach tenant - attach tenant - the timeline is there and Active again. This happens because we only kept track of the deletion in the tenant dir (by deleting the timeline dir) but not in S3. The solution is to turn the deleted timeline's IndexPart into a tombstone. The deletion status of the timeline is expressed in the `deleted_at: Option<NaiveDateTime>` field of IndexPart. It's `None` while the timeline is alive and `Some(deletion time stamp)` if it is deleted. We change the timeline deletion handler to upload this tombstoned IndexPart. The handler does not return success if the upload fails. Coincidentally, this fixes the long-standing TODO about `std::fs::remove_dir_all` not being atomic. It need not be atomic anymore because we set `deleted_at=Some()` before starting the `remove_dir_all`. The tombstone is in the IndexPart only, not in the `metadata`. So, we only have the tombstone and the `remove_dir_all` benefits mentioned above if remote storage is configured. This was a conscious trade-off because there's no good format evolution story for the current metadata file format. The introduction of this additional step into `delete_timeline` was painful because delete_timeline needs to be 1. cancel-safe 2. idempotent 3. safe to call concurrently. These are mostly self-inflicted limitations that can be avoided by using request-coalescing. PR https://github.com/neondatabase/neon/pull/4159 will do that.
fixes https://github.com/neondatabase/neon/issues/3560 refs https://github.com/neondatabase/neon/issues/3889 (part of tenant relocation) Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-05-10 10:27:12 +02:00
Christian Schwarz	3ec52088dd	eviction_task: tracing::instrument the imitate-access calls (#4180) Currently, if we unexpectedly download from the eviction task, the log lines look like what we have in https://github.com/neondatabase/neon/issues/4154 ``` 2023-05-04T14:42:57.586772Z WARN eviction_task{tenant_id=$TENANT timeline_id=$TIMELINE}:eviction_iteration{policy_kind="LayerAccessThreshold"}: unexpectedly on-demand downloading remote layer remote $TIMELINE/000000067F000032AC0000400C00FFFFFFFF-000000067F000032AC000040140000000008__0000000001696070-0000000003DC76E9 for task kind Eviction ``` We know these are caused by the imitate accesses, but we don't know which one (my bet is on update_gc_info). I didn't want to pollute the other tasks' logs with the additional spans, so I'm using `.instrument()` when we call non-eviction-task code. refs https://github.com/neondatabase/neon/issues/4154	2023-05-09 18:16:22 +02:00
Christian Schwarz	411c71b486	document current tenant attach API semantics (#4151) We currently return 202 as soon as the tenant is allocated in memory, before we've written out the marker file. So, the /attach API currently does not have a transactional character. For example, it can happen that we respond with a 202 and then crash before writing out the marker file. In such a case, it is important that the client 1. observes the lost attach (by polling tenant status and observing 404) 2. and consequently retries the attach. It has to repeat this loop until it observes the tenant as "Active" in the tenant status. If the client doesn't follow this protocol and instead goes to another pageserver to attach the tenant, we risk a split-brain situation where both the first and second pageserver write to the tenant's S3 state. The improved description highlights the consequences of this behavior for clients that use the /attach endpoint. The tenant relocation that is currently being implemented in cloud#4740 implements retries of Attach and it does poll afterwards, but it polls `has_in_progress_downloads`. That is incorrect, as described in the patch body. The motivation for this write-up is that, in a future PR, we'll extend the /attach endpoint with an option to provide the tenant config. If we decide to leave the non-transactional behavior of /attach unmodified, we will be able to avoid persisting the tenant config. Conversely, if we decide that the /attach API should become transactional, we'll need to persist the tenant config in the attach-marker-file before acknowledging receipt of the /attach operation. refs https://github.com/neondatabase/cloud/pull/4740 refs https://github.com/neondatabase/neon/issues/2238 refs https://github.com/neondatabase/neon/issues/1555	2023-05-05 19:32:41 +03:00
Christian Schwarz	88f39c11d4	refactor: the code that builds TenantConfOpt from mgmt API requests (#4152) - extract code that builds TenantConfOpt from requests into a From<> impl - move map_err(ApiError::BadRequest) into callers	2023-05-04 18:10:40 +03:00
Christian Schwarz	7dd9553bbb	eviction: regression test + distinguish layer write from map insert (#4005) This patch adds a regression test for the threshold-based layer eviction. The test asserts the basic invariant that, if left alone, the residence statuses will stabilize, with some layers resident and some layers evicted. Thereby, we cover both last-access-time-threshold-based eviction and the "imitate access" hacks that we put in recently. The aggressive `period` and `threshold` values revealed a subtle bug, which is also fixed in this patch. The symptom was that, without the Rust changes of this patch, there would be occasional test failures due to `WARN... unexpectedly downloading` log messages. These log messages were caused by the "imitate access" calls of the eviction task. But the whole point of the "imitate access" hack was to prevent eviction of the layers that we access there. After some digging, I found the root cause, which is the following race condition: 1. Compact: Write out an L1 layer from several L0 layers. This records residence event `LayerCreate` with the current timestamp. 2. Eviction: imitate access logical size calculation. This accesses the L0 layers because the L1 layer is not yet in the layer map. 3. Compact: Grab layer map lock, add the new L1 to layer map and remove the L0s, release layer map lock. 4. Eviction: observes the new L1 layer whose only activity timestamp is the `LayerCreate` event. The L1 layer had no chance of being accessed until after (3). So, if enough time passes between (1) and (3), then (4) will observe a layer with `now-last_activity > threshold` and evict it. The fix is to require the first `record_residence_event` to happen while we already hold the layer map lock. The API requires a ref to a `BatchedUpdates` as a witness that we are inside a layer map lock. That is not fool-proof, e.g., new call sites for `insert_historic` could just completely forget to record the residence event.
It would be nice to prevent this at the type level. In the meantime, we have rate-limited log messages to warn us if such an implementation error sneaks in in the future. fixes https://github.com/neondatabase/neon/issues/3593 fixes https://github.com/neondatabase/neon/issues/3942 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-04 16:16:48 +02:00
Christian Schwarz	f9839a0dd9	import_basebackup_from_tar: don't load local layers twice (#4111) PR #4104 removed these bits as part of a revert of a larger change. follow-up to https://github.com/neondatabase/neon/pull/4104#discussion_r1180444952 --- Let's not merge this before the release.	2023-05-04 09:23:49 +02:00
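
Several commits in this log (43187715d6 "Add KeySpaceRandomAccum", 3d0a51567f "Fix KeySpace.add_range", 451479305e) pass the set of image layers wanted by GC to the compaction task as a KeySpace. The log does not show the implementation, so the following is only a minimal sketch of the general idea, assuming keys can be modelled as `u64`: ranges are collected in arbitrary order, then sorted and merged into disjoint half-open ranges. `RandomAccum` and `into_ranges` are hypothetical names, not neon's API.

```rust
use std::ops::Range;

/// Hypothetical accumulator: keys are modelled as `u64` and ranges are half-open.
/// Ranges may be added in any order and may overlap.
#[derive(Default)]
struct RandomAccum {
    ranges: Vec<Range<u64>>,
}

impl RandomAccum {
    fn add_range(&mut self, range: Range<u64>) {
        // Empty ranges contain no keys, so they are dropped.
        if range.start < range.end {
            self.ranges.push(range);
        }
    }

    /// Sort by start key and merge overlapping or touching ranges,
    /// producing a sorted list of disjoint ranges.
    fn into_ranges(mut self) -> Vec<Range<u64>> {
        self.ranges.sort_by_key(|r| r.start);
        let mut merged: Vec<Range<u64>> = Vec::new();
        for r in self.ranges {
            if let Some(last) = merged.last_mut() {
                if r.start <= last.end {
                    // Overlaps or touches the previous range: extend it.
                    last.end = last.end.max(r.end);
                    continue;
                }
            }
            merged.push(r);
        }
        merged
    }
}

fn main() {
    let mut acc = RandomAccum::default();
    acc.add_range(20..30);
    acc.add_range(0..10);
    acc.add_range(5..12);
    acc.add_range(30..35);
    println!("{:?}", acc.into_ranges()); // [0..12, 20..35]
}
```

Merging touching ranges such as `20..30` and `30..35` is safe because half-open ranges that share an endpoint cover one contiguous run of keys.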

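Commit eb3a8be933 (#3919) turns a deleted timeline's IndexPart into a tombstone via its `deleted_at` field and uploads it before local files are removed. The sketch below illustrates that ordering argument under simplifying assumptions: remote storage is a hypothetical in-memory `HashMap`, and `deleted_at` holds a plain `u64` timestamp instead of the date-time value used in the real IndexPart. `delete_timeline` and `timelines_to_load` here are illustrative stand-ins, not the pageserver's real functions.

```rust
use std::collections::HashMap;

/// Simplified stand-in for IndexPart with only the deletion field.
/// `deleted_at` is a hypothetical Unix timestamp; `None` means the timeline is alive.
#[derive(Debug)]
struct IndexPart {
    deleted_at: Option<u64>,
}

/// Hypothetical in-memory "remote storage": timeline id -> uploaded IndexPart.
type RemoteStorage = HashMap<&'static str, IndexPart>;

/// Tombstone the timeline in remote storage. In the flow the commit describes, local
/// files are removed only after this upload succeeds, so a crash part-way through
/// `remove_dir_all` leaves the tombstone in place. A retry keeps the first timestamp.
fn delete_timeline(remote: &mut RemoteStorage, timeline: &str, now: u64) -> Result<(), String> {
    let part = remote
        .get_mut(timeline)
        .ok_or_else(|| format!("timeline {timeline} not found"))?;
    if part.deleted_at.is_none() {
        part.deleted_at = Some(now);
    }
    Ok(())
}

/// On attach, only timelines without a tombstone are loaded.
fn timelines_to_load(remote: &RemoteStorage) -> Vec<&'static str> {
    let mut live: Vec<&'static str> = remote
        .iter()
        .filter(|(_, part)| part.deleted_at.is_none())
        .map(|(id, _)| *id)
        .collect();
    live.sort();
    live
}

fn main() {
    let mut remote = RemoteStorage::new();
    remote.insert("main", IndexPart { deleted_at: None });
    remote.insert("branch", IndexPart { deleted_at: None });

    delete_timeline(&mut remote, "branch", 100).unwrap();
    // An idempotent retry does not overwrite the original deletion time.
    delete_timeline(&mut remote, "branch", 200).unwrap();
    assert_eq!(remote["branch"].deleted_at, Some(100));

    // Detach + attach: the tombstoned timeline is not resurrected.
    assert_eq!(timelines_to_load(&remote), vec!["main"]);
}
```

The sketch deliberately ignores the cancel-safety and concurrency requirements the commit lists; it only shows why a remote tombstone prevents resurrection after detach and attach.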

1326 Commits
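
Commit 7dd9553bbb (#4005) fixes the eviction race by requiring a reference to a `BatchedUpdates` as a witness that the caller holds the layer map lock when the first residence event is recorded. The following is a minimal sketch of that witness-type pattern, not neon's code: `LayerMap`, `Layer`, `batch_update` and all method bodies are hypothetical simplifications, and the witness here simply wraps a `std::sync::MutexGuard`.

```rust
use std::sync::{Mutex, MutexGuard};
use std::time::SystemTime;

/// Hypothetical, heavily simplified layer map: just a list of layer names.
#[derive(Default)]
struct LayerMap {
    layers: Vec<String>,
}

/// Witness type: it wraps the MutexGuard, so holding one implies holding the lock.
/// In real code the field would be private to its module, leaving `batch_update`
/// as the only way to obtain a value.
struct BatchedUpdates<'a> {
    guard: MutexGuard<'a, LayerMap>,
}

/// Acquire the layer map lock and start a batch of updates.
fn batch_update(map: &Mutex<LayerMap>) -> BatchedUpdates<'_> {
    BatchedUpdates { guard: map.lock().unwrap() }
}

impl BatchedUpdates<'_> {
    fn insert_historic(&mut self, name: &str) {
        self.guard.layers.push(name.to_string());
    }
}

struct Layer {
    name: String,
    last_activity: Option<SystemTime>,
}

impl Layer {
    /// The `&BatchedUpdates` argument is unused at runtime; it only proves to the
    /// compiler that the caller currently holds the layer map lock.
    fn record_residence_event(&mut self, _witness: &BatchedUpdates<'_>) {
        self.last_activity = Some(SystemTime::now());
    }
}

fn main() {
    let map = Mutex::new(LayerMap::default());
    let mut layer = Layer { name: "L1".to_string(), last_activity: None };
    {
        // Insert the layer and record its first residence event in one critical section.
        let mut updates = batch_update(&map);
        updates.insert_historic(&layer.name);
        layer.record_residence_event(&updates);
    } // guard dropped here, lock released
    assert!(layer.last_activity.is_some());
    assert!(map.try_lock().is_ok());
}
```

As the commit itself notes, this is not fool-proof: a new call site can still forget to record the event at all. The witness only guarantees that, whenever the event is recorded, the lock is held.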