make TenantState::{Loading,Attaching,Activating} owned by spawn_load / spawn_attach

See the Mermaid diagram in the doc comment for the now-possible state transitions.
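
For readers without the rendered diagram, the edges it lists can be written down as a small transition check. This is only an illustrative sketch: the `State` enum and `is_allowed` are hypothetical names, not part of this commit, and the real `TenantState` variants may carry data that is omitted here.

```rust
// Hypothetical, payload-free mirror of TenantState (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum State {
    Loading,
    Attaching,
    Activating,
    Active,
    Stopping,
    Broken,
}

/// Returns true if the state diagram has an edge `from --> to`.
/// `is_allowed` is a made-up helper, not a function in the pageserver.
pub fn is_allowed(from: State, to: State) -> bool {
    use State::*;
    matches!(
        (from, to),
        // spawn_load / spawn_attach finished successfully and call activate()
        (Loading, Activating)
            | (Attaching, Activating)
            // activation itself is infallible
            | (Activating, Active)
            // load() / attach() failure
            | (Loading, Broken)
            | (Attaching, Broken)
            // shutdown & detach
            | (Active, Stopping)
            // late error while removing the tenant from memory
            | (Stopping, Broken)
            // cfg(testing)-only tenant break point
            | (Active, Broken)
    )
}
```

Note that there is no `Activating --> Broken` edge: once activation starts, it always ends in `Active`.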

The two core insights / changes are:
- spawn_load and spawn_attach own the tenant state until they're done
- once load()/attach() calls are done
    - if they failed, transition the tenant to Broken directly (we know
      that there's no background activity because we didn't call activate yet)
    - if they succeeded, call activate(), which can now be infallible
      (the reason is explained at the end of this message).

- set_broken() and set_stopping() are changed to wait for spawn_load() /
  spawn_attach() to finish. This sounds scary because it might hinder
  detach or shutdown, but in practice concurrent attach+detach,
  attach+shutdown, and load+shutdown were simply racy before.
  With this change, they no longer are.
  If necessary, we can later add a CancellationToken stored in Tenant for
  load/attach and cancel it from set_stopping() or set_broken().

So why can activate() be infallible now? Because we declare that
spawn_load and spawn_attach own the tenant state until they're done,
and we enforce that ownership using the wait_for at the start of
set_stopping and set_broken.
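
The ownership rule can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the pageserver code: the `Tenant` struct, its fields, and the blocking thread-based `spawn_load` are hypothetical, and the `wait_for` mentioned above is replaced here by the standard library's `Condvar::wait_while`. Returning the state unchanged when the tenant is already Broken is also an assumption of this sketch.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical, reduced state set (Attaching behaves like Loading here).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum State {
    Loading,
    Activating,
    Active,
    Stopping,
    Broken,
}

// Hypothetical stand-in for Tenant: a state cell plus a change notification.
pub struct Tenant {
    state: Mutex<State>,
    changed: Condvar,
}

impl Tenant {
    pub fn new() -> Arc<Tenant> {
        Arc::new(Tenant {
            state: Mutex::new(State::Loading),
            changed: Condvar::new(),
        })
    }

    fn set(&self, new: State) {
        *self.state.lock().unwrap() = new;
        self.changed.notify_all();
    }

    /// Models set_stopping(): first wait until the load task has given up
    /// ownership (state is neither Loading nor Activating), then transition.
    pub fn set_stopping(&self) -> State {
        let guard = self.state.lock().unwrap();
        let mut guard = self
            .changed
            .wait_while(guard, |s| matches!(*s, State::Loading | State::Activating))
            .unwrap();
        if *guard == State::Active {
            *guard = State::Stopping;
        }
        *guard
    }
}

/// Models spawn_load(): the spawned task owns the state until it is done.
/// On failure it goes straight to Broken; on success activation cannot fail.
pub fn spawn_load(tenant: Arc<Tenant>, load_ok: bool) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        if load_ok {
            tenant.set(State::Activating);
            tenant.set(State::Active);
        } else {
            tenant.set(State::Broken);
        }
    })
}
```

Because `set_stopping` never observes Loading or Activating, the load task never has to handle a concurrent state change. In this model, that is what makes the activation step infallible.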
Christian Schwarz
2023-05-25 14:07:00 +02:00
parent f18d9f555b
commit de780d2e0f
7 changed files with 130 additions and 114 deletions

@@ -18,7 +18,29 @@ use crate::reltag::RelTag;
 use anyhow::bail;
 use bytes::{BufMut, Bytes, BytesMut};
-/// A state of a tenant in pageserver's memory.
+/// The state of a tenant in this pageserver.
+///
+/// ```mermaid
+/// stateDiagram-v2
+///
+/// [*] --> Loading: spawn_load()
+/// [*] --> Attaching: spawn_attach()
+///
+/// Loading --> Activating: activate()
+/// Attaching --> Activating: activate()
+/// Activating --> Active: infallible
+///
+/// Loading --> Broken: load() failure
+/// Attaching --> Broken: attach() failure
+///
+/// Active --> Stopping: set_stopping(), part of shutdown & detach
+/// Stopping --> Broken: late error in remove_tenant_from_memory
+///
+/// Broken --> [*]: ignore / detach / shutdown
+/// Stopping --> [*]: remove_from_memory complete
+///
+/// Active --> Broken: cfg(testing)-only tenant break point
+/// ```
 #[derive(
 Clone,
 PartialEq,
@@ -35,11 +57,11 @@ use bytes::{BufMut, Bytes, BytesMut};
 pub enum TenantState {
 /// This tenant is being loaded from local disk
 Loading,
-/// This tenant is being downloaded from cloud storage.
+/// This tenant is being attached to the pageserver.
 Attaching,
 /// The tenant is transitioning from Loading/Attaching to Active.
 Activating,
-/// Tenant is fully operational
+/// The tenant has finished activating and is open for business.
 Active,
 /// A tenant is recognized by pageserver, but it is being detached or the
 /// system is being shut down.