From 788b5362a1373a54555d6a7ce45c4b66f50eae1f Mon Sep 17 00:00:00 2001 From: Yingwen Date: Thu, 2 Feb 2023 11:28:56 +0800 Subject: [PATCH] docs: Add procedure framework RFC (#836) * docs: Add procedure framework RFC * docs: Add dump, rollback and locking to procedure framework * docs: Change ProcedureBuilder to ProcedureLoader * docs: Add sub-procedures section * docs: Add a link to explain idempotent * docs: Add link to the tracking issue * docs: Fix ProcedureLoader type alias * docs: Update procedure API * docs: Address CR comments * docs: Update path and make the docs more clear --- docs/rfcs/2023-01-03-procedure-framework.md | 153 ++++++++++++++++++++ 1 file changed, 153 insertions(+) create mode 100644 docs/rfcs/2023-01-03-procedure-framework.md diff --git a/docs/rfcs/2023-01-03-procedure-framework.md b/docs/rfcs/2023-01-03-procedure-framework.md new file mode 100644 index 0000000000..a428c6ab31 --- /dev/null +++ b/docs/rfcs/2023-01-03-procedure-framework.md @@ -0,0 +1,153 @@ +--- +Feature Name: "procedure-framework" +Tracking Issue: https://github.com/GreptimeTeam/greptimedb/issues/286 +Date: 2023-01-03 +Author: "Yingwen " +--- + +Procedure Framework +---------------------- + +# Summary +A framework for executing operations in a fault-tolerant manner. + +# Motivation +Some operations in GreptimeDB require multiple steps to implement. For example, creating a table needs: +1. Check whether the table exists +2. Create the table in the table engine + 1. Create a region for the table in the storage engine + 2. Persist the metadata of the table to the table manifest +3. Add the table to the catalog manager + +If the node dies or restarts in the middle of creating a table, it could leave the system in an inconsistent state. The procedure framework, inspired by [Apache HBase's ProcedureV2 framework](https://github.com/apache/hbase/blob/bfc9fc9605de638785435e404430a9408b99a8d0/src/main/asciidoc/_chapters/pv2.adoc) and [Apache Accumulo’s FATE framework](https://accumulo.apache.org/docs/2.x/administration/fate), aims to provide a unified way to implement multi-step operations that is tolerant to failure. + +# Details +## Overview +The procedure framework consists of the following primary components: +- A `Procedure` represents an operation or a set of operations to be performed step-by-step +- `ProcedureManager`, the runtime to run `Procedures`. It executes the submitted procedures, stores procedures' states to the `ProcedureStore` and restores procedures from `ProcedureStore` while the database restarts. +- `ProcedureStore` is a storage layer for persisting the procedure state + + +## Procedures +The `ProcedureManager` keeps calling `Procedure::execute()` until the Procedure is done, so the operation of the Procedure should be [idempotent](https://developer.mozilla.org/en-US/docs/Glossary/Idempotent): it needs to be able to undo or replay a partial execution of itself. + +```rust +trait Procedure { + fn execute(&mut self, ctx: &Context) -> Result; + + fn dump(&self) -> Result; + + fn rollback(&self) -> Result<()>; + + // other methods... +} +``` + +The `Status` is an enum that has the following variants: +```rust +enum Status { + Executing { + persist: bool, + }, + Suspended { + subprocedures: Vec, + persist: bool, + }, + Done, +} +``` + +A call to `execute()` can result in the following possibilities: +- `Ok(Status::Done)`: we are done +- `Ok(Status::Executing { .. })`: there are remaining steps to do +- `Ok(Status::Suspend { sub_procedure, .. })`: execution is suspended and can be resumed later after the sub-procedure is done. +- `Err(e)`: error occurs during execution and the procedure is unable to proceed anymore. + +Users need to assign a unique `ProcedureId` to the procedure and the procedure can get this id via the `Context`. The `ProcedureId` is typically a UUID. + +```rust +struct Context { + id: ProcedureId, + // other fields ... +} +``` + +The `ProcedureManager` calls `Procedure::dump()` to serialize the internal state of the procedure and writes to the `ProcedureStore`. The `Status` has a field `persist` to tell the `ProcedureManager` whether it needs persistence. + +## Sub-procedures +A procedure may need to create some sub-procedures to process its subtasks. For example, creating a distributed table with multiple regions (partitions) needs to set up the regions in each node, thus the parent procedure should instantiate a sub-procedure for each region. The `ProcedureManager` makes sure that the parent procedure does not proceed till all sub-procedures are successfully finished. + +The procedure can submit sub-procedures to the `ProcedureManager` by returning `Status::Suspended`. It needs to assign a procedure id to each procedure manually so it can track the status of the sub-procedures. +```rust +struct ProcedureWithId { + id: ProcedureId, + procedure: BoxedProcedure, +} +``` + +## ProcedureStore +We might need to provide two different ProcedureStore implementations: +- In standalone mode, it stores data on the local disk. +- In distributed mode, it stores data on the meta server or the object store service. + +These implementations should share the same storage structure. They store each procedure's state in a unique path based on the procedure id: + +``` +Sample paths: + +/procedures/{PROCEDURE_ID}/000001.step +/procedures/{PROCEDURE_ID}/000002.step +/procedures/{PROCEDURE_ID}/000003.commit +``` + +`ProcedureStore` behaves like a WAL. Before performing each step, the `ProcedureManager` can write the procedure's current state to the ProcedureStore, which stores the state in the `.step` file. The `000001` in the path is a monotonic increasing sequence of the step. After the procedure is done, the `ProcedureManager` puts a `.commit` file to indicate the procedure is finished (committed). + +The `ProcedureManager` can remove the procedure's files once the procedure is done, but it needs to leave the `.commit` as the last file to remove in case of failure during removal. + +## ProcedureManager +`ProcedureManager` executes procedures submitted to it. + +```rust +trait ProcedureManager { + fn register_loader(&self, name: &str, loader: BoxedProcedureLoader) -> Result<()>; + + async fn submit(&self, procedure: ProcedureWithId) -> Result<()>; +} +``` + +It supports the following operations: +- Register a `ProcedureLoader` by the type name of the `Procedure`. +- Submit a `Procedure` to the manager and execute it. + +When `ProcedureManager` starts, it loads procedures from the `ProcedureStore` and restores the procedures by the `ProcedureLoader`. The manager stores the type name from `Procedure::type_name()` with the data from `Procedure::dump()` in the `.step` file and uses the type name to find a `ProcedureLoader` to recover the procedure from its data. + +```rust +type BoxedProcedureLoader = Box Result + Send>; +``` + +## Rollback +The rollback step is supposed to clean up the resources created during the execute() step. When a procedure has failed, the `ProcedureManager` puts a `rollback` file and calls the `Procedure::rollback()` method. + + +```text +/procedures/{PROCEDURE_ID}/000001.step +/procedures/{PROCEDURE_ID}/000002.rollback +``` + +Rollback is complicated to implement so some procedures might not support rollback or only provide a best-efforts approach. + +## Locking +The `ProcedureManager` can provide a locking mechanism that gives a procedure read/write access to a database object such as a table so other procedures are unable to modify the same table while the current one is executing. + +Sub-procedures always inherit their parents' locks. The `ProcedureManager` only acquires locks for a procedure if its parent doesn't hold the lock. + +# Drawbacks +The `Procedure` framework introduces additional complexity and overhead to our database. +- To execute a `Procedure`, we need to write to the `ProcedureStore` multiple times, which may slow down the server +- We need to rewrite the logic of creating/dropping/altering a table using the procedure framework + +# Alternatives +Another approach is to tolerate failure during execution and allow users to retry the operation until it succeeds. But we still need to: +- Make each step idempotent +- Record the status in some place to check whether we are done