From 2f338daf171b11f624c244b0685b6d87f81cec90 Mon Sep 17 00:00:00 2001
From: Alex Chi Z
Date: Fri, 25 Apr 2025 15:35:45 -0400
Subject: [PATCH] rfc: new encryption

Signed-off-by: Alex Chi Z
---
 docs/rfcs/2025-04-14-storage-keys.md | 117 ++++++++++++---------------
 1 file changed, 53 insertions(+), 64 deletions(-)

diff --git a/docs/rfcs/2025-04-14-storage-keys.md b/docs/rfcs/2025-04-14-storage-keys.md
index c893403496..594a15acf6 100644
--- a/docs/rfcs/2025-04-14-storage-keys.md
+++ b/docs/rfcs/2025-04-14-storage-keys.md
@@ -9,7 +9,6 @@ to provide at least tenant granularity, but will use timeline granularity when i
 so.
 
 Out of scope:
-- We describe lifecycle of keys here but not the encryption of user data with these keys.
 - We describe an abstract KMS interface, but not particular platform implementations (such as how
   to authenticate with KMS).
 
@@ -19,11 +18,11 @@ _wrapped/unwrapped_: a wrapped encryption key is a key encrypted by another key.
 encrypting a timeline's pageserver data might be wrapped by some "root" key for the tenant's user account, stored in a KMS system.
 
 _key hierarchy_: the relationships between keys which wrap each other. For example, a layer file key might
-be wrapped by a pageserver timeline key, which is wrapped by a tenant's root key.
+be wrapped by a pageserver tenant key, which is wrapped by a tenant's root key.
 
 ## Design Choices
 
-Storage: S3 will be the store of record for wrapped keys
+Storage: S3 will be the store of record for wrapped keys.
 
 Separate keys: Safekeeper and Pageserver will use independent keys.
 
@@ -34,6 +33,9 @@ Per-object keys: rather than encrypting data objects (layer files and segment fi
 the tenant keys directly, they will be encrypted with separate keys. This avoids cryptographic safety
 issues from re-using the same key for large quantities of potentially repetitive plaintext.
 
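The per-object key scheme described here is a form of envelope encryption: each data object gets a fresh random key, only a wrapped copy of that key is stored alongside the object, and the wrapping key is itself wrapped by the KMS. The following is a minimal Python sketch of that flow, assuming (as the layer-format notes in this RFC describe) that the layer key is wrapped by the tenant key and the tenant key by a per-account KMS root key. All names (`FakeKms`, `seal`, `unseal`, `encrypt_layer`, `read_block`, `LayerMetadata`) are hypothetical, and the HMAC-SHA256 construction only stands in for a vetted AEAD such as AES-GCM; it is for illustration, not production use.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

# Stand-in AEAD (illustration only; a real system would use a vetted AEAD such as AES-GCM).
NONCE_LEN, TAG_LEN = 16, 32


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode, used as a pseudo-random keystream."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = b"enc" + nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt-then-MAC; returns nonce || ciphertext || tag."""
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(key, nonce, len(plaintext))
    ct = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def unseal(key: bytes, sealed: bytes) -> bytes:
    """Verify the tag, then decrypt; raises ValueError on a wrong key or tampering."""
    if len(sealed) < NONCE_LEN + TAG_LEN:
        raise ValueError("sealed data too short")
    nonce, ct, tag = sealed[:NONCE_LEN], sealed[NONCE_LEN:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(key, b"mac" + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: wrong key or corrupted data")
    stream = _keystream(key, nonce, len(ct))
    return bytes(c ^ s for c, s in zip(ct, stream))


class FakeKms:
    """Hypothetical in-memory KMS: one root key per account ID, never handed out."""

    def __init__(self) -> None:
        self._root_keys: dict[str, bytes] = {}

    def create_account(self, account_id: str) -> None:
        self._root_keys[account_id] = secrets.token_bytes(32)

    def wrap(self, account_id: str, key: bytes) -> bytes:
        return seal(self._root_keys[account_id], key)

    def unwrap(self, account_id: str, wrapped: bytes) -> bytes:
        return unseal(self._root_keys[account_id], wrapped)


@dataclass
class LayerMetadata:
    wrapped_layer_key: bytes       # layer key wrapped by the tenant key
    kms_wrapped_tenant_key: bytes  # tenant key wrapped by the KMS root key


def encrypt_layer(kms: FakeKms, account_id: str, kms_wrapped_tenant_key: bytes,
                  blocks: list[bytes]) -> tuple[list[bytes], LayerMetadata]:
    """Encrypt every block with a fresh per-layer key and build the layer metadata."""
    tenant_key = kms.unwrap(account_id, kms_wrapped_tenant_key)
    layer_key = secrets.token_bytes(32)  # new random key for each layer file
    sealed_blocks = [seal(layer_key, block) for block in blocks]
    meta = LayerMetadata(seal(tenant_key, layer_key), kms_wrapped_tenant_key)
    return sealed_blocks, meta


def read_block(kms: FakeKms, account_id: str, meta: LayerMetadata, sealed_block: bytes) -> bytes:
    """Unwrap the tenant key (KMS call), then the layer key, then decrypt one block."""
    tenant_key = kms.unwrap(account_id, meta.kms_wrapped_tenant_key)
    layer_key = unseal(tenant_key, meta.wrapped_layer_key)
    return unseal(layer_key, sealed_block)


if __name__ == "__main__":
    kms = FakeKms()
    kms.create_account("account-1")
    # The tenant key is generated once and only ever stored in KMS-wrapped form.
    wrapped_tenant_key = kms.wrap("account-1", secrets.token_bytes(32))
    blocks, meta = encrypt_layer(kms, "account-1", wrapped_tenant_key, [b"image-0", b"image-1"])
    assert read_block(kms, "account-1", meta, blocks[1]) == b"image-1"
```

Because each object carries both wrapped keys in its own metadata, a reader needs only the object and KMS access. A real reader would likely cache the unwrapped tenant key so that the KMS is not called once per block; the sketch omits caching for brevity.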
+S3 objects are self-contained: each encrypted file will have a metadata block in the file itself
+storing the KMS-wrapped key needed to decrypt it.
+
 Key storage is optional at a per-tenant granularity: eventually this would be on by default, but:
 - initially only some environments will have a KMS set up.
 - Encryption has some overhead and it may be that some tenants don't want or need it.
@@ -42,11 +44,10 @@ Key storage is optional at a per-tenant granularity: eventually this would be on
 
 ### Summary of format changes
 
+- Pageserver layer files and safekeeper segment objects are split into blocks and each
+  block is encrypted by the layer key.
 - Pageserver layer files and safekeeper segment objects get new metadata fields to
-  store wrapped key and version of the wrapping key
-- Pageserver timeline index gets a new `keys` field to store wrapped timeline keys
-- Safekeeper gets a new per-timeline manifest object in S3 to store wrapped timeline keys
-- Pageserver timeline index gets per-layer metadata for wrapped key and wrapping version
+  store the wrapped layer key and the KMS-wrapped timeline key.
 
 ### Summary of API changes
 
@@ -68,80 +69,68 @@ The KMS deals with abstract "account IDs", which are not equal to tenant IDs and
 1:1 with tenants. The account ID will be provided as part of tenant configuration, along with
 a field to identify an encryption mode.
 
-### Pageserver key storage
-The wrapped pageserver timeline key will be stored in the timeline index object. Because of
-key rotation, multiple keys will be stored in an array, with each key having a counter version.
+### Pageserver Layer File Format
+
+Encryption blocks are the minimum unit of read. To read any part of the data within an encryption
+block, we must decrypt the whole block. All encryption blocks within a layer share the same layer key (is this safe?).
+
+Image layers: each image is one encryption block.
+
+Delta layers: for the first stage of the project, each delta is encrypted separately; in the future, we can batch
+several small deltas into a single encryption block.
+
+Indices: each B+ tree node is an encryption block.
+
+Layer format:
 
 ```
-"keys": [
-  {
-    # The key version: a new key with the next version is generated when rekeying
-    "version": 1,
-    # The wrapped key: this is unwrapped by a KMS API call when the key is to be used
-    "wrapped": "",
-    # The time the key was generated: this may be used to implement rekeying/key rotation
-    # policies.
-    "ctime": "",
-  },
-  ...
-]
+| Data Block | Data Block | Data Block | ... | Index Block | Index Block | Index Block | Metadata |
+Data block = encrypt(data, layer_key)
+Index block = encrypt(index, layer_key); the index maps a key to the offset of its data block inside the layer file.
+Metadata = wrap(layer_key, timeline_key), wrap_kms(tenant_key), and other metadata we want to store in the future
 ```
 
-Wrapped pageserver layer file keys will be stored in the `index_part` file, as part
-of the layer metadata.
+Note that we generate a random layer_key for each layer. We store the layer key wrapped by the current
+tenant key (described in later sections) and the KMS-wrapped tenant key in the layer.
+
+If data compression is enabled, the data is compressed before it is encrypted (is this safe?).
+
+This file format is used for both object storage and local storage. We do not decrypt the layer file when
+downloading it to local disk; decryption happens when the layer is read.
+
+### Safekeeper Segment Format
+
+TBD
+
+### Pageserver Timeline Index
+
+We will add a `created_at` field for each layer file so that during re-keying (described in later sections)
+we can determine which layer files to rewrite. We also record the offset of the metadata block so that it is
+possible to obtain more information about the layer file without downloading the full layer file (e.g., the
+exact timeline key being used to encrypt the layer file).
 
 ```
 # LayerFileMetadata {
-  "key": {
-    "version": 
-  }
-
+  "created_at": "