From 1ad48b2eaf643ace40ed523c00044622ba4e7f59 Mon Sep 17 00:00:00 2001
From: John Spray
Date: Mon, 14 Apr 2025 11:57:17 +0100
Subject: [PATCH] docs/rfcs: add storage encryption key RFC

---
 docs/rfcs/2025-04-14-storage-keys.md | 244 +++++++++++++++++++++++++++
 1 file changed, 244 insertions(+)
 create mode 100644 docs/rfcs/2025-04-14-storage-keys.md

diff --git a/docs/rfcs/2025-04-14-storage-keys.md b/docs/rfcs/2025-04-14-storage-keys.md
new file mode 100644
index 0000000000..32494e03e3
--- /dev/null
+++ b/docs/rfcs/2025-04-14-storage-keys.md
@@ -0,0 +1,244 @@

# Storage Encryption Key Management

## Summary

As a precursor to adding new encryption capabilities to Neon's storage services, this RFC proposes
mechanisms for creating and storing fine-grained encryption keys for user data in Neon. We aim
to provide at least tenant granularity, but will use timeline granularity when it is simpler to do so.

Out of scope:
- We describe the lifecycle of keys here, but not the encryption of user data with these keys.
- We describe an abstract KMS interface, but not particular platform implementations (such as how
  to authenticate with a KMS).

## Terminology

_wrapped/unwrapped_: a wrapped encryption key is a key encrypted by another key. For example, the key for
encrypting a timeline's pageserver data might be wrapped by some "root" key for the tenant's user account,
stored in a KMS system.

_key hierarchy_: the relationships between keys which wrap each other. For example, a layer file key might
be wrapped by a pageserver timeline key, which is in turn wrapped by a tenant's root key.

## Design Choices

Storage: S3 will be the store of record for wrapped keys.

Separate keys: Safekeeper and Pageserver will use independent keys.

AES256: rather than building a generic system for keys, we will assume that all the keys
we manage are AES256 keys; this is the de facto standard for enterprise data storage.

Per-object keys: rather than encrypting data objects (layer files and segment files) with
the tenant keys directly, they will be encrypted with separate keys. This avoids cryptographic
safety issues from re-using the same key for large quantities of potentially repetitive plaintext.

Key storage is optional at a per-tenant granularity: eventually this would be on by default, but:
- Initially, only some environments will have a KMS set up.
- Encryption has some overhead, and some tenants may not want or need it.

## Design

### Summary of format changes

- Pageserver layer files and safekeeper segment objects get new metadata fields to
  store the wrapped key and the version of the wrapping key
- Pageserver timeline index gets a new `keys` field to store timeline keys
- Safekeeper gets a new per-timeline manifest object in S3 to store timeline keys
- Pageserver timeline index gets per-layer metadata for the wrapped key and wrapping version

### Summary of API changes

- Pageserver TenantConf API gets a new field for account ID
- Pageserver TenantConf API gets a new field for encryption mode
- Safekeeper timeline creation API gets a new field for account ID
- Controller, pageserver & safekeeper get a new timeline-scoped `rotate_key` API

### KMS interface

Neon will interoperate with different KMS APIs on different platforms. We will implement a generic interface,
similar to how `remote_storage` wraps different object storage APIs:
- `generate(accountId, keyType, alias) -> (wrapped key, plaintext key)`
- `unwrap(accountId, ciphertext key) -> plaintext key`

Hereafter, when we talk about generating or unwrapping a key, this means a call into the KMS API.

The KMS deals with abstract "account IDs", which are not equal to tenant IDs and may not be
1:1 with tenants. The account ID will be provided as part of tenant configuration, along
with a field to identify an encryption mode.
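As a minimal sketch of what this interface might look like in Rust, in the style of `remote_storage`'s
backend abstraction; all trait and type names here are illustrative, not an implemented API:

```rust
use anyhow::Result;
use async_trait::async_trait;

/// All keys we manage are AES256, i.e. 32 bytes of key material.
pub struct PlaintextKey(pub [u8; 32]);
/// A key encrypted by the account's root key; opaque to us, safe to persist.
pub struct WrappedKey(pub Vec<u8>);

pub enum KeyType {
    Aes256,
}

/// Platform-independent KMS abstraction. Concrete implementations (and how
/// they authenticate with the KMS) are out of scope for this RFC.
#[async_trait]
pub trait Kms: Send + Sync {
    /// Generate a new key under the given account's root key, returning both
    /// the wrapped form (to persist in S3) and the plaintext form (for
    /// immediate use, e.g. wrapping per-object keys).
    async fn generate(
        &self,
        account_id: &str,
        key_type: KeyType,
        alias: &str,
    ) -> Result<(WrappedKey, PlaintextKey)>;

    /// Unwrap a previously generated key using the account's root key.
    async fn unwrap(&self, account_id: &str, wrapped: &WrappedKey) -> Result<PlaintextKey>;
}
```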
### Pageserver key storage

The wrapped pageserver timeline key will be stored in the timeline index object. Because of
key rotation, multiple keys will be stored in an array, with each key carrying a counter version.

```
"keys": [
    {
        # The key version: a new key with the next version is generated when rekeying
        "version": 1,
        # The wrapped key: this is unwrapped by a KMS API call when the key is to be used
        "wrapped": "<wrapped key, base64>",
        # The time the key was generated: this may be used to implement rekeying/key rotation
        # policies.
        "ctime": "<timestamp>",
    },
    ...
]
```

Wrapped pageserver layer file keys will be stored in the `index_part` file, as part
of the layer metadata.

```
# LayerFileMetadata
{
    "key": {
        # The version of the timeline key that wraps this layer's key
        "version": <wrapping key version>,
        # The wrapped per-layer key
        "wrapped": "<wrapped key, base64>"
    }
}
```

To enable the re-key procedure to drop deleted versions that use old keys, and to avoid mistakes
in index_part leading to irretrievable data loss, the wrapped key & version will also be stored
in the object store metadata of uploaded objects.
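In the pageserver, these fields might be modelled roughly as follows. This is a sketch assuming
serde and chrono (with its `serde` feature); names are illustrative rather than final:

```rust
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

/// One entry in the timeline's `keys` array in `index_part`.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TimelineKey {
    /// Counter: a new key with the next version is generated when rekeying.
    pub version: u64,
    /// The wrapped key as returned by the KMS, base64-encoded in JSON.
    pub wrapped: String,
    /// Generation time, used to drive rotation/re-keying policies.
    pub ctime: DateTime<Utc>,
}

/// Per-layer addition to `LayerFileMetadata`: the wrapped per-layer key and
/// the version of the timeline key that wraps it.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LayerKey {
    pub version: u64,
    pub wrapped: String,
}
```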
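The duplication into object store metadata could look like this at upload time. This sketch uses
the AWS SDK directly for brevity; in practice the call would go through `remote_storage`, and the
metadata key names are hypothetical:

```rust
use aws_sdk_s3::primitives::ByteStream;
use aws_sdk_s3::Client;

/// Attach the wrapped key and wrapping-key version to an uploaded layer as S3
/// object metadata, so that a scrub of object versions can identify
/// stale-keyed objects without consulting index_part.
async fn upload_layer_with_key_metadata(
    client: &Client,
    bucket: &str,
    object_key: &str,
    body: ByteStream,
    wrapped_key_b64: &str,
    wrapping_version: u64,
) -> anyhow::Result<()> {
    client
        .put_object()
        .bucket(bucket)
        .key(object_key)
        .metadata("encryption-wrapped-key", wrapped_key_b64)
        .metadata("encryption-key-version", wrapping_version.to_string())
        .body(body)
        .send()
        .await?;
    Ok(())
}
```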
### Safekeeper key storage

All safekeeper storage is per-timeline. The only concept of a tenant in the safekeeper
is as a namespace for timelines.

As the safekeeper doesn't currently have a flexible metadata object in remote storage,
we will add one. This will initially contain:
- A configuration object that contains the accountId
- An array of keys identical to those used in the pageserver's index.

Because multiple safekeeper processes share the same remote storage path, we must be
sure to handle write races safely. To avoid giving safekeepers a pageserver-like generation
concept (not to be confused with the safekeeper's configuration generation), we may use
the conditional write primitive available on S3 and ABS to implement a safe
read-then-write for operations such as key rotation, such that a given key version is
only ever created once.

### Key rotation

The process of key rotation is:
1. Load the version of the existing key
2. Generate a new key
3. Store the new key with the previous version incremented by 1
4. **Only once durably stored**, use the new key for subsequent generation of object keys

This is the same for safekeepers and pageservers (a sketch follows at the end of this section).

A storage controller API will be exposed for re-keying.

For the pageserver, it is very important that re-key
operations respect generation safety rules, the same as timeline CRUD operations: i.e.
the operation is only durable if the generation of the updated tenant location is still
the latest generation when the operation completes.

For the safekeeper, it is very important that ... **TODO**: rules on racing key updates
with configuration changes?
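Reusing the illustrative types from the sketches above, the rotation procedure might look like
the following. The persistence step is left to the caller: it stands in for a generation-checked
index upload on the pageserver, or a conditional PUT on the safekeeper:

```rust
use base64::engine::general_purpose::STANDARD as BASE64;
use base64::Engine as _;

/// Sketch of key rotation for one timeline. `keys` is the array stored in the
/// pageserver's index_part or the safekeeper's manifest.
async fn rotate_key(
    kms: &dyn Kms,
    account_id: &str,
    keys: &mut Vec<TimelineKey>,
) -> anyhow::Result<u64> {
    // 1. Load the version of the existing key (0 if the timeline has none yet).
    let current = keys.iter().map(|k| k.version).max().unwrap_or(0);

    // 2. Generate a new key via the KMS.
    let (wrapped, _plaintext) = kms
        .generate(account_id, KeyType::Aes256, "timeline-key")
        .await?;

    // 3. Store the new key with the previous version incremented by 1.
    let new_version = current + 1;
    keys.push(TimelineKey {
        version: new_version,
        wrapped: BASE64.encode(&wrapped.0),
        ctime: chrono::Utc::now(),
    });

    // 4. The caller must durably persist `keys` (generation-checked on the
    //    pageserver, conditional write on the safekeeper) before the new
    //    version may be used to wrap per-object keys.
    Ok(new_version)
}
```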
### Re-keying

While re-keying and key rotation are sometimes used synonymously, we distinguish them:
- Key rotation is generating a new key to use for new data
- Re-keying is rewriting existing data so that old keys are no longer used at all

Re-keying is a bulk data operation, and is not fully defined in this RFC: it can be defined
quite simply as "for each object, if its key version is below the re-keying horizon,
then do a read/write cycle on the object using the latest key". This is a simple but potentially
very expensive operation, so we discuss efficiency here.

#### Pageserver re-key

For pageservers, occasional re-keying may be implemented efficiently if one tolerates using
the last few keys and doesn't insist on the latest, because pageservers periodically rewrite
their data for GC-compaction anyway. Thus an API call to re-key any data with an overly old
key would often be a no-op, because all data was rewritten recently in the normal course of
compaction.

When object versioning is enabled in storage, re-keying is not fully accomplished by just
re-writing live data: old versions would still contain user data encrypted with older keys. To
fully re-key, an extra step is needed to purge old objects. Ideally, we should only purge
old objects which were encrypted using old keys. To this end, it is useful to store
the encryption key version as metadata on objects, so that a scrub of deleted object versions
can efficiently select those objects that should be purged during re-key.

Checks on object versions should not be limited to deleted objects: because the pageserver can
emit "orphan" objects not referenced in the index under some circumstances, re-key must also
check non-deleted objects.

To summarize, the pageserver re-key operation is:
- Iterate over the index of layer files, select those with a too-old key and rewrite them
- Iterate over all versions in object storage, select those with a too-old key version
  in their metadata and purge them (with a safety check that these are not referenced
  by the latest index).

It would be wise to combine the re-key procedure with an exhaustive read of a timeline's data,
to ensure that when testing & rolling this feature out we are not rendering anything unreadable
due to bugs in the implementation. Since we are deleting old versions in object storage, our
time travel recovery tool will not be any help if we get something wrong in this process.

#### Safekeeper re-key

Re-keying a safekeeper timeline requires an exhaustive walk of segment objects, reading the
metadata on each one and deciding whether it requires a rewrite.

The safekeeper currently keeps historic objects forever, so re-keying this data will get
more expensive as time goes on. This would be a good time to add cleanup of old safekeeper
segments, but doing so is beyond the scope of this RFC.

### Enabling encryption for existing tenants

To enable encryption for an existing tenant, we may simply call the key-rotation API (to
generate a key), and then the re-key API (to rewrite existing data using this key).

## Observability

- To enable some external service to implement re-keying, we should publish per-timeline metrics
  on the age of the latest encryption key.
- Calls to the KMS should be tracked with typical request rate/result/latency histograms, to
  enable detection of a slow KMS server and/or errors.

## Alternatives considered

### Use the same tenant key for safekeeper and pageserver

We could halve the number of keys in circulation by having the safekeeper and pageserver
share a key rather than working independently.

However, this would be substantially more complex to implement, as safekeepers and pageservers
currently share no storage, so some new communication path would be needed. There is minimal
upside in sharing a key.

### No KMS dependency

We could choose to do all key management ourselves. However, the industry-standard approach
to letting users of cloud SaaS software self-manage keys is to use the KMS as the intermediary
between our system and the user's control of their key. Although this RFC does not propose
user-managed keys, we should design with this in mind.

### Do all key generation/wrapping in the KMS service

We could avoid generating and wrapping/unwrapping object keys in our storage
services by delegating all responsibility for key operations to the KMS. However,
KMS services have limited throughput and in some cases may charge per operation, so
it is useful to avoid doing KMS operations per object, restricting them to per-timeline
frequency.

### Per-tenant instead of per-timeline pageserver keys

For tenants with many timelines, we could reduce load on the KMS service by
using per-tenant instead of per-timeline keys, so that we could do operations
such as creating a timeline without needing a KMS unwrap operation.

However, per-timeline key management is much simpler to implement on the safekeeper,
which currently has no concept of a tenant (other than as a namespace for timelines).
It is also slightly simpler to implement on the pageserver, as it avoids implementing
a tenant-scoped creation operation to initialize keys (instead, we may initialize keys
during timeline creation).

As a side benefit, per-timeline key management also enables implementing secure deletion
at a per-timeline granularity in the future.