Heikki Linnakangas 07342f7519 Major storage format rewrite.
This is a backwards-incompatible change. The new pageserver cannot
read repositories created with an old pageserver binary, or vice
versa.

Simplify Repository to a value-store
------------------------------------

Move the responsibility of tracking relation metadata, such as which
relations exist and what their sizes are, from Repository to a new
module, pgdatadir_mapping.rs. The interface to Repository now consists
of simple key-value PUT/GET operations.

It's still not just any key-value store, though. A Repository is still
responsible for handling branching, and every GET operation comes
with an LSN.
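
To illustrate the "every GET comes with an LSN" contract, here is a
minimal sketch of a multi-version key-value store. It is not the
pageserver's Repository: `VersionedStore` and the integer `Key`/`Lsn`
aliases are hypothetical, and branching is left out entirely. A GET
returns the newest value written at or before the requested LSN.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-ins; the real pageserver types are richer.
pub type Key = u64;
pub type Lsn = u64;

/// Toy multi-version store: every PUT is stamped with an LSN, and every
/// GET asks for the value "as of" an LSN. Branching is not modeled.
#[derive(Default)]
pub struct VersionedStore {
    // Ordered by (key, lsn), so all versions of one key are adjacent.
    data: BTreeMap<(Key, Lsn), Vec<u8>>,
}

impl VersionedStore {
    pub fn put(&mut self, key: Key, lsn: Lsn, value: Vec<u8>) {
        self.data.insert((key, lsn), value);
    }

    /// Return the newest version of `key` written at or before `lsn`.
    pub fn get(&self, key: Key, lsn: Lsn) -> Option<&[u8]> {
        self.data
            .range((key, 0)..=(key, lsn))
            .next_back()
            .map(|(_, v)| v.as_slice())
    }
}
```

Keying the map by `(key, lsn)` turns "latest version not newer than
LSN X" into a single reverse range scan.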

Mapping from Postgres data directory to keys/values
---------------------------------------------------

All the data is now stored in the key-value store. The
'pgdatadir_mapping.rs' module handles mapping from PostgreSQL objects,
such as relation pages and SLRUs, to key-value pairs.

The key to the Repository key-value store is a Key struct, which
consists of a few integer fields. It's wide enough to store a full
RelFileNode, fork and block number, and to distinguish those from
metadata keys.
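
A sketch of what such a key could look like. The field names, widths
and the `kind` tag below are assumptions chosen for illustration, not
the actual layout of the Key struct. The point is that a derived
ordering keeps all blocks of one relation fork contiguous, while the
leading tag keeps metadata keys apart from relation pages.

```rust
/// Hypothetical key layout, for illustration only: wide enough for a
/// RelFileNode (spcnode, dbnode, relnode), a fork number and a block
/// number, plus a leading tag separating relation pages from metadata.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Key {
    pub kind: u8, // assumed encoding: 0 = relation block, 1 = relation size
    pub spcnode: u32,
    pub dbnode: u32,
    pub relnode: u32,
    pub forknum: u8,
    pub blknum: u32,
}

pub const KIND_REL_BLOCK: u8 = 0;
pub const KIND_REL_SIZE: u8 = 1; // hypothetical metadata key kind

/// Key for one block of a relation fork.
pub fn rel_block_key(spcnode: u32, dbnode: u32, relnode: u32, forknum: u8, blknum: u32) -> Key {
    Key { kind: KIND_REL_BLOCK, spcnode, dbnode, relnode, forknum, blknum }
}

/// Metadata key holding the size of a relation fork (block number unused).
pub fn rel_size_key(spcnode: u32, dbnode: u32, relnode: u32, forknum: u8) -> Key {
    Key { kind: KIND_REL_SIZE, spcnode, dbnode, relnode, forknum, blknum: 0 }
}
```

Because the derived `Ord` compares fields in declaration order, every
relation block sorts before every metadata key in this sketch.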

'pgdatadir_mapping.rs' is also responsible for maintaining a
"partitioning" of the keyspace. Partitioning means splitting the
keyspace so that each partition holds a roughly equal number of keys.
The partitioning is used when new image layer files are created, so
that each image layer file is roughly the same size.
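
A minimal sketch of count-based partitioning, assuming the live keys
are available as a sorted, de-duplicated slice and that a partition is
described by an inclusive key range. `partition` and `target_keys` are
hypothetical names, and the real code may size partitions differently.

```rust
use std::ops::RangeInclusive;

/// Hypothetical stand-in for the real Key struct.
pub type Key = u64;

/// Split sorted, de-duplicated live keys into contiguous partitions of at
/// most `target_keys` keys each. Only the last partition may be smaller.
pub fn partition(live_keys: &[Key], target_keys: usize) -> Vec<RangeInclusive<Key>> {
    assert!(target_keys > 0, "target_keys must be positive");
    live_keys
        .chunks(target_keys)
        // `chunks` never yields an empty slice, so first/last always exist.
        .map(|chunk| chunk[0]..=chunk[chunk.len() - 1])
        .collect()
}
```

The ranges can differ in width when the keyspace has gaps, but each one
holds about the same number of keys, which is what keeps the image
layer files similar in size.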

The partitioning is also responsible for reclaiming space used by
deleted keys. The Repository implementation doesn't have any explicit
support for deleting keys. Instead, the deleted keys are simply
omitted from the partitioning, and when a new image layer is created,
the omitted keys are not copied over to the new image layer. We might
want to implement tombstone keys in the future, to reclaim space
faster, but this will work for now.
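
The space-reclamation rule can be sketched as a filter: when an image
layer is built for a partition, only keys listed in the partitioning
are copied. `create_image_layer` is hypothetical, and the old data is
modeled as a map from key to the value reconstructed at the image
layer's LSN.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the real Key struct.
pub type Key = u64;

/// Build the contents of a new image layer for one partition.
/// `old_data` holds every key still present in older layers, including
/// deleted ones; `partition` holds only the live keys. Deleted keys are
/// never looked up, so they are not carried over.
pub fn create_image_layer(
    old_data: &BTreeMap<Key, Vec<u8>>,
    partition: &BTreeSet<Key>,
) -> BTreeMap<Key, Vec<u8>> {
    partition
        .iter()
        .filter_map(|k| old_data.get(k).map(|v| (*k, v.clone())))
        .collect()
}
```

No tombstone is ever written: a deleted key disappears once a new image
layer replaces the older layers that still contained it.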

Changes to low-level layer file code
------------------------------------

The concept of a "segment" is gone. Each layer file can now store an
arbitrary range of Keys.
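
With segments gone, a layer file can be described by the key range and
LSN range it covers. The sketch below uses a hypothetical `LayerDesc`
with half-open ranges and shows how a reader might pick candidate
layers for one key, newest first. It ignores the on-disk format and any
indexing structure a real layer map would use.

```rust
use std::ops::Range;

/// Hypothetical stand-ins for the real types.
pub type Key = u64;
pub type Lsn = u64;

/// Hypothetical layer descriptor: instead of a fixed segment, each file
/// declares the key range and LSN range it covers.
#[derive(Debug, Clone)]
pub struct LayerDesc {
    pub key_range: Range<Key>,
    pub lsn_range: Range<Lsn>,
}

/// Layers that might hold versions of `key` at or below `lsn`, newest
/// first, which is the order a reader would search them in.
pub fn layers_for<'a>(layers: &'a [LayerDesc], key: Key, lsn: Lsn) -> Vec<&'a LayerDesc> {
    let mut hits: Vec<&LayerDesc> = layers
        .iter()
        .filter(|l| l.key_range.contains(&key) && l.lsn_range.start <= lsn)
        .collect();
    // Sort by descending start LSN.
    hits.sort_by(|a, b| b.lsn_range.start.cmp(&a.lsn_range.start));
    hits
}
```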

Checkpointing, compaction
-------------------------

The background tasks are somewhat different now. Whenever
checkpoint_distance is reached, the WAL receiver thread "freezes" the
current in-memory layer, and creates a new one. This is a quick
operation and doesn't perform any I/O yet. It then launches a
background "layer flushing thread" to write the frozen layer to disk,
as a new L0 delta layer. This mechanism takes care of durability. It
replaces the checkpointing thread.
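
The freeze-then-flush handoff might look roughly like this. `WalIngest`,
`InMemoryLayer` and `L0Delta` are hypothetical; measuring
checkpoint_distance as LSN progress since the open layer's start is an
assumption, and the "flush" only records a summary instead of writing a
file. Freezing is just a `mem::replace`, so the ingesting thread does
no I/O, and one background thread is spawned per frozen layer.

```rust
use std::collections::BTreeMap;
use std::mem;
use std::thread::{self, JoinHandle};

pub type Key = u64;
pub type Lsn = u64;

/// Hypothetical in-memory layer: buffers (key, lsn) -> value until frozen.
#[derive(Default)]
pub struct InMemoryLayer {
    pub start_lsn: Lsn,
    pub records: BTreeMap<(Key, Lsn), Vec<u8>>,
}

/// Hypothetical result of a flush, summarized instead of written out.
#[derive(Debug, PartialEq)]
pub struct L0Delta {
    pub start_lsn: Lsn,
    pub num_records: usize,
}

pub struct WalIngest {
    open: InMemoryLayer,
    checkpoint_distance: u64,
    flushes: Vec<JoinHandle<L0Delta>>,
}

impl WalIngest {
    pub fn new(checkpoint_distance: u64) -> Self {
        WalIngest { open: InMemoryLayer::default(), checkpoint_distance, flushes: Vec::new() }
    }

    pub fn put(&mut self, key: Key, lsn: Lsn, value: Vec<u8>) {
        self.open.records.insert((key, lsn), value);
        if lsn.saturating_sub(self.open.start_lsn) >= self.checkpoint_distance {
            // Freeze: swap in an empty layer. Cheap, no I/O on this thread.
            let fresh = InMemoryLayer { start_lsn: lsn, records: BTreeMap::new() };
            let frozen = mem::replace(&mut self.open, fresh);
            // Launch a background flush that turns the frozen layer into an L0 delta.
            self.flushes.push(thread::spawn(move || L0Delta {
                start_lsn: frozen.start_lsn,
                num_records: frozen.records.len(), // a real flush would write these to disk
            }));
        }
    }

    /// Wait for all outstanding flushes; after this the data is durable.
    pub fn wait_for_flushes(&mut self) -> Vec<L0Delta> {
        self.flushes.drain(..).map(|h| h.join().expect("flush thread panicked")).collect()
    }
}
```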

Compaction is a new background operation that takes a bunch of L0
delta layers, and reshuffles the data in them. It runs in a separate
compaction thread.
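
The text does not spell out how the data is reshuffled. Assuming it
means merging the L0 layers, re-sorting their records by key, and
cutting the result into layers that each cover a contiguous key range,
a sketch could look like this. `DeltaLayer`, `compact` and
`max_records` are hypothetical names.

```rust
use std::collections::BTreeMap;

pub type Key = u64;
pub type Lsn = u64;

/// Hypothetical delta layer: a list of (key, lsn, value) records.
#[derive(Debug, Clone)]
pub struct DeltaLayer {
    pub records: Vec<(Key, Lsn, Vec<u8>)>,
}

/// Merge several L0 delta layers and re-split the combined records by
/// key, so each output layer covers a contiguous key range.
pub fn compact(l0: &[DeltaLayer], max_records: usize) -> Vec<DeltaLayer> {
    assert!(max_records > 0);
    // Keying by (key, lsn) orders all records by key first, then LSN.
    let mut merged: BTreeMap<(Key, Lsn), Vec<u8>> = BTreeMap::new();
    for layer in l0 {
        for (k, lsn, v) in &layer.records {
            merged.insert((*k, *lsn), v.clone());
        }
    }
    let mut out = Vec::new();
    let mut current = Vec::new();
    let mut last_key = None;
    for ((k, lsn), v) in merged {
        // Only cut between different keys, so one key's history stays together.
        if current.len() >= max_records && last_key != Some(k) {
            out.push(DeltaLayer { records: std::mem::take(&mut current) });
        }
        current.push((k, lsn, v));
        last_key = Some(k);
    }
    if !current.is_empty() {
        out.push(DeltaLayer { records: current });
    }
    out
}
```

Cutting only between distinct keys means a lookup for one key touches a
single output layer; the trade-off is that a layer may exceed
`max_records` when one key has many versions.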

Deployment
----------

This also contains changes to the ansible scripts that enable running
multiple different pageservers at the same time in the staging
environment. We will use that to keep an old version of the pageserver
running for clusters created with the old version, alongside a new
pageserver running the new binary.

Author: Heikki Linnakangas
Author: Konstantin Knizhnik <knizhnik@zenith.tech>
Author: Andrey Taranik <andrey@zenith.tech>
Reviewed-by: Matthias Van De Meent <matthias@zenith.tech>
Reviewed-by: Bojan Serafimov <bojan@zenith.tech>
Reviewed-by: Konstantin Knizhnik <knizhnik@zenith.tech>
Reviewed-by: Anton Shyrabokau <antons@zenith.tech>
Reviewed-by: Dhammika Pathirana <dham@zenith.tech>
Reviewed-by: Kirill Bulatov <kirill@zenith.tech>
Reviewed-by: Anastasia Lubennikova <anastasia@zenith.tech>
Reviewed-by: Alexey Kondratov <alexey@zenith.tech>
2022-03-28 05:41:15 -05:00

This directory contains Request for Comments documents, or RFCs, for features or concepts that have been proposed. Alternative names: technical design doc, ERD, one-pager.

To make a new proposal, create a new text file in this directory and open a Pull Request with it. That gives others a chance and a forum to comment on and discuss the design.

When a feature is implemented and the code changes are committed, also include the corresponding RFC in this directory.

Some of the RFCs in this directory have been implemented in some form or another, others are on the roadmap, and still others are obsolete and forgotten. So read them with a grain of salt, but hopefully even the ones that don't reflect reality give useful context.

What

We use Tech Design RFCs to summarize what we are planning to implement in our system. These RFCs should be created for large or non-obvious technical tasks, e.g. changes to the architecture, bigger tasks that could take over a week, or changes that touch multiple components or their interaction. RFCs should fit into a couple of pages, but could be longer on occasion.

Why

We're using RFCs to enable early review and collaboration, to reduce uncertainty and risk, and to save time during the implementation phase that follows the Tech Design RFC.

Tech Design RFCs also aim to reduce the bus factor, and are an additional measure to keep more peers up to date on, and familiar with, our design and architecture.

This is crucial for ensuring collaboration across timezones and for setting up a distributed team that works on complex topics for success.

Prior art

How

RFC lifecycle:

  • Should be submitted in a pull request, with the full RFC text in a committed markdown file and a copy of the Summary and Motivation sections also included in the PR body.
  • An RFC should be published for review before most of the actual code is written. This isn't a strict rule; don't hesitate to experiment and build a POC in parallel with writing an RFC.
  • Add labels to the PR in the same manner as you do for Issues. Example TBD
  • Request a review from your peers. Reviewing your peers' RFCs is a priority, same as reviewing actual code.
  • The Tech Design RFC should evolve based on the feedback received, and further during the development phase if problems are discovered with the chosen approach.
  • RFCs stop evolving once consensus is reached or the proposal is implemented and merged.
  • RFCs are not intended as documentation that's kept up to date after the implementation is finished. Do not update the Tech Design RFC when merged functionality evolves later on. In such a situation, a new RFC may be appropriate.

RFC template

Note that many of the sections are marked as "if relevant". They are included in the template as a reminder and for inspiration.

# Name
Created on ..
Implemented on ..

## Summary

## Motivation

## Non Goals (if relevant)

## Impacted components (e.g. pageserver, safekeeper, console, etc)

## Proposed implementation

### Reliability, failure modes and corner cases (if relevant)

### Interaction/Sequence diagram (if relevant)

### Scalability (if relevant)

### Security implications (if relevant)

### Unresolved questions (if relevant)

## Alternative implementation (if relevant)

## Pros/cons of proposed approaches (if relevant)

## Definition of Done (if relevant)