This is a backwards-incompatible change. The new pageserver cannot read repositories created with an old pageserver binary, or vice versa. Simplify Repository to a value-store ------------------------------------ Move the responsibility of tracking relation metadata, like which relations exist and what are their sizes, from Repository to a new module, pgdatadir_mapping.rs. The interface to Repository is now a simple key-value PUT/GET operations. It's still not any old key-value store though. A Repository is still responsible from handling branching, and every GET operation comes with an LSN. Mapping from Postgres data directory to keys/values --------------------------------------------------- All the data is now stored in the key-value store. The 'pgdatadir_mapping.rs' module handles mapping from PostgreSQL objects like relation pages and SLRUs, to key-value pairs. The key to the Repository key-value store is a Key struct, which consists of a few integer fields. It's wide enough to store a full RelFileNode, fork and block number, and to distinguish those from metadata keys. 'pgdatadir_mapping.rs' is also responsible for maintaining a "partitioning" of the keyspace. Partitioning means splitting the keyspace so that each partition holds a roughly equal number of keys. The partitioning is used when new image layer files are created, so that each image layer file is roughly the same size. The partitioning is also responsible for reclaiming space used by deleted keys. The Repository implementation doesn't have any explicit support for deleting keys. Instead, the deleted keys are simply omitted from the partitioning, and when a new image layer is created, the omitted keys are not copied over to the new image layer. We might want to implement tombstone keys in the future, to reclaim space faster, but this will work for now. Changes to low-level layer file code ------------------------------------ The concept of a "segment" is gone. Each layer file can now store an arbitrary range of Keys. Checkpointing, compaction ------------------------- The background tasks are somewhat different now. Whenever checkpoint_distance is reached, the WAL receiver thread "freezes" the current in-memory layer, and creates a new one. This is a quick operation and doesn't perform any I/O yet. It then launches a background "layer flushing thread" to write the frozen layer to disk, as a new L0 delta layer. This mechanism takes care of durability. It replaces the checkpointing thread. Compaction is a new background operation that takes a bunch of L0 delta layers, and reshuffles the data in them. It runs in a separate compaction thread. Deployment ---------- This also contains changes to the ansible scripts that enable having multiple different pageservers running at the same time in the staging environment. We will use that to keep an old version of the pageserver running, for clusters created with the old version, at the same time with a new pageserver with the new binary. Author: Heikki Linnakangas Author: Konstantin Knizhnik <knizhnik@zenith.tech> Author: Andrey Taranik <andrey@zenith.tech> Reviewed-by: Matthias Van De Meent <matthias@zenith.tech> Reviewed-by: Bojan Serafimov <bojan@zenith.tech> Reviewed-by: Konstantin Knizhnik <knizhnik@zenith.tech> Reviewed-by: Anton Shyrabokau <antons@zenith.tech> Reviewed-by: Dhammika Pathirana <dham@zenith.tech> Reviewed-by: Kirill Bulatov <kirill@zenith.tech> Reviewed-by: Anastasia Lubennikova <anastasia@zenith.tech> Reviewed-by: Alexey Kondratov <alexey@zenith.tech>
This directory contains Request for Comments documents, or RFCs, for features or concepts that have been proposed. Alternative names: technical design doc, ERD, one-pager
To make a new proposal, create a new text file in this directory and open a Pull Request with it. That gives others a chance and a forum to comment and discuss the design.
When a feature is implemented and the code changes are committed, also include the corresponding RFC in this directory.
Some of the RFCs in this directory have been implemented in some form or another, while others are on the roadmap, while still others are just obsolete and forgotten about. So read them with a grain of salt, but hopefully even the ones that don't reflect reality give useful context information.
What
We use Tech Design RFC’s to summarize what we are planning to implement in our system. These RFCs should be created for large or not obvious technical tasks, e.g. changes of the architecture or bigger tasks that could take over a week, changes that touch multiple components or their interaction. RFCs should fit into a couple of pages, but could be longer on occasion.
Why
We’re using RFCs to enable early review and collaboration, reduce uncertainties, risk and save time during the implementation phase that follows the Tech Design RFC.
Tech Design RFCs also aim to avoid bus factor and are an additional measure to keep more peers up to date & familiar with our design and architecture.
This is a crucial part for ensuring collaboration across timezones and setting up for success a distributed team that works on complex topics.
Prior art
- Rust: https://github.com/rust-lang/rfcs/blob/master/0000-template.md
- React.js: https://github.com/reactjs/rfcs/blob/main/0000-template.md
- Google fuchsia: https://fuchsia.dev/fuchsia-src/contribute/governance/rfcs/TEMPLATE
- Apache: https://cwiki.apache.org/confluence/display/GEODE/RFC+Template / https://cwiki.apache.org/confluence/display/GEODE/Lightweight+RFC+Process
How
RFC lifecycle:
- Should be submitted in a pull request with and full RFC text in a commited markdown file and copy of the Summary and Motivation sections also included in the PR body.
- RFC should be published for review before most of the actual code is written. This isn’t a strict rule, don’t hesitate to experiment and build a POC in parallel with writing an RFC.
- Add labels to the PR in the same manner as you do Issues. Example TBD
- Request the review from your peers. Reviewing the RFCs from your peers is a priority, same as reviewing the actual code.
- The Tech Design RFC should evolve based on the feedback received and further during the development phase if problems are discovered with the taken approach
- RFCs stop evolving once the consensus is found or the proposal is implemented and merged.
- RFCs are not intended as a documentation that’s kept up to date after the implementation is finished. Do not update the Tech Design RFC when merged functionality evolves later on. In such situation a new RFC may be appropriate.
RFC template
Note, a lot of the sections are marked as ‘if relevant’. They are included into the template as a reminder and to help inspiration.
# Name
Created on ..
Implemented on ..
## Summary
## Motivation
## Non Goals (if relevant)
## Impacted components (e.g. pageserver, safekeeper, console, etc)
## Proposed implementation
### Reliability, failure modes and corner cases (if relevant)
### Interaction/Sequence diagram (if relevant)
### Scalability (if relevant)
### Security implications (if relevant)
### Unresolved questions (if relevant)
## Alternative implementation (if relevant)
## Pros/cons of proposed approaches (if relevant)
## Definition of Done (if relevant)