## Page server architecture

The Page Server is responsible for all operations on a number of "chunks" of relation data. A chunk corresponds to a PostgreSQL relation segment (i.e. one file of at most 1 GB in the data directory), but it holds all the versions of every page in the segment that are still needed by the system.

Currently, data is not actually organized in chunks. All page images and the corresponding WAL records are stored as entries in a key-value store, where the key (StorageKey) is the combination of zenith_timeline_id, BufferTag and LSN.

The Page Server has a few different duties:

- Respond to GetPage@LSN requests from the Compute Nodes
- Receive WAL from the WAL safekeeper
- Replay WAL that's applicable to the chunks that the Page Server maintains
- Back up to S3

The Page Server consists of multiple threads that operate on a shared cache of page versions:

                                             | WAL
                                             V
                                      +--------------+
                                      |              |
                                      | WAL receiver |
                                      |              |
                                      +--------------+
                                                                         +----+
                         +---------+                 ..........          |    |
                         |         |                 .        .          |    |
         GetPage@LSN     |         |                 . backup . -------> | S3 |
        ------------->   |  Page   |   page cache    .        .          |    |
                         | Service |                 ..........          |    |
             page        |         |                                     +----+
        <-------------   |         |
                         +---------+

                                     ...................................
                                     .                                 .
                                     . Garbage Collection / Compaction .
                                     ...................................

    Legend:

    +--+
    |  |   A thread or multi-threaded service
    +--+

    ....
    .  .   Component that we will need, but doesn't exist at the moment. A TODO.
    ....

    --->   Data flow
    <---

Page Service
------------

The Page Service listens for GetPage@LSN requests from the Compute Nodes and responds with pages from the page cache.

WAL Receiver
------------

The WAL receiver connects to the external WAL safekeeping service (or directly to the primary) using PostgreSQL physical streaming replication, and continuously receives WAL. It decodes the WAL records and stores them in the page cache.

Page Cache
----------

The Page Cache is a switchboard that provides access to the different Repositories.

#### Repository

A Repository corresponds to one .zenith directory.
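Inside a Repository, every page image and WAL record is an entry in the key-value store. Each entry is stored under a StorageKey of zenith_timeline_id + BufferTag + LSN, as described earlier.

The minimal sketch below shows why that key order suits GetPage@LSN. Because keys are sorted, the versions of one page sit next to each other in LSN order. A single ordered search therefore finds the newest entry at or below the requested LSN. The WAL records collected while walking back to the last full page image are then replayed on top of it.

This is an in-memory illustration under stated assumptions, not the Page Server's code. `KVStore`, the `(relation, block)` tuple standing in for BufferTag, and the `redo` callback standing in for the WAL redo service are all hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

# Hypothetical stand-in for BufferTag: (relation, block number).
BufferTag = Tuple[str, int]
# StorageKey = zenith_timeline_id + BufferTag + LSN; tuples compare field by field.
StorageKey = Tuple[str, BufferTag, int]


@dataclass
class PageImage:
    data: bytes


@dataclass
class WalRecord:
    data: bytes


class KVStore:
    """Hypothetical in-memory store that keeps StorageKeys in sorted order."""

    def __init__(self) -> None:
        self.keys: List[StorageKey] = []
        self.values: List[Union[PageImage, WalRecord]] = []

    def put(self, key: StorageKey, value: Union[PageImage, WalRecord]) -> None:
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value  # same key: overwrite
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get_page_at_lsn(self, timeline: str, tag: BufferTag, lsn: int,
                        redo: Callable[[bytes, List[bytes]], bytes]) -> bytes:
        # Index of the newest key <= (timeline, tag, lsn).
        i = bisect.bisect_right(self.keys, (timeline, tag, lsn)) - 1
        records: List[bytes] = []  # WAL records seen so far, newest first
        # Walk back through older versions of the same page until a full image.
        while i >= 0 and self.keys[i][:2] == (timeline, tag):
            value = self.values[i]
            if isinstance(value, PageImage):
                # Replay the WAL records oldest-first on top of the image.
                return redo(value.data, records[::-1])
            records.append(value.data)
            i -= 1
        raise KeyError(f"no page image for {tag} at or below LSN {lsn}")


def toy_redo(image: bytes, records: List[bytes]) -> bytes:
    # Stand-in for the WAL redo service: just appends each record's bytes.
    return image + b"".join(records)


store = KVStore()
tag = ("rel_16384", 0)
store.put(("tl1", tag, 100), PageImage(b"A"))
store.put(("tl1", tag, 110), WalRecord(b"b"))
store.put(("tl1", tag, 120), WalRecord(b"c"))
print(store.get_page_at_lsn("tl1", tag, 115, toy_redo))  # b'Ab'
```

In this sketch, a request may have to replay every WAL record written since the last page image. How often full images are stored therefore trades storage space for GetPage@LSN latency.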
A Repository manages Timelines.

#### Timeline

A Timeline is the workhorse of the page cache: it accepts page changes and serves get_page_at_lsn() and get_rel_size() requests. Note: this has nothing to do with PostgreSQL WAL timelines.

#### Branch

A branch can be created at a given LSN. Each Branch lives in its own timeline and has an ancestor. To get a full snapshot of the data at a given moment, we need to traverse the timeline and its ancestors.

#### ObjectRepository

ObjectRepository implements Repository and has an associated ObjectStore and WAL redo service.

#### ObjectStore

ObjectStore is an interface to a key-value store for page images and WAL records. Currently it has one implementation, based on RocksDB.

#### WAL redo service

The WAL redo service runs PostgreSQL in a special wal_redo mode to apply given WAL records to an old page image and return the new page image.

TODO: Garbage Collection / Compaction
-------------------------------------

Periodically, the Garbage Collection / Compaction thread runs, applies pending WAL records, and removes old page versions that are no longer needed.

TODO: Backup service
--------------------

The backup service is responsible for periodically pushing the chunks to S3.

TODO: How and when do we restore from S3? Whenever we get a GetPage@LSN request for a chunk we don't currently have? Or when an external Control Plane tells us?

TODO: Sharding
--------------------

We should be able to run multiple Page Servers that handle sharded data.
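The Branch section above says that reading a snapshot requires traversing a timeline and its ancestors. Below is a minimal sketch of that lookup. It assumes each timeline stores only full page images (no WAL records) and remembers the LSN at which it was branched off.

If no page version at or below the requested LSN exists on the timeline itself, the search continues in the ancestor. The search there is capped at the branch point, so that later changes on the ancestor stay invisible to the child. `Timeline`, `ancestor_lsn`, `put_page` and `branch` are hypothetical names chosen for illustration.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for BufferTag: (relation, block number).
BufferTag = Tuple[str, int]


@dataclass
class Timeline:
    # Page images written on this timeline: tag -> [(lsn, image), ...] sorted by LSN.
    pages: Dict[BufferTag, List[Tuple[int, bytes]]] = field(default_factory=dict)
    ancestor: Optional["Timeline"] = None
    ancestor_lsn: int = 0  # LSN at which this timeline was branched off its ancestor

    def put_page(self, tag: BufferTag, lsn: int, image: bytes) -> None:
        bisect.insort(self.pages.setdefault(tag, []), (lsn, image))

    def branch(self, lsn: int) -> "Timeline":
        # A new child timeline that sees this timeline as of `lsn`.
        return Timeline(ancestor=self, ancestor_lsn=lsn)

    def get_page_at_lsn(self, tag: BufferTag, lsn: int) -> bytes:
        requested = lsn
        timeline: Optional[Timeline] = self
        while timeline is not None:
            versions = timeline.pages.get(tag, [])
            # Number of versions on this timeline with LSN <= lsn.
            i = bisect.bisect_right([v_lsn for v_lsn, _ in versions], lsn)
            if i > 0:
                return versions[i - 1][1]  # newest visible version
            # Not here: look in the ancestor, but no later than the branch point.
            lsn = min(lsn, timeline.ancestor_lsn)
            timeline = timeline.ancestor
        raise KeyError(f"page {tag} not found at LSN {requested}")


main = Timeline()
tag = ("rel_16384", 0)
main.put_page(tag, 100, b"v1")
main.put_page(tag, 200, b"v2")
child = main.branch(150)                # branch created at LSN 150
print(child.get_page_at_lsn(tag, 300))  # b'v1': main's LSN 200 write is after the branch point
child.put_page(tag, 250, b"child")
print(child.get_page_at_lsn(tag, 300))  # b'child'
print(main.get_page_at_lsn(tag, 300))   # b'v2'
```

The cost of a lookup in this sketch grows with the depth of the branch chain, since a miss on each timeline falls through to its ancestor.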
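The Garbage Collection / Compaction thread is still a TODO, and the text does not define when a page version is "no longer needed". The sketch below assumes a hypothetical `horizon` LSN below which no reader will request pages.

All versions of one page at or below the horizon are collapsed into a single page image by applying the pending WAL records. Newer versions are kept untouched. The sketch ignores branches, whose ancestors would also have to keep the versions visible at each branch point. The `(kind, data)` entry encoding and the `redo` callback are illustrative assumptions.

```python
from typing import Callable, List, Tuple

# Hypothetical encoding of one page's history: (lsn, (kind, data)), kind "image" or "wal".
Version = Tuple[int, Tuple[str, bytes]]


def compact_page(versions: List[Version], horizon: int,
                 redo: Callable[[bytes, List[bytes]], bytes]) -> List[Version]:
    """Collapse every version at or below `horizon` into one materialized page image.

    `versions` must be sorted by LSN, and the first one must be a page image.
    `horizon` is an assumed cut-off: no GetPage@LSN request will ask for an older LSN.
    """
    old = [v for v in versions if v[0] <= horizon]
    newer = [v for v in versions if v[0] > horizon]
    if not old:
        return list(versions)  # nothing is old enough to collect
    # Index of the latest page image at or below the horizon.
    base = max(i for i, (_, (kind, _)) in enumerate(old) if kind == "image")
    image = old[base][1][1]
    # WAL records that still have to be applied on top of that image.
    pending = [data for _, (_, data) in old[base + 1:]]
    # Keep the materialized page at the LSN of the last collapsed version.
    return [(old[-1][0], ("image", redo(image, pending)))] + newer


def toy_redo(image: bytes, records: List[bytes]) -> bytes:
    return image + b"".join(records)  # stand-in for the WAL redo service


history = [(100, ("image", b"A")), (110, ("wal", b"b")),
           (120, ("wal", b"c")), (130, ("wal", b"d"))]
print(compact_page(history, 125, toy_redo))
# [(120, ('image', b'Abc')), (130, ('wal', b'd'))]
```

Any GetPage@LSN request at or above the horizon returns the same page before and after compaction. Only the history below the horizon is given up.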