Page Server =========== How to test ----------- 1. Compile and install Postgres from this repository (there are modifications, so vanilla Postgres won't do) ./configure --prefix=/home/heikki/zenith-install 2. Compile the page server cd pageserver cargo build 3. Create another "dummy" cluster that will be used by the page server when it applies the WAL records. (shouldn't really need this, getting rid of it is a TODO): /home/heikki/zenith-install/bin/initdb -D /data/zenith-dummy 4. Initialize and start a new postgres cluster /home/heikki/zenith-install/bin/initdb -D /data/zenith-test-db --username=postgres /home/heikki/zenith-install/bin/postgres -D /data/zenith-test-db 5. In another terminal, start the page server. PGDATA=/data/zenith-dummy PATH=/home/heikki/zenith-install/bin:$PATH ./target/debug/pageserver It should connect to the postgres instance using streaming replication, and print something like this: $ PGDATA=/data/zenith-dummy PATH=/home/heikki/zenith-install/bin:$PATH ./target/debug/pageserver Starting WAL receiver connecting... Starting page server on 127.0.0.1:5430 connected! page cache is empty 6. You can now open another terminal and issue DDL commands. Generated WAL records will be streamed to the page servers, and attached to blocks that they apply to in its page cache $ psql postgres -U postgres psql (14devel) Type "help" for help. postgres=# create table mydata (i int4); CREATE TABLE postgres=# insert into mydata select g from generate_series(1,100) g; INSERT 0 100 postgres=# 7. The GetPage@LSN interface to the compute nodes isn't working yet, but to simulate that, the page server generates a test GetPage@LSN call every 5 seconds on a random block that's in the page cache. In a few seconds, you should see output from that: testing GetPage@LSN for block 0 WAL record at LSN 23584576 initializes the page 2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167DF40 2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167DF80 2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167DFC0 2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167E018 2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167E058 2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167E098 2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167E0D8 2021-03-19 11:03:13.792 EET [11439] LOG: applied WAL record at 0/167E118 2021-03-19 11:03:13.792 EET [11439] LOG: applied WAL record at 0/167E158 2021-03-19 11:03:13.792 EET [11439] LOG: applied WAL record at 0/167E198 applied 10 WAL records to produce page image at LSN 18446744073709547246 Architecture ============ The Page Server is responsible for all operations on a number of "chunks" of relation data. A chunk corresponds to a PostgreSQL relation segment (i.e. one max. 1 GB file in the data directory), but it holds all the different versions of every page in the segment that are still needed by the system. Determining which chunk each Page Server holds is handled elsewhere. (TODO: currently, there is only one Page Server which holds all chunks) The Page Server has a few different duties: - Respond to GetPage@LSN requests from the Compute Nodes - Receive WAL from WAL safekeeper - Replay WAL that's applicable to the chunks that the Page Server maintains - Backup to S3 The Page Server consists of multiple threads that operate on a shared cache of page versions: | WAL V +--------------+ | | | WAL receiver | | | +--------------+ +----+ +---------+ .......... | | | | . . | | GetPage@LSN | | . backup . -------> | S3 | -------------> | Page | page cache . . | | | Service | .......... | | page | | +----+ <------------- | | +---------+ ................................... . . . Garbage Collection / Compaction . ................................... Legend: +--+ | | A thread or multi-threaded service +--+ .... . . Component that we will need, but doesn't exist at the moment. A TODO. .... ---> Data flow <--- Page Service ------------ The Page Service listens for GetPage@LSN requests from the Compute Nodes, and responds with pages from the page cache. WAL Receiver ------------ The WAL receiver connects to the external WAL safekeeping service (or directly to the primary) using PostgreSQL physical streaming replication, and continuously receives WAL. It decodes the WAL records, and stores them to the page cache. Page Cache ---------- The Page Cache is a data structure, to hold all the different page versions. It is accessed by all the other threads, to perform their duties. Currently, the page cache is implemented fully in-memory. TODO: Store it on disk. Define a file format. TODO: Garbage Collection / Compaction ------------------------------------- Periodically, the Garbage Collection / Compaction thread runs and applies pending WAL records, and removes old page versions that are no longer needed. TODO: Backup service -------------------- The backup service is responsible for periodically pushing the chunks to S3. TODO: How/when do restore from S3? Whenever we get a GetPage@LSN request for a chunk we don't currently have? Or when an external Control Plane tells us?