Page Server
===========


How to test
-----------


1. Compile and install Postgres from this repository (there are
   modifications, so vanilla Postgres won't do):

    ./configure --prefix=/home/heikki/zenith-install
    make
    make install

2. Compile the page server

    cd pageserver
    cargo build

3. Create a "dummy" cluster that the page server will use when it applies
   the WAL records (we shouldn't really need this; getting rid of it is a TODO):

    /home/heikki/zenith-install/bin/initdb -D /data/zenith-dummy


4. Initialize and start a new postgres cluster

    /home/heikki/zenith-install/bin/initdb -D /data/zenith-test-db --username=postgres
    /home/heikki/zenith-install/bin/postgres -D /data/zenith-test-db

5. In another terminal, start the page server.

    PGDATA=/data/zenith-dummy PATH=/home/heikki/zenith-install/bin:$PATH ./target/debug/pageserver

   It should connect to the postgres instance using streaming replication, and print something
   like this:

    $ PGDATA=/data/zenith-dummy PATH=/home/heikki/zenith-install/bin:$PATH ./target/debug/pageserver
    Starting WAL receiver
    connecting...
    Starting page server on 127.0.0.1:5430
    connected!
    page cache is empty

6. You can now open another terminal and issue SQL commands. The generated
   WAL records will be streamed to the page server and attached to the
   blocks they apply to in its page cache:

    $ psql postgres -U postgres
    psql (14devel)
    Type "help" for help.
    
    postgres=# create table mydata (i int4);
    CREATE TABLE
    postgres=# insert into mydata select g from generate_series(1,100) g;
    INSERT 0 100
    postgres=# 

7. The GetPage@LSN interface to the compute nodes isn't working yet. To simulate
   it, the page server issues a test GetPage@LSN call every 5 seconds, on a random
   block that's in the page cache. After a few seconds, you should see output like
   this:

    testing GetPage@LSN for block 0
    WAL record at LSN 23584576 initializes the page
    2021-03-19 11:03:13.791 EET [11439] LOG:  applied WAL record at 0/167DF40
    2021-03-19 11:03:13.791 EET [11439] LOG:  applied WAL record at 0/167DF80
    2021-03-19 11:03:13.791 EET [11439] LOG:  applied WAL record at 0/167DFC0
    2021-03-19 11:03:13.791 EET [11439] LOG:  applied WAL record at 0/167E018
    2021-03-19 11:03:13.791 EET [11439] LOG:  applied WAL record at 0/167E058
    2021-03-19 11:03:13.791 EET [11439] LOG:  applied WAL record at 0/167E098
    2021-03-19 11:03:13.791 EET [11439] LOG:  applied WAL record at 0/167E0D8
    2021-03-19 11:03:13.792 EET [11439] LOG:  applied WAL record at 0/167E118
    2021-03-19 11:03:13.792 EET [11439] LOG:  applied WAL record at 0/167E158
    2021-03-19 11:03:13.792 EET [11439] LOG:  applied WAL record at 0/167E198
    applied 10 WAL records to produce page image at LSN 18446744073709547246



Architecture
============

The Page Server is responsible for all operations on a number of
"chunks" of relation data. A chunk corresponds to a PostgreSQL
relation segment (a single file of up to 1 GB in the data directory),
but it holds all the different versions of every page in the segment
that are still needed by the system.
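
To make the mapping concrete, here is a minimal sketch in Rust of how a
chunk could be identified. The names RelTag and ChunkTag are illustrative
only, not the actual pageserver types:

    /// Identifies a PostgreSQL relation, like PostgreSQL's RelFileNode.
    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    struct RelTag {
        spcnode: u32, // tablespace OID
        dbnode: u32,  // database OID
        relnode: u32, // relation file node
    }

    /// Identifies one chunk: one segment (<= 1 GB) of one relation.
    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    struct ChunkTag {
        rel: RelTag,
        segno: u32, // segment number within the relation
    }

    // With 8 KB pages, a full 1 GB segment holds 131072 pages.
    const BLOCKS_PER_SEGMENT: u32 = (1024 * 1024 * 1024) / 8192;

    /// Map a block number within a relation to the chunk that holds it,
    /// and to the block's offset within that chunk.
    fn chunk_of_block(rel: RelTag, blkno: u32) -> (ChunkTag, u32) {
        (
            ChunkTag { rel, segno: blkno / BLOCKS_PER_SEGMENT },
            blkno % BLOCKS_PER_SEGMENT,
        )
    }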

Determining which chunks each Page Server holds is handled elsewhere. (TODO:
currently there is only one Page Server, which holds all chunks.)

The Page Server has a few different duties:

- Respond to GetPage@LSN requests from the Compute Nodes
- Receive WAL from the WAL safekeeper
- Replay WAL that's applicable to the chunks that the Page Server maintains
- Backup to S3


The Page Server consists of multiple threads that operate on a shared
cache of page versions:


                                           | WAL
                                           V
                                   +--------------+
                                   |              |
                                   | WAL receiver |
                                   |              |
                                   +--------------+
                                                                                 +----+
                  +---------+                              ..........            |    |
                  |         |                              .        .            |    |
 GetPage@LSN      |         |                              . backup .  ------->  | S3 |
------------->    |  Page   |         page cache           .        .            |    |
                  | Service |                              ..........            |    |
   page           |         |                                                    +----+
<-------------    |         |
                  +---------+

                             ...................................
                             .                                 .
                             . Garbage Collection / Compaction .
                             ...................................

Legend:

+--+
|  |   A thread or multi-threaded service
+--+

....
.  .   Component that we will need, but doesn't exist at the moment. A TODO.
....

--->   Data flow
<---


Page Service
------------

The Page Service listens for GetPage@LSN requests from the Compute Nodes,
and responds with pages from the page cache.
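
As a rough sketch of what serving such a request involves; the types below,
like GetPageRequest, are hypothetical, and the actual protocol is still
being worked out:

    type Lsn = u64;
    const PAGE_SIZE: usize = 8192;

    /// A GetPage@LSN request: which block, and at which WAL position.
    struct GetPageRequest {
        spcnode: u32,
        dbnode: u32,
        relnode: u32,
        blkno: u32,
        lsn: Lsn,
    }

    trait PageCache {
        /// Materialize the page image as of the request's LSN.
        fn get_page_at_lsn(&self, req: &GetPageRequest) -> Option<[u8; PAGE_SIZE]>;
        /// The last LSN the WAL receiver has ingested into the cache.
        fn last_received_lsn(&self) -> Lsn;
    }

    /// Serve one request: wait until the WAL receiver has caught up with
    /// the requested LSN, then fetch the page from the cache.
    fn serve_get_page(
        cache: &dyn PageCache,
        req: &GetPageRequest,
    ) -> Option<[u8; PAGE_SIZE]> {
        while cache.last_received_lsn() < req.lsn {
            std::thread::sleep(std::time::Duration::from_millis(1));
        }
        cache.get_page_at_lsn(req)
    }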


WAL Receiver
------------

The WAL receiver connects to the external WAL safekeeping service (or
directly to the primary) using PostgreSQL physical streaming
replication, and continuously receives WAL. It decodes the WAL records
and stores them in the page cache.
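
Conceptually, the ingest path looks something like the following sketch
(DecodedWalRecord and PageVersionStore are made-up names for illustration):

    type Lsn = u64;

    /// Key for one block: (spcnode, dbnode, relnode, blkno).
    type BlockKey = (u32, u32, u32, u32);

    /// A WAL record after decoding, with the blocks it touches extracted
    /// from its header.
    struct DecodedWalRecord {
        lsn: Lsn,
        blocks: Vec<BlockKey>,
        rec: Vec<u8>, // raw record bytes, kept for replay at GetPage@LSN time
    }

    trait PageVersionStore {
        /// Attach a WAL record to one block's chain of page versions.
        fn put_wal_record(&mut self, block: BlockKey, lsn: Lsn, rec: &[u8]);
    }

    /// Store each decoded record under every block it applies to, so that
    /// a later GetPage@LSN only has to replay the records for that one page.
    fn ingest(store: &mut dyn PageVersionStore, record: &DecodedWalRecord) {
        for block in &record.blocks {
            store.put_wal_record(*block, record.lsn, &record.rec);
        }
    }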


Page Cache
----------

The Page Cache is the data structure that holds all the different page
versions. All the other threads access it to perform their duties.

Currently, the page cache is kept entirely in memory. TODO: Store it
on disk. Define a file format.
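
For illustration only, here is a minimal in-memory sketch of the idea
(hypothetical types, not the real code): each page version is either a full
page image or a WAL record, kept in an ordered map keyed by (block, LSN).
GetPage@LSN scans backwards from the requested LSN to the newest full
image, then replays the newer WAL records on top of it:

    use std::collections::BTreeMap;

    type Lsn = u64;
    type BlockKey = (u32, u32, u32, u32); // (spcnode, dbnode, relnode, blkno)

    enum PageVersion {
        Image(Vec<u8>),     // a materialized 8 KB page image
        WalRecord(Vec<u8>), // a delta: a WAL record to replay on older state
    }

    #[derive(Default)]
    struct PageCache {
        versions: BTreeMap<(BlockKey, Lsn), PageVersion>,
    }

    impl PageCache {
        fn put(&mut self, block: BlockKey, lsn: Lsn, version: PageVersion) {
            self.versions.insert((block, lsn), version);
        }

        /// Reconstruct the page as of `lsn`: find the newest full image at
        /// or below `lsn`, then replay the newer WAL records on top of it.
        fn get_page_at_lsn(&self, block: BlockKey, lsn: Lsn) -> Option<Vec<u8>> {
            let mut deltas = Vec::new();
            let mut base = None;
            // Walk this block's versions from newest (<= lsn) to oldest.
            for (_, version) in self.versions.range((block, 0)..=(block, lsn)).rev() {
                match version {
                    PageVersion::Image(img) => {
                        base = Some(img.clone());
                        break;
                    }
                    PageVersion::WalRecord(rec) => deltas.push(rec.clone()),
                }
            }
            // (WAL records that initialize a page from scratch, like the one
            // in the example output above, need no base image; that case is
            // omitted here for brevity.)
            let mut page = base?;
            for rec in deltas.iter().rev() {
                apply_wal_record(&mut page, rec); // oldest first
            }
            Some(page)
        }
    }

    /// Placeholder: real redo is performed by PostgreSQL's redo routines
    /// (that is what the "dummy" cluster in the test setup is for).
    fn apply_wal_record(_page: &mut [u8], _rec: &[u8]) {}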


TODO: Garbage Collection / Compaction
-------------------------------------

Periodically, the Garbage Collection / Compaction thread runs: it
applies pending WAL records and removes old page versions that are no
longer needed.
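
A sketch of the retention rule, under the assumption that compaction has
first replayed pending WAL to materialize full page images: for each block,
keep the newest full image below the GC horizon (it is the base needed to
reconstruct any page at or after the horizon) plus everything newer, and
drop the rest:

    type Lsn = u64;

    /// One block's version chain as (lsn, is_full_image) pairs, oldest
    /// first. Drop every version older than the newest full image that is
    /// still below the horizon: nothing can ever need it again.
    fn garbage_collect(versions: &mut Vec<(Lsn, bool)>, horizon: Lsn) {
        if let Some(base) = versions
            .iter()
            .rposition(|&(lsn, is_image)| is_image && lsn < horizon)
        {
            versions.drain(..base);
        }
    }

    fn main() {
        // Compaction has materialized a full image at LSN 200; with a GC
        // horizon of 250, the versions at 100 and 150 are dead weight.
        let mut chain = vec![(100, true), (150, false), (200, true), (300, false)];
        garbage_collect(&mut chain, 250);
        assert_eq!(chain, vec![(200, true), (300, false)]);
    }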


TODO: Backup service
--------------------

The backup service is responsible for periodically pushing the chunks to S3.

TODO: How/when do we restore from S3? Whenever we get a GetPage@LSN request
for a chunk we don't currently have? Or when an external Control Plane tells us?