This replaces the page server's "datadir" concept. The Page Server now always
works with a "Zenith Repository". When you initialize a new repository with
"zenith init", it runs initdb and loads an initial base backup of the
freshly-created cluster into the repository, on the "main" branch. A repository
can hold multiple "timelines", which can be given human-friendly names, making
them "branches". One page server simultaneously serves all timelines stored in
the repository, and you can have multiple Postgres compute nodes connected to
the page server, as long as they each operate on a different timeline. There is
a new command "zenith branch", which can be used to fork off new branches from
existing branches.

The repository uses the directory layout described as Repository format v1 in
https://github.com/zenithdb/rfcs/pull/5. It is *highly* inefficient:

- We never create new snapshots, so in practice it's really just a base backup
  of the initial empty cluster, and everything else is reconstructed by redoing
  all the WAL.
- When you create a new timeline, the base snapshot and *all* the WAL are copied
  from the old timeline to the new one. There are no smarts about referencing
  the old snapshots/WAL from the ancestor timeline.

To support all this, this commit includes a bunch of other changes:

- Implement "basebackup" functionality in the page server. When you initialize
  a new compute node with "zenith pg create", it connects to the page server
  and requests a base backup of the Postgres data directory on that timeline.
  (The base backup excludes user tables, so it's not as bad as it sounds.)
- Have the page server's WAL receiver write the WAL into the timeline directory.
  This allows running a Page Server and Compute Nodes without a WAL safekeeper,
  until we get around to integrating that properly into the system. (Even after
  we integrate the WAL safekeeper, this is perhaps how things will operate when
  you want to run the system on your laptop.)
- restore_datadir.rs was renamed to restore_local_repo.rs, and heavily modified
  to use the new format. It now also restores all WAL.
- The page server no longer scans and restores everything into memory at
  startup. Instead, when the first request is made for a timeline, the timeline
  is slurped into memory at that point.
- The responsibility for telling the page server to "callmemaybe" was moved into
  the Postgres libpqpagestore code. Also, the WAL producer connstring can no
  longer be specified on the pageserver's command line.
- Having multiple "system identifiers" in the same page server is no longer
  supported. I repurposed much of that code to support multiple timelines
  instead.
- Implemented very basic, incomplete support for PostgreSQL's Extended Query
  Protocol in page_service.rs. It turns out that rust-postgres' copy_out()
  function always uses the extended query protocol to send the command, and
  I'm using that to stream the base backup from the page server.

TODO: I haven't fixed the WAL safekeeper for this scheme, so all the integration
tests involving safekeepers are failing. My plan is to modify the safekeeper to
know about Zenith timelines too, and make it work with the same Zenith
repository format. It only needs to care about the
'.zenith/timelines/<timeline>/wal' directories.
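As an example of the intended workflow, using the commands mentioned above (the
exact argument syntax is illustrative, not verbatim):

    # Sketch of the intended workflow; argument syntax is illustrative.
    zenith init                   # run initdb and load a base backup of the new
                                  # cluster into the repository, on the "main" branch
    zenith branch experiment      # fork a new "experiment" branch off an existing branch
    zenith pg create experiment   # create a compute node on that timeline; it requests
                                  # a base backup from the page server
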
Page Server
===========
How to test
-----------
1. Compile and install Postgres from this repository (there are
   modifications, so vanilla Postgres won't do)

       ./configure --prefix=/home/heikki/zenith-install
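   followed by the usual build and install steps (assuming a standard GNU make
   build of this Postgres tree):

       make
       make install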
2. Compile the page server

       cd pageserver
       cargo build
3. Create another "dummy" cluster that will be used by the page server when it
   applies WAL records. (We shouldn't really need this; getting rid of it is a TODO.)

       /home/heikki/zenith-install/bin/initdb -D /data/zenith-dummy
4. Initialize and start a new Postgres cluster

       /home/heikki/zenith-install/bin/initdb -D /data/zenith-test-db --username=postgres
       /home/heikki/zenith-install/bin/postgres -D /data/zenith-test-db
5. In another terminal, start the page server.

       PGDATA=/data/zenith-dummy PATH=/home/heikki/zenith-install/bin:$PATH ./target/debug/pageserver

   It should connect to the postgres instance using streaming replication, and
   print something like this:

       $ PGDATA=/data/zenith-dummy PATH=/home/heikki/zenith-install/bin:$PATH ./target/debug/pageserver
       Starting WAL receiver
       connecting...
       Starting page server on 127.0.0.1:5430
       connected!
       page cache is empty
6. You can now open another terminal and issue SQL commands. The generated WAL
   records will be streamed to the page server, and attached to the blocks they
   apply to in its page cache.

       $ psql postgres -U postgres
       psql (14devel)
       Type "help" for help.

       postgres=# create table mydata (i int4);
       CREATE TABLE
       postgres=# insert into mydata select g from generate_series(1,100) g;
       INSERT 0 100
       postgres=#
7. The GetPage@LSN interface to the compute nodes isn't working yet, but to
   simulate it, the page server generates a test GetPage@LSN call every 5 seconds
   on a random block that's in the page cache. After a few seconds, you should
   see output from that:

       testing GetPage@LSN for block 0
       WAL record at LSN 23584576 initializes the page
       2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167DF40
       2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167DF80
       2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167DFC0
       2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167E018
       2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167E058
       2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167E098
       2021-03-19 11:03:13.791 EET [11439] LOG: applied WAL record at 0/167E0D8
       2021-03-19 11:03:13.792 EET [11439] LOG: applied WAL record at 0/167E118
       2021-03-19 11:03:13.792 EET [11439] LOG: applied WAL record at 0/167E158
       2021-03-19 11:03:13.792 EET [11439] LOG: applied WAL record at 0/167E198
       applied 10 WAL records to produce page image at LSN 18446744073709547246
Architecture
============
The Page Server is responsible for all operations on a number of
"chunks" of relation data. A chunk corresponds to a PostgreSQL
relation segment (i.e. one file of at most 1 GB in the data directory), but
it holds all the different versions of every page in the segment that
are still needed by the system.
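As a rough illustration (assuming PostgreSQL's default 8 KB block size), a 1 GB
segment holds 131072 pages, so the chunk a block belongs to follows directly
from its block number:

    # Illustrative arithmetic only, assuming 8 KB pages and 1 GB segments
    # (PostgreSQL defaults): 1 GB / 8 KB = 131072 blocks per chunk.
    BLOCKS_PER_CHUNK=$(( (1024 * 1024 * 1024) / 8192 ))
    blockno=200000
    echo $(( blockno / BLOCKS_PER_CHUNK ))    # -> 1, i.e. block 200000 lives in chunk 1
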
Determining which chunks each Page Server holds is handled elsewhere. (TODO:
currently, there is only one Page Server, and it holds all chunks.)
The Page Server has a few different duties:

- Respond to GetPage@LSN requests from the Compute Nodes
- Receive WAL from the WAL safekeeper
- Replay the WAL that applies to the chunks that the Page Server maintains
- Back up to S3
The Page Server consists of multiple threads that operate on a shared
cache of page versions:
                                   | WAL
                                   V
                            +--------------+
                            |              |
                            | WAL receiver |
                            |              |
                            +--------------+
                                                              +----+
                +---------+                ..........         |    |
                |         |                .        .         |    |
 GetPage@LSN    |         |                . backup . ------> | S3 |
--------------> |  Page   |   page cache   .        .         |    |
                | Service |                ..........         |    |
    page        |         |                                   +----+
<-------------- |         |
                +---------+

                  ...................................
                  .                                 .
                  . Garbage Collection / Compaction .
                  ...................................

Legend:

+--+
|  |   A thread or multi-threaded service
+--+

....
.  .   Component that we will need, but doesn't exist at the moment. A TODO.
....

--->   Data flow
<---
Page Service
------------
The Page Service listens for GetPage@LSN requests from the Compute Nodes,
and responds with pages from the page cache.
WAL Receiver
------------
The WAL receiver connects to the external WAL safekeeping service (or
directly to the primary) using PostgreSQL physical streaming
replication, and continuously receives WAL. It decodes the WAL records
and stores them in the page cache.
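Conceptually this is the same kind of connection a physical standby or
pg_receivewal makes; for comparison only (this is not part of the page server
itself, and the connection parameters are placeholders):

    # For comparison: a plain physical-replication WAL consumer using a
    # stock PostgreSQL tool.
    pg_receivewal -h 127.0.0.1 -p 5432 -U postgres -D /tmp/waldir
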
Page Cache
----------
The Page Cache is a data structure that holds all the different page versions.
It is accessed by all the other threads to perform their duties.

Currently, the page cache is implemented fully in memory. TODO: store it
on disk and define a file format.
TODO: Garbage Collection / Compaction
-------------------------------------
Periodically, the Garbage Collection / Compaction thread runs, applies
pending WAL records, and removes old page versions that are no longer needed.
TODO: Backup service
--------------------
The backup service is responsible for periodically pushing the chunks to S3.
TODO: How/when do we restore from S3? Whenever we get a GetPage@LSN request for
a chunk we don't currently have? Or when an external Control Plane tells us?