Files
neon/pageserver
Dmitry Rodionov ce5333656f Introduce authentication v0.1.
Current state with authentication.
Page server validates JWT token passed as a password during connection
phase and later when performing an action such as create branch tenant
parameter of an operation is validated to match one submitted in token.
To allow access from console there is dedicated scope: PageServerApi,
this scope allows access to all tenants. See code for access validation in:
PageServerHandler::check_permission.

Because we are in progress of refactoring of communication layer
involving wal proposer protocol, and safekeeper<->pageserver. Safekeeper
now doesn’t check token passed from compute, and uses “hardcoded” token
passed via environment variable to communicate with pageserver.

Compute postgres now takes token from environment variable and passes it
as a password field in pageserver connection. It is not passed through
settings because then user will be able to retrieve it using pg_settings
or SHOW ..

I’ve added basic test in test_auth.py. Probably after we add
authentication to remaining network paths we should enable it by default
and switch all existing tests to use it.
2021-08-11 20:05:54 +03:00
..
2021-08-11 20:05:54 +03:00

## Page server architecture

The Page Server is responsible for all operations on a number of
"chunks" of relation data. A chunk corresponds to a PostgreSQL
relation segment (i.e. one max. 1 GB file in the data directory), but
it holds all the different versions of every page in the segment that
are still needed by the system.

Currently we do not specifically organize data in chunks.
All page images and corresponding WAL records are stored as entries in a key-value storage,
where StorageKey is a zenith_timeline_id + BufferTag + LSN.


The Page Server has a few different duties:

- Respond to GetPage@LSN requests from the Compute Nodes
- Receive WAL from WAL safekeeper
- Replay WAL that's applicable to the chunks that the Page Server maintains
- Backup to S3


The Page Server consists of multiple threads that operate on a shared
cache of page versions:


                                           | WAL
                                           V
                                   +--------------+
                                   |              |
                                   | WAL receiver |
                                   |              |
                                   +--------------+
                                                                                 +----+
                  +---------+                              ..........            |    |
                  |         |                              .        .            |    |
 GetPage@LSN      |         |                              . backup .  ------->  | S3 |
------------->    |  Page   |         page cache           .        .            |    |
                  | Service |                              ..........            |    |
   page           |         |                                                    +----+
<-------------    |         |
                  +---------+

                             ...................................
                             .                                 .
                             . Garbage Collection / Compaction .
                             ...................................

Legend:

+--+
|  |   A thread or multi-threaded service
+--+

....
.  .   Component that we will need, but doesn't exist at the moment. A TODO.
....

--->   Data flow
<---


Page Service
------------

The Page Service listens for GetPage@LSN requests from the Compute Nodes,
and responds with pages from the page cache.


WAL Receiver
------------

The WAL receiver connects to the external WAL safekeeping service (or
directly to the primary) using PostgreSQL physical streaming
replication, and continuously receives WAL. It decodes the WAL records,
and stores them to the page cache.


Page Cache
----------

The Page Cache is a switchboard to access different Repositories.

#### Repository
Repository corresponds to one .zenith directory.
Repository is needed to manage Timelines.

#### Timeline
Timeline is a page cache workhorse that accepts page changes
and serves get_page_at_lsn() and get_rel_size() requests.
Note: this has nothing to do with PostgreSQL WAL timeline.

#### Branch
We can create branch at certain LSN.
Each Branch lives in a corresponding timeline and has an ancestor.

To get full snapshot of data at certain moment we need to traverse timeline and its ancestors.

#### ObjectRepository
ObjectRepository implements Repository and has associated ObjectStore and WAL redo service.

#### ObjectStore
ObjectStore is an interface for key-value store for page images and wal records.
Currently it has one implementation - RocksDB.

#### WAL redo service
WAL redo service - service that runs PostgreSQL in a special wal_redo mode
to apply given WAL records over an old page image and return new page image.


TODO: Garbage Collection / Compaction
-------------------------------------

Periodically, the Garbage Collection / Compaction thread runs
and applies pending WAL records, and removes old page versions that
are no longer needed.


TODO: Backup service
--------------------

The backup service is responsible for periodically pushing the chunks to S3.

TODO: How/when do restore from S3? Whenever we get a GetPage@LSN request for
a chunk we don't currently have? Or when an external Control Plane tells us?

TODO: Sharding
--------------------

We should be able to run multiple Page Servers that handle sharded data.