Misc doc updates

Heikki Linnakangas
2021-08-30 11:42:25 +03:00
parent c5fc4e6905
commit a3f3d46016
4 changed files with 70 additions and 53 deletions

@@ -4,7 +4,9 @@
 - [authentication.md](authentication.md) — pageserver JWT authentication.
 - [docker.md](docker.md) — Docker images and building pipeline.
+- [glossary.md](glossary.md) — Glossary of all the terms used in codebase.
 - [multitenancy.md](multitenancy.md) — how multitenancy is organized in the pageserver and Zenith CLI.
+- [sourcetree.md](sourcetree.md) — Overview of the source tree layout.
 - [pageserver/README](/pageserver/README) — pageserver overview.
 - [postgres_ffi/README](/postgres_ffi/README) — Postgres FFI overview.
 - [test_runner/README.md](/test_runner/README.md) — tests infrastructure overview.

@@ -26,8 +26,8 @@ A checkpoint record in the WAL marks a point in the WAL sequence at which it is
 NOTE: This is an overloaded term.
 Whenever enough WAL has been accumulated in memory, the page server []
-writes out the changes in memory into new snapshot files[]. This process
-is called "checkpointing". The page server only creates snapshot files for
+writes out the changes in memory into new layer files[]. This process
+is called "checkpointing". The page server only creates layer files for
 relations that have been modified since the last checkpoint.
 ### Compute node
@@ -41,6 +41,15 @@ Stateless Postgres node that stores data in pageserver.
 Each of the separate segmented file sets in which a relation is stored. The main fork is where the actual data resides. There also exist two secondary forks for metadata: the free space map and the visibility map.
 Each PostgreSQL fork is considered a separate relish.
+### Layer file
+Layered repository on-disk format is based on immutable files. The
+files are called "layer files". Each file corresponds to one 10 MB
+segment of a PostgreSQL relation fork. There are two kinds of layer
+files: image files and delta files. An image file contains a
+"snapshot" of the segment at a particular LSN, and a delta file
+contains WAL records applicable to the segment, in a range of LSNs.
 ### Layered repository
 ### LSN
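
To make the "Layer file" glossary entry added in this hunk more concrete, here is a minimal Rust sketch of the two layer kinds and of how a block number could map to a 10 MB segment. All names (`LayerKind`, `segment_of_block`, `covers`) are hypothetical and not taken from the Zenith sources. The sketch assumes PostgreSQL's default 8 KB block size and treats a delta file's LSN range as half-open, which the glossary does not specify.

```rust
/// Log sequence number, modelled as a plain u64 (assumption for this sketch).
type Lsn = u64;

/// PostgreSQL's default block size in bytes.
const BLOCK_SIZE: u64 = 8192;
/// Per the glossary entry, one layer file covers a 10 MB segment of a relation fork.
const SEGMENT_SIZE: u64 = 10 * 1024 * 1024;
/// Blocks per segment under the two constants above.
const BLOCKS_PER_SEGMENT: u64 = SEGMENT_SIZE / BLOCK_SIZE;

/// The two kinds of layer files named in the glossary (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum LayerKind {
    /// A "snapshot" of the segment at one particular LSN.
    Image { lsn: Lsn },
    /// WAL records applicable to the segment within a range of LSNs.
    /// The half-open interpretation [start_lsn, end_lsn) is an assumption.
    Delta { start_lsn: Lsn, end_lsn: Lsn },
}

/// Segment number that a given block of a relation fork falls into.
fn segment_of_block(blkno: u64) -> u64 {
    blkno / BLOCKS_PER_SEGMENT
}

impl LayerKind {
    /// Whether this layer file describes the segment at exactly `lsn`.
    fn covers(&self, lsn: Lsn) -> bool {
        match *self {
            LayerKind::Image { lsn: image_lsn } => lsn == image_lsn,
            LayerKind::Delta { start_lsn, end_lsn } => start_lsn <= lsn && lsn < end_lsn,
        }
    }
}
```

Under these assumptions a segment holds 1280 blocks, so block 1279 still belongs to segment 0 and block 1280 starts segment 1.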
@@ -102,6 +111,10 @@ Repository stores multiple timelines, forked off from the same initial call to '
 and has associated WAL redo service.
 One repository corresponds to one Tenant.
+### Retention policy
+How much history do we need to keep around for PITR and read-only nodes?
 ### SLRU
 SLRUs include pg_clog, pg_multixact/members, and
@@ -110,16 +123,6 @@ they don't need to be stored permanently (e.g. pg_subtrans),
 or we do not support them in zenith yet (pg_commit_ts).
 Each SLRU segment is considered a separate relish[].
-### Snapshot file
-Layered repository on-disk format is based on immutable files.
-The files are called "snapshot files".
-Each snapshot file contains a full snapshot, that is, full copy of all
-pages in the relation, as of the "start LSN". It also contains all WAL
-records applicable to the relation between the start and end
-LSNs.
-Each snapshot file corresponds to one 10 MB slice of a PostgreSQL relation fork.
 ### Tenant (Multitenancy)
 Tenant represents a single customer, interacting with Zenith.
 Wal redo[] activity, timelines[], snapshots[] are managed for each tenant independently.

@@ -13,7 +13,7 @@ Intended to be used in integration tests and in CLI tools for local installation
 Documentation of the Zenith features and concepts.
 Now it is mostly dev documentation.
-`monitoring`:
+`/monitoring`:
 TODO
@@ -72,9 +72,9 @@ The workspace_hack crate exists only to pin down some dependencies.
 Main entry point for the 'zenith' CLI utility.
 TODO: Doesn't it belong to control_plane?
-`zenith_metrics`:
+`/zenith_metrics`:
-TODO
+Helpers for exposing Prometheus metrics from the server.
 `/zenith_utils`:

@@ -8,10 +8,11 @@ The Page Server has a few different duties:
 - Backup to S3
-The Page Server consists of multiple threads that operate on a shared
-cache of page versions:
+The Page Server consists of multiple threads that operate on a shared
+repository of page versions:
 | WAL
 V
 +--------------+
@@ -23,16 +24,14 @@ cache of page versions:
 +---------+ .......... | |
 | | . . | |
 GetPage@LSN | | . backup . -------> | S3 |
--------------> | Page | page cache . . | |
+-------------> | Page | repository . . | |
 | Service | .......... | |
 page | | +----+
 <------------- | |
-+---------+
-...................................
-. .
-. Garbage Collection / Compaction .
-...................................
++---------+ +--------------------+
+| Checkpointing / |
+| Garbage collection |
++--------------------+
 Legend:
@@ -52,7 +51,7 @@ Page Service
 ------------
 The Page Service listens for GetPage@LSN requests from the Compute Nodes,
-and responds with pages from the page cache.
+and responds with pages from the repository.
 WAL Receiver
@@ -61,46 +60,59 @@ WAL Receiver
 The WAL receiver connects to the external WAL safekeeping service (or
 directly to the primary) using PostgreSQL physical streaming
 replication, and continuously receives WAL. It decodes the WAL records,
-and stores them to the page cache repository.
+and stores them to the repository.
-Page Cache
+Repository
 ----------
-The Page Cache is a switchboard to access different Repositories.
-#### Repository
-Repository corresponds to one .zenith directory.
-Repository is needed to manage Timelines.
-Each repository has associated WAL redo service.
-There is currently only one implementation of the Repository trait,
-LayeredRepository, but it's still a useful abstraction that keeps the
+The repository stores all the page versions, or WAL records needed to
+reconstruct them. Each tenant has a separate Repository, which is
+stored in the .zenith/tenants/<tenantid> directory.
+Repository is an abstract trait, defined in `repository.rs`. It is
+implemented by the LayeredRepository object in
+`layered_repository.rs`. There is only that one implementation of the
+Repository trait, but it's still a useful abstraction that keeps the
 interface for the low-level storage functionality clean. The layered
 storage format is described in layered_repository/README.md.
-#### Timeline
-Timeline is a page cache workhorse that accepts page changes
-and serves get_page_at_lsn() and get_rel_size() requests.
-Note: this has nothing to do with PostgreSQL WAL timeline.
-#### Branch
-We can create branch at certain LSN.
-Each Branch lives in a corresponding timeline and has an ancestor.
-To get full snapshot of data at certain moment we need to traverse timeline and its ancestors.
-#### WAL redo service
-WAL redo service - service that runs PostgreSQL in a special wal_redo mode
-to apply given WAL records over an old page image and return new page image.
+Each repository consists of multiple Timelines. Timeline is a
+workhorse that accepts page changes from the WAL, and serves
+get_page_at_lsn() and get_rel_size() requests. Note: this has nothing
+to do with PostgreSQL WAL timeline. The term "timeline" is mostly
+interchangeable with "branch", there is a one-to-one mapping from
+branch to timeline. A timeline has a unique ID within the tenant,
+represented as 16-byte hex string that never changes, whereas a
+branch is a user-given name for a timeline.
+Each repository also has a WAL redo manager associated with it, see
+`walredo.rs`. The WAL redo manager is used to replay PostgreSQL WAL
+records, whenever we need to reconstruct a page version from WAL to
+satisfy a GetPage@LSN request, or to avoid accumulating too much WAL
+for a page. The WAL redo manager uses a Postgres process running in
+special zenith wal-redo mode to do the actual WAL redo, and
+communicates with the process using a pipe.
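
The reconstruction step described for GetPage@LSN (take an older page image, then replay the WAL records up to the requested LSN) can be sketched in Rust as follows. This is a toy model under stated assumptions, not the pageserver's implementation: `PageHistory`, `WalRecord`, and `apply_wal` are hypothetical names, and a one-byte patch stands in for real PostgreSQL redo, which the README says is delegated to a separate Postgres process over a pipe.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

type Lsn = u64;
type Page = Vec<u8>;

/// Toy WAL record: overwrite a single byte of the page.
/// A stand-in for real PostgreSQL WAL records (assumption for this sketch).
#[derive(Debug, Clone)]
struct WalRecord {
    offset: usize,
    value: u8,
}

/// Hypothetical stand-in for the WAL redo manager: applies records in order.
fn apply_wal(mut page: Page, records: &[&WalRecord]) -> Page {
    for rec in records {
        page[rec.offset] = rec.value;
    }
    page
}

/// Everything known about one page: full images and WAL records, keyed by LSN.
#[derive(Default)]
struct PageHistory {
    images: BTreeMap<Lsn, Page>,
    wal: BTreeMap<Lsn, WalRecord>,
}

impl PageHistory {
    /// Reconstruct the page as of `lsn`, or None if no image is old enough.
    fn get_page_at_lsn(&self, lsn: Lsn) -> Option<Page> {
        // Newest full image at or before the requested LSN.
        let (&base_lsn, base) = self.images.range(..=lsn).next_back()?;
        // WAL records after that image, up to and including the requested LSN.
        let records: Vec<&WalRecord> = self
            .wal
            .range((Bound::Excluded(base_lsn), Bound::Included(lsn)))
            .map(|(_, rec)| rec)
            .collect();
        Some(apply_wal(base.clone(), &records))
    }
}
```

Keying both maps by LSN keeps the lookup to one range scan per map. A request older than the oldest image returns `None` in this sketch instead of trying to go further back.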
-TODO: Garbage Collection / Compaction
--------------------------------------
-Periodically, the Garbage Collection / Compaction thread runs
-and applies pending WAL records, and removes old page versions that
-are no longer needed.
+Checkpointing / Garbage Collection
+----------------------------------
+Periodically, the checkpointer thread wakes up and performs housekeeping
+duties on the repository. It has two duties:
+### Checkpointing
+Flush WAL that has accumulated in memory to disk, so that the old WAL
+can be truncated away in the WAL safekeepers. Also, to free up memory
+for receiving new WAL. This process is called "checkpointing". It's
+similar to checkpointing in PostgreSQL or other DBMSs, but in the page
+server, checkpointing happens on a per-segment basis.
+### Garbage collection
+Remove old on-disk layer files that are no longer needed according to the
+PITR retention policy
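
As an illustration of the garbage collection duty described above, the following Rust sketch picks layer files that a retention policy would allow removing. It is only one possible reading of "no longer needed": the retention policy is modelled as a distance in LSN units behind the latest LSN, and a file counts as removable once a newer image file at or before that cutoff supersedes it. `LayerFile`, `gc_candidates`, and the `retention` parameter are hypothetical and do not come from the Zenith sources.

```rust
type Lsn = u64;

/// Minimal description of one on-disk layer file (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct LayerFile {
    name: String,
    /// Highest LSN whose changes this file reflects.
    end_lsn: Lsn,
    /// True for image files, false for delta files.
    is_image: bool,
}

/// Names of layer files that could be removed while still allowing any page
/// version at or after the cutoff (last_lsn - retention) to be reconstructed.
fn gc_candidates(layers: &[LayerFile], last_lsn: Lsn, retention: u64) -> Vec<String> {
    let cutoff = last_lsn.saturating_sub(retention);

    // Newest image at or before the cutoff. Reads inside the retention
    // window can start from it, so anything strictly older is superseded.
    let keep_from = layers
        .iter()
        .filter(|l| l.is_image && l.end_lsn <= cutoff)
        .map(|l| l.end_lsn)
        .max();

    match keep_from {
        // No image old enough: nothing can be removed safely.
        None => Vec::new(),
        Some(keep_from) => layers
            .iter()
            .filter(|l| l.end_lsn < keep_from)
            .map(|l| l.name.clone())
            .collect(),
    }
}
```

The sketch is deliberately conservative: without an image at or before the cutoff it removes nothing, since older delta files may still be needed to rebuild pages inside the retention window.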
TODO: Backup service TODO: Backup service