mirror of https://github.com/neondatabase/neon.git
Misc doc updates
@@ -4,7 +4,9 @@
 - [authentication.md](authentication.md) — pageserver JWT authentication.
 - [docker.md](docker.md) — Docker images and building pipeline.
+- [glossary.md](glossary.md) — Glossary of all the terms used in codebase.
 - [multitenancy.md](multitenancy.md) — how multitenancy is organized in the pageserver and Zenith CLI.
+- [sourcetree.md](sourcetree.md) — Overview of the source tree layout.
 - [pageserver/README](/pageserver/README) — pageserver overview.
 - [postgres_ffi/README](/postgres_ffi/README) — Postgres FFI overview.
 - [test_runner/README.md](/test_runner/README.md) — tests infrastructure overview.
@@ -26,8 +26,8 @@ A checkpoint record in the WAL marks a point in the WAL sequence at which it is
 NOTE: This is an overloaded term.
 
 Whenever enough WAL has been accumulated in memory, the page server []
-writes out the changes in memory into new snapshot files[]. This process
-is called "checkpointing". The page server only creates snapshot files for
+writes out the changes in memory into new layer files[]. This process
+is called "checkpointing". The page server only creates layer files for
 relations that have been modified since the last checkpoint.
 
 ### Compute node
@@ -41,6 +41,15 @@ Stateless Postgres node that stores data in pageserver.
 Each of the separate segmented file sets in which a relation is stored. The main fork is where the actual data resides. There also exist two secondary forks for metadata: the free space map and the visibility map.
 Each PostgreSQL fork is considered a separate relish.
 
+### Layer file
+
+Layered repository on-disk format is based on immutable files. The
+files are called "layer files". Each file corresponds to one 10 MB
+segment of a PostgreSQL relation fork. There are two kinds of layer
+files: image files and delta files. An image file contains a
+"snapshot" of the segment at a particular LSN, and a delta file
+contains WAL records applicable to the segment, in a range of LSNs.
+
 ### Layered repository
 
 ### LSN
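To make the image/delta distinction added above concrete, here is a rough Rust sketch of how a layer file could be modeled. The type and field names (`SegmentTag`, `LayerFile`, `rel_fork`) are hypothetical illustrations only; the real on-disk format is described in layered_repository/README.md.

```rust
// Hypothetical illustration of the two kinds of layer file described above.
// Type and field names are made up; see layered_repository/README.md for
// the real format.

type Lsn = u64; // WAL position (Log Sequence Number)

/// Identifies one 10 MB segment of a PostgreSQL relation fork.
struct SegmentTag {
    rel_fork: String, // which relation fork, e.g. the main fork of a table
    segno: u32,       // which 10 MB slice of that fork
}

enum LayerFile {
    /// A materialized "snapshot" of every page in the segment at one LSN.
    Image {
        seg: SegmentTag,
        lsn: Lsn,
        pages: Vec<[u8; 8192]>,
    },
    /// The WAL records that apply to the segment within a range of LSNs.
    Delta {
        seg: SegmentTag,
        start_lsn: Lsn,
        end_lsn: Lsn,
        wal_records: Vec<Vec<u8>>,
    },
}
```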
@@ -102,6 +111,10 @@ Repository stores multiple timelines, forked off from the same initial call to '
 and has associated WAL redo service.
 One repository corresponds to one Tenant.
 
+### Retention policy
+
+How much history do we need to keep around for PITR and read-only nodes?
+
 ### SLRU
 
 SLRUs include pg_clog, pg_multixact/members, and
@@ -110,16 +123,6 @@ they don't need to be stored permanently (e.g. pg_subtrans),
 or we do not support them in zenith yet (pg_commit_ts).
 Each SLRU segment is considered a separate relish[].
 
-### Snapshot file
-
-Layered repository on-disk format is based on immutable files.
-The files are called "snapshot files".
-Each snapshot file contains a full snapshot, that is, full copy of all
-pages in the relation, as of the "start LSN". It also contains all WAL
-records applicable to the relation between the start and end
-LSNs.
-Each snapshot file corresponds to one 10 MB slice of a PostgreSQL relation fork.
-
 ### Tenant (Multitenancy)
 Tenant represents a single customer, interacting with Zenith.
 Wal redo[] activity, timelines[], snapshots[] are managed for each tenant independently.
@@ -13,7 +13,7 @@ Intended to be used in integration tests and in CLI tools for local installation
 Documentation of the Zenith features and concepts.
 Now it is mostly dev documentation.
 
-`monitoring`:
+`/monitoring`:
 
 TODO
 
@@ -72,9 +72,9 @@ The workspace_hack crate exists only to pin down some dependencies.
 Main entry point for the 'zenith' CLI utility.
 TODO: Doesn't it belong to control_plane?
 
-`zenith_metrics`:
+`/zenith_metrics`:
 
-TODO
+Helpers for exposing Prometheus metrics from the server.
 
 `/zenith_utils`:
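For background on the `/zenith_metrics` entry above, this is a minimal, hypothetical sketch of exposing a Prometheus counter from a Rust program with the `prometheus` crate. It shows the general register/gather/encode pattern only; the metric name is made up and this is not the actual `zenith_metrics` API.

```rust
// Minimal illustration of serving Prometheus metrics from Rust.
// Uses the `prometheus` crate; the metric name below is made up.
use prometheus::{register_int_counter, Encoder, IntCounter, TextEncoder};

fn main() {
    // Register a counter in the default registry (name is hypothetical).
    let requests: IntCounter =
        register_int_counter!("pageserver_requests_total", "Number of requests served").unwrap();
    requests.inc();

    // Render everything in the default registry in the text exposition format,
    // which is what an HTTP handler would return on a /metrics endpoint.
    let mut buf = Vec::new();
    TextEncoder::new()
        .encode(&prometheus::gather(), &mut buf)
        .unwrap();
    println!("{}", String::from_utf8(buf).unwrap());
}
```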
@@ -8,10 +8,11 @@ The Page Server has a few different duties:
 - Backup to S3
 
-The Page Server consists of multiple threads that operate on a shared
-cache of page versions:
+The Page Server consists of multiple threads that operate on a shared
+repository of page versions:
 
                                   | WAL
                                   V
                           +--------------+
@@ -23,16 +24,14 @@ cache of page versions:
                +---------+               ..........          |    |
                |         |               .        .          |    |
  GetPage@LSN   |         |               . backup . -------> | S3 |
--------------> |  Page   |   page cache  .        .          |    |
+-------------> |  Page   |   repository  .        .          |    |
                | Service |               ..........          |    |
     page       |         |                                   +----+
 <------------- |         |
-               +---------+
-
-               ...................................
-               .                                 .
-               . Garbage Collection / Compaction .
-               ...................................
+               +---------+      +--------------------+
+                                | Checkpointing /    |
+                                | Garbage collection |
+                                +--------------------+
 
 Legend:
 
@@ -52,7 +51,7 @@ Page Service
 ------------
 
 The Page Service listens for GetPage@LSN requests from the Compute Nodes,
-and responds with pages from the page cache.
+and responds with pages from the repository.
 
 
 WAL Receiver
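As a rough illustration of the request/response flow described in this hunk, the following hypothetical Rust types show what a GetPage@LSN exchange could carry. The struct and field names are assumptions; the actual compute-to-pageserver wire protocol is not specified in this README.

```rust
// Hypothetical message shapes for a GetPage@LSN exchange.
// These types are illustrative only, not the real protocol definitions.

/// Request sent by a compute node: "give me this block as it was at this LSN".
struct GetPageRequest {
    tenant_id: [u8; 16],
    timeline_id: [u8; 16],
    rel: (u32, u32, u32, u8), // spcnode, dbnode, relnode, fork number
    blknum: u32,
    lsn: u64,
}

/// Response: the reconstructed 8 KB page image.
struct GetPageResponse {
    page: [u8; 8192],
}
```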
@@ -61,46 +60,59 @@ WAL Receiver
 The WAL receiver connects to the external WAL safekeeping service (or
 directly to the primary) using PostgreSQL physical streaming
 replication, and continuously receives WAL. It decodes the WAL records,
-and stores them to the page cache repository.
+and stores them to the repository.
 
 
-Page Cache
+Repository
 ----------
 
-The Page Cache is a switchboard to access different Repositories.
-
-#### Repository
-Repository corresponds to one .zenith directory.
-Repository is needed to manage Timelines.
-Each repository has associated WAL redo service.
-
-There is currently only one implementation of the Repository trait,
-LayeredRepository, but it's still a useful abstraction that keeps the
+The repository stores all the page versions, or WAL records needed to
+reconstruct them. Each tenant has a separate Repository, which is
+stored in the .zenith/tenants/<tenantid> directory.
+
+Repository is an abstract trait, defined in `repository.rs`. It is
+implemented by the LayeredRepository object in
+`layered_repository.rs`. There is only that one implementation of the
+Repository trait, but it's still a useful abstraction that keeps the
 interface for the low-level storage functionality clean. The layered
 storage format is described in layered_repository/README.md.
 
-#### Timeline
-Timeline is a page cache workhorse that accepts page changes
-and serves get_page_at_lsn() and get_rel_size() requests.
-Note: this has nothing to do with PostgreSQL WAL timeline.
+Each repository consists of multiple Timelines. Timeline is a
+workhorse that accepts page changes from the WAL, and serves
+get_page_at_lsn() and get_rel_size() requests. Note: this has nothing
+to do with PostgreSQL WAL timeline. The term "timeline" is mostly
+interchangeable with "branch", there is a one-to-one mapping from
+branch to timeline. A timeline has a unique ID within the tenant,
+represented as 16-byte hex string that never changes, whereas a
+branch is a user-given name for a timeline.
 
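To make the shape of this interface concrete, here is a rough Rust sketch. `get_page_at_lsn()` and `get_rel_size()` are taken from the text above, but the signatures, the error type, and the helper types (`RelishTag`, `ZTimelineId`, `get_timeline`) are illustrative assumptions rather than the real definitions in `repository.rs`.

```rust
// Rough sketch of the Repository/Timeline split described above.
// Method names get_page_at_lsn/get_rel_size come from the README text;
// everything else (types, signatures) is an illustrative assumption.
use std::sync::Arc;

type Lsn = u64;              // WAL position
type ZTimelineId = [u8; 16]; // the "16-byte" timeline ID
type RelishTag = String;     // stand-in for a relation-fork / SLRU-segment tag
type Page = Vec<u8>;         // an 8 KB page image
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

trait Repository {
    /// Look up one of this tenant's timelines (branches) by ID.
    fn get_timeline(&self, timeline_id: ZTimelineId) -> Result<Arc<dyn Timeline>>;
}

trait Timeline {
    /// Reconstruct the version of a page that was current at `lsn`.
    fn get_page_at_lsn(&self, rel: RelishTag, blknum: u32, lsn: Lsn) -> Result<Page>;

    /// Size of the relation, in blocks, as of `lsn`.
    fn get_rel_size(&self, rel: RelishTag, lsn: Lsn) -> Result<u32>;
}
```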
-#### Branch
-We can create branch at certain LSN.
-Each Branch lives in a corresponding timeline and has an ancestor.
-
-To get full snapshot of data at certain moment we need to traverse timeline and its ancestors.
-
-#### WAL redo service
-WAL redo service - service that runs PostgreSQL in a special wal_redo mode
-to apply given WAL records over an old page image and return new page image.
+Each repository also has a WAL redo manager associated with it, see
+`walredo.rs`. The WAL redo manager is used to replay PostgreSQL WAL
+records, whenever we need to reconstruct a page version from WAL to
+satisfy a GetPage@LSN request, or to avoid accumulating too much WAL
+for a page. The WAL redo manager uses a Postgres process running in
+special zenith wal-redo mode to do the actual WAL redo, and
+communicates with the process using a pipe.
 
 
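The pipe-based arrangement described above can be sketched as follows: spawn a helper process with piped stdin/stdout, write a request, and read back the reconstructed page. The `postgres --wal-redo` invocation and the text message format are placeholders for illustration; the real protocol lives in `walredo.rs` and the special wal-redo mode mentioned above.

```rust
// Minimal sketch of driving a WAL-redo helper process over a pipe.
// The command line and message format here are placeholders.
use std::io::{BufRead, BufReader, Write};
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Spawn the helper with both ends of a pipe attached (placeholder command/flag).
    let mut child = Command::new("postgres")
        .arg("--wal-redo") // assumed flag, for illustration only
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Send a made-up request: which block to reconstruct, and up to which LSN.
    let stdin = child.stdin.as_mut().expect("stdin was piped");
    writeln!(stdin, "APPLY rel=1663/13008/16384 blk=0 lsn=0/16B9188")?;

    // Read back the reconstructed page (the real exchange is binary, not text).
    let mut reader = BufReader::new(child.stdout.as_mut().expect("stdout was piped"));
    let mut response = String::new();
    reader.read_line(&mut response)?;
    println!("redo process answered: {}", response.trim_end());

    child.wait()?;
    Ok(())
}
```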
-TODO: Garbage Collection / Compaction
--------------------------------------
+Checkpointing / Garbage Collection
+----------------------------------
 
-Periodically, the Garbage Collection / Compaction thread runs
-and applies pending WAL records, and removes old page versions that
-are no longer needed.
+Periodically, the checkpointer thread wakes up and performs housekeeping
+duties on the repository. It has two duties:
+
+### Checkpointing
+
+Flush WAL that has accumulated in memory to disk, so that the old WAL
+can be truncated away in the WAL safekeepers. Also, to free up memory
+for receiving new WAL. This process is called "checkpointing". It's
+similar to checkpointing in PostgreSQL or other DBMSs, but in the page
+server, checkpointing happens on a per-segment basis.
+
+### Garbage collection
+
+Remove old on-disk layer files that are no longer needed according to the
+PITR retention policy
 
 
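As an illustration of the two housekeeping duties just listed, here is a hypothetical sketch of a checkpointer loop. The trait, the method names (`checkpoint`, `gc`, `pitr_horizon`), and the wake-up interval are assumptions made for this example, not the pageserver's actual code.

```rust
// Hypothetical sketch of the periodic housekeeping loop described above.
// Intervals and method names are illustrative assumptions.
use std::sync::Arc;
use std::thread;
use std::time::Duration;

trait HousekeptRepository {
    /// Flush accumulated in-memory WAL into layer files ("checkpointing").
    fn checkpoint(&self);
    /// Drop layer files older than the PITR horizon ("garbage collection").
    fn gc(&self, pitr_horizon_lsn: u64);
    /// Oldest LSN that the retention policy still requires us to keep.
    fn pitr_horizon(&self) -> u64;
}

fn spawn_checkpointer<R: HousekeptRepository + Send + Sync + 'static>(repo: Arc<R>) {
    thread::spawn(move || loop {
        thread::sleep(Duration::from_secs(10)); // wake-up interval is made up
        repo.checkpoint();                      // duty 1: flush WAL to layer files
        repo.gc(repo.pitr_horizon());           // duty 2: remove unneeded layers
    });
}
```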
 TODO: Backup service