mirror of
https://github.com/neondatabase/neon.git
synced 2025-12-22 21:59:59 +00:00
Misc doc updates
This commit is contained in:
@@ -4,7 +4,9 @@
|
||||
|
||||
- [authentication.md](authentication.md) — pageserver JWT authentication.
|
||||
- [docker.md](docker.md) — Docker images and building pipeline.
|
||||
- [glossary.md](glossary.md) — Glossary of all the terms used in codebase.
|
||||
- [multitenancy.md](multitenancy.md) — how multitenancy is organized in the pageserver and Zenith CLI.
|
||||
- [sourcetree.md](sourcetree.md) — Overview of the source tree layeout.
|
||||
- [pageserver/README](/pageserver/README) — pageserver overview.
|
||||
- [postgres_ffi/README](/postgres_ffi/README) — Postgres FFI overview.
|
||||
- [test_runner/README.md](/test_runner/README.md) — tests infrastructure overview.
|
||||
|
||||
@@ -26,8 +26,8 @@ A checkpoint record in the WAL marks a point in the WAL sequence at which it is
|
||||
NOTE: This is an overloaded term.
|
||||
|
||||
Whenever enough WAL has been accumulated in memory, the page server []
|
||||
writes out the changes in memory into new snapshot files[]. This process
|
||||
is called "checkpointing". The page server only creates snapshot files for
|
||||
writes out the changes in memory into new layer files[]. This process
|
||||
is called "checkpointing". The page server only creates layer files for
|
||||
relations that have been modified since the last checkpoint.
|
||||
|
||||
### Compute node
|
||||
@@ -41,6 +41,15 @@ Stateless Postgres node that stores data in pageserver.
|
||||
Each of the separate segmented file sets in which a relation is stored. The main fork is where the actual data resides. There also exist two secondary forks for metadata: the free space map and the visibility map.
|
||||
Each PostgreSQL fork is considered a separate relish.
|
||||
|
||||
### Layer file
|
||||
|
||||
Layered repository on-disk format is based on immutable files. The
|
||||
files are called "layer files". Each file corresponds to one 10 MB
|
||||
segment of a PostgreSQL relation fork. There are two kinds of layer
|
||||
files: image files and delta files. An image file contains a
|
||||
"snapshot" of the segment at a particular LSN, and a delta file
|
||||
contains WAL records applicable to the segment, in a range of LSNs.
|
||||
|
||||
### Layered repository
|
||||
|
||||
### LSN
|
||||
@@ -102,6 +111,10 @@ Repository stores multiple timelines, forked off from the same initial call to '
|
||||
and has associated WAL redo service.
|
||||
One repository corresponds to one Tenant.
|
||||
|
||||
### Retention policy
|
||||
|
||||
How much history do we need to keep around for PITR and read-only nodes?
|
||||
|
||||
### SLRU
|
||||
|
||||
SLRUs include pg_clog, pg_multixact/members, and
|
||||
@@ -110,16 +123,6 @@ they don't need to be stored permanently (e.g. pg_subtrans),
|
||||
or we do not support them in zenith yet (pg_commit_ts).
|
||||
Each SLRU segment is considered a separate relish[].
|
||||
|
||||
### Snapshot file
|
||||
|
||||
Layered repository on-disk format is based on immutable files.
|
||||
The files are called "snapshot files".
|
||||
Each snapshot file contains a full snapshot, that is, full copy of all
|
||||
pages in the relation, as of the "start LSN". It also contains all WAL
|
||||
records applicable to the relation between the start and end
|
||||
LSNs.
|
||||
Each snapshot file corresponds to one 10 MB slice of a PostgreSQL relation fork.
|
||||
|
||||
### Tenant (Multitenancy)
|
||||
Tenant represents a single customer, interacting with Zenith.
|
||||
Wal redo[] activity, timelines[], snapshots[] are managed for each tenant independently.
|
||||
|
||||
@@ -13,7 +13,7 @@ Intended to be used in integration tests and in CLI tools for local installation
|
||||
Documentaion of the Zenith features and concepts.
|
||||
Now it is mostly dev documentation.
|
||||
|
||||
`monitoring`:
|
||||
`/monitoring`:
|
||||
|
||||
TODO
|
||||
|
||||
@@ -72,9 +72,9 @@ The workspace_hack crate exists only to pin down some dependencies.
|
||||
Main entry point for the 'zenith' CLI utility.
|
||||
TODO: Doesn't it belong to control_plane?
|
||||
|
||||
`zenith_metrics`:
|
||||
`/zenith_metrics`:
|
||||
|
||||
TODO
|
||||
Helpers for exposing Prometheus metrics from the server.
|
||||
|
||||
`/zenith_utils`:
|
||||
|
||||
|
||||
@@ -8,10 +8,11 @@ The Page Server has a few different duties:
|
||||
- Backup to S3
|
||||
|
||||
|
||||
The Page Server consists of multiple threads that operate on a shared
|
||||
cache of page versions:
|
||||
|
||||
|
||||
The Page Server consists of multiple threads that operate on a shared
|
||||
repository of page versions:
|
||||
|
||||
| WAL
|
||||
V
|
||||
+--------------+
|
||||
@@ -23,16 +24,14 @@ cache of page versions:
|
||||
+---------+ .......... | |
|
||||
| | . . | |
|
||||
GetPage@LSN | | . backup . -------> | S3 |
|
||||
-------------> | Page | page cache . . | |
|
||||
-------------> | Page | repository . . | |
|
||||
| Service | .......... | |
|
||||
page | | +----+
|
||||
<------------- | |
|
||||
+---------+
|
||||
|
||||
...................................
|
||||
. .
|
||||
. Garbage Collection / Compaction .
|
||||
...................................
|
||||
+---------+ +--------------------+
|
||||
| Checkpointing / |
|
||||
| Garbage collection |
|
||||
+--------------------+
|
||||
|
||||
Legend:
|
||||
|
||||
@@ -52,7 +51,7 @@ Page Service
|
||||
------------
|
||||
|
||||
The Page Service listens for GetPage@LSN requests from the Compute Nodes,
|
||||
and responds with pages from the page cache.
|
||||
and responds with pages from the repository.
|
||||
|
||||
|
||||
WAL Receiver
|
||||
@@ -61,46 +60,59 @@ WAL Receiver
|
||||
The WAL receiver connects to the external WAL safekeeping service (or
|
||||
directly to the primary) using PostgreSQL physical streaming
|
||||
replication, and continuously receives WAL. It decodes the WAL records,
|
||||
and stores them to the page cache repository.
|
||||
and stores them to the repository.
|
||||
|
||||
|
||||
Page Cache
|
||||
Repository
|
||||
----------
|
||||
|
||||
The Page Cache is a switchboard to access different Repositories.
|
||||
The repository stores all the page versions, or WAL records needed to
|
||||
reconstruct them. Each tenant has a separate Repository, which is
|
||||
stored in the .zenith/tenants/<tenantid> directory.
|
||||
|
||||
#### Repository
|
||||
Repository corresponds to one .zenith directory.
|
||||
Repository is needed to manage Timelines.
|
||||
Each repository has associated WAL redo service.
|
||||
|
||||
There is currently only one implementation of the Repository trait,
|
||||
LayeredRepository, but it's still a useful abstraction that keeps the
|
||||
Repository is an abstract trait, defined in `repository.rs`. It is
|
||||
implemented by the LayeredRepository object in
|
||||
`layered_repository.rs`. There is only that one implementation of the
|
||||
Repository trait, but it's still a useful abstraction that keeps the
|
||||
interface for the low-level storage functionality clean. The layered
|
||||
storage format is described in layered_repository/README.md.
|
||||
|
||||
#### Timeline
|
||||
Timeline is a page cache workhorse that accepts page changes
|
||||
and serves get_page_at_lsn() and get_rel_size() requests.
|
||||
Note: this has nothing to do with PostgreSQL WAL timeline.
|
||||
Each repository consists of multiple Timelines. Timeline is a
|
||||
workhorse that accepts page changes from the WAL, and serves
|
||||
get_page_at_lsn() and get_rel_size() requests. Note: this has nothing
|
||||
to do with PostgreSQL WAL timeline. The term "timeline" is mostly
|
||||
interchangeable with "branch", there is a one-to-one mapping from
|
||||
branch to timeline. A timeline has a unique ID within the tenant,
|
||||
represented as 16-byte hex string that never changes, whereas a
|
||||
branch is a user-given name for a timeline.
|
||||
|
||||
#### Branch
|
||||
We can create branch at certain LSN.
|
||||
Each Branch lives in a corresponding timeline and has an ancestor.
|
||||
|
||||
To get full snapshot of data at certain moment we need to traverse timeline and its ancestors.
|
||||
|
||||
#### WAL redo service
|
||||
WAL redo service - service that runs PostgreSQL in a special wal_redo mode
|
||||
to apply given WAL records over an old page image and return new page image.
|
||||
Each repository also has a WAL redo manager associated with it, see
|
||||
`walredo.rs`. The WAL redo manager is used to replay PostgreSQL WAL
|
||||
records, whenever we need to reconstruct a page version from WAL to
|
||||
satisfy a GetPage@LSN request, or to avoid accumulating too much WAL
|
||||
for a page. The WAL redo manager uses a Postgres process running in
|
||||
special zenith wal-redo mode to do the actual WAL redo, and
|
||||
communicates with the process using a pipe.
|
||||
|
||||
|
||||
TODO: Garbage Collection / Compaction
|
||||
-------------------------------------
|
||||
Checkpointing / Garbage Collection
|
||||
----------------------------------
|
||||
|
||||
Periodically, the Garbage Collection / Compaction thread runs
|
||||
and applies pending WAL records, and removes old page versions that
|
||||
are no longer needed.
|
||||
Periodically, the checkpointer thread wakes up and performs housekeeping
|
||||
duties on the repository. It has two duties:
|
||||
|
||||
### Checkpointing
|
||||
|
||||
Flush WAL that has accumulated in memory to disk, so that the old WAL
|
||||
can be truncated away in the WAL safekeepers. Also, to free up memory
|
||||
for receiving new WAL. This process is called "checkpointing". It's
|
||||
similar to checkpointing in PostgreSQL or other DBMSs, but in the page
|
||||
server, checkpointing happens on a per-segment basis.
|
||||
|
||||
### Garbage collection
|
||||
|
||||
Remove old on-disk layer files that are no longer needed according to the
|
||||
PITR retention policy
|
||||
|
||||
|
||||
TODO: Backup service
|
||||
|
||||
Reference in New Issue
Block a user