## Page server architecture

The Page Server has a few different duties:

- Respond to GetPage@LSN requests from the Compute Nodes
- Receive WAL from the WAL safekeeper
- Replay WAL that's applicable to the chunks that the Page Server maintains
- Back up data to S3

S3 is the main fault-tolerant storage of all data, as there are no Page Server
replicas. We use a separate fault-tolerant WAL service to reduce latency. It
keeps track of WAL records which are not yet synced to S3.

The Page Server consists of multiple threads that operate on a shared
repository of page versions:

                                 | WAL
                                 V
                         +--------------+
                         |              |
                         | WAL receiver |
                         |              |
                         +--------------+
                                                               +----+
                +---------+              ..........            |    |
                |         |              .        .            |    |
 GetPage@LSN    |         |              . backup . ---------> | S3 |
 -------------> |  Page   |  repository  .        .            |    |
                | Service |              ..........            |    |
     page       |         |                                    +----+
 <------------- |         |
                +---------+     +--------------------+
                                | Checkpointing /    |
                                | Garbage collection |
                                +--------------------+

Legend:

+--+
|  |   A thread or multi-threaded service
+--+

....
.  .   Component that we will need, but doesn't exist at the moment. A TODO.
....

--->   Data flow
<---


Page Service
------------

The Page Service listens for GetPage@LSN requests from the Compute Nodes,
and responds with pages from the repository.

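The request path can be pictured with the minimal sketch below. The types
`Lsn`, `PageKey`, and the `Repository` trait here are invented stand-ins for
the real definitions, and the actual service speaks the PostgreSQL wire
protocol rather than taking plain function calls:

```rust
use std::sync::Arc;

#[derive(Clone, Copy)]
struct Lsn(u64);

#[derive(Clone, Copy)]
struct PageKey {
    rel: u32,   // relation identifier
    blkno: u32, // block number within the relation
}

trait Repository: Send + Sync {
    /// Materialize the 8 KB page image as it was at `lsn`.
    fn get_page_at_lsn(&self, key: PageKey, lsn: Lsn) -> std::io::Result<Vec<u8>>;
}

/// Handle one GetPage@LSN request: look the page up in the repository
/// (waiting, if necessary, until WAL up to `lsn` has been received and
/// applied) and return the page image to the compute node.
fn handle_get_page(
    repo: &Arc<dyn Repository>,
    key: PageKey,
    lsn: Lsn,
) -> std::io::Result<Vec<u8>> {
    repo.get_page_at_lsn(key, lsn)
}
```
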
WAL Receiver
------------

The WAL receiver connects to the external WAL safekeeping service (or
directly to the primary) using PostgreSQL physical streaming
replication, and continuously receives WAL. It decodes the WAL records
and stores them in the repository.

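A rough sketch of that loop, with the streaming connection, the decoder, and
the repository insert abstracted behind invented closures (real decoding
parses PostgreSQL XLOG record headers):

```rust
#[derive(Clone, Copy)]
struct Lsn(u64);

/// One decoded, per-page WAL record, keyed so that GetPage@LSN can later
/// find every record affecting a given page. Illustrative layout only.
struct WalRecord {
    lsn: Lsn,
    rel: u32,
    blkno: u32,
    body: Vec<u8>,
}

fn walreceiver_loop(
    // Yields (end LSN, raw record bytes) from the physical replication stream.
    mut next_record: impl FnMut() -> Option<(Lsn, Vec<u8>)>,
    // Decodes one raw XLOG record into the per-page records it implies.
    decode: impl Fn(Lsn, &[u8]) -> Vec<WalRecord>,
    // Stores a decoded record in the repository.
    mut store: impl FnMut(WalRecord),
) {
    while let Some((lsn, raw)) = next_record() {
        for rec in decode(lsn, &raw) {
            store(rec);
        }
    }
}
```
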
Repository
----------

The repository stores all the page versions, or WAL records needed to
reconstruct them. Each tenant has a separate Repository, which is
stored in the .zenith/tenants/<tenantid> directory.

Repository is an abstract trait, defined in `repository.rs`. It is
implemented by the LayeredRepository object in
`layered_repository.rs`. There is only that one implementation of the
Repository trait, but it's still a useful abstraction that keeps the
interface for the low-level storage functionality clean. The layered
storage format is described in layered_repository/README.md.

Each repository consists of multiple Timelines. A Timeline is the
workhorse that accepts page changes from the WAL and serves
get_page_at_lsn() and get_rel_size() requests. Note: this has nothing
to do with PostgreSQL WAL timelines. The term "timeline" is mostly
interchangeable with "branch": there is a one-to-one mapping from
branch to timeline. A timeline has a unique ID within the tenant,
represented as a 16-byte hex string that never changes, whereas a
branch is a user-given name for a timeline.

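Sketched with simplified signatures, those interfaces might look like the
following (the real traits in `repository.rs` carry more methods and richer
error handling; the types here are illustrative):

```rust
#[derive(Clone, Copy)]
struct Lsn(u64);

/// 16-byte timeline ID, printed as a hex string.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct TimelineId([u8; 16]);

/// Identifies a PostgreSQL relation (simplified).
#[derive(Clone, Copy)]
struct RelTag {
    spcnode: u32,
    dbnode: u32,
    relnode: u32,
}

trait Timeline {
    /// Page image of the given block as it was at `lsn`.
    fn get_page_at_lsn(&self, rel: RelTag, blkno: u32, lsn: Lsn) -> std::io::Result<Vec<u8>>;
    /// Size of the relation, in blocks, as of `lsn`.
    fn get_rel_size(&self, rel: RelTag, lsn: Lsn) -> std::io::Result<u32>;
}

trait Repository {
    /// Look up one of this tenant's timelines by its immutable ID.
    fn get_timeline(&self, id: TimelineId) -> Option<std::sync::Arc<dyn Timeline>>;
}
```
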
Each repository also has a WAL redo manager associated with it; see
`walredo.rs`. The WAL redo manager is used to replay PostgreSQL WAL
records whenever we need to reconstruct a page version from WAL to
satisfy a GetPage@LSN request, or to avoid accumulating too much WAL
for a page. The WAL redo manager uses a Postgres process running in a
special zenith wal-redo mode to do the actual WAL redo, and
communicates with the process using a pipe.

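In outline, the manager writes a base page image plus the WAL records to the
helper process and reads the replayed page back over the pipe. A sketch using
std::process; the command line and the message framing are invented for this
example (the real protocol lives in `walredo.rs`):

```rust
use std::io::{Read, Write};
use std::process::{Child, Command, Stdio};

/// Spawn the redo helper with its stdin/stdout piped to us.
/// "postgres --wal-redo" is illustrative, not the exact invocation.
fn spawn_walredo_process() -> std::io::Result<Child> {
    Command::new("postgres")
        .arg("--wal-redo")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
}

/// Replay `records` on top of `base_img` and return the reconstructed
/// 8 KB page. The framing (raw bytes in, fixed-size reply out) is made
/// up for the sketch; the real code frames each message explicitly.
fn apply_wal_records(
    child: &mut Child,
    base_img: &[u8],
    records: &[Vec<u8>],
) -> std::io::Result<Vec<u8>> {
    let stdin = child.stdin.as_mut().expect("stdin was piped");
    stdin.write_all(base_img)?;
    for rec in records {
        stdin.write_all(rec)?;
    }
    stdin.flush()?;

    let mut page = vec![0u8; 8192];
    child.stdout.as_mut().expect("stdout was piped").read_exact(&mut page)?;
    Ok(page)
}
```
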
Checkpointing / Garbage Collection
----------------------------------

Periodically, the checkpointer thread wakes up and performs two
housekeeping duties on the repository:

### Checkpointing

Flush WAL that has accumulated in memory to disk, so that the old WAL
can be truncated away in the WAL safekeepers, and to free up memory
for receiving new WAL. This process is called "checkpointing". It's
similar to checkpointing in PostgreSQL and other DBMSs, but in the page
server, checkpointing happens on a per-segment basis.

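As a toy model of the per-segment behavior (the threshold and all names below
are invented): each segment buffers its decoded WAL in memory and is flushed
independently, after which the safekeepers only need to retain WAL that some
segment has not yet flushed.

```rust
/// Toy model only: one in-memory segment of buffered WAL.
struct OpenSegment {
    buffered_wal_bytes: u64,
    oldest_pending_lsn: u64, // oldest WAL record not yet flushed to disk
}

const FLUSH_THRESHOLD: u64 = 8 * 1024 * 1024; // invented value

/// Flush segments that have accumulated too much WAL, then report the LSN
/// below which the safekeepers no longer need to retain WAL for us.
fn checkpoint(segments: &mut Vec<OpenSegment>) -> u64 {
    segments.retain(|seg| {
        if seg.buffered_wal_bytes >= FLUSH_THRESHOLD {
            // (flush the segment's data to disk here)
            false // flushed: drop the in-memory buffer
        } else {
            true // keep buffering
        }
    });
    segments
        .iter()
        .map(|seg| seg.oldest_pending_lsn)
        .min()
        .unwrap_or(u64::MAX)
}
```
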
### Garbage collection

Remove old on-disk layer files that are no longer needed according to the
PITR retention policy.

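The retention decision can be pictured as computing a GC horizon and deleting
layer files that lie wholly below it. A toy sketch, approximating the PITR
window as an LSN distance and ignoring details like branch points that the
real policy must also respect (names invented):

```rust
#[derive(Clone, Copy)]
struct Lsn(u64);

/// A layer file covers page versions in the LSN range [start_lsn, end_lsn).
struct LayerFile {
    start_lsn: Lsn,
    end_lsn: Lsn,
}

/// Everything at or above the horizon must stay reconstructible.
fn gc_horizon(last_record_lsn: Lsn, pitr_window: u64) -> Lsn {
    Lsn(last_record_lsn.0.saturating_sub(pitr_window))
}

/// A layer can be deleted only if it lies entirely below the horizon.
fn can_delete(layer: &LayerFile, horizon: Lsn) -> bool {
    layer.end_lsn.0 < horizon.0
}
```
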
TODO: Backup service
--------------------

The backup service is responsible for periodically pushing the chunks to S3.

TODO: How/when do we restore from S3? Whenever we get a GetPage@LSN request
for a chunk we don't currently have? Or when an external Control Plane tells
us?

TODO: Sharding
--------------------

We should be able to run multiple Page Servers that handle sharded data.