Add more details to pageserver and safekeeper docs (#680)

Egor Suvorov
2021-10-05 19:10:50 +03:00
committed by GitHub
parent 7e190d72a5
commit 530d3eaf09
2 changed files with 40 additions and 2 deletions

@@ -7,8 +7,9 @@ The Page Server has a few different duties:
- Replay WAL that's applicable to the chunks that the Page Server maintains
- Backup to S3
S3 is the main fault-tolerant storage of all data, as there are no Page Server
replicas. We use a separate fault-tolerant WAL service to reduce latency. It
keeps track of WAL records that are not yet synced to S3.
The Page Server consists of multiple threads that operate on a shared
repository of page versions:

@@ -76,6 +76,43 @@ safekeepers.
See README_PROTO.md for a more detailed description of the consensus
protocol. spec/ contains a TLA+ specification of it.
# Q&A
Q: Why have a separate service instead of connecting Page Server directly to a
primary PostgreSQL node?
A: The Page Server is a single server which can be lost. As our primary
fault-tolerant storage is S3, we do not want to wait for it before
committing a transaction. The WAL service acts as a temporary fault-tolerant
storage for recent data before it gets to the Page Server and then finally
to S3. Once WAL and pages are committed to S3, the WAL service's storage can
be trimmed.
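
The trimming rule above can be illustrated with a minimal sketch. `WalBuffer`
and `trim_up_to` are hypothetical names invented for this example, and LSNs are
modelled as plain `u64` values; the real safekeeper storage is not shown in
this document and may work differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the WAL service's temporary storage.
/// Records are keyed by LSN (modelled here as a plain u64).
pub struct WalBuffer {
    records: BTreeMap<u64, Vec<u8>>,
}

impl WalBuffer {
    pub fn new() -> Self {
        WalBuffer { records: BTreeMap::new() }
    }

    /// Store a WAL record that has not reached S3 yet.
    pub fn append(&mut self, lsn: u64, record: Vec<u8>) {
        self.records.insert(lsn, record);
    }

    /// Drop every record whose LSN is at or below `s3_lsn`, the position
    /// that is assumed to be durably committed to S3.
    /// Returns the number of records removed.
    pub fn trim_up_to(&mut self, s3_lsn: u64) -> usize {
        let before = self.records.len();
        self.records.retain(|lsn, _| *lsn > s3_lsn);
        before - self.records.len()
    }

    /// Number of records still held.
    pub fn len(&self) -> usize {
        self.records.len()
    }
}
```

Records above the S3 position stay in the buffer, so they can still be served
to the Page Server if it is lost and has to catch up.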
Q: What if the compute node evicts a page, needs it back, but the page is yet
to reach the Page Server?
A: If the compute node has evicted a page, all changes to that page are
already committed, i.e. they are saved on a majority of WAL safekeepers. These
WAL records will eventually reach the Page Server. The Page Server notes
that the compute node requests pages with a very recent LSN and will not
respond to the compute node until the corresponding WAL is received from the
WAL safekeepers.
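
The waiting behaviour described in this answer could be sketched as follows.
This is an illustration only: `LsnTracker`, `advance`, and `wait_for` are
hypothetical names, LSNs are modelled as `u64`, and the timeout parameter is an
assumption added so that a caller cannot block forever.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical tracker of the last WAL position the Page Server has
/// received from the safekeepers.
pub struct LsnTracker {
    last_received: Mutex<u64>,
    advanced: Condvar,
}

impl LsnTracker {
    pub fn new(initial: u64) -> Self {
        LsnTracker {
            last_received: Mutex::new(initial),
            advanced: Condvar::new(),
        }
    }

    /// Called by the WAL-receiving thread when new WAL has arrived.
    pub fn advance(&self, lsn: u64) {
        let mut last = self.last_received.lock().unwrap();
        if lsn > *last {
            *last = lsn;
            // Wake every request that is waiting for a newer LSN.
            self.advanced.notify_all();
        }
    }

    /// Called by a page-request thread: block until WAL up to `lsn` has
    /// been received, or until `timeout` elapses.
    /// Returns true if the requested LSN was reached.
    pub fn wait_for(&self, lsn: u64, timeout: Duration) -> bool {
        let guard = self.last_received.lock().unwrap();
        let (guard, _timed_out) = self
            .advanced
            .wait_timeout_while(guard, timeout, |last| *last < lsn)
            .unwrap();
        *guard >= lsn
    }
}
```

A request for an LSN that has already been received returns immediately; a
request for a newer LSN is held until `advance` is called with a sufficiently
high value.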
Q: How long may the Page Server wait?
A: Not too long, hopefully. If a page is evicted, it probably was not used for
a while, so the WAL service has had enough time to push changes to the Page
Server. There may be issues, though, if there is no backpressure and the
compute node and the WAL service run ahead of the Page Server.
There is no backpressure right now, so you may even see some spurious
timeouts in tests.
Q: How do WAL safekeepers communicate with each other?
A: They may only send each other messages via the compute node; they never
communicate directly with each other.
Q: Why have a consensus algorithm if there is only a single compute node?
A: There may be moments when multiple PostgreSQL nodes are running at the
same time, e.g. while we are bringing one up and another down. We would like
to avoid simultaneous writes from different nodes, so there should be a
consensus on who is the primary node.
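
One common way to get this property is term-based fencing, as used in
Paxos- and Raft-style protocols. The sketch below shows that general technique
only; it is not taken from README_PROTO.md, and `Safekeeper`, `request_vote`,
`append`, and `has_majority` are hypothetical names.

```rust
/// Hypothetical safekeeper-side state for term-based fencing.
pub struct Safekeeper {
    /// Highest term this safekeeper has voted for.
    promised_term: u64,
    /// Accepted WAL records, tagged with the term of the writer.
    log: Vec<(u64, Vec<u8>)>,
}

impl Safekeeper {
    pub fn new() -> Self {
        Safekeeper { promised_term: 0, log: Vec::new() }
    }

    /// A compute node asks to become primary for `term`.
    /// The vote is granted only for a strictly higher term.
    pub fn request_vote(&mut self, term: u64) -> bool {
        if term > self.promised_term {
            self.promised_term = term;
            true
        } else {
            false
        }
    }

    /// Accept a write only from the node holding the current term,
    /// which fences off a stale primary.
    pub fn append(&mut self, term: u64, record: Vec<u8>) -> bool {
        if term != self.promised_term {
            return false;
        }
        self.log.push((term, record));
        true
    }
}

/// A node is primary only if a strict majority of safekeepers voted for it.
pub fn has_majority(votes: usize, total: usize) -> bool {
    votes > total / 2
}
```

Once a majority has voted for a newer term, the old node can no longer get a
majority of safekeepers to accept its writes, so at most one node can commit.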
# Terminology