diff --git a/pageserver/README b/pageserver/README
index 7595fb7738..0f858bb4ed 100644
--- a/pageserver/README
+++ b/pageserver/README
@@ -7,8 +7,9 @@ The Page Server has a few different duties:
 - Replay WAL that's applicable to the chunks that the Page Server maintains
 - Backup to S3
 
-
-
+S3 is the main fault-tolerant storage of all data, as there are no Page Server
+replicas. We use a separate fault-tolerant WAL service to reduce latency. It
+keeps track of WAL records that have not been synced to S3 yet.
 
 The Page Server consists of multiple threads that operate on a shared
 repository of page versions:
diff --git a/walkeeper/README b/walkeeper/README
index db8deda337..6c5a69e926 100644
--- a/walkeeper/README
+++ b/walkeeper/README
@@ -76,6 +76,43 @@ safekeepers.
 See README_PROTO.md for a more detailed desription of the consensus
 protocol. spec/ contains TLA+ specification of it.
 
+# Q&A
+
+Q: Why have a separate service instead of connecting the Page Server directly
+   to a primary PostgreSQL node?
+A: The Page Server is a single server which can be lost. As our primary
+   fault-tolerant storage is S3, we do not want to wait for it before
+   committing a transaction. The WAL service acts as a temporary fault-tolerant
+   storage for recent data before it gets to the Page Server and then finally
+   to S3. Once WAL and pages are committed to S3, the WAL service's storage
+   can be trimmed.
+
+Q: What if the compute node evicts a page, needs it back, but the page is yet
+   to reach the Page Server?
+A: If the compute node has evicted a page, all changes to that page are
+   already committed, i.e. they are saved on a majority of WAL safekeepers.
+   These WAL records will eventually reach the Page Server. The Page Server
+   notices that the compute node requests pages with a very recent LSN and
+   will not respond until the corresponding WAL has been received from the WAL
+   safekeepers.
+
+Q: How long may the Page Server wait?
+A: Not too long, hopefully. If a page is evicted, it probably was not used for
+   a while, so the WAL service has had enough time to push changes to the Page
+   Server. There may be issues if there is no backpressure and the compute
+   node together with the WAL service run ahead of the Page Server, though.
+   There is no backpressure right now, so you may even see some spurious
+   timeouts in tests.
+
+Q: How do WAL safekeepers communicate with each other?
+A: They can only exchange messages via the compute node; they never
+   communicate directly with each other.
+
+Q: Why have a consensus algorithm if there is only a single compute node?
+A: Actually, there may be moments when multiple PostgreSQL nodes are running
+   at the same time, e.g. while we are bringing one up and another down. We
+   would like to avoid simultaneous writes from different nodes, so there
+   should be a consensus on who is the primary node.
 
 # Terminology
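The Q&A says the Page Server holds back its answer to a page request at a very recent LSN until the WAL up to that LSN has arrived from the safekeepers, and that such waits may time out. A minimal Rust sketch of that wait, assuming a mutex plus condition variable, follows. `LastRecordLsn`, `advance`, `wait_for`, the `Lsn` alias and the timeout values are hypothetical names chosen for illustration, not the Page Server's actual code.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical LSN type: a plain 64-bit WAL position.
type Lsn = u64;

/// Hypothetical tracker of the last LSN for which WAL has been received
/// from the safekeepers (illustrative, not the real Page Server API).
struct LastRecordLsn {
    lsn: Mutex<Lsn>,
    cond: Condvar,
}

impl LastRecordLsn {
    fn new(start: Lsn) -> Self {
        LastRecordLsn { lsn: Mutex::new(start), cond: Condvar::new() }
    }

    /// Called by the WAL receiver after WAL up to `lsn` has been received.
    /// Wakes every request that is waiting for a newer LSN.
    fn advance(&self, lsn: Lsn) {
        let mut cur = self.lsn.lock().unwrap();
        if lsn > *cur {
            *cur = lsn;
            self.cond.notify_all();
        }
    }

    /// Called before answering a page request at `req_lsn`: blocks until
    /// WAL up to `req_lsn` has arrived, or gives up after `timeout`.
    fn wait_for(&self, req_lsn: Lsn, timeout: Duration) -> Result<Lsn, String> {
        let guard = self.lsn.lock().unwrap();
        // Keep waiting while the received LSN is still behind the request.
        let (guard, res) = self
            .cond
            .wait_timeout_while(guard, timeout, |cur| *cur < req_lsn)
            .unwrap();
        if res.timed_out() {
            Err(format!("timed out waiting for LSN {}, last received {}", req_lsn, *guard))
        } else {
            Ok(*guard)
        }
    }
}

fn main() {
    let tracker = Arc::new(LastRecordLsn::new(100));

    // Simulated WAL receiver: WAL up to LSN 200 arrives a little later.
    let receiver = Arc::clone(&tracker);
    let handle = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        receiver.advance(200);
    });

    // A request at LSN 150 blocks until the receiver catches up.
    println!("{:?}", tracker.wait_for(150, Duration::from_secs(5))); // prints Ok(200)
    handle.join().unwrap();

    // Nothing will deliver LSN 300, so this request times out.
    println!("{:?}", tracker.wait_for(300, Duration::from_millis(20)));
}
```

`Condvar::wait_timeout_while` re-checks the condition after every wakeup, so a spurious wakeup cannot produce an early answer; a timeout surfaces as an error, the kind of failure the Q&A expects while there is no backpressure.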
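The Q&A treats a change as committed once it is saved on a majority of WAL safekeepers. Assuming the compute node tracks the latest LSN each safekeeper has acknowledged as flushed, the committed position is the highest LSN present on a majority of them. The hypothetical `quorum_commit_lsn` helper below sketches that computation; it deliberately ignores terms and the rest of the consensus protocol described in README_PROTO.md.

```rust
/// Hypothetical LSN type: a plain 64-bit WAL position.
type Lsn = u64;

/// Hypothetical helper: returns the highest LSN flushed on a majority of
/// safekeepers, given each safekeeper's last acknowledged flush LSN.
fn quorum_commit_lsn(flush_lsns: &[Lsn]) -> Option<Lsn> {
    if flush_lsns.is_empty() {
        return None;
    }
    let mut sorted = flush_lsns.to_vec();
    // Sort in descending order, so sorted[i] is flushed on at least i + 1 nodes.
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    let majority = flush_lsns.len() / 2 + 1;
    Some(sorted[majority - 1])
}

fn main() {
    // Three safekeepers have acknowledged WAL up to these LSNs.
    let acks: [Lsn; 3] = [300, 100, 200];
    // WAL up to LSN 200 is on two of the three nodes, so it counts as committed.
    println!("{:?}", quorum_commit_lsn(&acks)); // prints Some(200)
}
```

With three safekeepers, any single node can be lost without losing committed WAL, which is why the compute node may evict a page as soon as this position passes the page's last change.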