Improve Readability in Docs

Signed-off-by: Ryan Russell <ryanrussell@users.noreply.github.com>
2025-12-22 21:59:59 +00:00 · 2022-05-30 07:00:23 -05:00
parent 595a6bc1e1
commit 54e163ac03
12 changed files with 14 additions and 14 deletions
--- a/docs/README.md
+++ b/docs/README.md
@@ -6,7 +6,7 @@
 - [docker.md](docker.md) — Docker images and building pipeline.
 - [glossary.md](glossary.md) — Glossary of all the terms used in codebase.
 - [multitenancy.md](multitenancy.md) — how multitenancy is organized in the pageserver and Zenith CLI.
- [sourcetree.md](sourcetree.md) — Overview of the source tree layeout.
+- [sourcetree.md](sourcetree.md) — Overview of the source tree layout.
 - [pageserver/README.md](/pageserver/README.md) — pageserver overview.
 - [postgres_ffi/README.md](/libs/postgres_ffi/README.md) — Postgres FFI overview.
 - [test_runner/README.md](/test_runner/README.md) — tests infrastructure overview.
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -2,7 +2,7 @@

 ### Authentication

-### Backpresssure
+### Backpressure

 Backpressure is used to limit the lag between pageserver and compute node or WAL service.

--- a/docs/rfcs/003-laptop-cli.md
+++ b/docs/rfcs/003-laptop-cli.md
@@ -136,9 +136,9 @@ s3tank          80G     S3

 ## pg

-Manages postgres data directories and can start postgreses with proper configuration. An experienced user may avoid using that (except pg create) and configure/run postgres by themself.
+Manages postgres data directories and can start postgres instances with proper configuration. An experienced user may avoid using that (except pg create) and configure/run postgres by themselves.

-Pg is a term for a single postgres running on some data. I'm trying to avoid here separation of datadir management and postgres instance management -- both that concepts bundled here together.
+Pg is a term for a single postgres running on some data. I'm trying to avoid separation of datadir management and postgres instance management -- both that concepts bundled here together.

 **zenith pg create** [--no-start --snapshot --cow] -s storage-name -n pgdata

--- a/docs/rfcs/006-laptop-cli-v2-repository-structure.md
+++ b/docs/rfcs/006-laptop-cli-v2-repository-structure.md
@@ -121,7 +121,7 @@ repository, launch an instance on the same branch in both clones, and
 later try to push/pull between them? Perhaps create a new timeline
 every time you start up an instance? Then you would detect that the
 timelines have diverged. That would match with the "epoch" concept
-that we have in the WAL safekeepr
+that we have in the WAL safekeeper

 ### zenith checkout/commit

--- a/docs/rfcs/009-snapshot-first-storage-cli.md
+++ b/docs/rfcs/009-snapshot-first-storage-cli.md
@@ -2,7 +2,7 @@ While working on export/import commands, I understood that they fit really well

 We may think about backups as snapshots in a different format (i.e plain pgdata format, basebackup tar format, WAL-G format (if they want to support it) and so on). They use same storage API, the only difference is the code that packs/unpacks files.

-Even if zenith aims to maintains durability using it's own snapshots, backups will be useful for uploading data from postges to zenith.
+Even if zenith aims to maintains durability using it's own snapshots, backups will be useful for uploading data from postgres to zenith.

 So here is an attempt to design consistent CLI for different usage scenarios:

--- a/docs/rfcs/009-snapshot-first-storage-pitr.md
+++ b/docs/rfcs/009-snapshot-first-storage-pitr.md
@@ -192,7 +192,7 @@ for a particular relation readily available alongside the snapshot
 files, and you don't need to track what snapshot LSNs exist
 separately.

-(If we wanted to minize the number of files, you could include the
+(If we wanted to minimize the number of files, you could include the
 snapshot @300 and the WAL between 200 and 300 in the same file, but I
 feel it's probably better to keep them separate)

--- a/docs/rfcs/009-snapshot-first-storage.md
+++ b/docs/rfcs/009-snapshot-first-storage.md
@@ -121,7 +121,7 @@ The properties of s3 that we depend on are:
 list objects
 streaming read of entire object
 read byte range from object
-streaming write new object (may use multipart upload for better relialibity)
+streaming write new object (may use multipart upload for better reliability)
 delete object (that should not disrupt an already-started read).

 Uploaded files, restored backups, or s3 buckets controlled by users could contain malicious content. We should always validate that objects contain the content they’re supposed to. Incorrect, Corrupt or malicious-looking contents should cause software (cloud tools, pageserver) to fail gracefully.
--- a/docs/rfcs/010-storage_details.md
+++ b/docs/rfcs/010-storage_details.md
@@ -40,7 +40,7 @@ b) overwrite older pages with the newer pages -- if there is no replica we proba

 I imagine that newly created pages would just be added to the back of PageStore (again in queue-like fashion) and this way there wouldn't be any meaningful ordering inside of that queue. When we are forming a new incremental snapshot we may prohibit any updates to the current set of pages in PageStore (giving up on single page version rule) and cut off that whole set when snapshot creation is complete.

-With option b) we can also treat PageStor as an uncompleted increamental snapshot.
+With option b) we can also treat PageStor as an uncompleted incremental snapshot.

 ### LocalStore

@@ -131,7 +131,7 @@ As for exact data that should go to snapshots I think it is the following for ea

 It is also important to be able to load metadata quickly since it would be one of the main factors impacting the time of page server start. E.g. if would store/cache about 10TB of data per page server, the size of uncompressed page references would be about 30GB (10TB / ( 8192 bytes page size / ( ~18 bytes per ObjectTag + 8 bytes offset in the file))).

-1) Since our ToC/array of entries can be sorted by ObjectTag we can store the whole BufferTag only when realtion_id is changed and store only delta-encoded offsets for a given relation. That would reduce the average per-page metadata size to something less than 4 bytes instead of 26 (assuming that pages would follow the same order and offset delatas would be small).
+1) Since our ToC/array of entries can be sorted by ObjectTag we can store the whole BufferTag only when relation_id is changed and store only delta-encoded offsets for a given relation. That would reduce the average per-page metadata size to something less than 4 bytes instead of 26 (assuming that pages would follow the same order and offset deltas would be small).
 2) It makes sense to keep ToC at the beginning of the file to avoid extra seeks to locate it. Doesn't matter too much with the local files but matters on S3 -- if we are accessing a lot of ~1Gb files with the size of metadata ~ 1Mb then the time to transfer this metadata would be comparable with access latency itself (which is about a half of a second). So by slurping metadata with one read of file header instead of N reads we can improve the speed of page server start by this N factor.

 I think both of that optimizations can be done later, but that is something to keep in mind when we are designing our storage serialization routines.
--- a/docs/rfcs/013-term-history.md
+++ b/docs/rfcs/013-term-history.md
@@ -7,7 +7,7 @@ and e.g. prevents electing two proposers with the same term -- it is actually
 called `term` in the code. The second, called `epoch`, reflects progress of log
 receival and this might lag behind `term`; safekeeper switches to epoch `n` when
 it has received all committed log records from all `< n` terms. This roughly
-correspones to proposed in
+corresponds to proposed in

 https://github.com/zenithdb/rfcs/pull/3/files

--- a/docs/settings.md
+++ b/docs/settings.md
@@ -185,7 +185,7 @@ If no IAM bucket access is used during the remote storage usage, use the `AWS_AC

 ###### General remote storage configuration

-Pagesever allows only one remote storage configured concurrently and errors if parameters from multiple different remote configurations are used.
+Pageserver allows only one remote storage configured concurrently and errors if parameters from multiple different remote configurations are used.
 No default values are used for the remote storage configuration parameters.

 Besides, there are parameters common for all types of remote storage that can be configured, those have defaults:
--- a/pageserver/src/layered_repository/README.md
+++ b/pageserver/src/layered_repository/README.md
@@ -260,7 +260,7 @@ Whenever a GetPage@LSN request comes in from the compute node, the
 page server needs to reconstruct the requested page, as it was at the
 requested LSN. To do that, the page server first checks the recent
 in-memory layer; if the requested page version is found there, it can
-be returned immediatedly without looking at the files on
+be returned immediately without looking at the files on
 disk. Otherwise the page server needs to locate the layer file that
 contains the requested page version.

--- a/safekeeper/README_PROTO.md
+++ b/safekeeper/README_PROTO.md
@@ -152,7 +152,7 @@ It is assumed that in case of losing local data by some safekeepers, it should b
 * `FlushLSN`: part of WAL persisted to the disk by safekeeper.
 * `NodeID`: pair (term,UUID)
 * `Pager`: Neon component restoring pages from WAL stream
-* `Replica`: read-only computatio node
+* `Replica`: read-only computation node
 * `VCL`: the largest LSN for which we can guarantee availability of all prior records.

 ## Algorithm