diff --git a/docs/synthetic-size.md b/docs/synthetic-size.md new file mode 100644 index 0000000000..8378efc842 --- /dev/null +++ b/docs/synthetic-size.md @@ -0,0 +1,335 @@ +# Synthetic size + +Neon storage has copy-on-write branching, which makes it difficult to +answer the question "how large is my database"? To give one reasonable +answer, we calculate _synthetic size_ for a project. + +The calculation is called "synthetic", because it is based purely on +the user-visible logical size, which is the size that you would see on +a standalone PostgreSQL installation, and the amount of WAL, which is +also the same as what you'd see on a standalone PostgreSQL, for the +same set of updates. + +The synthetic size does *not* depend on the actual physical size +consumed in the storage, or implementation details of the Neon storage +like garbage collection, compaction and compression. There is a +strong *correlation* between the physical size and the synthetic size, +but the synthetic size is designed to be independent of the +implementation details, so that any improvements we make in the +storage system simply reduce our COGS. And vice versa: any bugs or bad +implementation where we keep more data than we would need to, do not +change the synthetic size or incur any costs to the user. + +The synthetic size is calculated for the whole project. It is not +straighforward to attribute size to individual branches. See "What is +the size of an individual branch?" for discussion on those +difficulties. + +The synthetic size is designed to: + +- Take into account the copy-on-write nature of the storage. For + example, if you create a branch, it doesn't immediately add anything + to the synthetic size. It starts to affect the synthetic size only + as it diverges from the parent branch. + +- Be independent of any implementation details of the storage, like + garbage collection, remote storage, or compression. + +## Terms & assumptions + +- logical size is the size of a database *at a given point in + time*. It's the total size of all tables in all databases, as you + see with "\l+" in psql for example, plus the Postgres SLRUs and some + small amount of metadata. NOTE that currently, Neon does not include + the SLRUs and metadata in the logical size. See comment to `get_current_logical_size_non_incremental()`. + +- a "point in time" is defined as an LSN value. You can convert a + timestamp to an LSN, but the storage internally works with LSNs. + +- PITR horizon can be set per-branch. + +- PITR horizon can be set as a time interval, e.g. 5 days or hours, or + as amount of WAL, in bytes. If it's given as a time interval, it's + converted to an LSN for the calculation. + +- PITR horizon can be set to 0, if you don't want to retain any history. + +## Calculation + +Inputs to the calculation are: +- logical size of the database at different points in time, +- amount of WAL generated, and +- the PITR horizon settings + +The synthetic size is based on an idealistic model of the storage +system, where we pretend that the storage consists of two things: +- snapshots, containing a full snapshot of the database, at a given + point in time, and +- WAL. + +In the simple case that the project contains just one branch (main), +and a fixed PITR horizon, the synthetic size is the sum of: + +- the logical size of the branch *at the beginning of the PITR + horizon*, i.e. at the oldest point that you can still recover to, and +- the size of the WAL covering the PITR horizon. + +The snapshot allows you to recover to the beginning of the PITR +horizon, and the WAL allows you to recover from that point to any +point within the horizon. + +``` + WAL + -----------------------#########> + ^ + snapshot + +Legend: + ##### PITR horizon. This is the region that you can still access + with Point-in-time query and you can still create branches + from. + ----- history that has fallen out of the PITR horizon, and can no + longer be accessed +``` + +NOTE: This is not how the storage system actually works! The actual +implementation is also based on snapshots and WAL, but the snapshots +are taken for individual database pages and ranges of pages rather +than the whole database, and it is much more complicated. This model +is a reasonable approximation, however, to make the synthetic size a +useful proxy for the actual storage consumption. + + +## Example: Data is INSERTed + +For example, let's assume that your database contained 10 GB of data +at the beginning of the PITR horizon, and you have since then inserted +5 GB of additional data into it. The additional insertions of 5 GB of +data consume roughly 5 GB of WAL. In that case, the synthetic size is: + +> 10 GB (snapshot) + 5 GB (WAL) = 15 GB + +If you now set the PITR horizon on the project to 0, so that no +historical data is retained, then the beginning PITR horizon would be +at the end of the branch, so the size of the snapshot would be +calculated at the end of the branch, after the insertions. Then the +synthetic size is: + +> 15 GB (snapshot) + 0 GB (WAL) = 15 GB. + +In this case, the synthetic size is the same, regardless of the PITR horizon, +because all the history consists of inserts. The newly inserted data takes +up the same amount of space, whether it's stored as part of the logical +snapshot, or as WAL. (*) + +(*) This is a rough approximation. In reality, the WAL contains +headers and other overhead, and on the other hand, the logical +snapshot includes empty space on pages, so the size of insertions in +WAL can be smaller or greater than the size of the final table after +the insertions. But in most cases, it's in the same ballpark. + +## Example: Data is DELETEd + +Let's look at another example: + +Let's start again with a database that contains 10 GB of data. Then, +you DELETE 5 GB of the data, and run VACUUM to free up the space, so +that the logical size of the database is now only 5 GB. + +Let's assume that the WAL for the deletions and the vacuum take up +100 MB of space. In that case, the synthetic size of the project is: + +> 10 GB (snapshot) + 100 MB (WAL) = 10.1 GB + +This is much larger than the logical size of the database after the +deletions (5 GB). That's because the system still needs to retain the +deleted data, because it's still accessible to queries and branching +in the PITR window. + +If you now set the PITR horizon to 0 or just wait for time to pass so +that the data falls out of the PITR horizon, making the deleted data +inaccessible, the synthetic size shrinks: + +> 5 GB (snapshot) + 0 GB (WAL) = 5 GB + + +# Branching + +Things get more complicated with branching. Branches in Neon are +copy-on-write, which is also reflected in the synthetic size. + +When you create a branch, it doesn't immediately change the synthetic +size at all. The branch point is within the PITR horizon, and all the +data needed to recover to that point in time needs to be retained +anyway. + +However, if you make modifications on the branch, the system needs to +keep the WAL of those modifications. The WAL is included in the +synthetic size. + +## Example: branch and INSERT + +Let's assume that you again start with a 10 GB database. +On the main branch, you insert 2 GB of data. Then you create +a branch at that point, and insert another 3 GB of data on the +main branch, and 1 GB of data on the child branch + +``` + child +#####> + | + | WAL + main ---------###############> + ^ + snapshot +``` + +In this case, the synthetic size consists of: +- the snapshot at the beginning of the PITR horizon (10 GB) +- the WAL on the main branch (2 GB + 3 GB = 5 GB) +- the WAL on the child branch (1 GB) + +Total: 16 GB + +# Diverging branches + +If there is only a small amount of changes in the database on the +different branches, as in the previous example, the synthetic size +consists of a snapshot before the branch point, containing all the +shared data, and the WAL on both branches. However, if the branches +diverge a lot, it is more efficient to store a separate snapshot of +branches. + +## Example: diverging branches + +You start with a 10 GB database. You insert 5 GB of data on the main +branch. Then you create a branch, and immediately delete all the data +on the child branch and insert 5 GB of new data to it. Then you do the +same on the main branch. Let's assume +that the PITR horizon requires keeping the last 1 GB of WAL on the +both branches. + +``` + snapshot + v WAL + child +---------##############> + | + | + main -------------+---------##############> + ^ WAL + snapshot +``` + +In this case, the synthetic size consists of: +- snapshot at the beginning of the PITR horizon on the main branch (4 GB) +- WAL on the main branch (1 GB) +- snapshot at the beginning of the PITR horizon on the child branch (4 GB) +- last 1 GB of WAL on the child branch (1 GB) + +Total: 10 GB + +The alternative way to store this would be to take only one snapshot +at the beginning of branch point, and keep all the WAL on both +branches. However, the size with that method would be larger, as it +would require one 10 GB snapshot, and 5 GB + 5 GB of WAL. It depends +on the amount of changes (WAL) on both branches, and the logical size +at the branch point, which method would result in a smaller synthetic +size. On each branch point, the system performs the calculation with +both methods, and uses the method that is cheaper, i.e. the one that +results in a smaller synthetic size. + +One way to think about this is that when you create a branch, it +starts out as a thin branch that only stores the WAL since the branch +point. As you modify it, and the amount of WAL grows, at some point +it becomes cheaper to store a completely new snapshot of the branch +and truncate the WAL. + + +# What is the size of an individual branch? + +Synthetic size is calculated for the whole project, and includes all +branches. There is no such thing as the size of a branch, because it +is not straighforward to attribute the parts of size to individual +branches. + +## Example: attributing size to branches + +(copied from https://github.com/neondatabase/neon/pull/2884#discussion_r1029365278) + +Imagine that you create two branches, A and B, at the same point from +main branch, and do a couple of small updates on both branches. Then +six months pass, and during those six months the data on the main +branch churns over completely multiple times. The retention period is, +say 1 month. + +``` + +------> A + / +--------------------*-------------------------------> main + \ + +--------> B +``` + +In that situation, the synthetic tenant size would be calculated based +on a "logical snapshot" at the branch point, that is, the logical size +of the database at that point. Plus the WAL on branches A and B. Let's +say that the snapshot size is 10 GB, and the WAL is 1 MB on both +branches A and B. So the total synthetic storage size is 10002 +MB. (Let's ignore the main branch for now, that would be just added to +the sum) + +How would you break that down per branch? I can think of three +different ways to do it, and all of them have their own problems: + +### Subtraction method + +For each branch, calculate how much smaller the total synthetic size +would be, if that branch didn't exist. In other words, how much would +you save if you dropped the branch. With this method, the size of +branches A and B is 1 MB. + +With this method, the 10 GB shared logical snapshot is not included +for A nor B. So the size of all branches is not equal to the total +synthetic size of the tenant. If you drop branch A, you save 1 MB as +you'd expect, but also the size of B suddenly jumps from 1 MB to 10001 +MB, which might feel surprising. + +### Division method + +Divide the common parts evenly across all branches that need +them. With this method, the size of branches A and B would be 5001 MB. + +With this method, the sum of all branches adds up to the total +synthetic size. But it's surprising in other ways: if you drop branch +A, you might think that you save 5001 MB, but in reality you only save +1 MB, and the size of branch B suddenly grows from 5001 to 10001 MB. + +### Addition method + +For each branch, include all the snapshots and WAL that it depends on, +even if some of them are shared by other branches. With this method, +the size of branches A and B would be 10001 MB. + +The surprise with this method is that the sum of all the branches is +larger than the total synthetic size. And if you drop branch A, the +total synthetic size doesn't fall by 10001 MB as you might think. + +# Alternatives + +A sort of cop-out method would be to show the whole tree of branches +graphically, and for each section of WAL or logical snapshot, display +the size of that section. You can then see which branches depend on +which sections, which sections are shared etc. That would be good to +have in the UI anyway. + +Or perhaps calculate per-branch numbers using the subtraction method, +and in addition to that, one more number for "shared size" that +includes all the data that is needed by more than one branch. + +## Which is the right method? + +The bottom line is that it's not straightforward to attribute the +synthetic size to individual branches. There are things we can do, and +all of those methods are pretty straightforward to implement, but they +all have their own problems. What makes sense depends a lot on what +you want to do with the number, what question you are trying to +answer.