Before this patch, the range from which the random delay is picked
is at minimum 10 seconds.
The disk usage eviction tests are apparently the first to wait for
a background loop to do its job.
With this patch, they only need to wait the 1s that they configure.
The MinResidentSizePartition is effectively what `overage` was earlier,
but more expressive and outside of EvictionCandidates.
So switch the code back to a single list,
but use (MinResidentSizePartition, EvictionCandidates) tuples.
That eliminates the need for iter_in_eviction_order() altogether.
It consumes 8 bytes more memory per candidate, but that doesn't
matter for now.
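A minimal sketch of the restructured list, with hypothetical field and type names (the real definitions live in the eviction code): because each candidate carries its partition as the first tuple element, a plain sort over the list already yields eviction order, which is what makes iter_in_eviction_order() unnecessary.

```rust
// Hypothetical sketch: candidates paired with their partition in one list.
// Deriving Ord on the enum makes the "Above" partition (evicted first)
// sort before "Below" (protected by min_resident_size).
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum MinResidentSizePartition {
    Above, // layers beyond the tenant's min_resident_size: evicted first
    Below, // layers protected by min_resident_size
}

#[derive(Debug)]
struct EvictionCandidate {
    layer_name: String, // placeholder for the real layer handle
    last_activity_ts: u64,
}

fn main() {
    // Single list of (partition, candidate) tuples.
    let mut candidates: Vec<(MinResidentSizePartition, EvictionCandidate)> = vec![
        (
            MinResidentSizePartition::Below,
            EvictionCandidate { layer_name: "protected".into(), last_activity_ts: 5 },
        ),
        (
            MinResidentSizePartition::Above,
            EvictionCandidate { layer_name: "evict-me-first".into(), last_activity_ts: 9 },
        ),
    ];
    // Sorting by (partition, last-activity) replaces iter_in_eviction_order().
    candidates.sort_by_key(|(partition, c)| (*partition, c.last_activity_ts));
    assert_eq!(candidates[0].1.layer_name, "evict-me-first");
}
```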
The algorithm is the same (with two small exceptions), but rewrite the
way it's implemented to make it easier to follow.
The exceptions:
1. 'min_resident_size' now protects at least that much data in the first
"respectful" phase of the algorithm. Previously, it would evict layers
until the resident size fell below min_resident_size. In other words,
we now protect one more layer of each tenant, so that the resident
size stays just above min_resident_size, whereas previously we would
evict enough to bring the resident size just under min_resident_size.
2. Previously, the "max layer size" that's used as the default
min_resident_size was calculated from *all* layers in the tenant,
including remote layers. Now it's only calculated across all
locally-present layers. I don't know if that was a deliberate choice,
but this is slightly simpler.
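Exception 1 can be illustrated with a small sketch (hypothetical function and names, not the actual implementation): walking layers in most-recently-used order, we keep protecting layers until at least min_resident_size bytes are covered, so the last protected layer pushes the protected total just above the threshold instead of stopping just below it.

```rust
// Hypothetical sketch of exception 1: protect layers until at least
// min_resident_size bytes are covered; everything after that is a
// candidate for the "respectful" phase of eviction.
fn partition(layer_sizes: &[u64], min_resident_size: u64) -> (Vec<u64>, Vec<u64>) {
    let mut protected = Vec::new();
    let mut evictable = Vec::new();
    let mut protected_total = 0u64;
    for &size in layer_sizes {
        if protected_total < min_resident_size {
            // Protect this layer even if it overshoots the threshold:
            // the resident size ends up just *above* min_resident_size.
            protected.push(size);
            protected_total += size;
        } else {
            evictable.push(size);
        }
    }
    (protected, evictable)
}

fn main() {
    // Three 4 GiB layers, min_resident_size = 6 GiB:
    let gib = 1024 * 1024 * 1024u64;
    let (protected, evictable) = partition(&[4 * gib; 3], 6 * gib);
    // Two layers (8 GiB) stay protected, keeping us above the 6 GiB
    // threshold; the old behavior would have evicted down below it.
    assert_eq!(protected.len(), 2);
    assert_eq!(evictable.len(), 1);
}
```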
Without this change, we run it every p.period, which can be quite
short. For example, the experiment currently running with 3000 tenants
in prod uses a period of 1 minute.
Doing it once per p.threshold is enough to prevent eviction.
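A sketch of the idea, with hypothetical names and a hypothetical threshold value (the source only states the 1-minute period): the loop still ticks every p.period, but the expensive work only fires once at least p.threshold has elapsed since the last run.

```rust
use std::time::Duration;

// Hypothetical sketch: gate work that used to run on every loop
// iteration (every `period`) so it only runs once per `threshold`.
struct ImitationState {
    elapsed_since_last_run: Duration,
}

fn should_imitate(state: &mut ImitationState, period: Duration, threshold: Duration) -> bool {
    state.elapsed_since_last_run += period;
    if state.elapsed_since_last_run >= threshold {
        state.elapsed_since_last_run = Duration::ZERO;
        true
    } else {
        false
    }
}

fn main() {
    // period = 1 minute (the prod example); threshold = 20 minutes
    // is an assumed value purely for illustration.
    let mut state = ImitationState { elapsed_since_last_run: Duration::ZERO };
    let runs = (0..60)
        .filter(|_| {
            should_imitate(
                &mut state,
                Duration::from_secs(60),
                Duration::from_secs(20 * 60),
            )
        })
        .count();
    // Once per threshold: 3 runs over the hour instead of 60.
    assert_eq!(runs, 3);
}
```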
fix synthetic size for (last_record_lsn - gc_horizon) < initdb_lsn
Assume a single-timeline project.
If the gc_horizon covers all WAL (last_record_lsn < gc_horizon)
but we have written more data than just initdb, the synthetic
size calculation worker needs to calculate the logical size
at LSN initdb_lsn (Segment BranchStart).
Before this patch, that calculation would incorrectly return
the initial logical size calculation result that we cache in
Timeline::initial_logical_size, presumably because of confusion
around initdb_lsn vs. the LSN of the initial size calculation.
The fix is to hand out the initialized_size() only if
the LSN matches.
The distinction in the metrics between "init logical size" and "logical
size" was also incorrect because of the above. So, remove it.
There was a special case for `size != 0`, covering
LogicalSize::empty_initial(). But `initial_part_end` is `None` in that
case, so the new `LogicalSize::initialized_size()` returns `None`
there as well.
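A minimal sketch of the fixed accessor, with hypothetical field names and a simplified Lsn type: the cached size is only valid at the exact LSN it was computed at, so initialized_size() takes the requested LSN and returns None on any mismatch, which also subsumes the old `size != 0` special case.

```rust
// Hypothetical sketch: hand out the cached initial logical size only
// when the requested LSN matches the LSN it was computed at.
type Lsn = u64; // simplified stand-in for the real Lsn type

struct LogicalSize {
    // LSN at which the initial size calculation ran; None for
    // empty_initial(), so no separate `size != 0` check is needed.
    initial_part_end: Option<Lsn>,
    size: u64,
}

impl LogicalSize {
    fn initialized_size(&self, lsn: Lsn) -> Option<u64> {
        match self.initial_part_end {
            Some(end) if end == lsn => Some(self.size),
            _ => None,
        }
    }
}

fn main() {
    let ls = LogicalSize { initial_part_end: Some(100), size: 4096 };
    assert_eq!(ls.initialized_size(100), Some(4096));
    // A different LSN (e.g. initdb_lsn when the cache was computed at
    // last_record_lsn) no longer receives the stale cached value.
    assert_eq!(ls.initialized_size(42), None);
}
```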
Lastly, to prevent confusion like this in the future, rename all
occurrences of `init_lsn` to either just `lsn` or a more specific name.
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>