The algorithm is the same (with two small exceptions), but rewrite the
way it's implemented to make it easier to follow.
The exceptions:
1. 'min_resident_size' now protects at least that much data in the first
"respectful" phase of the algorithm. Previously, it would evict layers
until the resident size fell below min_resident_size. In other words,
we know protect one more layer of each tenant, so that the resident
size stays just above min_resident_size, while previously we would
evict enough to bring the resident size just under min_resident_size.
2. Previously, the "max layer size" that's used as the default
min_resident_size was calculated from *all* layers in the tenant,
including remote layers. Now it's only calculated across all
locally-present layers. I don't know if that was a deliberate choice,
but this is slightly simpler.
Without this, we run it every p.period, which can be quite low. For
example, the running experiment with 3000 tenants in prod uses a period
of 1 minute.
Doing it once per p.threshold is enough to prevent eviction.
fix synthetic size for (last_record_lsn - gc_horizon) < initdb_lsn
Assume a single-timeline project.
If the gc_horizon covers all WAL (last_record_lsn < gc_horizon)
but we have written more data than just initdb, the synthetic
size calculation worker needs to calculate the logical size
at LSN initdb_lsn (Segment BranchStart).
Before this patch, that calculation would incorrectly return
the initial logical size calculation result that we cache in
the Timeline::initial_logical_size. Presumably, because there
was confusion around initdb_lsn vs. initial size calculation.
The fix is to only hand out the initialized_size() only if
the LSN matches.
The distinction in the metrics between "init logical size" and "logical
size" was also incorrect because of the above. So, remove it.
There was a special case for `size != 0`. This was to cover the case of
LogicalSize::empty_initial(), but `initial_part_end` is `None` in that
case, so the new `LogicalSize::initialized_size()` will return None
in that case as well.
Lastly, to prevent confusion like this in the future, rename all
occurrences of `init_lsn` to either just `lsn` or a more specific name.
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
whoami() was never called, 'is_test' was never set.
'restart()' might be useful, but it wasn't hooked up the CLI so it was
dead code. It's not clear what kind of a restart it should perform,
anyway: just restart Postgres, or re-initialize the data directory
from a fresh basebackup like "stop"+"start" does.
To avoid re-downloading evicted files on restart, re-compute logical
size and partitioning before each threshold based eviction run.
Cc: #3802
Co-authored-by: Christian Schwarz <christian@neon.tech>
* Add the REPOSITORY env to build args to avoid the following error when
executing without the credentials for the repository.
```
ERROR: Service 'compute' failed to build: Head
"https://369495373322.dkr.ecr.eu-central-1.amazonaws.com/v2/compute-node-v15/manifests/2221":
no basic auth credentials
```
* update the tag version in the documentation to support storage broker
## Describe your changes
Added a query param to detach API
Allow to remove local state of a tenant even if its not in the memory
(following ignore API)
## Issue ticket number and link
#3828
## Checklist before requesting a review
- [x] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
---------
Co-authored-by: Kirill Bulatov <kirill@neon.tech>
The PR enforces current newest `index_part.json` format in the type
system (version `1`), not allowing any previous forms of it, that were
used in the past.
Similarly, the code to mitigate the
https://github.com/neondatabase/neon/issues/3024 issue is now also
removed.
Current code does not produce old formats and extra files in the
index_part.json, in the future we will be able to use
https://github.com/neondatabase/aversion or other approach to make
version transitions more explicit.
See https://neondb.slack.com/archives/C033RQ5SPDH/p1679134185248119 for
the justification on the breaking changes.
We read the pageserver connection string from the spec file, so let's
read the auth token from the same place.
We've been talking about pre-launching compute nodes that are not
associated with any particular tenant at startup, so that the spec
file is delivered to the compute node later. We cannot change the env
variables after the process has been launched.
We still pass the token to 'postgres' binary in the NEON_AUTH_TOKEN
env variable, but compute_ctl is now responsible for setting it.