## Problem
The scrubber would like to check the highest mtime among a tenant's objects
as a safety check during purges. It recently switched to using
GenericRemoteStorage, so we need to expose object mtimes through the listing
methods.
## Summary of changes
- In `Listing.keys`, return a `ListingObject` including a `last_modified`
field, instead of a `RemotePath` (see the sketch below)
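For illustration, a minimal sketch of the shape this implies, using simplified
type definitions and `SystemTime` as an assumption rather than the exact types
in `remote_storage`:

```rust
use std::time::SystemTime;

/// Stand-in for the crate's remote path type.
pub struct RemotePath(pub String);

/// Each listed key now carries per-object metadata, not just the path.
pub struct ListingObject {
    pub key: RemotePath,
    /// Last-modified time reported by the backing store.
    pub last_modified: SystemTime,
}

pub struct Listing {
    pub prefixes: Vec<RemotePath>,
    /// Previously `Vec<RemotePath>`.
    pub keys: Vec<ListingObject>,
}
```

The scrubber can then take `keys.iter().map(|o| o.last_modified).max()` as the
tenant-wide high-water mark before deciding whether a purge is safe.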
---------
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
There is a race condition between timeline shutdown and the split task.
Timeline shutdown first shuts down the upload queue, and only then fires
the cancellation token. A parallel running timeline split operation
might thus encounter a cancelled upload queue before the cancellation
token is fired, and print a noisy error.
Fix this by mapping `anyhow::Error`s that wrap `NotInitialized::ShuttingDown`
to `FlushLayerError::Cancelled` instead of `FlushLayerError::Other(_)`.
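A hedged sketch of that mapping, with stand-in error types rather than the
exact definitions in the tree:

```rust
use thiserror::Error;

/// Stand-in for the upload queue's "not initialized" states.
#[derive(Debug, Error)]
enum NotInitialized {
    #[error("upload queue is shutting down")]
    ShuttingDown,
    #[error("upload queue is stopped")]
    Stopped,
}

/// Stand-in for the error the flush path reports.
#[derive(Debug, Error)]
enum FlushLayerError {
    #[error("cancelled")]
    Cancelled,
    #[error(transparent)]
    Other(anyhow::Error),
}

impl FlushLayerError {
    /// Treat upload-queue shutdown as cancellation rather than an error, so a
    /// split racing with timeline shutdown stays quiet.
    fn from_anyhow(err: anyhow::Error) -> Self {
        match err.downcast_ref::<NotInitialized>() {
            Some(NotInitialized::ShuttingDown) => FlushLayerError::Cancelled,
            _ => FlushLayerError::Other(err),
        }
    }
}
```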
Fixes #8496
This pull request (should) fix the failure of test_gc_feedback. See the
explanation in the newly-added test case.
Part of https://github.com/neondatabase/neon/issues/8002
Allow incomplete history for the compaction algorithm.
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Re-attach blocks the pageserver http server from starting up. Hence, the
pageserver can't reply to heartbeats until re-attach is done. This makes
the storage controller mark the node offline (not good). We worked around
this by setting the interval after which nodes are marked offline to 5
minutes. This isn't a long-term solution.
## Summary of changes
* Introduce a new `NodeAvailability` state: `WarmingUp`. This state
  models the following time interval:
  * From receiving the re-attach request until the pageserver replies to
    the first heartbeat after re-attach
* The heartbeat delta generator becomes aware of this state and uses a
  separate, longer interval (see the sketch below)
* Flag `max-warming-up-interval` now models the longer timeout and
  `max-offline-interval` the shorter one, to match the names of the states
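The following is a sketch of the new state and of how the heartbeat logic
might pick its timeout; the exact field shapes and flag plumbing are
assumptions:

```rust
use std::time::{Duration, Instant};

/// Availability states the storage controller tracks per pageserver node.
pub enum NodeAvailability {
    Active,
    /// From receiving the re-attach request until the node answers its first
    /// heartbeat after re-attach.
    WarmingUp(Instant),
    Offline,
}

/// The heartbeat delta generator uses the longer grace period for nodes that
/// are still warming up, and the regular one otherwise.
pub fn offline_threshold(
    availability: &NodeAvailability,
    max_offline_interval: Duration,
    max_warming_up_interval: Duration,
) -> Duration {
    match availability {
        NodeAvailability::WarmingUp(_) => max_warming_up_interval,
        NodeAvailability::Active | NodeAvailability::Offline => max_offline_interval,
    }
}
```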
Closes https://github.com/neondatabase/neon/issues/7552
Before this PR:
1. The circuit breaker would trip on `CompactionError::Shutdown`. That's
wrong; we want to ignore those cases.
2. Remote timeline client shutdown would not be mapped to
`CompactionError::Shutdown` in all circumstances.
We observed this in staging; see
https://neondb.slack.com/archives/C033RQ5SPDH/p1721829745384449
This PR fixes (1) with a simple `match` statement, and (2) by switching
a bunch of `anyhow` usage over to distinguished errors that ultimately
get mapped to `CompactionError::Shutdown`.
I removed the implicit `#[from]` conversion from `anyhow::Error` to
`CompactionError::Other` to discover all the places that were mapping
remote timeline client shutdown to `anyhow::Error`.
In my opinion `#[from]` is an antipattern and we should avoid it,
especially for `anyhow::Error`. If some callee is going to return
anyhow, the very least the caller should do is acknowledge, through a
`map_err(MyError::Other)`, that they're conflating different failure
reasons.
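To make both points concrete, here is a hedged sketch with stand-in types (the
real `CompactionError`, circuit breaker, and remote timeline client APIs
differ in detail): a `match` that keeps shutdown away from the breaker, and a
call site that maps the callee's shutdown variant explicitly now that the
blanket `#[from]` conversion is gone:

```rust
// Stand-ins for illustration; not the exact definitions in the tree.
pub enum CompactionError {
    Shutdown,
    Other(anyhow::Error),
}

pub enum WaitCompletionError {
    Shutdown,
    Other(anyhow::Error),
}

pub struct CircuitBreaker {
    consecutive_failures: usize, // hypothetical internals
}

impl CircuitBreaker {
    pub fn success(&mut self) {
        self.consecutive_failures = 0;
    }
    pub fn fail(&mut self) {
        self.consecutive_failures += 1;
    }
    pub fn is_tripped(&self) -> bool {
        self.consecutive_failures >= 5
    }
}

/// (1): only real failures feed the breaker; shutdown is expected and ignored.
pub fn record_compaction_outcome(
    breaker: &mut CircuitBreaker,
    outcome: &Result<(), CompactionError>,
) {
    match outcome {
        Ok(()) => breaker.success(),
        Err(CompactionError::Shutdown) => {
            // Not a compaction failure; do not trip the breaker.
        }
        Err(CompactionError::Other(err)) => {
            tracing::error!("compaction failed: {err:#}");
            breaker.fail();
        }
    }
}

// Stand-in for a remote timeline client call that can observe shutdown.
fn wait_remote_uploads() -> Result<(), WaitCompletionError> {
    Ok(())
}

/// (2): without `From<anyhow::Error>`, the caller has to say which bucket each
/// failure falls into, so shutdown can no longer leak into `Other`.
pub fn compact_step() -> Result<(), CompactionError> {
    wait_remote_uploads().map_err(|e| match e {
        WaitCompletionError::Shutdown => CompactionError::Shutdown,
        WaitCompletionError::Other(err) => {
            CompactionError::Other(err.context("wait for remote uploads"))
        }
    })?;
    Ok(())
}
```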
## Problem
A PR that modified compaction raced with a PR that modified the `GcInfo`
structure.
## Summary of changes
Fix it
Co-authored-by: Vlad Lazar <vlalazar.vlad@gmail.com>
## Problem
The in-memory layer vectored read was very slow in some conditions (the
`walingest::test_large_rel` test). Upon profiling, I realised that 80% of
the time was spent building up the binary heap of reads. This stage
isn't actually needed.
## Summary of changes
Remove the planning stage as we never took advantage of it in order to
merge reads. There should be no functional change from this patch.
## Problem
Previously, Timeline::gc_info was only updated in a batch operation at
the start of GC. That means that timelines didn't generally have
accurate information about who their children were before the first GC,
or between GC cycles.
Knowledge of child branches is important for calculating layer
visibility in #8398
## Summary of changes
- Split out part of refresh_gc_info into initialize_gc_info, which is
now called early in startup
- Include TimelineId in retain_lsns so that we can later add/remove the
LSNs for particular children
- When timelines are added/removed, update their parent's retain_lsns
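A sketch of the bookkeeping this enables; the concrete field and type shapes
here are assumptions:

```rust
// Hypothetical newtypes standing in for the real ones.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct TimelineId(pub u128);

/// Sketch of a parent timeline's GC info, kept current as children come and
/// go rather than only being rebuilt at the start of each GC cycle.
#[derive(Default)]
pub struct GcInfo {
    /// Branch points that must be retained, tagged with the child timeline
    /// that needs them so one child's entry can be removed on its own later.
    pub retain_lsns: Vec<(Lsn, TimelineId)>,
}

impl GcInfo {
    /// Called when a child timeline is created or discovered at startup.
    pub fn insert_child(&mut self, child: TimelineId, branch_lsn: Lsn) {
        self.retain_lsns.push((branch_lsn, child));
        self.retain_lsns.sort_by_key(|(lsn, _)| *lsn);
    }

    /// Called when a child timeline is deleted.
    pub fn remove_child(&mut self, child: TimelineId) {
        self.retain_lsns.retain(|(_, id)| *id != child);
    }
}
```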
## Problem
In `test_basebackup_with_high_slru_count`, the pageserver is sometimes
mysteriously hanging on startup, having been started+stopped earlier in
the test setup while populating template tenant data.
- #7586
We can't see why this is hanging in this particular test. The test does
some weird stuff though, like attaching a load of broken tenants and
then doing a SIGQUIT kill of a pageserver.
## Summary of changes
- Attach tenants normally instead of doing a failpoint dance to attach
them as broken
- Shut the pageserver down gracefully during init instead of using
immediate mode
- Remove the "sequential" variant of the unstable test, as this is going
away soon anyway
- Log before trying to acquire the lock file, so that if it hangs we have a
clearer sense of whether that's really where it's hanging. It seems like it
is, but that code does a non-blocking flock, so it's surprising.
## Problem
LayerAccessStats contains a lot of detail that we don't use: short
histories of most recent accesses, specifics on what kind of task
accessed a layer, etc. This is all stored inside a Mutex, which is
locked every time something accesses a layer.
## Summary of changes
- Store timestamps at a very low resolution (to the nearest second),
sufficient for use on the timescales of eviction.
- Pack access time and last residence change time into a single u64
- Use the high bits of the u64 for other flags, including the new layer
visibility concept.
- Simplify the external-facing model for access stats to just include
what we now track.
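An illustrative sketch of the packing idea, not the exact bit layout in the
tree (this version keeps one timestamp and one flag; the real change packs two
timestamps plus flags):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

/// Seconds-resolution access time in the low 32 bits, visibility flag in the
/// top bit; everything lives in one atomic instead of behind a mutex.
const TIME_MASK: u64 = u32::MAX as u64;
const VISIBLE_BIT: u64 = 1 << 63;

pub struct LayerAccessStats(AtomicU64);

impl LayerAccessStats {
    pub fn new() -> Self {
        LayerAccessStats(AtomicU64::new(0))
    }

    fn now_secs() -> u64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap_or_default()
            .as_secs()
            & TIME_MASK
    }

    /// Record an access: refresh the timestamp and mark the layer visible,
    /// preserving the other high bits, with no lock taken.
    pub fn record_access(&self) {
        let secs = Self::now_secs();
        let _ = self
            .0
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
                Some((cur & !TIME_MASK) | secs | VISIBLE_BIT)
            });
    }

    /// Second resolution is plenty for eviction-timescale decisions.
    pub fn last_access_secs(&self) -> u64 {
        self.0.load(Ordering::Relaxed) & TIME_MASK
    }

    pub fn is_visible(&self) -> bool {
        self.0.load(Ordering::Relaxed) & VISIBLE_BIT != 0
    }
}
```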
Note that the `HistoryBufferWithDropCounter` is removed here because it
is no longer used. I do not dislike this type; we just happen not to use
it for anything else at present.
Co-authored-by: Christian Schwarz <christian@neon.tech>
part of https://github.com/neondatabase/neon/issues/8002
The main thing in this pull request is the new `generate_key_retention`
function. For a given key, it decides which deltas to retain and which
images to generate, based on the key's history + retain_lsn + horizon.
On top of that, we generate a single flat level of delta layers over all
deltas included in the compaction. In the future, we can decide whether to
split them over the LSN axis as described in the RFC.
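A heavily simplified sketch of the shape of that decision: one key's history
partitioned at each retain_lsn and at the horizon. The real function
additionally decides when generating an image at a split point is worthwhile
and handles incomplete history:

```rust
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

#[derive(Clone)]
pub enum Value {
    Image(Vec<u8>),
    Delta(Vec<u8>),
}

/// Partition one key's history at each retain_lsn and at the GC horizon.
/// Deltas strictly between two split points must be kept so the upper split
/// point can still be reconstructed; at a split point the history below it
/// may instead be materialized as an image when that is cheaper.
pub fn generate_key_retention_sketch(
    history: &[(Lsn, Value)],
    retain_lsns: &[Lsn],
    horizon: Lsn,
) -> Vec<(Lsn, Vec<(Lsn, Value)>)> {
    let mut split_points: Vec<Lsn> = retain_lsns.to_vec();
    split_points.push(horizon);
    split_points.sort();
    split_points.dedup();

    let mut result = Vec::new();
    let mut lower = Lsn(0);
    for &upper in &split_points {
        let deltas: Vec<(Lsn, Value)> = history
            .iter()
            .filter(|(lsn, _)| *lsn > lower && *lsn <= upper)
            .cloned()
            .collect();
        result.push((upper, deltas));
        lower = upper;
    }
    result
}
```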
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
Postgres uses the `access()` function in `GetNewRelFileNumber` to check
that the assigned relfilenumber is not used by any other relation. This
check will not work in Neon, because we do not have all files in local
storage.
## Summary of changes
Use `smgrexists()` instead, which checks with the pageserver whether such
a relfilenode is in use.
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
As described in https://github.com/neondatabase/neon/issues/8398, layer
visibility is a new hint that will help us manage disk space more
efficiently.
## Summary of changes
- Introduce LayerVisibilityHint and store it as part of access stats
- Automatically mark a layer visible when it is accessed or when it is
created.
The impact on the access stats size will be reversed in
https://github.com/neondatabase/neon/pull/8431
This is functionally a no-op change: subsequent PRs will add the logic
that sets layers to `Covered` and uses layer visibility as an input to
eviction and heatmap generation.
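A sketch of the hint and of the rule above; where exactly it hangs off the
access stats is simplified here:

```rust
/// Whether a layer is expected to serve reads. `Covered` is only assigned by
/// follow-up PRs, once image layers above fully shadow a layer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LayerVisibilityHint {
    Visible,
    Covered,
}

impl Default for LayerVisibilityHint {
    /// Newly created layers start out visible: we just wrote or downloaded them.
    fn default() -> Self {
        LayerVisibilityHint::Visible
    }
}

/// Any access marks the layer visible again.
pub fn on_access(visibility: &mut LayerVisibilityHint) {
    *visibility = LayerVisibilityHint::Visible;
}
```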
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
This test sometimes found that ancestors were getting cleaned up before
it had done any compaction.
Compaction was happening implicitly via Workload.
Example:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8298/10032173390/index.html#testresult/fb04786402f80822/retries
## Summary of changes
- Set upload=False when writing data after shard split, to avoid doing a
checkpoint
- Add a checkpoint_period & explicit wait for uploads so that we ensure
data lands in S3 without doing a checkpoint