Commit Graph

2334 Commits

Author SHA1 Message Date
Joonas Koivunen
44cdb9fb58 refactor: reparented_direct_children query 2024-07-26 14:39:31 +00:00
Joonas Koivunen
cdfaf0700f fix: bifurcate the detach+reparent step 2024-07-26 14:39:31 +00:00
Joonas Koivunen
881e1ad056 refactor: no need to collect reparentable here 2024-07-26 14:39:31 +00:00
Joonas Koivunen
bb3d70e24d fix: properly cancel if any reparenting failed 2024-07-26 14:39:31 +00:00
Joonas Koivunen
c6c560e4c8 rewrite to include testing assertion 2024-07-26 14:39:31 +00:00
Joonas Koivunen
8dd332aed5 doc: remove unnecessary comment 2024-07-26 14:39:31 +00:00
Joonas Koivunen
5c03a17eb8 wip: some progress
now we hit the todo! in "already detached" path.
2024-07-26 14:39:31 +00:00
Joonas Koivunen
402d66778e make reparenting operations idempotent 2024-07-26 14:39:31 +00:00
Joonas Koivunen
39e2bc932f prepare to reparent while gc blocked 2024-07-26 14:39:31 +00:00
Joonas Koivunen
5fc034fa7f feat: block gc persistently until detach ancestor completes 2024-07-26 14:39:31 +00:00
Joonas Koivunen
f9b12def0b add support for WaitToActivate errors 2024-07-26 14:39:31 +00:00
Joonas Koivunen
5d0071447c partial: index_part.json support for ongoing_detach_ancestor 2024-07-26 14:39:31 +00:00
Joonas Koivunen
409e2eff9e fix: run upload_rewritten_layer in a span
there was a weird failure observed with CI tests: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8430/10108870590/index.html#suites/a1c2be32556270764423c495fad75d47/94a4686382b96297
2024-07-26 14:39:31 +00:00
Joonas Koivunen
e6e3b9a716 doc: remove on_gc_task_start fixme 2024-07-26 08:52:55 +00:00
Joonas Koivunen
7f31a3f671 forgotten rename, maybe 2024-07-26 08:52:55 +00:00
Joonas Koivunen
9971ae3d24 rename is_detached_from_{original_,}ancestor (just the rename) 2024-07-26 08:52:55 +00:00
Joonas Koivunen
48a2a20de3 chore: derive default 2024-07-26 08:52:55 +00:00
Joonas Koivunen
29ef8f15ce chore: unused variable 2024-07-26 08:52:55 +00:00
Joonas Koivunen
5e45dd3f86 rename SharedState::notify
to continue_existing_attempt
2024-07-26 08:52:55 +00:00
Joonas Koivunen
5fced442d7 warning caused by removed body 2024-07-26 08:52:55 +00:00
Joonas Koivunen
4222610233 cleanup index part dependent 2024-07-26 08:52:55 +00:00
Joonas Koivunen
92deb0dfd7 plumbing: collect timelines index parts 2024-07-26 08:52:55 +00:00
Joonas Koivunen
46ca6f17c5 plumbing: notify shared state of existing attempt 2024-07-26 08:52:55 +00:00
Joonas Koivunen
14869abb77 complete the plumbing with non-notifying attempt_blocks_gc impl 2024-07-26 08:52:55 +00:00
Joonas Koivunen
5330fd9366 doc(fixme): shared state 2024-07-26 08:52:55 +00:00
Joonas Koivunen
6c5b3b7812 doc: more sketched api comments 2024-07-26 08:52:55 +00:00
Joonas Koivunen
849fe0f191 plumb the shared state through
the api for the gc pausing is quite awkward.
2024-07-26 08:52:55 +00:00
Joonas Koivunen
f564b66f21 shared state sketch 2024-07-26 08:52:55 +00:00
Joonas Koivunen
2e58ccee78 temp: planning 2024-07-26 08:52:55 +00:00
Joonas Koivunen
0ad31bb7fb doc: remove obsolete FIXME
this was cleared with partial metadata updates.
2024-07-26 08:52:55 +00:00
Joonas Koivunen
86f26d0918 chore: minor rename FIXME in IndexPart 2024-07-26 08:52:55 +00:00
Joonas Koivunen
4a562dff2e doc: more 2024-07-26 08:52:55 +00:00
Joonas Koivunen
f9185b42a9 doc: minor enhancements 2024-07-26 08:52:55 +00:00
Joonas Koivunen
d4f30daa81 chore: minor indentation problem 2024-07-26 08:52:55 +00:00
John Spray
6711087ddf remote_storage: expose last_modified in listings (#8497)
## Problem

The scrubber would like to check the highest mtime in a tenant's objects
as a safety check during purges. It recently switched to use
GenericRemoteStorage, so we need to expose that in the listing methods.

## Summary of changes

- In Listing.keys, return a ListingObject{} including a last_modified
field, instead of a RemotePath

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-07-26 10:57:52 +03:00
Arpad Müller
8e02db1ab9 Handle NotInitialized::ShuttingDown error in shard split (#8506)
There is a race condition between timeline shutdown and the split task.
Timeline shutdown first shuts down the upload queue, and only then fires
the cancellation token. A parallel running timeline split operation
might thus encounter a cancelled upload queue before the cancellation
token is fired, and print a noisy error.

Fix this by mapping `anyhow::Error{ NotInitialized::ShuttingDown }) to
`FlushLayerError::Cancelled` instead of `FlushLayerError::Other(_)`.


Fixes #8496
2024-07-26 02:16:10 +02:00
Alex Chi Z.
bea0468f1f fix(pageserver): allow incomplete history in btm-gc-compaction (#8500)
This pull request (should) fix the failure of test_gc_feedback. See the
explanation in the newly-added test case.

Part of https://github.com/neondatabase/neon/issues/8002

Allow incomplete history for the compaction algorithm.

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-07-25 12:56:37 -04:00
Vlad Lazar
9c5ad21341 storcon: make heartbeats restart aware (#8222)
## Problem
Re-attach blocks the pageserver http server from starting up. Hence, it
can't reply to heartbeats
until that's done. This makes the storage controller mark the node
off-line (not good). We worked
around this by setting the interval after which nodes are marked offline
to 5 minutes. This isn't a
long term solution.

## Summary of changes
* Introduce a new `NodeAvailability` state: `WarmingUp`. This state
models the following time interval:
* From receiving the re-attach request until the pageserver replies to
the first heartbeat post re-attach
* The heartbeat delta generator becomes aware of this state and uses a
separate longer interval
* Flag `max-warming-up-interval` now models the longer timeout and
`max-offline-interval` the shorter one to
match the names of the states

Closes https://github.com/neondatabase/neon/issues/7552
2024-07-25 14:09:12 +01:00
Christian Schwarz
a1256b2a67 fix: remote timeline client shutdown trips circuit breaker (#8495)
Before this PR

1.The circuit breaker would trip on CompactionError::Shutdown. That's
wrong, we want to ignore those cases.
2. remote timeline client shutdown would not be mapped to
CompactionError::Shutdown in all circumstances.

We observed this in staging, see
https://neondb.slack.com/archives/C033RQ5SPDH/p1721829745384449

This PR fixes (1) with a simple `match` statement, and (2) by switching
a bunch of `anyhow` usage over to distinguished errors that ultimately
get mapped to `CompactionError::Shutdown`.

I removed the implicit `#[from]` conversion from `anyhow::Error` to
`CompactionError::Other` to discover all the places that were mapping
remote timeline client shutdown to `anyhow::Error`.

In my opinion `#[from]` is an antipattern and we should avoid it,
especially for `anyhow::Error`. If some callee is going to return
anyhow, the very least the caller should to is to acknowledge, through a
`map_err(MyError::Other)` that they're conflating different failure
reasons.
2024-07-25 09:44:31 +01:00
Christian Schwarz
d57412aaab followup(#8359): pre-initialize circuitbreaker metrics (#8491) 2024-07-25 10:24:28 +02:00
John Spray
5f4e14d27d pageserver: fix a compilation error (#8487)
## Problem
PR that modified compaction raced with PR that modified the GcInfo
structure

## Summary of changes
Fix it

Co-authored-by: Vlad Lazar <vlalazar.vlad@gmail.com>
2024-07-24 16:37:15 +01:00
Vlad Lazar
2723a8156a pageserver: faster and simpler inmem layer vec read (#8469)
## Problem
The in-memory layer vectored read was very slow in some conditions
(walingest::test_large_rel) test. Upon profiling, I realised that 80% of
the time was spent building up the binary heap of reads. This stage
isn't actually needed.

## Summary of changes
Remove the planning stage as we never took advantage of it in order to
merge reads. There should be no functional change from this patch.
2024-07-24 14:23:03 +01:00
John Spray
2ef8e57f86 pageserver: maintain gc_info incrementally (#8427)
## Problem

Previously, Timeline::gc_info was only updated in a batch operation at
the start of GC. That means that timelines didn't generally have
accurate information about who their children were before the first GC,
or between GC cycles.

Knowledge of child branches is important for calculating layer
visibility in #8398

## Summary of changes

- Split out part of refresh_gc_info into initialize_gc_info, which is
now called early in startup
- Include TimelineId in retain_lsns so that we can later add/remove the
LSNs for particular children
- When timelines are added/removed, update their parent's retain_lsns
2024-07-24 12:33:44 +02:00
John Spray
842c3d8c10 tests: simplify code around unstable test_basebackup_with_high_slru_count (#8477)
## Problem

In `test_basebackup_with_high_slru_count`, the pageserver is sometimes
mysteriously hanging on startup, having been started+stopped earlier in
the test setup while populating template tenant data.

- #7586 

We can't see why this is hanging in this particular test. The test does
some weird stuff though, like attaching a load of broken tenants and
then doing a SIGQUIT kill of a pageserver.

## Summary of changes

- Attach tenants normally instead of doing a failpoint dance to attach
them as broken
- Shut the pageserver down gracefully during init instead of using
immediate mode
- Remove the "sequential" variant of the unstable test, as this is going
away soon anyway
- Log before trying to acquire lock file, so that if it hangs we have a
clearer sense of if that's really where it's hanging. It seems like it
is, but that code does a non-blocking flock so it's surprising.
2024-07-24 11:26:24 +01:00
John Spray
f5db655447 pageserver: simplify LayerAccessStats (#8431)
## Problem

LayerAccessStats contains a lot of detail that we don't use: short
histories of most recent accesses, specifics on what kind of task
accessed a layer, etc. This is all stored inside a Mutex, which is
locked every time something accesses a layer.

## Summary of changes

- Store timestamps at a very low resolution (to the nearest second),
sufficient for use on the timescales of eviction.
- Pack access time and last residence change time into a single u64
- Use the high bits of the u64 for other flags, including the new layer
visibility concept.
- Simplify the external-facing model for access stats to just include
what we now track.

Note that the `HistoryBufferWithDropCounter` is removed here because it
is no longer used. I do not dislike this type, we just happen not to use
it for anything else at present.


Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-07-24 08:17:28 +01:00
Alex Chi Z.
18cf5cfefd feat(pageserver): support retain_lsn in bottommost gc-compaction (#8328)
part of https://github.com/neondatabase/neon/issues/8002

The main thing in this pull request is the new `generate_key_retention`
function. It decides which deltas to retain and generate images for a
given key based on its history + retain_lsn + horizon.

On that, we generate a flat single level of delta layers over all deltas
included in the compaction. In the future, we can decide whether to
split them over the LSN axis as described in the RFC.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-07-24 00:28:43 +01:00
Konstantin Knizhnik
563d73d923 Use smgrexists() instead of access() to enforce uniqueness of generated relfilenumber (#7992)
## Problem

Postgres is using `access()` function in `GetNewRelFileNumber` to check
if assigned relfilenumber is not used for any other relation. This check
will not work in Neon, because we do not have all files in local
storage.

## Summary of changes

Use smgrexists() instead which will check at page server if such
relfilenode is used.

## Checklist before requesting a review

- [ ] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.

## Checklist before merging

- [ ] Do not forget to reformat commit message to not include the above
checklist

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-07-23 18:41:55 +03:00
John Spray
1a4c1eba92 pageserver: add LayerVisibilityHint (#8432)
## Problem

As described in https://github.com/neondatabase/neon/issues/8398, layer
visibility is a new hint that will help us manage disk space more
efficiently.

## Summary of changes

- Introduce LayerVisibilityHint and store it as part of access stats
- Automatically mark a layer visible if it is accessed, or when it is
created.

The impact on the access stats size will be reversed in
https://github.com/neondatabase/neon/pull/8431

This is functionally a no-op change: subsequent PRs will add the logic
that sets layers to Covered, and which uses the layer visibility as an
input to eviction and heatmap generation.

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2024-07-23 15:37:12 +01:00
John Spray
80c8ceacbc tests: make test_scrubber_physical_gc_ancestors more stable (#8453)
## Problem

This test sometimes found that ancestors were getting cleaned up before
it had done any compaction.

Compaction was happening implicitly via Workload.

Example:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8298/10032173390/index.html#testresult/fb04786402f80822/retries

## Summary of changes

- Set upload=False when writing data after shard split, to avoid doing a
checkpoint
- Add a checkpoint_period & explicit wait for uploads so that we ensure
data lands in S3 without doing a checkpoint
2024-07-23 12:57:57 +01:00
Vlad Lazar
35854928d9 pageserver: use identity file as node id authority and remove init command and config-override flags (#7766)
Ansible will soon write the node id to `identity.toml` in the work dir
for new pageservers. On the pageserver side, we read the node id from
the identity file if it is present and use that as the source of truth.
If the identity file is missing, cannot be read, or does not
deserialise, start-up is aborted.
 
This PR also removes the `--init` mode and the `--config-override` flag
from the `pageserver` binary.
The neon_local is already not using these flags anymore.

Ansible still uses them until the linked change is merged & deployed,
so, this PR has to land simultaneously or after the Ansible change due
to that.

Related Ansible change: https://github.com/neondatabase/aws/pull/1322
Cplane change to remove config-override usages:
https://github.com/neondatabase/cloud/pull/13417
Closes: https://github.com/neondatabase/neon/issues/7736
Overall plan:
https://www.notion.so/neondatabase/Rollout-Plan-simplified-pageserver-initialization-f935ae02b225444e8a41130b7d34e4ea?pvs=4

Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-07-23 11:41:12 +01:00