This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`).
# Full Stack Of Preliminary PRs
Thanks to the countless preliminary PRs, this conversion is relatively
straight-forward.
1. Clean-ups
* https://github.com/neondatabase/neon/pull/4316
* https://github.com/neondatabase/neon/pull/4317
* https://github.com/neondatabase/neon/pull/4318
* https://github.com/neondatabase/neon/pull/4319
* https://github.com/neondatabase/neon/pull/4321
* Note: these were mostly to find an alternative to #4291, which I
thought we'd need in my original plan where we would need to convert
`Tenant::timelines` into an async locking primitive (#4333). In reviews,
we walked away from that, but these cleanups were still quite useful.
2. https://github.com/neondatabase/neon/pull/4364
3. https://github.com/neondatabase/neon/pull/4472
4. https://github.com/neondatabase/neon/pull/4476
5. https://github.com/neondatabase/neon/pull/4477
6. https://github.com/neondatabase/neon/pull/4485
# Significant Changes In This PR
## `compact_level0_phase1` & `create_delta_layer`
This commit partially reverts
"pgserver: spawn_blocking in compaction (#4265)"
4e359db4c7.
Specifically, it reverts the `spawn_blocking`-ificiation of
`compact_level0_phase1`.
If we didn't revert it, we'd have to use `Timeline::layers.blocking_read()`
inside `compact_level0_phase1`. That would use up a thread in the
`spawn_blocking` thread pool, which is hard-capped.
I considered wrapping the code that follows the second
`layers.read().await` into `spawn_blocking`, but there are lifetime
issues with `deltas_to_compact`.
Also, this PR switches the `create_delta_layer` _function_ back to
async, and uses `spawn_blocking` inside to run the code that does sync
IO, while keeping the code that needs to lock `Timeline::layers` async.
## `LayerIter` and `LayerKeyIter` `Send` bounds
I had to add a `Send` bound on the `dyn` type that `LayerIter`
and `LayerKeyIter` wrap. Why? Because we now have the second
`layers.read().await` inside `compact_level0_phase`, and these
iterator instances are held across that await-point.
More background:
https://github.com/neondatabase/neon/pull/4462#issuecomment-1587376960
## `DatadirModification::flush`
Needed to replace the `HashMap::retain` with a hand-rolled variant
because `TimelineWriter::put` is now async.
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).
Or more specifically, #4441, where we turn Timeline::layers into a
tokio::sync::RwLock.
By using try_write() here, we can avoid turning init_empty_layer_map
async,
which is nice because much of its transitive call(er) graph isn't async.
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).
There, we want to switch `Timeline::layers` to be a
`tokio::sync::RwLock`.
That will require the `TimelineWriter` to become async, because at times
its functions need to lock `Timeline::layers` in order to freeze the
open layer.
While doing that, rustc complains that we're now holding
`Timeline::write_lock` across await points (lock order is that
`write_lock` must be acquired before `Timelines::layers`).
So, we need to switch it over to an async primitive.
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).
There, we want to switch `Timeline::layers` to be a
`tokio::sync::RwLock`.
That will require the `TimelineWriter` to become async.
That will require `freeze_inmem_layer` to become async.
So, inside check_checkpoint_distance, we will have
`freeze_inmem_layer().await`.
But current rustc isn't smart enough to understand that we
`drop(layers)` earlier, and hence, will complain about the `!Send`
`layers` being held across the `freeze_inmem_layer().await`-point.
This patch puts the guard into a scope, so rustc will shut up in the
next patch where we make the transition for `TimelineWriter`.
obsoletes https://github.com/neondatabase/neon/pull/4474
Commit `create_test_timeline: always put@initdb_lsn the minimum required keys`
already switched us over to using valid initdb_lsns.
All that's left to do is to actually flush the minimum keys so that
we move from disk_consistent_lsn=Lsn(0) to disk_consistent_lsn=initdb_lsn.
Co-authored-by: Christian Schwarz <christian@neon.tech>
Part of https://github.com/neondatabase/neon/pull/4364
Clarify who's responsible for initializing the layer map. There were
previously two different ways to do it:
- create_empty_timeline and bootstrap_timeline let prepare_timeline()
initialize an empty layer map.
- branch_timeline passed a flag to initialize_with_lock() to tell
initialize_with_lock to call load_layer_map(). Because it was a
newly created timeline, load_layer_map() never found any layer
files, so it just initialized an empty layer map.
With this commit, prepare_new_timeline() always does it. The LSN to
initialize it with is passed as argument.
Other changes per function:
prepare_timeline:
- rename to 'prepare_new_timeline' to make it clear that it's only used
when creating a new timeline, not when loading an existing timeline
- always initialize an empty layer map. The caller can pass the LSN to
initialize it with. (Previously, prepare_timeline would optionally
load the layer map at 'initdb_lsn'. Some caller used that, while others
let initialize_with_lock do it
initialize_with_lock:
- As mentioned above, remove the option to load the layer map
- Acquire the 'timelines' lock in the function itself. None of the callers
did any other work while holding the lock.
- Rename it to finish_creation() to make its intent more clear. It's only
used when creating a new timeline now.
create_timeline_data:
- Rename to create_timeline_struct() for clarity. It just initializes
the Timeline struct, not any other "data"
create_timeline_files:
- use create_dir rather than create_dir_all, to be a little more strict.
We know that the parent directory should already exist, and the timeline
directory should not exist.
- Move the call to create_timeline_struct() to the caller. It was just
being "passed through"
Part of https://github.com/neondatabase/neon/pull/4364
This patch inlines `initialize_with_lock` and then reorganizes the code
such that we can `load_layer_map` without holding the
`Tenant::timelines` lock.
As a nice aside, we can get rid of the dummy() uninit mark, which has
always been a terrible hack.
Part of https://github.com/neondatabase/neon/pull/4364
## Problem
Part of https://github.com/neondatabase/neon/issues/4418
## Summary of changes
This PR implements the local manifest interfaces. After the refactor of
timeline is done, we can integrate this with the current storage. The
reader will stop at the first corrupted record.
---------
Signed-off-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
Delete data from s3 when timeline deletion is requested
## Summary of changes
UploadQueue is altered to support scheduling of delete operations in
stopped state. This looks weird, and I'm thinking whether there are
better options/refactorings for upload client to make it look better.
Probably can be part of https://github.com/neondatabase/neon/issues/4378
Deletion is implemented directly in existing endpoint because changes are not
that significant. If we want more safety we can separate those or create
feature flag for new behavior.
resolves [#4193](https://github.com/neondatabase/neon/issues/4193)
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
part of https://github.com/neondatabase/neon/issues/4392
## Summary of changes
This PR adds a new HashMap that maps persistent layer desc to the layer
object *inside* LayerMap. Originally I directly went towards adding such
layer cache in Timeline, but the changes are too many and cannot be
reviewed as a reasonably-sized PR. Therefore, we take this intermediate
step to change part of the codebase to use persistent layer desc, and
come up with other PRs to move this hash map of layer desc to the
timeline struct.
Also, file_size is now part of the layer desc.
---------
Signed-off-by: Alex Chi <iskyzh@gmail.com>
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
## Problem
Attach failures are not reported in public part of the api (in
`attachment_status` field of TenantInfo).
## Summary of changes
Expose TenantState::Broken as TenantAttachmentStatus::Failed
In the way its written Failed status will be reported even if no
attachment happened. (I e if tenant become broken on startup). This is
in line with other members. I e Active will be resolved to Attached even
if no actual attach took place.
This can be tweaked if needed. At the current stage it would be overengineering without clear motivation
resolves#4344