Does three things:
* add a `Display` impl for `LayerFileName` whose output equals the `short_id` (sketched below)
* based on that, replace the `Layer::short_id` function with a requirement for a `Display` impl
* use that `Display` impl in the places that previously used the `short_id` and `file_name()` functions
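For illustration, the relationship is roughly the following (a minimal sketch; the stand-in type here is far simpler than the real `LayerFileName`):
```rust
use std::fmt;

// Stand-in for the real type; the actual `LayerFileName` has richer structure.
struct LayerFileName(String);

impl LayerFileName {
    fn short_id(&self) -> &str {
        &self.0
    }
}

// The new `Display` impl emits exactly the same text as `short_id`,
// so call sites can format the name directly.
impl fmt::Display for LayerFileName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.short_id())
    }
}
```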
Fixes #4145
Looking at logs from staging and prod, I found there are a bunch of log
lines without tenant / timeline context.
Manually walk through all `task_mgr::spawn` call sites and fix that with the
least amount of work required.
While doing it, remove some redundant `shutting down` messages.
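For context, the fix is generally of this shape: attach a span carrying the tenant/timeline IDs to the spawned future (a sketch; the real `task_mgr::spawn` signature, span names, and ID types differ):
```rust
use tracing::{info_span, Instrument};

// Illustrative only: in the real code the IDs come from the Tenant/Timeline structs.
fn spawn_with_context(tenant_id: String, timeline_id: String) {
    let span = info_span!("compaction_loop", %tenant_id, %timeline_id);
    tokio::spawn(
        async move {
            // Log lines emitted here inherit the tenant/timeline fields from the span.
            tracing::info!("task started");
        }
        .instrument(span),
    );
}
```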
refs https://github.com/neondatabase/neon/issues/4222
This renames the `pageserver_tenant_synthetic_size` metric to
`pageserver_tenant_synthetic_cached_size_bytes`, as was requested on
slack (link in the linked issue).
* `_cached` to hint that it is not incrementally calculated
* `_bytes` to indicate the unit the size is measured in
Fixes #3748
## Problem
#4528
## Summary of changes
Add a 60-second default timeout to the reqwest client
Retry calls to the metric consumption endpoint up to 3 times
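Roughly, the shape of the change (a sketch: the endpoint URL, retry loop, and error handling here are illustrative, not the actual implementation):
```rust
use std::time::Duration;

async fn post_usage_metrics(body: String) -> anyhow::Result<()> {
    // 60-second timeout applied by default to every request made by this client.
    let client = reqwest::Client::builder()
        .timeout(Duration::from_secs(60))
        .build()?;

    // Retry the upload up to 3 times before giving up.
    let mut last_err = None;
    for attempt in 1..=3 {
        match client
            .post("https://example.invalid/usage_events") // illustrative URL
            .body(body.clone())
            .send()
            .await
        {
            Ok(resp) if resp.status().is_success() => return Ok(()),
            Ok(resp) => {
                last_err = Some(anyhow::anyhow!("attempt {attempt}: status {}", resp.status()))
            }
            Err(e) => last_err = Some(e.into()),
        }
    }
    Err(last_err.expect("at least one attempt was made"))
}
```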
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
part of https://github.com/neondatabase/neon/issues/4392, continuation
of https://github.com/neondatabase/neon/pull/4408
## Summary of changes
This PR removes all layer objects from LayerMap and moves them to the
Timeline struct. In the Timeline struct, LayerFileManager maps each layer
descriptor to a layer object, and it is stored under the same RwLock as the
LayerMap to avoid behavioral differences.
Key changes:
* LayerMap no longer has a generic parameter and only stores descriptors.
* In Timeline, we add a new struct called the layer mapping.
* Currently, the layer mapping is stored under the same lock as the layer map.
Every time we retrieve data from the layer map, we need to map the
descriptor to the actual object.
* `replace_historic` is moved to the layer mapping's `replace`, and the return-value
behavior is different from before. I'm a little unsure about this part
and it would be good to have some comments on that.
* Some test cases are rewritten to adapt to the new interface, and we
can decide whether to remove them in the future because they do not make
much sense now.
* LayerDescriptor is moved to the `tests` module and is only intended
for unit testing / benchmarks.
* Because we now have a usage pattern of "take the lock guard, then
get references to two fields", we want to avoid dropping the
wrong object when we intend to release the lock guard. Therefore, a
new set of helper functions, `drop_r/wlock`, is added. These can be removed
in the future once the refactor is finished.
TODOs after this PR: fully remove RemoteLayer, and move LayerMapping to
a separate LayerCache.
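To make the new structure concrete, here is a minimal sketch with simplified stand-in types (the real descriptor and layer types are richer, and the lock holds more state than shown):
```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Simplified stand-ins for the real types.
#[derive(Clone, Hash, PartialEq, Eq)]
struct PersistentLayerDesc {
    key: String,
}
struct PersistentLayer; // the actual layer object

// The LayerMap now only stores descriptors ...
struct LayerMap {
    layers: Vec<PersistentLayerDesc>,
}

// ... and the layer mapping resolves a descriptor back to the layer object.
struct LayerFileManager(HashMap<PersistentLayerDesc, Arc<PersistentLayer>>);

// Both live under the same lock in Timeline so they cannot drift apart.
struct Timeline {
    layers: RwLock<(LayerMap, LayerFileManager)>,
}
```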
all refactor PRs:
```
#4437 --- #4479 ------------ #4510 (refactor done at this point)
\-- #4455 -- #4502 --/
```
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
There is a magic number that controls how often we repartition and therefore
how often we compact. This PR makes this number, `10`, a global
constant and adds docs.
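The change is roughly of this shape (the constant name and doc comment below are illustrative, not the actual ones introduced by the PR):
```rust
/// How many compaction iterations we let pass between repartitionings.
/// Illustrative name only; see the constant actually added by this PR.
pub const REPARTITION_FREQUENCY: usize = 10;
```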
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Let's keep 500 for unusual situations that are not considered normal. This came up
during one of the discussions around console logs now seeing these 500s.
## Summary of changes
- Return 409 Conflict instead of 500 (see the sketch below)
- Remove the 200 OK status because it is not used anymore
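A minimal sketch of what the handler-level change amounts to (the error type and mapping function are hypothetical; the real handler uses the pageserver's HTTP error helpers):
```rust
use hyper::StatusCode;

// Hypothetical error type, for illustration only.
enum CreateError {
    AlreadyExists,
    Other(anyhow::Error),
}

fn status_for(err: &CreateError) -> StatusCode {
    match err {
        // Previously this fell through to 500; a conflicting request is an
        // expected condition, so report 409 Conflict instead.
        CreateError::AlreadyExists => StatusCode::CONFLICT,
        CreateError::Other(_) => StatusCode::INTERNAL_SERVER_ERROR,
    }
}
```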
This PR concludes the "async `Layer::get_value_reconstruct_data`"
project.
The problem we're solving is that, before this patch, we'd execute
`Layer::get_value_reconstruct_data` on the tokio executor threads.
This function is IO- and/or CPU-intensive.
The IO is using VirtualFile / std::fs; hence it's blocking.
This results in unfairness towards other tokio tasks, especially under
(disk) load.
Some context can be found at
https://github.com/neondatabase/neon/issues/4154
where I suspect (but can't prove) that load spikes of logical size
calculation cause heavy eviction skew.
Sadly we don't have tokio runtime/scheduler metrics to quantify the
unfairness.
But generally, we know blocking the executor threads on std::fs IO is
bad.
So, let's have this change and watch out for severe perf regressions in
staging & during rollout.
## Changes
* rename `Layer::get_value_reconstruct_data` to
`Layer::get_value_reconstruct_data_blocking`
* add a new blanket impl'd `Layer::get_value_reconstruct_data`
`async_trait` method that runs `get_value_reconstruct_data_blocking`
inside `spawn_blocking`.
* `spawn_blocking` requires the captured variables to have a `'static`
lifetime; hence I had to change the data flow to _move_ the
`ValueReconstructState` into and back out of `get_value_reconstruct_data`
instead of passing a reference (see the sketch below). It's a small struct,
so I don't expect a big performance penalty.
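In sketch form, the new trait surface looks roughly like this (types and signatures are heavily simplified stand-ins for the real `Layer` trait):
```rust
use std::sync::Arc;

// Simplified stand-in for the real struct.
#[derive(Default)]
pub struct ValueReconstructState {
    pub records: Vec<Vec<u8>>,
}

#[async_trait::async_trait]
pub trait Layer: Send + Sync + 'static {
    /// The old synchronous implementation, now renamed.
    fn get_value_reconstruct_data_blocking(
        &self,
        state: ValueReconstructState,
    ) -> anyhow::Result<ValueReconstructState>;

    /// Blanket async wrapper: moves the state into `spawn_blocking`
    /// (which needs `'static` captures) and back out again.
    async fn get_value_reconstruct_data(
        self: Arc<Self>,
        state: ValueReconstructState,
    ) -> anyhow::Result<ValueReconstructState> {
        tokio::task::spawn_blocking(move || self.get_value_reconstruct_data_blocking(state))
            .await
            .expect("spawn_blocking task panicked")
    }
}
```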
## Performance
Fundamentally, the code changes cause the following performance-relevant
changes:
* Latency & allocations: each `get_value_reconstruct_data` call now
makes a short-lived allocation because `async_trait` is just sugar for
boxed futures under the hood
* Latency: `spawn_blocking` adds some latency because it needs to move
the work to a thread pool
* using `spawn_blocking` plus the existing synchronous code inside is
probably more efficient than switching all the synchronous code
to tokio::fs, because _each_ tokio::fs call does `spawn_blocking` under
the hood.
* Throughput: the `spawn_blocking` thread pool is much larger than the
async executor thread pool. Hence, as long as the disks can keep up,
which they should according to AWS specs, we will be able to deliver
higher `get_value_reconstruct_data` throughput.
* Disk IOPS utilization: we will see higher disk utilization if we get
more throughput. Not a problem because the disks in prod are currently
under-utilized, according to node_exporter metrics & the AWS specs.
* CPU utilization: at higher throughput, CPU utilization will be higher.
Slightly higher latency under regular load is acceptable given the
throughput gains and expected better fairness during disk load peaks,
such as logical size calculation peaks uncovered in #4154.
## Full Stack Of Preliminary PRs
This PR builds on top of the following preliminary PRs
1. Clean-ups
* https://github.com/neondatabase/neon/pull/4316
* https://github.com/neondatabase/neon/pull/4317
* https://github.com/neondatabase/neon/pull/4318
* https://github.com/neondatabase/neon/pull/4319
* https://github.com/neondatabase/neon/pull/4321
* Note: these were mostly to find an alternative to #4291, which I
thought we'd need in my original plan where we would need to convert
`Tenant::timelines` into an async locking primitive (#4333). In reviews,
we walked away from that, but these cleanups were still quite useful.
2. https://github.com/neondatabase/neon/pull/4364
3. https://github.com/neondatabase/neon/pull/4472
4. https://github.com/neondatabase/neon/pull/4476
5. https://github.com/neondatabase/neon/pull/4477
6. https://github.com/neondatabase/neon/pull/4485
7. https://github.com/neondatabase/neon/pull/4441
The stats for `compact_level0_phase1` that I added in #4527 show the
following breakdown (24h data from prod, only looking at compactions
with > 1 L1 produced):
* 10%ish of wall-clock time spent between the two read locks
* I learned that the `DeltaLayer::iter()` and `DeltaLayer::key_iter()`
calls actually do IO, even before we call `.next()`. I suspect that is
why they take so much time between the locks.
* 80+% of wall-clock time spent writing layer files
* Lock acquisition time is irrelevant (low double-digit microseconds at
most)
* The generation of the holes holds the read lock for a relatively long
time, and that time is proportional to the number of keys / IO required to
iterate over them (max: 110ms in prod; staging (nightly benchmarks):
multiple seconds).
Find below screenshots from my ad-hoc spreadsheet + some graphs.
<img width="1182" alt="image"
src="https://github.com/neondatabase/neon/assets/956573/81398b3f-6fa1-40dd-9887-46a4715d9194">
<img width="901" alt="image"
src="https://github.com/neondatabase/neon/assets/956573/e4ac0393-f2c1-4187-a5e5-39a8b0c394c9">
<img width="210" alt="image"
src="https://github.com/neondatabase/neon/assets/956573/7977ade7-6aa5-4773-a0a2-f9729aecee0d">
## Changes In This PR
This PR makes the following changes:
* rearrange the `compact_level0_phase1` code such that we build the
`all_keys_iter` and `all_values_iter` later than before
* only grab the `Timeline::layers` lock once, and hold it until we've
computed the holes
* run `compact_level0_phase1` in `spawn_blocking`, pre-grabbing the
`Timeline::layers` lock in the async code and passing it in as an
`OwnedRwLockReadGuard` (see the sketch below)
* the code inside `spawn_blocking` drops this guard after computing the
holes
* the `OwnedRwLockReadGuard` requires `Timeline::layers` to be
wrapped in an `Arc`. I think that's OK; the locking for the RwLock is
more heavy-weight than an additional pointer indirection.
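The guard-passing pattern, in isolation, looks roughly like this (a sketch with a placeholder payload type):
```rust
use std::sync::Arc;
use tokio::sync::{OwnedRwLockReadGuard, RwLock};

struct LayerMapState; // placeholder for the real layer map state

async fn compact(layers: Arc<RwLock<LayerMapState>>) {
    // Grab the lock in async code; the owned guard is 'static, so it can be
    // moved into spawn_blocking.
    let guard: OwnedRwLockReadGuard<LayerMapState> = layers.read_owned().await;

    tokio::task::spawn_blocking(move || {
        // ... compute the holes while holding the read lock ...
        let _layer_map: &LayerMapState = &guard;
        // Release the lock as soon as we no longer need it, then keep doing
        // the expensive layer-writing work without it.
        drop(guard);
        // ... write new layer files ...
    })
    .await
    .expect("spawn_blocking task panicked");
}
```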
## Alternatives Considered
The naive alternative is to throw the entire function into
`spawn_blocking`, and use `blocking_read` for `Timeline::layers` access.
What I've done in this PR is better because, with this alternative,
1. while we `blocking_read()`, we'd waste one slot in the spawn_blocking
pool
2. there's deadlock risk because the spawn_blocking pool is a finite
resource

## Metadata
Fixes https://github.com/neondatabase/neon/issues/4492
This is follow-up to
```
commit 2252c5c282
Author: Alex Chi Z <iskyzh@gmail.com>
Date: Wed Jun 14 17:12:34 2023 -0400
metrics: convert some metrics to pageserver-level (#4490)
```
The consumption metrics synthetic size worker does logical size
calculation. Logical size calculation currently does synchronous disk
IO. This blocks the MGMT_REQUEST_RUNTIME's executor threads, starving
other futures.
While there's work on the way to move the synchronous disk IO into
spawn_blocking, the quickfix here is to use the BACKGROUND_RUNTIME
instead of MGMT_REQUEST_RUNTIME.
Actually it's not just a quickfix. We simply shouldn't be blocking
MGMT_REQUEST_RUNTIME executor threads on CPU or sync disk IO.
That work isn't done yet, as many of the mgmt tasks still _do_ disk IO.
But it's not as intensive as the logical size calculations that we're
fixing here.
While we're at it, fix disk-usage-based eviction in a similar way. It
wasn't the culprit here, according to prod logs, but it can
theoretically be a little CPU-intensive.
More context, including graphs from Prod:
https://neondb.slack.com/archives/C03F5SM1N02/p1687541681336949
The histogram distinguishes by ok/err.
I took the liberty of creating a small abstraction for such use cases.
It helps keep the label values inside `metrics.rs`, right next
to the place where the metric and its labels are declared.
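The abstraction is roughly of this shape (a sketch using the prometheus crate directly with an illustrative metric name; the real code lives in `metrics.rs` and uses the project's metric helpers):
```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram_vec, HistogramVec};

/// Outcome label declared next to the metric so call sites
/// cannot invent ad-hoc label strings.
#[derive(Clone, Copy)]
pub enum Outcome {
    Ok,
    Err,
}

impl Outcome {
    fn as_str(self) -> &'static str {
        match self {
            Outcome::Ok => "ok",
            Outcome::Err => "err",
        }
    }
}

static OPERATION_SECONDS: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "example_operation_seconds", // illustrative name
        "Time spent in the operation, split by outcome",
        &["outcome"]
    )
    .expect("failed to register metric")
});

pub fn observe(outcome: Outcome, seconds: f64) {
    OPERATION_SECONDS
        .with_label_values(&[outcome.as_str()])
        .observe(seconds);
}
```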
* `compaction_threshold` should be an integer, not a string.
* uncomment `[section]` so that if a user needs to modify the config,
they can simply uncomment the corresponding line. Otherwise it's easy
for us to forget to uncomment the `[section]` when uncommenting the
config item we want to configure.
Signed-off-by: Alex Chi <iskyzh@gmail.com>
Commit
```
commit 472cc17b7a
Author: Dmitry Rodionov <dmitry@neon.tech>
Date: Thu Jun 15 17:30:12 2023 +0300
propagate lock guard to background deletion task (#4495)
```
did a drive-by fix, but the drive-by had a typo.
```
gc_loop{tenant_id=2e2f2bff091b258ac22a4c4dd39bd25d}:update_gc_info{timline_id=837c688fd37c903639b9aa0a6dd3f1f1}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000024DA0D1-000000000443FB51}:panic{thread=background op worker location=pageserver/src/tenant/timeline.rs:4843:25}: missing extractors: ["TimelineId"]
Stack backtrace:
0: utils::logging::tracing_panic_hook
at /libs/utils/src/logging.rs:166:21
1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/alloc/src/boxed.rs:2002:9
2: std::panicking::rust_panic_with_hook
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:692:13
3: std::panicking::begin_panic_handler::{{closure}}
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:579:13
4: std::sys_common::backtrace::__rust_end_short_backtrace
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/sys_common/backtrace.rs:137:18
5: rust_begin_unwind
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:575:5
6: core::panicking::panic_fmt
at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panicking.rs:64:14
7: pageserver::tenant::timeline::debug_assert_current_span_has_tenant_and_timeline_id
at /pageserver/src/tenant/timeline.rs:4843:25
8: <pageserver::tenant::timeline::Timeline>::download_remote_layer::{closure#0}::{closure#0}
at /pageserver/src/tenant/timeline.rs:4368:9
9: <tracing::instrument::Instrumented<<pageserver::tenant::timeline::Timeline>::download_remote_layer::{closure#0}::{closure#0}> as core::future::future::Future>::poll
at /.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs:272:9
10: <pageserver::tenant::timeline::Timeline>::download_remote_layer::{closure#0}
at /pageserver/src/tenant/timeline.rs:4363:5
11: <pageserver::tenant::timeline::Timeline>::get_reconstruct_data::{closure#0}
at /pageserver/src/tenant/timeline.rs:2618:69
12: <pageserver::tenant::timeline::Timeline>::get::{closure#0}
at /pageserver/src/tenant/timeline.rs:565:13
13: <pageserver::tenant::timeline::Timeline>::list_slru_segments::{closure#0}
at /pageserver/src/pgdatadir_mapping.rs:427:42
14: <pageserver::tenant::timeline::Timeline>::is_latest_commit_timestamp_ge_than::{closure#0}
at /pageserver/src/pgdatadir_mapping.rs:390:13
15: <pageserver::tenant::timeline::Timeline>::find_lsn_for_timestamp::{closure#0}
at /pageserver/src/pgdatadir_mapping.rs:338:17
16: <pageserver::tenant::timeline::Timeline>::update_gc_info::{closure#0}::{closure#0}
at /pageserver/src/tenant/timeline.rs:3967:71
17: <tracing::instrument::Instrumented<<pageserver::tenant::timeline::Timeline>::update_gc_info::{closure#0}::{closure#0}> as core::future::future::Future>::poll
at /.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs:272:9
18: <pageserver::tenant::timeline::Timeline>::update_gc_info::{closure#0}
at /pageserver/src/tenant/timeline.rs:3948:5
19: <pageserver::tenant::Tenant>::refresh_gc_info_internal::{closure#0}
at /pageserver/src/tenant.rs:2687:21
20: <pageserver::tenant::Tenant>::gc_iteration_internal::{closure#0}
at /pageserver/src/tenant.rs:2551:13
21: <pageserver::tenant::Tenant>::gc_iteration::{closure#0}
at /pageserver/src/tenant.rs:1490:13
22: pageserver::tenant::tasks::gc_loop::{closure#0}::{closure#0}
at /pageserver/src/tenant/tasks.rs:187:21
23: pageserver::tenant::tasks::gc_loop::{closure#0}
at /pageserver/src/tenant/tasks.rs:208:5
```
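The fix itself is the kind of one-character change sketched below: the span has to spell the field exactly as the extractor expects (`timeline_id`, not `timline_id`); everything else here is illustrative:
```rust
use tracing::{info_span, Span};

fn update_gc_info_span(timeline_id: &str) -> Span {
    // With the field misspelled as `timline_id`, the
    // `debug_assert_current_span_has_tenant_and_timeline_id` extractor could
    // not find a TimelineId anywhere in the span hierarchy and panicked.
    info_span!("update_gc_info", timeline_id = %timeline_id)
}
```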
## Problem
During timeline creation we create a special mark file whose presence
indicates that initialization didn't complete successfully. In case of a
crash restart we can remove such a half-initialized timeline, and the following
retry from the control plane side should perform another attempt.
So in case of a possible crash restart during initial loading we have the
following picture:
```
timelines
| - <timeline_id>___uninit
| - <timeline_id>
| - | <timeline files>
```
We call `std::fs::read_dir` to walk the files in the `timelines` directory one
by one. If we see an uninit file, we proceed with deletion of both the
timeline directory and the uninit file. If we see a timeline, we check whether
an uninit file exists and do the same cleanup.
But in fact it's possible for both branches to be true at the same
time: the result of read_dir doesn't reflect directory modifications made
after the listing. So you can still get a "valid" entry on a later iteration
of the loop even though it was already deleted in one of the previous
iterations of the loop.
To see that you can apply the following patch (it disables uninit mark
cleanup on successful timeline creation):
```diff
diff --git a/pageserver/src/tenant.rs b/pageserver/src/tenant.rs
index 4beb2664..b3cdad8f 100644
--- a/pageserver/src/tenant.rs
+++ b/pageserver/src/tenant.rs
@@ -224,11 +224,6 @@ impl UninitializedTimeline<'_> {
)
})?;
}
- uninit_mark.remove_uninit_mark().with_context(|| {
- format!(
- "Failed to remove uninit mark file for timeline {tenant_id}/{timeline_id}"
- )
- })?;
v.insert(Arc::clone(&new_timeline));
new_timeline.maybe_spawn_flush_loop();
```
And perform the following steps:
```bash
neon_local init
neon_local start
neon_local tenant create
neon_local stop
neon_local start
```
The error is:
```log
INFO load{tenant_id=X}:blocking: Found an uninit mark file .neon/tenants/X/timelines/Y.___uninit, removing the timeline and its uninit mark
2023-06-09T18:43:41.664247Z ERROR load{tenant_id=X}: load failed, setting tenant state to Broken: failed to load metadata
Caused by:
0: Failed to read metadata bytes from path .neon/tenants/X/timelines/Y/metadata
1: No such file or directory (os error 2)
```
So the uninit mark got deleted together with the timeline directory, but we still
got a directory entry for it and tried to load it.
The bug prevented the tenant from being loaded successfully.
## Summary of changes
Ideally I think we shouldn't place uninit marks in the same directory as the
timeline directories, but rather move them to a separate directory and
gather them as an input to the actual listing. However, that would be sort of an
on-disk format change, so for now just check whether entries are still valid
before operating on them (see the sketch below).
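In sketch form, the validity check looks like this (simplified; the real code also distinguishes uninit marks from timeline directories):
```rust
use std::fs;
use std::path::Path;

fn clean_timelines_dir(timelines_dir: &Path) -> std::io::Result<()> {
    for entry in fs::read_dir(timelines_dir)? {
        let path = entry?.path();
        // A previous iteration may already have removed this entry (e.g. a
        // timeline directory deleted together with its uninit mark), and the
        // read_dir listing does not reflect that. Re-check before acting.
        if !path.exists() {
            continue;
        }
        // ... handle the uninit mark / load the timeline ...
    }
    Ok(())
}
```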
The data will help decide whether it's ok
to keep holding Timeline::layers in shared mode until
after we've calculated the holes.
Other timings are to understand the general breakdown
of timings in that function.
Context: https://github.com/neondatabase/neon/issues/4492
I observe sporadic reconnections with ~10k idle computes. It looks like a
separate issue, probably the walreceiver runtime gets blocked somewhere, but in any
case 2-3 seconds is too short.
## Problem
1. During the rollout we got a panic: "timeline that we were deleting
was concurrently removed from 'timelines' map", caused by the lock
guard not being propagated to the background part of the deletion.
The existing test didn't catch it because the failpoint used for
verification was placed earlier, prior to background task spawning.
2. When looking at the surrounding code, one more bug was detected. We
removed the timeline from the map before deletion finished, which breaks
client retry logic: the client sees a 404 before the actual deletion
is completed, which can lead to it stopping its retry poll too early.
## Summary of changes
1. Carry the lock guard over to background deletion (see the sketch below).
Ensure the existing test case fails without the patch applied (the second
deletion becomes stuck without it, which eventually leads to a test failure).
2. Move the delete_all call earlier so that removing the timeline from the map is
the last thing done during deletion.
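The guard-propagation part boils down to the pattern below (a sketch with placeholder types; the real code threads the guard through the deletion task's state):
```rust
use std::sync::Arc;
use tokio::sync::{Mutex, OwnedMutexGuard};

struct DeletionState; // placeholder

async fn start_deletion(lock: Arc<Mutex<DeletionState>>) {
    // Acquire an owned guard so it can be moved into the spawned task.
    let guard: OwnedMutexGuard<DeletionState> = lock.lock_owned().await;

    tokio::spawn(async move {
        // Hold the guard for the whole background deletion, so a concurrent
        // attempt cannot remove the timeline from the `timelines` map
        // underneath us.
        let _guard = guard;
        // ... delete remote and local data; remove from the map last ...
    });
}
```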
Additionally I've added timeline_id to the `update_gc_info` span,
because `debug_assert_current_span_has_tenant_and_timeline_id` in
`download_remote_layer` was firing when `update_gc_info` led to
on-demand downloads via `find_lsn_for_timestamp` (caught by @problame).
This is not directly related to the PR but fixes possible flakiness.
Another, smaller set of changes involves the deletion wrapper used in Python
tests. There is now a simpler wrapper, `timeline_delete_wait_completed`,
that waits for deletions to complete. Most of the
test_delete_timeline.py tests are negative tests, i.e., "does
ps_http.timeline_delete() fail in this and that scenario"; these can be left
alone. In the other places, where we actually do the deletions,
we need to use the helper that polls for completion.
Discussion: https://neondb.slack.com/archives/C03F5SM1N02/p1686668007396639
resolves #4496
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
## Problem
Some metrics are better observed at the pageserver level. Otherwise,
as we have a lot of tenants in production, we cannot compute a sum because
Prometheus has a limit on how many time series we can aggregate. This also
helps reduce the metrics scrape size.
## Summary of changes
Some integration tests are likely not to pass, as they check for the
existence of some metrics. Waiting for CI to complete and will fix them.
Metrics downgraded: page cache hit (where we are likely to have a
page-server level page cache in the future instead of per-tenant), and
reconstruct time (this would be better as tenant-level, as we have one pg
replayer for each tenant, but now we make it page-server level as we do
not need that fine-grained data).
---------
Signed-off-by: Alex Chi <iskyzh@gmail.com>
This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`).
# Full Stack Of Preliminary PRs
Thanks to the countless preliminary PRs, this conversion is relatively
straightforward.
1. Clean-ups
* https://github.com/neondatabase/neon/pull/4316
* https://github.com/neondatabase/neon/pull/4317
* https://github.com/neondatabase/neon/pull/4318
* https://github.com/neondatabase/neon/pull/4319
* https://github.com/neondatabase/neon/pull/4321
* Note: these were mostly to find an alternative to #4291, which I
thought we'd need in my original plan where we would need to convert
`Tenant::timelines` into an async locking primitive (#4333). In reviews,
we walked away from that, but these cleanups were still quite useful.
2. https://github.com/neondatabase/neon/pull/4364
3. https://github.com/neondatabase/neon/pull/4472
4. https://github.com/neondatabase/neon/pull/4476
5. https://github.com/neondatabase/neon/pull/4477
6. https://github.com/neondatabase/neon/pull/4485
# Significant Changes In This PR
## `compact_level0_phase1` & `create_delta_layer`
This commit partially reverts
"pgserver: spawn_blocking in compaction (#4265)"
4e359db4c7.
Specifically, it reverts the `spawn_blocking`-ification of
`compact_level0_phase1`.
If we didn't revert it, we'd have to use `Timeline::layers.blocking_read()`
inside `compact_level0_phase1`. That would use up a thread in the
`spawn_blocking` thread pool, which is hard-capped.
I considered wrapping the code that follows the second
`layers.read().await` into `spawn_blocking`, but there are lifetime
issues with `deltas_to_compact`.
Also, this PR switches the `create_delta_layer` _function_ back to
async, and uses `spawn_blocking` inside to run the code that does sync
IO, while keeping the code that needs to lock `Timeline::layers` async.
## `LayerIter` and `LayerKeyIter` `Send` bounds
I had to add a `Send` bound on the `dyn` type that `LayerIter`
and `LayerKeyIter` wrap. Why? Because we now have the second
`layers.read().await` inside `compact_level0_phase1`, and these
iterator instances are held across that await-point.
More background:
https://github.com/neondatabase/neon/pull/4462#issuecomment-1587376960
## `DatadirModification::flush`
Needed to replace the `HashMap::retain` with a hand-rolled variant
because `TimelineWriter::put` is now async.
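The `retain` replacement is conceptually the following (a sketch with placeholder types and predicate; `HashMap::retain` takes a synchronous closure, so it cannot call the now-async `put`):
```rust
use std::collections::HashMap;

struct TimelineWriter;

impl TimelineWriter {
    // Placeholder for the real, now-async put.
    async fn put(&mut self, _key: u64, _value: Vec<u8>) -> anyhow::Result<()> {
        Ok(())
    }
}

// Placeholder predicate deciding which entries stay buffered.
fn keep_buffered(key: u64) -> bool {
    key % 2 == 0
}

async fn flush(
    pending: &mut HashMap<u64, Vec<u8>>,
    writer: &mut TimelineWriter,
) -> anyhow::Result<()> {
    // Hand-rolled replacement for `retain`: drain, write out what needs
    // writing (awaiting each put), and re-insert the rest.
    let mut retained = HashMap::new();
    for (key, value) in pending.drain() {
        if keep_buffered(key) {
            retained.insert(key, value);
        } else {
            writer.put(key, value).await?;
        }
    }
    *pending = retained;
    Ok(())
}
```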
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).
Or more specifically, #4441, where we turn Timeline::layers into a
tokio::sync::RwLock.
By using try_write() here, we can avoid turning init_empty_layer_map
async,
which is nice because much of its transitive call(er) graph isn't async.
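The `try_write()` trick in isolation (a sketch; the panic message and surrounding code are illustrative):
```rust
use tokio::sync::RwLock;

#[derive(Default)]
struct LayerMap; // placeholder

// Not `async`: during timeline init nothing else can be holding `layers`,
// so a failed `try_write()` would indicate a bug rather than contention.
fn init_empty_layer_map(layers: &RwLock<LayerMap>) {
    let mut guard = layers
        .try_write()
        .expect("no concurrent access expected during timeline init");
    *guard = LayerMap::default();
    // ... set up the initial open layer, etc. ...
}
```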
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).
There, we want to switch `Timeline::layers` to be a
`tokio::sync::RwLock`.
That will require the `TimelineWriter` to become async, because at times
its functions need to lock `Timeline::layers` in order to freeze the
open layer.
While doing that, rustc complains that we're now holding
`Timeline::write_lock` across await points (lock order is that
`write_lock` must be acquired before `Timeline::layers`).
So, we need to switch it over to an async primitive.
This is preliminary work for/from #4220 (async
`Layer::get_value_reconstruct_data`).
There, we want to switch `Timeline::layers` to be a
`tokio::sync::RwLock`.
That will require the `TimelineWriter` to become async.
That will require `freeze_inmem_layer` to become async.
So, inside check_checkpoint_distance, we will have
`freeze_inmem_layer().await`.
But current rustc isn't smart enough to understand that we
`drop(layers)` earlier, and hence, will complain about the `!Send`
`layers` being held across the `freeze_inmem_layer().await`-point.
This patch puts the guard into a scope, so rustc will shut up in the
next patch where we make the transition for `TimelineWriter`.
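The scoping pattern in question looks like this (a sketch; names and fields are simplified):
```rust
use std::sync::RwLock;

struct LayerMap {
    needs_freeze: bool, // placeholder for the real distance check
}

async fn check_checkpoint_distance(layers: &RwLock<LayerMap>) {
    // Confine the synchronous (!Send) guard to a block so rustc can see it is
    // gone before the `.await`, keeping the overall future `Send`.
    let needs_freeze = {
        let guard = layers.read().unwrap();
        guard.needs_freeze
    }; // guard dropped here

    if needs_freeze {
        freeze_inmem_layer().await;
    }
}

async fn freeze_inmem_layer() {
    // placeholder for the real freezing logic
}
```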
obsoletes https://github.com/neondatabase/neon/pull/4474
Commit `create_test_timeline: always put@initdb_lsn the minimum required keys`
already switched us over to using valid initdb_lsns.
All that's left to do is to actually flush the minimum keys so that
we move from disk_consistent_lsn=Lsn(0) to disk_consistent_lsn=initdb_lsn.
Co-authored-by: Christian Schwarz <christian@neon.tech>
Part of https://github.com/neondatabase/neon/pull/4364
Clarify who's responsible for initializing the layer map. There were
previously two different ways to do it:
- create_empty_timeline and bootstrap_timeline let prepare_timeline()
initialize an empty layer map.
- branch_timeline passed a flag to initialize_with_lock() to tell
initialize_with_lock to call load_layer_map(). Because it was a
newly created timeline, load_layer_map() never found any layer
files, so it just initialized an empty layer map.
With this commit, prepare_new_timeline() always does it. The LSN to
initialize it with is passed as an argument.
Other changes per function:
prepare_timeline:
- rename to 'prepare_new_timeline' to make it clear that it's only used
when creating a new timeline, not when loading an existing timeline
- always initialize an empty layer map. The caller can pass the LSN to
initialize it with. (Previously, prepare_timeline would optionally
load the layer map at 'initdb_lsn'. Some callers used that, while others
let initialize_with_lock do it.)
initialize_with_lock:
- As mentioned above, remove the option to load the layer map
- Acquire the 'timelines' lock in the function itself. None of the callers
did any other work while holding the lock.
- Rename it to finish_creation() to make its intent more clear. It's only
used when creating a new timeline now.
create_timeline_data:
- Rename to create_timeline_struct() for clarity. It just initializes
the Timeline struct, not any other "data"
create_timeline_files:
- use create_dir rather than create_dir_all, to be a little more strict.
We know that the parent directory should already exist, and the timeline
directory should not exist.
- Move the call to create_timeline_struct() to the caller. It was just
being "passed through"
Part of https://github.com/neondatabase/neon/pull/4364
This patch inlines `initialize_with_lock` and then reorganizes the code
such that we can `load_layer_map` without holding the
`Tenant::timelines` lock.
As a nice aside, we can get rid of the dummy() uninit mark, which has
always been a terrible hack.
Part of https://github.com/neondatabase/neon/pull/4364