* Preserve task result in TaskHandle by keeping join handle around
The solution is not great, but it should hep to debug staging issue
I tried to do it in a least destructive way. TaskHandle used only in
one place so it is ok to use something less generic unless we want
to extend its usage across the codebase. In its current current form
for its single usage place it looks too abstract
Some problems around this code:
1. Task can drop event sender and continue running
2. Task cannot be joined several times (probably not needed,
but still, can be surprising)
3. Had to split task event into two types because ahyhow::Error
does not implement clone. So TaskContinueEvent derives clone
but usual task evend does not. Clone requirement appears
because we clone the current value in next_task_event.
Taking it by reference is complicated.
4. Split between Init and Started is artificial and comes from
watch::channel requirement to have some initial value.
To summarize from 3 and 4. It may be a better idea to use
RWLock or a bounded channel instead
Changes are:
* Correct typo "firts" -> "first"
* Change <empty panic with comment explaining> to <panic with message
taken from the comment>
* Fix weird indentation that rustfmt was failing to handle
* Use existing `anyhow::{anyhow,bail}!` as `{anyhow,bail}!` if it's
already in scope
* Spell `Result<T, anyhow::Error>` as `anyhow::Result<T>`
* In general, closer to matching the rest of the codebase
* Change usages of `hash_map::Entry` to `Entry` when it's already in
scope
* A quick search shows our style on this one varies across the files
it's used in
* Fix extreme metrics bloat in storage sync
From 78 metrics per (timeline, tenant) pair down to (max) 10 metrics per
(timeline, tenant) pair, plus another 117 metrics in a global histogram that
replaces the previous per-timeline histogram.
* Drop image sync operation metric series when dropping TimelineMetrics.
- Split postgres_ffi into two version specific files.
- Preserve pg_version in timeline metadata.
- Use pg_version in safekeeper code. Check for postgres major version mismatch.
- Clean up the code to use DEFAULT_PG_VERSION constant everywhere, instead of hardcoding.
- Parameterize python tests: use DEFAULT_PG_VERSION env and pg_version fixture.
To run tests using a specific PostgreSQL version, pass the DEFAULT_PG_VERSION environment variable:
'DEFAULT_PG_VERSION='15' ./scripts/pytest test_runner/regress'
Currently don't all tests pass, because rust code relies on the default version of PostgreSQL in a few places.
Replace the layer array and linear search with R-tree
So far, the in-memory layer map that holds information about layer
files that exist, has used a simple Vec, in no particular order, to
hold information about all the layers. That obviously doesn't scale
very well; with thousands of layer files the linear search was
consuming a lot of CPU. Replace it with a two-dimensional R-tree, with
Key and LSN ranges as the dimensions.
For the R-tree, use the 'rstar' crate. To be able to use that, we
convert the Keys and LSNs into 256-bit integers. 64 bits would be
enough to represent LSNs, and 128 bits would be enough to represent
Keys. However, we use 256 bits, because rstar internally performs
multiplication to calculate the area of rectangles, and the result of
multiplying two 128 bit integers doesn't necessarily fit in 128 bits,
causing integer overflow and, if overflow-checks are enabled,
panic. To avoid that, we use 256 bit integers.
Add a performance test that creates a lot of layer files, to
demonstrate the benefit.