- evictiontask: remove unused imports
- eviction_task and dube: cleanup
- timeline: pub(crate) eviction
- timeline: adjust to "layere: adjust eviction"
- test: remove layer_eviction_aba_fails because it can no longer happen
- test: fix up evicts later test with ability to await for eviction
- eviction_task: more unused imports
- eviction: clippy
- eviction_task: more clippy
- fixup eviction: docs
- eviction: hold only Arc<Layer> after checking downloadedness
- refactor earlier eviction: use drop_eviction_guard instead
- dube: evict in spawned tasks
- timeline, eviction: evict in spawned
- eviction: add more errors
- evict_layers: remove witness
- eviction: post-witness forgotten panic
- eviction: remove blog references
Unrelated fixes noticed while integrating #4938.
- Stop leaking future layers in remote storage
- We schedule extra index_part uploads if layer name to be removed was
not actually present
Starts `postgres` in cgroup directly from `compute_ctl` instead of from
`vm-builder`. This is required because the `vm-monitor` cannot be in the
cgroup it is managing. Otherwise, it itself would be frozen when
freezing the cgroup.
Requires https://github.com/neondatabase/cloud/pull/6331, which adds the
`AUTOSCALING` environment variable letting `compute_ctl` know to start
`postgres` in the cgroup.
Requires https://github.com/neondatabase/autoscaling/pull/468, which
prevents `vm-builder` from starting the monitor and putting postgres in
a cgroup. This will require a `VM_BUILDER_VERSION` bump.
## Problem
The `metadata_bytes` field of IndexPart required explicit
deserialization & error checking everywhere it was used -- there isn't
anything special about this structure that should prevent it from being
serialized & deserialized along with the rest of the structure.
## Summary of changes
- Implement Serialize and Deserialize for TimelineMetadata
- Replace IndexPart::metadata_bytes with a simpler `metadata`, that can
be used directly.
---------
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
We were returning Pending when a connection had a notice/notification
(introduced recently in #5020). When returning pending, the runtime
assumes you will call `cx.waker().wake()` in order to continue
processing.
We weren't doing that, so the connection task would get stuck
## Summary of changes
Don't return pending. Loop instead
## Problem
We want to make `read_blk` an async function, but outside of
`async_trait`, which allocates, and nightly features, we can't use async
fn's in traits.
## Summary of changes
* Remove all uses of `BlockReader::read_blk` in favour of using block
cursors, at least where the type of the `BlockReader` is behind a
generic
* Introduce a `BlockReaderRef` enum that lists all implementors of
`BlockReader::read_blk`.
* Remove `BlockReader::read_blk` and move its implementations into
inherent functions on the types instead.
We don't turn `read_blk` into an async fn yet, for that we also need to
modify the page cache. So this is a preparatory PR, albeit an important
one.
Part of #4743.
## Problem
The previous arguments have the monitor listen on `localhost`, which the
informant can connect to since it's also in the VM, but which the agent
cannot. Also, the port is wrong.
## Summary of changes
Listen on `0.0.0.0:10301`
## Problem
The `EphemeralFile::write_blob` function accesses the page cache
internally. We want to require `async` for these accesses in #5023.
## Summary of changes
This removes the implementaiton of the `BlobWriter` trait for
`EphemeralFile` and turns the `write_blob` function into an inherent
function. We can then make it async as well as the `push_bytes`
function. We move the `SER_BUFFER` thread-local into the
`InMemoryLayerInner` so that the same buffer can be accessed by
different threads as the async is (potentially) moved between threads.
Part of #4743, preparation for #5023.
Current implementation first calls `load_layer_map`, which loads all
local layers, cleans up files, leave cleaning up stuff to "second
function". Then the "second function" is finally called, it does not do
the cleanup and some of the first functions setup can torn down. "Second
function" is actually both `reconcile_with_remote` and
`create_remote_layers`.
This change makes it a bit more verbose but in one phase with the
following sub-steps:
1. scan the timeline directory
2. delete extra files
- now including on-demand download files
- fixes#3660
3. recoincile the two sources of layers (directory, index_part)
4. rename_to_backup future layers, short layers
5. create the remaining as layers
Needed by #4938.
It was also noticed that this is blocking code in an `async fn` so just
do it in a `spawn_blocking`, which should be healthy for our startup
times. Other effects includes hopefully halving of `stat` calls; extra
calls which were not done previously are now done for the future layers.
Co-authored-by: Christian Schwarz <christian@neon.tech>
Co-authored-by: John Spray <john@neon.tech>