NB: effectively a no-op in the neon env since the handling is config
gated
in storcon
## Problem
When a pageserver suffers from a local disk/node failure and restarts,
the storage controller will receive a re-attach call and return all the
tenants the pageserver is suppose to attach, but the pageserver will not
act on any tenants that it doesn't know about locally. As a result, the
pageserver will not rehydrate any tenants from remote storage if it
restarted following a local disk loss, while the storage controller
still thinks that the pageserver have all the tenants attached. This
leaves the system in a bad state, and the symptom is that PG's
pageserver connections will fail with "tenant not found" errors.
## Summary of changes
Made a slight change to the storage controller's `re_attach` API:
* The pageserver will set an additional bit `empty_local_disk` in the
reattach request, indicating whether it has started with an empty disk
or does not know about any tenants.
* Upon receiving the reattach request, if this `empty_local_disk` bit is
set, the storage controller will go ahead and clear all observed
locations referencing the pageserver. The reconciler will then discover
the discrepancy between the intended state and observed state of the
tenant and take care of the situation.
To facilitate rollouts this extra behavior in the `re_attach` API is
guarded by the `handle_ps_local_disk_loss` command line flag of the
storage controller.
---------
Co-authored-by: William Huang <william.huang@databricks.com>
## Problem
Imports don't support schema evolution nicely. If we want to change the
stuff we keep in storcon,
we'd have to carry the old cruft around.
## Summary of changes
Version import progress. Note that the import progress version
determines the version of the import
job split and execution. This means that we can also use it as a
mechanism for deploying new import
implementations in the future.
## Problem
Import up-calls did not enforce the usage of the latest generation. The
import might have finished in one previous generation, but not in the
latest one. Hence, the controller might try to activate a timeline
before it is ready. In theory, that would be fine, but it's tricky to
reason about.
## Summary of Changes
Pageserver provides the current generation in the upcall to the storage
controller and the later validates the generation. If the generation is
stale, we return an error which stops progress of the import job. Note
that the import job will retry the upcall until the stale location is
detached.
I'll add some proper tests for this as part of the [checkpointing
PR](https://github.com/neondatabase/neon/pull/11862).
Closes https://github.com/neondatabase/neon/issues/11884
## Problem
Lifetime of imported timelines (and implicitly the import background
task) has some shortcomings:
1. Timeline activation upon import completion is tricky. Previously, a
timeline that finished importing
after a tenant detach would not get activated and there's concerns about
the safety of activating
concurrently with shut-down.
2. Import jobs can prevent tenant shut down since they hold the tenant
gate
## Summary of Changes
Track the import tasks in memory and abort them explicitly on tenant
shutdown.
Integrate more closely with the storage controller:
1. When an import task has finished all of its jobs, it notifies the
storage controller, but **does not** mark the import as done in the
index_part. When all shards have finished importing, the storage
controller will call the `/activate_post_import` idempotent endpoint for
all of them. The handler, marks the import complete in index part,
resets the tenant if required and checks if the timeline is active yet.
2. Not directly related, but the import job now gets the starting state
from the storage controller instead of the import bucket. This paves the
way for progress checkpointing.
Related: https://github.com/neondatabase/neon/issues/11568
## Problem
We had retained the ability to run in a generation-less mode to support
test_generations_upgrade, which was replaced with a cleaner backward
compat test in https://github.com/neondatabase/neon/pull/10701
## Summary of changes
- Remove all the special cases for "if no generation" or "if no control
plane api"
- Make control_plane_api config mandatory
---------
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
Pageservers notify control plane directly when a shard import has
completed.
Control plane has to download the status of each shard from S3 and
figure out if everything is truly done,
before proceeding with branch activation.
Issues with this approach are:
* We can't control shard split behaviour on the storage controller side.
It's unsafe to split
during import.
* Control plane needs to know about shards and implement logic to check
all timelines are indeed ready.
## Summary of changes
In short, storage controller coordinates imports, and, only when
everything is done, notifies control plane.
Big rocks:
1. Store timeline imports in the storage controller database. Each
import stores the status of its shards in the database.
We hook into the timeline creation call as our entry point for this.
2. Pageservers get a new upcall endpoint to notify the storage
controller of shard import updates.
3. Storage controller handles these updates by updating persisted state.
If an update finalizes the import,
then poll pageservers until timeline activation, and, then, notify the
control plane that the import is complete.
Cplane side change with new endpoint is in
https://github.com/neondatabase/cloud/pull/26166
Closes https://github.com/neondatabase/neon/issues/11566
## Problem
The pageserver upcall api was designed to work with control plane or the
storage controller.
We have completed the transition period and now the upcall api only
targets the storage controller.
## Summary of changes
Rename types accordingly and tweak some comments.
Updates storage components to edition 2024. We like to stay on the
latest edition if possible. There is no functional changes, however some
code changes had to be done to accommodate the edition's breaking
changes.
The PR has two commits:
* the first commit updates storage crates to edition 2024 and appeases
`cargo clippy` by changing code. i have accidentially ran the formatter
on some files that had other edits.
* the second commit performs a `cargo fmt`
I would recommend a closer review of the first commit and a less close
review of the second one (as it just runs `cargo fmt`).
part of https://github.com/neondatabase/neon/issues/10918
## Problem
This is tech debt. While we introduced generations for tenants, some
legacy situations without generations needed to delete things inline
(async operation) instead of enqueing them (sync operation).
## Summary of changes
- Remove the async code, replace calls with the sync variant, and assert
that the generation is always set
## Problem
It appears that the Azure storage API tends to hang TCP connections more
than S3 does.
Currently we use a 2 minute timeout for all downloads. This is large
because sometimes the objects we download are large. However, waiting 2
minutes when doing something like downloading a manifest on tenant
attach is problematic, because when someone is doing a "create tenant,
create timeline" workflow, that 2 minutes is long enough for them
reasonably to give up creating that timeline.
Rather than propagate oversized timeouts further up the stack, we should
use a different timeout for objects that we expect to be small.
Closes: https://github.com/neondatabase/neon/issues/9836
## Summary of changes
- Add a `small_timeout` configuration attribute to remote storage,
defaulting to 30 seconds (still a very generous period to do something
like download an index)
- Add a DownloadKind parameter to DownloadOpts, so that callers can
indicate whether they expect the object to be small or large.
- In the azure client, use small timeout for HEAD requests, and for GET
requests if DownloadKind::Small is used.
- Use DownloadKind::Small for manifests, indices, and heatmap downloads.
This PR intentionally does not make the equivalent change to the S3
client, to reduce blast radius in case this has unexpected consequences
(we could accomplish the same thing by editing lots of configs, but just
skipping the code is simpler for right now)
## Problem
close https://github.com/neondatabase/neon/issues/9859
## Summary of changes
Ensure that the deletion queue gets fully flushed (i.e., the deletion
lists get applied) during a graceful shutdown.
It is still possible that an incomplete shutdown would leave deletion
list behind and cause race upon the next startup, but we assume this
will unlikely happen, and even if it happened, the pageserver should
already be at a tainted state and the tenant should be moved to a new
tenant with a new generation number.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
We wish to have high level WAL decoding logic in `wal_decoder::decoder`
module.
## Summary of Changes
For this we need the `Value` and `NeonWalRecord` types accessible there, so:
1. Move `Value` and `NeonWalRecord` to `pageserver::value` and
`pageserver::record` respectively.
2. Get rid of `pageserver::repository` (follow up from (1))
3. Move PG specific WAL record types to `postgres_ffi::walrecord`. In
theory they could live in `wal_decoder`, but it would create a circular
dependency between `wal_decoder` and `postgres_ffi`. Long term it makes
sense for those types to be PG version specific, so that will work out nicely.
4. Move higher level WAL record types (to be ingested by pageserver)
into `wal_decoder::models`
Related: https://github.com/neondatabase/neon/issues/9335
Epic: https://github.com/neondatabase/neon/issues/9329
## Problem
- In https://github.com/neondatabase/neon/pull/8784, the validate
controller API is modified to check generations directly in the
database. It batches tenants into separate queries to avoid generating a
huge statement, but
- While updating this, I realized that "control_plane_client" is a kind
of confusing name for the client code now that it primarily talks to the
storage controller (the case of talking to the control plane will go
away in a few months).
## Summary of changes
- Big rename to "ControllerUpcallClient" -- this reflects the storage
controller's api naming, where the paths used by the pageserver are in
`/upcall/`
- When sending validate requests, break them up into chunks so that we
avoid possible edge cases of generating any HTTP requests that require
database I/O across many thousands of tenants.
This PR mixes a functional change with a refactor, but the commits are
cleanly separated -- only the last commit is a functional change.
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
PR #8299 has switched the storage scrubber to use
`DefaultCredentialsChain`. Now we do this for `remote_storage`, as it
allows us to use `remote_storage` from inside kubernetes. Most of the
diff is due to `GenericRemoteStorage::from_config` becoming `async fn`.
## Problem
In https://github.com/neondatabase/neon/pull/5299, the new config-v1
tenant config file was added to hold the LocationConf type. We left the
old config file in place for forward compat, and because running without
generations (therefore without LocationConf) as still useful before the
storage controller was ready for prime-time.
Closes: https://github.com/neondatabase/neon/issues/5388
## Summary of changes
- Remove code for reading and writing the legacy config file
- Remove Generation::Broken: it was unused.
- Treat missing config file on disk as an error loading a tenant, rather
than defaulting it. We can now remove LocationConf::default, and thereby
guarantee that we never construct a tenant with a None generation.
- Update some comments + add some assertions to clarify that
Generation::None is only used in layer metadata, not in the state of a
running tenant.
- Update docker compose test to create tenants with a generation
Adds a `Deserialize` impl to `RemoteStorageConfig`. We thus achieve the
same as #7743 but with less repetitive code, by deriving `Deserialize`
impls on `S3Config`, `AzureConfig`, and `RemoteStorageConfig`. The
disadvantage is less useful error messages.
The git history of this PR contains a state where we go via an
intermediate representation, leveraging the `serde_json` crate,
without it ever being actual json though.
Also, the PR adds deserialization tests.
Alternative to #7743 .
This is a preparation for
https://github.com/neondatabase/neon/issues/6337.
The idea is to add FullAccessTimeline, which will act as a guard for
tasks requiring access to WAL files. Eviction will be blocked on these
tasks and WAL won't be deleted from disk until there is at least one
active FullAccessTimeline.
To get FullAccessTimeline, tasks call `tli.full_access_guard().await?`.
After eviction is implemented, this function will be responsible for
downloading missing WAL file and waiting until the download finishes.
This commit also contains other small refactorings:
- Separate `get_tenant_dir` and `get_timeline_dir` functions for
building a local path. This is useful for looking at usages and finding
tasks requiring access to local filesystem.
- `timeline_manager` is now responsible for spawning all background
tasks
- WAL removal task is now spawned instantly after horizon is updated
## Problem
This is historical baggage from when the pageserver could be run with
local disk only: we had a bunch of places where we had to treat remote
storage as optional.
Closes: https://github.com/neondatabase/neon/issues/6890
## Changes
- Remove Option<> around remote storage (in
https://github.com/neondatabase/neon/pull/7722 we made remote storage
clearly mandatory)
- Remove code for deleting old metadata files: they're all gone now.
- Remove other references to metadata files when loading directories, as
none exist.
I checked last 14 days of logs for "found legacy metadata", there are no
instances.
- Rename "filename" types which no longer map directly to a filename
(LayerFileName -> LayerName)
- Add a -v1- part to local layer paths to smooth the path to future
updates (we anticipate a -v2- that uses checksums later)
- Rename methods that refer to the string-ized version of a LayerName to
no longer be called "filename"
- Refactor reconcile() function to use a LocalLayerFileMetadata type
that includes the local path, rather than carrying local path separately
in a tuple and unwrap()'ing it later.
This change improves the resilience of the system to unclean restarts.
Previously, re-attach responses only included attached tenants
- If the pageserver had local state for a secondary location, it would
remain, but with no guarantee that it was still _meant_ to be there.
After this change, the pageserver will only retain secondary locations
if the /re-attach response indicates that they should still be there.
- If the pageserver had local state for an attached location that was
omitted from a re-attach response, it would be entirely detached. This
is wasteful in a typical HA setup, where an offline node's tenants might
have been re-attached elsewhere before it restarts, but the offline
node's location should revert to a secondary location rather than being
wiped. Including secondary tenants in the re-attach response enables the
pageserver to avoid throwing away local state unnecessarily.
In this PR:
- The re-attach items are extended with a 'mode' field.
- Storage controller populates 'mode'
- Pageserver interprets it (default is attached if missing) to construct
either a SecondaryTenant or a Tenant.
- A new test exercises both cases.
## Problem
Currently we manually register nodes with the storage controller, and
use a script during deploy to register with the cloud control plane.
Rather than extend that script further, nodes should just register on
startup.
## Summary of changes
- Extend the re-attach request to include an optional
NodeRegisterRequest
- If the `register` field is set, handle it like a normal node
registration before executing the normal re-attach work.
- Update tests/neon_local that used to rely on doing an explicit
register step that could be enabled/disabled.
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
Nightly has added a bunch of compiler and linter warnings. There is also
two dependencies that fail compilation on latest nightly due to using
the old `stdsimd` feature name. This PR fixes them.
Reverts neondatabase/neon#6765 , bringing back #6731
We concluded that #6731 never was the root cause for the instability in
staging.
More details:
https://neondb.slack.com/archives/C033RQ5SPDH/p1708011674755319
However, the massive amount of concurrent `spawn_blocking` calls from
the `save_metadata` calls during startups might cause a performance
regression.
So, we'll merge this PR here after we've stopped writing the metadata
#6769).
Cancellation and timeouts are handled at remote_storage callsites, if
they are. However they should always be handled, because we've had
transient problems with remote storage connections.
- Add cancellation token to the `trait RemoteStorage` methods
- For `download*`, `list*` methods there is
`DownloadError::{Cancelled,Timeout}`
- For the rest now using `anyhow::Error`, it will have root cause
`remote_storage::TimeoutOrCancel::{Cancel,Timeout}`
- Both types have `::is_permanent` equivalent which should be passed to
`backoff::retry`
- New generic RemoteStorageConfig option `timeout`, defaults to 120s
- Start counting timeouts only after acquiring concurrency limiter
permit
- Cancellable permit acquiring
- Download stream timeout or cancellation is communicated via an
`std::io::Error`
- Exit backoff::retry by marking cancellation errors permanent
Fixes: #6096Closes: #4781
Co-authored-by: arpad-m <arpad-m@users.noreply.github.com>
Some callers of `VirtualFile::crashsafe_overwrite` call it on the
executor thread, thereby potentially stalling it.
Others are more diligent and wrap it in `spawn_blocking(...,
Handle::block_on, ... )` to avoid stalling the executor thread.
However, because `crashsafe_overwrite` uses
VirtualFile::open_with_options internally, we spawn a new thread-local
`tokio-epoll-uring::System` in the blocking pool thread that's used for
the `spawn_blocking` call.
This PR refactors the situation such that we do the `spawn_blocking`
inside `VirtualFile::crashsafe_overwrite`. This unifies the situation
for the better:
1. Callers who didn't wrap in `spawn_blocking(..., Handle::block_on,
...)` before no longer stall the executor.
2. Callers who did it before now can avoid the `block_on`, resolving the
problem with the short-lived `tokio-epoll-uring::System`s in the
blocking pool threads.
A future PR will build on top of this and divert to tokio-epoll-uring if
it's configures as the IO engine.
Changes
-------
- Convert implementation to std::fs and move it into `crashsafe.rs`
- Yes, I know, Safekeepers (cc @arssher ) added `durable_rename` and
`fsync_async_opt` recently. However, `crashsafe_overwrite` is different
in the sense that it's higher level, i.e., it's more like
`std::fs::write` and the Safekeeper team's code is more building block
style.
- The consequence is that we don't use the VirtualFile file descriptor
cache anymore.
- I don't think it's a big deal because we have plenty of slack wrt
production file descriptor limit rlimit (see [this
dashboard](https://neonprod.grafana.net/d/e4a40325-9acf-4aa0-8fd9-f6322b3f30bd/pageserver-open-file-descriptors?orgId=1))
- Use `tokio::task::spawn_blocking` in
`VirtualFile::crashsafe_overwrite` to call the new
`crashsafe::overwrite` API.
- Inspect all callers to remove any double-`spawn_blocking`
- spawn_blocking requires the captures data to be 'static + Send. So,
refactor the callers. We'll need this for future tokio-epoll-uring
support anyway, because tokio-epoll-uring requires owned buffers.
Related Issues
--------------
- overall epic to enable write path to tokio-epoll-uring: #6663
- this is also kind of relevant to the tokio-epoll-uring System creation
failures that we encountered in staging, investigation being tracked in
#6667
- why is it relevant? Because this PR removes two uses of
`spawn_blocking+Handle::block_on`
Fix several test flakes:
- test_sharding_service_smoke had log failures on "Dropped LSN updates"
- test_emergency_mode had log failures on a deletion queue shutdown
check, where the check was incorrect because it was expecting channel
receiver to stay alive after cancellation token was fired.
- test_secondary_mode_eviction had racing heatmap uploads because the
test was using a live migration hook to set up locations, where that
migration was itself uploading heatmaps and generally making the
situation more complex than it needed to be.
These are the failure modes that I saw when spot checking the last few
failures of each test.
This will mostly/completely address #6511, but I'll leave that ticket
open for a couple days and then check if either of the tests named in
that ticket are flaky.
Related #6511
This uses the [newly stable](https://blog.rust-lang.org/2023/12/21/async-fn-rpit-in-traits.html)
async trait feature for three internal traits. One requires `Send`
bounds to be present so uses `impl Future<...> + Send` instead.
Advantages:
* less macro usage
* no extra boxing
Disadvantages:
* impl syntax needed for `Send` bounds is a bit more verbose (but only
required in one place)
(includes two preparatory commits from
https://github.com/neondatabase/neon/pull/5960)
## Problem
To accommodate multiple shards in the same tenant on the same
pageserver, we must include the full TenantShardId in local paths. That
means that all code touching local storage needs to see the
TenantShardId.
## Summary of changes
- Replace `tenant_id: TenantId` with `tenant_shard_id: TenantShardId` on
Tenant, Timeline and RemoteTimelineClient.
- Use TenantShardId in helpers for building local paths.
- Update all the relevant call sites.
This doesn't update absolutely everything: things like PageCache,
TaskMgr, WalRedo are still shard-naive. The purpose of this PR is to
update the core types so that others code can be added/updated
incrementally without churning the most central shared types.
Precursor for https://github.com/neondatabase/neon/pull/5957
## Problem
When DeletionList was written, TenantId/TimelineId didn't have
human-friendly modes in their serde. #5335 added those, such that the
helpers used in serialization of HashMaps are no longer necessary.
## Summary of changes
- Add a unit test to ensure that this change isn't changing anything
about the serialized form
- Remove the serialization helpers for maps of Id
## Problem
For sharded tenants, the layer keys must include the shard number and
shard count, to disambiguate keys written by different shards in the
same tenant (shard number), and disambiguate layers written before and
after splits (shard count).
Closes: https://github.com/neondatabase/neon/issues/5924
## Summary of changes
There are no functional changes in this PR: everything behaves the same
for the default ShardIndex::unsharded() value. Actual construct of
sharded tenants will come next.
- Add a ShardIndex type: this is just a wrapper for a ShardCount and
ShardNumber. This is a subset of ShardIdentity: whereas ShardIdentity
contains enough information to filter page keys, ShardIndex contains
just enough information to construct a remote key. ShardIndex has a
compact encoding, the same as the shard part of TenantShardId.
- Store the ShardIndex as part of IndexLayerMetadata, if it is set to a
different value than ShardIndex::unsharded.
- Update RemoteTimelineClient and DeletionQueue to construct paths using
the layer metadata. Deletion code paths that previously just passed a
`Generation` now pass a full `LayerFileMetadata` to capture the shard as
well.
Notes to reviewers:
- In deletion code paths, I could have used a (Generation, ShardIndex)
instead of the full LayerFileMetadata. I opted for the full object
partly for brevity, and partly because in future when we add checksums
the deletion code really will care about the full metadata in order to
validate that it is deleting what was intended.
- While ShardIdentity and TenantShardId could both use a ShardIndex, I
find that they read more cleanly as "flat" structs that spell out the
shard count and number field separately. Serialization code would need
writing out by hand anyway, because TenantShardId's serialized form is
not a serde struct-style serialization.
- ShardIndex doesn't _have_ to exist (we could use ShardIdentity
everywhere), but it is a worthwhile optimization, as we will have many
copies of this as part of layer metadata. In future the size difference
betweedn ShardIndex and ShardIdentity may become larger if we implement
more sophisticated key distribution mechanisms (i.e. new values of
ShardIdentity::layout).
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
fixes https://github.com/neondatabase/neon/issues/5878
obsoletes https://github.com/neondatabase/neon/issues/5879
Before this PR, it could happen that `load_layer_map` schedules removal
of the future
image layer. Then a later compaction run could re-create the same image
layer, scheduling a PUT.
Due to lack of an upload queue barrier, the PUT and DELETE could be
re-ordered.
The result was IndexPart referencing a non-existent object.
## Summary of changes
* Add support to `pagectl` / Python tests to decode `IndexPart`
* Rust
* new `pagectl` Subcommand
* `IndexPart::{from,to}_s3_bytes()` methods to internalize knowledge
about encoding of `IndexPart`
* Python
* new `NeonCli` subclass
* Add regression test
* Rust
* Ability to force repartitioning; required to ensure image layer
creation at last_record_lsn
* Python
* The regression test.
* Fix the issue
* Insert an `UploadOp::Barrier` after scheduling the deletions.
Improve the serde impl for several types (`Lsn`, `TenantId`,
`TimelineId`) by making them sensitive to
`Serializer::is_human_readadable` (true for json, false for bincode).
Fixes#3511 by:
- Implement the custom serde for `Lsn`
- Implement the custom serde for `Id`
- Add the helper module `serde_as_u64` in `libs/utils/src/lsn.rs`
- Remove the unnecessary attr `#[serde_as(as = "DisplayFromStr")]` in
all possible structs
Additionally some safekeeper types gained serde tests.
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Following from discussion on
https://github.com/neondatabase/neon/pull/5436 where hacking an implicit
die-on-fatal-io behavior into an Error type was a source of disagreement
-- in this PR, dying on fatal I/O errors is explicit, with `fatal_err`
and `maybe_fatal_err` helpers in the `MaybeFatalIo` trait, which is
implemented for std::io::Result.
To enable this approach with `crashsafe_overwrite`, the return type of
that function is changed to std::io::Result -- the previous error enum
for this function was not used for any logic, and the utility of saying
exactly which step in the function failed is outweighed by the hygiene
of having an I/O funciton return an io::Result.
The initial use case for these helpers is the deletion queue.
## Problem
If a caller detaches a tenant and then attaches it again, pending
deletions from the old attachment might not have happened yet. This is
not a correctness problem, but it causes:
- Risk of leaking some objects in S3
- Some warnings from the deletion queue when pending LSN updates and
pending deletions don't pass validation.
## Summary of changes
- Deletion queue now uses UnboundedChannel so that the push interfaces
don't have to be async.
- This was pulled out of https://github.com/neondatabase/neon/pull/5397,
where it is also useful to be able to drive the queue from non-async
contexts.
- Why is it okay for this to be unbounded? The only way the
unbounded-ness of the channel can become a problem is if writing out
deletion lists can't keep up, but if the system were that overloaded
then the code generating deletions (GC, compaction) would also be
impacted.
- DeletionQueueClient gets a new `flush_advisory` function, which is
like flush_execute, but doesn't wait for completion: this is appropriate
for use in contexts where we would like to encourage the deletion queue
to flush, but don't need to block on it.
- This function is also expected to be useful in next steps for seamless
migration, where the option to flush to S3 while transitioning into
AttachedStale will also include flushing deletion queue, but we wouldn't
want to block on that flush.
- The tenant_detach code in mgr.rs invokes flush_advisory after stopping
the `Tenant` object.
---------
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
## Problem
Seen in CI for https://github.com/neondatabase/neon/pull/5453 -- the
time gap between validation completing and the header getting written is
long enough to fail the test, where it was doing a cheeky 1 second
sleep.
## Summary of changes
- Replace 1 second sleep with a wait_until to see the header file get
written
- Use enums as test params to make the results more readable (instead of
True-False parameters)
- Fix the temp suffix used for deletion queue headers: this worked fine,
but resulted in `..tmp` extension.
## Problem
Pageservers with `control_plane_api` configured require a control plane
to start up: in an incident this might be a problem.
## Summary of changes
Note to reviewers: most of the code churn in mgr.rs is the refactor
commit that enables the later emergency mode commit: you may want to
review commits separately.
- Add `control_plane_emergency_mode` configuration property
- Refactor init_tenant_mgr to separate loading configurations from the
main loop where we construct Tenant, so that the generations fetch can
peek at the configs in emergency mode.
- During startup, in emergency mode, attach any tenants that were
attached on their last run, using the same generation number.
Closes: #5381
Closes: https://github.com/neondatabase/neon/issues/5492
Doesn't do an upgrade of rustc to 1.73.0 as we want to wait for the
cargo response of the curl CVE before updating. In preparation for an
update, we address the clippy lints that are newly firing in 1.73.0.
Fixes#4689 by replacing all of `std::Path` , `std::PathBuf` with
`camino::Utf8Path`, `camino::Utf8PathBuf` in
- pageserver
- safekeeper
- control_plane
- libs/remote_storage
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
Pageservers must not delete objects or advertise updates to
remote_consistent_lsn without checking that they hold the latest
generation for the tenant in question (see [the RFC](
https://github.com/neondatabase/neon/blob/main/docs/rfcs/025-generation-numbers.md))
In this PR:
- A new "deletion queue" subsystem is introduced, through which
deletions flow
- `RemoteTimelineClient` is modified to send deletions through the
deletion queue:
- For GC & compaction, deletions flow through the full generation
verifying process
- For timeline deletions, deletions take a fast path that bypasses
generation verification
- The `last_uploaded_consistent_lsn` value in `UploadQueue` is replaced
with a mechanism that maintains a "projected" lsn (equivalent to the
previous property), and a "visible" LSN (which is the one that we may
share with safekeepers).
- Until `control_plane_api` is set, all deletions skip generation
validation
- Tests are introduced for the new functionality in
`test_pageserver_generations.py`
Once this lands, if a pageserver is configured with the
`control_plane_api` configuration added in
https://github.com/neondatabase/neon/pull/5163, it becomes safe to
attach a tenant to multiple pageservers concurrently.
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>