- Add support for splitting async postgres_backend into read and write halfes.
Safekeeper needs this for bidirectional streams. To this end, encapsulate
reading-writing postgres messages to framed.rs with split support without any
additional changes (relying on BufRead for reading and BytesMut out buffer for
writing).
- Use async postgres_backend throughout safekeeper (and in proxy auth link
part).
- In both safekeeper COPY streams, do read-write from the same thread/task with
select! for easier error handling.
- Tidy up finishing CopyBoth streams in safekeeper sending and receiving WAL
-- join split parts back catching errors from them before returning.
Initially I hoped to do that read-write without split at all, through polling
IO:
https://github.com/neondatabase/neon/pull/3522
However that turned out to be more complicated than I initially expected
due to 1) borrow checking and 2) anon Future types. 1) required Rc<Refcell<...>>
which is Send construct just to satisfy the checker; 2) can be workaround with
transmute. But this is so messy that I decided to leave split.
Remote operations fail sometimes due to network failures or other
external reasons. Add retry logic to all the remote downloads, so that
a transient failure at pageserver startup or tenant attach doesn't
cause the whole tenant to be marked as Broken.
Like in the uploads retry logic, we print the failure to the log as a
WARNing after three retries, but keep retrying. We will retry up to 10
times now, before returning the error to the caller.
To test the retries, I created a new RemoteStorage wrapper that simulates
failures, by returning an error for the first N times that a remote
operation is performed. It can be enabled by setting a new
"test_remote_failures" option in the pageserver config file.
Fixes#3112
Removes the race during pageserver initial timeline creation that lead to partial layer uploads.
This race is only reproducible in test code, we do not create initial timelines in cloud (yet, at least), but still nice to remove the non-deterministic behavior.
refactor: use new type LayerFileName when referring to layer file names in PathBuf/RemotePath
Before this patch, we would sometimes carry around plain file names in
`Path` types and/or awkwardly "rebase" paths to have a unified
representation of the layer file name between local and remote.
This patch introduces a new type `LayerFileName` which replaces the use
of `Path` / `PathBuf` / `RemotePath` in the `storage_sync2` APIs.
Instead of holding a string, it contains the parsed representation of
the image and delta file name.
When we need the file name, e.g., to construct a local path or
remote object key, we construct the name ad-hoc.
`LayerFileName` is also serde {Dese,Se}rializable, and in an initial
version of this patch, it was supposed to be used directly inside
`IndexPart`, replacing `RemotePath`.
However,
commit 3122f3282f
Ignore backup files (ones with .n.old suffix) in download_missing
fixed handling of `*.old` backup file names in IndexPart, and we need
to carry that behavior forward.
The solution is to remove `*.old` backup files names during
deserialization. When we re-serialize the IndexPart, the `*.old` file
will be gone.
This leaks the `.old` file in the remote storage, but makes it safe
to clean it up later.
There is additional churn by a preliminary refactoring that got squashed
into this change:
split off LayerMap's needs from trait Layer into super trait
That refactoring renames `Layer` to `PersistentLayer` and splits off a subset
of the functions into a super-trait called `Layer`.
The upser trait implements just the functions needed by `LayerMap`, whereas
`PersisentLayer` adds the context of the pageserver.
The naming is imperfect as some functions that reside in `PersistentLayer`
have nothing persistence-specific to it. But it's a step in the right direction.
Changes:
* Remove `RemoteObjectId` concept from remote_storage.
Operate directly on /-separated names instead.
These names are now represented by struct `RemotePath` which was renamed from struct `RelativePath`
* Require remote storage to operate on relative paths for its contents, thus simplifying the way to derive them in pageserver and safekeeper
* Make `IndexPart` to use `String` instead of `RelativePath` for its entries, since those are just the layer names
We had a pattern like this:
match remote_storage {
GenericRemoteStorage::Local(storage) => {
let source = storage.remote_object_id(&file_path)?;
...
storage
.function(&source, ...)
.await
},
GenericRemoteStorage::S3(storage) => {
... exact same code as for the Local case ...
},
This removes the code duplication, by allowing you to call the functions
directly on GenericRemoteStorage.
Also change RemoveObjectId to be just a type alias for String. Now that
the callers of GenericRemoteStorage functions don't know whether they're
dealing with the LocalFs or S3 implementation, RemoveObjectId must be the
same type for both.
download operations of all timelines for one tenant are now grouped
together so when attach is invoked pageserver downloads all of them
and registers them in a single apply_sync_status_update call so
branches can be used safely with attach/detach
Separate task is launched for each timeline and stopped when timeline doesn't
need offloading. Decision who offloads is done through etcd leader election;
currently there is no pre condition for participating, that's a TODO.
neon_local and tests infrastructure for remote storage in safekeepers added,
along with the test itself.
ref #1009
Co-authored-by: Anton Shyrabokau <ahtoxa@Antons-MacBook-Pro.local>