I believe the explicit memory fence instructions are
unnecessary. Performing a store with Release ordering makes all the
previous non-atomic writes visible too. Per rust docs for Ordering::Release
( https://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html#variant.Release):
> When coupled with a store, all previous operations become ordered
> before any load of this value with Acquire (or stronger)
> ordering. In particular, all previous writes become visible to all
> threads that perform an Acquire (or stronger) load of this value.
>
> ...
>
> Corresponds to memory_order_release in C++20.
The "all previous writes" means non-atomic writes too. It's not very
clear from that text, but the C++20 docs that it links to is more
explicit about it:
> All memory writes (including non-atomic and relaxed atomic) that
> happened-before the atomic store from the point of view of thread A,
> become visible side-effects in thread B. That is, once the atomic
> load is completed, thread B is guaranteed to see everything thread A
> wrote to memory.
In addition to removing the fence instructions, fix the comments on
each atomic Acquire operation to point to the correct Release
counterpart. We had such comments but they had gone out-of-date as
code has moved.
All the callers did that previously. So rather than document that the
caller needs to do it, just do it in start_neon_io_request() straight
away. (We might want to revisit this if we get codepaths where the C
code submits multiple IO requests as a batch. In that case, it would
be more efficient to fill all the request slots first and only send
one notification to the pipe for all of them)
All the code talks about "request slots", better to make the struct
name reflect that. The "Handle" term was borrowed from Postgres v18
AIO implementation, from the similar handles or slots used to submit
IO requests from backends to worker processes. But even though the
idea is similar, it's a completely separate implementation and there's
nothing else shared between them than the very high level
design.
This greatly reduces the cases where we make a request to the
pageserver with a very recent LSN. Those cases are slow because the
pageserver needs to wait for the WAL to arrive. This speeds up the
Postgres pg_regress and isolation tests greatly.
With this refactoring, the Rust code deals with one giant array of
requests, and doesn't know that it's sliced up per backend
process. The C code is now responsible for slicing it up.
This also adds code to complete old IOs at backends start that were
started and left behind by a previous session. That was a little more
straightforward to do with the refactoring, which is why I tackled it
now.
Some tests were failing with "Only request bodies with a known size
can be checksum validated." erros. This is a known issue with more
recent aws client versions, see
https://github.com/neondatabase/neon/issues/11363.
Some tests were failing, because pageserver didn't shut down promptly.
Tonic server graceful shutdown was a little too graceful; any open
streams linger until they're closed. Check the cancellation token
while waiting for next request, and close the stream if
shutdown/cancellation was requested.
This doesn't do anything interesting yet, but demonstrates linking Rust
code to the neon Postgres extension, so that we can review and test
drive just the build process changes independently.
Mostly a revert of https://github.com/neondatabase/neon/pull/11851 and
https://github.com/neondatabase/neon/pull/12330 .
Christian suggested reverting his PR to fix the issue
https://github.com/neondatabase/neon/issues/12369 .
Alternatives considered:
1. I have originally wanted to introduce cancellation tokens to
`RequestContext`, but in the end I gave up on them because I didn't find
a select-free way of preventing
`test_layer_download_cancelled_by_config_location` from hanging.
Namely if I put a select around the `get_or_maybe_download` invocation
in `get_values_reconstruct_data`, it wouldn't hang, but if I put it
around the `download_init_and_wait` invocation in
`get_or_maybe_download`, the test would still hang. Not sure why, even
though I made the attached child function of the `RequestContext` create
a child token.
2. Introduction of a `download_cancel` cancellation token as a child of
a timeline token, putting it into `RemoteTimelineClient` together with
the main token, and then putting it into the whole
`RemoteTimelineClient` read path.
3. Greater refactorings, like to make cancellation tokens follow a DAG
structure so you can have tokens cancelled either by say timeline
shutting down or a request ending. It doesn't just represent an effort
that we don't have the engineering budget for, it also causes
interesting questions like what to do about batching (do you cancel the
entire request if only some requests get cancelled?).
We might see a reemergence of
https://github.com/neondatabase/neon/issues/11762, but given that we
have https://github.com/neondatabase/neon/pull/11853 and
https://github.com/neondatabase/neon/pull/12376 now, it is possible that
it will not come back. Looking at some code, it might actually fix the
locations where the error pops up. Let's see.
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
attach_writer()/reader() are called from each forked process. It's too
late to do initialization there, in fact we used to overwrite the
contents of the hash table (or at least the freelist?) every time a
new process attached to it. The initialization must be done earlier,
in the HashMapInit() constructors.
- Add ComputeSpec flag `offload_lfc_interval_seconds` controlling
whether LFC should be offloaded to endpoint storage. Default value
(None) means "don't offload".
- Add glue code around it for `neon_local` and integration tests.
- Add `autoprewarm` mode for `test_lfc_prewarm` testing
`offload_lfc_interval_seconds` and `autoprewarm` flags in conjunction.
- Rename `compute_ctl_lfc_prewarm_requests_total` and
`compute_ctl_lfc_offload_requests_total` to
`compute_ctl_lfc_prewarms_total`
and `compute_ctl_lfc_offloads_total` to reflect we count prewarms and
offloads, not `compute_ctl` requests of those.
Don't count request in metrics if there is a prewarm/offload already
ongoing.
https://github.com/neondatabase/cloud/issues/19011
Resolves: https://github.com/neondatabase/cloud/issues/30770
Don't build walproposer-lib as a separate job. It only takes a few
seconds, after you have built all its dependencies.
Don't cache the Neon Pg extensions in the per-postgres-version caches.
This is in preparation for the communicator project, which will
introduce Rust parts to the Neon Pg extension, which complicates the
build process. With that, the 'make neon-pg-ext' step requires some of
the Rust bits to be built already, or it will build them on the spot,
which in turn requires all the Rust sources to be present, and we don't
want to repeat that part for each Postgres version anyway. To prepare
for that, rely on "make all" to build the neon extension and the rust
bits in the correct order instead. Building the neon extension doesn't
currently take very long anyway after you have built Postgres itself, so
you don't gain much by caching it. See
https://github.com/neondatabase/neon/pull/12266.
Add an explicit "rustup update" step to update the toolchain. It's not
strictly necessary right now, because currently "make all" will only
invoke "cargo build" once and the race condition described in the
comment doesn't happen. But prepare for the future.
To further simplify the build, get rid of the separate 'build-postgres'
jobs too, and just build Postgres as a step in the main job. That makes
the overall workflow run longer, because we no longer build all the
postgres versions in parallel (although you still get intra-runner
parallelism thanks to `make -j`), but that's acceptable. In the
cache-hit case, it might even be a little faster because there is less
overhead from launching jobs, and in the cache-miss case, it's maybe
5-10 minutes slower altogether.
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
## Problem
The current deletion operation is synchronous and blocking, which is
unsuitable for potentially long-running tasks like. In such cases, the
standard HTTP request-response pattern is not a good fit.
## Summary of Changes
- Added new `storcon_cli` commands: `NodeStartDelete` and
`NodeCancelDelete` to initiate and cancel deletion asynchronously.
- Added corresponding `storcon` HTTP handlers to support the new
start/cancel deletion flow.
- Introduced a new type of background operation: `Delete`, to track and
manage the deletion process outside the request lifecycle.
---------
Co-authored-by: Aleksandr Sarantsev <aleksandr.sarantsev@databricks.com>
I also added INFO messages for when a backend blocks on the
io-in-progress lock. It's probably too noisy for production, but
useful now to get a picture of how much it happens.