Commit Graph

478 Commits

Author SHA1 Message Date
Heikki Linnakangas
c14cf15b52 Tidy up the memory ordering instructions on request slot code
I believe the explicit memory fence instructions are
unnecessary. Performing a store with Release ordering makes all the
previous non-atomic writes visible too. Per rust docs for Ordering::Release
( https://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html#variant.Release):

> When coupled with a store, all previous operations become ordered
> before any load of this value with Acquire (or stronger)
> ordering. In particular, all previous writes become visible to all
> threads that perform an Acquire (or stronger) load of this value.
>
> ...
>
> Corresponds to memory_order_release in C++20.

The "all previous writes" means non-atomic writes too. It's not very
clear from that text, but the C++20 docs that it links to is more
explicit about it:

> All memory writes (including non-atomic and relaxed atomic) that
> happened-before the atomic store from the point of view of thread A,
> become visible side-effects in thread B. That is, once the atomic
> load is completed, thread B is guaranteed to see everything thread A
> wrote to memory.

In addition to removing the fence instructions, fix the comments on
each atomic Acquire operation to point to the correct Release
counterpart. We had such comments but they had gone out-of-date as
code has moved.
2025-07-10 15:19:20 +03:00
Heikki Linnakangas
5da06d4129 Make start_neon_io_request() wakeup the communicator process
All the callers did that previously. So rather than document that the
caller needs to do it, just do it in start_neon_io_request() straight
away. (We might want to revisit this if we get codepaths where the C
code submits multiple IO requests as a batch. In that case, it would
be more efficient to fill all the request slots first and only send
one notification to the pipe for all of them)
2025-07-10 15:19:20 +03:00
Heikki Linnakangas
f30c59bec9 Improve comments on request slots 2025-07-10 15:19:20 +03:00
Heikki Linnakangas
47c099a0fb Rename NeonIOHandle to NeonIORequestSlot
All the code talks about "request slots", better to make the struct
name reflect that. The "Handle" term was borrowed from Postgres v18
AIO implementation, from the similar handles or slots used to submit
IO requests from backends to worker processes. But even though the
idea is similar, it's a completely separate implementation and there's
nothing else shared between them than the very high level
design.
2025-07-10 14:52:16 +03:00
Heikki Linnakangas
b67e8f2edc Move some code, just for more natural logical ordering 2025-07-10 14:49:29 +03:00
Heikki Linnakangas
b5b1db29bb Implement shard map live-update 2025-07-10 12:25:15 +03:00
Heikki Linnakangas
ed4652b65b Update the relsize cache rather than forget it at end of index build
This greatly reduces the cases where we make a request to the
pageserver with a very recent LSN. Those cases are slow because the
pageserver needs to wait for the WAL to arrive. This speeds up the
Postgres pg_regress and isolation tests greatly.
2025-07-09 17:21:06 +03:00
Heikki Linnakangas
60d87966b8 minor comment improvement 2025-07-09 16:39:40 +03:00
Heikki Linnakangas
8db138ef64 Plumb through the stripe size to the communicator 2025-07-09 16:18:26 +03:00
Heikki Linnakangas
1ee24602d5 Implement working set size estimation 2025-07-09 16:18:26 +03:00
Heikki Linnakangas
732bd26e70 cargo fmt 2025-07-09 16:18:26 +03:00
Heikki Linnakangas
d63f1d259a avoid assertion failure about calling palloc() in critical section 2025-07-08 21:33:25 +03:00
Heikki Linnakangas
4053092408 Fix LSN tracking on "unlogged index builds"
Fixes the test_gin_redo.py test failure, and probably some others
2025-07-08 17:22:24 +03:00
Heikki Linnakangas
ccf88e9375 Improve debug logging by printing IO request details 2025-07-08 17:16:09 +03:00
Heikki Linnakangas
a79fd3bda7 Move logic for picking request slot to the C code
With this refactoring, the Rust code deals with one giant array of
requests, and doesn't know that it's sliced up per backend
process. The C code is now responsible for slicing it up.

This also adds code to complete old IOs at backends start that were
started and left behind by a previous session. That was a little more
straightforward to do with the refactoring, which is why I tackled it
now.
2025-07-07 12:59:08 +03:00
Heikki Linnakangas
e1b58d5d69 Don't segfault if one of the unimplemented functions are called
We'll need to implement these, but let's stop the crashing for now
2025-07-07 11:33:44 +03:00
Erik Grinaker
9ae004f3bc Rename ShardMap to ShardSpec 2025-07-06 19:13:59 +02:00
Erik Grinaker
4b06b547c1 pageserver/client_grpc: add shard map updates 2025-07-06 13:27:17 +02:00
Heikki Linnakangas
74e0d85a04 fix: Don't lose track of in-progress request if query is cancelled 2025-07-06 13:04:03 +03:00
Heikki Linnakangas
e14bb4be39 Merge remote-tracking branch 'origin/main' into communicator-rewrite 2025-07-05 16:59:51 +03:00
Heikki Linnakangas
f3a6c0d8ff cargo fmt 2025-07-05 16:26:24 +03:00
Heikki Linnakangas
d6ec1f1a1c Skip legacy LFC initialization when communicator is used
It clashes with the initialization of the LFC file
2025-07-05 16:26:24 +03:00
Heikki Linnakangas
b568189f7b Build dummy libcommunicator into the 'neon' extension (#12266)
This doesn't do anything interesting yet, but demonstrates linking Rust
code to the neon Postgres extension, so that we can review and test
drive just the build process changes independently.
2025-07-04 23:27:28 +00:00
Heikki Linnakangas
4c916552e8 Reduce logging noise
These are very useful while debugging, but also very noisy; let's dial
it down a little.
2025-07-04 23:11:36 +03:00
Heikki Linnakangas
00affada26 Add request ID to all communicator log lines as context information 2025-07-04 20:34:26 +03:00
Heikki Linnakangas
90d3c09c24 Minor cleanup
Tidy up and add some comments. Rename a few things for clarity.
2025-07-04 20:32:59 +03:00
Heikki Linnakangas
6c398aeae7 Fix dependency in Makefile 2025-07-04 20:24:21 +03:00
Heikki Linnakangas
1856bbbb9f Minor cleanup and commenting 2025-07-04 18:28:34 +03:00
Heikki Linnakangas
bd46dd60a0 Add a temporary timeout to handling an IO request in the communicator
It's nicer to timeout in the communicator and return an error to the
backend, than PANIC the backend.
2025-07-04 16:08:22 +03:00
Heikki Linnakangas
5f2d476a58 Add request ID to io-in-progress locking table, to ease debugging
I also added INFO messages for when a backend blocks on the
io-in-progress lock. It's probably too noisy for production, but
useful now to get a picture of how much it happens.
2025-07-04 15:55:57 +03:00
Heikki Linnakangas
3231cb6138 Await the io-in-progress locking futures
Otherwise they don't do anything. Oops.
2025-07-04 15:55:57 +03:00
Heikki Linnakangas
e558e0da5c Assign request_id earlier, in the originating backend
Makes it more useful for stitching together logs etc. for a specific
request.
2025-07-04 15:55:55 +03:00
Heikki Linnakangas
70bf2e088d Request multiple block numbers in a single GetPageV request
That's how it was always intended to be used
2025-07-04 15:49:04 +03:00
Konstantin Knizhnik
436a117c15 Do not allocate anything in subtransaction memory context (#12176)
## Problem

See https://github.com/neondatabase/neon/issues/12173

## Summary of changes

Allocate table in TopTransactionMemoryContext

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-07-04 10:24:39 +00:00
Heikki Linnakangas
da3f9ee72d cargo fmt 2025-07-04 12:39:41 +03:00
David Freifeld
794bb7a9e8 Merge branch 'quantumish/comm-lfc-integration' into communicator-rewrite 2025-07-03 10:52:29 -07:00
Konstantin Knizhnik
495112ca50 Add GUC for dynamically enable compare local mode (#12424)
## Problem

DEBUG_LOCAL_COMPARE mode allows to detect data corruption.
But it requires rebuild of neon extension (and so requires special
image) and significantly slowdown execution because always fetch pages
from page server.

## Summary of changes

Introduce new GUC `neon.debug_compare_local`, accepting the following
values: " none", "prefetch", "lfc", "all" (by default it is definitely
disabled).
In mode less than "all", neon SMGR will not fetch page from PS if it is
found in local caches.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2025-07-03 17:37:05 +00:00
Heikki Linnakangas
96a817fa2b Fix the case that storage auth token is _not_ used
I broke that in previous commit while fixing the case of using a token.
2025-07-03 18:39:06 +03:00
Heikki Linnakangas
e7b057f2e8 Fix passing storage JWT token to the communicator process
Makes the 'test_compute_auth_to_pageserver' test pass
2025-07-03 18:14:22 +03:00
Heikki Linnakangas
956c2f4378 cargo fmt 2025-07-03 16:16:42 +03:00
Heikki Linnakangas
3293e4685e Fix cases where pageserver gets stuck waiting for LSN
The compute might make a request with an LSN that it hasn't even
flushed yet.
2025-07-03 16:14:45 +03:00
Erik Grinaker
14214eb853 Add client shard routing 2025-07-03 14:42:35 +02:00
Erik Grinaker
de97b73d6e Lint fixes 2025-07-03 10:38:14 +02:00
Heikki Linnakangas
d8556616c9 Fix running Postgres in "vanilla mode", without neon storage
Some tests do that
2025-07-03 00:32:40 +03:00
Heikki Linnakangas
d8296e60e6 Fix caching of newly extended pages
This fixes read errors e.g. in test_compute_catalog.py test (and
probably many others).
2025-07-02 23:21:42 +03:00
David Freifeld
86fb7b966a Update integrated_cache.rs to use new hashmap API 2025-07-02 12:18:37 -07:00
Heikki Linnakangas
2cc28c75be Fix "ERROR: could not read size of rel ..." in many regression tests.
We were incorrectly skipping the call to communicator_new_rel_create(),
which resulted in an error during index build, when the btree build code
tried to check the size of the newly-created relation.
2025-07-02 14:10:11 +03:00
Erik Grinaker
bf01145ae4 Remove some old code 2025-07-02 11:46:54 +02:00
Erik Grinaker
8ab8fc11a3 Use new PageserverClient 2025-07-02 11:27:56 +02:00
Heikki Linnakangas
2fefece77d temporary hack to make regression tests fail faster 2025-07-02 01:42:39 +03:00