Erik Grinaker
d7678df445
Reap idle pool resources
2025-07-05 13:35:28 +02:00
Erik Grinaker
03d9f0ec41
Comment tweaks
2025-07-05 11:16:40 +02:00
Erik Grinaker
56845f2da2
Add GetPageClass::is_bulk
2025-07-05 11:15:28 +02:00
Heikki Linnakangas
9a37bfdf63
Fix re-finding an entry in bucket chain
2025-07-05 00:44:46 +03:00
Heikki Linnakangas
4c916552e8
Reduce logging noise
These are very useful while debugging, but also very noisy; let's dial
it down a little.
2025-07-04 23:11:36 +03:00
Heikki Linnakangas
50fbf4ac53
Fix hash table initialization across forked processes
attach_writer()/reader() are called from each forked process. It's too
late to do initialization there; in fact, we used to overwrite the
contents of the hash table (or at least the freelist?) every time a
new process attached to it. The initialization must be done earlier,
in the HashMapInit() constructors.
2025-07-04 23:08:34 +03:00
Erik Grinaker
cb698a3951
Add dedicated client pools for bulk requests
2025-07-04 21:52:25 +02:00
Erik Grinaker
f6cc5cbd0c
Split out retry handler to separate module
2025-07-04 20:20:09 +02:00
Heikki Linnakangas
00affada26
Add request ID to all communicator log lines as context information
2025-07-04 20:34:26 +03:00
Heikki Linnakangas
90d3c09c24
Minor cleanup
Tidy up and add some comments. Rename a few things for clarity.
2025-07-04 20:32:59 +03:00
Heikki Linnakangas
6c398aeae7
Fix dependency in Makefile
2025-07-04 20:24:21 +03:00
Heikki Linnakangas
1856bbbb9f
Minor cleanup and commenting
2025-07-04 18:28:34 +03:00
Heikki Linnakangas
bd46dd60a0
Add a temporary timeout to handling an IO request in the communicator
It's nicer to time out in the communicator and return an error to the
backend than to PANIC the backend.
2025-07-04 16:08:22 +03:00
Heikki Linnakangas
5f2d476a58
Add request ID to io-in-progress locking table, to ease debugging
I also added INFO messages for when a backend blocks on the
io-in-progress lock. It's probably too noisy for production, but
useful now to get a picture of how much it happens.
2025-07-04 15:55:57 +03:00
Heikki Linnakangas
3231cb6138
Await the io-in-progress locking futures
Otherwise they don't do anything. Oops.
2025-07-04 15:55:57 +03:00
Heikki Linnakangas
e558e0da5c
Assign request_id earlier, in the originating backend
Makes it more useful for stitching together logs etc. for a specific
request.
2025-07-04 15:55:55 +03:00
Heikki Linnakangas
70bf2e088d
Request multiple block numbers in a single GetPageV request
That's how it was always intended to be used.
2025-07-04 15:49:04 +03:00
Heikki Linnakangas
da3f9ee72d
cargo fmt
2025-07-04 12:39:41 +03:00
Erik Grinaker
88d1127bf4
Tweak GetPageSplitter
2025-07-03 21:12:26 +02:00
David Freifeld
794bb7a9e8
Merge branch 'quantumish/comm-lfc-integration' into communicator-rewrite
2025-07-03 10:52:29 -07:00
Erik Grinaker
42e4e5a418
Add GetPage request splitting
2025-07-03 18:31:12 +02:00
Heikki Linnakangas
96a817fa2b
Fix the case that storage auth token is _not_ used
I broke that in the previous commit while fixing the case of using a token.
2025-07-03 18:39:06 +03:00
Heikki Linnakangas
e7b057f2e8
Fix passing storage JWT token to the communicator process
Makes the 'test_compute_auth_to_pageserver' test pass
2025-07-03 18:14:22 +03:00
Heikki Linnakangas
956c2f4378
cargo fmt
2025-07-03 16:16:42 +03:00
Heikki Linnakangas
3293e4685e
Fix cases where pageserver gets stuck waiting for LSN
The compute might make a request with an LSN that it hasn't even
flushed yet.
2025-07-03 16:14:45 +03:00
Erik Grinaker
6f8650782f
Client tweaks
2025-07-03 14:54:23 +02:00
Erik Grinaker
14214eb853
Add client shard routing
2025-07-03 14:42:35 +02:00
Erik Grinaker
d4b4724921
Sanity-check Pageserver URLs
2025-07-03 14:18:14 +02:00
Erik Grinaker
9aba9550dd
Instrument client methods
2025-07-03 14:11:53 +02:00
Erik Grinaker
375e8e5592
Improve retries and logging
2025-07-03 14:02:43 +02:00
Erik Grinaker
52c586f678
Restructure shard management
2025-07-03 11:51:19 +02:00
Erik Grinaker
de97b73d6e
Lint fixes
2025-07-03 10:38:14 +02:00
Heikki Linnakangas
d8556616c9
Fix running Postgres in "vanilla mode", without neon storage
Some tests do that
2025-07-03 00:32:40 +03:00
Heikki Linnakangas
d8296e60e6
Fix caching of newly extended pages
This fixes read errors, e.g. in the test_compute_catalog.py test (and
probably many others).
2025-07-02 23:21:42 +03:00
Heikki Linnakangas
7263d6e2e5
Clarify error message if not_modified_lsn > request_lsn
I'm seeing this error from some Python tests, which of course means
there's a bug on the compute side, but it took me a while to figure
that out.
2025-07-02 23:21:42 +03:00
David Freifeld
86fb7b966a
Update integrated_cache.rs to use new hashmap API
2025-07-02 12:18:37 -07:00
David Freifeld
0c099b0944
Merge branch 'quantumish/lfc-resizable-map' into quantumish/comm-lfc-integration
2025-07-02 12:05:24 -07:00
David Freifeld
2fe27f510d
Make neon-shmem tests thread-safe and report errno in panics
2025-07-02 11:57:49 -07:00
David Freifeld
19b5618578
Switch to neon_shmem::sync lock_api and integrate into hashmap
2025-07-02 11:44:38 -07:00
Erik Grinaker
12dade35fa
Comment tweaks
2025-07-02 14:47:27 +02:00
Erik Grinaker
1ec63bd6bc
Misc pool improvements
2025-07-02 14:42:06 +02:00
Heikki Linnakangas
7012b4aa90
Remove --grpc options from neon_local endpoint reconfigure and start calls
They don't exist in neon_local anymore, and aren't actually used in
tests either.
2025-07-02 15:10:18 +03:00
Heikki Linnakangas
2cc28c75be
Fix "ERROR: could not read size of rel ..." in many regression tests.
We were incorrectly skipping the call to communicator_new_rel_create(),
which resulted in an error during index build, when the btree build code
tried to check the size of the newly-created relation.
2025-07-02 14:10:11 +03:00
Erik Grinaker
bf01145ae4
Remove some old code
2025-07-02 11:46:54 +02:00
Erik Grinaker
8ab8fc11a3
Use new PageserverClient
2025-07-02 11:27:56 +02:00
Erik Grinaker
6f0af96a54
Add new PageserverClient
2025-07-02 10:59:40 +02:00
Heikki Linnakangas
9913d2668a
print retried pageserver requests to log
Not sure how verbose we want this to be in production, but for now,
more is better.
This shows that many tests are failing with errors like these:
PG:2025-07-01 23:02:34.311 GMT [1456523] LOG: [COMMUNICATOR] send_process_get_rel_size_request: got error status: NotFound, message: "Read error", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Tue, 01 Jul 2025 23:02:34 GMT"} }, retrying
I haven't debugged why that is yet. Did the compute make a bogus request?
2025-07-02 02:04:04 +03:00
Heikki Linnakangas
2fefece77d
temporary hack to make regression tests fail faster
2025-07-02 01:42:39 +03:00
Heikki Linnakangas
471191e64e
Fix updating relsize cache during WAL replay
This makes some of the test_runner/regress/test_hot_standby.py tests
pass. (Others are still failing.)
2025-07-01 21:22:04 +03:00
Erik Grinaker
f6761760a2
Documentation and tweaks
2025-07-01 17:54:41 +02:00