rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-26 09:30:37 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	a79fd3bda7	Move logic for picking request slot to the C code With this refactoring, the Rust code deals with one giant array of requests, and doesn't know that it's sliced up per backend process. The C code is now responsible for slicing it up. This also adds code to complete old IOs at backend start that were started and left behind by a previous session. That was a little more straightforward to do with the refactoring, which is why I tackled it now.	2025-07-07 12:59:08 +03:00
Heikki Linnakangas	e1b58d5d69	Don't segfault if one of the unimplemented functions is called We'll need to implement these, but let's stop the crashing for now	2025-07-07 11:33:44 +03:00
Erik Grinaker	9ae004f3bc	Rename ShardMap to ShardSpec	2025-07-06 19:13:59 +02:00
Erik Grinaker	4b06b547c1	pageserver/client_grpc: add shard map updates	2025-07-06 13:27:17 +02:00
Heikki Linnakangas	74e0d85a04	fix: Don't lose track of in-progress request if query is cancelled	2025-07-06 13:04:03 +03:00
Heikki Linnakangas	e14bb4be39	Merge remote-tracking branch 'origin/main' into communicator-rewrite	2025-07-05 16:59:51 +03:00
Heikki Linnakangas	f3a6c0d8ff	cargo fmt	2025-07-05 16:26:24 +03:00
Heikki Linnakangas	d6ec1f1a1c	Skip legacy LFC initialization when communicator is used It clashes with the initialization of the LFC file	2025-07-05 16:26:24 +03:00
Heikki Linnakangas	b568189f7b	Build dummy libcommunicator into the 'neon' extension (#12266) This doesn't do anything interesting yet, but demonstrates linking Rust code to the neon Postgres extension, so that we can review and test drive just the build process changes independently.	2025-07-04 23:27:28 +00:00
Heikki Linnakangas	4c916552e8	Reduce logging noise These are very useful while debugging, but also very noisy; let's dial it down a little.	2025-07-04 23:11:36 +03:00
Heikki Linnakangas	00affada26	Add request ID to all communicator log lines as context information	2025-07-04 20:34:26 +03:00
Heikki Linnakangas	90d3c09c24	Minor cleanup Tidy up and add some comments. Rename a few things for clarity.	2025-07-04 20:32:59 +03:00
Heikki Linnakangas	6c398aeae7	Fix dependency in Makefile	2025-07-04 20:24:21 +03:00
Heikki Linnakangas	1856bbbb9f	Minor cleanup and commenting	2025-07-04 18:28:34 +03:00
Heikki Linnakangas	bd46dd60a0	Add a temporary timeout to handling an IO request in the communicator It's nicer to timeout in the communicator and return an error to the backend, than PANIC the backend.	2025-07-04 16:08:22 +03:00
Heikki Linnakangas	5f2d476a58	Add request ID to io-in-progress locking table, to ease debugging I also added INFO messages for when a backend blocks on the io-in-progress lock. It's probably too noisy for production, but useful now to get a picture of how much it happens.	2025-07-04 15:55:57 +03:00
Heikki Linnakangas	3231cb6138	Await the io-in-progress locking futures Otherwise they don't do anything. Oops.	2025-07-04 15:55:57 +03:00
Heikki Linnakangas	e558e0da5c	Assign request_id earlier, in the originating backend Makes it more useful for stitching together logs etc. for a specific request.	2025-07-04 15:55:55 +03:00
Heikki Linnakangas	70bf2e088d	Request multiple block numbers in a single GetPageV request That's how it was always intended to be used	2025-07-04 15:49:04 +03:00
Konstantin Knizhnik	436a117c15	Do not allocate anything in subtransaction memory context (#12176) ## Problem See https://github.com/neondatabase/neon/issues/12173 ## Summary of changes Allocate table in TopTransactionMemoryContext --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-07-04 10:24:39 +00:00
Heikki Linnakangas	da3f9ee72d	cargo fmt	2025-07-04 12:39:41 +03:00
David Freifeld	794bb7a9e8	Merge branch 'quantumish/comm-lfc-integration' into communicator-rewrite	2025-07-03 10:52:29 -07:00
Konstantin Knizhnik	495112ca50	Add GUC for dynamically enabling compare local mode (#12424) ## Problem DEBUG_LOCAL_COMPARE mode allows detecting data corruption. But it requires rebuilding the neon extension (and so requires a special image) and significantly slows down execution because it always fetches pages from the page server. ## Summary of changes Introduce new GUC `neon.debug_compare_local`, accepting the following values: "none", "prefetch", "lfc", "all" (by default it is disabled). In modes less than "all", neon SMGR will not fetch a page from PS if it is found in local caches. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-07-03 17:37:05 +00:00
Heikki Linnakangas	96a817fa2b	Fix the case that storage auth token is _not_ used I broke that in previous commit while fixing the case of using a token.	2025-07-03 18:39:06 +03:00
Heikki Linnakangas	e7b057f2e8	Fix passing storage JWT token to the communicator process Makes the 'test_compute_auth_to_pageserver' test pass	2025-07-03 18:14:22 +03:00
Heikki Linnakangas	956c2f4378	cargo fmt	2025-07-03 16:16:42 +03:00
Heikki Linnakangas	3293e4685e	Fix cases where pageserver gets stuck waiting for LSN The compute might make a request with an LSN that it hasn't even flushed yet.	2025-07-03 16:14:45 +03:00
Erik Grinaker	14214eb853	Add client shard routing	2025-07-03 14:42:35 +02:00
Erik Grinaker	de97b73d6e	Lint fixes	2025-07-03 10:38:14 +02:00
Heikki Linnakangas	d8556616c9	Fix running Postgres in "vanilla mode", without neon storage Some tests do that	2025-07-03 00:32:40 +03:00
Heikki Linnakangas	d8296e60e6	Fix caching of newly extended pages This fixes read errors e.g. in test_compute_catalog.py test (and probably many others).	2025-07-02 23:21:42 +03:00
David Freifeld	86fb7b966a	Update `integrated_cache.rs` to use new hashmap API	2025-07-02 12:18:37 -07:00
Heikki Linnakangas	2cc28c75be	Fix "ERROR: could not read size of rel ..." in many regression tests. We were incorrectly skipping the call to communicator_new_rel_create(), which resulted in an error during index build, when the btree build code tried to check the size of the newly-created relation.	2025-07-02 14:10:11 +03:00
Erik Grinaker	bf01145ae4	Remove some old code	2025-07-02 11:46:54 +02:00
Erik Grinaker	8ab8fc11a3	Use new `PageserverClient`	2025-07-02 11:27:56 +02:00
Heikki Linnakangas	2fefece77d	temporary hack to make regression tests fail faster	2025-07-02 01:42:39 +03:00
Heikki Linnakangas	471191e64e	Fix updating relsize cache during WAL replay This makes some of the test_runner/regress/test_hot_standby.py tests pass. (Others are still failing.)	2025-07-01 21:22:04 +03:00
Heikki Linnakangas	175c2e11e3	Add assertions that the legacy relsize cache is not used with new communicator And fix a few cases where it was being called	2025-07-01 16:44:25 +03:00
Heikki Linnakangas	efdb07e7b6	Implement function to check if page is in local cache This is needed for read replicas. There's one more TODO that needs to be implemented before read replicas work though, in neon_extend_rel_size()	2025-07-01 16:22:51 +03:00
Heikki Linnakangas	b0970b415c	Don't call legacy lfc function when new communicator is used	2025-07-01 15:47:26 +03:00
Erik Grinaker	c3cb1ab98d	Merge branch 'main' into communicator-rewrite	2025-06-30 21:07:01 +02:00
Erik Grinaker	a5b0fc560c	Fix/allow remaining clippy lints	2025-06-30 12:36:20 +02:00
Erik Grinaker	67b04f8ab3	Fix a bunch of linter warnings	2025-06-30 11:10:02 +02:00
Heikki Linnakangas	97a8f4ef85	Handle unexpected EOF while doing an LFC read more gracefully There's a bug somewhere because this happens in python regression tests. We need to hunt that down, but in any case, let's not get stuck in an infinite loop if it happens.	2025-06-30 00:59:53 +03:00
Heikki Linnakangas	39f31957e3	Handle pageserver response with different number of pages gracefully Some tests are hitting this case, where pageserver returns 0 page images in the response to a GetPage request. I suspect it's because the code doesn't handle sharding correctly? In any case, let's not panic on it, but return an IO error to the originating backend.	2025-06-29 23:44:28 +03:00
Heikki Linnakangas	f3ba201800	Run `cargo fmt`	2025-06-29 21:21:07 +03:00
Heikki Linnakangas	a352d290eb	Plumb through both libpq and grpc connection strings to the compute Add a new 'pageserver_connection_info' field in the compute spec. It replaces the old 'pageserver_connstring' field with a more complicated struct that includes both libpq and grpc URLs, for each shard (or only one of the URLs, depending on the configuration). It also includes a flag suggesting which one to use; compute_ctl now uses it to decide which protocol to use for the basebackup. This is compatible with everything that's in production, because the control plane never used the 'pageserver_connstring' field. That was added a long time ago with the idea that it would replace the code that digs the 'neon.pageserver_connstring' GUC from the list of Postgres settings, but we never got around to doing that in the control plane. Hence, it was only used with neon_local. But the plan now is to pass the 'pageserver_connection_info' from the control plane, and once that's fully deployed everywhere, the code to parse 'neon.pageserver_connstring' in compute_ctl can be removed. The 'grpc' flag on an endpoint in endpoint config is now more of a suggestion. Compute_ctl gets both URLs, so it can choose to use libpq or grpc as it wishes. It currently always obeys the 'prefer_grpc' flag that's part of the connection info though. Postgres however uses grpc iff the new rust-based communicator is enabled. TODO/plan for the control plane: - Start to pass `pageserver_connection_info` in the spec file. - Also keep the current `neon.pageserver_connstring` setting for now, for backwards compatibility with old computes After that, the `pageserver_connection_info.prefer_grpc` flag in the spec file can be used to control whether compute_ctl uses grpc or libpq. The actual compute's grpc usage will be controlled by the `neon.enable_new_communicator` GUC. It can be set separately from 'prefer_grpc'. 
Later: - Once all old computes are gone, remove the code to pass `neon.pageserver_connstring`	2025-06-29 18:16:49 +03:00
Heikki Linnakangas	8c122a1c98	Don't call into the old LFC when using the new communicator This fixes errors like `index "pg_class_relname_nsp_index" contains unexpected zero page at block 2` when running the python tests. smgrzeroextend() still called into the old LFC's lfc_write() function, even when using the new communicator, which zeroed some arbitrary pages in the LFC file, overwriting pages managed by the new LFC implementation in `integrated_cache.rs`	2025-06-29 17:40:46 +03:00
David Freifeld	78b6da270b	Sketchily integrate hashmap rewrite with `integrated_cache`	2025-06-26 16:45:48 -07:00
David Freifeld	4713715c59	Merge branch 'communicator-rewrite' of github.com:neondatabase/neon into communicator-rewrite	2025-06-26 10:26:41 -07:00

464 Commits