rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-07 22:20:36 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	2fefece77d	temporary hack to make regression tests fail faster	2025-07-02 01:42:39 +03:00
Heikki Linnakangas	471191e64e	Fix updating relsize cache during WAL replay This makes some of the test_runner/regress/test_hot_standby.py tests pass, (Others are still failing..)	2025-07-01 21:22:04 +03:00
Heikki Linnakangas	175c2e11e3	Add assertions that the legacy relsize cache is not used with new communicator And fix a few cases where it was being called	2025-07-01 16:44:25 +03:00
Heikki Linnakangas	efdb07e7b6	Implement function to check if page is in local cache This is needed for read replicas. There's one more TODO that needs to be implemented before read replicas work though, in neon_extend_rel_size()	2025-07-01 16:22:51 +03:00
Heikki Linnakangas	b0970b415c	Don't call legacy lfc function when new communicator is used	2025-07-01 15:47:26 +03:00
Erik Grinaker	c3cb1ab98d	Merge branch 'main' into communicator-rewrite	2025-06-30 21:07:01 +02:00
Erik Grinaker	a5b0fc560c	Fix/allow remaining clippy lints	2025-06-30 12:36:20 +02:00
Erik Grinaker	67b04f8ab3	Fix a bunch of linter warnings	2025-06-30 11:10:02 +02:00
Heikki Linnakangas	97a8f4ef85	Handle unexpected EOF while doing an LFC read more gracefully There's a bug somewhere because this happens in python regression tests. We need to hunt that down, but in any case, let's not get stuck in an infinite loop if it happens.	2025-06-30 00:59:53 +03:00
Heikki Linnakangas	39f31957e3	Handle pageserver response with different number of pages gracefully Some tests are hitting this case, where pageserver returns 0 page images in the response to a GetPage request. I suspect it's because the code doesn't handle sharding correctly? In any case, let's not panic on it, but return an IO error to the originating backend.	2025-06-29 23:44:28 +03:00
Heikki Linnakangas	f3ba201800	Run `cargo fmt`	2025-06-29 21:21:07 +03:00
Heikki Linnakangas	a352d290eb	Plumb through both libpq and grpc connection strings to the compute Add a new 'pageserver_connection_info' field in the compute spec. It replaces the old 'pageserver_connstring' field with a more complicated struct that includes both libpq and grpc URLs, for each shard (or only one of the URLs, depending on the configuration). It also includes a flag suggesting which one to use; compute_ctl now uses it to decide which protocol to use for the basebackup. This is compatible with everything that's in production, because the control plane never used the 'pageserver_connstring' field. That was added a long time ago with the idea that it would replace the code that digs the 'neon.pageserver_connstring' GUC from the list of Postgres settings, but we never got around to doing that in the control plane. Hence, it was only used with neon_local. But the plan now is to pass the 'pageserver_connection_info' from the control plane, and once that's fully deployed everywhere, the code to parse 'neon.pageserver_connstring' in compute_ctl can be removed. The 'grpc' flag on an endpoint in endpoint config is now more of a suggestion. Compute_ctl gets both URLs, so it can choose to use libpq or grpc as it wishes. It currently always obeys the 'prefer_grpc' flag that's part of the connection info though. Postgres however uses grpc iff the new rust-based communicator is enabled. TODO/plan for the control plane: - Start to pass `pageserver_connection_info` in the spec file. - Also keep the current `neon.pageserver_connstring` setting for now, for backwards compatibility with old computes After that, the `pageserver_connection_info.prefer_grpc` flag in the spec file can be used to control whether compute_ctl uses grpc or libpq. The actual compute's grpc usage will be controlled by the `neon.enable_new_communicator` GUC. It can be set separately from 'prefer_grpc'. 
Later: - Once all old computes are gone, remove the code to pass `neon.pageserver_connstring`	2025-06-29 18:16:49 +03:00
Heikki Linnakangas	8c122a1c98	Don't call into the old LFC when using the new communicator This fixes errors like `index "pg_class_relname_nsp_index" contains unexpected zero page at block 2` when running the python tests. smgrzeroextend() still called into the old LFC's lfc_write() function, even when using the new communicator, which zeroed some arbitrary pages in the LFC file, overwriting pages belonging to the new LFC implementation managed by `integrated_cache.rs`	2025-06-29 17:40:46 +03:00
Erik Grinaker	e3ecdfbecc	pgxn/neon: actually use UNAME_S	2025-06-26 12:38:44 +02:00
Erik Grinaker	d08e553835	pgxn/neon: fix `callback_get_request_lsn_unsafe` return type	2025-06-26 12:33:59 +02:00
Erik Grinaker	7fffb5b4df	pgxn/neon: fix macOS build	2025-06-26 12:33:39 +02:00
Konstantin Knizhnik	be23eae3b6	Mark pages as available in LFC only after generation check (#12350) ## Problem If LFC generation is changed then `lfc_readv_select` will return -1 but pages are still marked as available in bitmap. ## Summary of changes Update bitmap after generation check. Co-authored-by: Konstantin Knizhnik <konstantin.knizhnik@databricks.com>	2025-06-26 07:06:27 +00:00
Heikki Linnakangas	e90be06d46	silence a few compiler warnings about unnecessary 'mut's and 'use's	2025-06-23 18:16:54 +03:00
Heikki Linnakangas	356ba67607	Merge remote-tracking branch 'origin/main' into HEAD I also included build script changes from https://github.com/neondatabase/neon/pull/12266, which is not yet merged but will be soon.	2025-06-23 17:46:30 +03:00
Heikki Linnakangas	3d822dbbde	Refactor Makefile rules for building the extensions under pgxn/ (#12305 )	2025-06-22 19:43:14 +00:00
Heikki Linnakangas	1847f4de54	Add missing #include. Got a warning on macos without this	2025-06-18 17:26:20 +03:00
Mikhail	e95f2f9a67	compute_ctl: return LSN in /terminate (#12240) - Add optional `?mode=fast|immediate` to `/terminate`, `fast` is default. Immediate avoids waiting 30 seconds before returning from `terminate`. - Add `TerminateMode` to `ComputeStatus::TerminationPending` - Use `/terminate?mode=immediate` in `neon_local` instead of `pg_ctl stop` for `test_replica_promotes`. - Change `test_replica_promotes` to check returned LSN - Annotate `finish_sync_safekeepers` as `noreturn`. https://github.com/neondatabase/cloud/issues/29807	2025-06-18 12:25:19 +00:00
Mikhail	7d4f662fbf	upgrade default neon version to 1.6 (#12185 ) Changes for 1.6 were merged and deployed two months ago https://github.com/neondatabase/neon/blob/main/pgxn/neon/neon--1.6--1.5.sql. In order to deploy https://github.com/neondatabase/neon/pull/12183, we need 1.6 to be default, otherwise we can't use prewarm API on read-only replica (`ALTER EXTENSION` won't work) and we need it for promotion	2025-06-17 17:46:35 +00:00
Konstantin Knizhnik	dfa055f4be	Support event trigger for Neon users (#10624) ## Problem https://github.com/neondatabase/neon/issues/7570 Event triggers are supported only for superusers. ## Summary of changes Temporarily switch to superuser when an event trigger is created and disable execution of user's event triggers under superuser. --------- Co-authored-by: Dimitri Fontaine <dim@tapoueh.org> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-06-17 15:44:50 +00:00
Konstantin Knizhnik	24038033bf	Remove default from DROP FUNCTION (#12202) ## Problem DROP FUNCTION doesn't allow specifying defaults for parameters. ## Summary of changes Remove DEFAULT clause from pgxn/neon/neon--1.6--1.5.sql Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-06-11 13:16:58 +00:00
Konstantin Knizhnik	21949137ed	Return last ring index instead of min_ring_index in prefetch_register_bufferv (#12039) ## Problem See https://github.com/neondatabase/neon/issues/12018 Now `prefetch_register_bufferv` calculates min_ring_index of all vector requests. But because of pump prefetch state or connection failure, previous slots can be already processed and reused. ## Summary of changes Instead of returning the minimal index, this function should return the last slot index. Actually the result of this function is used only in two places. The first place is just for checking (and this check is redundant because the same check is done in `prefetch_register_bufferv` itself). And in the second place, where the index of the filled slot is actually used, there is just one request. So fortunately this bug can cause only an assert failure in debug builds. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-06-10 10:09:46 +00:00
Dmitrii Kovalkov	73be6bb736	fix(compute): use proper safekeeper in VotesCollectedMset (#12175 ) ## Problem `VotesCollectedMset` uses the wrong safekeeper to update truncateLsn. This led to some failed assert later in the code during running safekeeper migration tests. - Relates to https://github.com/neondatabase/neon/issues/11823 ## Summary of changes Use proper safekeeper to update truncateLsn in VotesCollectedMset	2025-06-10 07:16:42 +00:00
Elizabeth Murray	7140a50225	Minor changes to get integration tests to run for communicator.	2025-06-06 04:32:51 +02:00
Elizabeth Murray	68f18ccacf	Request Tracker Prototype Does not include splitting requests across shards.	2025-06-05 13:32:18 -07:00
Heikki Linnakangas	786888d93f	Instead of a fixed TCP port for metrics, listen on a unix domain socket That avoids clashes if you run two computes at the same time. More secure too. We might want to have a TCP port in the long run, but this is less trouble for now. To see the metrics with curl you can use: curl --unix-socket .neon/endpoints/ep-main/pgdata/.metrics.socket http://localhost/metrics	2025-06-05 21:28:11 +03:00
Heikki Linnakangas	255537dda1	avoid hitting assertion failure in MarkPostmasterChildWalSender()	2025-06-05 20:08:32 +03:00
Erik Grinaker	28a61741b3	Mangle gRPC connstrings to use port 51051	2025-06-05 17:46:58 +02:00
Erik Grinaker	2fb6164bf8	Misc build fixes	2025-06-05 17:22:11 +02:00
Erik Grinaker	95838056da	Fix `RelTag` fields	2025-06-05 17:13:51 +02:00
Erik Grinaker	6d451654f1	Remove generated communicator_bindings.h	2025-06-05 17:12:13 +02:00
Erik Grinaker	37c58522a2	Merge branch 'main' into communicator-rewrite	2025-06-05 15:08:05 +02:00
Konstantin Knizhnik	9c6c780201	Replica promote (#12090) ## Problem This PR is part of larger computes support activity: https://www.notion.so/neondatabase/Larger-computes-114f189e00478080ba01e8651ab7da90 Epic: https://github.com/neondatabase/cloud/issues/19010 In case of planned node restart, we are going to 1. create new read-only replica 2. capture LFC state at primary 3. use this state to prewarm replica 4. stop old primary 5. promote replica to primary Steps 1-3 are currently implemented and supported on the compute side. This PR provides compute level implementation of replica promotion. Support replica promotion ## Summary of changes Right now replica promotion is done in three steps: 1. Set safekeepers list (now it is empty for replica) 2. Call `pg_promote()` to promote replica 3. Update endpoint setting so that it is no longer treated as replica. Maybe all three of these steps should be done by some function in compute_ctl. But right now this logic is only implemented in the test. Postgres submodules PRs: https://github.com/neondatabase/postgres/pull/648 https://github.com/neondatabase/postgres/pull/649 https://github.com/neondatabase/postgres/pull/650 https://github.com/neondatabase/postgres/pull/651 --------- Co-authored-by: Matthias van de Meent <matthias@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-06-05 11:27:14 +00:00
Konstantin Knizhnik	6123fe2d5e	Add query execution time histogram (#10050) ## Problem It will be useful to understand what kind of queries our clients are executing. And one of the most important characteristics of a query is query execution time - at least it allows to distinguish OLAP and OLTP queries. Also monitoring query execution time can help to detect problems with performance (assuming that workload is more or less stable). ## Summary of changes Add query execution time histogram. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-06-05 11:23:39 +00:00
Arpad Müller	dae203ef69	pgxn: support generations in safekeepers_cmp (#12129 ) `safekeepers_cmp` was added by #8840 to make changes of the safekeeper set order independent: a `sk1,sk2,sk3` specifier changed to `sk3,sk1,sk2` should not cause a walproposer restart. However, this check didn't support generations, in the sense that it would see the `g#123:` as part of the first safekeeper in the list, and if the first safekeeper changes, it would also restart the walproposer. Therefore, parse the generation properly and make it not be a part of the generation. This PR doesn't add a specific test, but I have confirmed locally that `test_safekeepers_reconfigure_reorder` is fixed with the changes of PR #11712 applied thanks to this PR. Part of https://github.com/neondatabase/neon/issues/11670	2025-06-04 23:02:31 +00:00
Erik Grinaker	b36f880710	Fix Linux build failures	2025-06-03 13:37:56 +02:00
Erik Grinaker	745b750f33	Merge branch 'main' into communicator-rewrite	2025-06-03 13:29:45 +02:00
Heikki Linnakangas	f06bb2bbd8	Implement growing the hash table. Fix unit tests.	2025-05-29 15:54:55 +03:00
Heikki Linnakangas	b3c25418a6	Add metrics to track memory usage of the rust communicator	2025-05-29 02:14:01 +03:00
Heikki Linnakangas	33549bad1d	use separate hash tables for relsize cache and block mappings	2025-05-28 23:57:55 +03:00
Heikki Linnakangas	009168d711	Add placeholder shmem hashmap implementation Use that instead of the half-baked Adaptive Radix Tree implementation. ART would probably be better in the long run, but more complicated to implement.	2025-05-28 11:08:35 +03:00
Tristan Partin	fe1513ca57	Add neon.safekeeper_conninfo_options GUC (#11901 ) In order to enable TLS connections between computes and safekeepers, we need to provide the control plane with a way to configure the various libpq keyword parameters, sslmode and sslrootcert. neon.safekeepers is a comma separated list of safekeepers formatted as host:port, so isn't available for extension in the same way that neon.pageserver_connstring is. This could be remedied in a future PR. Part-of: https://github.com/neondatabase/cloud/issues/25823 Link: https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-05-27 02:21:24 +00:00
Heikki Linnakangas	bb28109ffa	Merge remote-tracking branch 'origin/main' into communicator-rewrite-with-integrated-cache There were conflicts because of the differences in the page_api protocol that was merged to main vs what was on the branch. I adapted the code for the protocol in main.	2025-05-26 11:52:32 +03:00
Konstantin Knizhnik	d5023f2b89	Restrict pump prefetch state only to regular backends (#12000) ## Problem See https://github.com/neondatabase/neon/issues/11997 This guard prevents race condition with pump prefetch state (initiated by timeout). Assert checks that prefetching is also done under guard. But prewarm knows nothing about it. ## Summary of changes Pump prefetch state only in regular backends. Prewarming is done by background workers now. Also it seems to make no sense to pump prefetch state in any other background workers: parallel executors, vacuum,... because they are short-lived and cannot leave unconsumed responses in the socket. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-05-23 08:48:06 +00:00
Konstantin Knizhnik	81c557d87e	Unlogged build get smgr (#11954 ) ## Problem See https://github.com/neondatabase/neon/issues/11910 and https://neondb.slack.com/archives/C04DGM6SMTM/p1747314649059129 ## Summary of changes Do not change persistence in `start_unlogged_build` Postgres PRs: https://github.com/neondatabase/postgres/pull/642 https://github.com/neondatabase/postgres/pull/641 https://github.com/neondatabase/postgres/pull/640 https://github.com/neondatabase/postgres/pull/639 --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-05-18 05:02:47 +00:00
Konstantin Knizhnik	8e05639dbf	Invalidate LFC after unlogged build (#11951 ) ## Problem See https://neondb.slack.com/archives/C04DGM6SMTM/p1747391617951239 LFC is not always properly updated during unlogged build so it can contain stale content. ## Summary of changes Invalidate LFC content at the end of unlogged build Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-05-17 19:06:59 +00:00

423 Commits