Pass back a suitable 'errno' from the communicator process to the
originating backend in all cases. Usually it's just EIO because we
don't have a good way to map from tonic StatusCodes to libc error
numbers. That's probably good enough; from the original backend's
perspective all errors are IO errors.
In the C code, set the libc errno variable before calling ereport(), so
that errcode_for_file_access() works. Once we do that, we can also
replace the pg_strerror() calls with %m.
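For illustration, a minimal Rust sketch of that mapping on the
communicator side; the function name is made up, and the `tonic` and
`libc` crates are assumed:

```
/// Map a tonic::Status to a libc errno value to pass back to the
/// originating backend. We have no good mapping from gRPC status codes
/// to errno values, so everything becomes EIO: from the backend's
/// perspective, all of these are I/O errors.
fn status_to_errno(_status: &tonic::Status) -> libc::c_int {
    libc::EIO
}
```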
Commit 1dce2a9e74 changed how the `neon.pageserver_connstring` setting
is formed, but it broke the handling of the `neon.stripe_size` setting
so that it was set twice. That got mixed up during development of the
patch, as commit 7fef4435c1 landed first and was merged incorrectly.
Clap automatically uses doc comments as help/about texts. Doc comments
are strictly better, since they're also used e.g. for IDE documentation,
and are better formatted.
This patch updates all `neon_local` commands to use doc comments
(courtesy of GPT-o3).
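As an example of what clap does with doc comments (the command and
field below are made up, not actual `neon_local` commands):

```
use clap::Parser;

/// Start a Postgres endpoint.
///
/// With clap's derive API, this doc comment becomes the command's
/// about/help text automatically; no separate `about = "..."` needed.
#[derive(Parser)]
struct StartEndpointCmd {
    /// The endpoint to start (this doc comment becomes the argument's help).
    endpoint_id: String,
}
```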
## Problem
I discovered two bugs related to safekeeper migration, which together
might lead to data loss during the migration. The second bug comes from
a hadron patch and might also lead to data loss during safekeeper
restore in hadron.
1. `switch_membership` returns the current `term` instead of
`last_log_term`. The returned value is used to choose the
`sync_position` in the algorithm, so we might choose the wrong position
and break the correctness guarantees.
2. The current `term` is used to choose the most advanced SK in
`pull_timeline`, with higher priority than `flush_lsn`. This is
incorrect because the most advanced safekeeper is the one with the
highest `(last_log_term, flush_lsn)` pair. The compute might bump the
term on the least advanced sk, making it the best choice to pull from,
and thus making committed log entries "uncommitted" after
`pull_timeline`.
Part of https://databricks.atlassian.net/browse/LKB-1017
## Summary of changes
- Return `last_log_term` in `switch_membership`
- Use `(last_log_term, flush_lsn)` as the primary key for choosing the
most advanced sk in `pull_timeline` (see the sketch below), and deny
pulling if the `max_term` is higher than on the most advanced sk
(hadron only)
- Write tests for both cases
- Retry `sync_safekeepers` in `compute_ctl`
- Take the quorum size into account when calculating `sync_position`
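The selection rule from the second bullet, sketched in Rust (types and
names are illustrative, not the actual safekeeper code):

```
/// Per-safekeeper position info, as reported by its timeline status.
struct SkStatus {
    last_log_term: u64,
    flush_lsn: u64,
}

/// Pick the most advanced safekeeper to pull from: the one with the
/// highest (last_log_term, flush_lsn) pair. The current `term` must not
/// be used here, because a bumped term on a lagging safekeeper says
/// nothing about how much committed WAL it actually has.
fn most_advanced(statuses: &[SkStatus]) -> Option<&SkStatus> {
    statuses
        .iter()
        .max_by_key(|s| (s.last_log_term, s.flush_lsn))
}
```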
## Problem
The deletion logic had become difficult to understand and maintain.
## Summary of changes
- Added an RFC detailing proposed improvements to all deletion-related
APIs.
---------
Co-authored-by: Aleksandr Sarantsev <aleksandr.sarantsev@databricks.com>
This adds a new request type between backend and communicator, to make
a getpage request at a given LSN, bypassing the LFC. Only used by the
get_raw_page_at_lsn() debugging/testing function.
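Conceptually, the new request carries something like the following
(the names are hypothetical; this is not the actual protocol
definition):

```
/// Hypothetical shape of the new backend -> communicator request: fetch
/// one page at an explicit LSN, bypassing the LFC. Only the
/// get_raw_page_at_lsn() debugging/testing function uses it.
struct GetPageAtLsnRequest {
    /// Relation identified by (spcOid, dbOid, relNumber, forkNumber).
    rel: (u32, u32, u32, u8),
    /// Block number within the relation.
    block_no: u32,
    /// Read the page as of this exact LSN instead of the latest.
    lsn: u64,
}
```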
Add a DELETE /lfc/prewarm route that cancels an ongoing prewarm,
update the API spec, and add a Cancelled prewarm state.
Add an offload Cancelled state for when the LFC is not initialized.
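A rough sketch of the state involved (the type and variants here are
illustrative and may not match the actual compute_ctl definitions):

```
/// Possible states of an LFC prewarm, including the new Cancelled state
/// set when a DELETE /lfc/prewarm request interrupts an ongoing prewarm.
enum LfcPrewarmState {
    NotStarted,
    Prewarming,
    Completed,
    Cancelled,
    Failed { error: String },
}
```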
These lines make up a significant fraction of the total log size of the
regression tests, and they are not very interesting: the value is
always 'localhost' in local tests.
## Problem
In #12268, we added Pageserver gRPC addresses to the storage controller.
However, we didn't actually persist these in the database.
## Summary of changes
Update the database with the new gRPC address on reattach.
Today we don't have any indications (other than spammy logs in PG that
nobody monitors) if the Walproposer in PG cannot connect to/get votes
from all Safekeepers. This means we don't have signals indicating that
the Safekeepers are operating at degraded redundancy. We need these
signals.
Added plumbing in the PG extension so that the `neon_perf_counters` view
exports the following gauge metrics on safekeeper health:
- `num_configured_safekeepers`: The total number of safekeepers
configured in PG.
- `num_active_safekeepers`: The number of safekeepers that PG is
actively streaming WAL to.
An alert should be raised whenever `num_active_safekeepers` <
`num_configured_safekeepers`.
The metrics are implemented by adding state to the Walproposer shared
memory that tracks each safekeeper's active status in a simple array.
A safekeeper's status is set to active (1) after the Walproposer
acquires a quorum and starts streaming WAL to it, and set back to
inactive (0) when the connection to that safekeeper is shut down. When
collecting the metrics, we scan the status array in Walproposer shared
memory to produce the values for the gauges.
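The real bookkeeping lives in the C walproposer code; the Rust sketch
below only restates the logic, with made-up names and a made-up array
bound:

```
const MAX_SAFEKEEPERS: usize = 32; // illustrative bound, not the real one

/// Shared-memory-style bookkeeping: one slot per configured safekeeper,
/// 1 = actively streaming WAL, 0 = not connected / connection shut down.
struct WalproposerSkStatus {
    num_configured: usize,
    active: [u8; MAX_SAFEKEEPERS],
}

/// Produce the two gauges exported via the neon_perf_counters view.
fn safekeeper_gauges(s: &WalproposerSkStatus) -> (u64, u64) {
    let num_configured = s.num_configured as u64;
    let num_active = s.active[..s.num_configured]
        .iter()
        .filter(|&&v| v == 1)
        .count() as u64;
    // An alert should fire whenever num_active < num_configured.
    (num_configured, num_active)
}
```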
Added coverage for the metrics to integration test
`test_wal_acceptor.py::test_timeline_disk_usage_limit`.
---------
Co-authored-by: William Huang <william.huang@databricks.com>
## Problem
```
postgres=> select neon.prewarm_local_cache('\xfcfcfcfc01000000ffffffff070000000000000000000000000000000000000000000000000000000000000000000000000000ff', 1);
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
FATAL: server conn crashed?
```
The function takes a bytea argument and casts it to a C struct, without
validating the contents.
## Summary of changes
Added validation of the number of pages to be prefetched, and of the
chunks as well.
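The fix itself is in the C extension; the sketch below only illustrates
the kind of bounds checking needed, with made-up sizes and field layout:

```
/// Illustrative bounds check before interpreting an untrusted byte blob
/// as a prewarm state: verify the declared page count actually fits in
/// the payload instead of trusting it blindly.
fn validate_prewarm_payload(payload: &[u8], max_pages: usize) -> Result<(), String> {
    const HEADER_LEN: usize = 8; // hypothetical fixed-size header
    const ENTRY_LEN: usize = 4;  // hypothetical per-page entry size
    if payload.len() < HEADER_LEN {
        return Err("payload too short for header".into());
    }
    let n_pages = u32::from_le_bytes(payload[0..4].try_into().unwrap()) as usize;
    if n_pages > max_pages {
        return Err(format!("too many pages requested: {n_pages}"));
    }
    if payload.len() < HEADER_LEN + n_pages * ENTRY_LEN {
        return Err("payload shorter than declared number of entries".into());
    }
    Ok(())
}
```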
## Problem
When compiled with the rest_broker feature and run with
is_rest_broker=true (but is_auth_broker=false), accept_jwts is set to
false.
## Summary of changes
Set the config with:
```
accept_jwts: args.is_auth_broker || args.is_rest_broker
```
Co-authored-by: Ruslan Talpa <ruslan.talpa@databricks.com>
Add a new 'pageserver_connection_info' field in the compute spec. It
replaces the old 'pageserver_connstring' field with a more complicated
struct that includes both libpq and grpc URLs for each shard (or only
one of the URLs, depending on the configuration). It also includes a
flag suggesting which one to use; compute_ctl now uses it to decide
which protocol to use for the basebackup.
This is backwards-compatible with everything that's in production. If
the control plane fills in `pageserver_connection_info`, compute_ctl
uses that. If it fills in the
`pageserver_connstring`/`shard_stripe_size` fields, it uses those. As a
last resort, it uses the 'neon.pageserver_connstring' GUC from the list
of Postgres settings.
The 'grpc' flag in the endpoint config is now more of a suggestion, and
it's used to populate the 'prefer_protocol' flag in the compute spec.
Regardless of the flag, compute_ctl gets both URLs, so it can choose to
use libpq or grpc as it wishes. It currently always obeys the flag to
choose which method to use for getting the basebackup, but Postgres
itself will always use the libpq protocol. (That will be changed with
the new rust-based communicator project, which implements the gRPC
client in the compute).
After that, the `pageserver_connection_info.prefer_protocol` flag in the
spec file can be used to control whether compute_ctl uses grpc or libpq.
The compute's actual grpc usage will be controlled by the
`neon.enable_new_communicator` GUC (not yet; that will be introduced in
the future, with the new rust-based communicator project). It can be set
separately from 'prefer_protocol'.
Later:
- Once all old computes are gone, remove the code to pass
`neon.pageserver_connstring`
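Roughly, the new field carries something like the following (a sketch
only: apart from `pageserver_connection_info` and `prefer_protocol`,
which are mentioned above, the names and types are illustrative):

```
/// Sketch of the per-tenant connection info in the compute spec.
struct PageserverConnectionInfo {
    /// Which protocol compute_ctl should prefer, e.g. for the basebackup.
    /// Postgres itself still always uses libpq until the new rust-based
    /// communicator lands.
    prefer_protocol: PageserverProtocol,
    /// One entry per shard. Either URL may be absent, depending on how
    /// the pageserver is configured.
    shards: Vec<ShardConnectionInfo>,
    stripe_size: Option<u32>,
}

enum PageserverProtocol {
    Libpq,
    Grpc,
}

struct ShardConnectionInfo {
    libpq_url: Option<String>,
    grpc_url: Option<String>,
}
```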
… walproposer (#895)
Data corruptions are typically detected on the pageserver side when it
replays WAL records. However, since PS doesn't synchronously replay WAL
records as they are being ingested through safekeepers, we need some
extra plumbing to feed information about pageserver-detected corruptions
during compaction (and/or WAL redo in general) back to SK and PG for
proper action.
We don't yet know what actions PG/SK should take upon receiving the
signal, but we should have the detection and feedback in place.
Add an extra `corruption_detected` field to the `PageserverFeedback`
message that is sent from PS -> SK -> PG. It's a boolean value that is
set to true when PS detects a "critical error" that signals data
corruption, and it's sent in all `PageserverFeedback` messages. Upon
receiving this signal, the safekeeper raises a
`safekeeper_ps_corruption_detected` gauge metric (value set to 1). The
safekeeper then forwards this signal to PG where a
`ps_corruption_detected` gauge metric (value also set to 1) is raised in
the `neon_perf_counters` view.
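On the wire this is just one extra boolean in the feedback message,
roughly like this (only `corruption_detected` is taken from the
description above; the rest is elided or illustrative):

```
/// Sketch of the feedback message sent PS -> SK -> PG. Only the new
/// field is shown; the existing fields are abbreviated.
struct PageserverFeedback {
    // ... existing fields: timeline size, last-received /
    // disk-consistent / remote-consistent LSNs, reply timestamp ...
    /// Set to true once the pageserver has hit a "critical error"
    /// indicating data corruption; included in every feedback message.
    corruption_detected: bool,
}
```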
Added an integration test in
`test_compaction.py::test_ps_corruption_detection_feedback` that
confirms that the safekeeper and PG can receive the data corruption
signal in the `PageserverFeedback` message during a simulated data
corruption.
---------
Co-authored-by: William Huang <william.huang@databricks.com>
Add a `SplitError` for `GetPageSplitter`, with an `Into<tonic::Status>`
implementation. This avoids a bunch of boilerplate to convert
`GetPageSplitter` errors into `tonic::Status`.
Requires #12702.
Touches [LKB-191](https://databricks.atlassian.net/browse/LKB-191).
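A minimal sketch of the pattern (the error variants are made up;
implementing `From` gives the `Into<tonic::Status>` conversion for
free):

```
use tonic::{Code, Status};

/// Errors that can occur while splitting a GetPage batch request
/// across shards.
#[derive(Debug)]
enum SplitError {
    /// The request references a key that no known shard owns.
    UnknownShard(String),
    /// The batch was malformed, e.g. empty or mixing incompatible keys.
    InvalidRequest(String),
}

/// Converting into tonic::Status in one place removes the per-call-site
/// boilerplate in the gRPC handlers.
impl From<SplitError> for Status {
    fn from(err: SplitError) -> Status {
        match err {
            SplitError::UnknownShard(msg) => Status::new(Code::NotFound, msg),
            SplitError::InvalidRequest(msg) => Status::new(Code::InvalidArgument, msg),
        }
    }
}
```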
## Problem
We don't have visibility into data/index corruption.
## Summary of changes
Add data/index corruptions metrics.
PG emits these corruption errors via elog/ereport at ERROR level with
the corresponding errcode.
PG Changes: https://github.com/neondatabase/postgres/pull/698
## Problem
During shard splits, each parent shard is split and removed
incrementally. Only when all parent shards have split is the split
committed and the compute notified. This can take several minutes for
large tenants. In the meantime, the compute will be sending requests to
the (now-removed) parent shards.
This was (mostly) not a problem for the libpq protocol, because it does
shard routing on the server side. The compute just sends requests to
some Pageserver, and the server will figure out which local shard should
serve it.
It is a problem for the gRPC protocol, where the client explicitly says
which shard it's talking to.
Touches [LKB-191](https://databricks.atlassian.net/browse/LKB-191).
Requires #12772.
## Summary of changes
* Add server-side routing of gRPC requests to any local child shards if
the parent does not exist.
* Add server-side splitting of GetPage batch requests straddling
multiple child shards.
* Move the `GetPageSplitter` into `pageserver_page_api`.
I really don't like this approach, but it avoids making changes to the
split protocol. I could be convinced we should change the split protocol
instead, e.g. to keep the parent shard alive until the split commits and
the compute has been notified, but we can also do that as a later change
without blocking the communicator on it.
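For illustration, a rough sketch of the routing decision described
above; the types, the key-to-shard mapping, and the helper names are
all made up, not the actual pageserver code:

```
use std::collections::HashMap;

type ShardId = u32; // illustrative stand-in for the real shard identity
type Key = u64;     // illustrative stand-in for a page key

/// Illustrative ownership check: which shard a key maps to after the split.
fn owns_key(shard: ShardId, shard_count: u32, key: Key) -> bool {
    (key % shard_count as u64) as u32 == shard
}

/// Sketch: if the addressed (parent) shard no longer exists locally,
/// split the batch across the local child shards that now own the keys.
fn route_request(
    requested: ShardId,
    local_shards: &[ShardId],
    shard_count: u32,
    keys: &[Key],
) -> Result<Vec<(ShardId, Vec<Key>)>, String> {
    if local_shards.contains(&requested) {
        // Normal case: the addressed shard is still here.
        return Ok(vec![(requested, keys.to_vec())]);
    }
    // Parent removed mid-split: group keys by the child shard that owns them.
    let mut per_child: HashMap<ShardId, Vec<Key>> = HashMap::new();
    for &key in keys {
        let child = local_shards
            .iter()
            .copied()
            .find(|&s| owns_key(s, shard_count, key))
            .ok_or_else(|| format!("no local shard owns key {key}"))?;
        per_child.entry(child).or_default().push(key);
    }
    Ok(per_child.into_iter().collect())
}
```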