rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-06 05:30:38 +00:00

Author	SHA1	Message	Date
Arseny Sher	db97d3f140	fix unit tests	2025-02-04 11:22:35 +01:00
Arseny Sher	42059b027e	convert appendresponse	2025-02-04 10:11:27 +01:00
Arseny Sher	9aad617d76	convert appendrequest	2025-02-04 10:11:27 +01:00
Arseny Sher	443228e6b7	convert ProposerElected	2025-02-04 10:11:27 +01:00
Arseny Sher	50c93da736	remove propelected.timeline_start_lsn usages in sk	2025-02-04 10:11:27 +01:00
Arseny Sher	8fb781f908	Convert VoteResponse, remove timeline_start_lsn.	2025-02-04 10:11:27 +01:00
Arseny Sher	d5065339aa	convert voterequest	2025-02-04 10:11:27 +01:00
Arseny Sher	9ecc22e355	move greeting init v2 to PAMessageSerialize	2025-02-04 10:11:27 +01:00
Arseny Sher	874f4ff0d9	wp: print mconf	2025-02-04 10:11:27 +01:00
Arseny Sher	ec745e4d08	convert acceptorgreeting	2025-02-04 10:11:27 +01:00
Arseny Sher	8683462157	rename sk members to m for brevity	2025-02-04 10:11:27 +01:00
Arseny Sher	7961f969a3	infra for v3 acceptor -> proposer msgs	2025-02-04 10:11:27 +01:00
Arseny Sher	98e98ac650	convert ProposerGreeting	2025-02-04 10:11:27 +01:00
Arseny Sher	efdddc062b	Add two safekeeper proto versions to test_normal_work.	2025-02-04 10:11:27 +01:00
Arseny Sher	5e9c22f1b6	Pass proto selection to safekeeper, prepare parsing v3.	2025-02-04 10:11:27 +01:00
Arseny Sher	9630b5c965	Add PAMessageSerialize and ProposerAcceptorGreeting v3	2025-02-04 10:11:27 +01:00
Arseny Sher	0c386b272a	pgindent wp code	2025-02-04 10:11:27 +01:00
Alex Chi Z.	e219d48bfe	refactor(pageserver): clearify compaction return value (#10643 ) ## Problem ## Summary of changes Make the return value of the set of compaction functions less confusing. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-03 21:56:55 +00:00
Alex Chi Z.	c1be84197e	feat(pageserver): preempt image layer generation if L0 piles up (#10572 ) ## Problem Image layer generation could block L0 compactions for a long time. ## Summary of changes * Refactored the return value of `create_image_layers_for_` functions to make it self-explainable. Preempt image layer generation in `Try` mode if L0 piles up. Note that we might potentially run into a state that only the beginning part of the keyspace gets image coverage. In that case, we either need to implement something to prioritize some keyspaces with image coverage, or tune the image_creation_threshold to ensure that the frequency of image creation could keep up with L0 compaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Erik Grinaker <erik@neon.tech>	2025-02-03 20:55:47 +00:00
dependabot[bot]	d80cbb2443	build(deps): bump openssl from 0.10.66 to 0.10.70 in /test_runner/pg_clients/rust/tokio-postgres in the cargo group across 1 directory (#10642 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-02-03 19:42:40 +00:00
Erik Grinaker	06b45fd0fd	utils/logging: add `critical!` macro and metric (#10641 ) ## Problem We don't currently have good alerts for critical errors, e.g. data loss/corruption. Touches #10094. ## Summary of changes Add a `critical!` macro and corresponding `libmetrics_tracing_event_count{level="critical"}` metric. This will: * Emit an `ERROR` log message with prefix `"CRITICAL:"` and a backtrace. * Increment `libmetrics_tracing_event_count{level="critical"}`, and indirectly `level="error"`. * Trigger a pageable alert (via the metric above). * In debug builds, panic the process. I'll add uses of the macro separately.	2025-02-03 19:23:12 +00:00
John Spray	715e20343a	storage controller: improve scheduling of tenants created in PlacementPolicy::Secondary (#10590 ) ## Problem I noticed when onboarding lots of tenants that the AZ scheduling violation stat was climbing, before falling later as optimisations happened. This was happening because we first add the tenant with PlacementPolicy::Secondary, and then later go to PlacementPolicy::Attached, and the scheduler's behavior led to a bad AZ choice: 1. Create a secondary location in the non-preferred AZ 2. Upgrade to Attached where we promote that non-preferred-AZ location to attached and then create another secondary 3. Optimiser later realises we're in the wrong AZ and moves us ## Summary of changes - Extend some logging to give more information about AZs - When scheduling secondary location in PlacementPolicy::Secondary, select it as if we were attached: in this mode, our business goal is to have a warm pageserver location that we can make available as attached quickly if needed, therefore we want it to be in the preferred AZ. - Make optimize_secondary logic the same, so that it will consider a secondary location in the preferred AZ to be optimal when in PlacementPolicy::Secondary - When transitioning from PlacementPolicy::Attached(N) to PlacementPolicy::Secondary, instead of arbitrarily picking a location to keep, prefer to keep the location in the preferred AZ	2025-02-03 19:01:16 +00:00
Arpad Müller	c774f0a147	storcon db: allow accepting any TLS certificate (#10640 ) We encountered some TLS validation errors for the storcon since applying #10614. Add an option to downgrade them to logged errors instead to allow us to debug with more peace. cc issue https://github.com/neondatabase/cloud/issues/23583	2025-02-03 18:21:01 +00:00
Folke Behrens	628a9616c4	fix(proxy): Don't use --is-private-access-proxy to disable IP check (#10633 ) ## Problem * The behavior of this flag changed. Plus, it's not necessary to disable the IP check as long as there are no IPs listed in the local postgres. ## Summary of changes * Drop the flag from the command in the README.md section. * Change the postgres URL passed to proxy to not use the endpoint hostname. * Also swap postgres creation and proxy startup, so the DB is running when proxy comes up.	2025-02-03 14:12:41 +00:00
Alexander Bayandin	43682624b5	CI(pg-clients): fix logical replication tests (#10623 ) ## Problem Tests for logical replication (on Staging) have been failing for some time because logical replication is not enabled for them. This issue occurred after switching to an org API key with a different default setting, where logical replication was not enabled by default. ## Summary of changes - Add `enable_logical_replication` input to `actions/neon-project-create` - Enable logical replication in `test-logical-replication` job	2025-02-03 13:41:41 +00:00
Em Sharnoff	e617a3a075	vm-monitor: Improve error display (#10542 ) Logging errors with the debug format specifier causes multi-line errors, which are sometimes a pain to deal with. Instead, we should use anyhow's alternate display format, which shows the same information on a single line. Also adjusted a couple of error messages that were stale. Fixes neondatabase/cloud#14710.	2025-02-03 13:34:11 +00:00
Fedor Dikarev	23ca8b061b	Use actions/checkout for checkout (#10630 ) ## Problem 1. First of all it's more correct 2. Current usage allows ` Time-of-Check-Time-of-Use (TOCTOU) 'Pwn Request' vulnerabilities`. Please check security slack channel or reach me for more details. I will update PR description after merge. ## Summary of changes 1. Use `actions/checkout` with `ref: ${{ github.event.pull_request.head.sha }}` Discovered by and Co-author: @varunsh-coder	2025-02-03 12:55:48 +00:00
Anastasia Lubennikova	b1bc33eb4d	Fix logical_replication_sync test fixture (#10531 ) Fixes flaky test_lr_with_slow_safekeeper test #10242 Fix query to `pg_catalog.pg_stat_subscription` catalog to handle table synchronization and parallel LR correctly.	2025-02-03 12:44:47 +00:00
OBBO67	b1e451091a	pageserver: clean up references to timeline delete marker, uninit marker (#5718 ) (#10627 ) ## Problem Since [#5580](https://github.com/neondatabase/neon/pull/5580) the delete and uninit file markers are no longer needed. ## Summary of changes Remove the remaining code for the delete and uninit markers. Additionally removes the `ends_with_suffix` function as it is no longer required. Closes [#5718](https://github.com/neondatabase/neon/issues/5718).	2025-02-03 11:54:07 +00:00
Arpad Müller	87ad50c925	storcon: use diesel-async again, now with tls support (#10614 ) Successor of #10280 after it was reverted in #10592. Re-introduce the usage of diesel-async again, but now also add TLS support so that we connect to the storcon database using TLS. By default, diesel-async doesn't support TLS, so add some code to make us explicitly request TLS. cc https://github.com/neondatabase/cloud/issues/23583	2025-02-03 11:53:51 +00:00
Alexander Bayandin	89b9f74077	CI(pre-merge-checks): do not run `conclusion` job for PRs (#10619 ) ## Problem While working on https://github.com/neondatabase/neon/pull/10617 I (unintentionally) merged the PR before the main CI pipeline has finished. I suspect this happens because we have received all the required job results from the pre-merge-checks workflow, which runs on PRs that include changes to relevant files. ## Summary of changes - Skip the `conclusion` job in `pre-merge-checks` workflows for PRs	2025-02-03 09:40:12 +00:00
John Spray	f071800979	tests: stabilize shard locations earlier in test_scrubber_tenant_snapshot (#10606 ) ## Problem This test would sometimes emit unexpected logs from the storage controller's requests to do migrations, which overlap with the test's restarts of pageservers, where those migrations are happening some time after a shard split as the controller moves load around. Example: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10602/13067323736/index.html#testresult/f66f1329557a1fc5/retries ## Summary of changes - Do a reconcile_until_idle after shard split, so that the rest of the test doesn't run concurrently with migrations	2025-02-03 09:02:21 +00:00
Peter Bendel	4dfe60e2ad	revert https://github.com/neondatabase/neon/pull/10616 (#10631 ) ## Problem https://github.com/neondatabase/neon/pull/10616 was only intended temporarily during the weekend, want to reset to prior state ## Summary of changes revert https://github.com/neondatabase/neon/pull/10616 but keep fixes in https://github.com/neondatabase/neon/pull/10622	2025-02-03 09:00:23 +00:00
Arpad Müller	8ae6f656a6	Don't require partial backup semaphore capacity for deletions (#10628 ) In the safekeeper, we block deletions on the timeline's gate closing, and any `WalResidentTimeline` keeps the gate open (because it owns a gate lock object). Thus, unless the `main_task` function of a partial backup doesn't return, we can't delete the associated timeline. In order to make these tasks exit early, we call the cancellation token of the timeline upon its shutdown. However, the partial backup task wasn't looking for the cancellation while waiting to acquire a partial backup permit. On a staging safekeeper we have been in a situation in the past where the semaphore was already empty for a duration of many hours, rendering all attempted deletions unable to proceed until a restart where the semaphore was reset: https://neondb.slack.com/archives/C03H1K0PGKH/p1738416586442029	2025-02-03 04:11:06 +00:00
Peter Bendel	b9e1a67246	fix generate matrix for olap for saturdays (#10622 ) ## Problem when introducing pg17 for job step `Generate matrix for OLAP benchmarks` I introduced a syntax error that only hits on Saturdays. ## Summary of changes Remove trailing comma ## successful test run https://github.com/neondatabase/neon/actions/runs/13086363907	2025-02-01 11:09:45 +00:00
Folke Behrens	6318828c63	Update rust to 1.84.1 (#10618 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. [Release notes](https://releases.rs/docs/1.84.1/). Prior update was in https://github.com/neondatabase/neon/pull/10328. Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2025-01-31 20:52:17 +00:00
Stefan Radig	6dd48ba148	feat(proxy): Implement access control with VPC endpoint checks and block for public internet / VPC (#10143 ) - Wired up filtering on VPC endpoints - Wired up block access from public internet / VPC depending on per project flag - Added cache invalidation for VPC endpoints (partially based on PR from Raphael) - Removed BackendIpAllowlist trait --------- Co-authored-by: Ivan Efremov <ivan@neon.tech>	2025-01-31 20:32:57 +00:00
Conrad Ludgate	ad1a41157a	feat(proxy): optimizing the chances of large write in copy_bidirectional (#10608 ) We forked copy_bidirectional to solve some issues like fast-shutdown (disallowing half-open connections) and to introduce better error tracking (which side of the conn closed down). A change recently made its way upstream offering performance improvements: https://github.com/tokio-rs/tokio/pull/6532. These seem applicable to our fork, thus it makes sense to apply them here as well.	2025-01-31 19:14:27 +00:00
Tristan Partin	fcd195c2b6	Migrate compute_ctl arg parsing to clap derive (#10497 ) The primary benefit is that all the ad hoc get_matches() calls are no longer necessary. Now all it takes to get at the CLI arguments is referencing a struct member. It's also great that we can replace the ad hoc CLI struct we had with this more formal solution. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-31 19:04:26 +00:00
Peter Bendel	bc7822d90c	temporarily disable some steps and run more often to expose more pgbench --initialize in benchmarking workflow (#10616 ) ## Problem we want to disable some steps in benchmarking workflow that do not initialize new projects and instead run the test more frequently Test run https://github.com/neondatabase/neon/actions/runs/13077737888	2025-01-31 18:41:17 +00:00
Alexander Bayandin	48c87dc458	CI(pre-merge-checks): fix condition (#10617 ) ## Problem Merge Queue fails if changes include Rust code. ## Summary of changes - Fix condition for `build-build-tools-image` - Add a couple of no-op `false ||` to make predicates look symmetric	2025-01-31 18:07:26 +00:00
John Spray	aedeb1c7c2	pageserver: revise logging of cancelled request results (#10604 ) ## Problem When a client dropped before a request completed, and a handler returned an ApiError, we would log that at error severity. That was excessive in the case of a request erroring on a shutdown, and could cause test flakes. example: https://neon-github-public-dev.s3.amazonaws.com/reports/main/13067651123/index.html#suites/ad9c266207b45eafe19909d1020dd987/6021ce86a0d72ae7/ ``` Cancelled request finished with an error: ShuttingDown ``` ## Summary of changes - Log a different info-level on ShuttingDown and ResourceUnavailable API errors from cancelled requests	2025-01-31 17:43:54 +00:00
John Spray	a93e9f22fc	pageserver: remove faulty debug assertion in compaction (#10610 ) ## Problem This assertion is incorrect: it is legal to see another shard's data at this point, after a shard split. Closes: https://github.com/neondatabase/neon/issues/10609 ## Summary of changes - Remove faulty assertion	2025-01-31 17:43:31 +00:00
JC Grünhage	10cf5e7a38	Move cargo-deny into a separate workflow on a schedule (#10289 ) ## Problem There are two (related) problems with the previous handling of `cargo-deny`: - When a new advisory is added to rustsec that affects a dependency, unrelated pull requests will fail. - New advisories rely on pushes or PRs to be surfaced. Problems that already exist on main will only be found if we try to merge new things into main. ## Summary of changes We split out `cargo-deny` into a separate workflow that runs on all PRs that touch `Cargo.lock`, and on a schedule on `main`, `release`, `release-compute` and `release-proxy` to find new advisories.	2025-01-31 13:42:59 +00:00
Arpad Müller	dce617fe07	Update to rebased rust-postgres (#10584 ) Update to a rebased version of our rust-postgres patches, rebased on commit `98f5a11bc0` this time. With #10280 reapplied, this means that the rust-postgres crates will be deduplicated, as the new crate versions are finally compatible with the requirements of diesel-async. Earlier update: #10561 rust-postgres PR: https://github.com/neondatabase/rust-postgres/pull/39	2025-01-31 12:40:20 +00:00
Alexander Bayandin	503bc72d31	CI: add `diesel print-schema` check (#10527 ) ## Problem We want to check that `diesel print-schema` doesn't generate any changes (`storage_controller/src/schema.rs`) in comparison with the list of migration. ## Summary of changes - Add `diesel_cli` to `build-tools` image - Add `Check diesel schema` step to `build-neon` job, at this stage we have all required binaries, so don't need to compile anything additionally - Check runs only on x86 release builds to be sure we do it at least once per CI run.	2025-01-31 11:48:46 +00:00
Fedor Dikarev	89cff08354	unify pg-build-nonroot-with-cargo base layer and config retries in curl (#10575 ) Ref: https://github.com/neondatabase/cloud/issues/23461 ## Problem Just made changes around and see these 2 base layers could be optimised. and after review comment from @myrrc setting up timeouts and retries in `alpine/curl` image ## Summary of changes	2025-01-31 11:46:33 +00:00
Erik Grinaker	afbcebe7f7	test_runner: force-compact in `test_sharding_autosplit` (#10605 ) ## Problem This test may not fully detect data corruption during splits, since we don't force-compact the entire keyspace. ## Summary of changes Force-compact all data in `test_sharding_autosplit`.	2025-01-31 11:31:58 +00:00
Arpad Müller	7d5c70c717	Update AWS SDK crates (#10588 ) We want to keep the AWS SDK up to date as that way we benefit from new developments and improvements. Prior update was in #10056	2025-01-31 11:23:12 +00:00
John Spray	f09cfd11cb	pageserver: exclude archived timelines from freeze+flush on shutdown (#10594 ) ## Problem If offloading races with normal shutdown, we get a "failed to freeze and flush: cannot flush frozen layers when flush_loop is not running, state is Exited". This is harmless but points to it being quite strange to try and freeze and flush such a timeline. flushing on shutdown for an archived timeline isn't useful. Related: https://github.com/neondatabase/neon/issues/10389 ## Summary of changes - During Timeline::shutdown, ignore ShutdownMode::FreezeAndFlush if the timeline is archived	2025-01-31 10:54:14 +00:00

7113 Commits