rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-04 20:50:40 +00:00

Author	SHA1	Message	Date
BodoBolero	d0d29468d8	one more segfault in handling error messages	2025-02-06 17:43:00 +01:00
BodoBolero	567a665dc4	tried patch locally	2025-02-06 13:52:02 +01:00
BodoBolero	b0007302d0	include patches in repo	2025-02-06 13:37:00 +01:00
BodoBolero	b35dd198c3	fix pgcopydb seg fault and -c idle_in_transaction_session_timeout=0	2025-02-06 11:27:29 +01:00
Arpad Müller	b66fbd6176	Warn on basebackups for archived timelines (#10688 ) We don't want any external requests for an archived timeline. This includes basebackup requests, i.e. when a compute is being started up. Therefore, we'd like to forbid such basebackup requests: any attempt to get a basebackup on an archived timeline (or any getpage request really) is a cplane bug. Make this a warning for now so that, if there is potentially a bug, we can detect cases in the wild before they cause stuck operations, but the intention is to return an error eventually. Related: #9548	2025-02-06 10:09:20 +00:00
Vlad Lazar	95588dab98	safekeeper: fix wal fan-out shard subscription data race (#10677 ) ## Problem [This select arm](https://github.com/neondatabase/neon/blob/main/safekeeper/src/send_interpreted_wal.rs#L414) runs when we want to attach a new reader to the current cursor. It checks the current position of the cursor and resets it if required. The current position of the cursor is updated in the [other select arm](https://github.com/neondatabase/neon/blob/main/safekeeper/src/send_interpreted_wal.rs#L336-L345). That runs when we get some WAL to send. Now, what happens if we want to attach two shards consecutively to the cursor? Let's say [this select arm](https://github.com/neondatabase/neon/blob/main/safekeeper/src/send_interpreted_wal.rs#L397) runs twice in a row. Let's assume cursor is currently at LSN X. First shard wants to attach at position V and the other one at W. Assume X > W > V. First shard resets the stream to position V. Second shard comes in, sees stale cursor position X and resets it to W. This means that the first shard doesn't get wal in the [V, W) range. ## Summary of changes Ultimately, this boils down to the current position not being kept in sync with the reset of the WAL stream. This patch fixes the race by updating it when resetting the WAL stream and adds a unit test repro. Closes https://github.com/neondatabase/cloud/issues/23750	2025-02-06 09:24:28 +00:00
Christian Schwarz	1686d9e733	perf(page_service): dont `.instrument(span.clone())` the response flush (#10686 ) On my AX102 Hetzner box, removing this line removes about 20us from the `latency_mean` result in `test_pageserver_characterize_latencies_with_1_client_and_throughput_with_many_clients_one_tenant`. If the same 20us can be removed in the nightly benchmark run, this will be a ~10% improvement because there, mean latencies are about ~220us. This span was added during batching refactors, we didn't have it before, and I don't think it's terribly useful. refs - https://github.com/neondatabase/cloud/issues/21759	2025-02-06 08:33:37 +00:00
Erik Grinaker	abcd00181c	pageserver: set a concurrency limit for LocalFS (#10676 ) ## Problem The local filesystem backend for remote storage doesn't set a concurrency limit. While it can't/won't enforce a concurrency limit itself, this also bounds the upload queue concurrency. Some tests create thousands of uploads, which slows down the quadratic scheduling of the upload queue, and there is no point spawning that many Tokio tasks. Resolves #10409. ## Summary of changes Set a concurrency limit of 100 for the LocalFS backend. Before: `test_layer_map[release-pg17].test_query: 68.338 s` After: `test_layer_map[release-pg17].test_query: 5.209 s`	2025-02-06 07:24:36 +00:00
Konstantin Knizhnik	01f0be03b5	Fix bugs in lfc_cache_containsv (#10682 ) ## Problem Incorrect manipulations with iteration index in `lfc_cache_containsv` ## Summary of changes ``` - int this_chunk = Min(nblocks, BLOCKS_PER_CHUNK - chunk_offs); + int this_chunk = Min(nblocks - i, BLOCKS_PER_CHUNK - chunk_offs); ``` ``` - if (i + 1 >= nblocks) + if (i >= nblocks) ``` Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-06 07:00:00 +00:00
Konstantin Knizhnik	81cd30e4d6	Use #ifdef instead of #if USE_ASSERT_CHECKING (#10683 ) ## Problem USE_ASSERT_CHECKING is defined as an empty entity, but it is checked using #if. ## Summary of changes Replace `#if USE_ASSERT_CHECKING` with `#ifdef USE_ASSERT_CHECKING`, as done in other places in Postgres. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-06 05:47:56 +00:00
Konstantin Knizhnik	7fc6953da4	Is neon superuser (#10625 ) ## Problem The is_neon_superuser() function is public in pg14/pg15 but statically defined in publicationcmd.c in pg16/pg17. ## Summary of changes Make this function public for all Postgres versions. It is intended to be used not only in publicationcmd.c. See https://github.com/neondatabase/postgres/pull/573 https://github.com/neondatabase/postgres/pull/576 --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2025-02-06 05:42:14 +00:00
Christian Schwarz	77f9e74d86	pgxn: include socket send & recv queue size in slow response logs (#10673 ) # Problem When we see an apparent slow request, one possible cause is that the client is failing to consume responses, but we don't have a clear way to see that. # Solution - Log the socket queue depths on slow/stuck connections, so that we have an indication of whether the compute is keeping up with processing the connection's responses. refs - slack https://neondb.slack.com/archives/C036U0GRMRB/p1738652644396329 - refs https://github.com/neondatabase/cloud/issues/23515 - refs https://github.com/neondatabase/cloud/issues/23486	2025-02-06 01:14:29 +00:00
Alex Chi Z.	0ceeec9be3	fix(pageserver): schedule compaction immediately if pending (#10684 ) ## Problem The code is intended to reschedule compaction immediately if there are pending tasks. Previously, we set the duration to 0 if there were pending tasks, but this goes through the `if period == Duration::ZERO {` branch and sleeps for another 10 seconds. ## Summary of changes Set duration to 1 so that it doesn't sleep for too long. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-05 22:11:50 +00:00
Alex Chi Z.	733a57247b	fix(pageserver): disallow gc-compaction produce l0 layer (#10679 ) ## Problem Any compaction should never produce l0 layers. This never happened in my experiments, but would be good to guard it early. ## Summary of changes Disallow gc-compaction to produce l0 layers. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-05 20:44:28 +00:00
Heikki Linnakangas	6699a30a49	Make it easy to build only a subset of extensions into compute image (#10655 ) The full build of all extensions takes a long time. When working locally on parts that don't need extensions, you can iterate more quickly by skipping the unnecessary extensions. This adds a build argument to the dockerfile to specify extensions to build. There are three options: - EXTENSIONS=all (default) - EXTENSIONS=minimal: Build only a few extensions that are listed in shared_preload_libraries in the default neon config. - EXTENSIONS=none: Build no extensions (except for the mandatory 'neon' extension).	2025-02-05 18:07:51 +00:00
Alex Chi Z.	133b89a83d	feat(pageserver): continue from last incomplete image layer creation (#10660 ) ## Problem close https://github.com/neondatabase/neon/issues/10651 ## Summary of changes * Image layer creation starts from the next partition of the last processed partition if the previous attempt was not complete. * Add tests. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-05 17:35:39 +00:00
Arseny Sher	fba22a7123	Record more timings in test_layer_map (#10670 ) ## Problem It is not very clear how much time different operations take. ## Summary of changes Record more timings. ref https://github.com/neondatabase/neon/issues/10409	2025-02-05 17:00:26 +00:00
John Spray	14e05276a3	storcon: fix a case where optimise could get stuck on unschedulable node (#10648 ) ## Problem When a shard has two secondary locations, but one of them is on a node with MaySchedule::No, the optimiser would get stuck, because it couldn't decide which secondary to remove. This is generally okay if a node is offline, but if a node is in Pause mode for a long period of time, it's a problem. Closes: https://github.com/neondatabase/neon/issues/10646 ## Summary of changes - Instead of insisting on finding a node in the wrong AZ to remove, find an available node in the _right_ AZ, and remove all the others. This ensures that if there is one live suitable node, then other offline/paused nodes cannot hold things up.	2025-02-05 16:05:12 +00:00
Tristan Partin	ebc55e6ae8	Fix logic for checking if a compute can install a remote extension (#10656 ) Given a remote extensions manifest of the following: ```json { "public_extensions": [], "custom_extensions": null, "library_index": { "pg_search": "pg_search" }, "extension_data": { "pg_search": { "control_data": { "pg_search.control": "comment = 'pg_search: Full text search for PostgreSQL using BM25'\ndefault_version = '0.14.1'\nmodule_pathname = '$libdir/pg_search'\nrelocatable = false\nsuperuser = true\nschema = paradedb\ntrusted = true\n" }, "archive_path": "13117844657/v14/extensions/pg_search.tar.zst" } } } ``` We were allowing a compute to install a remote extension that wasn't listed in either public_extensions or custom_extensions. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-02-05 14:58:33 +00:00
Erik Grinaker	f07119cca7	pageserver: add `pageserver_wal_ingest_values_committed` metric (#10653 ) ## Problem We don't have visibility into the ratio of image vs. delta pages ingested in Pageservers. This might be useful to determine whether we should compress WAL records before storing them, which in turn might make compaction more efficient. ## Summary of changes Add `pageserver_wal_ingest_values_committed` metric with dimensions `class=metadata\|data` and `kind=image\|delta`.	2025-02-05 14:33:04 +00:00
Vlad Lazar	47975d06d9	storcon: silence cplane 404s on tenant creation (#10665 ) ## Problem We get WARN log noise on tenant creations. Cplane creates tenants via /location_config. That returns the attached locations in the response and spawns a reconciliation which will also attempt to notify cplane. If the notification is attempted before cplane persists the shards to its database, storcon gets back a 404. The situation is harmless, but annoying. ## Summary of Changes * Add a tenant creation hint to the reconciler config * If the hint is true and we get back a 404 on the notification from cplane, ignore the error, but still queue the reconcile up for a retry. Closes https://github.com/neondatabase/cloud/issues/20732	2025-02-05 12:41:09 +00:00
Fedor Dikarev	472007dd7c	ci: unify Dockerfiles, set bash as SHELL for debian layers, make cpan step as separate RUN (#10645 ) ## Problem Ref: https://github.com/neondatabase/cloud/issues/23461 and follow-up after: https://github.com/neondatabase/neon/pull/10553 We used `echo` to set up `.wgetrc` and `.curlrc`, and there we used `\n` to make these multiline configs with one echo command. The problem is that Debian `/bin/sh`'s built-in echo command behaves differently from the `/bin/echo` executable and from the `echo` built-in in `bash`. Namely, it does not support the `-e` option, and while it does treat `\n` as a newline, passing `-e` here will add that `-e` to the output. At the same time, when we use different base images, for example `alpine/curl`, their `/bin/sh` supports and requires `-e` for treating escape sequences like `\n`. Having different `echo` implementations and having to remember the differences in their behaviour is a poor experience for developers and makes Dockerfiles harder to maintain. Work-arounds: - Explicitly use `/bin/bash` (like in this PR) - Use `/bin/echo` instead of the shell's built-in echo function - Use printf "foo\n" instead of echo -e "foo\n" ## Summary of changes 1. To fix that, we proceed with the option of setting `/bin/bash` as the SHELL for the debian-based layers. 2. No changes for `alpine/curl` based layers. 3. One more change here: the `extensions` layer is split into 2 steps, installing dependencies from `CPAN` and installing `lcov` from github, so upgrading `lcov` can reuse the previous layer with the installed cpan modules.	2025-02-04 18:58:02 +00:00
Vlad Lazar	f9009d6b80	pageserver: write heatmap to disk after uploading it (#10650 ) ## Problem We wish to make heatmap generation additive in https://github.com/neondatabase/neon/pull/10597. However, if the pageserver restarts and has a heatmap on disk from when it was a secondary long ago, we can end up keeping extra layers on the secondary's disk. ## Summary of changes Persist the heatmap after a successful upload.	2025-02-04 17:52:54 +00:00
Alex Chi Z.	cab60b6d9f	fix(pagesever): stablize gc-compaction tests (#10621 ) ## Problem Hopefully this can resolve https://github.com/neondatabase/neon/issues/10517. The test is flaky because, after a restart, the compute node might write some data, so the pageserver flushes some layers and eventually L0 compaction runs, and we cannot get the test scenario we want. ## Summary of changes Ensure all L0 layers are compacted before starting the test. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-04 16:11:31 +00:00
Erik Grinaker	06090bbccd	pageserver: log critical error on `ClearVmBits` for unknown pages (#10634 ) ## Problem In #9895, we fixed some issues where `ClearVmBits` were broadcast to all shards, even those not owning the VM relation. As part of that, we found some ancient code from #1417, which discarded spurious incorrect `ClearVmBits` records for pages outside of the VM relation. We added observability in #9911 to see how often this actually happens in the wild. After two months, we have not seen this happen once in production or staging. However, out of caution, we don't want a hard error and break WAL ingestion. Resolves #10067. ## Summary of changes Log a critical error when ingesting `ClearVmBits` for unknown VM relations or pages.	2025-02-04 14:55:11 +00:00
Folke Behrens	dcf335a251	proxy: Switch proxy to JSON logging (#9857 ) ## Problem We want to switch proxy and ideally all Rust services to structured JSON logging to support better filtering and cross-referencing with tracing. ## Summary of changes * Introduce a custom tracing-subscriber to write the JSON. In a first attempt a customized tracing::fmt::FmtSubscriber was used, but it's very inefficient and can still generate invalid JSON. It also doesn't allow us to add important fields to the root object. * Make this opt in: the `LOGFMT` env var can be set to `"json"` to enable the new logger at startup.	2025-02-04 14:50:53 +00:00
Arpad Müller	b6e9daea9a	storcon: only allow errrors of the server cert verification (#10644 ) This PR does a bunch of things: * only allow errors of the server cert verification, not of the TLS handshake. The TLS handshake doesn't cause any errors for us so we can just always require it to be valid. This simplifies the code a little. * As the solution is more permanent than originally anticipated, I think it makes sense to move the `AcceptAll` verifier outside. * log the connstr information. this helps with figuring out which domain names are configured in the connstr, etc. I think it is generally useful to print it. make extra sure that the password is not leaked. Follow-up of #10640	2025-02-04 14:01:57 +00:00
a-masterov	d5c3a4e2b9	Add support for pgjwt test (#10611 ) ## Problem We don't currently test pgjwt, while it is based on pg_prove and can be easily added ## Summary of changes The test for pgjwt was added.	2025-02-04 13:49:44 +00:00
Heikki Linnakangas	8107140f7f	Refactor compute dockerfile (#10371 ) Refactor how extensions are built in compute Dockerfile 1. Rename some of the extension layers, so that names correspond more precisely to the upstream repository name and the source directory name. For example, instead of "pg-jsonschema-pg-build", spell it "pg_jsonschema-build". Some of the layer names had the extra "pg-" part, and some didn't; harmonize on not having it. And use an underscore if the upstream project name uses an underscore. 2. Each extension now consists of two dockerfile targets: [extension]-src and [extension]-build. By convention, the -src target downloads the sources and applies any neon-specific patches if necessary. The source tarball is downloaded and extracted under /ext-src. For example, the 'pgvector' extension creates the following files and directory: /ext-src/pgvector.tar.gz # original tarball /ext-src/pgvector.patch # neon-specific patch, copied from patches/ dir /ext-src/pgvector-src/ # extracted tarball, with patch applied This separation avoids re-downloading the sources every time the extension is recompiled. The 'extension-tests' target also uses the [extension]-src layers, by copying the /ext-src/ dirs from all the extensions together into one image. This refactoring came about when I was experimenting with different ways of splitting up the Dockerfile so that each extension would be in a separate file. That's not part of this PR yet, but this is a good step in modularizing the extensions.	2025-02-04 10:35:43 +00:00
Alex Chi Z.	e219d48bfe	refactor(pageserver): clearify compaction return value (#10643 ) ## Problem ## Summary of changes Make the return value of the set of compaction functions less confusing. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-02-03 21:56:55 +00:00
Alex Chi Z.	c1be84197e	feat(pageserver): preempt image layer generation if L0 piles up (#10572 ) ## Problem Image layer generation could block L0 compactions for a long time. ## Summary of changes * Refactored the return value of `create_image_layers_for_` functions to make it self-explainable. Preempt image layer generation in `Try` mode if L0 piles up. Note that we might potentially run into a state that only the beginning part of the keyspace gets image coverage. In that case, we either need to implement something to prioritize some keyspaces with image coverage, or tune the image_creation_threshold to ensure that the frequency of image creation could keep up with L0 compaction. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Erik Grinaker <erik@neon.tech>	2025-02-03 20:55:47 +00:00
dependabot[bot]	d80cbb2443	build(deps): bump openssl from 0.10.66 to 0.10.70 in /test_runner/pg_clients/rust/tokio-postgres in the cargo group across 1 directory (#10642 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-02-03 19:42:40 +00:00
Erik Grinaker	06b45fd0fd	utils/logging: add `critical!` macro and metric (#10641 ) ## Problem We don't currently have good alerts for critical errors, e.g. data loss/corruption. Touches #10094. ## Summary of changes Add a `critical!` macro and corresponding `libmetrics_tracing_event_count{level="critical"}` metric. This will: * Emit an `ERROR` log message with prefix `"CRITICAL:"` and a backtrace. * Increment `libmetrics_tracing_event_count{level="critical"}`, and indirectly `level="error"`. * Trigger a pageable alert (via the metric above). * In debug builds, panic the process. I'll add uses of the macro separately.	2025-02-03 19:23:12 +00:00
John Spray	715e20343a	storage controller: improve scheduling of tenants created in PlacementPolicy::Secondary (#10590 ) ## Problem I noticed when onboarding lots of tenants that the AZ scheduling violation stat was climbing, before falling later as optimisations happened. This was happening because we first add the tenant with PlacementPolicy::Secondary, and then later go to PlacementPolicy::Attached, and the scheduler's behavior led to a bad AZ choice: 1. Create a secondary location in the non-preferred AZ 2. Upgrade to Attached where we promote that non-preferred-AZ location to attached and then create another secondary 3. Optimiser later realises we're in the wrong AZ and moves us ## Summary of changes - Extend some logging to give more information about AZs - When scheduling secondary location in PlacementPolicy::Secondary, select it as if we were attached: in this mode, our business goal is to have a warm pageserver location that we can make available as attached quickly if needed, therefore we want it to be in the preferred AZ. - Make optimize_secondary logic the same, so that it will consider a secondary location in the preferred AZ to be optimal when in PlacementPolicy::Secondary - When transitioning from PlacementPolicy::Attached(N) to PlacementPolicy::Secondary, instead of arbitrarily picking a location to keep, prefer to keep the location in the preferred AZ	2025-02-03 19:01:16 +00:00
Arpad Müller	c774f0a147	storcon db: allow accepting any TLS certificate (#10640 ) We encountered some TLS validation errors for the storcon since applying #10614. Add an option to downgrade them to logged errors instead to allow us to debug with more peace. cc issue https://github.com/neondatabase/cloud/issues/23583	2025-02-03 18:21:01 +00:00
Folke Behrens	628a9616c4	fix(proxy): Don't use --is-private-access-proxy to disable IP check (#10633 ) ## Problem * The behavior of this flag changed. Plus, it's not necessary to disable the IP check as long as there are no IPs listed in the local postgres. ## Summary of changes * Drop the flag from the command in the README.md section. * Change the postgres URL passed to proxy to not use the endpoint hostname. * Also swap postgres creation and proxy startup, so the DB is running when proxy comes up.	2025-02-03 14:12:41 +00:00
Alexander Bayandin	43682624b5	CI(pg-clients): fix logical replication tests (#10623 ) ## Problem Tests for logical replication (on Staging) have been failing for some time because logical replication is not enabled for them. This issue occurred after switching to an org API key with a different default setting, where logical replication was not enabled by default. ## Summary of changes - Add `enable_logical_replication` input to `actions/neon-project-create` - Enable logical replication in `test-logical-replication` job	2025-02-03 13:41:41 +00:00
Em Sharnoff	e617a3a075	vm-monitor: Improve error display (#10542 ) Logging errors with the debug format specifier causes multi-line errors, which are sometimes a pain to deal with. Instead, we should use anyhow's alternate display format, which shows the same information on a single line. Also adjusted a couple of error messages that were stale. Fixes neondatabase/cloud#14710.	2025-02-03 13:34:11 +00:00
Fedor Dikarev	23ca8b061b	Use actions/checkout for checkout (#10630 ) ## Problem 1. First of all it's more correct 2. Current usage allows ` Time-of-Check-Time-of-Use (TOCTOU) 'Pwn Request' vulnerabilities`. Please check security slack channel or reach me for more details. I will update PR description after merge. ## Summary of changes 1. Use `actions/checkout` with `ref: ${{ github.event.pull_request.head.sha }}` Discovered by and Co-author: @varunsh-coder	2025-02-03 12:55:48 +00:00
Anastasia Lubennikova	b1bc33eb4d	Fix logical_replication_sync test fixture (#10531 ) Fixes flaky test_lr_with_slow_safekeeper test #10242 Fix query to `pg_catalog.pg_stat_subscription` catalog to handle table synchronization and parallel LR correctly.	2025-02-03 12:44:47 +00:00
OBBO67	b1e451091a	pageserver: clean up references to timeline delete marker, uninit marker (#5718 ) (#10627 ) ## Problem Since [#5580](https://github.com/neondatabase/neon/pull/5580) the delete and uninit file markers are no longer needed. ## Summary of changes Remove the remaining code for the delete and uninit markers. Additionally removes the `ends_with_suffix` function as it is no longer required. Closes [#5718](https://github.com/neondatabase/neon/issues/5718).	2025-02-03 11:54:07 +00:00
Arpad Müller	87ad50c925	storcon: use diesel-async again, now with tls support (#10614 ) Successor of #10280 after it was reverted in #10592. Re-introduce the usage of diesel-async again, but now also add TLS support so that we connect to the storcon database using TLS. By default, diesel-async doesn't support TLS, so add some code to make us explicitly request TLS. cc https://github.com/neondatabase/cloud/issues/23583	2025-02-03 11:53:51 +00:00
Alexander Bayandin	89b9f74077	CI(pre-merge-checks): do not run `conclusion` job for PRs (#10619 ) ## Problem While working on https://github.com/neondatabase/neon/pull/10617 I (unintentionally) merged the PR before the main CI pipeline has finished. I suspect this happens because we have received all the required job results from the pre-merge-checks workflow, which runs on PRs that include changes to relevant files. ## Summary of changes - Skip the `conclusion` job in `pre-merge-checks` workflows for PRs	2025-02-03 09:40:12 +00:00
John Spray	f071800979	tests: stabilize shard locations earlier in test_scrubber_tenant_snapshot (#10606 ) ## Problem This test would sometimes emit unexpected logs from the storage controller's requests to do migrations, which overlap with the test's restarts of pageservers, where those migrations are happening some time after a shard split as the controller moves load around. Example: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10602/13067323736/index.html#testresult/f66f1329557a1fc5/retries ## Summary of changes - Do a reconcile_until_idle after shard split, so that the rest of the test doesn't run concurrently with migrations	2025-02-03 09:02:21 +00:00
Peter Bendel	4dfe60e2ad	revert https://github.com/neondatabase/neon/pull/10616 (#10631 ) ## Problem https://github.com/neondatabase/neon/pull/10616 was only intended as a temporary change for the weekend; we want to reset to the prior state. ## Summary of changes Revert https://github.com/neondatabase/neon/pull/10616 but keep the fixes in https://github.com/neondatabase/neon/pull/10622	2025-02-03 09:00:23 +00:00
Arpad Müller	8ae6f656a6	Don't require partial backup semaphore capacity for deletions (#10628 ) In the safekeeper, we block deletions on the timeline's gate closing, and any `WalResidentTimeline` keeps the gate open (because it owns a gate lock object). Thus, until the `main_task` function of a partial backup returns, we can't delete the associated timeline. In order to make these tasks exit early, we call the cancellation token of the timeline upon its shutdown. However, the partial backup task wasn't looking for the cancellation while waiting to acquire a partial backup permit. On a staging safekeeper we have been in a situation in the past where the semaphore was already empty for a duration of many hours, rendering all attempted deletions unable to proceed until a restart where the semaphore was reset: https://neondb.slack.com/archives/C03H1K0PGKH/p1738416586442029	2025-02-03 04:11:06 +00:00
Peter Bendel	b9e1a67246	fix generate matrix for olap for saturdays (#10622 ) ## Problem when introducing pg17 for job step `Generate matrix for OLAP benchmarks` I introduced a syntax error that only hits on Saturdays. ## Summary of changes Remove trailing comma ## successful test run https://github.com/neondatabase/neon/actions/runs/13086363907	2025-02-01 11:09:45 +00:00
Folke Behrens	6318828c63	Update rust to 1.84.1 (#10618 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. [Release notes](https://releases.rs/docs/1.84.1/). Prior update was in https://github.com/neondatabase/neon/pull/10328. Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2025-01-31 20:52:17 +00:00
Stefan Radig	6dd48ba148	feat(proxy): Implement access control with VPC endpoint checks and block for public internet / VPC (#10143 ) - Wired up filtering on VPC endpoints - Wired up block access from public internet / VPC depending on per project flag - Added cache invalidation for VPC endpoints (partially based on PR from Raphael) - Removed BackendIpAllowlist trait --------- Co-authored-by: Ivan Efremov <ivan@neon.tech>	2025-01-31 20:32:57 +00:00
Conrad Ludgate	ad1a41157a	feat(proxy): optimizing the chances of large write in copy_bidirectional (#10608 ) We forked copy_bidirectional to solve some issues like fast-shutdown (disallowing half-open connections) and to introduce better error tracking (which side of the conn closed down). A change recently made its way upstream offering performance improvements: https://github.com/tokio-rs/tokio/pull/6532. These seem applicable to our fork, thus it makes sense to apply them here as well.	2025-01-31 19:14:27 +00:00
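
The index fix described in commit 01f0be03b5 (`lfc_cache_containsv`) can be illustrated with a small Rust sketch of the same loop shape. This is not the C code from the repository: the function name `chunk_spans`, the tuple layout, and the value `BLOCKS_PER_CHUNK = 128` are assumptions chosen for illustration. It only shows why the per-chunk length has to be bounded by the blocks still remaining (`nblocks - i`) and why the loop ends at `i >= nblocks`.

```rust
// Assumed chunk size, for illustration only; the real constant may differ.
const BLOCKS_PER_CHUNK: u32 = 128;

/// Split the block range [start_block, start_block + nblocks) into
/// per-chunk spans of (chunk_number, offset_in_chunk, blocks_in_span).
/// Hypothetical helper, not part of the neon code base.
fn chunk_spans(start_block: u32, nblocks: u32) -> Vec<(u32, u32, u32)> {
    let mut spans = Vec::new();
    let mut i = 0;
    // Loop until every requested block is covered (i >= nblocks ends it).
    while i < nblocks {
        let blkno = start_block + i;
        let chunk_offs = blkno % BLOCKS_PER_CHUNK;
        // Bound by the blocks still remaining, not by the total request:
        // using `nblocks` here would overrun on the second and later chunks.
        let this_chunk = (nblocks - i).min(BLOCKS_PER_CHUNK - chunk_offs);
        spans.push((blkno / BLOCKS_PER_CHUNK, chunk_offs, this_chunk));
        i += this_chunk;
    }
    spans
}
```

Under these assumptions, a request for 20 blocks starting at block 120 is split into 8 blocks at the end of chunk 0 and 12 blocks at the start of chunk 1. With `nblocks` in place of `nblocks - i`, the second span would claim 20 blocks instead of 12.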

7125 Commits