## Problem
We currently don't run end-to-end tests for PostgreSQL extensions on our
cloud infrastructure, which means we might miss problems that only occur
in a real cloud environment.
## Summary of changes
- Added a workflow to run extension tests against a cloud staging
instance
- Set up proper project configuration for extension testing
- Implemented test execution with appropriate environment settings
- Added error handling and reporting for test failures
---------
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
## Problem
Provide an easy way to run particular test(s) N times on CI.
## Summary of changes
* Allow for passing the test selection and the number of test runs to
the existing "Build and Test Locally" workflow
* Allow for running multiple selected tests by the "Pytest regression
tests" step
* Introduce a new workflow to run specified test(s) several times
* Store results in a separate database to distinguish stability-testing runs from regular test runs
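For illustration, the dispatch interface for this could look roughly like the sketch below (the input names and defaults are made up here, not the actual workflow definition):
```yaml
on:
  workflow_dispatch:
    inputs:
      test-selection:
        description: 'pytest selection, e.g. a test file or file::test_name'
        required: true
        type: string
      num-runs:
        description: 'how many times to run the selected test(s)'
        required: false
        default: '10'
        type: string
```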
### Summary
I'm fixing one or more of the following CI/CD misconfigurations to
improve security. Please feel free to leave a comment if you think the
current permissions for the GITHUB_TOKEN should not be restricted so I
can take a note of it as accepted behaviour.
- Restrict permissions for GITHUB_TOKEN
- Add step-security/harden-runner
- Pin Actions to a full length commit SHA
### Security Fixes
will fix https://github.com/neondatabase/cloud/issues/26141
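As a rough sketch of what the three hardening measures listed above look like in a workflow (the job layout, action versions, and audit policy below are illustrative, not the repository's actual configuration):
```yaml
permissions:
  contents: read          # restrict the default GITHUB_TOKEN to read-only

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # audit (or block) outbound traffic from the runner
      - uses: step-security/harden-runner@v2   # in the real change, pinned to a full-length commit SHA
        with:
          egress-policy: audit
      - uses: actions/checkout@v4              # likewise pinned to a full-length commit SHA
```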
## Problem
- using Hetzner buckets for the cache requires secrets, so we would need `secrets: inherit` to make it work
- we don't have self-hosted macOS runners, so the GitHub native cache is actually the better option there
## Summary of changes
- switch to GH native cache for macos builds
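A minimal sketch of what the GitHub native cache looks like for a macOS build step (the cached paths and key below are illustrative, not the workflow's actual values):
```yaml
- uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/registry
      target
    key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
```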
# Problem
The Pageserver read path exclusively uses direct IO if
`virtual_file_io_mode=direct`.
The write path is half-finished. Here is what the various writing
components use:
|what|buffering|flags on <br/>`v_f_io_mode`<br/>=`buffered`|flags on <br/>`virtual_file_io_mode`<br/>=`direct`|
|-|-|-|-|
|`DeltaLayerWriter`| BlobWriter<BUFFERED=true> | () | () |
|`ImageLayerWriter`| BlobWriter<BUFFERED=false> | () | () |
|`download_layer_file`|BufferedWriter|()|()|
|`InMemoryLayer`|BufferedWriter|()|O_DIRECT|
The vehicle towards direct IO support is `BufferedWriter`, which
- largely takes care of O_DIRECT alignment & size-multiple requirements
- provides double-buffering to mask latency
`DeltaLayerWriter` and `ImageLayerWriter` use `blob_io::BlobWriter`, which has neither of these.
# Changes
## High-Level
At a high-level this PR makes the following primary changes:
- switch the two layer writer types to use `BufferedWriter` & make them sensitive to `virtual_file_io_mode` (via open_with_options_**v2**)
- make `download_layer_file` sensitive to `virtual_file_io_mode` (also via open_with_options_**v2**)
- add `virtual_file_io_mode=direct-rw` as a feature gate
  - we're hackishly piggybacking on OpenOptions's ask for write access here
  - this means that with just `=direct`, InMemoryLayer reads and writes no longer use O_DIRECT
  - this is transitory, and we'll remove the `direct-rw` variant once the rollout is complete
(The `_v2` APIs for opening / creating VirtualFile are those that are
sensitive to `virtual_file_io_mode`)
The result is:
|what|uses <br/>`BufferedWriter`|flags on <br/>`v_f_io_mode`<br/>=`buffered`|flags on <br/>`v_f_io_mode`<br/>=`direct`|flags on <br/>`v_f_io_mode`<br/>=`direct-rw`|
|-|-|-|-|-|
|`DeltaLayerWriter`| ~~Blob~~BufferedWriter | () | () | O_DIRECT |
|`ImageLayerWriter`| ~~Blob~~BufferedWriter | () | () | O_DIRECT |
|`download_layer_file`|BufferedWriter|()|()|O_DIRECT|
|`InMemoryLayer`|BufferedWriter|()|~~O_DIRECT~~()|O_DIRECT|
## Code-Level
The main change is:
- Switch `blob_io::BlobWriter` away from its own buffering method to use
`BufferedWriter`.
Additional prep for upholding `O_DIRECT` requirements:
- Layer writer `finish()` methods switched to use IoBufferMut for guaranteed buffer address alignment. The buffers are sized PAGE_SZ, which is implicitly assumed to satisfy the O_DIRECT size requirements.
For the hacky feature-gating via `=direct-rw`:
- Track `OpenOptions::write(true|false)` in a field; bunch of mechanical
churn.
- Consolidate the APIs in which we "open" or "create" VirtualFile, for a better overview of which parts of the code use the `_v2` APIs.
Necessary refactorings & infra work:
- Add doc comments explaining how BufferedWriter ensures that writes are
compliant with O_DIRECT alignment & size constraints. This isn't new,
but should be spelled out.
- Add the concept of shutdown modes to `BufferedWriter::shutdown` to
make writer shutdown adhere to these constraints.
- The `PadThenTruncate` mode might not be necessary in practice: I believe all layer files ever written are sized in multiples of `PAGE_SZ`, and since `PAGE_SZ` is larger than the current alignment requirements (512/4k depending on platform), padding should never be needed.
- Some test (I believe `round_trip_test_compressed`?) required it though
- [ ] TODO: decide if we want to accept that complexity; if we do, then address the TODO in the code to separate the alignment requirement from the buffer capacity
- Add `set_len` (=`ftruncate`) VirtualFile operation to support the
above.
- Allow `BufferedWriter` to start at a non-zero offset (to make room for
the summary block).
Cleanups unlocked by this change:
- Remove non-positional APIs from VirtualFile (e.g. seek, write_full,
read_full)
Drive-by fixes:
- PR https://github.com/neondatabase/neon/pull/11585 aimed to run unit
tests for all `virtual_file_io_mode` combinations but didn't because of
a missing `_` in the env var.
# Performance
This section assesses this PR's impact on deployments with the current production setting (`=direct`) and the anticipated impact of switching to `=direct-rw`.
For `DeltaLayerWriter`, throughput under `=direct` should remain unchanged or improve slightly: the `BlobWriter`'s buffer had the same size as the `BufferedWriter`'s buffer, but it lacked the double-buffering that `BufferedWriter` has.
`=direct-rw` enables direct IO; double-buffering should keep throughput from suffering; benchmarks will show whether this holds.
The `ImageLayerWriter` was previously not doing any buffering
(`BUFFERED=false`).
It went straight to issuing the IO operation to the underlying
VirtualFile and the buffering was done by the kernel.
The switch to `BufferedWriter` under `=direct` adds an additional memcpy
into the BufferedWriter's buffer.
We will win back that memcpy when enabling direct IO via `=direct-rw`.
A nice win from the switch to `BufferedWriter` is that ImageLayerWriter
performs >=16x fewer write operations to VirtualFile (the BlobWriter
performs one write per len field and one write per image value).
This should save low tens of microseconds of CPU overhead from doing all
these syscalls/io_uring operations, regardless of `=direct` or
`=direct-rw`.
Alignment problems aside, this write frequency would be prohibitive without double-buffering if we actually have to wait for the disk, which is what will happen once we enable direct IO via `=direct-rw`.
`BufferedWriter`'s double-buffering should keep throughput from suffering; benchmarks will show whether this holds.
`InMemoryLayer` at `=direct` will flip back to using buffered IO but
remain on BufferedWriter.
The buffered IO adds back one memcpy of CPU overhead.
Throughput should not suffer and may even improve on Pageservers that are not under memory pressure, but remember that the whole point of direct IO is to eliminate global memory pressure as a source of perf variability.
## bench_ingest
I reran `bench_ingest` on `im4gn.2xlarge` and `Hetzner AX102`.
Use `git diff` with `--word-diff` or similar to see the change.
General guidance on interpretation:
- the immediate production impact of this PR, without a production config change, can be gauged by comparing the `io_mode=Direct` results before and after
- the end state of production switched over to `io_mode=DirectRw` can be gauged by comparing the old results' `io_mode=Direct` to the new results' `io_mode=DirectRw`
Given the above guidance, on `im4gn.2xlarge`:
- immediate impact is a significant improvement in all cases
- end state after switching has same significant improvements in all
cases
- ... except `ingest/io_mode=DirectRw volume_mib=128 key_size_bytes=8192
key_layout=Sequential write_delta=Yes` which only achieves `238 MiB/s`
instead of `253.43 MiB/s`
- this is a 6% degradation
- this workload is typical for image layer creation
# Refs
- epic https://github.com/neondatabase/neon/issues/9868
- stacked atop
- preliminary refactor https://github.com/neondatabase/neon/pull/11549
- bench_ingest overhaul https://github.com/neondatabase/neon/pull/11667
- derived from https://github.com/neondatabase/neon/pull/10063
Co-authored-by: Yuchen Liang <yuchen@neon.tech>
## Problem
https://github.com/neondatabase/neon/actions/runs/14538136318/job/40790985693?pr=11645
failed, even though the relevant parts of CI had passed and auto-merge had determined the PR was ready to merge. After that, commenting on the PR failed as well.
## Summary of changes
- set GH_TOKEN for commenting after fast-forward failure
- allow merging with mergeable_state unstable
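For the first point, a minimal sketch of the shape of the fix, assuming the `gh` CLI is used for commenting (the comment body and the `PR_URL` variable are illustrative):
```yaml
- name: Comment on PR after fast-forward failure
  if: failure()
  run: gh pr comment "$PR_URL" --body "Fast-forward merge failed, please check the workflow run."
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}   # gh requires GH_TOKEN in non-interactive runs
    PR_URL: ${{ github.event.pull_request.html_url }}
```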
## Problem
We need to test the stability of Neon.
## Summary of changes
The test runs random operations on a Neon project. Via Public API calls it performs the following operations: `create a branch`, `delete a branch`, `add a read-only endpoint`, `delete a read-only endpoint`, `restore a branch to a random position in the past`. All the branches and endpoints are loaded with `pgbench`.
---------
Co-authored-by: Peter Bendel <peterbendel@neon.tech>
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
## Problem
`pg-clients` can't start:
```
The workflow is not valid. .github/workflows/pg-clients.yml (Line: 44, Col: 3): Error calling workflow 'neondatabase/neon/.github/workflows/build-build-tools-image.yml@aa19f10e7e958fbe0e0641f2e8c5952ce3be44b3'. The nested job 'check-image' is requesting 'packages: read', but is only allowed 'packages: none'. .github/workflows/pg-clients.yml (Line: 44, Col: 3): Error calling workflow 'neondatabase/neon/.github/workflows/build-build-tools-image.yml@aa19f10e7e958fbe0e0641f2e8c5952ce3be44b3'. The nested job 'build-image' is requesting 'packages: write', but is only allowed 'packages: none'.
```
## Summary of changes
- Grant required `packages: write` permissions to the workflow
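A sketch of the required permission grant in the calling workflow (workflow-level placement and the `@main` ref are shown for brevity; the job name mirrors the error above):
```yaml
# .github/workflows/pg-clients.yml
permissions:
  contents: read
  packages: write

jobs:
  build-build-tools-image:
    uses: neondatabase/neon/.github/workflows/build-build-tools-image.yml@main
```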
This is mostly a documentation update, but it also includes a few updates to neon_local, pageserver, and tests.
Postgres 17 is our default for users in production, so dropping references to 16 makes sense.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
We can't have more than one open release PR created on the same day (due
to non-unique enough branch names).
## Summary of changes
- Add time (hours and minutes) to RC PR branch names
- Also make sure we use UTC for releases
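For illustration, computing a UTC, time-stamped branch name in a step might look like this (the branch prefix and format are made up for the sketch):
```yaml
- name: Compute RC branch name
  id: rc-branch
  run: echo "branch=rc/$(date -u +'%Y-%m-%d-%H-%M')" >> "$GITHUB_OUTPUT"
```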
## Problem
There is too much delay between merging a PR into `main` and deploying
the changes to staging
## Summary of changes
- Trigger `deploy` job without waiting for `build-and-test-locally` job
## Problem
Benchmarks results are inconsistent on existing small-metal runners
## Summary of changes
Introduce new `unit-perf` runners and run the benchmarks on them.
The new hardware has a slower but consistent CPU frequency when run with the default `schedutil` governor.
Thus we needed to adjust some test cases' timeouts and add some retry steps where hard-coded timeouts couldn't be increased without changing the system under test:
- [wait_for_last_record_lsn](6592d69a67/test_runner/fixtures/pageserver/utils.py (L193)): 1000s -> 2000s
- [test_branch_creation_many](https://github.com/neondatabase/neon/pull/11409/files#diff-2ebfe76f89004d563c7e53e3ca82462e1d85e92e6d5588e8e8f598bbe119e927): 1000s
- [test_ingest_insert_bulk](https://github.com/neondatabase/neon/pull/11409/files#diff-e90e685be4a87053bc264a68740969e6a8872c8897b8b748d0e8c5f683a68d9f)
  - with back throttling disabled compute becomes unresponsive for more than 60 seconds (PG hard-coded client authentication connection timeout)
- [test_sharded_ingest](https://github.com/neondatabase/neon/pull/11409/files#diff-e8d870165bd44acb9a6d8350f8640b301c1385a4108430b8d6d659b697e4a3f1): 600s -> 1200s
Right now there are only 2 runners of that class. If we decide to go with them, we have to figure out how many runners of this type we need, so that jobs don't get stuck waiting for one to become available.
However, we have now decided to run these runners with the `performance` governor instead of `schedutil`.
This achieves almost the same performance as the previous runners while still producing consistent results for the same commit.
Related issue to activate the performance governor on these runners: https://github.com/neondatabase/runner/pull/138
## Verification that it helps
### Analyze runtimes on the new runner for the same commit
Table of runtimes for the same commit on different runners in
[run](https://github.com/neondatabase/neon/actions/runs/14417589789)
| Run | Benchmarks (1) | Benchmarks (2) | Benchmarks (3) | Benchmarks (4) | Benchmarks (5) |
|--------|--------|---------|---------|---------|---------|
| 1 | 1950.37s | 6374.55s | 3646.15s | 4149.48s | 2330.22s |
| 2 | - | 6369.27s | 3666.65s | 4162.42s | 2329.23s |
| Delta % | - | 0.07 % | 0.5 % | 0.3 % | 0.04 % |
| with governor performance | 1519.57s | 4131.62s | - | - | - |
| second run gov. perf. | 1513.62s | 4134.67s | - | - | - |
| Delta % | 0.3 % | 0.07 % | - | - | - |
| speedup gov. performance | 22 % | 35 % | - | - | - |
| current desktop class hetzner runners (main) | 1487.10s | 3699.67s | - | - | - |
| slower than desktop class | 2 % | 12 % | - | - | - |
In summary, the runtimes for the same commit on this hardware vary by less than 1 %.
---------
Co-authored-by: BodoBolero <peterbendel@neon.tech>
## Problem
We've started sending Slack notifications for failed container image pushes that are being retried. More messages are coming in than expected, so we end up clicking through the link to see which image failed more often than we'd hoped.
## Summary of changes
- Make Slack notifications clearer, including whether the job succeeded and which retries have happened.
- Log failures/retries in the step more clearly, so that you can easily see when something fails.
## Problem
Changes in compute can cause errors in tests if a different version of the `neon-test-extensions` image is used.
## Summary of changes
Use the same version of the `neon-test-extensions` image as the `compute` image for docker-compose-based extension tests.
## Problem
We've seen quite a few CI failures related to pushes to Docker Hub failing with weird error messages that suggest maybe Docker Hub is just not reliable.
## Summary of changes
Retry container image pushing up to 10 times, and send a slack message
if we had to retry, regardless of the job succeeding or not.
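One possible shape of such a retry loop, shown here purely as a sketch (the actual workflow may use a different mechanism; `$IMAGE` is a placeholder):
```yaml
- name: Push image with retries
  run: |
    for attempt in $(seq 1 10); do
      docker push "$IMAGE" && exit 0
      echo "push attempt ${attempt} failed, retrying..." >&2
      sleep 30
    done
    exit 1
```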
## Problem
Hotfix releases mean that sometimes changes in release PRs haven't been
tested and linted yet. Disabling tests and lints is therefore not
necessarily safe. In the future we will check whether tests have run on
the same git tree already to speed things up, but for now we need to
turn tests back on fully. This partially reverts:
https://github.com/neondatabase/neon/pull/11272
## Summary of changes
Run checks on `.*-rc-pr` runs.
## Problem
Sometimes the forced extension upgrade test fails (on schedule) due to a
timeout.
## Summary of changes
The timeout is increased to 60 mins.
## Problem
The current version of the GitHub Workflow Stats action pulls Docker images from Docker Hub, which could be an issue with the new pull limits on the Docker Hub side.
## Summary of changes
Switch to version `v0.2.2`, which has its Docker images hosted on `ghcr.io`.
## Problem
`github.sha` contains a merge commit of `head` and `base` if we're in a
PR. In release PRs, this makes no sense, because we fast-forward the
`base` branch to contain the changes from `head`.
Even though we correctly use `${{ github.event.pull_request.head.sha ||
github.sha }}` to reference the git commit when building artifacts, we
don't use that when checking out code, because we want to test the merge
of head and base usually. In the case of release PRs, we definitely always want to test the head sha though, because that is what we're going to fast-forward to, and it already has the base sha as a parent, so the merge would end up with the same tree anyway.
As a side effect, not checking out `${{
github.event.pull_request.head.sha || github.sha }}` also caused
https://github.com/neondatabase/neon/actions/runs/13986389780/job/39173256184#step:6:49
to say `release-tag=release-compute-8187`, while
https://github.com/neondatabase/neon/actions/runs/14084613121/job/39445314780#step:6:48
is talking about `build-tag=release-compute-8186`
## Summary of changes
Run a few things on `github.event.pull_request.head.sha`, if we're in a
release PR.
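Concretely, the checkout step in the affected jobs ends up along these lines (sketch; the surrounding "are we in a release PR" condition is omitted):
```yaml
- uses: actions/checkout@v4
  with:
    # use the PR head commit instead of the synthetic merge commit
    ref: ${{ github.event.pull_request.head.sha || github.sha }}
```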
## Problem
Occasionally, getting data from the GitHub cache can be slow, at less than 10 MB/s, taking 5+ minutes to download the cache:
```
Received 20971520 of 2987085791 (0.7%), 9.9 MBs/sec
Received 50331648 of 2987085791 (1.7%), 15.9 MBs/sec
...
Received 1065353216 of 2987085791 (35.7%), 4.8 MBs/sec
Received 1065353216 of 2987085791 (35.7%), 4.7 MBs/sec
...
```
https://github.com/neondatabase/neon/actions/runs/13956437454/job/39068664599#step:7:17
As a result, fetching the cache can take even longer than the build itself.
## Summary of changes
Switch to caches that are closer to the runners; they provide stable throughput of about 70-80 MB/s.
## Problem
#11061 changed how artifacts for releases are built, by reusing/retagging the artifacts from release PRs. This resulted in the BUILD_TAG baked into the images not being what we expected.
Context: https://neondb.slack.com/archives/C08JBTT3R1Q/p1742333300129069
## Summary of changes
Set BUILD_TAG to the release tag of the upcoming release when running
inside release PRs.
## Problem
https://github.com/neondatabase/neon/pull/11210 migrated pushing images
to ghcr. Unfortunately, it was incomplete in using images from ghcr,
which resulted in a few places referencing the ghcr build-tools image,
while trying to use docker hub credentials.
## Summary of changes
Use build-tools image from ghcr consistently.
## Problem
The pipelines after release merges are slower than they need to be at
the moment. This is because some kinds of tests/checks run on all kinds
of pipelines, even though they only matter in some of those.
## Summary of changes
Run `check-codestyle-{rust,python,jsonnet}`, `build-and-test-locally`
and `trigger-e2e-tests` only on regular PRs, not release PR or pushes to
main or release branches.
## Problem
Docker Hub has new rate limits coming up, and to avoid problems coming
with those we're switching to GHCR.
## Summary of changes
- Push images to GHCR initially and distribute them from there
- Use images from GHCR in docker-compose
## Problem
https://github.com/neondatabase/neon/actions/runs/13894288475/job/38871819190
shows the "Add fast-fordward label to PR to trigger fast-forward merge"
job being skipped. This is due to not using the right variable for
checking which branch the merge queue is merging into.
## Summary of changes
Use the `branch` output of the `meta` task for checking the target
branch of a merge group.
... to better match the workload characteristics of real Neon customers
## Problem
We analyzed workloads of large Neon users and want to extend the oltp
workload to include characteristics seen in those workloads.
## Summary of changes
- for the re-used branch, delete the rows inserted by the last run
- adjust expected run-time (time-outs) in the GitHub workflow
- add queries that exercise the prefetch getpages path
- add I/U/D transactions for another table (so far the workload was insert/append-only)
- add an explicit vacuum analyze step and measure its time
- add a reindex concurrently step and measure its time (taking care that this step succeeds even if prior reindex runs have failed or were canceled)
- create a second connection string, alongside the pooled one, that removes the `-pooler` suffix from the hostname, because we want to run long-running statements (database maintenance) and bypass the pooler, which doesn't support an unlimited statement timeout
## Test run
https://github.com/neondatabase/neon/actions/runs/13851772887/job/38760172415
## Problem
ef0d4a48a adjusted how we build container images and how we push them, and the neon-test-extensions image was overlooked. Additionally, it was also missed in 1f0dea9a1, which pushed our container images to GHCR.
## Summary of changes
Push neon-test-extensions to GHCR and also push release tags for it.
## Problem
#11061 introduced code for fetching previous releases. #11151 introduced jq error handling, which was also applied to #11061, but parentheses were missed.
## Summary of changes
Add parentheses around the error handling code.
## Problem
#11061 changed release PR creation, and I missed that the workflow will check out a would-be merge of the rc branch and the release branch instead of the head ref, unless explicitly instructed otherwise.
## Summary of changes
Check out head ref for linting the release PRs.
## Problem
#11061 changed release PR creation, and I missed that creating PRs using `gh` in non-interactive environments *requires* `--body` instead of defaulting to an empty body.
## Summary of changes
Explicitly set an empty body when creating release PRs.
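For reference, a minimal sketch of the non-interactive invocation with the flag spelled out (title/base/head values are placeholders):
```yaml
- name: Create release PR
  run: gh pr create --base release --head rc/example --title "Release" --body ""
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```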
## Problem
#11061 changed release PR creation, and I missed that we need to
explicitly fetch the whole history so that the relevant git refs and
objects are available.
## Summary of changes
- Fetch all git refs including history by setting fetch-depth to 0
- Reference release branch as a remote branch, because we haven't
checked it out locally
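A sketch of what this amounts to in the workflow (the ancestry check shown is just one illustrative way to use the remote ref):
```yaml
- uses: actions/checkout@v4
  with:
    fetch-depth: 0   # fetch the full history so release refs/objects are available
- run: git merge-base --is-ancestor origin/release HEAD   # refer to the release branch as a remote branch
```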
## Problem
When we release our components, we perform builds in the release PR,
then test the components, then merge the PR, and then build everything
*again*, run tests *again*, and only then start deployments.
To speed things up, we want to perform builds and run tests in the PR,
and start deployments using the existing artifacts from the release PR.
To make that possible, we need both CI pipelines running on the same commit hash, which requires fast-forwarding the release branch. That only works if the PR contains a commit that has the current release branch state as an ancestor.
## Summary of changes
- Changes to release PR creation:
  - Remove templates and automatic bodies for release PRs. The previous template wasn't used anymore, and the automatic body we created in the pipeline wouldn't contain any useful content anymore after the changes here.
  - Make it possible to select the source branch. For releases that aren't cut from `main`, like https://github.com/neondatabase/neon/pull/11051, we need a way to trigger the new flow from a different branch.
  - Determine `release-branch` automatically from the component name instead of passing that as well.
- Changes to the merge queue job:
  - Rename `get-changed-files` to `meta` in preparation of additional data being fetched as part of that job
  - Fail the merge queue if we're trying to merge into a branch other than main - this is to prevent non-fast-forward merges.
  - Label PRs to branches other than main as `fast-forward`, to trigger the fast-forward job
- Add a fast-forward job that can be triggered with the `fast-forward` label and performs a fast-forward merge (see the sketch after this list). This only happens if the PR has `mergeable_state == clean`, i.e. CI has passed.
- Build and Test on releases now skips building images, skips testing images and skips triggering e2e tests. We add new tags to the images from the release PR to tag them as release images, and we push them to the prod registries.
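A very reduced sketch of the label-triggered fast-forward job mentioned above (names, trigger filter, and target branch are illustrative):
```yaml
fast-forward:
  if: contains(github.event.pull_request.labels.*.name, 'fast-forward')
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
    - name: Fast-forward the release branch
      # a plain push only succeeds if this is a true fast-forward
      run: git push origin HEAD:release
```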
## Problem
Pinging groups on slack didn't work, because I didn't use the correct
syntax.
## Summary of changes
Use the correct syntax for pinging groups.
## Problem
Allow using `_meta.yml` with the `workflow_dispatch` event.
## Summary of changes
Handle this event in the run-kind step; fix and update the description
of the run-kind output.
## Problem
When periodic pagebench runs only once a day, a lot of commits can land between a good run and a regression.
## Summary of changes
Run the workflow every 3 hours
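A sketch of the schedule (the exact minute offset is illustrative):
```yaml
on:
  schedule:
    - cron: '0 */3 * * *'   # every 3 hours
```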
## Problem
If an image fails to push to dev registries, we shouldn't trigger the
deploy job, because that depends on images existing in dev registries.
To ensure this is the case, the deploy job needs to depend on pushing to
dev registries.
## Summary of changes
Make `deploy` depend on `push-neon-image-dev` and
`push-compute-image-dev`.
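The dependency edge, sketched (any other existing `needs` entries are omitted):
```yaml
deploy:
  needs: [push-neon-image-dev, push-compute-image-dev]
```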
## Problem
Our benchmarking workflow has a job step `bench`, which runs all tests in `test_runner/performance/*` except those that we want to run separately.
We recently added two test cases to that directory that we want to run separately, but forgot to ignore them in the bench step. This is now causing [failures](https://github.com/neondatabase/neon/actions/runs/13667689340/job/38212087331#step:7:392).
## Summary of changes
Ignore the separately run tests in the bench step.
## Problem
The periodic pagebench workflow runs from the latest main commit and can also be dispatched manually for a given commit hash to bisect regressions.
However, in the dashboards we cannot distinguish manual runs from periodic runs, which makes it harder to follow the trend.
## Summary of changes
Send an additional flag (the commit type) to the benchmark runner instance to distinguish the run type.
Note: this needs a follow-up PR on the receiving side.
## Problem
When a build is made with sanitizers, this is not reflected in the artifact name, which can lead to overwriting normal builds with sanitized ones.
## Summary of changes
Take this property of a build into account when constructing the
artifact name.