Commit Graph

680 Commits

Author SHA1 Message Date
Alexey Masterov
02896e3d09 Change the Tag 2025-04-14 17:24:43 +02:00
Alexey Masterov
2ec7630252 change tag 2025-04-14 14:46:50 +02:00
Alexey Masterov
d904e56e80 Add default_endpoint_settings 2025-04-14 14:07:27 +02:00
Alexey Masterov
9a89a8e9ee Add ADMIN API KEY 2025-04-14 10:40:08 +02:00
Alexey Masterov
307899408f Fix pg_graphql 2025-04-11 15:35:28 +02:00
Alexey Masterov
ecb059d5f7 Add skip 2025-04-08 14:47:19 +02:00
Alexey Masterov
d912f05e36 Add print for regression diffs 2025-04-08 13:55:26 +02:00
Alexey Masterov
8378b1c603 Fix the shellcheck warning 2025-04-07 14:48:32 +02:00
Alexey Masterov
154d215c78 Fix the shellcheck warning 2025-04-07 14:46:51 +02:00
Alexey Masterov
0b10a5336c Workflows refactoring 2025-04-07 14:35:02 +02:00
Alexey Masterov
5c36f3786d Add project settings 2025-04-04 15:28:25 +02:00
Alexey Masterov
f3e928d781 Do not use cloud-regress.yml in this branch 2025-04-04 12:44:29 +02:00
Alexey Masterov
d6de25f85f add an env var 2025-04-01 11:52:33 +02:00
Alexey Masterov
e43968bc95 add a tag for debug only 2025-04-01 11:41:58 +02:00
Alexey Masterov
27d6ebabb3 add checkout 2025-04-01 11:17:28 +02:00
Alexey Masterov
fa6b4ed659 fix pg-version 2025-04-01 11:14:41 +02:00
Alexey Masterov
3c7a526971 change runs-on 2025-04-01 11:13:09 +02:00
Alexey Masterov
da01a57759 fix 2025-04-01 11:02:11 +02:00
Alexey Masterov
9c2e296af7 Debug only 2025-04-01 11:00:12 +02:00
Alexey Masterov
c92dc09104 create a project in the workflow 2025-04-01 10:53:16 +02:00
Alexey Masterov
b41bca9f58 Add a workflow 2025-03-31 17:01:02 +02:00
JC Grünhage
5cb6a4bc8b fix(ci): use the right sha in release PRs (#11365)
## Problem

`github.sha` contains a merge commit of `head` and `base` if we're in a
PR. In release PRs, this makes no sense, because we fast-forward the
`base` branch to contain the changes from `head`.

Even though we correctly use `${{ github.event.pull_request.head.sha ||
github.sha }}` to reference the git commit when building artifacts, we
don't use that when checking out code, because we want to test the merge
of head and base usually. In the case of release PRs, we definitely
always want to test on the head sha though, because we're going to
forward that, and it already has the base sha as a parent, so the merge
would end up with the same tree anyway.

As a side effect, not checking out `${{
github.event.pull_request.head.sha || github.sha }}` also caused
https://github.com/neondatabase/neon/actions/runs/13986389780/job/39173256184#step:6:49
to say `release-tag=release-compute-8187`, while
https://github.com/neondatabase/neon/actions/runs/14084613121/job/39445314780#step:6:48
is talking about `build-tag=release-compute-8186`

## Summary of changes
Run a few things on `github.event.pull_request.head.sha`, if we're in a
release PR.
2025-03-28 11:56:24 +00:00
Fedor Dikarev
1d5d168626 impr(ci): use hetzner buckets for cache (#11364)
## Problem
Occasionally getting data from GH cache could be slow, with less than
10MB/s and taking 5+ minutes to download cache:
```
Received 20971520 of 2987085791 (0.7%), 9.9 MBs/sec
Received 50331648 of 2987085791 (1.7%), 15.9 MBs/sec
...
Received 1065353216 of 2987085791 (35.7%), 4.8 MBs/sec
Received 1065353216 of 2987085791 (35.7%), 4.7 MBs/sec
...
```

https://github.com/neondatabase/neon/actions/runs/13956437454/job/39068664599#step:7:17

Resulting in getting cache even longer that build time.

## Summary of changes
Switch to the caches, that are closer to the runners, and they provided
stable throughput about 70-80MB/s
2025-03-27 11:11:45 +00:00
Folke Behrens
019a29748d proxy: Move release PR creation to Tuesday (#11306)
Move the creation of the proxy release PR to Tuesday mornings.
2025-03-19 16:45:29 +00:00
StepSecurity Bot
88ea855cff fix(ci): Fixing StepSecurity Flagged Issues (#11311)
This pull request is created by
[StepSecurity](https://app.stepsecurity.io/securerepo) at the request of
@areyou1or0.
 ## Summary

This pull request is created by
[StepSecurity](https://app.stepsecurity.io/securerepo) at the request of
@areyou1or0. Please merge the Pull Request to incorporate the requested
changes. Please tag @areyou1or0 on your message if you have any
questions related to the PR.
## Summary

This pull request is created by
[StepSecurity](https://app.stepsecurity.io/securerepo) at the request of
@areyou1or0. Please merge the Pull Request to incorporate the requested
changes. Please tag @areyou1or0 on your message if you have any
questions related to the PR.

## Security Fixes

### Least Privileged GitHub Actions Token Permissions

The GITHUB_TOKEN is an automatically generated secret to make
authenticated calls to the GitHub API. GitHub recommends setting minimum
token permissions for the GITHUB_TOKEN.

- [GitHub Security
Guide](https://docs.github.com/en/actions/security-guides/automatic-token-authentication#using-the-github_token-in-a-workflow)
- [The Open Source Security Foundation (OpenSSF) Security
Guide](https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions)
### Pinned Dependencies

GitHub Action tags and Docker tags are mutable. This poses a security
risk. GitHub's Security Hardening guide recommends pinning actions to
full length commit.

- [GitHub Security
Guide](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-third-party-actions)
- [The Open Source Security Foundation (OpenSSF) Security
Guide](https://github.com/ossf/scorecard/blob/main/docs/checks.md#pinned-dependencies)
### Harden Runner

[Harden-Runner](https://github.com/step-security/harden-runner) is an
open-source security agent for the GitHub-hosted runner to prevent
software supply chain attacks. It prevents exfiltration of credentials,
detects tampering of source code during build, and enables running jobs
without `sudo` access. See how popular open-source projects use
Harden-Runner
[here](https://docs.stepsecurity.io/whos-using-harden-runner).

<details>
<summary>Harden runner usage</summary>

You can find link to view insights and policy recommendation in the
build log

<img
src="https://github.com/step-security/harden-runner/blob/main/images/buildlog1.png?raw=true"
width="60%" height="60%">

Please refer to
[documentation](https://docs.stepsecurity.io/harden-runner) to find more
details.
</details>



will fix https://github.com/neondatabase/cloud/issues/26141
2025-03-19 16:44:22 +00:00
JC Grünhage
aedeb37220 fix(ci): put the BUILD_TAG of the upcoming release into RC PR artifacts (#11304)
## Problem
#11061 changed how artifacts for releases are built, by
reusing/retagging the artifacts from release PRs. This resulted in the
BUILD_TAG that's baked into the images to not be as expected.
Context: https://neondb.slack.com/archives/C08JBTT3R1Q/p1742333300129069

## Summary of changes
Set BUILD_TAG to the release tag of the upcoming release when running
inside release PRs.
2025-03-19 09:34:28 +00:00
JC Grünhage
99639c26b4 fix(ci): update build-tools image references (#11293)
## Problem
https://github.com/neondatabase/neon/pull/11210 migrated pushing images
to ghcr. Unfortunately, it was incomplete in using images from ghcr,
which resulted in a few places referencing the ghcr build-tools image,
while trying to use docker hub credentials.

## Summary of changes
Use build-tools image from ghcr consistently.
2025-03-18 15:21:22 +00:00
JC Grünhage
eb6efda98b impr(ci): move some kinds of tests to PR runs only (#11272)
## Problem
The pipelines after release merges are slower than they need to be at
the moment. This is because some kinds of tests/checks run on all kinds
of pipelines, even though they only matter in some of those.

## Summary of changes
Run `check-codestyle-{rust,python,jsonnet}`, `build-and-test-locally`
and `trigger-e2e-tests` only on regular PRs, not release PR or pushes to
main or release branches.
2025-03-18 13:49:34 +00:00
JC Grünhage
2dfff6a2a3 impr(ci): use ghcr.io as the default container registry (#11210)
## Problem
Docker Hub has new rate limits coming up, and to avoid problems coming
with those we're switching to GHCR.

## Summary of changes
- Push images to GHCR initially and distribute them from there
- Use images from GHCR in docker-compose
2025-03-18 11:30:49 +00:00
JC Grünhage
486ffeef6d fix(ci): don't have neon-test-extensions release tag push depend on compute-node-image build (#11281)
## Problem
Failures like
https://github.com/neondatabase/neon/actions/runs/13901493608/job/38896940612?pr=11272
are caused by the dependency on `compute-node-image`, which was wrong on
release jobs anyway.

## Summary of changes
Remove dependency on `compute-node-image` from the job
`add-release-tag-to-neon-test-extension-image`.
2025-03-17 16:31:49 +00:00
JC Grünhage
fdf04d4d81 fix(ci): use correct branch ref for checking whether this is a release merge queue (#11270)
## Problem

https://github.com/neondatabase/neon/actions/runs/13894288475/job/38871819190
shows the "Add fast-fordward label to PR to trigger fast-forward merge"
job being skipped. This is due to not using the right variable for
checking which branch the merge queue is merging into.

## Summary of changes
Use the `branch` output of the `meta` task for checking the target
branch of a merge group.
2025-03-17 09:26:45 +00:00
Alexander Bayandin
136cae76c2 fix(ci): correct regex to detect release-compute RC PRs (#11269)
## Problem
The regex in `_meta.yml` workflow doesn't detect RC PRs for compute
releases:
https://neondb.slack.com/archives/C059ZC138NR/p1742164884669389

## Summary of changes
- Fix regex

---------

Co-authored-by: Peter Bendel <peterbendel@neon.tech>
2025-03-17 07:25:12 +00:00
Peter Bendel
228bb75354 Extend large tenant OLTP workload ... (#11166)
... to better match the workload characteristics of real Neon customers

## Problem

We analyzed workloads of large Neon users and want to extend the oltp
workload to include characteristics seen in those workloads.

## Summary of changes

- for re-use branch delete inserted rows from last run
- adjust expected run-time (time-outs) in GitHub workflow
- add queries that exposes the prefetch getpages path
- add I/U/D transactions for another table (so far the workload was
insert/append-only)
- add an explicit vacuum analyze step and measure its time
- add reindex concurrently step and measure its time (and take care that
this step succeeds even if prior reindex runs have failed or were
canceled)
- create a second connection string for the pooled connection that
removes the `-pooler` suffix from the hostname because we want to run
long-running statements (database maintenance) and bypass the pooler
which doesn't support unlimited statement timeout

## Test run


https://github.com/neondatabase/neon/actions/runs/13851772887/job/38760172415
2025-03-16 14:04:48 +00:00
Cihan Demirci
a5b00b87ba CI(pre-merge-checks): use step-security/changed-files (#11265)
Use Step Security maintained version of `tj-actions/changed-files`.

https://www.stepsecurity.io/blog/harden-runner-detection-tj-actions-changed-files-action-is-compromised#use-the-stepsecurity-maintained-changed-files-action
2025-03-16 13:53:27 +00:00
JC Grünhage
066b0a1be9 fix(ci): correctly push neon-test-extensions in releases and to ghcr (#11225)
## Problem
ef0d4a48a adjusted how we build container images and how we push them,
and the neon-test-extensions image was overlooked. Additionally, is was
also missed in 1f0dea9a1, which pushed our container images to GHCR.

## Summary of changes
Push neon-test-extensions to GHCR and also push release tags for it.
2025-03-13 18:18:55 +00:00
JC Grünhage
89c7e4e917 fix(ci): use paranthesis for error handling in jq when fetching release PRs (#11217)
## Problem
#11061 introduced code fetching previous releases. #11151 introduced jq
error handling, which has also been applied in #11061, but parenthesis
have been missed.

## Summary of changes
Add parenthesis around error handling code.
2025-03-13 13:40:43 +00:00
JC Grünhage
afc9524bc7 fix(ci): run lint-release-pr on head-ref (#11206)
## Problem
#11061 changed release pr creation, and I missed that the workflow will
checkout a would-be-merge of the rc branch and the release branch
instead of the head ref, unless explicitly instructed otherwise.

## Summary of changes
Check out head ref for linting the release PRs.
2025-03-13 08:17:33 +00:00
JC Grünhage
507353404c fix(ci): pass emtpy body when creating release PRs (#11203)
## Problem
#11061 changed release pr creation, and I missed that creating PRs using
`gh` in non-interactive environments *requires* `--body` instead of
defaulting to an empty body.

## Summary of changes
Explicitly set an empty body when creating release PRs.
2025-03-12 23:54:43 +00:00
JC Grünhage
48be4df3f3 fix(ci): fetch all refs in release PR creation (#11201)
## Problem
#11061 changed release PR creation, and I missed that we need to
explicitly fetch the whole history so that the relevant git refs and
objects are available.

## Summary of changes
- Fetch all git refs including history by setting fetch-depth to 0
- Reference release branch as a remote branch, because we haven't
checked it out locally
2025-03-12 22:32:38 +00:00
JC Grünhage
ef0d4a48a8 Reuse artifacts from release PRs (#11061)
## Problem
When we release our components, we perform builds in the release PR,
then test the components, then merge the PR, and then build everything
*again*, run tests *again*, and only then start deployments.

To speed things up, we want to perform builds and run tests in the PR,
and start deployments using the existing artifacts from the release PR.

To make that possible, we need to have both CI pipelines running on the
same commit hash, which requires fast forwarding release. That only
works, if we have a commit in the PR that has the current release branch
state as an ancestor.

## Summary of changes
- Changes to release PR creation:
- Remove templates and automatic bodies for release PRs. The previous
template wasn't used anymore, and the automatic body we created in the
pipeline didn't contain any useful content anymore after the changees
here.
- Make it possible to select the source branch. For releases that aren't
cut from `main`, like https://github.com/neondatabase/neon/pull/11051,
we need a way to trigger the new flow from a different branch.
- Determine `release-branch` automatically from the component name
instead of passing that as well.
- Changes to the merge queue job:
- Rename `get-changed-files` to `meta` in preparation of additional data
being fetched as part of that job
- Fail the merge queue if we're trying to merge into a branch other than
main - this is to prevent non-fast-forward merges.
- Label PRs to branches other than main as `fast-forward`, to trigger
the fast-forward job
- Add a fast-forward job that can be triggered with the `fast-forward`
label that performs a fast-forward merge. This only happens if the PR
has `mergeable_state == clean`, so CI having passed.
- Build and Test on releases now skips building images, skips testing
images and skips triggering e2e tests. We add new tags to the images
from the release PR to tag them as release images, and we push them to
the prod registries.
2025-03-12 21:00:59 +00:00
JC Grünhage
e8396034ac fix(ci): fail meta using jq halt_error if data is unexpectedly missing (#11151)
## Problem
When the githb API is having problems, we might not get data back, and
are happily setting vars as empty. This causes problems down the line.
See
https://github.com/neondatabase/neon/actions/runs/13718859397/job/38381946590?pr=11132#step:5:1
for example.

## Summary of changes
Fail the `meta` job if we don't get expected data back from github.
2025-03-11 22:59:30 +00:00
Christian Schwarz
083a30b1e2 storage broker: disable deploy by default (#11172)
context
-
https://github.com/neondatabase/cloud/issues/23486#issuecomment-2711587222
- companion infra.git PR:
https://github.com/neondatabase/infra/pull/3249
2025-03-11 19:45:06 +00:00
JC Grünhage
f17931870f fix(ci): use <!subteam^ID> syntax for pinging groups on slack (#11135)
## Problem
Pinging groups on slack didn't work, because I didn't use the correct
syntax.

## Summary of changes
Use the correct syntax for pinging groups.
2025-03-10 13:27:23 +00:00
Alexander Lakhin
a4ce20db5c Support workflow_dispatch event in _meta.yml (#11133)
## Problem
Allow for using _meta.yml with workflow_dispatch event.

## Summary of changes
Handle this event in the run-kind step; fix and update the description
of the run-kind output.
2025-03-07 15:00:06 +00:00
Peter Bendel
3bb318a295 run periodic page bench more frequently to simplify bi-secting regressions (#11121)
## Problem

When periodic pagebench runs only once a day a lot of commits can be in
between a good run and a regression.

## Summary of changes

Run the workflow every 3 hours
2025-03-06 17:47:54 +00:00
JC Grünhage
94e6897ead fix(ci): make deploy job depend on pushing images to dev registries (#11089)
## Problem
If an image fails to push to dev registries, we shouldn't trigger the
deploy job, because that depends on images existing in dev registries.
To ensure this is the case, the deploy job needs to depend on pushing to
dev registries.

## Summary of changes

Make `deploy` depend on `push-neon-image-dev` and
`push-compute-image-dev`.
2025-03-05 14:28:43 +00:00
Peter Bendel
906d7468cc exclude separate perf tests from bench step (#11084)
## Problem

Our benchmarking workflow has a job step `bench`which runs all tests in
test_runner/performance/* except those that we want to run separately.
We recently added two test cases to that testcase directory that we want
to run separately but forgot to ignore them during the bench step. This
is now causing
[failures](https://github.com/neondatabase/neon/actions/runs/13667689340/job/38212087331#step:7:392).

## Summary of changes

Ignore the separately run tests in the bench step.
2025-03-05 10:14:51 +00:00
Peter Bendel
f62ddb11ed Distinguish manually submitted runs for periodic pagebench in grafana dashboard (#11079)
## Problem

Periodic pagebench workflow runs periodically from latest main commit
and also allows to dispatch it manually for a given commit hash to
bi-sect regressions.
However in the dashboards we can not distinguish manual runs from
periodic runs which makes it harder to follow the trend.

## Summary of changes

Send an additional flag commit type to the benchmark runner instance to
distinguish the run type.

Note: this needs a follow-up PR on the receiving side.
2025-03-04 18:11:43 +00:00
Alexander Lakhin
9a4e2eab61 Fix artifact name for build with sanitizers (#11066)
## Problem
When a build is made with sanitizers, this is not reflected in the
artifact name, which can lead to overriding normal builds with sanitized
ones.

## Summary of changes
Take this property of a build into account when constructing the
artifact name.
2025-03-03 18:00:53 +00:00
Peter Bendel
a07599949f First version of a new benchmark to test larger OLTP workload (#11053)
## Problem

We want to support larger tenants (regarding logical database size,
number of transactions per second etc.) and should increase our test
coverage of OLTP transactions at larger scale.

## Summary of changes

Start a new benchmark that over time will add more OLTP tests at larger
scale.
This PR covers the first version and will be extended in further PRs.

Also fix some infrastructure:
- default for new connections and large tenants is to use connection
pooler pgbouncer, however our fixture always added
`statement_timeout=120` which is not compatible with pooler
[see](https://neon.tech/docs/connect/connection-errors#unsupported-startup-parameter)
- action to create branch timed out after 10 seconds and 10 retries but
for large tenants it can take longer so use increasing back-off for
retries

## Test run

https://github.com/neondatabase/neon/actions/runs/13593446706
2025-03-03 15:25:48 +00:00