## Problem
`HEAD` commit for a PR is a phantom merge commit which skews the baseline
commit for coverage reports.
See
https://github.com/neondatabase/neon/pull/5751#issuecomment-1790717867
## Summary of changes
- Use commit hash instead of `HEAD` for finding baseline commits for
code coverage
- Use the base branch for PRs or the current branch for pushes
rationale: some crates pull these in as default; hopefully these hints
will require less cleanup-after and Cargo.lock file watching.
follow-up to #5848.
## Problem
I've added `BUILD_TAG` to docker images.
(https://github.com/neondatabase/neon/pull/5812), but forgot to add it
to services that we build for tests
## Summary of changes
- Set `BUILD_TAG` in `build-neon` job
## Problem
Some developers use workstations with arm CPUs, and sometimes x86-64
code is not fully compatible with it (for example,
https://github.com/neondatabase/neon/pull/5827).
Although we don't have arm CPUs in the prod (yet?), it is worth having
some basic checks for this architecture to have a better developer
experience.
Closes https://github.com/neondatabase/neon/issues/5829
## Summary of changes
- Run `check-codestyle-rust`-like & `build-neon`-like jobs on Arm runner
- Add `run-extra-build-*` label to run all available extra builds
replace the checkbox list with a a proper task list in the epic template
NB: this PR does not change the code, it only touches the github issue
templates
## Problem
https://github.com/neondatabase/neon/pull/5576 added `build-tag`
reporting to `libmetrics_build_info`, but it's not reported because we
didn't set the corresponding env variable in the build process.
## Summary of changes
- Add `BUILD_TAG` env var while building services
Prepare for a new release workflow
* Release PR is created on Fridays
* The discussion/approval happens during Friday
* Sunday morning the deployment will be done in central-il and perf
tests will be run
* On Monday early IST morning gradually start rolling (starting from US
regions as they are still in weekend time)
See slack for discussion:
https://neondb.slack.com/archives/C04P81J55LK/p1698565305607839?thread_ts=1698428241.031979&cid=C04P81J55LK
Only applicable change was neondatabase/autoscaling#584, setting
pgbouncer auth_dbname=postgres in order to fix superuser connections
from preventing dropping databases.
## Problem
We'll need to switch `REMOTE_STORAGE_AZURE_REGION` from the current
`eastus2` region to something `eu-central-1`-like. This may require
changing `AZURE_STORAGE_ACCESS_KEY`.
To make it possible to switch from one place (not to break a lot of
builds on CI), move `REMOTE_STORAGE_AZURE_CONTAINER` and
`REMOTE_STORAGE_AZURE_REGION` to GitHub Variables.
See https://github.com/neondatabase/neon/settings/variables/actions
## Summary of changes
- Get values for `REMOTE_STORAGE_AZURE_CONTAINER` &
`REMOTE_STORAGE_AZURE_REGION` from GitHub Variables
## Problem
We started to store test results in a new format in
https://github.com/neondatabase/neon/pull/4549.
This PR switches scripts to query this db.
(we can completely remove old DB/ingestions scripts in a couple of
weeks after the PR merged)
## Summary of changes
- `scripts/benchmark_durations.py` query new database
- `scripts/flaky_tests.py` query new database
Only applicable change was neondatabase/autoscaling#571, removing the
postgres_exporter flags `--auto-discover-databases` and
`--exclude-databases=...`
Create Rust bindings for C functions from walproposer. This allows to
write better tests with real walproposer code without spawning multiple
processes and starting up the whole environment.
`make walproposer-lib` stage was added to build static libraries
`libwalproposer.a`, `libpgport.a`, `libpgcommon.a`. These libraries can
be statically linked to any executable to call walproposer functions.
`libs/walproposer/src/walproposer.rs` contains
`test_simple_sync_safekeepers` to test that walproposer can be called
from Rust to emulate sync_safekeepers logic. It can also be used as a
usage example.
## Problem
- Because we compress artifacts file by file, we don't need to put them
into `tar` containers (ie instead of `tar.gz` we can use just `gz`).
- Pythons gz single-threaded and pretty slow.
A benchmark has shown ~20 times speedup (19.876176291 vs
0.8748335830000009) on my laptop (for a pageserver.log size is 1.3M)
## Summary of changes
- Replace tarfile with zstandart
- Update allure to 2.24.0
## Problem
Before releasing new version to production, we'd like to run a set of
required checks on the incoming release.
The simplest approach, which doesn't require many changes — dedicate one
staging region to `preprod` installation.
The proposed changes to the release flow are the following:
- When a release PR is merged into the release branch — trigger
deployment from the release branch to a dedicated staging-preprod region
(for now, it's going to be `eu-west-1` — Ireland)
Corresponding infrastructure PR:
https://github.com/neondatabase/aws/pull/585
## Summary of changes
- Trigger `deploy.dev` workflow with `-f deployPreprodRegion=true` for
release branch
Only notable change is including neondatabase/autoscaling#523, which we
hope will help with making sure that TCP connections are properly
terminated before shutdown (which hopefully fixes a leak in the
pageserver).
## Problem
- `gh pr list` fails with `unknown argument "main"; please quote all
values that have spaces due to using a variable with the wrong name
- `permissions: write-all` are too wide for the job
## Summary of changes
- For variable name `HEAD` -> `BRANCH`
- Grant only required permissions for each job
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
If @github-actions creates release PR, the CI pipeline is not triggered
(but we have `release-notify.yml` workflow that we expect to run on this
event).
I suspect this happened because @github-actions is not a repository
member.
Ref
https://github.com/neondatabase/neon/pull/5283#issuecomment-1715209291
## Summary of changes
- Use `CI_ACCESS_TOKEN` to create a PR
- Use `gh` instead of `thomaseizinger/create-pull-request`
- Restrict permissions for GITHUB_TOKEN to `contents: write` only
(required for `git push`)
## Problem
`ci-run/pr-*` branches (and attached PRs) should be deleted
automatically when their parent PRs get closed.
But there are not
## Summary of changes
- Fix if-condition
This adds PostgreSQL 16 as a vendored postgresql version, and adapts the
code to support this version.
The important changes to PostgreSQL 16 compared to the PostgreSQL 15
changeset include the addition of a neon_rmgr instead of altering Postgres's
original WAL format.
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
## Problem
Another thing I overlooked regarding'approved-for-ci-run`:
- When we create a PR, the action is associated with @vipvap and this
triggers the pipeline — this is good.
- When we update the PR by force-pushing to the branch, the action is
associated with @github-actions, which doesn't trigger a pipeline — this
is bad.
Initially spotted in #5239 / #5211
([link](https://github.com/neondatabase/neon/actions/runs/6122249456/job/16633919558?pr=5239))
— `check-permissions` should not fail.
## Summary of changes
- Use `CI_ACCESS_TOKEN` to check out the repo (I expect this token will
be reused in the following `git push`)
## Problem
Add a CI pipeline that checks GitHub Workflows with
https://github.com/rhysd/actionlint (it uses `shellcheck` for shell
scripts in steps)
To run it locally: `SHELLCHECK_OPTS=--exclude=SC2046,SC2086 actionlint`
## Summary of changes
- Add `.github/workflows/actionlint.yml`
- Fix actionlint warnings
## Problem
This PR creates a GitHub release from a release tag with an
autogenerated changelog: https://github.com/neondatabase/neon/releases
## Summary of changes
- Call GitHub API to create a release
## Problem
A bunch of fixes for different test-related things
## Summary of changes
- Fix test_runner/pg_clients (`subprocess_capture` return value has
changed)
- Do not run create-test-report if check-permissions failed for not
cancelled jobs
- Fix Code Coverage comment layout after flaky tests. Add another
healing "\n"
- test_compatibility: add an instruction for local run
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
We've got `approved-for-ci-run` to work 🎉
But it's still a bit rough, this PR should improve the UX for external
contributors.
## Summary of changes
- `build_and_test.yml`: add `check-permissions` job, which fails if PR is
created from a fork. Make all jobs in the workflow to be dependant on
`check-permission` to fail fast
- `approved-for-ci-run.yml`: add `cleanup` job to close `ci-run/pr-*`
PRs and delete linked branches when the parent PR is closed
- `approved-for-ci-run.yml`: fix the layout for the `ci-run/pr-*` PR
description
- GitHub Autocomment: add a comment with tests result to the original PR
(instead of a PR from `ci-run/pr-*` )
## Problem
- `SCALE: unbound variable` from
https://github.com/neondatabase/neon/pull/5079
- The layout of the GitHub auto-comment is broken if the code coverage
section follows flaky test section from
https://github.com/neondatabase/neon/pull/4999
## Summary of changes
- `benchmarking.yml`: Rename `SCALE` to `TEST_OLAP_SCALE`
- `comment-test-report.js`: Add an extra new-line before Code coverage
section
## Problem
When the next release is coming, we want to let everyone know about it by
posting a message to the Slack channel with a list of commits.
## Summary of changes
- `.github/workflows/release-notify.yml` is added
- the workflow sends a message to
`vars.SLACK_UPCOMING_RELEASE_CHANNEL_ID` (or
[#test-release-notifications](https://neondb.slack.com/archives/C05QQ9J1BRC)
if not configured)
- On each PR update, the workflow updates the list of commits in the
message (it doesn't send additional messages)
## Problem
`CI_ACCESS_TOKEN` has quite limited access (which is good), but this
doesn't allow it to remove labels from PRs (which is bad)
## Summary of changes
- Use `GITHUB_TOKEN` to remove labels
- Use `CI_ACCESS_TOKEN` to create PRs
## Problem
Pull Requests created by GitHub Actions bot doesn't have access to
secrets, so we need to use our bot for it to auto-trigger a tests run
See previous PRs #4663, #5210, #5212
## Summary of changes
- Use our bot to create PRs
## Problem
It's hard to find out which DB size we use for OLAP benchmarks (TPC-H in
particular).
This PR adds handling of `TEST_OLAP_SCALE` env var, which is get added
to a test name as a parameter.
This is required for performing larger periodic benchmarks.
## Summary of changes
- Handle `TEST_OLAP_SCALE` in
`test_runner/performance/test_perf_olap.py`
- Set `TEST_OLAP_SCALE` in `.github/workflows/benchmarking.yml` to a
TPC-H scale
## Problem
Continuation of #4663, #5210
We're still getting an error:
```
GraphQL: Resource not accessible by integration (removeLabelsFromLabelable)
```
## Summary of changes
- trigger `approved-for-ci-run.yml` workflow on `pull_request_target`
instead of `pull_request`
## Problem
When a remote custom extension build fails, it looks a bit confusing on
neon CI:
- `trigger-custom-extensions-build` is green
- `wait-for-extensions-build` is red
- `build-and-upload-extensions` is red
But to restart the build (to get everything green), you need to restart
the only passed `trigger-custom-extensions-build`.
## Summary of changes
- Merge `trigger-custom-extensions-build` and
`wait-for-extensions-build` jobs into
`trigger-custom-extensions-build-and-wait`
## Problem
Currently, the `deploy` job doesn't wait for the custom extension job
(in another repo) and can be started even with failed extensions build.
This PR adds another job that polls the status of the extension build job
and fails if the extension build fails.
## Summary of changes
- Add `wait-for-extensions-build` job, which waits for a custom
extension build in another repo.
## Problem
We still need to rerun some builds manually because flaky tests weren't
detected automatically.
I found two reasons for it:
- If a test is flaky on a particular build type, on a particular
Postgres version, there's a high chance that this test is flaky on all
configurations, but we don't automatically detect such cases.
- We detect flaky tests only on the main branch, which requires manual
retrigger runs for freshly made flaky tests.
Both of them are fixed in the PR.
## Summary of changes
- Spread flakiness of a single test to all configurations
- Detect flaky tests in all branches (not only in the main)
- Look back only at 7 days of test history (instead of 10)
Starts `postgres` in cgroup directly from `compute_ctl` instead of from
`vm-builder`. This is required because the `vm-monitor` cannot be in the
cgroup it is managing. Otherwise, it itself would be frozen when
freezing the cgroup.
Requires https://github.com/neondatabase/cloud/pull/6331, which adds the
`AUTOSCALING` environment variable letting `compute_ctl` know to start
`postgres` in the cgroup.
Requires https://github.com/neondatabase/autoscaling/pull/468, which
prevents `vm-builder` from starting the monitor and putting postgres in
a cgroup. This will require a `VM_BUILDER_VERSION` bump.