Commit Graph

567 Commits

Author SHA1 Message Date
Fedor Dikarev
723a79159c check for empty array '[]' not empty string [] 2024-11-20 16:41:02 +01:00
Fedor Dikarev
d133f831c0 check that changes not empty for build matrix 2024-11-20 15:38:36 +01:00
Fedor Dikarev
aa19a412e2 don't depend on itself and check that it doesn't rebuild when no changes 2024-11-20 15:33:35 +01:00
Fedor Dikarev
3a3fcb3745 now all together and check for build on macos-15 2024-11-20 15:16:26 +01:00
Fedor Dikarev
441e769d67 submodule update --depth 1 2024-11-20 15:12:03 +01:00
Fedor Dikarev
1bf8857962 run git submodule command 2024-11-20 15:05:05 +01:00
Fedor Dikarev
eb8a87d2ec use sparse checkout 2024-11-20 14:46:37 +01:00
Fedor Dikarev
8c9bf3e8d4 checkout only required submodule 2024-11-20 14:41:57 +01:00
Fedor Dikarev
c4ddac3fcc Merge branch 'main' into feat/ci_workflow_build_macos_2 2024-11-20 14:36:52 +01:00
Fedor Dikarev
3bb61ce8fa don't build and run on ubuntu for tests 2024-11-20 14:35:24 +01:00
Alexander Bayandin
46beecacce CI(benchmarking): route test failures to on-call-qa-staging-stream (#9813)
## Problem

We want to keep `#on-call-staging-stream` channel close to the prod one
and redirect notifications from failing benchmarks to another channel
for investigation.

## Summary of changes
- Send notifications regarding failures in `benchmarking` job to
`#on-call-staging-stream`
- Send notifications regarding failures in `periodic_pagebench` job to
`#on-call-staging-stream`
2024-11-20 12:23:41 +00:00
Fedor Dikarev
94e4a0e2a0 update macos version for runner (#9817)
Closes: https://github.com/neondatabase/neon/issues/9816

Run MacOs builds on `macos-15`.
As `pkg-config` is bundled in runner image, don't install it with `brew`
2024-11-20 13:04:14 +01:00
Alexander Bayandin
725e0a1ac9 CI(release): create reusable workflow for releases (#9806)
## Problem

We have a bunch of duplicated code for automated releases. There will be
even more, once we have `release-compute` branch
(https://github.com/neondatabase/neon/pull/9637).

Another issue with the current `release` workflow is that it creates a
PR from the main as is. If we create 2 different releases from the
same commit, GitHub could mix up results from different PRs.

## Summary of changes
- Create a reusable workflow for releases
- Create an empty commit to differentiate releases
2024-11-19 23:03:15 +00:00
Peter Bendel
a8ac895b83 re-acquire S3 OIDC token after long running tests for report upload to S3 (#9799)
## Problem

If a benchmark or test case runs longer than the AWS OIDC token lifetime,
subsequent uploads of test reports to S3 fail - example:


https://github.com/neondatabase/neon/actions/runs/11905529176/job/33176168174#step:9:243

## Summary of changes

In actions that require access to S3 and which are invoked after a long
running python testcase we re-acquire the OIDC token explicitly.
Note that we need to pass down the aws_oicd_role_arn from the workflow
to the action because actions have no access to GitHub vars for security
reasons.
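
The failure mode can be illustrated with a small expiry check. This is only a sketch under assumptions, not the action's code: the one-hour lifetime and the three-hour test duration are made-up figures, and `token_expired` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def token_expired(acquired_at: datetime, lifetime: timedelta, now: datetime) -> bool:
    """Return True when credentials acquired at `acquired_at` are no longer valid."""
    return now >= acquired_at + lifetime


acquired = datetime(2024, 11, 19, 12, 0, tzinfo=timezone.utc)
lifetime = timedelta(hours=1)               # assumed lifetime, for illustration only
after_test = acquired + timedelta(hours=3)  # a long-running test case finishes

# The token obtained before the test is stale by upload time, so the upload
# step has to obtain a fresh one first.
needs_reacquire = token_expired(acquired, lifetime, after_test)
print(needs_reacquire)
```

Re-acquiring unconditionally before the upload, as the change does, avoids having to track the acquisition time at all.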

Sample run
https://github.com/neondatabase/neon/actions/runs/11912328276/job/33195676867
2024-11-19 18:22:51 +01:00
Peter Bendel
982cb1c15d Move logic for ingest benchmark from GitHub workflow into python testcase (#9762)
## Problem

The first version of the ingest benchmark had some parsing and reporting
logic in a shell script inside the GitHub workflow.
It is better to move that logic into a python testcase so that we can
also run it locally.

## Summary of changes

- Create new python testcase
- invoke pgcopydb inside python test case
- move the following logic into python testcase
  - determine backpressure
  - invoke pgcopydb and report its progress
  - parse pgcopydb log and extract metrics
  - insert metrics into perf test database
 
- add additional column to perf test database that can receive endpoint
ID used for pgcopydb run to have it available in grafana dashboard when
retrieving other metrics for an endpoint

## Example run


https://github.com/neondatabase/neon/actions/runs/11860622170/job/33056264386
2024-11-19 09:46:46 +00:00
Fedor Dikarev
73f494a0da will it compile on macos-14? 2024-11-19 09:46:29 +01:00
Fedor Dikarev
7ee766c8b1 runs-on: macos-15 2024-11-19 09:06:26 +01:00
Alexey Kondratov
5f0e9c9a94 feat(compute/tests): Report successful replication test runs as well (#9787)
It should increase the visibility of whether they run and pass.
2024-11-18 16:05:09 +00:00
Alexander Bayandin
913b5b7027 CI: remove separate check-build-tools-image workflow (#9708)
## Problem

We call `check-build-tools-image` twice for each workflow whenever we
use it, along with `build-build-tools-image`, once as a workflow itself,
and the second time from `build-build-tools-image`. This is not
necessary.

## Summary of changes
- Inline `check-build-tools-image` into `build-build-tools-image`
- Remove separate `check-build-tools-image` workflow
2024-11-18 13:14:28 +00:00
Peter Bendel
c3eecf6763 adapt pgvector bench to minor version upgrades of PostgreSQL (#9784)
## Problem

pgvector benchmark is failing because after PostgreSQL minor version
upgrade previous version packages are no longer available in deb
repository

[example
failure](https://github.com/neondatabase/neon/actions/runs/11875503070/job/33092787149#step:4:40)

## Summary of changes

Update postgres minor version of packages to current version

[Example run after this
change](https://github.com/neondatabase/neon/actions/runs/11888978279/job/33124614605)
2024-11-18 10:47:43 +00:00
Fedor Dikarev
57f58801af just checkout with submodules 2024-11-15 11:29:16 +01:00
Fedor Dikarev
22963c7531 checkout repo for build 2024-11-15 11:27:00 +01:00
Fedor Dikarev
fb05a2e549 fix workflow syntax 2024-11-15 11:18:59 +01:00
Fedor Dikarev
3ab7297a51 fix workflow syntax 2024-11-15 11:16:52 +01:00
Fedor Dikarev
b2cf8797b0 Merge branch 'main' into feat/ci_workflow_build_macos_2 2024-11-15 11:12:51 +01:00
Fedor Dikarev
9da0b4d228 actually run makes 2024-11-15 11:04:37 +01:00
Fedor Dikarev
fde16f8614 use batch gh-workflow-stats-action with separate table (#9722)
We found that exporting GH Workflow Runs in batch is more efficient due
to
- better utilisation of the GitHub API
- gh runners usage being rounded to minutes, so even when an ad-hoc export
is done in 5-10 seconds, we are billed for one minute of usage

So now we introduce batch exporting, with version v0.2.x of github
workflow stats exporter.
How it's expected to work now:
- every 15 minutes we query for the workflow runs created in the last 2
hours
- to avoid missing workflows that ran for more than 2 hours, every night
(00:25) we will query workflows created in the past 24 hours and export them
as well
- should we query for even longer periods?
  - let's see how it works with the current schedule
  - for longer periods, like days or weeks, it may require adjusting the
    logic and concurrency of querying data, so for now let's use the simpler
    version
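
The billing argument and the two query windows can be checked with a little arithmetic. The sketch below is illustrative only: it assumes usage is rounded up to whole minutes per job (as stated above) and uses made-up figures (40 workflow runs, 8 seconds per ad-hoc export, 20 seconds per batch export). `billed_minutes` and `created_since` are hypothetical helpers, not part of the exporter.

```python
import math
from datetime import datetime, timedelta, timezone


def billed_minutes(duration_seconds: float) -> int:
    """Round a job's duration up to whole minutes, as runner usage is billed."""
    return max(1, math.ceil(duration_seconds / 60))


def created_since(now: datetime, lookback: timedelta) -> datetime:
    """Lower bound of the 'created at' window for one batch query."""
    return now - lookback


# Hypothetical numbers, for illustration only.
runs = 40
ad_hoc_total = runs * billed_minutes(8)  # one export job per workflow run
batch_total = billed_minutes(20)         # one export job for the whole batch

now = datetime(2024, 11, 11, 20, 30, tzinfo=timezone.utc)
regular_window = created_since(now, timedelta(hours=2))   # queried every 15 minutes
nightly_window = created_since(now, timedelta(hours=24))  # nightly catch-up query

print(ad_hoc_total, batch_total)
print(regular_window.isoformat(), nightly_window.isoformat())
```

With a 15-minute interval and a 2-hour lookback, the same run falls into several consecutive windows, so the export presumably has to tolerate seeing a run more than once.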
2024-11-11 20:33:29 +00:00
Peter Bendel
8db84d9964 new ingest benchmark (#9711)
## Problem

We have no specific benchmark that tests migrating a PostgreSQL
project with existing data into Neon.
Typical steps of such a project migration are
- schema creation in the neon project
- initial COPY of relations
- creation of indexes and constraints
- vacuum analyze

## Summary of changes

Add a periodic benchmark running 9 AM UTC every day.
In each run:
- copy a 200 GiB project that has realistic schema, data, tables,
indexes and constraints from another project into
  - a new Neon project (7 CU fixed)
  - an existing tenant (but new branch and new database) that already has
    4 TiB of data
- use pgcopydb tool to automate all steps and parallelize COPY and index
creation
- parse pgcopydb output and report performance metrics in Neon
performance test database

## Logs

This benchmark has been tested first manually and then as part of the
benchmarking.yml workflow; for an example run, see

https://github.com/neondatabase/neon/actions/runs/11757679870
2024-11-11 17:51:15 +00:00
Alexander Bayandin
f510647c7e CI: retry actions/github-script for 5XX errors (#9703)
## Problem

GitHub API can return error 500, and it fails jobs that use
`actions/github-script` action.

## Summary of changes
- Add `retry: 500` to all `actions/github-script` usage
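
As a language-neutral illustration of the idea, and not of how `actions/github-script` implements it, retrying on 5XX responses can be sketched as below. `call_with_retries`, `max_attempts`, `base_delay`, and the fake `flaky` request are hypothetical names introduced for this example.

```python
import time
from typing import Callable, Tuple


def call_with_retries(
    request: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> Tuple[int, str]:
    """Call `request` until it returns a non-5XX status or attempts run out."""
    status, body = request()
    for attempt in range(1, max_attempts):
        if not 500 <= status <= 599:
            break  # success or a client error: retrying would not help
        time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
        status, body = request()
    return status, body


# A fake request that fails twice with server errors, then succeeds.
responses = iter([(500, "Internal Server Error"), (502, "Bad Gateway"), (200, "ok")])


def flaky() -> Tuple[int, str]:
    return next(responses)


result = call_with_retries(flaky, max_attempts=3)
print(result)
```

Only server-side errors are retried here; a 4XX response is returned immediately because repeating the same request is unlikely to change the outcome.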
2024-11-11 12:42:32 +00:00
Alexander Bayandin
2fcac0e66b CI(pre-merge-checks): add required checks (#9700)
## Problem
The Merge queue doesn't work because it expects certain jobs, which we
don't have in the `pre-merge-checks` workflow.
But it turns out we can just create jobs/checks with the same names in
any workflow that we run.

## Summary of changes
- Add `conclusion` jobs
- Create `neon-cloud-e2e` status check
- Add a bunch of `if`s to handle cases with no relevant changes found
and prepare the workflow to run rust checks in the future
- List the workflow in `report-workflow-stats` to collect stats about it
2024-11-09 01:02:54 +00:00
Alexander Bayandin
b6bc954c5d CI: move check codestyle python to reusable workflow and run on a merge_group (#9683)
## Problem

To prevent breaking main after the Python 3.11 PR gets merged,
we need to enable the merge queue and run the `check-codestyle-python`
job on it

## Summary of changes
- Move `check-codestyle-python` to a reusable workflow
- Run this workflow on `merge_group` event
2024-11-08 17:32:56 +00:00
JC Grünhage
027889b06c ci: use set-docker-config-dir from dev-actions (#9638)
set-docker-config-dir was replicated over multiple repositories.

The replica of this action was removed from this repository, which now
uses the version from github.com/neondatabase/dev-actions instead
2024-11-08 10:44:59 +01:00
Peter Bendel
8b3bcf71ee revert higher token expiration (#9605)
## Problem

The IAM role associated with our github action runner supports a max
token expiration which is lower than the value we tried.

## Summary of changes

Since we believe we have understood the performance regression (and
addressed it by ensuring availability zone affinity of compute and
pageserver), the job should again run in less than 5 hours, so we revert
this change instead of increasing the max session token expiration in the
IAM role, which would reduce our security.
2024-11-01 12:46:02 +01:00
Peter Bendel
51fda118f6 increase lifetime of AWS session token to 12 hours (#9590)
## Problem

clickbench regression causes clickbench to run >9 hours and the AWS
session token is expired before the run completes

## Summary of changes

extend lifetime of session token for this job to 12 hours
2024-10-31 13:34:50 +00:00
Peter Bendel
45b558f480 temporarily increase timeout for clickbench benchmark until regression is resolved (#9554)
## Problem

The clickbench job in the benchmarking workflow has a performance regression
that causes it to hit the maximum job run time.

Suspected root cause:
The project was migrated from a single pageserver to a storage controller
managed project on Oct 14th.
The regression has shown up since then.

## Summary of changes

Increase timeout of pytest to 12 hours.
Increase job timeout to 12 hours
2024-10-29 10:53:28 +00:00
Sergey Melnikov
3bad52543f We don't have legacy proxies anymore (#9544)
We don't have legacy scram proxies anymore:
cc: https://github.com/neondatabase/cloud/issues/9745
2024-10-28 16:42:35 +00:00
Tristan Partin
3d64a7ddcd Add pg_mooncake to compute-node.Dockerfile
Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-28 11:23:30 -05:00
Rahul Patil
8dd555d396 ci(proxy): Update GH action flag on proxy deployment (#9535)
## Problem

Based on a recent proxy deployment issue, we deployed another proxy
version (proxy-scram), which was not needed when deploying a specific
proxy type. We have a
[PR](https://github.com/neondatabase/infra/pull/2142) to update it on the
infra branch, and we need to update the CI in this repo, which triggers
proxy deployment.

## Summary of changes

- Update proxy deployment flag 

2024-10-28 13:17:09 +01:00
Alexander Bayandin
b8a311131e CI: remove git config --add safe.directory hack (#9391)
## Problem

We have `git config --global --add safe.directory ...` leftovers from the
past, but `actions/checkout` does it by default (since v3.0.2, we use v4)

## Summary of changes
- Remove `git config --global --add safe.directory ...` hack
2024-10-24 15:49:26 +01:00
Alexander Bayandin
163beaf9ad CI: use build-tools on Debian 12 whenever we use Neon artifact (#9463)
## Problem

```
+ /tmp/neon/pg_install/v16/bin/psql '***' -c 'SELECT version()'
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /tmp/neon/pg_install/v16/bin/psql)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/neon/pg_install/v16/bin/psql)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5)
```

## Summary of changes
- Use `build-tools:pinned-bookworm` whenever we download Neon artefact
2024-10-21 12:14:19 +01:00
Alexander Bayandin
5b37485c99 Rename dockerfiles from Dockerfile.<something> to <something>.Dockerfile (#9446)
## Problem

Our dockerfiles, for some historical reason, have unconventional names
`Dockerfile.<something>`, and some tools (like GitHub UI) fail to highlight
the syntax in them.

> Some projects may need distinct Dockerfiles for specific purposes. A
common convention is to name these `<something>.Dockerfile`

From: https://docs.docker.com/build/concepts/dockerfile/#filename

## Summary of changes
- Rename `Dockerfile.build-tools` -> `build-tools.Dockerfile`
- Rename `compute/Dockerfile.compute-node` ->
`compute/compute-node.Dockerfile`
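
The renaming rule can be expressed as a small helper. This is a sketch for illustration, not the script used for the change; `conventional_name` is a hypothetical function.

```python
from pathlib import PurePosixPath


def conventional_name(path: str) -> str:
    """Map `dir/Dockerfile.<something>` to `dir/<something>.Dockerfile`."""
    p = PurePosixPath(path)
    prefix = "Dockerfile."
    if not p.name.startswith(prefix) or p.name == prefix:
        return path  # a plain `Dockerfile` or an already conventional name
    return str(p.with_name(p.name[len(prefix):] + ".Dockerfile"))


renamed = [
    conventional_name(p)
    for p in ("Dockerfile.build-tools", "compute/Dockerfile.compute-node")
]
print(renamed)
```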
2024-10-21 09:51:12 +01:00
Cihan Demirci
bc6b8cee01 don't trigger workflows in two repos (#9340)
https://github.com/neondatabase/cloud/issues/16723
2024-10-16 10:43:48 +01:00
Tristan Partin
061ea0de7a Add jsonnetfmt targets
This should make it a little bit easier for people wanting to check if
their files are formatted correctly. Has the added bonus of making the CI
check simpler as well.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-15 20:01:13 -05:00
Tristan Partin
cf7a596a15 Generate sql_exporter config files with Jsonnet
There are quite a few benefits to this approach:

- Reduce config duplication
  - The two sql_exporter configs were super similar with just a few
    differences
- Pull SQL queries into standalone files
  - That means we could run a SQL formatter on the file in the future
  - It also means access to syntax highlighting
- In the future, run different queries for different PG versions
  - This is relevant because right now, we have queries that are failing
    on PG 17 due to catalog updates

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-15 11:18:38 -05:00
Fedor Dikarev
350ae9a9fe try to sparse-checkout 2024-10-15 00:15:06 +02:00
Fedor Dikarev
44c1f52e24 remove concurrency from build-macos 2024-10-15 00:01:46 +02:00
Fedor Dikarev
fbc6b7fae8 Merge branch 'main' into feat/ci_workflow_build_macos_2 2024-10-14 23:43:30 +02:00
Fedor Dikarev
bd517b1d60 fix build-macos.yml 2024-10-14 23:37:46 +02:00
Fedor Dikarev
10be4cbed8 fix mistype and remove unnecessary checks 2024-10-14 23:06:36 +02:00
Fedor Dikarev
f977a62727 disable build_and_test for now, run build-macos from neon_extra_build 2024-10-14 22:45:30 +02:00