When number of github actions workers is changed, some jobs get killed.
When helm if killed during the upgrade, release stuck in pending-upgrade
state. --atomic should initiate automatic rollback in this case.
## Describe your changes
Added a PR template
## Issue ticket number and link
#3162
## Checklist before requesting a review
- [ ] I have performed a self-review of my code
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
This reverts commit 56a4466d0a.
Seems that flackiness increased after this commit, while the time
decrease was a couple of seconds.
With every regular Python test spawing 1 etcd, 3 safekeepers, 1
pageserver, few CLI commands and post-run cleanup hooks, it might be
hard to run many such tests in parallel.
We could return to this later, after we consider alternative test
structure and/or CI runner structure.
It was nice to have and useful at the time, but unfortunately the method
used to gather the profiling data doesn't play nicely with 'async'. PR
#3228 will turn 'get_page_at_lsn' function async, which will break the
profiling support. Let's remove it, and re-introduce some kind of
profiling later, using some different method, if we feel like we need it
again.
I have experimented with the runner threads number, and looks like 8
threads win us a few seconds.
Bumping the thread count more did not improve the situation much:
* 20 threads were not allowed by pytest
* 16 threads were flacking quite notably
My guess would be that all pageservers, safekeepers, and other nodes we
start occupy quite much of the CPU and other resources to make this
approach more scalable.
Do not run Nightly Benchmarks on `neon-captest-new`.
This is a temporary solution to avoid spikes in the storage we consume
during the test run. To collect data for the default instance, we could
run tests weekly (i.e. not daily).
Replace actions/setup-python@v4 with the ansible image to fix
```
Version 3.10 was not found in the local cache
Error: The version '3.10' with architecture 'x64' was not found for this operating system.
```
Part of https://github.com/neondatabase/neon/pull/2410 and
https://github.com/neondatabase/neon/pull/2407
* adds `hashFiles('rust-toolchain.toml')` into Rust cache keys, thus
removing one of the manual steps to do when upgrading rustc
* copies Python and Rust style checks from the `codestyle.yml` workflow
* adjusts shell defaults in the main workflow
* replaces `codestyle.yml` with a `neon_extra_builds.yml` worlflow
The new workflow runs on commits to `main` (`codestyle.yml` was run per
PR), and runs two custom builds on GH agents:
* macos-latest, to ensure the entire project compiles on it (no tests
run)
There were no frequent breakages on macOs in our builds, so we can check
it rarely without making every storage PR to wait for it to complete.
The updated mac build use release builds now, so presumably should work
a bit faster due to overall smaller files to cache between builds.
* ubuntu-latest, without caches, to produce full compilation stats for
Rust builds and upload it as an artifact to GitHub
Old `clippy build --timings` stats were collected from the builds that
use caches and incremental calculation hence never could produce a full
report, it got removed.
Migrate Nightly Benchmarks from captest to staging.
- Migrate GitHub Workflows
- Replace `zenith-benchmarker` with regular runners
- Remove `environment` parameter from Neon GitHub Actions, add
`postgres_version`
- The only job left on captest is `neon-captest-reuse`, which will be
moved to staging after its project migration.
Ref https://github.com/neondatabase/cloud/issues/2836