Commit Graph

119 Commits

Author SHA1 Message Date
Sergey Melnikov
95bf19b85a Add --atomic to all helm upgrade operations (#3299)
When number of github actions workers is changed, some jobs get killed.
When helm if killed during the upgrade, release stuck in pending-upgrade
state. --atomic should initiate automatic rollback in this case.
2023-01-10 10:05:27 +00:00
Sergey Melnikov
14df37c108 Use GHA environments for gradual prod rollout (#3295)
Each release will wait for manual approval for each region
2023-01-09 20:18:16 +04:00
Sergey Melnikov
93c77b0383 Use GHA environment for per-region deploy approvals on staging (#3293)
Each main deploy will wait for manual approval for each region
2023-01-09 15:40:14 +04:00
Heikki Linnakangas
e9583db73b Remove code and test to generate flamegraph on GetPage requests. (#3257)
It was nice to have and useful at the time, but unfortunately the method
used to gather the profiling data doesn't play nicely with 'async'. PR
#3228 will turn 'get_page_at_lsn' function async, which will break the
profiling support. Let's remove it, and re-introduce some kind of
profiling later, using some different method, if we feel like we need it
again.
2023-01-03 20:11:32 +02:00
Vadim Kharitonov
0b428f7c41 Enable licenses check for 3rd-parties 2023-01-03 15:11:50 +01:00
Sergey Melnikov
c01f92c081 Fully remove old staging deploy (#3191) 2022-12-22 20:09:45 +01:00
Sergey Melnikov
7bc17b373e Fix calculate-deploy-targets (#3189)
Was broken in https://github.com/neondatabase/neon/pull/3180
2022-12-22 16:28:36 +01:00
Sergey Melnikov
5a496d82b0 Do not deploy storage and proxies to old staging (#3180)
We fully migrated out, this nodes will be soon decommissioned
2022-12-22 15:37:17 +01:00
Sergey Melnikov
707d1c1c94 Fix vm-compute-image upload to dockerhub (#3181) 2022-12-22 13:34:16 +01:00
Sergey Melnikov
f5f1197e15 Build vm-compute-node images (#3174) 2022-12-22 11:25:56 +01:00
Arseny Sher
70ce01d84d Deploy broker with L4 LB in new env. (#3125)
Seems to be fixing issue with missing keepalives.
2022-12-15 22:42:30 +01:00
Sergey Melnikov
827ee10b5a Disable neon-stress deploy (#3093) 2022-12-14 01:51:42 +01:00
Sergey Melnikov
826214ae56 Force ansible-galaxy to also use local ansible.cfg (#3091) 2022-12-13 21:06:18 +01:00
Sergey Melnikov
b39d6126bb Force ansible to use local ansible.cfg (#3089) 2022-12-13 21:57:39 +03:00
Alexander Bayandin
feb07ed510 deploy (old): replace actions/setup-python@v4 with ansible image (#3081)
Replace actions/setup-python@v4 with the ansible image to fix
```
Version 3.10 was not found in the local cache
Error: The version '3.10' with architecture 'x64' was not found for this operating system.
```
2022-12-13 14:01:29 +00:00
Sergey Melnikov
e5d523c86a Add new us-west-2 region (#3071) 2022-12-13 14:11:40 +01:00
Arseny Sher
544777e86b Fix storage_broker deploy typo. 2022-12-13 10:57:26 +03:00
Arseny Sher
e2ae4c09a6 Put e2e tag back.
32662ff1c4 required running e2e tests on patched branch of cloud repo; not
that it is merged, put the tag back.
2022-12-13 09:53:22 +03:00
Rory de Zoete
d1edc8aa00 Deprecate old runner for deploy job (#3070)
As we plan to no longer use them

Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
2022-12-12 16:55:40 +01:00
Kirill Bulatov
0aa2f5c9a5 Regroup CI testing (#3049)
Part of https://github.com/neondatabase/neon/pull/2410 and
https://github.com/neondatabase/neon/pull/2407

* adds `hashFiles('rust-toolchain.toml')` into Rust cache keys, thus
removing one of the manual steps to do when upgrading rustc
* copies Python and Rust style checks from the `codestyle.yml` workflow
* adjusts shell defaults in the main workflow
* replaces `codestyle.yml` with a `neon_extra_builds.yml` worlflow

The new workflow runs on commits to `main` (`codestyle.yml` was run per
PR), and runs two custom builds on GH agents:

* macos-latest, to ensure the entire project compiles on it (no tests
run)

There were no frequent breakages on macOs in our builds, so we can check
it rarely without making every storage PR to wait for it to complete.
The updated mac build use release builds now, so presumably should work
a bit faster due to overall smaller files to cache between builds.

* ubuntu-latest, without caches, to produce full compilation stats for
Rust builds and upload it as an artifact to GitHub

Old `clippy build --timings` stats were collected from the builds that
use caches and incremental calculation hence never could produce a full
report, it got removed.
2022-12-12 12:58:55 +02:00
Vadim Kharitonov
26f4ff949a Add sentry to storage_broker. 2022-12-12 13:30:16 +03:00
Arseny Sher
a1fd0ba23b set tag to make proper e2e tests run 2022-12-12 13:30:16 +03:00
Arseny Sher
32662ff1c4 Replace etcd with storage_broker.
This is the replacement itself, the binary landed earlier. See
docs/storage_broker.md.

ref
https://github.com/neondatabase/neon/pull/2466
https://github.com/neondatabase/neon/issues/2394
2022-12-12 13:30:16 +03:00
Arseny Sher
4d6137e0e6 Try to fix docker image tag in broker deploy. 2022-12-09 20:43:54 +03:00
Lassi Pölönen
8684b1b582 Reduce the storage-broker deployment timeout to 5 minutes. 15 minutes is (#3047)
15 minutes is way too long, at least at this point and we want to see
the possible errors quicker. Hence drop it to 5min to have some safety
margin.
2022-12-09 14:37:53 +00:00
Alexander Bayandin
5c701f9a75 merge-allure-report: create report even if benchmarks is skipped (#3029) 2022-12-08 15:13:40 +00:00
Sergey Melnikov
f5a735ac3b Add proxy and broker to us-west-2 (#3027)
Co-authored-by: Lassi Pölönen <lassi.polonen@iki.fi>
2022-12-08 12:24:24 +01:00
Lassi Pölönen
c74dca95fc Helm values for old staging and one region in new staging (#2922)
helm values for the new `storage-broker`. gRPC, over secure connection
with a proper certificate, but no authentication.

Uses alb ingress in the old cluster and nginx ingress for the new one.

The chart is deployed and the addresses are functional, while the
pipeline doesn't exist yet.
2022-12-07 14:24:07 +02:00
Kliment Serafimov
8f2b3cbded Sentry integration for storage. (#2926)
Added basic instrumentation to integrate sentry with the proxy, pageserver, and safekeeper processes.
Currently in sentry there are three projects, one for each process. Sentry url is sent to all three processes separately via cli args.
2022-12-06 18:57:54 +00:00
Sergey Melnikov
0a4e5f8aa3 Setup legacy scram proxy in us-east-2 (#2943) 2022-11-28 17:21:35 +01:00
Sergey Melnikov
aee3eb6d19 Deploy link proxy to us-east-2 (#2905) 2022-11-23 18:11:44 +01:00
Sergey Melnikov
85f0975c5a Setup eu-west-1 as region for PR testing (#2757) 2022-11-23 10:54:39 +01:00
Rory de Zoete
53267969d7 Preparation for ARM runners (#2751)
Need to make the runner tag more specific else we inadvertently might
run workloads on the wrong arch

Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
2022-11-16 11:28:57 +01:00
Alexander Bayandin
03190a2161 GitHub Actions: Do not create Allure report for cancelled jobs (#2813)
If a workflow is cancelled, do not delay its finishing by creating an allure
report.
2022-11-15 10:27:59 +00:00
Heikki Linnakangas
f30ef00439 Stop building the legacy "compute-node" docker image.
Before we had separate images for v14 and v15, the compute node image
was called just "neondatabase/compute-node". It has been superseded by
the "neondatabase/compute-node-v14" and "neondatabase/compute-node-v15"
images. The old image is not used by the cloud console build or tests
anymore.
2022-11-12 20:48:10 +02:00
Kirill Bulatov
03695261fc Test storage Docker images (#2767)
Closes https://github.com/neondatabase/neon/issues/2697
Example:
https://github.com/neondatabase/neon/actions/runs/3416774593/jobs/5688394855

Adds a set of tests on the storage Docker images before they are pushed
to the public registries:
* tests that pageserver binary has the correct version string (other
binaries are built with the same library, so it should be enough to test
one)
* tests that the compose file set-up works and all components are able
to start and perform a single SQL query (CREATE TABLE)
2022-11-11 19:42:26 +02:00
Alexander Bayandin
c4f9f1dc6d Add data format forward compatibility tests (#2766)
Add `test_forward_compatibility`, which checks if it's going to
be possible to roll back a release to the previous version.
The test uses artifacts (Neon & Postgres binaries) from the previous
release to start Neon on the repo created by the current version. It
performs exactly the same checks as `test_backward_compatibility` does.

Single `ALLOW_BREAKING_CHANGES` env var got replaced by
`ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE` &
`ALLOW_FORWARD_COMPATIBILITY_BREAKAGE` and can be set by `backward
compatibility breakage` and `forward compatibility breakage` labels
respectively.
2022-11-10 09:06:34 +00:00
Kirill Bulatov
4a10e1b066 Pass pushed storage Docker tag to e2e jobs 2022-11-10 08:50:42 +02:00
Kirill Bulatov
6df4d5c911 Bump rustc to 1.62.1 (#2728)
Changelog: https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1621-2022-07-19
2022-11-02 01:21:33 +02:00
Sergey Melnikov
7481fb082c Fix bugs in #2713 (#2716) 2022-10-28 14:12:49 +00:00
Sergey Melnikov
59a3ca4ec6 Deploy proxy to new prod regions (#2713)
* Refactor proxy deploy

* Test new prod deploy

* Remove assume role

* Add new values

* Add all regions
2022-10-28 16:25:28 +03:00
Sergey Melnikov
e86a9105a4 Deploy storage to new prod regions (#2709) 2022-10-28 10:17:27 +00:00
Rory de Zoete
6dbf202e0d Update crane copy target (#2704)
Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
2022-10-27 16:00:40 +02:00
Sergey Melnikov
a3cb8c11e0 Do not release to new staging proxies on release (#2685) 2022-10-25 23:51:23 +00:00
Alexander Bayandin
834ffe1bac Add data format backward compatibility tests (#2626) 2022-10-25 16:41:50 +02:00
Sergey Melnikov
2709878b8b Deploy scram proxies into new account (#2643) 2022-10-21 14:21:22 +03:00
Sergey Melnikov
30984c163c Fix race between pushing image to ECR and copying to dockerhub (#2662) 2022-10-20 23:01:01 +03:00
Sergey Melnikov
0cd2d91b9d Fix deploy-new job by installing sivel.toiletwater (#2641) 2022-10-18 14:44:19 +00:00
Sergey Melnikov
546e9bdbec Deploy storage into new account and migrate to management API v2 (#2619)
Deploy storage into new account
Migrate safekeeper and pageserver initialisation to management api v2
2022-10-18 15:52:15 +03:00
Lassi Pölönen
5d6553d41d Fix pageserver configuration generation bug (#2584)
* We had an issue with `lineinfile` usage for pageserver configuration
file: if the S3 bucket related values were changed, it would have
resulted in duplicate keys, resulting in invalid toml.

So to fix the issue, we should keep the configuration in structured
format (yaml in this case) so we can always generate syntactically
correct toml.

Inventories are converted to yaml just so that it's easier to maintain
the configuration there. Another alternative would have been a separate
variable files.

* Keep the ansible collections dir, but locally installed collections
should not be tracked.
2022-10-16 11:37:10 +00:00