268 Commits

Author SHA1 Message Date
Dmitry Rodionov
d53f9ab3eb delete timelines from s3 (#4384)
Delete data from s3 when timeline deletion is requested

## Summary of changes

UploadQueue is altered to support scheduling of delete operations in
stopped state. This looks weird, and I'm thinking whether there are
better options/refactorings for upload client to make it look better.

Probably can be part of https://github.com/neondatabase/neon/issues/4378

Deletion is implemented directly in existing endpoint because changes are not
that significant. If we want more safety we can separate those or create
feature flag for new behavior.

resolves [#4193](https://github.com/neondatabase/neon/issues/4193)

---------

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-06-08 15:01:22 +03:00
Alexander Bayandin
daa79b150f Code Coverage: store lcov report (#4358)
## Problem

In the future, we want to compare code coverage on a PR with coverage on
the main branch.
Currently, we store only code coverage HTML reports. I suggest we start
storing reports in "lcov info" format that we can use/parse in the
future. Currently, the file size is ~7 MB (it's a text-based format and
could be compressed into a ~400 KB archive).

- More about "lcov info" format:
https://manpages.ubuntu.com/manpages/jammy/man1/geninfo.1.html#files
- Part of https://github.com/neondatabase/neon/issues/3543

## Summary of changes
- Change `scripts/coverage` to output lcov coverage to
`report/lcov.info` file instead of stdout (we already upload the whole
`report/` directory to S3)
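
To illustrate why the "lcov info" format is easy to parse later, here is a minimal sketch that sums the `LF` (lines found) and `LH` (lines hit) fields of a tracefile. The `FileCoverage` class, the function names, and the sample data are hypothetical; only the `SF:`, `DA:`, `LF:`, `LH:`, and `end_of_record` markers come from the lcov format itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileCoverage:
    path: str
    lines_found: int = 0  # LF: number of instrumented lines
    lines_hit: int = 0    # LH: number of lines executed at least once


def parse_lcov(text: str) -> List[FileCoverage]:
    """Collect the per-file SF/LF/LH fields from lcov tracefile text."""
    records: List[FileCoverage] = []
    current: Optional[FileCoverage] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current = FileCoverage(path=line[3:])
        elif current is not None and line.startswith("LF:"):
            current.lines_found = int(line[3:])
        elif current is not None and line.startswith("LH:"):
            current.lines_hit = int(line[3:])
        elif current is not None and line == "end_of_record":
            records.append(current)
            current = None
    return records


def line_coverage_percent(records: List[FileCoverage]) -> float:
    """Overall line coverage across all records, in percent."""
    found = sum(r.lines_found for r in records)
    hit = sum(r.lines_hit for r in records)
    return 100.0 * hit / found if found else 0.0


# Made-up sample data, not taken from a real report.
SAMPLE = """\
TN:
SF:src/lib.rs
DA:1,4
DA:2,0
LF:2
LH:1
end_of_record
SF:src/main.rs
DA:1,1
DA:2,1
LF:2
LH:2
end_of_record
"""

records = parse_lcov(SAMPLE)
print(f"{line_coverage_percent(records):.1f}%")  # 75.0%
```

Comparing a PR against main could then be as simple as running this on both `lcov.info` files and diffing the two percentages.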
2023-05-30 14:05:41 +01:00
Em Sharnoff
ccf653c1f4 re-enable file cache integration for VM compute node (#4338)
#4155 inadvertently switched to a version of the VM builder that leaves
the file cache integration disabled by default. This re-enables the
vm-informant's file cache integration.

(as a refresher: The vm-informant is the autoscaling component that sits
inside the VM and manages postgres / compute_ctl)

See also: https://github.com/neondatabase/autoscaling/pull/265
2023-05-28 10:22:45 -07:00
Alexander Bayandin
339a3e3146 GitHub Autocomment: comment commits for branches (#4335)
## Problem

GitHub Autocomment script posts a comment only for PRs. It's harder
to debug failed tests on main or release branches.

## Summary of changes

- Change the GitHub Autocomment script to be able to post a comment to
either a PR or a commit of a branch
2023-05-26 14:49:42 +01:00
sharnoff
ae805b985d Bump vm-builder v0.7.3-alpha3 -> v0.8.0 (#4339)
Routine `vm-builder` version bump, from autoscaling repo release. You
can find the release notes here:
https://github.com/neondatabase/autoscaling/releases/tag/v0.8.0
The changes are from v0.7.2 — most of them were already included in
v0.7.3-alpha3.

Of particular note: This (finally) fixes the cgroup issues, so we should
now be able to scale up when we're about to run out of memory.

**NB:** This has the effect of limiting the DB's memory usage in a way it
wasn't limited before. We may run into issues because of that. There is
currently no way to disable that behavior, other than switching the
endpoint back to the k8s-pod provisioner.
2023-05-25 09:33:18 -07:00
Sasha Krassovsky
6052ecee07 Add connector extension to send Role/Database updates to console (#3891)
2023-05-25 12:36:57 +03:00
Alex Chi Z
f276f21636 ci: use eu-central-1 bucket (#4315)
This will probably increase the CI success rate.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-25 00:00:21 +03:00
sharnoff
7f1973f8ac bump vm-builder, use Neon-specific version (#4155)
In the v0.6.0 release, vm-builder was changed to be Neon-specific, so
it's handling all the stuff that Dockerfile.vm-compute-node used to do.

This commit bumps vm-builder to v0.7.3-alpha3.
2023-05-23 15:20:20 -07:00
Alexander Bayandin
3837fca7a2 compute-node-image: fix postgis download (#4280)
## Problem

`osgeo.org` is experiencing DNS resolution problems, which
break `compute-node-image` (because it can't download postgis)

## Summary of changes
- Add `140.211.15.30 download.osgeo.org` to /etc/hosts by passing it via
the container option
2023-05-19 15:34:22 +01:00
Alexander Bayandin
1b2ece3715 Re-enable compatibility tests on Postgres 15 (#4274)
- Enable compatibility tests for Postgres 15
- Also add `PgVersion::v_prefixed` property to return the version number
with, _guess what,_ v-prefix!
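
As a rough illustration of such a property, here is a minimal sketch. The enum members and their values are assumptions made for this example; the real `PgVersion` class may be defined differently.

```python
from enum import Enum


class PgVersion(str, Enum):
    # Hypothetical members; the real enum may have others.
    V14 = "14"
    V15 = "15"

    @property
    def v_prefixed(self) -> str:
        """Return the version number with a leading 'v', e.g. 'v15'."""
        return f"v{self.value}"


print(PgVersion.V15.v_prefixed)  # v15
```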
2023-05-18 19:56:09 +01:00
Alexander Bayandin
30fe310602 Code Coverage: upload reports to S3 (#4256)
## Problem

`neondatabase/zenith-coverage-data` is too big:
- It takes ~6 minutes to clone and push the repo
- GitHub fails to publish an HTML report to github.io

Part of https://github.com/neondatabase/neon/issues/3543

## Summary of changes
Replace pushing code coverage report to
`neondatabase/zenith-coverage-data` with uploading it to S3
2023-05-17 11:30:07 +01:00
Alexander Bayandin
131343ed45 Fix regress-tests job for Postgres 15 on release branch (#4253)
## Problem

Compatibility tests don't support Postgres 15 yet, but we're still
trying to upload compatibility snapshot (which we do not collect).

Ref
https://github.com/neondatabase/neon/actions/runs/4991394158/jobs/8940369368#step:4:38129

## Summary of changes

Add `pg_version` parameter to `run-python-test-set` actions and do not
upload compatibility snapshot for Postgres 15
2023-05-16 17:18:56 +01:00
Alexander Bayandin
a65e0774a5 Increase shared memory size for regression test run (#4232)
Should fix flakiness caused by the error
```
FATAL:  could not resize shared memory segment "/PostgreSQL.3944613150" to 1048576 bytes: No space left on device
```
2023-05-16 14:06:47 +01:00
Alexander Bayandin
0322e2720f Nightly Benchmarks: add neonvm to pgbench-compare (#4225) 2023-05-16 12:46:28 +01:00
Alexander Bayandin
bb06d281ea Run regressions tests on both Postgres 14 and 15 (#4192)
This PR adds test runs on Postgres 15 and creates a unified Allure report
with results for all tests.

- Split `.github/actions/allure-report` into
`.github/actions/allure-report-store` and
`.github/actions/allure-report-generate`
- Add debug or release pytest parameter for all tests (depending on
`BUILD_TYPE` env variable)
- Add Postgres version as a pytest parameter for all tests (depending on
`DEFAULT_PG_VERSION` env variable)
- Fix `test_wal_restore` and `restore_from_wal.sh` to support paths with
`[`/`]` in them (fixed by applying shellcheck to the script and fixing all
warnings); `restore_from_wal_archive.sh` is deleted as unused.
- All known failures on Postgres 15 marked with xfail
2023-05-12 15:28:51 +01:00
Sergey Melnikov
0d3d022eb1 Remove deploy workflows (#4157)
## Describe your changes
Removing deploy workflows (moving to aws repo)
2023-05-08 17:30:16 +02:00
Gleb Novikov
9860d59aa2 Public docker image repository by default 2023-05-08 15:51:54 +04:00
Alexander Bayandin
b114ef26c2 GitHub Autocomment: add a note if no tests were run (#4109)
- Always (if not cancelled) add a comment to a PR
- Mention in the comment if no tests were run / reports were not
generated.
2023-05-03 15:38:49 +01:00
Anton Chaporgin
db81242f4a add debug to pg-sni-router install (#4143) 2023-05-03 16:14:16 +03:00
Sergey Melnikov
093fafd6bd Deploy pg-sni-router (#4132) 2023-05-01 17:18:45 +02:00
Christian Schwarz
5b911e1f9f build: run clippy for powerset of features (#4077)
This will catch compiler & clippy warnings in all feature combinations.

We should probably use cargo hack for build and test as well, but,
that's quite expensive and would add to overall CI wait times.
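
To make the cost concrete, here is a small sketch of what a feature powerset looks like and how quickly it grows. The feature names are made up for illustration; cargo hack enumerates the real ones from `Cargo.toml`.

```python
from itertools import chain, combinations


def feature_powerset(features):
    """Return every subset of `features`, from the empty set to all of them."""
    items = sorted(features)
    return list(
        chain.from_iterable(combinations(items, n) for n in range(len(items) + 1))
    )


# Hypothetical feature names, for illustration only.
features = ["failpoints", "profiling", "testing"]
combos = feature_powerset(features)
print(len(combos))  # 8, i.e. 2 ** 3
```

Each extra feature doubles the number of combinations, which is why running the full build and test suite for every combination would add noticeably to CI time.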

obsoletes https://github.com/neondatabase/neon/pull/4073
refs https://github.com/neondatabase/neon/pull/4070
2023-04-27 15:01:27 +03:00
Sergey Melnikov
9d0cf08d5f Fix new storage-broker deploy for eu-central-1 (#4079) 2023-04-26 10:29:44 +03:00
Alexander Bayandin
2d6fd72177 GitHub Workflows: Fix crane for several registries (#4076)
Follow-up fix after https://github.com/neondatabase/neon/pull/4067

```
+ crane tag neondatabase/vm-compute-node-v14:3064 latest
Error: fetching "neondatabase/vm-compute-node-v14:3064": GET https://index.docker.io/v2/neondatabase/vm-compute-node-v14/manifests/3064: MANIFEST_UNKNOWN: manifest unknown; unknown tag=3064
```

I reverted to the previous approach for promoting images
(login to one registry, save images to local fs, logout and login to
another registry, and push images from local fs). It turns out that what
works for one Google project (kaniko) doesn't work for another (crane)
[sigh]
2023-04-25 23:58:59 +01:00
Alexander Bayandin
05ac0e2493 Login to ECR and Docker Hub at once (#4067)
- Update kaniko to 1.9.2 (from 1.7.0), problem with reproducible build is fixed
- Login to ECR and Docker Hub at once, so we can push to several
registries; this makes the `push-docker-hub` job unneeded
- `push-docker-hub` is replaced with `promote-images` in `needs:` clauses;
pushing images to production ECR is moved to the `promote-images` job
2023-04-25 17:54:10 +01:00
Sergey Melnikov
78bbbccadb Deploy proxies for preview environments (#4052)
## Describe your changes
Deploy `main` proxies to the preview environments
We don't deploy storage there yet, as it's tricky.

## Issue ticket number and link
https://github.com/neondatabase/cloud/issues/4737
2023-04-25 16:46:52 +02:00
Cihan Demirci
0bfbae2d73 Add storage broker deployment to us-east-1 (#4048) 2023-04-18 18:41:09 +03:00
Cihan Demirci
0c083564ce Add us-east-1 hosts file and update regions (#4042)
2023-04-17 15:25:27 +03:00
Alexander Bayandin
13e53e5dc8 GitHub Workflows: use '!cancelled' instead of 'success or failure' 2023-04-12 15:22:18 +01:00
Alexander Bayandin
c94b8998be GitHub Workflows: print error messages to stderr 2023-04-12 15:22:18 +01:00
Alexander Bayandin
218062ceba GitHub Workflows: use ref_name instead of ref 2023-04-12 15:22:18 +01:00
Alexander Bayandin
c79d5a947c Nightly Benchmarks: run third-party benchmarks once a week (#3987) 2023-04-11 10:58:04 +01:00
Alexander Bayandin
818e341af0 Nightly Benchmarks: replace neon-captest-prefetch with -new/-reuse (#3970)
We have enabled prefetch by default, let's use this in Nightly
Benchmarks:
- effective_io_concurrency=100 by default (instead of 32)
- maintenance_io_concurrency=100 by default (instead of 32)

Rename `neon-captest-prefetch` to `neon-captest-new` (for pgbench with
initialisation) and `neon-captest-reuse` (for OLAP scenarios)
2023-04-09 12:52:49 +01:00
Dmitry Rodionov
b45c92e533 tests: exclude compatibility tests by default (#3975)
This allows skipping compatibility tests based on the `CHECK_ONDISK_DATA_COMPATIBILITY` environment variable. When the variable is missing (the default), compatibility tests won't be run.
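
A minimal sketch of such a gate follows. It assumes that the mere presence of the variable enables the tests, whatever its value. That is an assumption made for illustration, and the helper name is hypothetical.

```python
import os
from typing import Mapping

ENV_VAR = "CHECK_ONDISK_DATA_COMPATIBILITY"


def compatibility_tests_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when the opt-in variable is set (assumption: any value counts)."""
    return ENV_VAR in env


print(compatibility_tests_enabled({}))                 # False
print(compatibility_tests_enabled({ENV_VAR: "true"}))  # True
```

A test module could feed this result into a skip condition so that the tests are skipped unless the variable is set.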
2023-04-06 21:21:39 +03:00
Alexander Bayandin
4d64edf8a5 Nightly Benchmarks: Add free tier sized compute (#3969)
- Add support for VMs and CU
- Add free tier limited benchmark (0.25 CU)
- Ensure we use 1 CU by default for pgbench workload
2023-04-06 19:18:24 +03:00
Alexander Bayandin
9310949b44 GitHub Autocomment: Retry on server errors (#3958)
Retry posting/updating a comment in case of 5XX errors from GitHub API
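
The idea can be sketched as follows. This is not the Autocomment script itself: `ApiError`, `with_retries`, the attempt count, and the exponential backoff are all assumptions made for illustration.

```python
import time


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed API call."""

    def __init__(self, status: int):
        super().__init__(f"API request failed with status {status}")
        self.status = status


def with_retries(request, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `request`, retrying only on 5XX errors with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except ApiError as err:
            is_server_error = 500 <= err.status < 600
            # Client errors (4XX) and the final failed attempt are re-raised.
            if not is_server_error or attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.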
2023-04-05 22:08:06 +03:00
Alexander Bayandin
1d23b5d1de Comment PR with test results (#3907)
This PR adds posting a comment with test results. Each workflow run
updates the comment with new results.
The layout and the information that we post can be changed to suit our needs.
Right now, it contains failed tests and tests that change status after a
rerun (i.e. flaky tests).
2023-04-04 12:22:47 +01:00
Alexander Bayandin
105b8bb9d3 test_runner: automatically rerun flaky tests (#3880)
This PR adds a plugin that automatically reruns (up to 3 times) flaky
tests. Internally, it uses data from `TEST_RESULT_CONNSTR` database and
`pytest-rerunfailures` plugin.

As a first approximation, we consider a test flaky if it has failed on
the main branch in the last 10 days.

Flaky tests are fetched by `scripts/flaky_tests.py` script (it's
possible to use it in a standalone mode to learn which tests are flaky),
stored to a JSON file, and then the file is passed to the pytest plugin.
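
The selection rule described above can be sketched like this. The record layout and function name are hypothetical and do not reflect the actual schema behind `TEST_RESULT_CONNSTR` or the code in `scripts/flaky_tests.py`.

```python
import json
from datetime import datetime, timedelta, timezone


def select_flaky_tests(results, now, window_days=10):
    """Return sorted names of tests that failed on main within the window."""
    cutoff = now - timedelta(days=window_days)
    return sorted(
        {
            r["test"]
            for r in results
            if r["branch"] == "main"
            and r["status"] == "failed"
            and r["finished_at"] >= cutoff
        }
    )


# Made-up result rows, for illustration only.
now = datetime(2023, 4, 4, tzinfo=timezone.utc)
results = [
    {"test": "test_a", "branch": "main", "status": "failed", "finished_at": now - timedelta(days=2)},
    {"test": "test_b", "branch": "main", "status": "failed", "finished_at": now - timedelta(days=30)},
    {"test": "test_c", "branch": "feature", "status": "failed", "finished_at": now - timedelta(days=1)},
    {"test": "test_d", "branch": "main", "status": "passed", "finished_at": now - timedelta(days=1)},
]

flaky = select_flaky_tests(results, now)
print(json.dumps(flaky))  # ["test_a"]
```

The resulting JSON list is the kind of file a pytest plugin could read in order to mark the listed tests for reruns.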
2023-04-04 12:21:54 +01:00
Alexander Bayandin
75ffe34b17 check-macos-build: fix cache key (#3926)
We don't have `${{ matrix.build_type }}` there, so it gets resolved to
an empty substring and looks like this

[`v1-macOS--pg-f8a650e49b06d39ad131b860117504044b01f312-dcccd010ff851b9f72bb451f28243fa3a341f07028034bbb46ea802413b36d80`](https://github.com/neondatabase/neon/actions/runs/4575422427/jobs/8078231907#step:26:2)
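
The double hyphen in that key is what an empty placeholder produces. A minimal sketch, assuming a key template of this shape (the template and the argument names are guesses for illustration, not the workflow's actual expression):

```python
def cache_key(os_name: str, build_type: str, pg_hash: str, files_hash: str) -> str:
    """Build a cache key from its parts (hypothetical template)."""
    return f"v1-{os_name}-{build_type}-pg-{pg_hash}-{files_hash}"


# An undefined matrix value behaves like an empty string in the key.
print(cache_key("macOS", "", "abc", "def"))         # v1-macOS--pg-abc-def
print(cache_key("macOS", "release", "abc", "def"))  # v1-macOS-release-pg-abc-def
```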
2023-03-31 21:45:59 +03:00
Joonas Koivunen
d0711d0896 build: fix git perms for deploy job (#3921)
copy pasted from `build-neon` job. it is interesting that this is only
needed by `build-neon` and `deploy`.

Fixes:
https://github.com/neondatabase/neon/actions/runs/4568077915/jobs/8070960178
which seems to have been going for a while.
2023-03-31 16:05:15 +03:00
Kirill Bulatov
9d714a8413 Split $CARGO_FLAGS and $CARGO_FEATURES to make e2e tests work 2023-03-29 00:08:30 +03:00
Kirill Bulatov
6c84cbbb58 Run new Rust IT test in CI 2023-03-29 00:08:30 +03:00
Vadim Kharitonov
e3cbcc2ea7 Revert "Add neondatabase/release team as a default reviewers for storage"
This reverts commit daeaa767c4.
2023-03-27 14:10:18 +04:00
Shany Pozin
809acb5fa9 Move neon-image-depot to a larger runner (#3860)
## Describe your changes
https://neondb.slack.com/archives/C039YKBRZB4/p1679413279637059
2023-03-21 19:32:36 +02:00
Alexander Bayandin
3d869cbcde Replace flake8 and isort with ruff (#3810)
- Introduce ruff (https://beta.ruff.rs/) to replace flake8 and isort
- Update mypy and black
2023-03-14 13:25:44 +00:00
Vadim Kharitonov
daeaa767c4 Add neondatabase/release team as a default reviewers for storage releases
2023-03-13 13:40:15 +01:00
Rory de Zoete
3c4f5af1b9 Try depot.dev for image building (#3768)
To see if it is faster. Run side-by-side for a while so we can gather
enough data.
2023-03-10 11:11:39 +01:00
Sergey Melnikov
2caece2077 Add -v to ansible invocations (#3670)
To get more debug output on failures
2023-02-21 23:11:52 +03:00
sharnoff
2153d2e00a Run compute_ctl in a cgroup in VMs (#3577) 2023-02-17 14:14:41 -08:00
Sergey Melnikov
a1b062123b Do not deploy storage to old account (#3630)
It's gone
2023-02-16 20:28:53 +00:00
Sergey Melnikov
c5c14368e3 Fix deploy-prod.yml syntax (#3556) 2023-02-07 15:27:31 +01:00