Commit Graph

53 Commits

Author SHA1 Message Date
John Spray
3cecbfc04d .github: reduce test concurrency (#8444)
## Problem

This is an experiment to see if 16x concurrency is actually helping, or
if it's just giving us very noisy results. If the total runtime with a
lower concurrency is similar, then a lower concurrency is preferable to
reduce the impact of resource-hungry tests running concurrently.
2024-07-26 11:55:37 +01:00
Tristan Partin
1c57f6bac3 Add long running replication tests
These tests will help verify that replication, both physical and
logical, works as expected in Neon.

Co-authored-by: Sasha Krassovsky <sasha@neon.tech>
2024-07-08 07:30:22 -07:00
Alexander Bayandin
6216df7765 CI(benchmarking): move psql queries to actions/run-python-test-set (#8230)
## Problem

Some of the Nightly benchmarks fail with the error
```
+ /tmp/neon/pg_install/v14/bin/pgbench --version
/tmp/neon/pg_install/v14/bin/pgbench: error while loading shared libraries: libpq.so.5: cannot open shared object file: No such file or directory
```
Originally, we added the `pgbench --version` call to check that
`pgbench` is installed and to fail earlier if it's not.
The failure happens because we don't have `LD_LIBRARY_PATH` set for
every job, and it also affects `psql` command.
We can move it to `actions/run-python-test-set` so as not to duplicate
code (as it already have `LD_LIBRARY_PATH` set).

## Summary of changes
- Remove `pgbench --version` call
- Move `psql` commands to common `actions/run-python-test-set`
2024-07-02 15:21:23 +00:00
Alexander Bayandin
e823b92947 CI(build-tools): Remove libpq from build image (#8206)
## Problem
We use `build-tools` image as a base image to build other images, and it
has a pretty old `libpq-dev` installed (v13; it wasn't that old until I
removed system Postgres 14 from `build-tools` image in
https://github.com/neondatabase/neon/pull/6540)

## Summary of changes
- Remove `libpq-dev` from `build-tools` image
- Set `LD_LIBRARY_PATH` for tests (for different Postgres binaries that
we use, like psql and pgbench)
- Set `PQ_LIB_DIR` to build Storage Controller
- Set `LD_LIBRARY_PATH`/`DYLD_LIBRARY_PATH` in the Storage Controller
where it calls Postgres binaries
2024-07-01 13:11:55 +01:00
Alexander Bayandin
54a06de4b5 CI: Use runner.arch in cache keys along with runner.os (#8175)
## Problem
The cache keys that we use on CI are the same for X64 and ARM64
(`runner.arch`)

## Summary of changes
- Include `runner.arch` along with `runner.os` into cache keys
2024-06-27 13:56:03 +01:00
Alexander Bayandin
c789ec21f6 CI: miscellaneous cleanups (#8073)
## Problem
There are a couple of small CI cleanups that seem too small for dedicated PRs

## Summary of changes
- Create release PR with the title that matches the title in the description
- Tune error message for disallowing `ubuntu-latest` to explicitly
mention what to do
- Remove junit output from pytest, we use allure instead
2024-06-19 19:21:09 +01:00
Alexander Bayandin
feb359b459 CI: Update deprecated GitHub Actions (#6822)
## Problem

We use a bunch of deprecated actions.
See https://github.com/neondatabase/neon/actions/runs/7958569728
(Annotations section)

```
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/setup-java@v3, actions/cache@v3, actions/github-script@v6. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
```

## Summary of changes
- `actions/cache@v3` -> `actions/cache@v4`
- `actions/checkout@v3` -> `actions/checkout@v4`
- `actions/github-script@v6` -> `actions/github-script@v7`
- `actions/setup-java@v3` -> `actions/setup-java@v4`
- `actions/upload-artifact@v3` -> `actions/upload-artifact@v4`
2024-02-19 21:46:22 +00:00
Alexander Bayandin
f4cc7cae14 CI(build-tools): Update Python from 3.9.2 to 3.9.18 (#6615)
## Problem

We use an outdated version of Python (3.9.2)

## Summary of changes
- Update Python to the latest patch version (3.9.18)
- Unify the usage of python caches where possible
2024-02-06 20:30:43 +00:00
Alexander Bayandin
e65f0fe874 CI(benchmarks): make job split consistent across reruns (#6614)
## Problem

We've got several issues with the current `benchmarks` job setup:
- `benchmark_durations.json` file (that we generate in runtime to
split tests into several jobs[0]) is not consistent between these
jobs (and very not consistent with the file if we rerun the job). I.e.
test selection for each job can be different, which could end up in
missed tests in a test run.
- `scripts/benchmark_durations` doesn't fetch all tests from the
database (it doesn't expect any extra directories inside
`test_runner/performance`)
- For some reason, currently split into 4 groups ends up with the 4th
group has no tests to run, which fails the job[1]

- [0] https://github.com/neondatabase/neon/pull/4683
- [1] https://github.com/neondatabase/neon/issues/6629

## Summary of changes
- Generate `benchmark_durations.json` file once before we start
`benchmarks` jobs (this makes it consistent across the jobs) and pass
the file content through the GitHub Actions input (this makes it
consistent for reruns)
- `scripts/benchmark_durations` fix SQL query for getting all required
tests
- Split benchmarks into 5 jobs instead of 4 jobs.
2024-02-06 17:00:55 +00:00
MMeent
83e7e5dbbd Feat/postgres 16 (#4761)
This adds PostgreSQL 16 as a vendored postgresql version, and adapts the
code to support this version. 
The important changes to PostgreSQL 16 compared to the PostgreSQL 15
changeset include the addition of a neon_rmgr instead of altering Postgres's
original WAL format.

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2023-09-12 15:11:32 +02:00
Alexander Bayandin
7e39a96441 scripts/flaky_tests.py: Improve flaky tests detection (#5094)
## Problem

We still need to rerun some builds manually because flaky tests weren't
detected automatically.
I found two reasons for it:
- If a test is flaky on a particular build type, on a particular
Postgres version, there's a high chance that this test is flaky on all
configurations, but we don't automatically detect such cases.
- We detect flaky tests only on the main branch, which requires manual
retrigger runs for freshly made flaky tests.
Both of them are fixed in the PR.

## Summary of changes
- Spread flakiness of a single test to all configurations
- Detect flaky tests in all branches (not only in the main)
- Look back only at  7 days of test history (instead of 10)
2023-08-29 11:53:24 +01:00
Alexander Bayandin
b98419ee56 Fix allure report overwriting for different Postgres versions (#4806)
## Problem

We've got an example of Allure reports from 2 different runners for the
same build that started to upload at the exact second, making one
overwrite another

## Summary of changes
- Use the Postgres version to distinguish artifacts (along with the
build type)
2023-07-26 15:19:18 +01:00
Alexander Bayandin
4580f5085a test_runner: run benchmarks in parallel (#4683)
## Problem

Benchmarks run takes about an hour on main branch (in a single job),
which delays pipeline results. And it takes another hour if we want to
restart the job due to some failures.
 
## Summary of changes
- Use `pytest-split` plugin to run benchmarks on separate CI runners in
4 parallel jobs
- Add `scripts/benchmark_durations.py` for getting benchmark durations
from the database to help `pytest-split` schedule tests more evenly. It
uses p99 for the last 10 days' results (durations).

The current distribution could be better; each worker's durations vary
from 9m to 35m, but this could be improved in consequent PRs.
2023-07-17 20:09:45 +01:00
Alex Chi Z
f276f21636 ci: use eu-central-1 bucket (#4315)
Probably increase CI success rate.

---------

Signed-off-by: Alex Chi <iskyzh@gmail.com>
2023-05-25 00:00:21 +03:00
Alexander Bayandin
1b2ece3715 Re-enable compatibility tests on Postgres 15 (#4274)
- Enable compatibility tests for Postgres 15
- Also add `PgVersion::v_prefixed` property to return the version number
with, _guess what,_ v-prefix!
2023-05-18 19:56:09 +01:00
Alexander Bayandin
131343ed45 Fix regress-tests job for Postgres 15 on release branch (#4253)
## Problem

Compatibility tests don't support Postgres 15 yet, but we're still
trying to upload compatibility snapshot (which we do not collect).

Ref
https://github.com/neondatabase/neon/actions/runs/4991394158/jobs/8940369368#step:4:38129

## Summary of changes

Add `pg_version` parameter to `run-python-test-set` actions and do not
upload compatibility snapshot for Postgres 15
2023-05-16 17:18:56 +01:00
Alexander Bayandin
a5615bd8ea Fix Allure reports for different benchmark jobs (#4229)
- Fix Allure report generation failure for Nightly Benchmarks
- Fix GitHub Autocomment for `run-benchmarks` label
(`build_and_test.yml::benchmarks` job)
2023-05-15 13:04:03 +01:00
Alexander Bayandin
bb06d281ea Run regressions tests on both Postgres 14 and 15 (#4192)
This PR adds tests runs on Postgres 15 and created unified Allure report
with results for all tests.

- Split `.github/actions/allure-report` into
`.github/actions/allure-report-store` and
`.github/actions/allure-report-generate`
- Add debug or release pytest parameter for all tests (depending on
`BUILD_TYPE` env variable)
- Add Postgres version as a pytest parameter for all tests (depending on
`DEFAULT_PG_VERSION` env variable)
- Fix `test_wal_restore` and `restore_from_wal.sh` to support path with
`[`/`]` in it (fixed by applying spellcheck to the script and fixing all
warnings), `restore_from_wal_archive.sh` is deleted as unused.
- All known failures on Postgres 15 marked with xfail
2023-05-12 15:28:51 +01:00
Alexander Bayandin
13e53e5dc8 GitHub Workflows: use '!cancelled' instead of 'success or failure' 2023-04-12 15:22:18 +01:00
Alexander Bayandin
105b8bb9d3 test_runner: automatically rerun flaky tests (#3880)
This PR adds a plugin that automatically reruns (up to 3 times) flaky
tests. Internally, it uses data from `TEST_RESULT_CONNSTR` database and
`pytest-rerunfailures` plugin.

As the first approximation we consider the test flaky if it has failed on 
the main branch in the last 10 days.

Flaky tests are fetched by `scripts/flaky_tests.py` script (it's
possible to use it in a standalone mode to learn which tests are flaky),
stored to a JSON file, and then the file is passed to the pytest plugin.
2023-04-04 12:21:54 +01:00
Rory de Zoete
cd5732d9d8 Gen3 runners (#3220)
https://github.com/neondatabase/cloud/issues/2738

Co-authored-by: Rory de Zoete <rdezoete@Rorys-Mac-Studio.fritz.box>
Co-authored-by: Rory de Zoete <rdezoete@RorysMacStudio.fritz.box>
2023-01-26 10:46:06 +01:00
Kirill Bulatov
8932d14d50 Revert "Run Python tests in 8 threads (#3206)" (#3264)
This reverts commit 56a4466d0a.

Seems that flackiness increased after this commit, while the time
decrease was a couple of seconds.
With every regular Python test spawing 1 etcd, 3 safekeepers, 1
pageserver, few CLI commands and post-run cleanup hooks, it might be
hard to run many such tests in parallel.

We could return to this later, after we consider alternative test
structure and/or CI runner structure.
2023-01-04 17:31:51 +02:00
Kirill Bulatov
56a4466d0a Run Python tests in 8 threads (#3206)
I have experimented with the runner threads number, and looks like 8
threads win us a few seconds.

Bumping the thread count more did not improve the situation much:
* 20 threads were not allowed by pytest
* 16 threads were flacking quite notably

My guess would be that all pageservers, safekeepers, and other nodes we
start occupy quite much of the CPU and other resources to make this
approach more scalable.
2023-01-02 14:34:06 +02:00
Alexander Bayandin
03190a2161 GitHub Actions: Do not create Allure report for cancelled jobs (#2813)
If a workflow is cancelled, do not delay its finishing by creating an allure
report.
2022-11-15 10:27:59 +00:00
Alexander Bayandin
175779c0ef GitHub Actions: fix non-parallel benchmarks on CI (#2787)
Fix non-parallel pytest run by setting `--dist=loadgroup` only for
pytest command with xdist enabled (`-n` is set)
2022-11-10 12:51:47 +00:00
Alexander Bayandin
c4f9f1dc6d Add data format forward compatibility tests (#2766)
Add `test_forward_compatibility`, which checks if it's going to
be possible to roll back a release to the previous version.
The test uses artifacts (Neon & Postgres binaries) from the previous
release to start Neon on the repo created by the current version. It
performs exactly the same checks as `test_backward_compatibility` does.

Single `ALLOW_BREAKING_CHANGES` env var got replaced by
`ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE` &
`ALLOW_FORWARD_COMPATIBILITY_BREAKAGE` and can be set by `backward
compatibility breakage` and `forward compatibility breakage` labels
respectively.
2022-11-10 09:06:34 +00:00
Alexander Bayandin
128dc8d405 Nightly Benchmarks: fix workflow (#2708) 2022-10-27 19:26:10 +03:00
Alexander Bayandin
834ffe1bac Add data format backward compatibility tests (#2626) 2022-10-25 16:41:50 +02:00
Alexander Bayandin
3e65209a06 Nightly Benchmarks: use Postgres binaries from artifacts (#2501) 2022-09-23 12:50:36 +01:00
Anastasia Lubennikova
eb9200abc8 Use version-specific path in pytest CI script 2022-09-22 18:12:41 +03:00
Anastasia Lubennikova
1fa7d6aebf Use DEFAULT_PG_VERSION env in CI pytest 2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
d45de3d58f update build scripts to match pg_distrib_dir versioning schema 2022-09-22 14:15:13 +03:00
Alexander Bayandin
bb3c66d86f github/workflows: Make publishing perf reports more configurable (#2440) 2022-09-19 22:28:51 +00:00
Kirill Bulatov
18dafbb9ba Remove deceiving rust version from the CI files 2022-09-09 22:32:00 +03:00
Anastasia Lubennikova
05e263d0d3 Prepare pg 15 support (build system and submodules) (#2337)
* Add submodule postgres-15

* Support pg_15 in pgxn/neon

* Renamed zenith -> neon in Makefile

* fix name of codestyle check

* Refactor build system to prepare for building multiple Postgres versions.

Rename "vendor/postgres" to "vendor/postgres-v14"

Change Postgres build and install directory paths to be version-specific:

- tmp_install/build -> pg_install/build/14
- tmp_install/* -> pg_install/14/*

And Makefile targets:

- "make postgres" -> "make postgres-v14"
- "make postgres-headers" -> "make postgres-v14-headers"
- etc.

Add Makefile aliases:

- "make postgres" to build "postgres-v14" and in future, "postgres-v15"
- similarly for "make postgres-headers"

Fix POSTGRES_DISTRIB_DIR path in pytest scripts

* Make postgres version a variable in codestyle workflow

* Support vendor/postgres-v15 in codestyle check workflow

* Support postgres-v15 building in Makefile

* fix pg version in Dockerfile.compute-node

* fix kaniko path

* Build neon extensions in version-specific directories

* fix obsolete mentions of vendor/postgres

* use vendor/postgres-v14 in Dockerfile.compute-node.legacy

* Use PG_VERSION_NUM to gate dependencies in inmem_smgr.c

* Use versioned ECR repositories and image names for compute-node.
The image name format is compute-node-vXX, where XX is postgres major version number.
For now only v14 is supported.
Old format unversioned name (compute-node) is left, because cloud repo depends on it.

* update vendor/postgres submodule url (zenith->neondatabase rename)

* Fix postgres path in python tests after rebase

* fix path in regress test

* Use separate dockerfiles to build compute-node:
Dockerfile.compute-node-v15 should be identical to Dockerfile.compute-node-v14 except for the version number.
This is a hack, because Kaniko doesn't support build ARGs properly

* bump vendor/postgres-v14 and vendor/postgres-v15

* Don't use Kaniko cache for v14 and v15 compute-node images

* Build compute-node images for different versions in different jobs

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2022-09-05 18:30:54 +03:00
Alexander Bayandin
ee0071e90d Fix nightly benchmark reports (#2392) 2022-09-05 14:30:37 +01:00
Sergey Melnikov
46c8a93976 Fix PERF_TEST_RESULT_CONNSTR for benchmark init (#2375) 2022-09-01 15:06:52 +03:00
Alexander Bayandin
d7c9cfe7bb Create Allure report for perf tests (#2326) 2022-08-31 16:15:26 +01:00
Heikki Linnakangas
3aca717f3d Reorganize python tests.
Merge batch_others and batch_pg_regress. The original idea was to
split all the python tests into multiple "batches" and run each batch
in parallel as a separate CI job. However, the batch_pg_regress batch
was pretty short compared to all the tests in batch_others. We could
split batch_others into multiple batches, but it actually seems better
to just treat them as one big pool of tests and use pytest's handle
the parallelism on its own. If we need to split them across multiple
nodes in the future, we could use pytest-shard or something else,
instead of managing the batches ourselves.

Merge test_neon_regress.py, test_pg_regress.py and test_isolation.py
into one file, test_pg_regress.py. Seems more clear to group all
pg_regress-based tests into one file, now that they would all be in
the same directory.
2022-08-30 18:25:38 +03:00
Alexander Bayandin
277f2d6d3d Report test results to Allure (#2229) 2022-08-22 11:21:50 +01:00
Thang Pham
dc52436a8f Fix bug when import large (>1GB) relations (#2172)
Resolves #2097 

- use timeline modification's `lsn` and timeline's `last_record_lsn` to determine the corresponding LSN to query data in `DatadirModification::get`
- update `test_import_from_pageserver`. Split the test into 2 variants: `small` and `multisegment`. 
  + `small` is the old test
  + `multisegment` is to simulate #2097 by using a larger number of inserted rows to create multiple segment files of a relation. `multisegment` is configured to only run with a `release` build
2022-08-12 09:24:20 +07:00
Alexander Bayandin
4cb1074fe5 github/workflows: Fix git dubious ownership (#2223) 2022-08-05 13:44:57 +01:00
Dmitry Rodionov
bc2cb5382b run real s3 tests in CI 2022-08-04 11:14:05 +03:00
Alexander Bayandin
71f39bac3d github/workflows: upload artifacts to S3 (#2071) 2022-08-02 13:57:26 +01:00
Alexander Bayandin
539007c173 github/workflows: make bash more strict (#2197) 2022-08-01 12:54:39 +01:00
Kirill Bulatov
45680f9a2d Drop CircleCI runs (#2082) 2022-07-25 18:30:30 +03:00
Heikki Linnakangas
98dd2e4f52 Use zstd and multiple threads to compress artifact tarball.
For faster and better compression.
2022-07-19 21:31:34 +03:00
Heikki Linnakangas
71753dd947 Remove github CI 'build_postgres' job, merging it with 'build_neon'
Simplifies the workflow. Makes the overall build a little faster, as
the build_postgres step doesn't need to upload the pg.tgz artifact,
and the build_neon step doesn't need to download it again.

This effectively reverts commit a490f64a68. That commit changed the
workflow so that the Postgres binaries were not included in the
neon.tgz artifact. With this commit, the pg.tgz artifact is gone, and
the Postgres binaries are part of neon.tgz again.
2022-07-19 21:31:22 +03:00
Heikki Linnakangas
a490f64a68 Don't include Postgres binaries in neon.tgz
neon.tgz artifact in the github workflow included the contents of
'tmp_install', but that seems pointless, because the same files are
included earlier already in the pg.tgz artifact.
2022-07-15 12:33:13 +03:00
Alexander Bayandin
07df7c2edd github/actions: fix storing perf data for main (#2038) 2022-07-06 13:15:15 +01:00