rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2025-12-25 23:29:59 +00:00

Author	SHA1	Message	Date
Alexander Bayandin	7e39a96441	scripts/flaky_tests.py: Improve flaky tests detection (#5094 ) ## Problem We still need to rerun some builds manually because flaky tests weren't detected automatically. I found two reasons for it: - If a test is flaky on a particular build type, on a particular Postgres version, there's a high chance that this test is flaky on all configurations, but we don't automatically detect such cases. - We detect flaky tests only on the main branch, which requires manual retrigger runs for freshly made flaky tests. Both of them are fixed in the PR. ## Summary of changes - Spread flakiness of a single test to all configurations - Detect flaky tests in all branches (not only in the main) - Look back only at 7 days of test history (instead of 10)	2023-08-29 11:53:24 +01:00
Alek Westover	99a1be6c4e	remove upload step from neon, it is in private repo now (#5085 )	2023-08-24 17:14:40 +03:00
Alexander Bayandin	b9f84b9609	Improve test results format (#4549 ) ## Problem The current test history format is a bit inconvenient: - It stores all test results in one row, so all queries should include subqueries which expand the test - It includes duplicated test results if the rerun is triggered manually for one of the test jobs (for example, if we rerun `debug-pg14`, then the report will include duplicates for other build types/postgres versions) - It doesn't have a reference to run_id, which we use to create a link to allure report Here's the proposed new format: ``` id BIGSERIAL PRIMARY KEY, parent_suite TEXT NOT NULL, suite TEXT NOT NULL, name TEXT NOT NULL, status TEXT NOT NULL, started_at TIMESTAMPTZ NOT NULL, stopped_at TIMESTAMPTZ NOT NULL, duration INT NOT NULL, flaky BOOLEAN NOT NULL, build_type TEXT NOT NULL, pg_version INT NOT NULL, run_id BIGINT NOT NULL, run_attempt INT NOT NULL, reference TEXT NOT NULL, revision CHAR(40) NOT NULL, raw JSONB COMPRESSION lz4 NOT NULL, ``` ## Summary of changes - Misc allure changes: - Update allure to 2.23.1 - Delete files from previous runs in HTML report (by using `sync --delete` instead of `mv`) - Use `test-cases/*.json` instead of `suites.json`, using this directory allows us to catch all reruns. - Until we migrated `scripts/flaky_tests.py` and `scripts/benchmark_durations.py` store test results in 2 formats (in 2 different databases).	2023-08-08 20:09:38 +01:00
Alek Westover	d005c77ea3	Tar Remote Extensions (#4715 ) Add infrastructure to dynamically load postgres extensions and shared libraries from remote extension storage. Before postgres start downloads list of available remote extensions and libraries, and also downloads 'shared_preload_libraries'. After postgres is running, 'compute_ctl' listens for HTTP requests to load files. Postgres has new GUC 'extension_server_port' to specify port on which 'compute_ctl' listens for requests. When PostgreSQL requests a file, 'compute_ctl' downloads it. See more details about feature design and remote extension storage layout in docs/rfcs/024-extension-loading.md --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Alek Westover <alek.westover@gmail.com>	2023-08-02 12:38:12 +03:00
Alek Westover	b9a7a661d0	add list of public extensions and lookup table for libraries (#4807 )	2023-07-26 15:55:55 -04:00
Alek Westover	5f8fd640bf	Upload Test Remote Extensions (#4792 ) We need some real extensions in S3 to accurately test the code for handling remote extensions. In this PR we just upload three extensions (anon, kq_imcx and postgis), which is enough for testing purposes for now. In addition to creating and uploading the extension archives, we must generate a file `ext_index.json` which specifies important metadata about the extensions. --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-07-26 15:24:03 +03:00
Joonas Koivunen	762a8a7bb5	python: more linting (#4734 ) Ruff has "B" class of lints, including B018 which will nag on useless expressions, related to #4719. Enable such lints and fix the existing issues. Most notably: - https://beta.ruff.rs/docs/rules/mutable-argument-default/ - https://beta.ruff.rs/docs/rules/assert-false/ --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-07-18 12:56:40 +03:00
Alexander Bayandin	4580f5085a	test_runner: run benchmarks in parallel (#4683 ) ## Problem Benchmarks run takes about an hour on main branch (in a single job), which delays pipeline results. And it takes another hour if we want to restart the job due to some failures. ## Summary of changes - Use `pytest-split` plugin to run benchmarks on separate CI runners in 4 parallel jobs - Add `scripts/benchmark_durations.py` for getting benchmark durations from the database to help `pytest-split` schedule tests more evenly. It uses p99 for the last 10 days' results (durations). The current distribution could be better; each worker's durations vary from 9m to 35m, but this could be improved in consequent PRs.	2023-07-17 20:09:45 +01:00
Alexander Bayandin	ed845b644b	Prevent unintentional Postgres submodule update (#4692) ## Problem Postgres submodule can be changed unintentionally, and these changes are easy to miss during the review. Adding a check that should prevent this from happening, the check fails `build-neon` job with the following message: ``` Expected postgres-v14 rev to be at '1414141414141414141414141414141414141414', but it is at '1144aee1661c79eec65e784a8dad8bd450d9df79' Expected postgres-v15 rev to be at '1515151515151515151515151515151515151515', but it is at '1984832c740a7fa0e468bb720f40c525b652835d' Please update vendors/revisions.json if these changes are intentional. ``` This is an alternative approach to https://github.com/neondatabase/neon/pull/4603 ## Summary of changes - Add `vendor/revisions.json` file with expected revisions - Add build-time check (to `build-neon` job) that Postgres submodules match revisions from `vendor/revisions.json` - A couple of small improvements for logs from https://github.com/neondatabase/neon/pull/4603 - Fixed GitHub autocomment for the case when no tests were run --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-07-12 15:12:37 +01:00
Alexander Bayandin	78a7f68902	Make pg_version and build_type regular parameters (#4311) ## Problem All tests have already been parametrised by Postgres version and build type (to have them distinguishable in the Allure report), but despite that, the DEFAULT_PG_VERSION and BUILD_TYPE env vars still have to be set to the corresponding values. For example, running the `test_timeline_deletion_with_files_stuck_in_upload_queue[release-pg14-local_fs]` test requires setting `DEFAULT_PG_VERSION=14` and `BUILD_TYPE=release`. This PR makes the test framework pick up parameters from the test name itself. ## Summary of changes - Postgres version and build type related fixtures are now function-scoped (instead of session-scoped as before) - Deprecate `--pg-version` argument in favour of DEFAULT_PG_VERSION env variable (it's easier to parse) - GitHub autocomment now includes only one command with all the failed tests + runs them in parallel	2023-07-03 13:51:40 +01:00
Alexander Bayandin	e60b70b475	Fix data ingestion scripts (#4515 ) ## Problem When I switched `psycopg2.connect` from context manager to a regular function call in https://github.com/neondatabase/neon/pull/4382 I embarrassingly forgot about commit, so it doesn't really put data into DB 😞 ## Summary of changes - Enable autocommit for data ingestion scripts	2023-06-15 15:01:06 +03:00
Alexander Bayandin	9484b96d7c	GitHub Autocomment: do not fail the job (#4478 ) ## Problem If the script fails to generate a test summary, the step also fails the job/workflow (despite this could be a non-fatal problem). ## Summary of changes - Separate JSON parsing and summarisation into separate functions - Wrap functions calling into try..catch block, add an error message to GitHub comment and do not fail the step - Make `scripts/comment-test-report.js` a CLI script that can be run locally (mock GitHub calls) to make it easier to debug issues locally	2023-06-14 15:07:30 +01:00
Alexander Bayandin	a0b3990411	Retry data ingestion scripts on connection errors (#4382) ## Problem From time to time, we're catching a race condition when trying to upload perf or regression test results. Ref: - https://neondb.slack.com/archives/C03H1K0PGKH/p1685462717870759 - https://github.com/neondatabase/cloud/issues/3686 ## Summary of changes Wrap `psycopg2.connect` method with `@backoff.on_exception` decorator	2023-06-13 22:33:42 +01:00
Alexander Bayandin	daa79b150f	Code Coverage: store lcov report (#4358 ) ## Problem In the future, we want to compare code coverage on a PR with coverage on the main branch. Currently, we store only code coverage HTML reports, I suggest we start storing reports in "lcov info" format that we can use/parse in the future. Currently, the file size is ~7Mb (it's a text-based format and could be compressed into a ~400Kb archive) - More about "lcov info" format: https://manpages.ubuntu.com/manpages/jammy/man1/geninfo.1.html#files - Part of https://github.com/neondatabase/neon/issues/3543 ## Summary of changes - Change `scripts/coverage` to output lcov coverage to `report/lcov.info` file instead of stdout (we already upload the whole `report/` directory to S3)	2023-05-30 14:05:41 +01:00
Alexander Bayandin	339a3e3146	GitHub Autocomment: comment commits for branches (#4335 ) ## Problem GitHub Autocomment script posts a comment only for PRs. It's harder to debug failed tests on main or release branches. ## Summary of changes - Change the GitHub Autocomment script to be able to post a comment to either a PR or a commit of a branch	2023-05-26 14:49:42 +01:00
Alexander Bayandin	08e7d2407b	Storage: use Postgres 15 as default (#2809 )	2023-05-25 15:55:46 +01:00
Alexander Bayandin	35bb10757d	scripts/ingest_perf_test_result.py: increase connection timeout (#4329) ## Problem Sometimes default connection timeout is not enough to connect to the DB with perf test results, [an example](https://github.com/neondatabase/neon/actions/runs/5064263522/jobs/9091692868#step:10:332). Similar changes were made for similar scripts: - For `scripts/flaky_tests.py` in https://github.com/neondatabase/neon/pull/4096 - For `scripts/ingest_regress_test_result.py` in https://github.com/neondatabase/neon/pull/2367 (from the very beginning) ## Summary of changes - Connection timeout increased to 30s for `scripts/ingest_perf_test_result.py`	2023-05-24 10:11:24 -04:00
Alexander Bayandin	2a3f54002c	test_runner: update dependencies (#4328 ) ## Problem `pytest` 6 truncates error messages and this is not configured. It's fixed in `pytest` 7, it prints the whole message (truncating limit is higher) if `--verbose` is set (it's set on CI). ## Summary of changes - `pytest` and `pytest` plugins are updated to their latest versions - linters (`black` and `ruff`) are updated to their latest versions - `mypy` and types are updated to their latest versions, new warnings are fixed - while we're here, allure updated its latest version as well	2023-05-24 12:47:01 +01:00
Alexander Bayandin	7b9e8be6e4	GitHub Autocomment: add a command to run all failed tests (#4200 ) - Group tests by Postgres version - Merge different build types - Add a command to GitHub comment on how to rerun all failed tests (different command for different Postgres versions) - Restore a link to a test report in the build summary	2023-05-17 11:38:41 +01:00
Alexander Bayandin	a5615bd8ea	Fix Allure reports for different benchmark jobs (#4229 ) - Fix Allure report generation failure for Nightly Benchmarks - Fix GitHub Autocomment for `run-benchmarks` label (`build_and_test.yml::benchmarks` job)	2023-05-15 13:04:03 +01:00
Alexander Bayandin	bb06d281ea	Run regressions tests on both Postgres 14 and 15 (#4192 ) This PR adds tests runs on Postgres 15 and created unified Allure report with results for all tests. - Split `.github/actions/allure-report` into `.github/actions/allure-report-store` and `.github/actions/allure-report-generate` - Add debug or release pytest parameter for all tests (depending on `BUILD_TYPE` env variable) - Add Postgres version as a pytest parameter for all tests (depending on `DEFAULT_PG_VERSION` env variable) - Fix `test_wal_restore` and `restore_from_wal.sh` to support path with `[`/`]` in it (fixed by applying spellcheck to the script and fixing all warnings), `restore_from_wal_archive.sh` is deleted as unused. - All known failures on Postgres 15 marked with xfail	2023-05-12 15:28:51 +01:00
Alexander Bayandin	59510f6449	scripts/flaky_tests.py: use retriesStatusChange from Allure	2023-05-10 16:59:03 +01:00
Alexander Bayandin	7fc778d251	GitHub Autocomment: fix flaky test notifications	2023-05-10 16:59:03 +01:00
Alexander Bayandin	b114ef26c2	GitHub Autocomment: add a note if no tests were run (#4109 ) - Always (if not cancelled) add a comment to a PR - Mention in the comment if no tests were run / reports were not generated.	2023-05-03 15:38:49 +01:00
Alexander Bayandin	c4e1cafb63	scripts/flaky_tests.py: handle connection error (#4096 ) - Increase `connect_timeout` to 30s, which should be enough for most of the cases - If the script cannot connect to the DB (or any other `psycopg2.OperationalError` occur) — do not fail the script, log the error and proceed. Problems with fetching flaky tests shouldn't block the PR	2023-04-27 17:08:00 +01:00
Alexander Bayandin	957acb51b5	GitHub Autocomment: Fix the link to the latest commit (#3952 )	2023-04-04 19:06:10 +03:00
Alexander Bayandin	1d23b5d1de	Comment PR with test results (#3907 ) This PR adds posting a comment with test results. Each workflow run updates the comment with new results. The layout and the information that we post can be changed to our needs, right now, it contains failed tests and test which changes status after rerun (i.e. flaky tests)	2023-04-04 12:22:47 +01:00
Alexander Bayandin	105b8bb9d3	test_runner: automatically rerun flaky tests (#3880 ) This PR adds a plugin that automatically reruns (up to 3 times) flaky tests. Internally, it uses data from `TEST_RESULT_CONNSTR` database and `pytest-rerunfailures` plugin. As the first approximation we consider the test flaky if it has failed on the main branch in the last 10 days. Flaky tests are fetched by `scripts/flaky_tests.py` script (it's possible to use it in a standalone mode to learn which tests are flaky), stored to a JSON file, and then the file is passed to the pytest plugin.	2023-04-04 12:21:54 +01:00
Arthur Petukhovsky	7456e5b71c	Add script to collect state from safekeepers (#3835 ) Add an ansible script to collect https://github.com/neondatabase/neon/pull/3710 state JSON from all safekeeper nodes and upload them to a postgres table.	2023-03-28 17:04:02 +03:00
Alexander Bayandin	3d869cbcde	Replace flake8 and isort with ruff (#3810 ) - Introduce ruff (https://beta.ruff.rs/) to replace flake8 and isort - Update mypy and black	2023-03-14 13:25:44 +00:00
Arthur Petukhovsky	7ed9eb4a56	Add script for safekeeper tenants cleanup (#3452) This script can be used to remove tenant directories on safekeepers for projects which no longer exist (deleted in the console). To run this script you need to upload it to safekeeper (i.e. with SSH), and run it with python3. Ansible can be used to run this script on multiple safekeepers. Fixes https://github.com/neondatabase/cloud/issues/3356	2023-02-09 13:28:20 +02:00
Christian Schwarz	590695e845	improve query param parsing - add parse_query_param() - use Cow<> where possible - move param parsing code to utils::http::request This was originally PR https://github.com/neondatabase/neon/pull/3502 which targeted a different branch. closes #3510	2023-02-01 14:11:12 +01:00
Joonas Koivunen	9bb6a6c77c	pysync: override PYTHON_KEYRING_BACKEND (#3480) This bothers me every time I have to call `pysync`.	2023-02-01 14:07:23 +02:00
Christian Schwarz	8963d830fb	add script to download all remote layers (#3294 ) For use in production in case on-demand download turns out to be problematic during tenant_attach, or when we eventually introduce layer eviction. Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>	2023-01-25 16:55:25 +03:00
Heikki Linnakangas	7ff591ffbf	On-Demand Download The code in this change was extracted from #2595 (Heikki’s on-demand download draft PR). High-Level Changes - New RemoteLayer Type - On-Demand Download As An Effect Of Page Reconstruction - Breaking Semantics For Physical Size Metrics There are several follow-up work items planned. Refer to the Epic issue on GitHub: https://github.com/neondatabase/neon/issues/2029 closes https://github.com/neondatabase/neon/pull/3013 Co-authored-by: Kirill Bulatov <kirill@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech> New RemoteLayer Type ==================== Instead of downloading all layers during tenant attach, we create RemoteLayer instances for each of them and add them to the layer map. On-Demand Download As An Effect Of Page Reconstruction ====================================================== At the heart of pageserver is Timeline::get_reconstruct_data(). It traverses the layer map until it has collected all the data it needs to produce the page image. Most code in the code base uses it, through many layers of indirection. Before this patch, the function would use synchronous filesystem IO to load data from disk-resident layer files if the data was not cached. That is not possible with RemoteLayer, because the layer file has not been downloaded yet. So, we do the download when get_reconstruct_data gets there, i.e., “on demand”. The mechanics of how the download is done are rather involved, because of the infamous async-sync-async sandwich problem that plagues the async Rust world. We use the new PageReconstructResult type to work around this. Its introduction is the cause for a good amount of code churn in this patch. Refer to the block comment on `with_ondemand_download()` for details. 
Breaking Semantics For Physical Size Metrics ============================================ We rename prometheus metric pageserver_{current,resident}_physical_size to reflect what this metric actually represents with on-demand download. This intentionally BREAKS existing grafana dashboard and the cost model data pipeline. Breaking is desirable because the meaning of this metrics has changed with on-demand download. See https://docs.google.com/document/d/12AFpvKY-7FZdR5a4CaD6Ir_rI3QokdCLSPJ6upHxJBo/edit# for how we will handle this breakage. Likewise, we rename the new billing_metrics’s PhysicalSize => ResidentSize. This is not yet used anywhere, so, this is not a breaking change. There is still a field called TimelineInfo::current_physical_size. It is now the sum of the layer sizes in layer map, regardless of whether local or remote. To compute that sum, we added a new trait method PersistentLayer::file_size(). When updating the Python tests, we got rid of current_physical_size_non_incremental. An earlier commit removed it from the OpenAPI spec already, so this is not a breaking change. test_timeline_size.py has grown additional assertions on the resident_physical_size metric.	2022-12-21 19:16:39 +01:00
Alexander Bayandin	486a985629	mypy: enable check_untyped_defs (#3142 ) Enable `check_untyped_defs` and fix warnings.	2022-12-21 09:38:42 +00:00
Kirill Bulatov	03695261fc	Test storage Docker images (#2767 ) Closes https://github.com/neondatabase/neon/issues/2697 Example: https://github.com/neondatabase/neon/actions/runs/3416774593/jobs/5688394855 Adds a set of tests on the storage Docker images before they are pushed to the public registries: * tests that pageserver binary has the correct version string (other binaries are built with the same library, so it should be enough to test one) * tests that the compose file set-up works and all components are able to start and perform a single SQL query (CREATE TABLE)	2022-11-11 19:42:26 +02:00
Joonas Koivunen	5112142997	fix: use different port for temporary postgres (#2743 ) `test_tenant_relocation` ends up starting a temporary postgres instance with a fixed port. the change makes the port configurable at scripts/export_import_between_pageservers.py and uses that in test_tenant_relocation.	2022-11-02 18:37:48 +00:00
mikecaat	259a5f356e	Add a docker-compose example file (#1943 ) (#2666 ) Co-authored-by: Masahiro Ikeda <masahiro.ikeda.us@hco.ntt.co.jp>	2022-10-26 13:59:25 +03:00
Heikki Linnakangas	538876650a	Merge 'local' and 'remote' parts of TimelineInfo into one struct. The 'local' part was always filled in, so that was easy to merge into the TimelineInfo itself. 'remote' only contained two fields, 'remote_consistent_lsn' and 'awaits_download'. I made 'remote_consistent_lsn' an optional field, and 'awaits_download' is now false if the timeline is not present remotely. However, I kept stub versions of the 'local' and 'remote' structs for backwards-compatibility, with a few fields that are actively used by the control plane. They just duplicate the fields from TimelineInfo now. They can be removed later, once the control plane has been updated to use the new fields.	2022-10-14 18:37:14 +03:00
Kirill Bulatov	3e35f10adc	Add a script to reformat the project	2022-10-09 08:21:11 +03:00
Anastasia Lubennikova	7c1695e87d	fix psql path in export_import_between_pageservers script	2022-09-22 18:12:41 +03:00
Anastasia Lubennikova	0fde59aa46	use pg_version in python tests	2022-09-22 14:15:13 +03:00
Anastasia Lubennikova	03c606f7c5	Pass pg_version parameter to timeline import command. Add pg_version field to LocalTimelineInfo. Use pg_version in the export_import_between_pageservers script	2022-09-22 14:15:13 +03:00
Egor Suvorov	65a5010e25	Use custom `install` command in Makefile to speed up incremental builds (#2458 ) Fixes #1873: previously any run of `make` caused the `postgres-v15-headers` target to build. It copied a bunch of headers via `install -C`. Unfortunately, some origins were symlinks in the `./pg_install/build` directory pointing inside `./vendor/postgres-v15` (e.g. `pg_config_os.h` pointing to `linux.h`). GNU coreutils' `install` ignores the `-C` key for non-regular files and always overwrites the destination if the origin is a symlink. That in turn made Cargo rebuild the `postgres_ffi` crate and all its dependencies because it thinks that Postgres headers changed, even if they did not. That was slow. Now we use a custom script that wraps the `install` program. It handles one specific case and makes sure individual headers are never copied if their content did not change. Hence, `postgres_ffi` is not rebuilt unless there were some changes to the C code. One may still have slow incremental single-threaded builds because Postgres Makefiles spawn about 2800 sub-makes even if no files have been changed. A no-op build takes "only" 3-4 seconds on my machine now when run with `-j30`, and 20 seconds when run with `-j1`.	2022-09-16 15:44:02 +00:00
Kirill Bulatov	b8eb908a3d	Rename old project name references	2022-09-14 08:14:05 +03:00
Kirill Bulatov	698d6d0bad	Use stable coverage API with rustc 1.60	2022-09-12 13:44:54 +03:00
Alexander Bayandin	9e3136ea37	scripts/ingest_regress_test_result.py: fix json data insertion (#2408 )	2022-09-07 21:40:08 +01:00
Alexander Bayandin	83dca73f85	Store Allure tests statistics in database (#2367 )	2022-09-07 14:16:48 +01:00
Alexander Bayandin	39a3bcac36	test_runner: fix flake8 warnings	2022-08-22 14:57:09 +01:00

117 Commits