rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 05:52:55 +00:00

Author	SHA1	Message	Date
Suhas Thalanki	2790461441	Delete scripts/neon_grep.txt	2025-07-29 14:36:22 -04:00
William Huang	fae734b15c	[BRC-3082] Monitor commit LSN lag among active SKs (#1002 ) Commit `e69c3d632b` added metrics (used for alerting) to indicate whether Safekeepers are operating with a degraded quorum due to some of them being down. However, even if all SKs are active/reachable, we probably still want to raise an alert if some of them are really slow or otherwise lagging behind, as it is technically still a "degraded quorum" situation. Added a new field `max_active_safekeeper_commit_lag` to the `neon_perf_counters` view that reports the lag between the most advanced and most lagging commit LSNs among active Safekeepers. Commit LSNs are received from `AppendResponse` messages from SKs and recorded in the `WalProposer`'s shared memory state. Note that this lag is calculated among active SKs only to keep this alert clean. If there are inactive SKs the previous metric on active SKs will capture that instead. Note: @chen-luo_data pointed out during the PR review that we can probably get the benefits of this metric with PromQL query like `max (safekeeper_commit_lsn) by (tenant_id, timeline_id) - min(safekeeper_commit_lsn) by (tenant_id, timeline_id)` on existing metrics exported by SKs. Given that this code is already ready, @haoyu-huang_data suggested that I just check in this change anyway, as the reliability of prometheus metrics (and especially the aggregation operators when the result set cardinality is high) is somewhat questionable based on our prior experience. Added integration test `test_wal_acceptor.py::test_max_active_safekeeper_commit_lag`.	2025-07-25 15:08:49 -04:00
William Huang	4897921de6	[BRC-3051] Walproposer: Safekeeper quorum health metrics (#930 ) Today we don't have any indications (other than spammy logs in PG that nobody monitors) if the Walproposer in PG cannot connect to/get votes from all Safekeepers. This means we don't have signals indicating that the Safekeepers are operating at degraded redundancy. We need these signals. Added plumbing in PG extension so that the `neon_perf_counters` view exports the following gauge metrics on safekeeper health: - `num_configured_safekeepers`: The total number of safekeepers configured in PG. - `num_active_safekeepers`: The number of safekeepers that PG is actively streaming WAL to. An alert should be raised whenever `num_active_safekeepers` < `num_configured_safekeepers`. The metrics are implemented by adding additional state to the Walproposer shared memory keeping track of the active statuses of safekeepers using a simple array. The status of the safekeeper is set to active (1) after the Walproposer acquires a quorum and starts streaming data to the safekeeper, and is set to inactive (0) when the connection with a safekeeper is shut down. We scan the safekeeper status array in Walproposer shared memory when collecting the metrics to produce results for the gauges. Added coverage for the metrics to integration test `test_wal_acceptor.py::test_timeline_disk_usage_limit`.	2025-07-25 14:50:59 -04:00
William Huang	f7dd2108bb	[BRC-2905] Feed back PS-detected data corruption signals to SK and PG walproposer (#895 ) Data corruptions are typically detected on the pageserver side when it replays WAL records. However, since PS doesn't synchronously replay WAL records as they are being ingested through safekeepers, we need some extra plumbing to feed information about pageserver-detected corruptions during compaction (and/or WAL redo in general) back to SK and PG for proper action. We don't yet know what actions PG/SK should take upon receiving the signal, but we should have the detection and feedback in place. Add an extra `corruption_detected` field to the `PageserverFeedback` message that is sent from PS -> SK -> PG. It's a boolean value that is set to true when PS detects a "critical error" that signals data corruption, and it's sent in all `PageserverFeedback` messages. Upon receiving this signal, the safekeeper raises a `safekeeper_ps_corruption_detected` gauge metric (value set to 1). The safekeeper then forwards this signal to PG where a `ps_corruption_detected` gauge metric (value also set to 1) is raised in the `neon_perf_counters` view. Added an integration test in `test_compaction.py::test_ps_corruption_detection_feedback` that confirms that the safekeeper and PG can receive the data corruption signal in the `PageserverFeedback` message in a simulated data corruption.	2025-07-25 12:10:03 -04:00
Ivan Efremov	a29772bf6e	Create proxy-bench periodic run in CI (#12242 ) Currently run for test only via pushing to the test-proxy-bench branch. Relates to the #22681	2025-06-24 09:54:43 +00:00
Peter Bendel	87fc0a0374	periodic pagebench on hetzner runners (#11963 ) ## Problem - Benchmark periodic pagebench had inconsistent benchmarking results even when run with the same commit hash. Hypothesis is this was due to running on dedicated but virtualized EC instance with varying CPU frequency. - the dedicated instance type used for the benchmark is quite "old" and we increasingly get `An error occurred (InsufficientInstanceCapacity) when calling the StartInstances operation (reached max retries: 2): Insufficient capacity.` - periodic pagebench uses a snapshot of pageserver timelines to have the same layer structure in each run and get consistent performance. Re-creating the snapshot was a painful manual process (see https://github.com/neondatabase/cloud/issues/27051 and https://github.com/neondatabase/cloud/issues/27653) ## Summary of changes - Run the periodic pagebench on a custom hetzner GitHub runner with large nvme disk and governor set to defined perf profile - provide a manual dispatch option for the workflow that allows to create a new snapshot - keep the manual dispatch option to specify a commit hash useful for bi-secting regressions - always use the newest created snapshot (S3 bucket uses date suffix in S3 key, example `s3://neon-github-public-dev/performance/pagebench/shared-snapshots-2025-05-17/` - `--ignore` `test_runner/performance/pageserver/pagebench/test_pageserver_max_throughput_getpage_at_latest_lsn.py` in regular benchmarks run for each commit - improve perf copying snapshot by using `cp` subprocess instead of traversing tree in python ## Example runs with code in this PR: - run which creates new snapshot https://github.com/neondatabase/neon/actions/runs/15083408849/job/42402986376#step:19:55 - run which uses latest snapshot - https://github.com/neondatabase/neon/actions/runs/15084907676/job/42406240745#step:11:65	2025-05-23 09:37:19 +00:00
Alexander Bayandin	30a7dd630c	ruff: enable TC — flake8-type-checking (#11368 ) ## Problem `TYPE_CHECKING` is used inconsistently across Python tests. ## Summary of changes - Update `ruff`: 0.7.0 -> 0.11.2 - Enable TC (flake8-type-checking): https://docs.astral.sh/ruff/rules/#flake8-type-checking-tc - (auto)fix all new issues	2025-03-30 18:58:33 +00:00
JC Grünhage	8dfa8f0b94	feat(ci): don't build storage on compute-releases and vice versa (#10841 ) ## Problem Release CI is slow, because we're doing unnecessary work, for example building compute images on storage releases and vice versa. ## Summary of changes - Extract tag generation into reusable workflow and extend it with fetching of previous component releases - Don't build neon images on compute releases and don't build compute images on proxy and storage releases - Reuse images from previous releases for tests on branches where we don't build those images ## Open questions - We differentiate between `TAG` and `COMPUTE_TAG` in a few places, but we don't differentiate between storage and proxy releases. Since they use the same image, this will continue to work, but I'm not sure this is what we want.	2025-02-26 17:17:26 +00:00
JC Grünhage	1f0dea9a1a	feat(ci): push container images to ghcr.io as well (#10945 ) ## Problem There's new rate-limits coming on docker hub. To reduce our reliance on docker hub and the problems the limits are going to cause for us, we want to prepare for this by also pushing our container images to ghcr.io ## Summary of changes Push our images to ghcr.io as well and not just docker hub.	2025-02-24 17:45:23 +00:00
JC Grünhage	e52e93797f	refactor(ci): use variables for AWS account IDs (#10886 ) ## Problem Our AWS account IDs are copy-pasted all over the place. A wrong paste might only be caught late if we hardcode them, but will get flagged instantly by actionlint if we access them from github actions variables. Resolves https://github.com/neondatabase/neon/issues/10787, follow-up for https://github.com/neondatabase/neon/pull/10613. ## Summary of changes Access AWS account IDs using Github Actions variables.	2025-02-19 12:34:41 +00:00
JC Grünhage	b77dd66bc4	refactor(ci): overhaul container image pushing (#10613 ) ## Problem Retagging container images and pushing container images taken from one registry to another is very tangled up with artifact building and not separated by component. This makes not building compute for storage releases and vice versa pretty tricky. To enable that, I want to clean up retagging and pushing of container images and then continue on making the pipelines for releases leaner by not building unnecessary things. ## Summary of changes - Add a reusable workflow that can push to ACR, ECR and Docker Hub, while being very flexible in terms of source and target images. This allows for retagging and pushing images between container registries. - Stop pushing images to registries aside of docker hub in the jobs that build the images - Split image pushing into 4 different jobs (not mentioning special cases): - neon-dev - neon-prod - compute-dev - compute-prod ## TODO - Consider also using this for `pin-build-tools-image`, as it's basically another instance of the same thing. ## Known limitations - The ECR part of this workflow supports authenticating to multiple AWS accounts and therefore multiple ECR endpoints, but the ACR part only supports one Azure Account. If someone with more knowledge on Azure can tell me whether an equivalent to https://github.com/aws-actions/amazon-ecr-login?tab=readme-ov-file#login-to-ecr-on-multiple-aws-accounts is easily possible, that'd be great. - The `image_map` input is a bit complex. It expects something along the lines of ``` { "docker.io/neondatabase/compute-node-v14:13196061314": [ "docker.io/neondatabase/compute-node-v14:13196061314", "369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-node-v14:13196061314", "neoneastus2.azurecr.io/neondatabase/compute-node-v14:13196061314" ], "docker.io/neondatabase/compute-node-v15:13196061314": [ "docker.io/neondatabase/compute-node-v15:13196061314", "369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-node-v15:13196061314", "neoneastus2.azurecr.io/neondatabase/compute-node-v15:13196061314" ] } ``` to map from source to target image. We have a small python step to generate this map for the 4 main image pushing jobs. The concrete example is taken from https://github.com/neondatabase/neon/actions/runs/13196061314/job/36838584098?pr=10613#step:3:6 and shortened to two images.	2025-02-12 17:54:51 +00:00
Alexander Lakhin	977781e423	Enable sanitizers for postgres v17 (#10401 ) Add a build with sanitizers (asan, ubsan) to the CI pipeline and run tests on it. See https://github.com/neondatabase/neon/issues/6053 --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-02-06 12:53:43 +00:00
a-masterov	b6c0f66619	CI(autocomment): add the lfc state (#10121 ) ## Problem Currently, the report does not contain the LFC state of the failed tests. ## Summary of changes Added the LFC state to the link to the allure report. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-01-23 14:52:07 +00:00
Alexander Bayandin	e04dd3be0b	test_runner: rerun all failed tests (#9917 ) ## Problem Currently, we rerun only known flaky tests. This approach was chosen to reduce the number of tests that go unnoticed (by forcing people to take a look at failed tests and rerun the job manually), but it has some drawbacks: - In PRs, people tend to push new changes without checking failed tests (that's ok) - In the main, tests are just restarted without checking (understandable) - Parametrised tests become flaky one by one, i.e. if `test[1]` is flaky `, test[2]` is not marked as flaky automatically (which may or may not be the case). I suggest rerunning all failed tests to increase the stability of GitHub jobs and using the Grafana Dashboard with flaky tests for deeper analysis. ## Summary of changes - Rerun all failed tests twice at max	2024-11-28 19:02:57 +00:00
Alexander Bayandin	6f7aeaa1c5	test_runner: use LFC by default (#8613 ) ## Problem LFC is not enabled by default in tests, but it is enabled in production. This increases the risk of errors in the production environment, which were not found during the routine workflow. However, enabling LFC for all the tests may overload the disk on our servers and increase the number of failures. So, we try enabling LFC in one case to evaluate the possible risk. ## Summary of changes A new environment variable, USE_LFC is introduced. If it is set to true, LFC is enabled by default in all the tests. In our workflow, we enable LFC for PG17, release, x86-64, and disabled for all other combinations. --------- Co-authored-by: Alexey Masterov <alexeymasterov@neon.tech> Co-authored-by: a-masterov <72613290+a-masterov@users.noreply.github.com>	2024-11-25 09:01:05 +00:00
Alexander Bayandin	51d26a261b	build(deps): bump mypy from 1.3.0 to 1.13.0 (#9670 ) ## Problem We use a pretty old version of `mypy` 1.3 (released 1.5 years ago), it produces false positives for `typing.Self`. ## Summary of changes - Bump `mypy` from 1.3 to 1.13 - Fix new warnings and errors - Use `typing.Self` whenever we `return self`	2024-11-22 14:31:36 +00:00
Alexander Bayandin	8d1c44039e	Python 3.11 (#9515 ) ## Problem On Debian 12 (Bookworm), Python 3.11 is the latest available version. ## Summary of changes - Update Python to 3.11 in build-tools - Fix ruff check / format - Fix mypy - Use `StrEnum` instead of pair `str`, `Enum` - Update docs	2024-11-21 16:25:31 +00:00
Peter Bendel	982cb1c15d	Move logic for ingest benchmark from GitHub workflow into python testcase (#9762 ) ## Problem The first version of the ingest benchmark had some parsing and reporting logic in shell script inside GitHub workflow. it is better to move that logic into a python testcase so that we can also run it locally. ## Summary of changes - Create new python testcase - invoke pgcopydb inside python test case - move the following logic into python testcase - determine backpressure - invoke pgcopydb and report its progress - parse pgcopydb log and extract metrics - insert metrics into perf test database - add additional column to perf test database that can receive endpoint ID used for pgcopydb run to have it available in grafana dashboard when retrieving other metrics for an endpoint ## Example run https://github.com/neondatabase/neon/actions/runs/11860622170/job/33056264386	2024-11-19 09:46:46 +00:00
Tristan Partin	1e16221f82	Update psycopg2 to latest version for complete PG 17 support Update the types to match. Changes the cursor import to match the C bindings[0]. Link: https://github.com/python/typeshed/issues/12578 [0] Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-11-04 18:21:59 -06:00
Tristan Partin	5bd8e2363a	Enable all pyupgrade checks in ruff This will help to keep us from using deprecated Python features going forward. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-08 14:32:26 -05:00
Arpad Müller	fa3fc73c1b	Address 1.82 clippy lints (#8944 ) Addresses the clippy lints of the beta 1.82 toolchain. The `too_long_first_doc_paragraph` lint complained a lot and was addressed separately: #8941	2024-09-06 21:05:18 +02:00
Alexander Bayandin	e62cd9e121	CI(autocomment): add arch to build type (#8809 ) ## Problem Failed / flaky tests for different arches don't have any difference in GitHub Autocomment ## Summary of changes - Add arch to build type for GitHub autocomment	2024-08-23 14:29:11 +01:00
Alexander Bayandin	75175f3628	CI(build-and-test): run regression tests on arm (#8552 ) ## Problem We want to run our regression test suite on ARM. ## Summary of changes - run regression tests on release ARM builds - run `build-neon` (including rust tests) on debug ARM builds - add `arch` parameter to test to distinguish them in the allure report and in a database	2024-08-21 14:29:11 +01:00
Alexander Bayandin	c96593b473	Make Postgres 16 default version (#8745 ) ## Problem The default Postgres version is set to 15 in code, while we use 16 in most of the other places (and Postgres 17 is coming) ## Summary of changes - Run `benchmarks` job with Postgres 16 (instead of Postgres 14) - Set `DEFAULT_PG_VERSION` to 16 in all places - Remove deprecated `--pg-version` pytest argument - Update `test_metadata_bincode_serde_ensure_roundtrip` for Postgres 16	2024-08-20 10:46:58 +01:00
Alex Chi Z.	d6c79b77df	test(pageserver): add test_gc_feedback_with_snapshots (#8474 ) should be working after https://github.com/neondatabase/neon/pull/8328 gets merged. Part of https://github.com/neondatabase/neon/issues/8002 adds a new perf benchmark case that ensures garbages can be collected with branches --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-07-31 17:55:19 -04:00
Christian Schwarz	4eedb3b6f1	test suite: allow overriding default compaction algorithm via env var (#7747 ) This PR allows setting the `PAGESERVER_DEFAULT_TENANT_CONFIG_COMPACTION_ALGORITHM` env var to override the `tenant_config.compaction_algorithm` field in the initial `pageserver.toml` for all tests. I tested manually that this works by halting a test using pdb and inspecting the `effective_config` in the tenant status managment API. If the env var is set, the tests are parametrized by the `kind` tag field, allowing to do a matrix build in CI and let Allure summarize everything in a nice report. If the env var is not set, the tests are not parametrized. So, merging this PR doesn't cause problems for flaky test detection. In fact, it doesn't cause any runtime change if the env var is not set. There are some tests in the test suite that set used to override the entire tenant_config using `NeonEnvBuilder.pageserver_config_override`. Since config overrides are merged non-recursively, such overrides that don't specify `kind = ` cause a fallback to pageserver's built-in `DEFAULT_COMPACTION_ALGORITHM`. Such cases can be found using ``` ["']tenant_config\s*[='"] ``` We'll deal with these tests in a future PR. closes https://github.com/neondatabase/neon/issues/7555	2024-05-14 18:03:08 +02:00
Joonas Koivunen	d9dcbffac3	python: allow using allowed_errors.py (#7719 ) See #7718. Fix it by renaming all `types.py` to `common_types.py`. Additionally, add an advert for using `allowed_errors.py` to test any added regex.	2024-05-13 15:16:23 +03:00
Vlad Lazar	d551bfee09	pageserver: remove import/export script previously used for breaking format changes (#7458 ) ## Problem The `export_import_between_pageservers` script us to do major storage format changes in the past. If we have to do such breaking changes in the future this approach wouldn't be suitable because: 1. It doesn't scale to the current size of the fleet 2. It loses history ## Summary of changes Remove the script and its associated test. Keep `fullbasebackup` and friends because it's useful for debugging. Closes https://github.com/neondatabase/cloud/issues/11648	2024-04-23 11:36:56 +01:00
Alexander Bayandin	4f4f787119	Update staging hostname (#7347 ) ## Problem ``` Could not resolve host: console.stage.neon.tech ``` ## Summary of changes - replace `console.stage.neon.tech` with `console-stage.neon.build`	2024-04-09 12:03:46 +01:00
Alexander Bayandin	bcab344490	CI(flaky-tests): remove outdated restriction (#7345 ) ## Problem After switching the default pageserver io-engine to `tokio-epoll-uring` on CI, we tuned a query that finds flaky tests (in https://github.com/neondatabase/neon/pull/7077). It has been almost a month since then, additional query tuning is not required anymore. ## Summary of changes - Remove extra condition from flaky tests query - Also return back parameterisation to the query	2024-04-09 10:50:43 +01:00
macdoos	3b95e8072a	test_runner: replace all `.format()` with f-strings (#7194 )	2024-04-02 14:32:14 +01:00
Christian Schwarz	ad6f538aef	tokio-epoll-uring: use it for on-demand downloads (#6992 ) # Problem On-demand downloads are still using `tokio::fs`, which we know is inefficient. # Changes - Add `pagebench ondemand-download-churn` to quantify on-demand download throughput - Requires dumping layer map, which required making `history_buffer` impl `Deserialize` - Implement an equivalent of `tokio::io::copy_buf` for owned buffers => `owned_buffers_io` module and children. - Make layer file download sensitive to `io_engine::get()`, using VirtualFile + above copy loop - For this, I had to move some code into the `retry_download`, e.g., `sync_all()` call. Drive-by: - fix missing escaping in `scripts/ps_ec2_setup_instance_store` - if we failed in retry_download to create a file, we'd try to remove it, encounter `NotFound`, and `abort()` the process using `on_fatal_io_error`. This PR adds treats `NotFound` as a success. # Testing Functional - The copy loop is generic & unit tested. Performance - Used the `ondemand-download-churn` benchmark to manually test against real S3. - Results (public Notion page): https://neondatabase.notion.site/Benchmarking-tokio-epoll-uring-on-demand-downloads-2024-04-15-newer-code-03c0fdc475c54492b44d9627b6e4e710?pvs=4 - Performance is equivalent at low concurrency. Jumpier situation at high concurrency, but, still less CPU / throughput with tokio-epoll-uring. - It’s a win. # Future Work Turn the manual performance testing described in the above results document into a performance regression test: https://github.com/neondatabase/neon/issues/7146	2024-03-15 18:57:05 +00:00
Christian Schwarz	17a3c9036e	follow-up(#7077 ): adjust flaky-test-detection cutoff date for tokio-epoll-uring (#7090 ) Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-03-11 16:36:49 +00:00
Christian Schwarz	2b0f3549f7	default to tokio-epoll-uring in CI tests & on Linux (#7077 ) All of production is using it now as of https://github.com/neondatabase/aws/pull/1121 The change in `flaky_tests.py` resets the flakiness detection logic. The alternative would have been to repeat the choice of io engine in each test name, which would junk up the various test reports too much. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-03-11 14:35:59 +00:00
Alexander Bayandin	4d2bf55e6c	CI: temporary disable coverage report for regression tests (#6798 ) ## Problem The merging coverage data step recently started to be too flaky. This failure blocks staging deployment and along with the flakiness of regression tests might require 4-5-6 manual restarts of a CI job. Refs: - https://github.com/neondatabase/neon/issues/4540 - https://github.com/neondatabase/neon/issues/6485 - https://neondb.slack.com/archives/C059ZC138NR/p1704131143740669 ## Summary of changes - Disable code coverage report for functional tests	2024-02-19 11:07:27 +00:00
Alexander Bayandin	e65f0fe874	CI(benchmarks): make job split consistent across reruns (#6614 ) ## Problem We've got several issues with the current `benchmarks` job setup: - `benchmark_durations.json` file (that we generate in runtime to split tests into several jobs[0]) is not consistent between these jobs (and very not consistent with the file if we rerun the job). I.e. test selection for each job can be different, which could end up in missed tests in a test run. - `scripts/benchmark_durations` doesn't fetch all tests from the database (it doesn't expect any extra directories inside `test_runner/performance`) - For some reason, currently split into 4 groups ends up with the 4th group has no tests to run, which fails the job[1] - [0] https://github.com/neondatabase/neon/pull/4683 - [1] https://github.com/neondatabase/neon/issues/6629 ## Summary of changes - Generate `benchmark_durations.json` file once before we start `benchmarks` jobs (this makes it consistent across the jobs) and pass the file content through the GitHub Actions input (this makes it consistent for reruns) - `scripts/benchmark_durations` fix SQL query for getting all required tests - Split benchmarks into 5 jobs instead of 4 jobs.	2024-02-06 17:00:55 +00:00
Abhijeet Patil	01c57ec547	Removed Uploading of perf result to git repo 'zenith-perf-data' (#6590 ) ## Problem We were archiving the pref benchmarks to - neon DB - git repo `zenith-perf-data` As the pref batch ran in parallel when the uploading of results to zenith-perf-data` git repo resulted in merge conflicts. Which made the run flaky and as a side effect the build started failing . The problem is been expressed in https://github.com/neondatabase/neon/issues/5160 ## Summary of changes As the results were not used from the git repo it was redundant hence in this PR cleaning up the results uploading of of perf results to git repo The shell script `generate_and_push_perf_report.sh` was using a py script [git-upload](https://github.com/neondatabase/neon/compare/remove-perf-benchmark-git-upload?expand=1#diff-c6d938e7f060e487367d9dc8055245c82b51a73c1f97956111a495a8a86e9a33) and [scripts/generate_perf_report_page.py](https://github.com/neondatabase/neon/pull/6590/files#diff-81af2147e72d07e4cf8ee4395632596d805d6168ba75c71cab58db2659956ef8) which are not used anywhere else in repo hence also cleaning that up ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat the commit message to not include the above checklist	2024-02-05 10:08:20 +00:00
Alexander Bayandin	fa52cd575e	Remove old tests results and old coverage collection (#6376 ) ## Problem We have switched to new test results and new coverage results, so no need to collect these data in old formats. ## Summary of changes - Remove "Upload coverage report" for old coverage report - Remove "Store Allure test stat in the DB" for old test results format	2024-02-01 13:36:55 +00:00
Christian Schwarz	918b03b3b0	integrate tokio-epoll-uring as alternative VirtualFile IO engine (#5824 )	2024-01-26 09:25:07 +01:00
Christian Schwarz	996abc9563	pagebench-based GetPage@LSN performance test (#6214 )	2024-01-24 12:51:53 +01:00
Christian Schwarz	b76454ae41	add script to set up EC2 storage-optimized instance store for benchmarking (#6350 ) Been using this all the time in https://github.com/neondatabase/neon/pull/6214 Part of https://github.com/neondatabase/neon/issues/5771 Should consider this in https://github.com/neondatabase/neon/issues/6297	2024-01-12 19:25:17 +00:00
Alexander Bayandin	7de829e475	test_runner: replace black with ruff format (#6268 ) ## Problem `black` is slow sometimes, we can replace it with `ruff format` (a new feature in 0.1.2 [0]), which produces pretty similar to black style [1]. On my local machine (MacBook M1 Pro 16GB): ``` # `black` on main $ hyperfine "BLACK_CACHE_DIR=/dev/null poetry run black ." Benchmark 1: BLACK_CACHE_DIR=/dev/null poetry run black . Time (mean ± σ): 3.131 s ± 0.090 s [User: 5.194 s, System: 0.859 s] Range (min … max): 3.047 s … 3.354 s 10 runs ``` ``` # `ruff format` on the current PR $ hyperfine "RUFF_NO_CACHE=true poetry run ruff format" Benchmark 1: RUFF_NO_CACHE=true poetry run ruff format Time (mean ± σ): 300.7 ms ± 50.2 ms [User: 259.5 ms, System: 76.1 ms] Range (min … max): 267.5 ms … 420.2 ms 10 runs ``` ## Summary of changes - Replace `black` with `ruff format` everywhere - [0] https://docs.astral.sh/ruff/formatter/ - [1] https://docs.astral.sh/ruff/formatter/#black-compatibility	2024-01-05 15:35:07 +00:00
Arthur Petukhovsky	0f56104a61	Make sk_collect_dumps also possible with teleport (#4739 ) Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2023-12-20 15:06:55 +00:00
John Spray	e89e41f8ba	tests: update for tenant generations (#5449 ) ## Problem Some existing tests are written in a way that's incompatible with tenant generations. ## Summary of changes Update all the tests that need updating: this is things like calling through the NeonPageserver.tenant_attach helper to get a generation number, instead of calling directly into the pageserver API. There are various more subtle cases.	2023-12-07 12:27:16 +00:00
Arpad Müller	31a54d663c	Migrate links from wiki to notion (#5862 ) See the slack discussion: https://neondb.slack.com/archives/C033A2WE6BZ/p1696429688621489?thread_ts=1695647103.117499	2023-11-14 15:36:47 +00:00
bojanserafimov	51570114ea	Remove outdated and flaky perf test (#5762 )	2023-11-02 10:43:59 -04:00
Alexander Bayandin	4778b6a12e	Switch to querying new tests results DB (#5616 ) ## Problem We started to store test results in a new format in https://github.com/neondatabase/neon/pull/4549. This PR switches scripts to query this db. (we can completely remove old DB/ingestions scripts in a couple of weeks after the PR merged) ## Summary of changes - `scripts/benchmark_durations.py` query new database - `scripts/flaky_tests.py` query new database	2023-10-25 14:25:13 +01:00
Heikki Linnakangas	810a355b9d	Add script to download a basebackup from pageserver. (#5344 ) I used this while investigating a production issue, and seems like it could come handy in the future, too.	2023-09-22 11:11:28 +00:00
Alexander Bayandin	15fd188fd6	Fix GitHub Autocomment for `ci-run/pr`s (#5268 ) ## Problem When PR `ci-run/pr-*` is created the GitHub Autocomment with test results are supposed to be posted to the original PR, currently, this doesn't work. I created this PR from a personal fork to debug and fix the issue. ## Summary of changes - `scripts/comment-test-report.js`: use `pull_request.head` instead of `pull_request.base` 🤦	2023-09-10 20:06:10 +01:00
Alexander Bayandin	028fbae161	Miscellaneous fixes for tests-related things (#5259 ) ## Problem A bunch of fixes for different test-related things ## Summary of changes - Fix test_runner/pg_clients (`subprocess_capture` return value has changed) - Do not run create-test-report if check-permissions failed for not cancelled jobs - Fix Code Coverage comment layout after flaky tests. Add another healing "\n" - test_compatibility: add an instruction for local run Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-08 16:28:09 +01:00

1 2 3

121 Commits