rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-07 14:10:43 +00:00

Fedor Dikarev 9a6ace9bde introduce new runners: unit-perf and use them for benchmark jobs (#11409 )

## Problem
Benchmark results are inconsistent on the existing small-metal runners.

## Summary of changes
Introduce new `unit-perf` runners and run the benchmarks on them.

The new hardware has a slower but consistent CPU frequency when run with
the default `schedutil` governor.
We therefore had to adjust some test cases' timeouts and add retry
steps where hard-coded timeouts couldn't be increased without changing
the system under test.
- [wait_for_last_record_lsn](6592d69a67/test_runner/fixtures/pageserver/utils.py (L193)): 1000s -> 2000s
- [test_branch_creation_many](https://github.com/neondatabase/neon/pull/11409/files#diff-2ebfe76f89004d563c7e53e3ca82462e1d85e92e6d5588e8e8f598bbe119e927): 1000s
- [test_ingest_insert_bulk](https://github.com/neondatabase/neon/pull/11409/files#diff-e90e685be4a87053bc264a68740969e6a8872c8897b8b748d0e8c5f683a68d9f)
  - with back throttling disabled, compute becomes unresponsive for more than 60 seconds (PG hard-coded client authentication connection timeout)
- [test_sharded_ingest](https://github.com/neondatabase/neon/pull/11409/files#diff-e8d870165bd44acb9a6d8350f8640b301c1385a4108430b8d6d659b697e4a3f1): 600s -> 1200s

Right now there are only 2 runners of that class. If we decide to go
with them, we have to check how many runners of that type we need, so
that jobs do not get stuck waiting for one to become available.

However, we have now decided to run those runners with the `performance`
governor instead of `schedutil`.
This achieves almost the same performance as the previous runners while
still giving consistent results for the same commit.

Related issue to activate performance governor on these runners
https://github.com/neondatabase/runner/pull/138

## Verification that it helps

### analyze runtimes on new runner for same commit

Table of runtimes for the same commit on different runners in
[run](https://github.com/neondatabase/neon/actions/runs/14417589789)

| Run | Benchmarks (1) | Benchmarks (2) | Benchmarks (3) | Benchmarks (4) | Benchmarks (5) |
|--------|--------|---------|---------|---------|---------|
| 1 | 1950.37s | 6374.55s | 3646.15s | 4149.48s | 2330.22s |
| 2 | - | 6369.27s | 3666.65s | 4162.42s | 2329.23s |
| Delta % | - | 0.07 % | 0.5 % | 0.3 % | 0.04 % |
| with governor performance | 1519.57s | 4131.62s | - | - | - |
| second run gov. perf. | 1513.62s | 4134.67s | - | - | - |
| Delta % | 0.3 % | 0.07 % | - | - | - |
| speedup gov. performance | 22 % | 35 % | - | - | - |
| current desktop class hetzner runners (main) | 1487.10s | 3699.67s | - | - | - |
| slower than desktop class | 2 % | 12 % | - | - | - |


In summary, the runtimes for the same commit on this hardware vary
by less than 1 %.
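
The percentages in the table can be approximated with simple arithmetic. The sketch below is illustrative only: the function names `relative_delta_pct` and `speedup_pct` are made up here, the choice of the first run as the denominator is an assumption, and the exact rounding used in the table may differ slightly.

```python
def relative_delta_pct(first: float, second: float) -> float:
    """Absolute difference between two runtimes, in percent of the first run."""
    return abs(first - second) / first * 100.0


def speedup_pct(baseline: float, improved: float) -> float:
    """How much shorter `improved` is than `baseline`, in percent of `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Runtimes in seconds, copied from the table above (run 1, run 2).
schedutil_runs = {
    "Benchmarks (2)": (6374.55, 6369.27),
    "Benchmarks (3)": (3646.15, 3666.65),
    "Benchmarks (4)": (4149.48, 4162.42),
    "Benchmarks (5)": (2330.22, 2329.23),
}
performance_runs = {
    "Benchmarks (1)": (1519.57, 1513.62),
    "Benchmarks (2)": (4131.62, 4134.67),
}

for label, runs in (("schedutil", schedutil_runs), ("performance", performance_runs)):
    for name, (run1, run2) in runs.items():
        print(f"{label:12s} {name}: delta {relative_delta_pct(run1, run2):.2f} %")

# Speedup from switching Benchmarks (1) and (2) to the performance governor.
print(f"speedup (1): {speedup_pct(1950.37, 1519.57):.0f} %")
print(f"speedup (2): {speedup_pct(6374.55, 4131.62):.0f} %")
```

Every pair of runs above stays below a 1 % delta, which is the claim made in the summary.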

---------

Co-authored-by: BodoBolero <peterbendel@neon.tech>

2025-04-15 08:21:44 +00:00

large_synthetic_oltp

Extend large tenant OLTP workload ... (#11166 )

2025-03-16 14:04:48 +00:00

many_relations

Run pgbench on 10 GB scale factor on database with n relations (e.g. 10k) (#10172 )

2024-12-19 10:25:44 +00:00

pageserver

pageserver: add metrics for get page batch breaking reasons (#11545 )

2025-04-14 13:24:47 +00:00

pgvector

Enable all pyupgrade checks in ruff

2024-10-08 14:32:26 -05:00

tpc-h

Nightly Benchmarks: add TPC-H benchmark (#2978 )

2022-12-08 15:32:49 +00:00

__init__.py

Enable all pyupgrade checks in ruff

2024-10-08 14:32:26 -05:00

README.md

page_service: add benchmark for batching (#9820 )

2024-11-25 15:52:39 +00:00

test_branch_creation.py

introduce new runners: unit-perf and use them for benchmark jobs (#11409 )

2025-04-15 08:21:44 +00:00

test_branching.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_bulk_insert.py

test_bulk_insert: fix typing for PgVersion (#9854 )

2024-11-22 16:13:53 +00:00

test_bulk_tenant_create.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_bulk_update.py

use a prod-like shared_buffers size for some perf unit tests (#11373 )

2025-04-02 10:43:05 +00:00

test_compaction.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_compare_pg_stats.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_compute_ctl_api.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_compute_startup.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_copy.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_cumulative_statistics_persistence.py

Bodobolero/test cum stats persistence (#10995 )

2025-02-27 10:45:13 +00:00

test_dup_key.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_gc_feedback.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_gist_build.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_hot_page.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_hot_table.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_ingest_insert_bulk.py

introduce new runners: unit-perf and use them for benchmark jobs (#11409 )

2025-04-15 08:21:44 +00:00

test_ingest_logical_message.py

use a prod-like shared_buffers size for some perf unit tests (#11373 )

2025-04-02 10:43:05 +00:00

test_latency.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_layer_map.py

Record more timings in test_layer_map (#10670 )

2025-02-05 17:00:26 +00:00

test_logical_replication.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_parallel_copy_to.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_parallel_copy.py

use a prod-like shared_buffers size for some perf unit tests (#11373 )

2025-04-02 10:43:05 +00:00

test_perf_ingest_using_pgcopydb.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_perf_many_relations.py

use a prod-like shared_buffers size for some perf unit tests (#11373 )

2025-04-02 10:43:05 +00:00

test_perf_olap.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_perf_oltp_large_tenant.py

large tenant oltp benchmark: reindex with downtime (remove concurrently) (#11498 )

2025-04-09 17:11:00 +00:00

test_perf_pgbench.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_perf_pgvector_queries.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_physical_replication.py

Fix logging in nightly physical replication benchmarks (#11541 )

2025-04-14 13:57:33 +00:00

test_random_writes.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_seqscans.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_sharded_ingest.py

introduce new runners: unit-perf and use them for benchmark jobs (#11409 )

2025-04-15 08:21:44 +00:00

test_sharding_autosplit.py

storcon: add repeated auto-splits and initial splits (#11122 )

2025-03-20 15:43:57 +00:00

test_storage_controller_scale.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_wal_backpressure.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

test_write_amplification.py

ruff: enable TC — flake8-type-checking (#11368 )

2025-03-30 18:58:33 +00:00

README.md

Running locally

First make a release build. The -s flag silences a lot of output, and makes it easier to see if you have compile errors without scrolling up.

    BUILD_TYPE=release CARGO_BUILD_FLAGS="--features=testing" make -s -j8

You may also need to run ./scripts/pysync.

Then run the tests:

    DEFAULT_PG_VERSION=16 NEON_BIN=./target/release poetry run pytest test_runner/performance

Some handy pytest flags for local development:

-x tells pytest to stop on first error
-s shows test output
-k selects a test to run
--timeout=0 disables our default timeout of 300s (see setup.cfg)
--preserve-database-files to skip cleanup
--out-dir to produce a JSON with the recorded test metrics
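
As a rough sketch, the command and flags above can be assembled programmatically, for example in a local helper script. The function names `build_perf_test_command` and `build_perf_test_env` are hypothetical and not part of the repository, and passing a directory path to `--out-dir` is an assumption based on the flag's name.

```python
from __future__ import annotations

import os
import shlex


def build_perf_test_command(
    test_filter: str | None = None,
    stop_on_first_error: bool = True,
    show_output: bool = True,
    disable_timeout: bool = True,
    preserve_database_files: bool = False,
    out_dir: str | None = None,
) -> list[str]:
    """Assemble the pytest invocation from the flags listed above."""
    argv = ["poetry", "run", "pytest", "test_runner/performance"]
    if stop_on_first_error:
        argv.append("-x")  # stop on first error
    if show_output:
        argv.append("-s")  # show test output
    if disable_timeout:
        argv.append("--timeout=0")  # disable the default 300s timeout
    if preserve_database_files:
        argv.append("--preserve-database-files")  # skip cleanup
    if out_dir is not None:
        # Assumption: the flag takes the directory for the JSON metrics.
        argv.append(f"--out-dir={out_dir}")
    if test_filter is not None:
        argv.extend(["-k", test_filter])  # select a test to run
    return argv


def build_perf_test_env(pg_version: str = "16", neon_bin: str = "./target/release") -> dict[str, str]:
    """Copy the current environment and add the variables used by the test command."""
    env = dict(os.environ)
    env["DEFAULT_PG_VERSION"] = pg_version
    env["NEON_BIN"] = neon_bin
    return env


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_perf_test_command(test_filter="test_bulk_insert")))
```

The resulting list and environment could then be passed to `subprocess.run(argv, env=env)` from the repository root.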

What performance tests do we have and how we run them

Performance tests are built using the same infrastructure as our usual python integration tests. There are some extra fixtures that help to collect performance metrics, and to run tests against both vanilla PostgreSQL and Neon for comparison.

Tests that are run against local installation

Most of the performance tests run against a local installation. This is not very representative of a production environment. Firstly, Postgres, safekeeper(s) and the pageserver have to share CPU and I/O resources, which can add noise to the results. Secondly, network overhead is eliminated.

In the CI, the performance tests are run in the same environment as the other integration tests. We don't have control over the host that the CI runs on, so the environment may vary widely from one run to another, which makes the results across different runs noisy to compare.

Remote tests

There are a few tests that are marked with pytest.mark.remote_cluster. These tests do not set up a local environment, and instead require a libpq connection string to connect to, so they can be run on any Postgres-compatible database. Currently, the CI runs these tests on our staging and captest environments daily. Those are not isolated environments, so there can be noise in the results due to the activity of other clusters.

Noise

All tests run only once. Usually, to obtain more consistent performance numbers, a test should be repeated multiple times and the results aggregated, for example by taking min, max, avg, or median.
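
A minimal sketch of such an aggregation, using only the Python standard library. The function name `summarize_runs` and the sample runtimes are made up for illustration; they are not part of the test fixtures and not real measurements.

```python
import statistics


def summarize_runs(runtimes_s):
    """Aggregate the runtimes (in seconds) of repeated runs of one test."""
    if not runtimes_s:
        raise ValueError("need at least one measurement")
    return {
        "min": min(runtimes_s),
        "max": max(runtimes_s),
        "avg": statistics.fmean(runtimes_s),
        "median": statistics.median(runtimes_s),
    }


# Hypothetical measurements: three similar runs and one noisy outlier.
summary = summarize_runs([10.0, 12.0, 11.0, 30.0])
print(summary)
```

With an outlier like the one in this sample, the median moves far less than the average, which is why it is often preferred on noisy hosts.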

Results collection

Local test results for the main branch, and results of the daily performance tests, are stored in a Neon project deployed in the production environment. There is a Grafana dashboard that visualizes the results. Here is the dashboard. The main problem with it is that there is no way to point at a particular commit, even though the data for that is available in the database. It needs some tweaking from someone who knows Grafana tricks.

There is also an inconsistency in test naming. Test names should be the same across platforms, and results should be differentiated by the platform field. Currently, however, the platform is sometimes included in the test name because of how parametrization works in pytest. For example, the dashboard has a platform switch with neon-local-ci and neon-staging variants, yet some tests under the neon-local-ci value of the platform switch are displayed as Test test_runner/performance/test_bulk_insert.py::test_bulk_insert[vanilla] and Test test_runner/performance/test_bulk_insert.py::test_bulk_insert[neon], which is highly confusing.
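
One possible way to untangle such names when post-processing results is to split the pytest parametrization suffix from the test name. This is only a sketch: `split_test_id` is a hypothetical helper, not something the repository or the dashboard provides, and how the suffix should map to the platform field is left open.

```python
from __future__ import annotations


def split_test_id(node_id: str) -> tuple[str, str | None]:
    """Split 'path::test_name[param]' into ('path::test_name', 'param').

    Returns (node_id, None) when the id has no parametrization suffix.
    """
    if node_id.endswith("]") and "[" in node_id:
        # Drop the closing bracket, then split on the last opening bracket.
        base, _, param = node_id[:-1].rpartition("[")
        return base, param
    return node_id, None


for node_id in (
    "test_runner/performance/test_bulk_insert.py::test_bulk_insert[vanilla]",
    "test_runner/performance/test_bulk_insert.py::test_bulk_insert[neon]",
):
    name, variant = split_test_id(node_id)
    print(name, variant)
```

Both ids above then share one test name, with `vanilla` and `neon` available as a separate field.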