rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-13 16:32:56 +00:00

Author	SHA1	Message	Date
Dmitry Rodionov	c75bc9b8b0	Change benchmark plugin layout so pytest loads it properly when running all tests (not necessarily performance ones). Resolves #837	2021-11-04 16:33:31 +03:00
Dmitry Rodionov	c6172dae47	Implement performance tests against our staging environment. Tests are based on a self-hosted runner which is physically close to our staging deployment in AWS; currently tests consist of various configurations of pgbench runs. These changes also rework the benchmark fixture by removing globals and allowing reports to be collected with the desired metrics and dumped to JSON for further analysis. This is also applicable to the usual performance tests, which use local zenith binaries.	2021-11-04 02:15:46 +03:00
Dmitry Rodionov	5bc09074ea	add a flag to avoid non-incremental size calculation in pageserver http api. This calculation is not that heavy, but it is needed only in tests, and when the number of tenants/timelines is high the calculation can take noticeable time. Resolves https://github.com/zenithdb/zenith/issues/804	2021-10-27 13:30:34 +03:00
Heikki Linnakangas	1bc917324d	Use -m immediate for 'immediate' shutdown	2021-10-27 10:49:38 +03:00
Heikki Linnakangas	af429fb401	Improve 'zenith' CLI utility for safekeepers and a config file. The 'zenith' CLI utility can now be used to launch safekeepers. By default, one safekeeper is configured. There are new 'safekeeper start/stop' subcommands to manage the safekeepers. Each safekeeper is given a name that can be used to identify the safekeeper to start/stop with the 'zenith start/stop' commands. The safekeeper data is stored in '.zenith/safekeepers/<name>'. The 'zenith start' command now starts the pageserver and also all safekeepers. 'zenith stop' stops pageserver, all safekeepers, and all postgres nodes. Introduce new 'zenith pageserver start/stop' subcommands for starting/stopping just the page server. The biggest change here is to the 'zenith init' command. This adds a new 'zenith init --config=<path to toml file>' option. It takes a toml config file that describes the environment. In the config file, you can specify options for the pageserver, like the pg and http ports, and authentication. For each safekeeper, you can define a name and the pg and http ports. If you don't use the --config option, you get a default configuration with a pageserver and one safekeeper. Note that that's different from the previous default of no safekeepers. Any fields that are omitted in the configuration file are filled with defaults. You can also specify the initial tenant ID in the config file. A couple of sample config files are added in the control_plane/ directory. The --pageserver-pg-port, --pageserver-http-port, and --pageserver-auth options to 'zenith init' are removed. Use a config file instead. Finally, change the python test fixtures to use the new 'zenith' commands and the config file to describe the environment.	2021-10-27 10:49:38 +03:00
Heikki Linnakangas	41d48719e1	In python tests, skip ports that are already in use. We've seen some failures with "Address already in use" errors in the tests. It's not clear why, perhaps some server processes are not cleaned up properly after test, or maybe the socket is still in TIME_WAIT state. In any case, let's make the tests more robust by checking that the port is free, before trying to use it.	2021-10-27 00:46:24 +03:00
Heikki Linnakangas	66ec135676	Refactor pytest fixtures. Instead of having a lot of separate fixtures for setting up the page server, the compute nodes, the safekeepers etc., have one big ZenithEnv object that encapsulates the whole environment. Every test either uses a shared "zenith_simple_env" fixture, which contains the default setup of a pageserver with no authentication and no safekeepers, or, if it needs safekeepers or authentication, sets up a custom test-specific ZenithEnv fixture. Gathering information about the whole environment into one object makes some things simpler. For example, when a new compute node is created, you no longer need to pass the 'wal_acceptors' connection string as argument to the 'postgres.create_start' function. The 'create_start' function fetches that information directly from the ZenithEnv object.	2021-10-25 14:14:47 +03:00
Heikki Linnakangas	28af3e5008	Remove some unnecessary fixture arguments	2021-10-25 14:14:45 +03:00
Heikki Linnakangas	f337d73a6c	Rearrange output dirs a bit. Each test now gets its own test output directory, like 'test_output/test_foobar', even when TEST_SHARED_FIXTURES is used. When TEST_SHARED_FIXTURES is not used, the zenith repo for each test is created under a 'repo' subdir inside the test output dir, e.g. 'test_output/test_foobar/repo'	2021-10-25 14:14:43 +03:00
Heikki Linnakangas	57ce541521	Remove unnecessary 'pg_bin' object from 'postgres' fixture. It was only used in check_restored_datadir_content(), and that function can construct it easily from the other information it has.	2021-10-25 14:14:41 +03:00
Heikki Linnakangas	e14f24034f	Turn a few path-fixtures into global variables. This way, they're readily accessible from the classes and functions that are not themselves fixtures	2021-10-25 14:14:38 +03:00
Egor Suvorov	ff563ff080	test_runner: fix mypy errors and force it on CI (#774) * Fix bugs found by mypy * Add some missing types and runtime checks, remove unused code * Make ZenithPageserver start right away for better type safety * Add `types-` packages to Pipfile Pin mypy version and run it on CircleCI	2021-10-21 13:51:54 +03:00
anastasia	7f9d2a7d05	Change 'zenith tenant list' API to return tenant state added in `0dc7a3fc`	2021-10-21 11:04:22 +03:00
Arthur Petukhovsky	13f4e173c9	Wait for safekeepers to catch up in test_restarts_under_load (#776)	2021-10-20 14:42:53 +03:00
Egor Suvorov	eb706bc9f4	Force yapf (Python code formatter) in CI (#772) * Add yapf run to CircleCI * Pin yapf version * Enable `SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES` setting * Reformat all existing code with slight manual adjustments * test_runner/README: note that yapf is forced	2021-10-19 20:13:47 +03:00
Dmitry Rodionov	798df756de	suppress FileNotFound exception instead of missing_ok=True because the latter is added in python 3.8 and we claim to support >3.6	2021-10-19 17:13:42 +03:00
Dmitry Rodionov	732d13fe06	use cached-property package because python<3.8 doesn't have cached_property in functools	2021-10-19 17:13:42 +03:00
Heikki Linnakangas	c2b468c958	Separate node name from the branch name in ComputeControlPlane. This is in preparation for supporting read-only nodes. You can launch multiple read-only nodes on the same branch, so we need an identifier for each node, separate from the branch name.	2021-10-19 09:48:10 +03:00
Arseny Sher	de744a44dd	Add /timeline http request to safekeeper returning its status. Which is mainly generational state (terms) and useful LSNs. Also add /status basic healthcheck request which is now used in tests to determine the safekeeper is up; this fixes #726. ref #115	2021-10-14 19:02:38 +03:00
Arthur Petukhovsky	4b87acb1f6	Use logging in python tests (#674) * Use logging in python tests * Use f-strings for logs * Don't log test output while running * Use only pytest logging handler * Add more info about pytest logging	2021-10-14 13:10:09 +03:00
Egor Suvorov	23f4c0a742	Rename `wal_acceptor` binary to `safekeeper` (#740), stage 1/2 * Rename wal_acceptor binary to safekeeper * Rename wal_acceptor.pid and wal_acceptor.log to safekeeper.pid and safekeeper.log * Change some mentions of WAL acceptor to safekeeper * Dockerfile: alias wal_acceptor to safekeeper temporarily until internal scripts are updated	2021-10-12 22:03:06 +03:00
anastasia	d7c9dd06f4	Implement graceful shutdown at 'pageserver stop': - perform checkpoint for each tenant repository. - wait for the completion of all threads. Add new option 'immediate' to 'pageserver stop' command to terminate the pageserver immediately.	2021-10-11 13:35:01 +03:00
Egor Suvorov	403d9779d9	safekeeper: add initial metrics and HTTP handler (#699, #541) * `wal_acceptor`: add HTTP handler, /metrics endpoint only, no authentication * Two gauges are currently reported: `flush_lsn` and `commit_lsn` * Add `DEFAULT_PG_LISTEN_PORT` and `DEFAULT_PG_LISTEN_PORT` consts for uniformity	2021-10-08 18:55:41 +03:00
Patrick Insinger	b3b8f18f61	tests - fix get_timeline_size signature	2021-10-07 15:38:22 -07:00
Heikki Linnakangas	c660926a06	Refactor duplicated code to get on-disk timeline size in tests. Move it to a common function. In passing, remove the obsolete check to exclude the 'wal' directory. The 'wal' directory is no more.	2021-10-08 00:34:26 +03:00
Heikki Linnakangas	db4059cd6d	Measure peak memory usage in perf test. Another useful metric to keep an eye on.	2021-10-07 18:03:20 +03:00
Egor Suvorov	05fe39088b	Readme updates based on a fresher Ubuntu installation experience (#627)	2021-10-05 19:19:25 +03:00
Egor Suvorov	7e190d72a5	Make `pageserver_` prefix for common metric names configurable (#681)	2021-10-05 19:06:44 +03:00
Arthur Petukhovsky	d6fc74a412	Various fixes for test_sync_safekeepers (#668) * Send ProposerGreeting manually in tests * Move test_sync_safekeepers to test_wal_acceptor.py * Capture test_sync_safekeepers output * Add comment for handle_json_ctrl * Save captured output in CI	2021-09-28 19:25:05 +03:00
Arthur Petukhovsky	d4e037f1e7	Support for `--sync-safekeepers` in tests (#647). A new command has been added to append specially crafted records to the safekeeper WAL. This command takes json for append, encodes a LogicalMessage based on the json fields, and processes a new AppendRequest to append and commit WAL in the safekeeper. The Python test starts up walkeepers and creates a config for walproposer, then appends WAL and checks that --sync-safekeepers works without errors. This test is the simplest one; more useful test cases (like in #545) for different setups will be added soon.	2021-09-24 13:19:59 +03:00
Arthur Petukhovsky	8ebf2fe550	Add test for acceptor restarts under load (#591). In this test safekeepers are restarted one by one, while bank transactions are executed and validated in the background. Bank transactions consist of balance transfers and log writes. In the end the balance sum should remain the same and there should be progress from every client when 2 of 3 safekeeper nodes are up.	2021-09-22 11:59:20 +03:00
Dmitry Rodionov	b7aac87ec1	fix port distribution so services do not use ephemeral ports	2021-09-20 18:44:42 +03:00
Heikki Linnakangas	c2af6d98db	Don't print 'pg_controldata' output after every startup in tests. It's not interesting for most tests, and clutters the output. If there are individual tests where it is worthwhile, let's add pg_controldata calls to those tests, but I don't think it's needed for now.	2021-09-17 20:04:29 +03:00
Dmitry Ivanov	7b3fb760fa	[test_runner] psql should be oblivious to user's preferences. This makes psql ignore $HOME/.psqlrc	2021-09-17 14:16:23 +03:00
Dmitry Rodionov	01ef2baef0	show more context for zenith cli run errors	2021-09-15 14:02:15 +03:00
Dmitry Rodionov	9563336d9a	Bring back check for interfering processes, add more comments and descriptive errors	2021-09-15 14:02:15 +03:00
Dmitry Rodionov	4ebe643d0c	Support parallel test running for python tests. Support is done via the pytest-xdist plugin. To use the feature, add -n<concurrency> to the pytest invocation, e.g. pytest -n8 to run 8 tests in parallel. Changes in code are mostly about port assignment. Previously the port for the pageserver was hardcoded without the ability to override it through zenith cli, and ports for started compute nodes were calculated twice, in zenith cli and in test code. Now zenith cli supports port arguments for pageserver and compute nodes to be passed explicitly. Tests are modified in such a way that each worker gets a non-overlapping port range, which can be configured and now contains 100 ports. These ports are distributed to test services (pageserver, wal acceptors, compute nodes) so they can work independently.	2021-09-15 14:02:15 +03:00
Dmitry Rodionov	b4ecae33e4	add incremental tracking of logical timeline size. To avoid problems with synchronizing disk and memory, logical size is not stored in metadata on disk. It is calculated on timeline "start" by scanning the contents of the layered repo, and then the size is maintained via an atomic variable. This patch also adds a new endpoint to the pageserver http api: branch detail. It allows retrieval of a particular branch's info by its name. Size info is also added to the response of the endpoint and used in tests.	2021-09-07 18:25:15 +03:00
anastasia	6f0c065743	preserve filediff artifacts in CI	2021-09-07 16:58:21 +03:00
anastasia	94c50e3e90	Fix check_restored_datadir_content(). Call 'basebackup' command directly, instead of relying on CLI	2021-09-07 16:58:21 +03:00
anastasia	eb3fd7a8da	print diff for mismatching files in check_restored_datadir_content()	2021-09-06 18:21:23 +03:00
anastasia	1e172230ce	Add test function to compare files in compute nodes to catch bugs in SLRU replay. Compare files in existing compute node's pgdata with fresh basebackup at the same lsn. We expect that content is identical, except tmp files. Use it after some tests.	2021-09-06 18:21:23 +03:00
Stas Kelvich	ed4eed0a19	Make use of `postgres --sync-safekeepers` in tests and CLI. Change control plane code to call `postgres --sync-safekeepers` before compute node start when safekeepers are enabled. Now `pg create` will create an empty data directory with the proper config file. Subsequent `pg start` will run `sync-safekeepers` and will call basebackup with the resulting LSN. Also change a few tests to accommodate this new behavior.	2021-09-06 13:06:20 +03:00
Heikki Linnakangas	c6678c5dea	Include # of bytes written in pgbench benchmark result. Now that the page server collects this metric (since commit `212920e47e`), let's include it in the performance test results. The new metric looks like this: performance/test_perf_pgbench.py . [100%] --------------- Benchmark results ---------------- test_pgbench.init: 6.784 s test_pgbench.pageserver_writes: 466 MB <---- THIS IS NEW test_pgbench.5000_xacts: 8.196 s test_pgbench.size: 163 MB =============== 1 passed in 21.00s ===============	2021-09-03 09:00:26 +03:00
Kirill Bulatov	0e4cbe0165	Fix some typos	2021-09-02 17:27:18 +03:00
Stas Kelvich	ddd2c83c64	Change test_restart_compute to expose safekeeper problems. Make this test look like 'test_compute_restart.sh' by @ololobus, which was surprisingly good for checking safekeepers behavior. This test adds an intermediate compute node start with bulk select that causes a lot of FPI's and select itself wouldn't wait for all that WAL to be replicated. So if we kill compute node right after that we end up with lagging safekeepers with VCL != flush_lsn. And starting new node from that state takes special care. Also, run and print `pg_controldata` output after each compute node start to eyeball lsn/checkpoint info of basebackup. This commit only adds test without fixing the problem.	2021-09-02 12:06:12 +03:00
anastasia	27442c3daa	Add test for DROP DATABASE command	2021-08-30 17:29:29 +03:00
Heikki Linnakangas	074bd3bb12	Add basic performance test framework. This provides a pytest fixture to record metrics from pytest tests. The recorded metrics are printed out at the end of the tests. As a starter, this includes one small test, using pgbench. It prints out three metrics: the initialization time, runtime of 5000 xacts, and the repository size after the tests.	2021-08-27 21:00:45 +03:00
Dmitry Rodionov	23b5249512	translate pageserver api to http	2021-08-24 19:05:00 +03:00
anastasia	20e6cd7724	Update test_twophase - check that we correctly restore files at compute node start.	2021-08-19 12:15:09 +03:00
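
Commits 4ebe643d0c and 41d48719e1 describe giving each pytest-xdist worker its own non-overlapping range of 100 ports and skipping ports that are already in use. The following is a minimal sketch of that idea, not the code from those commits: the names can_bind, PortDistributor and worker_base_port, the base port 15000, and the use of 127.0.0.1 are all assumptions made for illustration. The only external convention relied on is that pytest-xdist names its workers 'gw0', 'gw1', ... and reports 'master' when tests are not distributed.

```python
import socket
from contextlib import closing


def can_bind(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            # Typically "Address already in use": another process holds the port.
            return False


class PortDistributor:
    """Hands out ports from a fixed range, skipping ones already in use.

    Hypothetical helper; each port in the range is considered at most once.
    """

    def __init__(self, base_port: int, port_count: int) -> None:
        self._ports = iter(range(base_port, base_port + port_count))

    def get_port(self) -> int:
        for port in self._ports:
            if can_bind("127.0.0.1", port):
                return port
        raise RuntimeError("port range exhausted")


def worker_base_port(worker_id: str,
                     base_port: int = 15000,
                     ports_per_worker: int = 100) -> int:
    """Map a pytest-xdist worker id to the first port of its private range.

    The default base port and range size are assumptions for this sketch.
    """
    index = 0 if worker_id == "master" else int(worker_id[2:])
    return base_port + index * ports_per_worker
```

In a test suite, a session-scoped fixture could build one PortDistributor per worker from worker_base_port(worker_id) and hand ports to the pageserver, safekeepers and compute nodes. Note that checking a port and later binding it are separate steps, so the check narrows the window for "Address already in use" failures rather than eliminating it.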

78 Commits
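
Commits 798df756de and 732d13fe06 both work around features that only arrived in Python 3.8. As a hedged illustration of the two workarounds (the function and class names below are hypothetical and the fallback import is an assumption; the commit message only says the cached-property package is used), the same behavior can be obtained on older interpreters like this:

```python
import contextlib
import tempfile
from pathlib import Path

try:
    # Available in the standard library from Python 3.8 onwards.
    from functools import cached_property
except ImportError:
    # Fallback to the third-party cached-property package on older Pythons.
    from cached_property import cached_property


def remove_if_exists(path: Path) -> None:
    """Delete path, doing nothing if it is already gone."""
    # Path.unlink(missing_ok=True) would be shorter, but that keyword
    # argument was only added in Python 3.8.
    with contextlib.suppress(FileNotFoundError):
        path.unlink()


class RepoInfo:
    """Hypothetical example class showing a lazily computed attribute."""

    def __init__(self, repo_dir: Path) -> None:
        self.repo_dir = repo_dir

    @cached_property
    def file_count(self) -> int:
        # Computed on first access, then stored on the instance.
        return sum(1 for p in self.repo_dir.rglob("*") if p.is_file())
```

Calling remove_if_exists twice on the same path is safe, which is the property the missing_ok=True form would otherwise provide.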