rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-04 03:52:56 +00:00

Author	SHA1	Message	Date
Heikki Linnakangas	66ec135676	Refactor pytest fixtures Instead of having a lot of separate fixtures for setting up the page server, the compute nodes, the safekeepers etc., have one big ZenithEnv object that encapsulates the whole environment. Every test either uses a shared "zenith_simple_env" fixture, which contains the default setup of a pageserver with no authentication, and no safekeepers. Tests that want to use safekeepers or authentication set up a custom test-specific ZenithEnv fixture. Gathering information about the whole environment into one object makes some things simpler. For example, when a new compute node is created, you no longer need to pass the 'wal_acceptors' connection string as argument to the 'postgres.create_start' function. The 'create_start' function fetches that information directly from the ZenithEnv object.	2021-10-25 14:14:47 +03:00
Heikki Linnakangas	28af3e5008	Remove some unnecessary fixture arguments	2021-10-25 14:14:45 +03:00
Heikki Linnakangas	f337d73a6c	Rearrange output dirs a bit Each test now gets its own test output directory, like 'test_output/test_foobar', even when TEST_SHARED_FIXTURES is used. When TEST_SHARED_FIXTURES is not used, the zenith repo for each test is created under a 'repo' subdir inside the test output dir, e.g. 'test_output/test_foobar/repo'	2021-10-25 14:14:43 +03:00
Heikki Linnakangas	57ce541521	Remove unnecessary 'pg_bin' object from 'postgres' fixture. It was only used in check_restored_datadir_content(), and that function can construct it easily from the other information it has.	2021-10-25 14:14:41 +03:00
Heikki Linnakangas	e14f24034f	Turn a few path-fixtures to global variables This way, they're readily accessible from the classes and functions that are not themselves fixtures	2021-10-25 14:14:38 +03:00
Egor Suvorov	86a28458c6	test_runner: use Python 3.7 in CI and improve its support (#775) * We actually need Python 3.7 because of dataclasses * Rerun 'pipenv lock' under Python 3.7 and add 'pipenv' to dev deps * Update docs on developing for Python 3.7 * CircleCI: use Python 3.7 via Docker image instead of Orb	2021-10-21 20:01:29 +03:00
Konstantin Knizhnik	c310932121	Implement backpressure for compute node to avoid WAL overflow Co-authored-by: Arseny Sher <sher-ars@yandex.ru> Co-authored-by: Alexey Kondratov <kondratov.aleksey@gmail.com>	2021-10-21 18:15:50 +03:00
Egor Suvorov	ff563ff080	test_runner: fix mypy errors and force it on CI (#774) * Fix bugs found by mypy * Add some missing types and runtime checks, remove unused code * Make ZenithPageserver start right away for better type safety * Add `types-` packages to Pipfile Pin mypy version and run it on CircleCI	2021-10-21 13:51:54 +03:00
anastasia	7f9d2a7d05	Change 'zenith tenant list' API to return tenant state added in `0dc7a3fc`	2021-10-21 11:04:22 +03:00
Arthur Petukhovsky	13f4e173c9	Wait for safekeepers to catch up in test_restarts_under_load (#776)	2021-10-20 14:42:53 +03:00
Egor Suvorov	e42c884c2b	test_runner/README: add note on capturing logs (#778) Became actual after #674	2021-10-20 01:55:49 +03:00
Egor Suvorov	eb706bc9f4	Force yapf (Python code formatter) in CI (#772) * Add yapf run to CircleCI * Pin yapf version * Enable `SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES` setting * Reformat all existing code with slight manual adjustments * test_runner/README: note that yapf is forced	2021-10-19 20:13:47 +03:00
Dmitry Rodionov	798df756de	suppress FileNotFound exception instead of missing_ok=True because the latter is added in python 3.8 and we claim to support >3.6	2021-10-19 17:13:42 +03:00
Dmitry Rodionov	732d13fe06	use cached-property package because python<3.8 doesn't have cached_property in functools	2021-10-19 17:13:42 +03:00
Heikki Linnakangas	feae7f39c1	Support read-only nodes Change 'zenith.signal' file to a human-readable format, similar to backup_label. It can contain a "PREV LSN: %X/%X" line, or a special value to indicate that it's OK to start with invalid LSN ('none'), or that it's a read-only node and generating WAL is forbidden ('invalid'). The 'zenith pg create' and 'zenith pg start' commands now take a node name parameter, separate from the branch name. If the node name is not given, it defaults to the branch name, so this doesn't break existing scripts. If you pass "foo@<lsn>" as the branch name, a read-only node anchored at that LSN is created. The anchoring is performed by setting the 'recovery_target_lsn' option in the postgresql.conf file, and putting the server into standby mode with 'standby.signal'. We no longer store the synthetic checkpoint record in the WAL segment. The postgres startup code has been changed to use the copy of the checkpoint record in the pg_control file, when starting in zenith mode.	2021-10-19 09:48:12 +03:00
Heikki Linnakangas	c2b468c958	Separate node name from the branch name in ComputeControlPlane This is in preparation for supporting read-only nodes. You can launch multiple read-only nodes on the same branch, so we need an identifier for each node, separate from the branch name.	2021-10-19 09:48:10 +03:00
Arseny Sher	de744a44dd	Add /timeline http request to safekeeper returning its status. Which is mainly generational state (terms) and useful LSNs. Also add /status basic healthcheck request which is now used in tests to determine the safekeeper is up; this fixes #726. ref #115	2021-10-14 19:02:38 +03:00
Arthur Petukhovsky	4b87acb1f6	Use logging in python tests (#674) * Use logging in python tests * Use f-strings for logs * Don't log test output while running * Use only pytest logging handler * Add more info about pytest logging	2021-10-14 13:10:09 +03:00
Egor Suvorov	23f4c0a742	Rename `wal_acceptor` binary to `safekeeper` (#740), stage 1/2 * Rename wal_acceptor binary to safekeeper * Rename wal_acceptor.pid and wal_acceptor.log to safekeeper.pid and safekeeper.log * Change some mentions of WAL acceptor to safekeeper * Dockerfile: alias wal_acceptor to safekeeper temporarily until internal scripts are updated	2021-10-12 22:03:06 +03:00
anastasia	d7c9dd06f4	Implement graceful shutdown at 'pageserver stop': - perform checkpoint for each tenant repository. - wait for the completion of all threads. Add new option 'immediate' to 'pageserver stop' command to terminate the pageserver immediately.	2021-10-11 13:35:01 +03:00
Heikki Linnakangas	b9119f11bf	Add perf test case for buffering GiST build. When a WAL record affects multiple pages, we currently duplicate the record for each affected page. That's a bit wasteful, but not too bad for b-tree splits and non-hot heap updates that affect two pages. But buffering GiST index build WAL-logs the whole relation in 32 page chunks, with one giant WAL record for each 32-page chunk. Currently we duplicate that giant record for each of the 32 pages, which is really wasteful. Github issue https://github.com/zenithdb/zenith/issues/720 tracks the problem. This commit adds a test case for it to demonstrate it.	2021-10-11 11:10:58 +03:00
Egor Suvorov	403d9779d9	safekeeper: add initial metrics and HTTP handler (#699, #541) * `wal_acceptor`: add HTTP handler, /metrics endpoint only, no authentication * Two gauges are currently reported: `flush_lsn` and `commit_lsn` * Add `DEFAULT_PG_LISTEN_PORT` and `DEFAULT_PG_LISTEN_PORT` consts for uniformity	2021-10-08 18:55:41 +03:00
Patrick Insinger	b3b8f18f61	tests - fix get_timeline_size signature	2021-10-07 15:38:22 -07:00
Heikki Linnakangas	60dae0b4ac	Add test case that demonstrates Write Amplification.	2021-10-08 00:34:29 +03:00
Heikki Linnakangas	c660926a06	Refactor duplicated code to get on-disk timeline size in tests. Move it to a common function. In the passing, remove the obsolete check to exclude the 'wal' directory. The 'wal' directory is no more.	2021-10-08 00:34:26 +03:00
Heikki Linnakangas	db4059cd6d	Measure peak memory usage in perf test. Another useful metric to keep an eye on.	2021-10-07 18:03:20 +03:00
Heikki Linnakangas	e3945d94fd	Store unlogged tables locally, and replace PD_WAL_LOGGED. All the changes are in the vendor/postgres side. However, because we now generate fewer Full Page Writes, the 'branch_behind' test needs to be modified so that it still generates enough WAL to consume a few WAL segments.	2021-10-06 10:58:15 +03:00
Egor Suvorov	05fe39088b	Readme updates based on a fresher Ubuntu installation experience (#627)	2021-10-05 19:19:25 +03:00
Egor Suvorov	7e190d72a5	Make `pageserver_` prefix for common metric names configurable (#681)	2021-10-05 19:06:44 +03:00
Arseny Sher	4256231eb7	Enable test_start_compute with safekeepers. It should work now.	2021-10-04 16:50:46 +03:00
Andrey Taranik	ae27490281	wal_acceptors added to tenant creation tests	2021-10-04 08:58:49 +03:00
Andrey Taranik	fbd8ca2ff4	minor code beautification	2021-10-04 08:58:49 +03:00
Andrey Taranik	ec673a5d67	bulk tenant create test added	2021-10-04 08:58:49 +03:00
Arthur Petukhovsky	d6fc74a412	Various fixes for test_sync_safekeepers (#668) * Send ProposerGreeting manually in tests * Move test_sync_safekeepers to test_wal_acceptor.py * Capture test_sync_safekeepers output * Add comment for handle_json_ctrl * Save captured output in CI	2021-09-28 19:25:05 +03:00
Arseny Sher	7a370394a7	Wait till previous victim recovers in run_restarts_under_load. Fixes test flakiness, as recovery easily might take the whole iteration.	2021-09-28 19:15:41 +03:00
Arseny Sher	70b08923ed	Disable new safekeepers tests as not stable enough.	2021-09-26 22:33:58 +03:00
Heikki Linnakangas	ff5cbe2694	Support overlapping and nested Layers in the layer map. This introduces a new tree data structure for holding intervals, and queries of the form "which intervals contain the given point?". It then uses that to store the Layers in the layer map, instead of the BTreeMap. While we don't currently create overlapping layers in the page server, that situation might arise in the future if we start to create extra layers for performance purposes, or as part of some multi-stage garbage collection operation that creates new layers in some interval and then removes old ones. The situation might also arise if you have multiple page servers running on the same timeline, freezing layers at different points, and both uploading them to S3. So even though overlapping layers might not happen currently, let's avoid getting confused if it does happen for some reason. Fixes https://github.com/zenithdb/zenith/issues/517.	2021-09-24 14:10:52 +03:00
Heikki Linnakangas	2319e0ec8f	Define a layer's start and end bounds more precisely. After this, a layer's start bound is always defined to be inclusive, and end bound exclusive. For example, if you have a layer in the range 100-200, that layer can be used for GetPage@LSN requests at LSN 100, 199, or anything in between. But for LSN 200, you need to look at the next layer (if one exists). This is one part of a fix for https://github.com/zenithdb/zenith/issues/517. After this, the page server shouldn't create layers for the same segment with the same LSN, which avoids the issue. However, the same thing would still happen, if you managed to create layers with same start LSN again. That could happen e.g. if you had two page servers running, or in some weird crash/restart scenario, or due to bugs or features added later. The next commit makes the layer map more robust, so that it tolerates that situation without deleting wrong files.	2021-09-24 14:10:49 +03:00
Arthur Petukhovsky	d4e037f1e7	Support for `--sync-safekeepers` in tests (#647) New command has been added to append specially crafted records in safekeeper WAL. This command takes json for append, encodes LogicalMessage based on json fields, and processes new AppendRequest to append and commit WAL in safekeeper. Python test starts up walkeepers and creates config for walproposer, then appends WAL and checks --sync-safekeepers works without errors. This test is simplest one, more useful test cases (like in #545) for different setups will be added soon.	2021-09-24 13:19:59 +03:00
anastasia	a4fc6da57b	Fix gc_internal to treat dropped layers. Some dropped layers serve as tombstones for earlier layers and thus cannot be garbage collected. Add new fields to GcResult for layers that are preserved as tombstones	2021-09-23 12:21:47 +03:00
Arthur Petukhovsky	8ebf2fe550	Add test for acceptor restarts under load (#591) In this test safekeepers are restarted one by one, while bank transactions are executed and validated in the background. Bank transactions consist of balance transfers and log writes. In the end balance sum should remain the same and there should be progress from every client, when 2 of 3 safekeeper nodes are up.	2021-09-22 11:59:20 +03:00
Heikki Linnakangas	49c8c03465	Add performance test for bulk INSERT	2021-09-21 13:25:46 +03:00
Dmitry Rodionov	b7aac87ec1	fix port distribution so services do not use ephemeral ports	2021-09-20 18:44:42 +03:00
Heikki Linnakangas	c2af6d98db	Don't print 'pg_controldata' output after every startup in tests. It's not interesting for most tests, and clutters the output. If there are individual tests where it is worthwhile, let's add pg_controldata calls to those tests, but I don't think it's needed for now.	2021-09-17 20:04:29 +03:00
Heikki Linnakangas	540973eac4	Don't get confused on request of latest page version with very old LSN. If the 'latest' flag in the client request is true, the client wants the latest page version regardless of the LSN in the request. The LSN is just a hint in that case, indicating that the page hasn't been modified since that LSN. The LSN can be very old, so it's possible that the page server has already garbage collected away the layer at that LSN. We tried to fetch the old layer and errored out if that happened. To fix, always fetch the data as of last-record-LSN, if 'latest' is set in the client request. We now only use the LSN to wait if the requested LSN hasn't been received and processed yet. Fixes https://github.com/zenithdb/zenith/issues/567	2021-09-17 18:56:05 +03:00
Dmitry Ivanov	7b3fb760fa	[test_runner] psql should be oblivious to user's preferences This makes psql ignore $HOME/.psqlrc	2021-09-17 14:16:23 +03:00
Dmitry Rodionov	01ef2baef0	show more context for zenith cli run errors	2021-09-15 14:02:15 +03:00
Dmitry Rodionov	9563336d9a	Bring back check for interfering processes, add more comments and descriptive errors	2021-09-15 14:02:15 +03:00
Dmitry Rodionov	4ebe643d0c	Support parallel test running for python tests Support is done via pytest-xdist plugin. To use the feature add -n<concurrency> to pytest invocation e.g. pytest -n8 to run 8 tests in parallel. Changes in code are mostly about ports assigning. Previously port for pageserver was hardcoded without the ability to override through zenith cli and ports for started compute nodes were calculated twice, in zenith cli and in test code. Now zenith cli supports port arguments for pageserver and compute nodes to be passed explicitly. Tests are modified in such a way that each worker gets a non overlapping port range which can be configured and now contains 100 ports. These ports are distributed to test services (pageserver, wal acceptors, compute nodes) so they can work independently.	2021-09-15 14:02:15 +03:00
Dmitry Rodionov	b4ecae33e4	add incremental tracking of logical timeline size In order to exclude problems with synchronizing disk and memory logical size is not stored in metadata on disk. It is calculated on timeline "start" by scanning the contents of layered repo and then size is maintained via an atomic variable. This patch also adds new endpoint to pageserver http api: branch detail. It allows retrieval of a particular branch info by its name. Size info is also added to the response of the endpoint and used in tests.	2021-09-07 18:25:15 +03:00
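
Commit 2319e0ec8f in the list above defines a layer's start bound as inclusive and its end bound as exclusive, using a layer in the range 100-200 as its example. The rule can be illustrated with a minimal Python sketch. The `Layer` class and `find_layer` helper are hypothetical names invented for this illustration and are not taken from the page server code, which may represent layers quite differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    """Hypothetical stand-in for a layer; LSNs are plain integers here."""
    start_lsn: int  # inclusive
    end_lsn: int    # exclusive

    def covers(self, lsn: int) -> bool:
        # Half-open interval check: start <= lsn < end.
        return self.start_lsn <= lsn < self.end_lsn


def find_layer(layers: List[Layer], lsn: int) -> Optional[Layer]:
    """Return the first layer whose half-open range contains lsn, if any."""
    for layer in layers:
        if layer.covers(lsn):
            return layer
    return None


# The 100-200 example from the commit message, plus a following layer.
layers = [Layer(100, 200), Layer(200, 300)]
```

With these two layers, `find_layer(layers, 100)` and `find_layer(layers, 199)` both resolve to the 100-200 layer, while `find_layer(layers, 200)` falls through to the next layer, matching the behaviour the commit message describes.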

322 Commits