rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-06 04:52:55 +00:00

Author	SHA1	Message	Date
MMeent	f99ccb5041	Extract WalProposer into the neon extension (#2217 ) Including, but not limited to: * Fixes to neon management code to support walproposer-as-an-extension * Fix issue in expected output of pg settings serialization. * Show the logs of a failed --sync-safekeepers process in CI * Add compat layer for renamed GUCs in postgres.conf * Update vendor/postgres to the latest origin/main	2022-08-18 17:12:28 +02:00
Arthur Petukhovsky	116ecdf87a	Improve walreceiver logic (#2253 ) This patch makes walreceiver logic more complicated, but it should work better in most cases. Added `test_wal_lagging` to test scenarios where alive safekeepers can lag behind other alive safekeepers. - There was a bug which looks like `etcd_info.timeline.commit_lsn > Some(self.local_timeline.get_last_record_lsn())` filtered all safekeepers in some strange cases. I removed this filter, it should probably help with #2237 - Now walreceiver_connection reports status, including commit_lsn. This allows keeping safekeeper connection even when etcd is down. - Safekeeper connection now fails if pageserver doesn't receive safekeeper messages for some time. Usually safekeeper sends messages at least once per second. - `LaggingWal` check now uses `commit_lsn` directly from safekeeper. This fixes the issue with often reconnects, when compute generates WAL really fast. - `NoWalTimeout` is rewritten to trigger only when we know about the new WAL and the connected safekeeper doesn't stream any WAL. This allows setting a small `lagging_wal_timeout` because it will trigger only when we observe that the connected safekeeper has stuck.	2022-08-15 13:31:26 +03:00
Arseny Sher	e593cbaaba	Add pageserver checkpoint_timeout option. To flush inmemory layer eventually when no new data arrives, which helps safekeepers to suspend activity (stop pushing to the broker). Default 10m should be ok.	2022-08-11 22:54:09 +03:00
Dmitry Rodionov	7cd68a0c27	increase timeout to pass test with real s3	2022-08-08 12:15:16 +03:00
Dmitry Rodionov	5f71aa09d3	support running tests against real s3 implementation without mocking	2022-08-04 11:14:05 +03:00
Dmitry Rodionov	092a9b74d3	use only s3 in boto3-stubs and update mypy Newer version of mypy fixes buggy error when trying to update only boto3 stubs. However it brings new checks and starts to yell when we index into cusror.fetchone without checking for None first. So this introduces a wrapper to simplify quering for scalar values. I tried to use cursor_factory connection argument but without success. There can be a better way to do that, but this looks the simplest	2022-08-01 18:28:49 +03:00
Arthur Petukhovsky	09ddd34b2a	Fix checkpoints race condition in safekeeper tests (#2175 ) We should wait for WAL to arrive to pageserver before calling CHECKPOINT	2022-07-28 15:44:02 +03:00
Arthur Petukhovsky	aeb3f0ea07	Refactor test_race_conditions (#2162 ) Do not use python multiprocessing, make the test async	2022-07-28 14:38:37 +03:00
Arthur Petukhovsky	b445cf7665	Refactor test_unavailability (#2134 ) Now test_unavailability uses async instead of Process. The test is refactored to fix a possible race condition.	2022-07-20 22:13:05 +03:00
Dmitry Rodionov	65704708fa	remove unused imports, make more use of pathlib.Path	2022-07-01 18:56:51 +03:00
Arthur Petukhovsky	f862373ac0	Fix WAL timeout in test_s3_wal_replay (#1953 )	2022-06-17 20:43:54 +03:00
Arthur Petukhovsky	699f46cd84	Download WAL from S3 if it's not available in safekeeper dir (#1932 ) `send_wal.rs` and `WalReader` are now async. `test_s3_wal_replay` checks that WAL can be replayed after offloaded.	2022-06-17 15:33:39 +03:00
Egor Suvorov	0ac0fba77a	test_runner: test Safekeeper HTTP API Auth All endpoints except for POST /v1/timeline are tested, this one is not tested in any way yet. Three attempts for each endpoint: correctly authenticated, badly authenticated, unauthenticated.	2022-06-09 17:14:46 +02:00
Arseny Sher	5a723d44cd	Parametrize test_normal_work. I like to run small test locally, but let's avoid duplication.	2022-06-03 20:32:53 +04:00
Arseny Sher	70a53c4b03	Get backup test_safekeeper_normal_work, but skip by default. It is handy for development.	2022-06-03 16:12:14 +04:00
bojanserafimov	90e2c9ee1f	Rename zenith to neon in python tests (#1871 )	2022-06-02 16:21:28 -04:00
Dmitry Rodionov	b1b67cc5a0	improve test normal work to start several computes	2022-05-31 22:42:11 +03:00
Arseny Sher	36281e3b47	Extend test_wal_backup with compute restart.	2022-05-30 13:57:17 +04:00
Anastasia Lubennikova	e014cb6026	rename zenith.zenith_tenant to neon.tenant_id in test	2022-05-30 12:24:44 +03:00
Anastasia Lubennikova	67d6ff4100	Rename custom GUCs: - zenith.zenith_tenant -> neon.tenant_id - zenith.zenith_timeline -> neon.timeline_id	2022-05-30 11:11:01 +03:00
Anastasia Lubennikova	6a867bce6d	Rename 'zenith_admin' role to 'cloud_admin'	2022-05-30 11:11:01 +03:00
Anastasia Lubennikova	751f1191b4	Rename 'wal_acceptors' GUC to 'safekeepers'	2022-05-30 11:11:01 +03:00
Anastasia Lubennikova	3accde613d	Rename contrib/zenith to contrib/neon. Rename custom GUCs: - zenith.page_server_connstring -> neon.pageserver_connstring - zenith.zenith_tenant -> neon.tenantid - zenith.zenith_timeline -> neon.timelineid - zenith.max_cluster_size -> neon.max_cluster_size	2022-05-30 11:11:01 +03:00
Kian-Meng Ang	f1c51a1267	Fix typos	2022-05-28 14:02:05 +03:00
Arseny Sher	0e1bd57c53	Add WAL offloading to s3 on safekeepers. Separate task is launched for each timeline and stopped when timeline doesn't need offloading. Decision who offloads is done through etcd leader election; currently there is no pre condition for participating, that's a TODO. neon_local and tests infrastructure for remote storage in safekeepers added, along with the test itself. ref #1009 Co-authored-by: Anton Shyrabokau <ahtoxa@Antons-MacBook-Pro.local>	2022-05-27 06:19:23 +04:00
Kirill Bulatov	a884f4cf6b	Add etcd to neon_local	2022-05-17 01:17:44 +03:00
Kirill Bulatov	9a0fed0880	Enable at least 1 safekeeper in every test	2022-05-17 01:17:44 +03:00
Egor Suvorov	768c846eeb	Fix test_delete_force from #1653 conflicting with #1692	2022-05-13 17:36:18 +02:00
Egor Suvorov	bf899a57d9	Safekeeper: add timeline/tenant force delete HTTP endpoings (closes #895 ) * There is no auth in Safekeeper HTTP at all currently, so simply calling `check_permission` is not enough. * There are no checks of Safekeeper still working with the data, as "still working" is burry now: a timeline may be "active" while there are no compute nodes and all data is propagated. * Still, callmemaybe is deactivated, and timeline is removed from the internal map. It can easily sneak back in case of race conditions and implicit creations, though.	2022-05-13 15:43:52 +02:00
Arseny Sher	b9fd8a36ad	Remember timeline_start_lsn and local_start_lsn on safekeeper. Make it remember when timeline starts in general and on this safekeeper in particular (the point might be later on new safekeeper replacing failed one). Bumps control file and walproposer protocol versions. While protocol is bumped, also add safekeeper node id to AcceptorProposerGreeting. ref #1561	2022-05-04 14:32:03 +04:00
Arthur Petukhovsky	29539b0561	Set wal_keep_size to zero (#1507 ) wal_keep_size is already set to 0 in our cloud setup, but we don't use this value in tests. This commit fixes wal_keep_size in control_plane and adds tests for WAL recycling and lagging safekeepers.	2022-04-27 19:09:28 +03:00
Arseny Sher	8b9d523f3c	Remove old WAL on safekeepers. Remove when it is consumed by all of 1) pageserver (remote_consistent_lsn) 2) safekeeper peers 3) s3 WAL offloading. In test s3 offloading for now is mocked by directly bumping s3_wal_lsn. ref #1403	2022-04-26 23:02:23 +04:00
Kirill Bulatov	52e0816fa5	wal_acceptor -> safekeeper	2022-04-18 12:52:31 +03:00
Heikki Linnakangas	a009fe912a	Refactor connection option handling in python tests The PgProtocol.connect() function took extra options for username, database, etc. Remove those options, and have a generic way for each subclass of PgProtocol to provide some default options, with the capability override them in the connect() call.	2022-04-14 13:31:40 +03:00
Arthur Petukhovsky	6bc78a0e77	Log more info in test_many_timelines asserts (#1473 ) It will help to debug #1470 as soon as it happens again	2022-04-07 01:44:26 +03:00
Arseny Sher	ec3bc74165	Add safekeeper information exchange through etcd. Safekeers now publish to and pull from etcd per-timeline data. Immediate goal is WAL truncation, for which every safekeeper must know remote_consistent_lsn; the next would be callmemaybe replacement. Adds corresponding '--broker' argument to safekeeper and ability to run etcd in tests. Adds test checking remote_consistent_lsn is indeed communicated.	2022-03-29 18:16:49 +04:00
Kirill Bulatov	bd6bef468c	Provide single list timelines HTTP API handle	2022-03-21 13:42:21 +02:00
Kirill Bulatov	093ad8ab59	Send 409 HTTP responses on timeline and tenant creation for existing entity	2022-03-10 19:38:58 +02:00
Kirill Bulatov	c51d545fd9	Serialize Lsn as strings in http api	2022-03-10 19:38:58 +02:00
Kirill Bulatov	4d0f7fd1e4	Update Zenith CLI config between runs	2022-03-10 19:38:58 +02:00
Kirill Bulatov	f49990ed43	Allow creating timelines by branching off ancestors	2022-03-10 19:38:58 +02:00
Dmitry Rodionov	1d90b1b205	add node id to pageserver (#1310 ) * Add --id argument to safekeeper setting its unique u64 id. In preparation for storage node messaging. IDs are supposed to be monotonically assigned by the console. In tests it is issued by ZenithEnv; at the zenith cli level and fixtures, string name is completely replaced by integer id. Example TOML configs are adjusted accordingly. Sequential ids are chosen over Zid mainly because they are compact and easy to type/remember. * add node id to pageserver This adds node id parameter to pageserver configuration. Also I use a simple builder to construct pageserver config struct to avoid setting node id to some temporary invalid value. Some of the changes in test fixtures are needed to split init and start operations for envrionment. Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2022-03-04 01:10:42 +03:00
anastasia	abb422d5de	Fix SafekeeperMetrics parsing in python tests	2022-02-21 13:45:22 +03:00
Bojan Serafimov	ad262a46ad	Remove redundant pytest_plugins assignment	2022-02-17 13:41:49 +02:00
Kirill Bulatov	ce533835e5	Use uuid.UUID types for tenants and timelines more	2022-02-17 13:41:19 +02:00
Kirill Bulatov	e5bf520b18	Use types in zenith cli invocations in Python tests	2022-02-17 13:41:19 +02:00
bojanserafimov	ea13838be7	Add pgbench baseline test (#1204 ) Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>	2022-02-10 15:33:36 -05:00
Arthur Petukhovsky	cedde559b8	Add test for replacement of the failed safekeeper (#1179 ) * Add test to replace failed safekeeper * Restart safekeepers in test_replace_safekeeper * Update vendor/postgres	2022-01-27 17:26:55 +03:00
Arthur Petukhovsky	70778058d9	Add test for safekeeper setup without pageserver (#1000 )	2021-12-29 12:58:27 +03:00
Dmitry Rodionov	557e3024cd	Forward pageserver connection string from compute to safekeeper This is needed for implementation of tenant rebalancing. With this change safekeeper becomes aware of which pageserver is supposed to be used for replication from this particular compute.	2021-12-06 21:28:49 +03:00

1 2

61 Commits