rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-04 04:30:38 +00:00

Author	SHA1	Message	Date
Kirill Bulatov	33251a9d8f	Disable failing remote storage tests for now	2022-01-28 18:35:46 +03:00
Konstantin Knizhnik	c045ae7a9b	Fix random range for keys in test_gc_aggressive.py (#1199)	2022-01-28 16:29:55 +03:00
Dmitry Rodionov	602ccb7d5f	distinguish failures for pre-initdb lsn and pre-ancestor lsn branching in test_branch_behind	2022-01-28 12:31:15 +03:00
Konstantin Knizhnik	08135910a5	Fix checkpoint.nextXid update (#1166) * Fix checkpoint.nextXid update * Add test for checkpoint.nextXid * Fix indentation of test_next_xid.py * Fix mypy error in test_next_xid.py * Tidy up the test case. * Add a unit test Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>	2022-01-27 18:21:51 +03:00
Arthur Petukhovsky	cedde559b8	Add test for replacement of the failed safekeeper (#1179) * Add test to replace failed safekeeper * Restart safekeepers in test_replace_safekeeper * Update vendor/postgres	2022-01-27 17:26:55 +03:00
Arthur Petukhovsky	49d1d1ddf9	Don't call adjust_for_wal_acceptors after pg create (#1178) Now zenith_cli handles the wal_acceptors config internally, so if we also appended wal_acceptors to postgresql.conf in the Python tests, the file would contain a duplicate wal_acceptors config.	2022-01-27 17:23:14 +03:00
Konstantin Knizhnik	79f0e44a20	Gc cutoff rwlock (#1139) * Reproduce GitHub issue #1047. * Use RwLock to protect gc_cutoff_lsn * Reduce number of updates in test_gc_aggressive * Change test_prohibit_get_page_at_lsn_for_garbage_collected_pages test * Change test_prohibit_get_page_at_lsn_for_garbage_collected_pages * Lock latest_gc_cutoff_lsn in all operations accessing storage to prevent race conditions with GC * Remove random sleep between wait_for_lsn and get_page_at_lsn * Initialize latest_gc_cutoff with initdb_lsn and remove separate check that lsn >= initdb_lsn * Update test_prohibit_branch_creation_on_pre_initdb_lsn test Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>	2022-01-27 14:41:16 +03:00
Dmitry Rodionov	63dd7bce7e	bandaid to avoid concurrent timeline downloading until proper refactoring/fix	2022-01-26 19:54:09 +03:00
Dmitry Rodionov	39591ef627	reduce flakiness	2022-01-24 17:20:15 +03:00
Dmitry Rodionov	37c440c5d3	Introduce first version of tenant migration between pageservers. This patch includes attach/detach http endpoints in pageservers, some changes in callmemaybe handling inside safekeeper, and an integration test to check migration with and without load. There are still some rough edges that will be addressed in follow-up patches	2022-01-24 17:20:15 +03:00
Dmitry Rodionov	5f5a11525c	Switch our python package management solution to poetry. The main reason is its better support for installing packages for different python versions. It also has a better dependency resolver than Pipenv, and it supports the modern standard for python dependency management. This includes usage of pyproject.toml for project-specific configuration instead of per-tool conf files. See the following links for details: https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/ https://www.python.org/dev/peps/pep-0518/	2022-01-24 11:33:47 +03:00
Kirill Bulatov	924d8d489a	Allow enabling S3 mock in all existing tests with an env var	2022-01-20 18:42:47 +02:00
Dmitry Rodionov	026eb64a83	Use python lib to mock s3	2022-01-20 18:42:47 +02:00
Kirill Bulatov	38c6f6ce16	Allow specifying custom endpoint in s3	2022-01-20 18:42:47 +02:00
anastasia	7aba299dbd	Use safekeeper in test_branch_behind (#1068) to avoid a subtle race condition. Without safekeeper, walreceiver reconnection can get stuck because of an IO deadlock between walsender auth and a regular backend.	2022-01-12 14:38:04 +03:00
Kirill Bulatov	8ab4c8a050	Code review fixes	2022-01-11 15:44:23 +02:00
Kirill Bulatov	7c4a653230	Propagate Zenith CLI's RUST_LOG env var to subprocesses	2022-01-11 15:44:23 +02:00
Kirill Bulatov	a3cd8f0e6d	Add the remote storage test	2022-01-11 15:44:23 +02:00
Kirill Bulatov	65c851a451	Test pageserver's timeline http methods	2022-01-11 15:44:23 +02:00
Kirill Bulatov	ce8d6ae958	Allow using remote storage in tests	2022-01-11 15:44:23 +02:00
Kirill Bulatov	384b2a91fa	Pass generic pageserver params through zenith cli	2022-01-11 15:44:23 +02:00
Heikki Linnakangas	722667f189	Add test case for performance issue #941. The first COPY generates about 230 MB of write I/O, but the second COPY, after deleting most of the rows and vacuuming the rows away, generates 370 MB of writes. Both COPYs insert the same amount of data, so they should generate roughly the same amount of I/O. This commit doesn't try to fix the issue, just adds a test case to demonstrate it. Add a new 'checkpoint' command to the pageserver API. Previously, we used 'do_gc' for that, but many tests, including this new one, really only want to perform a checkpoint and don't care about GC. For now, I only used the command in the new test, though, and didn't convert any existing tests to use it.	2022-01-04 11:26:37 +02:00
Arthur Petukhovsky	70778058d9	Add test for safekeeper setup without pageserver (#1000)	2021-12-29 12:58:27 +03:00
anastasia	5ef2b1baf7	Add new test illustrating issue with sync-safekeepers. If safekeepers sync fast enough, the callmemaybe thread may never make a call before receiving an Unsubscribe request. This leads to a situation where the pageserver lacks data that exists on safekeepers.	2021-12-28 17:50:48 +03:00
anastasia	980f5f8440	Propagate remote_consistent_lsn to safekeepers. Change meaning of lsns in HOT_STANDBY_FEEDBACK: flush_lsn = disk_consistent_lsn, apply_lsn = remote_consistent_lsn. Update compute node backpressure configuration accordingly. Update compute node configuration: set 'synchronous_commit=remote_write' in setups without safekeepers. This way the compute node doesn't have to wait for a data checkpoint on the pageserver. This doesn't guarantee data durability, but we only use this setup for tests, so it's fine.	2021-12-24 15:32:54 +03:00
Kirill Bulatov	114a757d1c	Use generic config parameters in pageserver cli Co-authored-by: Heikki Linnakangas <heikki.linnakangas@iki.fi>	2021-12-23 18:58:28 +02:00
Heikki Linnakangas	1cc181ca32	Fix WAL redo of commit records with subtransactions. If a commit record contains XIDs that are stored on different CLOG pages, we duplicate the commit record for each affected CLOG page. In the redo routine, we must only apply the parts of the record that apply to the CLOG page being restored. We got that right in the loop that handles the sub-XIDs, but incorrectly always set the bit that corresponds to the main XID.	2021-12-21 23:08:01 +02:00
Heikki Linnakangas	927587cec8	Fix comments in tests	2021-12-21 22:38:33 +02:00
Heikki Linnakangas	bcf80eaa95	Fix multixact members WAL redo. The logic to compute the page number was broken, and as a result, only the first page of multixact members was updated correctly. All the rest were left as zeros. Improve test_multixact.py to generate more multixacts, to cover this case. Also fix the check that the restored PG data directory matches the original one. Previously, the test compared the 'pg_new' cluster, which is a bit silly because the test restored the 'pg_new' cluster only a few lines earlier, so if the multixact WAL redo is somehow broken, the comparison will just compare two broken data directories and report success. Change it to compare the original datadir, the one where the multixacts were originally created, with a restored image of the same.	2021-12-21 17:50:06 +02:00
Heikki Linnakangas	72ef59c378	Fix small typos in comments, add a comment. The introductory paragraph of the README could use some more love, but let's at least fix the typos.	2021-12-13 13:44:08 +02:00
Kirill Bulatov	673c297949	Download timelines on demand	2021-12-10 17:23:35 +02:00
Dmitry Rodionov	7dece8e4a0	skip temporary table files when comparing directories in regress tests	2021-12-09 12:53:26 +03:00
Dmitry Rodionov	557e3024cd	Forward pageserver connection string from compute to safekeeper This is needed for implementation of tenant rebalancing. With this change safekeeper becomes aware of which pageserver is supposed to be used for replication from this particular compute.	2021-12-06 21:28:49 +03:00
Dmitry Ivanov	7cec13d1df	Improve shutdown story for code coverage This patch introduces fixes for several problems affecting LLVM-based code coverage: * Daemonizing parent processes should call _exit() to prevent coverage data file corruption (.profraw) due to concurrent writes. Implement proper shutdown handlers in safekeeper.	2021-12-06 13:27:52 +03:00
Arseny Sher	cba4da3f4d	Add term history to safekeepers. Persist full history of term switches on safekeepers instead of storing only the single term of the highest entry (called epoch). This makes it easy to correctly find the divergence point of two logs and truncate the obsolete part before overwriting it with entries of the newer proposer(s). Full history of the proposer is transferred in a separate message before the proposer starts streaming; it is immediately persisted by the safekeeper, though the safekeeper might not yet have entries for some older terms there. That's because we can't atomically append to WAL and update the control file anyway, so locally available WAL must be taken into account when looking at the history. We should sometimes purge term history entries beyond truncate_lsn; this is not done here. Per https://github.com/zenithdb/rfcs/pull/12 Closes #296. Bumps vendor/postgres.	2021-12-03 12:43:57 +03:00
Dmitry Rodionov	130184fee9	Prohibit branch creation and basebackup at out-of-scope lsns. Out-of-scope LSNs include pre-initdb LSNs and LSNs prior to latest_gc_cutoff. To get there, two cleanups were also needed: * Fix error handling in the Execute message handler. This fixes behaviour when basebackup returned an error; previously the pageserver thread just died. * Remove the "ancestor" file, which previously contained the ancestor id and branch lsn. Currently the same data can be obtained from the metadata file. The way we handled the ancestor file in the code introduced a case where, when branching failed, the timeline directory was created but contained no data except the ancestor file. This confused gc because it scans directories. So it is better to just remove the ancestor file and clean up the timeline directory creation so that it happens after all validity checks have passed	2021-11-25 15:27:16 +03:00
Dmitry Rodionov	737a557f09	add check to python tests that after gc the number of rows is unchanged in all branches	2021-11-22 11:39:20 +03:00
Dmitry Rodionov	44111e3ba3	Prohibit branch creation at lsn that was already garbage collected. This introduces a new timeline field, latest_gc_cutoff. It is updated before each gc iteration. A new check is added to branch_timelines to prevent branch creation with a start point less than latest_gc_cutoff. This also adds a check to get_page_at_lsn which asserts that the lsn at which the page is requested was not garbage collected. This check is currently triggered for read-only nodes pinned to a specific lsn: because they are not tracked in the pageserver, garbage collection can remove data that might still be referenced. This is a bug and will be fixed separately.	2021-11-15 20:03:16 +03:00
Heikki Linnakangas	4ba521f53f	Add performance test case for parallel COPY TO	2021-11-15 14:49:53 +02:00
Heikki Linnakangas	849ac791a6	Bandaid fix for "page not found" errors when a table is loaded. During parallel load of a table, Postgres sometimes requests a page from the page server for which no WAL has been generated yet. That's normal; Postgres expects the page to be full of zeros. There was a special case for that in LayeredTimeline::materialize_page, but the problem remained when crossing a segment boundary, so that there's no layer for the segment at all. It would be nice to have a more robust cross-check for this case. That might need help from the Postgres side. But this extends the bandaid fix we had in materialize_page() to the case where we cross a segment boundary. Fixes https://github.com/zenithdb/zenith/issues/841	2021-11-12 18:47:39 +02:00
Alexey Kondratov	de5e6a15ae	Set LD_LIBRARY_PATH in the check_restored_datadir_content() psql call Otherwise we may use outdated system libpq. Also print stdout/stderr if basebackup failed in check_restored_datadir_content()	2021-11-12 16:27:43 +03:00
Arthur Petukhovsky	9aaa02bc9a	Fix high CPU usage in walproposer (#860) * Bump vendor/postgres * Update time limits for test_restarts_under_load	2021-11-10 17:18:07 +03:00
Egor Suvorov	eaff0cd568	Check python for the whole repository and improve docs (#813)	2021-11-09 22:23:29 +03:00
Egor Suvorov	587935ebed	Add Safekeeper metrics tests (#746) * zenith_fixtures.py: add SafekeeperHttpClient.get_metrics() * Ensure that `collect_lsn` and `flush_lsn`'s reported values look reasonable in `test_many_timelines`	2021-11-09 22:18:59 +03:00
Dmitry Rodionov	865870a8e5	Follow up staging benchmarking * change zenith-perf-data checkout ref to be main * set cluster id through secrets so there is no code changes required when we wipe out clusters on staging * display full pgbench output on error	2021-11-05 14:07:11 +03:00
Arthur Petukhovsky	d19263aec8	Adjust timeouts for test_restarts_under_load (#830) * Adjust timeouts for test_restarts_under_load * Add test timeout for test_restarts_under_load	2021-11-04 19:58:40 +03:00
Dmitry Rodionov	c75bc9b8b0	Change benchmark plugin layout so pytest loads it properly when running all tests (not necessarily performance ones). Resolves #837	2021-11-04 16:33:31 +03:00
Heikki Linnakangas	086a02ab92	Add performance test for simple seq scans. Fixes https://github.com/zenithdb/zenith/issues/831	2021-11-04 10:36:45 +02:00
Dmitry Rodionov	c6172dae47	implement performance tests against our staging environment. Tests run on a self-hosted runner which is physically close to our staging deployment in AWS; currently the tests consist of various configurations of pgbench runs. These changes also rework the benchmark fixture by removing globals and allowing reports with the desired metrics to be collected and dumped to JSON for further analysis. This is also applicable to the usual performance tests which use local zenith binaries.	2021-11-04 02:15:46 +03:00
Dmitry Rodionov	5bc09074ea	add a flag to avoid non-incremental size calculation in pageserver http api. This calculation is not that heavy, but it is needed only in tests, and when the number of tenants/timelines is high the calculation can take a noticeable amount of time. Resolves https://github.com/zenithdb/zenith/issues/804	2021-10-27 13:30:34 +03:00

175 Commits