Split each relish into fixed-size 10 MB segments. A separate layer is
created for each segment. This reduces write amplification if you
have a large relation and update only parts of it; the downside is
that you have many more files. The 10 MB size is just a guess; we
should do some modeling and testing in the future to figure out the
optimal size.
Each segment tracks its own size. To figure out the total size of a
relish, you need to loop through the segments to find the highest one
that's in use. That's a bit inefficient, but it will do for now. We
might want to add a cache or something later.
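For illustration, the arithmetic might look like this (a minimal sketch with hypothetical names, assuming 8 KB pages):
```python
BLOCK_SIZE = 8192                                # 8 KB pages
SEGMENT_BLOCKS = 10 * 1024 * 1024 // BLOCK_SIZE  # 10 MB => 1280 blocks per segment

def segment_of_block(blkno: int) -> int:
    # Which segment a given block number falls into.
    return blkno // SEGMENT_BLOCKS

def relish_size_in_blocks(segment_sizes: dict) -> int:
    # Total size: find the highest segment in use; the full segments
    # before it, plus its own size, give the relish size.
    if not segment_sizes:
        return 0
    last = max(segment_sizes)
    return last * SEGMENT_BLOCKS + segment_sizes[last]
```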
Change the CLI so that 'pg start' always creates the node from scratch.
This operation preserves any previously existing config.
Add a new flag '--config-only' to 'pg create'.
If this flag is passed, don't perform a basebackup; just fill in the initial postgresql.conf for the node.
This replaces the RocksDB based implementation with an approach using
"snapshot files" on disk, and in-memory btreemaps to hold the recent
changes.
This makes the repository implementation a configuration option. You can
choose 'layered' or 'rocksdb' with "zenith init --repository-format=<format>".
The unit tests have been refactored to exercise both implementations.
'layered' is now the default.
Push/pull is not implemented. The 'test_history_inmemory' test has been
commented out accordingly. It's not clear how we will implement that
functionality; probably by copying the snapshot files directly.
On my laptop, the server was receiving the token as a string with extra
b'...' escaping, e.g. as "b'eyJ0....0ifQA'" instead of just "eyJ0....0ifQA".
That was causing the test to fail.
I'm using Python 3.9, while the CI is using Python 3.8. I suspect that's
why. My version of pyjwt might be different too.
See also https://github.com/jpadilla/pyjwt/issues/391.
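A minimal sketch of a normalization workaround, assuming the token comes from PyJWT (HS256 used only for illustration):
```python
import jwt

token = jwt.encode({"scope": "tenant"}, "secret-key", algorithm="HS256")
# PyJWT < 2.0 returns bytes, PyJWT >= 2.0 returns str; normalize to str
# so the value never gets formatted as "b'...'" downstream.
if isinstance(token, bytes):
    token = token.decode("utf-8")
```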
Current state of authentication:
The page server validates the JWT token passed as a password during the
connection phase. Later, when performing an action such as branch
creation, the tenant parameter of the operation is validated to match
the one submitted in the token.
To allow access from the console there is a dedicated scope, PageServerApi,
which allows access to all tenants. See the access validation code in
PageServerHandler::check_permission.
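Roughly, the check behaves like this (a Python mirror for illustration only; the claim names are assumptions, the real code is the Rust function above):
```python
def check_permission(claims: dict, tenant_id: str) -> bool:
    # The PageServerApi scope grants access to all tenants; otherwise
    # the tenant named in the token must match the tenant of the operation.
    if claims.get("scope") == "pageserverapi":
        return True
    return claims.get("tenant_id") == tenant_id
```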
Because we are in the middle of refactoring the communication layer
involving the WAL proposer protocol and the safekeeper<->pageserver path,
the safekeeper currently doesn't check the token passed from compute, and
uses a "hardcoded" token passed via an environment variable to communicate
with the pageserver.
Compute postgres now takes the token from an environment variable and
passes it as the password field in the pageserver connection. It is not
passed through settings, because then a user would be able to retrieve it
using pg_settings or SHOW.
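For illustration, the compute side amounts to something like this (hypothetical variable name, host, and port):
```python
import os
import psycopg2

# The JWT comes from an environment variable rather than a GUC, so it
# is not visible via pg_settings or SHOW.
token = os.environ["ZENITH_AUTH_TOKEN"]  # hypothetical variable name
conn = psycopg2.connect(host="127.0.0.1", port=6400,  # hypothetical host/port
                        user="zenith", password=token)
```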
I've added a basic test in test_auth.py. Probably, after we add
authentication to the remaining network paths, we should enable it by
default and switch all existing tests to use it.
The current GC test is flaky and overly strict. Since we are migrating to the
layered repo format, which has a different GC implementation, let's just
silence this test for now.
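One way to silence it (a sketch; the test name is hypothetical):
```python
import pytest

@pytest.mark.skip(reason="flaky and overly strict; layered repo format brings a new GC")
def test_gc_old_format():
    ...
```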
This clarifies - I hope - the abstractions between Repository and
ObjectRepository. The ObjectTag struct was a mix of objects that could
be accessed directly through the public Timeline interface, and also
objects that were created and used internally by the ObjectRepository
implementation and not supposed to be accessed directly by the
callers. With RelishTag separate from ObjectTag, the distinction
is clearer: RelishTag is used in the public interface, and
ObjectTag is used internally between object_repository.rs and
object_store.rs, and it contains the internal metadata object types.
One awkward thing with the ObjectTag struct was that the Repository
implementation had to distinguish between ObjectTags for relations,
and track the size of the relation, while others were used to store
"blobs". With the RelishTags, some relishes are considered
"non-blocky", and the Repository implementation is expected to track
their sizes, while others are stored as blobs. I'm not 100% happy with
how RelishTag captures that either: it just knows that some relish
kinds are blocky and some non-blocky, and there's an is_block()
function to check that. But this does enable size-tracking for SLRUs,
allowing us to treat them more like relations.
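In Python terms, the distinction is roughly this (an illustrative mirror only; the real RelishTag is a Rust enum, and the variant names here are assumptions):
```python
from enum import Enum, auto

class RelishKind(Enum):
    RELATION = auto()       # blocky: accessed one 8k page at a time, size tracked
    SLRU_SEGMENT = auto()   # blocky: size tracked, like a relation
    TWOPHASE_FILE = auto()  # non-blocky: stored as a single blob
    CHECKPOINT = auto()     # non-blocky

def is_blocky(kind: RelishKind) -> bool:
    # Mirrors the is_block() check mentioned above.
    return kind in (RelishKind.RELATION, RelishKind.SLRU_SEGMENT)
```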
This changes the way SLRUs are stored in the repository. Each SLRU
segment, e.g. "pg_clog/0000" or "pg_clog/0001", is now handled as a
separate relish. This removes the need for the SLRU-specific
put_slru_truncate() function in the Timeline trait; SLRU truncation is
now handled by calling put_unlink() on the segment. This is more in
line with how PostgreSQL stores SLRUs and handles their truncation.
The SLRUs are "blocky", so they are accessed one 8k page at a time,
and the repository tracks their size. I considered an alternative design
where we would treat each SLRU segment as non-blocky, and just store
the whole file as one blob. Each SLRU segment is up to 256 kB in size,
which isn't that large, so that might've worked fine, too. One reason
I didn't do that is that it seems better to have the WAL redo
routines be as close as possible to the PostgreSQL routines. It
doesn't matter much in the repository, though; we have to track the
size for relations anyway, so there's not much difference in whether
we also do it for SLRUs.
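For reference, the numbers above come from PostgreSQL's SLRU layout; a small sketch of the arithmetic and the segment naming (pg_clog shown):
```python
BLOCK_SIZE = 8192
SLRU_PAGES_PER_SEGMENT = 32  # 32 * 8 KB = 256 KB max per segment

def clog_segment_name(segno: int) -> str:
    # SLRU segment files are named with zero-padded hex, e.g. "pg_clog/0000".
    return f"pg_clog/{segno:04X}"
```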
While working on this, I noticed that the CLOG and MultiXact redo code
did not handle wraparound correctly. We need to fix that, but for now,
I just commented them out with a FIXME comment.
The codepath for tenant_create command first launched the WAL redo
thread, and then called branches::create_repo() which checked if the
tenant's directory already exists. That's problematic, because
launching the WAL redo thread will run initdb if the directory doesn't
already exist. Race condition: if the tenant already exists, it will
have a WAL redo thread already running, and the old and new WAL redo
threads might try to run initdb at the same time, causing all kinds of
weird failures.
The test_pageserver_api test was failing 100% repeatably on my laptop
because of this. I'm not sure why this doesn't occur on the CI:
Jul 31 18:05:48.877 INFO running initdb in "./tenants/5227e4eb90894775ac6b8a8c76f24b2e/wal-redo-datadir", location: pageserver::walredo, pageserver/src/walredo.rs:483
thread 'WAL redo thread' panicked at 'initdb failed: The files belonging to this database system will be owned by user "heikki".
This user must also own the server process.
The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".
Data page checksums are disabled.
creating directory ./tenants/0305b1326f3ea33add0929d516da7cb6/wal-redo-datadir ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Europe/Helsinki
creating configuration files ... ok
running bootstrap script ...
stderr:
2021-07-31 15:05:48.875 GMT [282569] LOG: could not open configuration file "/home/heikki/git-sandbox/zenith/test_output/test_tenant_list/repo/./tenants/0305b1326f3ea33add0929d516da7cb6/wal-redo-datadir/postgresql.conf": No such file or directory
2021-07-31 15:05:48.875 GMT [282569] FATAL: configuration file "/home/heikki/git-sandbox/zenith/test_output/test_tenant_list/repo/./tenants/0305b1326f3ea33add0929d516da7cb6/wal-redo-datadir/postgresql.conf" contains errors
child process exited with exit code 1
initdb: removing data directory "./tenants/0305b1326f3ea33add0929d516da7cb6/wal-redo-datadir"
This patch adds support for tenants; it mostly touches the pageserver.
The directory layout on disk is changed to contain a new layer of indirection:
the path to a particular repository now has the structure <pageserver
workdir>/tenants/<tenant id>. A tenant id has the same format as a timeline
id, and is included in pageserver commands when needed. Two new commands are
also available in the pageserver: tenant_list and tenant_create. This is also
reflected in the CLI.
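The new layout maps to paths like this (an illustrative sketch):
```python
from pathlib import Path

def tenant_repo_path(workdir: Path, tenant_id: str) -> Path:
    # <pageserver workdir>/tenants/<tenant id>
    return workdir / "tenants" / tenant_id
```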
During init a default tenant is created and its id is saved in the CLI config,
so subsequent commands can use it without extra options. The tenant id is also
included in the compute postgres configuration, so it can be passed via
ServerInfo to the safekeeper and in the connection string to the pageserver.
For more info see docs/multitenancy.md.
Add back code to parse transaction commit and abort records, and in
particular the list of dropped relations in them. Add 'put_unlink'
function to the Timeline trait and implementation. We had the code to
handle dropped relations in the GC code and elsewhere in ObjectRepository
already, but there was nothing to create the RelationSizeEntry::Unlink
tombstone entries until now. Also add a test to check that GC correctly
removes all page versions of a dropped relation.
Implements https://github.com/zenithdb/zenith/issues/232, except for the
"orphaned" rels.
Reviewed-by: Konstantin Knizhnik
This patch aims to:
* Unify connection & querying logic of ZenithPageserver and Postgres.
* Mitigate changes to transaction machinery introduced in `psycopg2 >= 2.9`.
Now it's possible to acquire a db connection using the corresponding
method:
```python
pg = postgres.create_start('main')
conn = pg.connect()
...
conn.close()
```
This pattern can be further improved with the help of `closing`:
```python
from contextlib import closing

pg = postgres.create_start('main')
with closing(pg.connect()) as conn:
    ...
```
All connections produced by this method will have autocommit
enabled by default.
It's not realistic to enable full-blown type checks
within test_runner's codebase, since the number of
warnings revealed by mypy is overwhelming.
Tests are supposed to be easy to use, so we can't
cripple everybody's workflow for the sake of an imaginary benefit.
Ultimately, the purpose of this attempt is threefold:
* Facilitate code navigation when paired with python-language-server.
* Make method signatures apparent to a fellow programmer.
* Occasionally catch some obvious type errors.
Previously, a transaction commit could happen regardless of whether the
pageserver had caught up or not. This patch aims to fix that.
There are two notable changes:
1. ComputeControlPlane::new_node() now sets the
`synchronous_standby_names = 'pageserver'` parameter to delay
transaction commit until pageserver acting as a standby has
fetched and ack'd a relevant portion of WAL.
2. pageserver now has to:
- Specify the `application_name = pageserver` which matches the
one in `synchronous_standby_names`.
- Properly reply with the ack'd LSNs.
This means that some tests don't need sleeps anymore.
TODO: We should probably make this behavior configurable.
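Illustratively, the matching pair of settings looks like this (values are examples, not the actual code):
```python
# Compute side (postgresql.conf): commits wait for an ack from the
# standby named 'pageserver'.
compute_conf_line = "synchronous_standby_names = 'pageserver'"

# Pageserver side: the WAL-receiver connection must present a matching
# application_name and keep replying with the LSNs it has processed.
walreceiver_conninfo = "host=127.0.0.1 port=55432 application_name=pageserver"
```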
Fixes #187.
There was no guarantee that the SELECT FOR KEY SHARE queries actually
run in parallel. With unlucky timing, one query might finish before
the next one starts, so that the server doesn't need to create a
multixact. I got a failure like that on the CI:
batch_others/test_multixact.py:56: in test_multixact
assert(int(next_multixact_id) > int(next_multixact_id_old))
E AssertionError: assert 1 > 1
E + where 1 = int('1')
E + and 1 = int('1')
This could be reproduced by adding a random sleep in the runQuery
function, to make each query run at different times.
To fix, keep the transactions open after running the queries, so that
they will surely be open concurrently. With that, we can run the
queries serially, and don't need the 'multiprocessing' module anymore.
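The fix amounts to something like this (a sketch with a hypothetical table, using the psycopg2-based fixture):
```python
# Keep both transactions open so the two FOR KEY SHARE lockers overlap,
# forcing the server to create a multixact. The queries can run
# serially; no 'multiprocessing' needed.
conn_a = pg.connect()
conn_b = pg.connect()
conn_a.autocommit = False
conn_b.autocommit = False
conn_a.cursor().execute("SELECT * FROM t WHERE id = 1 FOR KEY SHARE")
conn_b.cursor().execute("SELECT * FROM t WHERE id = 1 FOR KEY SHARE")
conn_a.commit()
conn_b.commit()
```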
Fixes https://github.com/zenithdb/zenith/issues/196
* Add ancestor_id to pg_list->branch_list output of pageserver.
* Display branching point (LSN) for each non-root branch.
* Add tests for `zenith branch`.
- The 'pageserver' fixture now sets up the repository and starts up
the Page Server automatically. In other words, the 'pageserver'
fixture provides a Page Server that's up and running and ready to
use in tests.
- The 'pageserver' fixture now also creates a branch called 'empty',
right after initializing the repository. By convention, all the
tests start by creating a new branch off 'empty' for the test (see
the sketch after this list). This allows running all the tests
against the same Page Server concurrently. (I haven't tested that,
though. pytest doesn't provide a built-in option to run tests in
parallel, but there are extensions for that.)
- Remove the 'zen_simple' fixture. Now that 'pageserver' provides a
server that's up and running, it's pretty simple to use the
'pageserver' and 'postgres' fixtures directly.
- Don't assume host name or ports in the tests. They now use the
fields in the fixtures for that. That allows assigning the ports
dynamically, making it possible to run multiple page servers in
parallel, or running the tests in parallel with another page
server. This commit still hard codes the Page Server's port in the
fixture, though, so more work is needed to actually make it
possible.
- I made some changes to the 'postgres' fixture in commit 532918e13d,
which broke the other tests. Fix them.
- Divide the tests into two "batches" of roughly equal runtime, which
can be run in parallel
- Merge the 'test_file' and 'test_filter' options in CircleCI config
into one 'test_selection' option, for simplicity.
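As referenced in the 'empty' branch item above, a test following the convention looks roughly like this (the CLI helper name is hypothetical):
```python
from contextlib import closing

def test_something(pageserver, postgres):
    # Each test branches off 'empty' so tests can share one running
    # Page Server without stepping on each other's timelines.
    pageserver.zenith_cli(["branch", "test_something", "empty"])  # hypothetical helper
    pg = postgres.create_start("test_something")
    with closing(pg.connect()) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
```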
This patch started as an effort to support the CLI working against a remote
pageserver, but turned into a pretty big refactoring.
* CLI now does not look into repository files directly. New commands
'branch_create' and 'identify_system' were introduced into page_service to
support that.
* Branch management that was scattered between local_env and
zenith/main.rs is moved into pageserver/branches.rs. That code could better fit
in Repository/Timeline impl, but I'll leave that for a different patch.
* All test-related code from local_env went into integration_tests/src/lib.rs as an
extension to the PostgresNode trait.
* Path-generating functions were consolidated around the corresponding config
types (LocalEnv and PageserverConf).
When creating a new branch, we copied all WAL from the source timeline
to the new one, and it was being picked up and digested into the
repository on first use of the timeline. Fix by copying the WAL only
up to the branch's starting point.
We should probably move the branch-creation code from the CLI to page
server itself - that's what I was starting to hack on when I noticed this
bug - but let's fix this first.
Add a regression test. To test multiple branches, enhance the python
test fixture to manage multiple running Postgres instances. Also, for
convenience, add a function to the postgres fixture to open a connection
to the server with psycopg2.