## Problem
A bunch of refactorings extracted from
https://github.com/neondatabase/neon/pull/6087 (not required for it);
the most significant one is using toml instead of formatted strings.
## Summary of changes
- Use toml instead of formatted strings for config
- Skip pageserver log check if `pageserver.log` doesn't exist
- `chmod -x test_runner/regress/test_config.py`
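For illustration, a minimal sketch (in Python, assuming the third-party `toml` package; names are illustrative) of why emitting config as toml beats string formatting:
```python
import toml  # third-party package, used here only for illustration

# Building config by formatting strings is easy to get subtly wrong
# (quoting, escaping, trailing separators):
pg_distrib_dir = "/usr/local"
formatted = f"pg_distrib_dir = '{pg_distrib_dir}'\n"

# Serializing a plain dict keeps quoting and escaping correct by construction:
config = {"pg_distrib_dir": pg_distrib_dir, "http_port": 9898}
print(toml.dumps(config))
```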
The `neon_local endpoint` subcommand currently allows creating two primary
endpoints for the same branch, which leads to a shutdown of both endpoints.
New behavior of `neon_local endpoint start`:
1. Fail if the endpoint doesn't exist
2. Fail if a conflict between two primary endpoints is detected
Fixes #4959, closes #5426
Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Download extension data via the pg-ext-s3-gateway proxy instead of a direct S3 request.
Pros:
- simplifies the code a lot (no need to provide AWS credentials and paths);
- reduces the latency of downloading extension data, as the proxy resides near
the computes;
- reduces AWS costs: the proxy has a cache, so 1000 computes asking for
the same extension will not generate 1000 downloads from S3;
- we can use only one S3 bucket to store extensions (and get rid of the regional
buckets which were introduced to reduce latency).
Changes:
- deprecate the remote-ext-config compute_ctl parameter; use
http://pg-ext-s3-gateway if any old-format remote-ext-config value is provided;
- refactor tests to use a mock HTTP server (a rough sketch follows below);
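As a rough illustration of that test setup (paths and names here are assumptions, not the real fixture API), a mock extension server can be as small as a stdlib HTTP handler serving canned archives:
```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# canned responses keyed by request path; the path layout is hypothetical
EXTENSION_FILES = {"/ext/v15/my_extension.tar.zst": b"fake extension archive bytes"}

class MockExtensionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = EXTENSION_FILES.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

server = ThreadingHTTPServer(("127.0.0.1", 0), MockExtensionHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
# passed to compute_ctl in place of the real gateway URL
gateway_url = f"http://127.0.0.1:{server.server_port}"
```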
## Problem
We didn't have a Postgres 16 snapshot of data to run compatibility tests
on, but now we have it (since the release).
## Summary of changes
- remove `@skip_on_postgres(PgVersion.V16, ...)` from compatibility
tests
This adds PostgreSQL 16 as a vendored PostgreSQL version, and adapts the
code to support this version.
The important change in PostgreSQL 16 compared to the PostgreSQL 15
changeset is the addition of a `neon_rmgr` resource manager instead of
altering Postgres's original WAL format.
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
I forgot a `str(...)` conversion in #5243. This led to log lines such
as:
```
Using fs root 'PosixPath('/tmp/test_output/test_backward_compatibility[debug-pg14]/compatibility_snapshot/repo/local_fs_remote_storage/pageserver')' as a remote storage
```
This surprisingly works, creating a hierarchy under the current working
directory (`repo_dir` for tests):
- `PosixPath('`
- `tmp` .. up until .. `local_fs_remote_storage`
- `pageserver')`
It should not work, but right now the test_compatibility.py tests find local
metadata and layers, which end up being used. After #5172, when remote storage
is the source of truth, it will no longer work.
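A tiny reproduction of the mistake (the path here is hypothetical):
```python
from pathlib import PosixPath

root = PosixPath("/tmp/local_fs_remote_storage/pageserver")  # hypothetical path
str(root)   # '/tmp/local_fs_remote_storage/pageserver'  -- what the config needs
repr(root)  # "PosixPath('/tmp/local_fs_remote_storage/pageserver')" -- what ended up in it
# The repr, treated as a *relative* path, silently creates directories named
# "PosixPath('", "tmp", ..., "pageserver')" under the current working directory.
```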
## Problem
A bunch of fixes for different test-related things
## Summary of changes
- Fix test_runner/pg_clients (`subprocess_capture` return value has
changed)
- Do not run create-test-report if check-permissions failed (for
non-cancelled jobs)
- Fix Code Coverage comment layout after flaky tests. Add another
healing "\n"
- test_compatibility: add an instruction for local run
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
Currently our testing environment only supports running a single
pageserver at a time. This is insufficient for testing failover and
migrations.
- A dependency for writing tests for #5207
## Summary of changes
- `neon_local` and `neon_fixture` now handle multiple pageservers
- This is a breaking change to the `.neon/config` format: any local
environments will need recreating
- Existing tests continue to work unchanged:
- The default number of pageservers is 1
- `NeonEnv.pageserver` is now a helper property that returns the pageserver
if there is exactly one, and throws otherwise (a sketch follows after this list).
- Pageserver data directories are now at `.neon/pageserver_{n}` where n
is 1,2,3...
- Compatibility tests get some special casing to migrate neon_local
configs: these are not meant to be backward/forward compatible, but they
were treated that way by the test.
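A minimal sketch of how such a property might look (the real `NeonEnv` implementation differs; names here are illustrative):
```python
class NeonEnv:
    def __init__(self, pageservers):
        self.pageservers = list(pageservers)  # one object per configured pageserver

    @property
    def pageserver(self):
        # Convenience for the common single-pageserver case; tests that run
        # several pageservers must address env.pageservers explicitly.
        if len(self.pageservers) != 1:
            raise RuntimeError("env.pageserver is ambiguous: use env.pageservers instead")
        return self.pageservers[0]
```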
Remote storage cleanup split from #5198:
- pageserver, extensions, and safekeepers now have their separate remote
storage
- RemoteStorageKind has the configuration code
- S3Storage has the cleanup code
- with MOCK_S3, pageserver, extensions, safekeepers use different
buckets
- with LOCAL_FS, `repo_dir / "local_fs_remote_storage" / $user` is used
as path, where $user is `pageserver`, `safekeeper`
- no more `NeonEnvBuilder.enable_xxx_remote_storage` but one
`enable_{pageserver,extensions,safekeeper}_remote_storage`
This should not contain any real behavior changes. It will allow us to default to
`LOCAL_FS` for the pageserver in the next PR, remove
`RemoteStorageKind.NOOP`, and work towards #5172.
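For the `LOCAL_FS` case, the per-user layout boils down to something like this sketch (function name is illustrative):
```python
from pathlib import Path

def local_fs_root(repo_dir: Path, user: str) -> Path:
    # user is "pageserver", "safekeeper", etc.; keeping them apart avoids
    # one component's cleanup or assertions touching another component's files
    return repo_dir / "local_fs_remote_storage" / user
```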
Co-authored-by: Alexander Bayandin <alexander@neon.tech>
## Problem
The previous version of Neon (the one we use in the forward compatibility test)
now has the `amcheck` extension installed, so we can run `pg_amcheck`
unconditionally.
## Summary of changes
- Run `pg_amcheck` in compatibility tests unconditionally
## Problem
neon_fixtures.py has grown to an unmanageable size and attracts conflicts.
When adding specific utils under, for example, `fixtures/pageserver`,
things sometimes need to import from `neon_fixtures.py`, which
creates a circular import. This is usually only needed for type
annotations, so the `typing.TYPE_CHECKING` flag can mask the issue.
Nevertheless, I believe that splitting neon_fixtures.py into smaller
parts is the better approach.
Currently the PR contains small things, but I plan to continue and move
NeonEnv into its own `fixtures.env` module. To keep the diff small, I think
this PR can already be merged, to cause fewer conflicts.
UPD: it looks like it's currently not really possible to fully avoid
using `typing.TYPE_CHECKING`, because some components directly depend
on each other, e.g. an Env -> Cli -> Env cycle. But it's still worth
avoiding it in as many places as possible, and decreasing the size of
neon_fixtures.py still makes sense.
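For reference, the `typing.TYPE_CHECKING` workaround for an Env -> Cli -> Env style cycle looks roughly like this (module names are illustrative):
```python
# fixtures/cli.py (illustrative module name)
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # imported only by type checkers, so there is no circular import at runtime
    from fixtures.env import NeonEnv

class NeonCli:
    def __init__(self, env: "NeonEnv"):
        self.env = env
```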
## Problem
If AWS credentials are not set locally (via
AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY env vars)
`test_remote_library[release-pg15-mock_s3]` test fails with the
following error:
```
ERROR could not start the compute node: Failed to download a remote file: Failed to download S3 object: failed to construct request
```
## Summary of changes
- set AWS credentials for endpoints programmatically
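A rough sketch of the idea (the way the values are handed to the endpoint is an assumption): the mock S3 server accepts any non-empty credentials, so the test can inject dummies instead of relying on the developer's shell environment:
```python
import os

# moto's mock S3 accepts any non-empty credentials, so dummies are enough
# for the SDK inside compute_ctl to construct signed requests
endpoint_env = {
    **os.environ,
    "AWS_ACCESS_KEY_ID": os.environ.get("AWS_ACCESS_KEY_ID", "test"),
    "AWS_SECRET_ACCESS_KEY": os.environ.get("AWS_SECRET_ACCESS_KEY", "test"),
}
# endpoint.start(env=endpoint_env)  # hypothetical call: pass the env to the compute node process
```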
## Problem
Compatibility tests fail from time to time due to `pg_tenant_only_port`
port collision (added in https://github.com/neondatabase/neon/pull/4731)
## Summary of changes
- replace the `pg_tenant_only_port` value in the config with a new port
- remove old logic that we don't need anymore
- unify config overrides
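The replacement port can simply come from the test framework's port distributor; a self-contained approximation using an OS-assigned ephemeral port:
```python
import socket

def free_port() -> int:
    # Ask the OS for an unused port; the test framework's port distributor
    # plays this role in practice (this is only an approximation).
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

# config_overrides["pg_tenant_only_port"] = free_port()  # hypothetical override dict
```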
## Problem
test_compatibility contains some outdated logic that we no longer
need.
## Summary of changes
- Remove `PR4425_ALLOWED_DIFF` and tune the `dump_differs` method to accept
allowed diffs in the future (a cleanup after
https://github.com/neondatabase/neon/pull/4425)
- Remove etcd-related code (a cleanup after
https://github.com/neondatabase/neon/pull/2733)
- Don't set `preserve_database_files`
Run `pg_amcheck` in forward and backward compatibility tests to catch
some data corruption.
## Summary of changes
- Add amcheck compiling to Makefile
- Add `pg_amcheck` to test_compatibility
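Roughly, the compatibility test can shell out to `pg_amcheck` against the running endpoint; a hedged sketch (the connection string argument is hypothetical):
```python
import subprocess

def run_amcheck(connstr: str) -> None:
    # --install-missing creates the amcheck extension if it isn't installed yet
    subprocess.run(
        ["pg_amcheck", "--install-missing", "-d", connstr],
        check=True,
        capture_output=True,
        text=True,
    )
```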
## Problem
The Postgres submodules can be changed unintentionally, and such changes are
easy to miss during review.
This adds a check that should prevent this from happening; the check fails the
`build-neon` job with the following message:
```
Expected postgres-v14 rev to be at '1414141414141414141414141414141414141414', but it is at '1144aee1661c79eec65e784a8dad8bd450d9df79'
Expected postgres-v15 rev to be at '1515151515151515151515151515151515151515', but it is at '1984832c740a7fa0e468bb720f40c525b652835d'
Please update vendors/revisions.json if these changes are intentional.
```
This is an alternative approach to
https://github.com/neondatabase/neon/pull/4603
## Summary of changes
- Add `vendor/revisions.json` file with expected revisions
- Add a build-time check (to the `build-neon` job) that the Postgres submodules
match the revisions from `vendor/revisions.json` (a rough sketch follows below)
- A couple of small improvements for logs from
https://github.com/neondatabase/neon/pull/4603
- Fix the GitHub auto-comment for the case when no tests were run
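A sketch of what such a check can look like (the real check lives in CI; the JSON layout and submodule paths are assumptions based on the description above):
```python
import json
import subprocess
import sys

with open("vendor/revisions.json") as f:
    expected = json.load(f)  # e.g. {"postgres-v14": "<sha>", "postgres-v15": "<sha>"}

failed = False
for name, want in expected.items():
    got = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=f"vendor/{name}", text=True
    ).strip()
    if got != want:
        print(f"Expected {name} rev to be at '{want}', but it is at '{got}'")
        failed = True

if failed:
    print("Please update vendor/revisions.json if these changes are intentional.")
    sys.exit(1)
```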
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
Binaries created from PRs (both in docker images and for tests) have
wrong git-env versions; they point to phantom merge commits.
## Summary of changes
- Prefer the GIT_VERSION env variable even if git information is accessible
- Use `${{ github.event.pull_request.head.sha || github.sha }}` instead
of `${{ github.sha }}` for `GIT_VERSION` in workflows
So the builds will still happen from this phantom commit, but we will
report the PR commit.
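The precedence being described, as a small language-agnostic sketch (shown in Python for illustration; the real logic sits in the build scripts and workflows):
```python
import os
import subprocess

def resolve_git_version() -> str:
    # Prefer the externally provided GIT_VERSION even when .git is available,
    # so PR builds report the PR head commit instead of the phantom merge commit.
    explicit = os.environ.get("GIT_VERSION")
    if explicit:
        return explicit
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
```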
---------
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Problem
Currently, if a user creates a role, it won't have any grants
applied to it by default. When the compute restarts, the grants get applied. This
gives a very strange UX: at first you can drop roles and have no access
to anything, and then, once something triggers a config
application, grants are suddenly applied. This removes these grants.
## Problem
1. During the rollout we got a panic: "timeline that we were deleting
was concurrently removed from 'timelines' map", which was caused by the lock
guard not being propagated to the background part of the deletion. The
existing test didn't catch it because the failpoint used for
verification was placed before the background task was spawned.
2. While looking at the surrounding code, one more bug was detected: we
removed the timeline from the map before the deletion was finished, which breaks
client retry logic, because the API indicates 404 before the actual deletion
is completed, which can lead to the client stopping its retry poll too early.
## Summary of changes
1. Carry the lock guard over to the background deletion. Ensure the existing
test case fails without the patch applied (the second deletion becomes stuck
without it, which eventually leads to a test failure).
2. Move the `delete_all` call earlier so that removing the timeline from the map is
the last thing done during the deletion.
Additionally, I've added timeline_id to the `update_gc_info` span,
because `debug_assert_current_span_has_tenant_and_timeline_id` in
`download_remote_layer` was firing when `update_gc_info` led to
on-demand downloads via `find_lsn_for_timestamp` (caught by @problame).
This is not directly related to the PR but fixes possible flakiness.
Another, smaller set of changes involves the deletion wrapper used in the Python
tests. There is now a simpler wrapper, `timeline_delete_wait_completed`, that
waits for deletions to complete. Most of the
test_delete_timeline.py tests are negative tests, i.e., "does
ps_http.timeline_delete() fail in this and that scenario", and
these can be left alone. In the other places, where we actually do the deletions,
we need to use the helper that polls for completion.
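A sketch of such a polling wrapper (the client's detail call and the error shape are assumptions):
```python
import time

def timeline_delete_wait_completed(ps_http, tenant_id, timeline_id, timeout_s=60.0):
    """Issue the deletion, then poll until the pageserver stops reporting the timeline."""
    ps_http.timeline_delete(tenant_id, timeline_id)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            ps_http.timeline_detail(tenant_id, timeline_id)  # hypothetical detail call
        except Exception as e:
            if getattr(e, "status_code", None) == 404:
                return  # deletion finished: the timeline is gone
            raise
        time.sleep(0.5)
    raise TimeoutError("timeline deletion did not complete in time")
```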
Discussion: https://neondb.slack.com/archives/C03F5SM1N02/p1686668007396639
Resolves #4496
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
This adds test coverage for 'compute_ctl', as it is now used by all
the python tests.
There are a few differences in how 'compute_ctl' is called in the
tests, compared to the real web console:
- In the tests, the postgresql.conf file is included as one large
string in the spec file, and it is written out as it is to the data
directory. I added a new field for that to the spec file. The real
web console, however, sets all the necessary settings in the
'settings' field, and 'compute_ctl' creates the postgresql.conf from
those settings.
- In the tests, the information needed to connect to the storage, i.e.
tenant_id, timeline_id, connection strings to pageserver and
safekeepers, are now passed as new fields in the spec file. The real
web console includes them as the GUCs in the 'settings' field. (Both
of these are different from what the test control plane used to do:
It used to write the GUCs directly in the postgresql.conf file). The
plan is to change the control plane to use the new method, and
remove the old method, but for now, support both.
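To make the two bullets above concrete, here is a rough sketch of what a test-generated spec might carry (the field names are assumptions, not the exact schema):
```python
# Illustrative only: field names are assumptions, not the exact spec schema.
spec = {
    "cluster": {
        # the tests dump postgresql.conf as one big string; the real console
        # instead populates the 'settings' field and lets compute_ctl build the file
        "postgresql_conf": "shared_buffers=1MB\nmax_connections=100\n",
        "settings": [],
    },
    # in the tests, storage connection info is passed as dedicated fields
    # rather than as GUCs inside 'settings'
    "tenant_id": "<tenant id>",
    "timeline_id": "<timeline id>",
    "pageserver_connstring": "postgresql://no_user@localhost:6400",
    "safekeeper_connstrings": ["localhost:5454"],
}
```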
Some tests that were sensitive to the amount of WAL generated needed
small changes to accommodate the fact that compute_ctl runs a background
health monitor which makes a few small updates. Also, some tests shut
down the pageserver, and now that the background health check can run
queries while the pageserver is down, it can produce a few
extra errors in the logs, which needed to be allowlisted.
Other changes:
- remove obsolete comments about PostgresNode;
- create standby.signal file for Static compute node;
- log output of `compute_ctl` and `postgres` is merged into
`endpoints/compute.log`.
---------
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
This PR adds test runs on Postgres 15 and creates a unified Allure report
with the results for all tests.
- Split `.github/actions/allure-report` into
`.github/actions/allure-report-store` and
`.github/actions/allure-report-generate`
- Add debug or release pytest parameter for all tests (depending on
`BUILD_TYPE` env variable)
- Add Postgres version as a pytest parameter for all tests (depending on
`DEFAULT_PG_VERSION` env variable)
- Fix `test_wal_restore` and `restore_from_wal.sh` to support paths with
`[`/`]` in them (fixed by applying shellcheck to the script and fixing all
warnings); `restore_from_wal_archive.sh` is deleted as unused.
- All known failures on Postgres 15 are marked with xfail
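A hedged sketch of how such env-driven parameters can be wired up in `conftest.py` (fixture names and defaults are illustrative, not the real fixtures):
```python
import os

import pytest

@pytest.fixture(scope="session", params=[os.environ.get("DEFAULT_PG_VERSION", "15")])
def pg_version(request) -> str:
    # Exposing the Postgres version as a parameter makes it show up in the
    # test id (e.g. test_foo[release-pg15]) and therefore in the Allure report.
    return request.param

@pytest.fixture(scope="session", params=[os.environ.get("BUILD_TYPE", "debug")])
def build_type(request) -> str:
    return request.param
```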
We now use the term "endpoint" for compute Postgres nodes in the web UI
and user-facing documentation. Adjust the nomenclature in the code.
This changes the name of the "neon_local pg" command to "neon_local
endpoint". Also adjust names of classes, variables etc. in the python
tests accordingly.
This also changes the directory structure so that endpoints are now
stored in:
.neon/endpoints/<endpoint id>
instead of:
.neon/pgdatadirs/tenants/<tenant_id>/<endpoint (node) name>
The tenant ID is no longer part of the path. That means that you
cannot have two endpoints with the same name/ID in two different
tenants anymore. That's consistent with how we treat endpoints in the
real control plane and proxy: the endpoint ID must be globally unique.
This allows skipping the compatibility tests based on the `CHECK_ONDISK_DATA_COMPATIBILITY` environment variable. When the variable is missing (the default), the compatibility tests won't be run.
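A minimal sketch of the gating (the marker name is illustrative):
```python
import os

import pytest

# applied to the compatibility tests; they are skipped unless the variable is set
check_ondisk_data_compatibility_if_enabled = pytest.mark.skipif(
    os.environ.get("CHECK_ONDISK_DATA_COMPATIBILITY") is None,
    reason="CHECK_ONDISK_DATA_COMPATIBILITY is not set",
)
```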
This allows you to run without the 'openssl' binary as long as you
don't enable authentication. This becomes more important with the next
commit, which switches the JWT algorithm to EdDSA. LibreSSL does not
support EdDSA, and LibreSSL comes with macOS, so the next commit makes
it much more likely for the key generation to fail for macOS users.
To allow running without a keypair, don't generate the authentication
token in the 'neon_local init' step. Instead, generate a new token on
every request that needs one, using the private key.
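For illustration only, per-request token generation with an Ed25519 key could look like this in Python with PyJWT (the real code is in `neon_local`, in Rust; the claim names are assumptions):
```python
import jwt  # PyJWT, with the 'cryptography' dependency installed
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# generate the keypair once, without invoking the 'openssl' binary
private_key = Ed25519PrivateKey.generate()
private_pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

def fresh_token(scope: str, tenant_id: str) -> str:
    # a new token is minted for every request that needs one
    return jwt.encode({"scope": scope, "tenant_id": tenant_id}, private_pem, algorithm="EdDSA")
```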
This makes it possible to enable authentication only for the mgmt HTTP
API or the compute API. The HTTP API doesn't need to be directly
accessible from compute nodes, and it can be secured through network
policies. This also allows rolling out authentication in a piecemeal
fashion.
* Stop allocating and maintaining a 128MB hash table for the last-written
LSN cache, as it is not needed in wal-redo.
* Do not require access to the initialized data directory. That
saves a few dozen megabytes of an empty but initialized data directory.
Currently such directories occupy about 10% of the disk space
on the pageservers, as most tenants are empty.
* Move shmem-initialization code to the extension instead of postgres
Closes https://github.com/neondatabase/neon/issues/1984
Closes https://github.com/neondatabase/neon/pull/2830
A follow-up to https://github.com/neondatabase/neon/pull/2830: I've
noticed that benchmarks failed again due to out-of-space issues.
Removes most of the pageserver and safekeeper files from disk after
every pytest suite run.
```
$ poetry run pytest -vvsk "test_tenant_redownloads_truncated_file_on_startup[local_fs]"
# ...
$ du -h test_output/test_tenant_redownloads_truncated_file_on_startup\[local_fs\]
# ...
104K test_output/test_tenant_redownloads_truncated_file_on_startup[local_fs]
$ poetry run pytest -vvsk "test_tenant_redownloads_truncated_file_on_startup[local_fs]" --preserve-database-files
# ...
$ du -h test_output/test_tenant_redownloads_truncated_file_on_startup\[local_fs\]
# ...
123M test_output/test_tenant_redownloads_truncated_file_on_startup[local_fs]
```
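A hedged sketch of how such post-test cleanup can be wired up (the directory layout is illustrative; `--preserve-database-files` is the existing opt-out shown above):
```python
import shutil
from pathlib import Path

import pytest

@pytest.fixture(autouse=True)
def cleanup_storage_files(request):
    yield  # run the test first, then clean up
    if request.config.getoption("--preserve-database-files", default=False):
        return  # keep everything for debugging
    test_dir = Path("test_output") / request.node.name  # illustrative layout
    for sub in ("local_fs_remote_storage", "pageserver_1", "safekeepers"):
        shutil.rmtree(test_dir / sub, ignore_errors=True)
```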
Co-authored-by: Bojan Serafimov <bojan.serafimov7@gmail.com>
Set the correct `pg_distrib_dir` in `pageserver.toml` and in the neon_local
`config`.
`test_forward_compatibility` shows flakiness during `neon_local pg
start`, so hopefully this patch will help.
```
2022-11-15 16:07:34.091 GMT [13338] LOG: starting with zenith basebackup at LSN 0/A6A9310, prev 0/0
2022-11-15 16:07:34.091 GMT [13338] FATAL: cannot start in read-write mode from this base backup
2022-11-15 16:07:34.091 GMT [13337] LOG: startup process (PID 13338) exited with exit code 1
```
If there are any unexpected ERRORs or WARNs in pageserver.log after a test
finishes, fail the test. This requires whitelisting the errors that *are*
expected in each test, and there are also a few common errors that are
printed by most tests, which are whitelisted in the fixture itself.
With this, we don't need the special abort() call in testing mode when
compaction or GC fails. Those failures will print ERRORs to the logs,
which will be picked up by this new mechanism.
A bunch of errors are currently whitelisted that we probably shouldn't
be emitting in the first place, but fixing those is out of scope for this
commit, so I just left FIXME comments on them.
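Schematically, the check boils down to scanning the log against an allowlist of regexes (the patterns and helper name here are illustrative):
```python
import re
from pathlib import Path

DEFAULT_ALLOWED = [
    # errors that most tests print and that are allowed globally (illustrative)
    r".*connection reset by peer.*",
]

def assert_no_unexpected_errors(log_path: Path, allowed_patterns: list[str]) -> None:
    allowed = [re.compile(p) for p in DEFAULT_ALLOWED + allowed_patterns]
    offenders = []
    for line in log_path.read_text().splitlines():
        if ("ERROR" in line or "WARN" in line) and not any(p.match(line) for p in allowed):
            offenders.append(line)
    assert not offenders, f"unexpected errors in {log_path}: {offenders[:5]}"
```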
Add `test_forward_compatibility`, which checks if it's going to
be possible to roll back a release to the previous version.
The test uses artifacts (Neon & Postgres binaries) from the previous
release to start Neon on the repo created by the current version. It
performs exactly the same checks as `test_backward_compatibility` does.
The single `ALLOW_BREAKING_CHANGES` env var got replaced by
`ALLOW_BACKWARD_COMPATIBILITY_BREAKAGE` &
`ALLOW_FORWARD_COMPATIBILITY_BREAKAGE`, which can be set by the `backward
compatibility breakage` and `forward compatibility breakage` labels,
respectively.
This PR replaces the following global variables in the test framework
with fixtures to make tests more configurable. I mainly need this for
the forward compatibility tests (draft in
https://github.com/neondatabase/neon/pull/2766).
```
base_dir
neon_binpath
pg_distrib_dir
top_output_dir
default_pg_version (this one got replaced with a fixture named pg_version)
```
Also, this PR uses the `Path` type in more places where the code implies it.
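As a sketch, turning one of these globals into a fixture looks roughly like this (the env variable and fallback values are illustrative):
```python
import os
from pathlib import Path

import pytest

@pytest.fixture(scope="session")
def neon_binpath() -> Path:
    # previously a module-level global; as a fixture it can be overridden per
    # session, e.g. to point at the previous release's binaries in the
    # forward compatibility tests
    return Path(os.environ.get("NEON_BIN", "target/debug")).absolute()
```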