rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 05:52:55 +00:00

Author	SHA1	Message	Date
Konstantin Knizhnik	66fa176cc8	Handle update of VM in XLOG_HEAP_LOCK/XLOG_HEAP2_LOCK_UPDATED WAL records (#4896) ## Problem The VM should be updated if the XLH_LOCK_ALL_FROZEN_CLEARED flag is set in XLOG_HEAP_LOCK and XLOG_HEAP2_LOCK_UPDATED WAL records. ## Summary of changes Add handling of these records in walingest.rs. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-09-15 17:47:29 +03:00
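A minimal sketch of the check described above, written against a simplified model. It assumes the flag and visibility-map constants of stock PostgreSQL built with the default 8 KB block size (`XLH_LOCK_ALL_FROZEN_CLEARED = 0x01`, `VISIBILITYMAP_ALL_FROZEN = 0x02`, two VM bits per heap block); the function and struct names are hypothetical and do not reflect the actual `walingest.rs` code.

```rust
/// Bit in the `flags` field of lock WAL records (value as in PostgreSQL's heapam_xlog.h).
const XLH_LOCK_ALL_FROZEN_CLEARED: u8 = 0x01;
/// VM bit meaning "all tuples on this heap page are frozen".
const VISIBILITYMAP_ALL_FROZEN: u8 = 0x02;

// Visibility-map geometry, assuming the default BLCKSZ of 8192 bytes.
const BLCKSZ: u32 = 8192;
const PAGE_HEADER_SIZE: u32 = 24; // MAXALIGN(SizeOfPageHeaderData)
const MAPSIZE: u32 = BLCKSZ - PAGE_HEADER_SIZE;
const BITS_PER_HEAPBLOCK: u32 = 2;
const HEAPBLOCKS_PER_BYTE: u32 = 8 / BITS_PER_HEAPBLOCK;
const HEAPBLOCKS_PER_PAGE: u32 = MAPSIZE * HEAPBLOCKS_PER_BYTE;

/// Where a heap block's two VM bits live.
#[derive(Debug, PartialEq)]
struct VmBitLocation {
    map_block: u32,  // block number within the VM fork
    map_byte: u32,   // byte offset within the page's map area
    bit_offset: u32, // shift of this heap block's bits within that byte
}

fn vm_location(heap_blk: u32) -> VmBitLocation {
    VmBitLocation {
        map_block: heap_blk / HEAPBLOCKS_PER_PAGE,
        map_byte: (heap_blk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE,
        bit_offset: (heap_blk % HEAPBLOCKS_PER_BYTE) * BITS_PER_HEAPBLOCK,
    }
}

/// Hypothetical helper: if a lock record cleared the all-frozen state of
/// `heap_blk`, return which VM bits must be cleared and where.
fn vm_bits_to_clear(record_flags: u8, heap_blk: u32) -> Option<(VmBitLocation, u8)> {
    if record_flags & XLH_LOCK_ALL_FROZEN_CLEARED != 0 {
        Some((vm_location(heap_blk), VISIBILITYMAP_ALL_FROZEN))
    } else {
        None
    }
}

/// Clear `bits` for one heap block inside a single VM byte.
fn clear_vm_bits(map_byte: u8, bit_offset: u32, bits: u8) -> u8 {
    map_byte & !(bits << bit_offset)
}
```

Because each heap block owns two adjacent bits, clearing only `VISIBILITYMAP_ALL_FROZEN` leaves the all-visible bit of the same heap block untouched.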
Heikki Linnakangas	9e6b5b686c	Add a test case for "CREATE DATABASE STRATEGY=file_copy". (#5301) It was utterly broken on v15 before commit `83e7e5dbbd`, which fixed the incorrect definition of XLOG_DBASE_CREATE_WAL_LOG. We never noticed because we had no tests for it.	2023-09-15 16:50:57 +03:00
Rahul Modpur	e6985bd098	Move tenant & timeline dir method to NeonPageserver and use them everywhere (#5262) ## Problem In many places in test code, paths are built manually instead of using NeonEnv.tenant_dir and NeonEnv.timeline_dir. ## Summary of changes 1. NeonEnv.tenant_dir and NeonEnv.timeline_dir moved under class NeonPageserver, as the paths they use are per pageserver instance. 2. Use these everywhere instead of building paths manually. Closes #5258 --------- Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>	2023-09-15 11:17:18 +01:00
Konstantin Knizhnik	e400a38fb9	References to old and new blocks were mixed in xlog_heap_update handler (#5312) ## Problem See https://neondb.slack.com/archives/C05L7D1JAUS/p1694614585955029 https://www.notion.so/neondatabase/Duplicate-key-issue-651627ce843c45188fbdcb2d30fd2178 ## Summary of changes Swap the old/new block references. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-09-15 10:32:25 +03:00
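Mixing up the two references is easy because of PostgreSQL's convention for heap UPDATE WAL records: block reference 0 is the page that receives the new tuple version, and block reference 1, present only when the tuple moved to a different page, is the page holding the old version. A minimal sketch of resolving the two block numbers under that convention; `BlockRef` and `DecodedRecord` are hypothetical stand-ins, not the pageserver's decoded-record types.

```rust
/// Hypothetical stand-in for one decoded block reference of a WAL record.
struct BlockRef {
    blkno: u32,
}

/// Hypothetical stand-in for a decoded WAL record.
struct DecodedRecord {
    blocks: Vec<BlockRef>,
}

/// Returns `(old_blkno, new_blkno)` for a heap UPDATE record, or `None` if the
/// record carries no block references at all.
fn heap_update_blocks(rec: &DecodedRecord) -> Option<(u32, u32)> {
    // Block ref 0: page receiving the new tuple.
    let new_blk = rec.blocks.first()?.blkno;
    // Block ref 1 (optional): page holding the old tuple; if absent, the
    // update stayed on the same page.
    let old_blk = rec.blocks.get(1).map_or(new_blk, |b| b.blkno);
    Some((old_blk, new_blk))
}
```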
Alexander Bayandin	bd36d1c44a	approved-for-ci-run.yml: fix variable name and permissions (#5307) ## Problem - `gh pr list` fails with `unknown argument "main"; please quote all values that have spaces` due to using a variable with the wrong name - `permissions: write-all` is too broad for the job ## Summary of changes - Rename variable `HEAD` -> `BRANCH` - Grant only the required permissions for each job --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-14 20:18:49 +03:00
Alexander Bayandin	0501b74f55	Update checksum for pg_hint_plan (#5309) ## Problem The checksum for `pg_hint_plan` doesn't match: ``` sha256sum: WARNING: 1 computed checksum did NOT match ``` Ref https://github.com/neondatabase/neon/actions/runs/6185715461/job/16793609251?pr=5307 It seems that the release was retagged yesterday: https://github.com/ossc-db/pg_hint_plan/releases/tag/REL16_1_6_0 I don't see any malicious changes from 15_1.5.1: https://github.com/ossc-db/pg_hint_plan/compare/REL15_1_5_1...REL16_1_6_0, so it should be ok to update. ## Summary of changes - Update checksum for `pg_hint_plan` 16_1.6.0	2023-09-14 18:17:50 +03:00
Em Sharnoff	3895829bda	vm-monitor: Fix cgroup throttling (#5303) I believe this (not actual IO problems) is the cause of the "disk speed issue" that we've had for VMs recently. See e.g.: 1. https://neondb.slack.com/archives/C03H1K0PGKH/p1694287808046179?thread_ts=1694271790.580099&cid=C03H1K0PGKH 2. https://neondb.slack.com/archives/C03H1K0PGKH/p1694511932560659 The vm-informant (and now, the vm-monitor, its replacement) is supposed to gradually increase the `neon-postgres` cgroup's memory.high value, because otherwise the kernel will throttle all the processes in the cgroup. This PR fixes a bug with the vm-monitor's implementation of this behavior. --- Other references, for the vm-informant's implementation: - Original issue: neondatabase/autoscaling#44 - Original PR: neondatabase/autoscaling#223	2023-09-14 13:21:50 +03:00
Joonas Koivunen	ffd146c3e5	refactor: globals in tests (#5298) Refactor tests to have fewer globals. This should make it possible to write more complex tests for our new metric collection requirements in #5297. Includes reverted work from #4761 related to test globals. Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: MMeent <matthias@neon.tech>	2023-09-13 22:05:30 +03:00
Konstantin Knizhnik	1697e7b319	Fix lfc_ensure_function which now disables LFC (#5294) ## Problem There was a bug in lfc_ensure_opened which actually disabled LFC. ## Summary of changes Return true if the LFC file is opened normally. Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-09-13 08:56:03 +03:00
bojanserafimov	8556d94740	proxy http: reproduce issue with transactions in pool (#5293) xfail test reproducing issue https://github.com/neondatabase/neon/issues/4698	2023-09-12 17:13:25 -04:00
MMeent	3b6b847d76	Fixes for Pg16: (#5292) - pagestore_smgr.c had an unnecessary WALSync() (see #5287) - Compute node Dockerfile didn't build the neon_rmgr extension - Add PostgreSQL 16 image to docker-compose tests - Fix issue with high CPU usage in Safekeeper due to a bug in WALSender Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-09-12 22:02:03 +03:00
Alexander Bayandin	2641ff3d1a	Use CI_ACCESS_TOKEN to create release PR (#5286) ## Problem If @github-actions creates the release PR, the CI pipeline is not triggered (but we have a `release-notify.yml` workflow that we expect to run on this event). I suspect this happened because @github-actions is not a repository member. Ref https://github.com/neondatabase/neon/pull/5283#issuecomment-1715209291 ## Summary of changes - Use `CI_ACCESS_TOKEN` to create a PR - Use `gh` instead of `thomaseizinger/create-pull-request` - Restrict permissions for GITHUB_TOKEN to `contents: write` only (required for `git push`)	2023-09-12 20:01:21 +01:00
Alexander Bayandin	e1661c3c3c	approved-for-ci-run.yml: fix ci-run/pr-* branch deletion (#5278) ## Problem `ci-run/pr-*` branches (and attached PRs) should be deleted automatically when their parent PRs get closed, but they are not. ## Summary of changes - Fix if-condition	2023-09-12 19:29:26 +03:00
Alexander Bayandin	9c3f38e10f	Document how to run CI for external contributors (#5279) ## Problem These instructions weren't written down anywhere except in internal Slack ## Summary of changes - Add `How to run a CI pipeline on Pull Requests from external contributors` section to `CONTRIBUTING.md` --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-09-12 16:53:13 +01:00
Christian Schwarz	ab1f37e908	revert recent VirtualFile asyncification changes (#5291) Motivation ========== We observed two "indigestion" events on staging, each shortly after restarting `pageserver-0.eu-west-1.aws.neon.build`. It has ~8k tenants. The indigestion manifests as `Timeline::get` calls failing with `exceeded evict iter limit`. The error is from `page_cache.rs`; it was unable to find a free page and hence failed with the error. The indigestion events started occurring after we started deploying builds that contained the following commits: ``` [~/src/neon]: git log --oneline c0ed362790caa368aa65ba57d352a2f1562fd6bf..15eaf78083ecff62b7669091da1a1c8b4f60ebf8 `15eaf7808` Disallow block_in_place and Handle::block_on (#5101) `a18d6d9ae` Make File opening in VirtualFile async-compatible (#5280) `76cc87398` Use tokio locks in VirtualFile and turn with_file into macro (#5247) ``` The second and third commits are interesting. They add .await points to the VirtualFile code. Background ========== On the read path, which is the dominant user of page cache & VirtualFile during pageserver restart, `Timeline::get`, `page_cache`, and VirtualFile interact as follows: 1. Timeline::get tries to read from a layer. 2. This read goes through the page cache. 3. If we have a page miss (which is known to be common after restart), page_cache uses `find_victim` to find an empty slot, and once it has found a slot, it gives exclusive ownership of it to the caller through a `PageWriteGuard`. 4. The caller is supposed to fill the write guard with data from the underlying backing store, i.e., the layer `VirtualFile`. 5. So, we call into `VirtualFile::read_at` to fill the write guard. The `find_victim` method finds an empty slot using a basic implementation of the clock page replacement algorithm. Slots that are currently in use (`PageReadGuard` / `PageWriteGuard`) cannot become victims. If there have been too many iterations, `find_victim` gives up with error `exceeded evict iter limit`.
Root Cause For Indigestion ========================== The second and third commits quoted in the "Motivation" section introduced `.await` points in the VirtualFile code. At these points our task can yield to tokio, which may then schedule another future __while__ we hold the `PageWriteGuard` and are calling `VirtualFile::read_at`. This was not possible before these commits, because there simply were no await points that weren't Poll::Ready immediately. With the offending commits, there is now actual usage of `tokio::sync::RwLock` to protect the VirtualFile file descriptor cache. And we __know__ from other experiments that, during the post-restart "rush", the VirtualFile fd cache __is__ too small, i.e., all slots are taken by _ongoing_ VirtualFile operations and cannot be victims. So, assume that VirtualFile's `find_victim_slot`'s `RwLock::write().await` calls _will_ yield control to the executor. The above can lead to a pathological situation if we have N runnable tokio tasks, each wanting to do `Timeline::get`, but only M slots, N >> M. Suppose M of the N tasks win a PageWriteGuard and get suspended at some .await point inside `VirtualFile::read_at`. Now suppose tokio schedules the remaining N-M tasks for fairness, then schedules the first M tasks again. Each of the N-M tasks will run `find_victim()` until it hits the `exceeded evict iter limit`. Why? Because the first M tasks took all the slots and are still holding them tight through their `PageWriteGuard`. The result is massive wastage of CPU time in `find_victim()`. The effort to find a page is futile, but each of the N-M tasks still attempts it. This delays the time when tokio gets around to scheduling the first M tasks again. Eventually, tokio will schedule them, they will make progress, fill the `PageWriteGuard`, and release it. But in the meantime, the N-M tasks have already bailed with error `exceeded evict iter limit`.
Eventually, higher-level mechanisms will retry for the N-M tasks, and this time there won't be as many concurrent tasks wanting to do `Timeline::get`. So, it will shake out. But it's a massive indigestion until then. This PR ======= This PR reverts the offending commits until we find a proper fix. ``` Revert "Use tokio locks in VirtualFile and turn with_file into macro (#5247)" This reverts commit `76cc87398c`. Revert "Make File opening in VirtualFile async-compatible (#5280)" This reverts commit `a18d6d9ae3`. ```	2023-09-12 17:38:31 +02:00
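A minimal, single-threaded model of the failure mode described above, assuming a clock sweep with a fixed iteration budget. `ClockCache`, `Slot`, and the budget parameter are hypothetical simplifications, not the real `page_cache.rs` code; a `pinned` slot stands in for one held through a `PageWriteGuard` across an `.await`.

```rust
/// One cache slot. `pinned` models an outstanding read/write guard;
/// `usage` is the clock reference counter.
struct Slot {
    pinned: bool,
    usage: u8,
}

#[derive(Debug, PartialEq)]
enum EvictError {
    IterLimit, // analogous to "exceeded evict iter limit"
}

struct ClockCache {
    slots: Vec<Slot>,
    next: usize, // clock hand
}

impl ClockCache {
    fn new(n: usize) -> Self {
        let slots = (0..n).map(|_| Slot { pinned: false, usage: 0 }).collect();
        ClockCache { slots, next: 0 }
    }

    /// Clock sweep: skip pinned slots, decrement the usage count of unpinned
    /// ones, and claim the first unpinned slot whose count is already zero.
    /// Gives up after `max_iters` slot visits.
    fn find_victim(&mut self, max_iters: usize) -> Result<usize, EvictError> {
        for _ in 0..max_iters {
            let idx = self.next;
            self.next = (self.next + 1) % self.slots.len();
            let slot = &mut self.slots[idx];
            if slot.pinned {
                continue;
            }
            if slot.usage == 0 {
                slot.pinned = true; // hand the slot to the caller
                return Ok(idx);
            }
            slot.usage -= 1;
        }
        Err(EvictError::IterLimit)
    }

    /// The caller filled the slot and dropped its guard.
    fn release(&mut self, idx: usize) {
        self.slots[idx].pinned = false;
        self.slots[idx].usage = 1;
    }
}
```

Every caller that arrives while all slots are pinned burns its entire budget and fails, no matter how soon the pinning tasks would have finished; once a slot is released, the next sweep succeeds again.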
MMeent	83e7e5dbbd	Feat/postgres 16 (#4761) This adds PostgreSQL 16 as a vendored PostgreSQL version, and adapts the code to support this version. The important changes to PostgreSQL 16 compared to the PostgreSQL 15 changeset include the addition of a neon_rmgr instead of altering Postgres's original WAL format. Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-09-12 15:11:32 +02:00
Christian Schwarz	5be8d38a63	fix deadlock around TENANTS (#5285) The sequence that can lead to a deadlock: 1. A DELETE request gets all the way to `tenant.shutdown(progress, false).await.is_err()` while holding TENANTS.read(). 2. A POST request for tenant creation comes in and calls `tenant_map_insert`, which does `let mut guard = TENANTS.write().await;`. 3. Something that `tenant.shutdown()` needs to wait for needs a `TENANTS.read().await`. The only case identified in exhaustive manual scanning of the code base is this one: Imitate size access does `get_tenant().await`, which does `TENANTS.read().await` under the hood. In the above case (1) waits for (3), (3)'s read-lock request is queued behind (2)'s write-lock, and (2) waits for (1). Deadlock. I made a reproducer (proof that the above hypothesis holds) in https://github.com/neondatabase/neon/pull/5281, but it's not ready for merge yet and we want the fix _now_. fixes https://github.com/neondatabase/neon/issues/5284	2023-09-12 11:23:46 +02:00
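tokio's `RwLock` is documented as fair (write-preferring): once a writer is queued, new readers wait behind it, which is why (3) blocks behind (2). The message does not say how the deadlock was fixed, so the sketch below only shows one general way to avoid this shape of bug, not necessarily what the commit did: clone the tenant out of the map under a short-lived read guard and drop the guard before starting shutdown. It uses `std::sync::RwLock` so it runs without tokio; `Tenant`, `TenantMap`, and `delete_tenant` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type TenantMap = RwLock<HashMap<u32, Arc<Tenant>>>;

/// Hypothetical tenant whose shutdown path needs to read the global map again.
struct Tenant;

impl Tenant {
    /// Stand-in for shutdown work that indirectly takes a read lock on the map
    /// (like the size-access path above). Returns the map size it saw.
    fn shutdown(&self, tenants: &TenantMap) -> usize {
        tenants.read().unwrap().len()
    }
}

/// Look up the tenant, release the map lock, and only then shut it down, so no
/// map guard is held while shutdown waits on anything.
fn delete_tenant(tenants: &TenantMap, id: u32) -> Option<usize> {
    let tenant = {
        let guard = tenants.read().unwrap();
        let t = guard.get(&id)?;
        Arc::clone(t)
    }; // read guard dropped here
    let seen = tenant.shutdown(tenants);
    tenants.write().unwrap().remove(&id);
    Some(seen)
}
```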
John Spray	36c261851f	s3_scrubber: remove atty dependency (#5171) ## Problem - https://github.com/neondatabase/neon/security/dependabot/28 ## Summary of changes Remove atty, and remove the `with_ansi` arg to scrubber's stdout logger.	2023-09-12 10:11:41 +01:00
Arpad Müller	15eaf78083	Disallow block_in_place and Handle::block_on (#5101) ## Problem `block_in_place` is quite an expensive operation, and if it is used, we should explicitly have to opt into it by allowing the `clippy::disallowed_methods` lint. For more, see https://github.com/neondatabase/neon/pull/5023#discussion_r1304194495. Similar arguments exist for `Handle::block_on`, but we don't do this yet as there are still usages. ## Summary of changes Adds a clippy.toml file, configuring the [`disallowed_methods` lint](https://rust-lang.github.io/rust-clippy/master/#/disallowed_method).	2023-09-12 00:11:16 +00:00
Arpad Müller	a18d6d9ae3	Make File opening in VirtualFile async-compatible (#5280) ## Problem Previously, we were using `observe_closure_duration` in `VirtualFile` file opening code, but this doesn't support async open operations, which we want to use as part of #4743. ## Summary of changes * Move the duration measurement from the `with_file` macro into an `observe_duration` macro. * Some smaller drive-by fixes to replace the old strings with the new variant names introduced by #5273 Part of #4743, follow-up of #5247.	2023-09-11 18:41:08 +02:00
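A minimal sketch of why a macro suits this job: `$body` is pasted in place, so inside an `async fn` it may contain `.await`, which a plain closure handed to a timing helper cannot. The `IoTimings` sink and the macro's argument list are hypothetical; the real `observe_duration` macro in the pageserver may look different.

```rust
use std::time::Duration;

/// Hypothetical sink standing in for per-operation duration histograms.
#[derive(Default)]
struct IoTimings {
    samples: Vec<(&'static str, Duration)>,
}

/// Evaluate `$body`, record how long it took under the label `$op`, and
/// yield the body's value. Because the body is expanded inline, it may
/// contain `.await` when the macro is used inside an `async fn`.
macro_rules! observe_duration {
    ($timings:expr, $op:expr, $body:expr) => {{
        let start = ::std::time::Instant::now();
        let result = $body;
        $timings.samples.push(($op, start.elapsed()));
        result
    }};
}
```

For example, `let value = observe_duration!(timings, "open", 2 + 2);` evaluates to `4` and appends one `("open", elapsed)` sample.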
Arpad Müller	76cc87398c	Use tokio locks in VirtualFile and turn with_file into macro (#5247) ## Problem For #4743, we want to convert everything up to the actual I/O operations of `VirtualFile` to `async fn`. ## Summary of changes This PR is the last change in a series of changes to `VirtualFile`: #5189, #5190, #5195, #5203, and #5224. It does the last preparations before the I/O operations are actually made async. We are doing the following things: * First, we change the locks for the file descriptor cache to tokio's locks, whose guards are Send. This is important when one wants to hold locks across await points (which we want to do); otherwise the Future won't be Send. Also, one shouldn't generally block in async code, as executors don't like that. * Due to the lock change, we now take an approach for the `VirtualFile` destructors similar to the one proposed by #5122 for the page cache, using `try_write`. As in the linked PR, one can argue that if we are in the destructor and the slot has not been reused yet, we are the only user accessing the slot due to owning the lock mutably. It is still possible that we fail to obtain the lock, but the only cause for that is the clock algorithm touching the slot, which should be quite an unlikely occurrence. In case `try_write` fails, we spawn an async task to destroy the lock. As just argued, however, most of the time the code path where we spawn the task should not be visited. * Lastly, we split `with_file` into a macro part and a function part that contains most of the logic. The function part returns a lock object that the macro uses. The macro exists to perform the operation in a more compact fashion, saving code from putting the lock into a variable and then doing the operation while measuring the time to run it. We take the locks approach because Rust has no support for async closures.
One can make normal closures return a future, but that approach gets into lifetime issues the moment you want to pass data to these closures via parameters that have a lifetime (captures work). For details, see [this](https://smallcultfollowing.com/babysteps/blog/2023/03/29/thoughts-on-async-closures/) and [this](https://users.rust-lang.org/t/function-that-takes-an-async-closure/61663) link. In #5224, we ran into a similar problem with the `test_files` function, and we ended up passing the path and the `OpenOptions` by value instead of by reference, at the expense of a few extra copies. This can be done as the data is cheaply copyable, and we are in test code. But here, we are not, and while `File::try_clone` exists, it [issues system calls internally](`1e746d7741/library/std/src/os/fd/owned.rs (L94-L111)`). Also, it would allocate an entirely new file descriptor, something that the fd cache was built to prevent. * We change the `STORAGE_IO_TIME` metrics to support async. Part of #4743.	2023-09-11 17:35:05 +02:00
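A minimal sketch of the destructor strategy from the second bullet, assuming a slot guarded by a read-write lock: try to take the write lock without blocking, and only if that fails hand the cleanup to a background worker. It uses `std::sync::RwLock` and a `std::thread` in place of the tokio lock and the spawned async task the commit mentions; `Slot` and `Handle` are hypothetical.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical fd-cache slot; `Some(path)` stands in for an open file descriptor.
type Slot = Arc<RwLock<Option<String>>>;

/// Hypothetical owner of a slot, analogous to a VirtualFile.
struct Handle {
    slot: Slot,
}

impl Drop for Handle {
    fn drop(&mut self) {
        // Fast path (expected to be the common case): nobody else holds the
        // slot lock, so close the "fd" inline without blocking.
        if let Ok(mut guard) = self.slot.try_write() {
            *guard = None;
            return;
        }
        // Slow path: something (e.g. the clock sweep) is touching the slot
        // right now. Don't block inside `drop`; defer the cleanup instead.
        let slot = Arc::clone(&self.slot);
        thread::spawn(move || {
            *slot.write().unwrap() = None;
        });
    }
}
```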
bojanserafimov	c0ed362790	Measure pageserver wal recovery time and fix flush() method (#5240)	2023-09-11 09:46:06 -04:00
duguorong009	d7fa2dba2d	fix(pageserver): update the `STORAGE_IO_TIME` metrics to avoid expensive operations (#5273) Introduce the `StorageIoOperation` enum, `StorageIoTime` struct, and `STORAGE_IO_TIME_METRIC` static which provides lockless access to histograms consumed by `VirtualFile`. Closes #5131 Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-11 14:58:15 +03:00
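A minimal sketch of the idea behind this change: instead of resolving a labelled histogram on every I/O call, each operation kind is an enum variant that indexes a fixed array initialized once, so recording a sample is a plain atomic update. The variant list, the `Timer` type, and the `io_time` accessor are hypothetical; atomic counters stand in for real Prometheus histograms.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// Hypothetical subset of I/O operation kinds.
#[derive(Clone, Copy, Debug)]
enum StorageIoOperation {
    Open,
    Read,
    Write,
    Fsync,
}

impl StorageIoOperation {
    const COUNT: usize = 4;

    /// Label that would be attached to the metric when it is registered.
    fn as_str(self) -> &'static str {
        match self {
            StorageIoOperation::Open => "open",
            StorageIoOperation::Read => "read",
            StorageIoOperation::Write => "write",
            StorageIoOperation::Fsync => "fsync",
        }
    }
}

/// Stand-in for a histogram: sample count plus total duration in microseconds.
#[derive(Default)]
struct Timer {
    count: AtomicU64,
    total_micros: AtomicU64,
}

struct StorageIoTime {
    per_op: [Timer; StorageIoOperation::COUNT],
}

impl StorageIoTime {
    fn new() -> Self {
        StorageIoTime { per_op: std::array::from_fn(|_| Timer::default()) }
    }

    /// O(1) lookup by enum discriminant: no label hashing, no lock.
    fn get(&self, op: StorageIoOperation) -> &Timer {
        &self.per_op[op as usize]
    }

    fn observe(&self, op: StorageIoOperation, micros: u64) {
        let t = self.get(op);
        t.count.fetch_add(1, Ordering::Relaxed);
        t.total_micros.fetch_add(micros, Ordering::Relaxed);
    }
}

/// Process-wide instance, built on first use.
static IO_TIME: OnceLock<StorageIoTime> = OnceLock::new();

fn io_time() -> &'static StorageIoTime {
    IO_TIME.get_or_init(StorageIoTime::new)
}
```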
Joonas Koivunen	a55a78a453	Misc test flakiness fixes (#5233) Assorted flakiness fixes from #5198; these tests might not be flaky on `main`. Migrate some tests from neon_simple_env to neon_env_builder with initial_tenant, to make the flakiness easier to understand. (Did not understand the flakiness of `test_timeline_create_break_after_uninit_mark`.) `test_download_remote_layers_api` is flaky because we have no atomic "wait for WAL, checkpoint, wait for upload and do not receive any more WAL". `test_tenant_size` fixes are just boilerplate which should have always existed; we should wait for the tenant to be active. Similarly for `test_timeline_delete`. `test_timeline_size_post_checkpoint` often fails for me, reading zero from metrics. Give it a few attempts.	2023-09-11 11:42:49 +03:00
Rahul Modpur	999fe668e7	Ack tenant detach before local files are deleted (#5211) ## Problem Detaching a tenant can involve many thousands of local filesystem metadata writes, but the control plane would benefit from us not blocking detach/delete responses on these. ## Summary of changes After renaming the local tenant directory, ack the tenant detach and delete the tenant directory in the background. #5183 --------- Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>	2023-09-10 22:59:51 +03:00
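A minimal sketch of the rename-then-delete pattern, assuming the detach handler can respond as soon as the rename succeeds. The `.___deleted` suffix and the function name are hypothetical, and a plain thread stands in for whatever background task the pageserver actually uses.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Move `tenant_dir` aside with one rename, then delete the renamed tree on a
/// background thread. The caller can acknowledge the detach as soon as this
/// returns; the expensive recursive delete no longer blocks the response.
fn detach_tenant_dir(tenant_dir: &Path) -> io::Result<thread::JoinHandle<io::Result<()>>> {
    let mut tmp = tenant_dir.as_os_str().to_owned();
    tmp.push(".___deleted"); // hypothetical marker suffix
    let tmp = PathBuf::from(tmp);

    // A single metadata operation, regardless of how many files are inside.
    fs::rename(tenant_dir, &tmp)?;

    Ok(thread::spawn(move || fs::remove_dir_all(&tmp)))
}
```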
Alexander Bayandin	d33e1b1b24	approved-for-ci-run.yml: use token to checkout the repo (#5266) ## Problem Another thing I overlooked regarding `approved-for-ci-run`: - When we create a PR, the action is associated with @vipvap and this triggers the pipeline, which is good. - When we update the PR by force-pushing to the branch, the action is associated with @github-actions, which doesn't trigger a pipeline, which is bad. Initially spotted in #5239 / #5211 ([link](https://github.com/neondatabase/neon/actions/runs/6122249456/job/16633919558?pr=5239)): `check-permissions` should not fail. ## Summary of changes - Use `CI_ACCESS_TOKEN` to check out the repo (I expect this token will be reused in the following `git push`)	2023-09-10 20:12:38 +01:00
Alexander Bayandin	15fd188fd6	Fix GitHub Autocomment for `ci-run/pr`s (#5268) ## Problem When a `ci-run/pr-*` PR is created, the GitHub Autocomment with test results is supposed to be posted to the original PR; currently, this doesn't work. I created this PR from a personal fork to debug and fix the issue. ## Summary of changes - `scripts/comment-test-report.js`: use `pull_request.head` instead of `pull_request.base` 🤦	2023-09-10 20:06:10 +01:00
Alexander Bayandin	34e39645c4	GitHub Workflows: add actionlint (#5265) ## Problem Add a CI pipeline that checks GitHub Workflows with https://github.com/rhysd/actionlint (it uses `shellcheck` for shell scripts in steps). To run it locally: `SHELLCHECK_OPTS=--exclude=SC2046,SC2086 actionlint` ## Summary of changes - Add `.github/workflows/actionlint.yml` - Fix actionlint warnings	2023-09-10 20:05:07 +01:00
Em Sharnoff	1cac923af8	vm-monitor: Rate-limit upscale requests (#5263) Some VMs, when already scaled up as much as possible, end up spamming the autoscaler-agent with upscale requests that will never be fulfilled. If postgres is using memory greater than the cgroup's memory.high, it can emit new memory.high events 1000 times per second, which... just means unnecessary load on the rest of the system. This changes the vm-monitor so that we skip sending upscale requests if we already sent one within the last second, to avoid spamming the autoscaler-agent. This matches the previous behavior of the vm-informant.	2023-09-10 20:33:53 +03:00
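A minimal sketch of the one-request-per-second rule, assuming the monitor consults the limiter each time it is about to send an upscale request. `UpscaleRateLimiter` is hypothetical; the caller passes `now` explicitly so the logic is easy to test.

```rust
use std::time::{Duration, Instant};

/// Drops upscale requests that arrive within `min_interval` of the last one sent.
struct UpscaleRateLimiter {
    min_interval: Duration,
    last_sent: Option<Instant>,
}

impl UpscaleRateLimiter {
    fn new(min_interval: Duration) -> Self {
        UpscaleRateLimiter { min_interval, last_sent: None }
    }

    /// Returns true (and records the send) if a request may go out at `now`;
    /// returns false if it should be skipped.
    fn should_send(&mut self, now: Instant) -> bool {
        match self.last_sent {
            Some(prev) if now.duration_since(prev) < self.min_interval => false,
            _ => {
                self.last_sent = Some(now);
                true
            }
        }
    }
}
```

Skipped requests do not reset the timer, so a burst of memory.high events produces at most one request per interval.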
Em Sharnoff	853552dcb4	vm-monitor: Don't include Args in top-level span (#5264) It makes the logs too verbose. ref https://neondb.slack.com/archives/C03F5SM1N02/p1694281232874719?thread_ts=1694272777.207109&cid=C03F5SM1N02	2023-09-10 20:15:53 +03:00
Alexander Bayandin	1ea93af56c	Create GitHub release from release tag (#5246) ## Problem This PR creates a GitHub release from a release tag with an autogenerated changelog: https://github.com/neondatabase/neon/releases ## Summary of changes - Call GitHub API to create a release	2023-09-09 22:02:28 +01:00
Konstantin Knizhnik	f64b338ce3	Ignore DISK_FULL error when performing availability check for client (#5010) See #5001 "No space" is what's expected if we're at the size limit. Of course, if SK incorrectly returned "no space", the availability check wouldn't fire, but users would notice such a bug quite soon anyway. So ignoring "no space" is the right trade-off. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-09 21:51:04 +03:00
Konstantin Knizhnik	ba06ea26bb	Fix issues with re-enabling LFC (#5209) refer #5208 ## Problem See https://neondb.slack.com/archives/C03H1K0PGKH/p1693938336062439?thread_ts=1693928260.704799&cid=C03H1K0PGKH #5208 disables LFC forever in case of error. This is not good, because the problem causing the error (for example ENOSPC) can be resolved, and it would be nice to re-enable LFC after it is fixed. Also, #5208 disables LFC locally in one backend, but other backends may still see corrupted data. This should not cause problems right now with the "permission denied" error, because no backend should be able to open LFC normally. But in case of an out-of-disk-space error, other backends can read corrupted data. ## Summary of changes 1. Clean up the hash table after an error to prevent access to stale or corrupted data 2. Perform disk writes under an exclusive lock (hoping it will not affect performance, because a write usually just copies data from user to system space) 3. Use generations to prevent access to stale data in lfc_read --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-09-09 17:51:16 +03:00
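A minimal single-threaded sketch of item 3, assuming `lfc_read` looks up an entry under the lock, reads the cache file without holding it, and then re-checks under the lock. The generation counter lets the reader detect that a reset (item 1) happened in between. `LfcIndex` and its methods are hypothetical and are not the actual LFC code.

```rust
use std::collections::HashMap;

/// Hypothetical local-file-cache index: page number -> offset in the cache file.
struct LfcIndex {
    generation: u64,
    entries: HashMap<u64, u64>,
}

impl LfcIndex {
    fn new() -> Self {
        LfcIndex { generation: 0, entries: HashMap::new() }
    }

    fn insert(&mut self, page: u64, offset: u64) {
        self.entries.insert(page, offset);
    }

    /// Under the lock: find the page and remember the current generation.
    fn begin_read(&self, page: u64) -> Option<(u64, u64)> {
        self.entries.get(&page).map(|&offset| (offset, self.generation))
    }

    /// Back under the lock after the unlocked file read: the bytes read are
    /// only trustworthy if no reset happened in the meantime.
    fn finish_read(&self, started_generation: u64) -> bool {
        self.generation == started_generation
    }

    /// After a cache-file error: forget every entry and start a new generation,
    /// so readers that started earlier discard what they read.
    fn reset_after_error(&mut self) {
        self.entries.clear();
        self.generation += 1;
    }
}
```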
Joonas Koivunen	6f28da1737	fix: LocalFs root in test_compatibility is PosixPath('...') (#5261) I forgot a `str(...)` conversion in #5243. This led to log lines such as: ``` Using fs root 'PosixPath('/tmp/test_output/test_backward_compatibility[debug-pg14]/compatibility_snapshot/repo/local_fs_remote_storage/pageserver')' as a remote storage ``` This surprisingly works, creating a hierarchy under the current working directory (`repo_dir` for tests): - `PosixPath('` - `tmp` .. up until .. `local_fs_remote_storage` - `pageserver')` It should not work, but right now test_compatibility.py tests find local metadata and layers, which end up being used. After #5172, when remote storage is the source of truth, it will no longer work.	2023-09-08 20:27:00 +03:00
Heikki Linnakangas	60050212e1	Update rdkit to version 2023_03_03. (#5260) It includes PostgreSQL 16 support.	2023-09-08 19:40:29 +03:00
Joonas Koivunen	66633ef2a9	rust-toolchain: use 1.72.0, same as CI (#5256) Switches everyone without a `rustup override` to 1.72.0. The required code changes were already done in #5255. Depends on https://github.com/neondatabase/build/pull/65.	2023-09-08 19:36:02 +03:00
Alexander Bayandin	028fbae161	Miscellaneous fixes for tests-related things (#5259) ## Problem A bunch of fixes for different test-related things ## Summary of changes - Fix test_runner/pg_clients (`subprocess_capture` return value has changed) - Do not run create-test-report if check-permissions failed for not cancelled jobs - Fix Code Coverage comment layout after flaky tests. Add another healing "\n" - test_compatibility: add an instruction for local run Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-08 16:28:09 +01:00
John Spray	7b6337db58	tests: enable multiple pageservers in `neon_local` and `neon_fixture` (#5231) ## Problem Currently our testing environment only supports running a single pageserver at a time. This is insufficient for testing failover and migrations. - Dependency of writing tests for #5207 ## Summary of changes - `neon_local` and `neon_fixture` now handle multiple pageservers - This is a breaking change to the `.neon/config` format: any local environments will need recreating - Existing tests continue to work unchanged: - The default number of pageservers is 1 - `NeonEnv.pageserver` is now a helper property that retrieves the first pageserver if there is only one, else throws. - Pageserver data directories are now at `.neon/pageserver_{n}` where n is 1,2,3... - Compatibility tests get some special casing to migrate neon_local configs: these are not meant to be backward/forward compatible, but they were treated that way by the test.	2023-09-08 16:19:57 +01:00
Konstantin Knizhnik	499d0707d2	Perform throttling for concurrent build index which is done outside transaction (#5048) See https://neondb.slack.com/archives/C03H1K0PGKH/p1692550646191429 ## Problem Building an index concurrently writes WAL outside a transaction. `backpressure_throttling_impl` doesn't perform throttling for read-only transactions (those without an assigned XID). This causes a huge write lag, which can cause large delays in accessing the table. ## Summary of changes Check the `PROC_IN_SAFE_IC` flag in the process status, which is set during concurrent index build. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-09-08 18:05:08 +03:00
Joonas Koivunen	720d59737a	rust-1.72.0 changes (#5255) Prepare to upgrade the Rust version to the latest stable. - `rustfmt` has learned to format `let irrefutable = $expr else { ... };` blocks - There's a new warning about the virtual (workspace) crate resolver; picked the latest resolver, as I suspect everyone would expect it to be the latest; should not matter anyway - Some new clippies, which seem alright	2023-09-08 16:28:41 +03:00
Joonas Koivunen	ff87fc569d	test: Remote storage refactorings (#5243) Remote storage cleanup split from #5198: - pageserver, extensions, and safekeepers now have their separate remote storage - RemoteStorageKind has the configuration code - S3Storage has the cleanup code - with MOCK_S3, pageserver, extensions, safekeepers use different buckets - with LOCAL_FS, `repo_dir / "local_fs_remote_storage" / $user` is used as path, where $user is `pageserver`, `safekeeper` - no more `NeonEnvBuilder.enable_xxx_remote_storage` but one `enable_{pageserver,extensions,safekeeper}_remote_storage` Should not have any real changes. These will allow us to default to `LOCAL_FS` for pageserver on the next PR, remove `RemoteStorageKind.NOOP`, work towards #5172. Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-09-08 13:54:23 +03:00
Heikki Linnakangas	cdc65c1857	Update pg_cron to version 1.6.0 (#5252) This includes PostgreSQL 16 support. There are no catalog changes, so this is a drop-in replacement, no need to run "ALTER EXTENSION UPDATE".	2023-09-08 12:42:46 +03:00
Heikki Linnakangas	dac995e7e9	Update plpgsql_check extension to version v2.4.0 (#5249) This brings v16 support.	2023-09-08 10:46:02 +03:00
Alexander Bayandin	b80740bf9f	test_startup: increase timeout (#5238) ## Problem `test_runner/performance/test_startup.py::test_startup` started to fail more frequently because of the timeout. Let's increase the timeout to see the failures on the perf dashboard. ## Summary of changes - Increase timeout for `test_startup` from 600 to 900 seconds	2023-09-08 01:57:38 +01:00
Heikki Linnakangas	57c1ea49b3	Update hypopg extension to version 1.4.0 (#5245) The v1.4.0 includes changes to make it compile with PostgreSQL 16. The commit log doesn't call it out explicitly, but I tested it manually. v1.4.0 includes some new functions, but I tested manually that the v1.3.1 functionality works with the v1.4.0 version of the library. That means that this doesn't break existing installations. Users can do "ALTER EXTENSION hypopg UPDATE" if they want to use the new v1.4.0 functionality, but they don't have to.	2023-09-08 03:30:11 +03:00
Heikki Linnakangas	6c31a2d342	Upgrade prefix extension to version 1.2.10 (#5244) This version includes trivial changes to make it compile with PostgreSQL 16. No functional changes.	2023-09-08 02:10:01 +03:00
Heikki Linnakangas	252b953f18	Upgrade postgresql-hll to version 2.18. (#5241) This includes PostgreSQL 16 support. No other changes, really. The extension version in the upstream was changed from 2.17 to 2.18; however, there is no difference between the catalog objects. So if you had installed 2.17 previously, it will continue to work. You can run "ALTER EXTENSION hll UPDATE", but all it will do is update the version number in the pg_extension table.	2023-09-08 02:07:17 +03:00
Heikki Linnakangas	b414360afb	Upgrade ip4r to version 2.4.2 (#5242) Includes PostgreSQL v16 support. No functional changes.	2023-09-08 02:06:53 +03:00
Arpad Müller	d206655a63	Make VirtualFile::{open, open_with_options, create, sync_all, with_file} async fn (#5224) ## Problem Once we use async file system APIs for `VirtualFile`, these functions will also need to be async fn. ## Summary of changes Makes the functions `open, open_with_options, create, sync_all, with_file` of `VirtualFile` async fn, including all functions that call them. Like in the prior PRs, the actual I/O operations are not using async APIs yet, as per request in the #4743 epic. We switch towards not using `VirtualFile` in the par_fsync module, hopefully only temporarily until we can actually do fully async I/O in `VirtualFile`. This might cause us to exhaust fd limits in the tests, but it should only be an issue for the local developer as we have high ulimits in prod. This PR is a follow-up of #5189, #5190, #5195, and #5203. Part of #4743.	2023-09-08 00:50:50 +02:00
Heikki Linnakangas	e5adc4efb9	Upgrade h3-pg to version 4.1.3. (#5237) This includes v16 support.	2023-09-07 21:39:12 +03:00

3747 Commits