rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-25 17:10:38 +00:00

Author	SHA1	Message	Date
Christian Schwarz	7b9ce9510f	review validation	2023-09-15 14:57:21 +02:00
Christian Schwarz	556211b701	review comments	2023-09-14 14:42:53 +02:00
Christian Schwarz	dc1c6b28db	move lsn visibility related stuff into separate module	2023-09-14 14:42:53 +02:00
Christian Schwarz	1a92a107f6	move DeletionList stuff into separate module	2023-09-14 14:42:53 +02:00
Christian Schwarz	ef9e081866	Revert "unimpl the parts that support !generation.is_none()" This reverts commit 641130a959d05aaf1708a3fa3a107341474ace4d.	2023-09-14 14:42:53 +02:00
Christian Schwarz	d62723ea57	unimpl the parts that support !generation.is_none()	2023-09-14 14:42:51 +02:00
John Spray	0ba442f1e0	Update pageserver/src/lib.rs Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-12 16:54:35 +01:00
John Spray	7d4dff0738	Get rid of a couple of spurious `mut`	2023-09-12 16:36:39 +01:00
John Spray	20a3a9be70	refactor Arc<dyn> to generics for control plane client mocking	2023-09-12 16:02:58 +01:00
John Spray	db9a49ed91	Refactor remote_consistent_lsn updates to use an atomic instead of a channel	2023-09-12 15:50:26 +01:00
John Spray	28d7f6b643	reinstate log line per layer in deletion scheduling	2023-09-12 15:35:20 +01:00
John Spray	7b05ec2825	pageserver: refactor http deletion_queue_flush	2023-09-12 15:27:27 +01:00
John Spray	b76f08e863	deletion queue: wrap workers in an opaque struct	2023-09-12 15:03:09 +01:00
John Spray	fad9c45f11	pageserver: fix executor flush	2023-09-12 09:46:05 +01:00
John Spray	ee1fba8729	pageserver: fix deletion queue flush on shutdown	2023-09-12 09:18:18 +01:00
John Spray	a5aa6652c6	clippy deletion queue	2023-09-11 18:10:37 +01:00
John Spray	34007e12a1	libs: add Generation::next	2023-09-11 18:06:40 +01:00
John Spray	278eb70522	tests: add test_pageserver_generations	2023-09-11 18:06:40 +01:00
John Spray	da96d34fa2	tests: optional delimiter to list_prefix	2023-09-11 18:06:40 +01:00
John Spray	8b2160793a	tests: update existing deletion tests	2023-09-11 18:06:40 +01:00
John Spray	9a4f9a1b7c	pageserver: add a const constructor for Key, for use in test consts	2023-09-11 18:06:40 +01:00
John Spray	412819ac20	control_plane: fix attach_hook in attachment_service	2023-09-11 18:06:40 +01:00
John Spray	1f43fed305	pageserver: add flush admin API	2023-09-11 18:06:40 +01:00
John Spray	9ccad00474	libs: add ApiError::ShuttingDown	2023-09-11 18:06:40 +01:00
John Spray	960dd9a206	pageserver: use deferred updates to remote_consistent_lsn	2023-09-11 18:06:40 +01:00
John Spray	37f4972291	pageserver: cut over to using deletion queue	2023-09-11 18:06:40 +01:00
John Spray	38b41e5c34	pageserver: wire deletion queue through to tenant	2023-09-11 18:06:40 +01:00
John Spray	eb464d5322	pageserver: instantiate deletion queue	2023-09-11 18:06:40 +01:00
John Spray	60241567ce	pageserver: add deletion queue	2023-09-11 18:06:40 +01:00
John Spray	b6183a9e65	pageserver: refactor ControlPlaneClient into a mockable trait	2023-09-11 17:59:30 +01:00
John Spray	6e0b977bc8	libs: add RemotePath::strip_prefix	2023-09-11 17:59:30 +01:00
John Spray	3e09cabb6a	libs: implement Generation Into<u32>	2023-09-11 17:59:30 +01:00
John Spray	145685201a	pageserver: add validate to control plane client	2023-09-11 17:59:30 +01:00
John Spray	d545e3f03b	pageserver: add deletion queue metrics	2023-09-11 17:59:30 +01:00
John Spray	6a0cc9e526	pageserver: add deletion path definitions to config	2023-09-11 17:59:30 +01:00
John Spray	1fed35a481	pageserver/tenant: remote_layer_path take Generation instead of layer metadata	2023-09-11 17:59:30 +01:00
John Spray	3d6c5c8d37	pageserver: update unit tests to keep TenantHarness alive This controls the lifetime of the MockDeletionQueue.	2023-09-11 17:59:30 +01:00
John Spray	d5c9bfa75e	pageserver: enable disabling control_plane_api with an override This is just for testing. Eventually we'll remove this after everything is upgraded.	2023-09-11 17:59:30 +01:00
John Spray	8d5d36ed12	remote_storage: expose MAX_KEYS_PER_DELETE constant	2023-09-11 17:59:30 +01:00
John Spray	9c64d95467	remote_storage: implement Serialize/Deserialize for RemotePath	2023-09-11 17:59:30 +01:00
Arpad Müller	a18d6d9ae3	Make File opening in VirtualFile async-compatible (#5280) ## Problem Previously, we were using `observe_closure_duration` in `VirtualFile` file opening code, but this doesn't support async open operations, which we want to use as part of #4743. ## Summary of changes * Move the duration measurement from the `with_file` macro into an `observe_duration` macro. * Some smaller drive-by fixes to replace the old strings with the new variant names introduced by #5273 Part of #4743, follow-up of #5247.	2023-09-11 18:41:08 +02:00
Arpad Müller	76cc87398c	Use tokio locks in VirtualFile and turn with_file into macro (#5247) ## Problem For #4743, we want to convert everything up to the actual I/O operations of `VirtualFile` to `async fn`. ## Summary of changes This PR is the last change in a series of changes to `VirtualFile`: #5189, #5190, #5195, #5203, and #5224. It does the last preparations before the I/O operations are actually made async. We are doing the following things: * First, we change the locks for the file descriptor cache to tokio's locks that support Send. This is important when one wants to hold locks across await points (which we want to do), otherwise the Future won't be Send. Also, one shouldn't generally block in async code as executors don't like that. * Due to the lock change, we now take an approach for the `VirtualFile` destructors similar to the one proposed by #5122 for the page cache, to use `try_write`. Similarly to the situation in the linked PR, one can make an argument that if we are in the destructor and the slot has not been reused yet, we are the only user accessing the slot due to owning the lock mutably. It is still possible that we are not obtaining the lock, but the only cause for that is the clock algorithm touching the slot, which should be quite an unlikely occurrence. For the instance of `try_write` failing, we spawn an async task to destroy the lock. As just argued however, most of the time the code path where we spawn the task should not be visited. * Lastly, we split `with_file` into a macro part, and a function part that contains most of the logic. The function part returns a lock object, that the macro uses. The macro exists to perform the operation in a more compact fashion, saving code from putting the lock into a variable and then doing the operation while measuring the time to run it. We take the locks approach because Rust has no support for async closures.
One can make normal closures return a future, but that approach gets into lifetime issues the moment you want to pass data to these closures via parameters that have a lifetime (captures work). For details, see [this](https://smallcultfollowing.com/babysteps/blog/2023/03/29/thoughts-on-async-closures/) and [this](https://users.rust-lang.org/t/function-that-takes-an-async-closure/61663) link. In #5224, we ran into a similar problem with the `test_files` function, and we ended up passing the path and the `OpenOptions` by-value instead of by-ref, at the expense of a few extra copies. This can be done as the data is cheaply copyable, and we are in test code. But here, we are not, and while `File::try_clone` exists, it [issues system calls internally](`1e746d7741/library/std/src/os/fd/owned.rs (L94-L111)`). Also, it would allocate an entirely new file descriptor, something that the fd cache was built to prevent. * We change the `STORAGE_IO_TIME` metrics to support async. Part of #4743.	2023-09-11 17:35:05 +02:00
bojanserafimov	c0ed362790	Measure pageserver wal recovery time and fix flush() method (#5240)	2023-09-11 09:46:06 -04:00
duguorong009	d7fa2dba2d	fix(pageserver): update the `STORAGE_IO_TIME` metrics to avoid expensive operations (#5273 ) Introduce the `StorageIoOperation` enum, `StorageIoTime` struct, and `STORAGE_IO_TIME_METRIC` static which provides lockless access to histograms consumed by `VirtualFile`. Closes #5131 Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-11 14:58:15 +03:00
Joonas Koivunen	a55a78a453	Misc test flakiness fixes (#5233) Assorted flakiness fixes from #5198, might not be flaky on `main`. Migrate some tests using neon_simple_env to just neon_env_builder and using initial_tenant to make flakiness understanding easier. (Did not understand the flakiness of `test_timeline_create_break_after_uninit_mark`.) `test_download_remote_layers_api` is flaky because we have no atomic "wait for WAL, checkpoint, wait for upload and do not receive any more WAL". `test_tenant_size` fixes are just boilerplate which should have always existed; we should wait for the tenant to be active. Similarly for `test_timeline_delete`. `test_timeline_size_post_checkpoint` fails often for me with reading zero from metrics. Give it a few attempts.	2023-09-11 11:42:49 +03:00
Rahul Modpur	999fe668e7	Ack tenant detach before local files are deleted (#5211) ## Problem Detaching a tenant can involve many thousands of local filesystem metadata writes, but the control plane would benefit from us not blocking detach/delete responses on these. ## Summary of changes After renaming the local tenant directory, ack the tenant detach and delete the tenant directory in the background. #5183 --------- Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>	2023-09-10 22:59:51 +03:00
Alexander Bayandin	d33e1b1b24	approved-for-ci-run.yml: use token to checkout the repo (#5266) ## Problem Another thing I overlooked regarding `approved-for-ci-run`: - When we create a PR, the action is associated with @vipvap and this triggers the pipeline; this is good. - When we update the PR by force-pushing to the branch, the action is associated with @github-actions, which doesn't trigger a pipeline; this is bad. Initially spotted in #5239 / #5211 ([link](https://github.com/neondatabase/neon/actions/runs/6122249456/job/16633919558?pr=5239)): `check-permissions` should not fail. ## Summary of changes - Use `CI_ACCESS_TOKEN` to check out the repo (I expect this token will be reused in the following `git push`)	2023-09-10 20:12:38 +01:00
Alexander Bayandin	15fd188fd6	Fix GitHub Autocomment for `ci-run/pr`s (#5268) ## Problem When a `ci-run/pr-*` PR is created, the GitHub Autocomment with test results is supposed to be posted to the original PR; currently, this doesn't work. I created this PR from a personal fork to debug and fix the issue. ## Summary of changes - `scripts/comment-test-report.js`: use `pull_request.head` instead of `pull_request.base` 🤦	2023-09-10 20:06:10 +01:00
Alexander Bayandin	34e39645c4	GitHub Workflows: add actionlint (#5265) ## Problem Add a CI pipeline that checks GitHub Workflows with https://github.com/rhysd/actionlint (it uses `shellcheck` for shell scripts in steps). To run it locally: `SHELLCHECK_OPTS=--exclude=SC2046,SC2086 actionlint` ## Summary of changes - Add `.github/workflows/actionlint.yml` - Fix actionlint warnings	2023-09-10 20:05:07 +01:00
Em Sharnoff	1cac923af8	vm-monitor: Rate-limit upscale requests (#5263) Some VMs, when already scaled up as much as possible, end up spamming the autoscaler-agent with upscale requests that will never be fulfilled. If postgres is using memory greater than the cgroup's memory.high, it can emit new memory.high events 1000 times per second, which... just means unnecessary load on the rest of the system. This changes the vm-monitor so that we skip sending upscale requests if we already sent one within the last second, to avoid spamming the autoscaler-agent. This matches previous behavior that the vm-informant had.	2023-09-10 20:33:53 +03:00
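
The rate limiting described in commit 1cac923af8 (skip an upscale request if one was already sent within the last second) could be sketched roughly as follows. This is a minimal illustration, not the vm-monitor's actual code: the `UpscaleRateLimiter` type and its `try_acquire` method are hypothetical names, and the caller passes in the current time so the behavior is easy to check.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not taken from the neon repository): remembers when
/// the last request was sent and rejects new ones that arrive too soon.
pub struct UpscaleRateLimiter {
    /// Minimum spacing between two accepted requests.
    min_interval: Duration,
    /// Time of the most recently accepted request, if any.
    last_sent: Option<Instant>,
}

impl UpscaleRateLimiter {
    pub fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_sent: None }
    }

    /// Returns `true` and records `now` if a request may be sent at `now`;
    /// returns `false` if the previous accepted request is still too recent.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        match self.last_sent {
            // A request was accepted less than `min_interval` ago: skip this one.
            Some(prev) if now.saturating_duration_since(prev) < self.min_interval => false,
            // First request, or enough time has passed: accept and remember it.
            _ => {
                self.last_sent = Some(now);
                true
            }
        }
    }
}
```

A caller handling `memory.high` events would construct the limiter once with `Duration::from_secs(1)` and only forward an upscale request when `try_acquire(Instant::now())` returns `true`; rejected events are simply dropped rather than queued.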

3768 Commits