rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-08 06:30:37 +00:00

Author	SHA1	Message	Date
Christian Schwarz	a0c2a85505	timeline_init_and_sync: don't hold Tenant::timelines while load_layer_map This patch inlines `initialize_with_lock` and then reorganizes the code such that we can `load_layer_map` without holding the `Tenant::timelines` lock. As a nice aside, we can get rid of the dummy() uninit mark, which has always been a terrible hack.	2023-05-25 23:01:40 +02:00
Christian Schwarz	dd0f5c4ef3	Merge remote-tracking branch 'origin/main' into problame/async-timeline-get/dont-hold-timelines-lock-inside-tenant-state-send-modify	2023-05-25 22:20:52 +02:00
Christian Schwarz	057cceb559	refactor: make timeline activation infallible (#4319 ) Timeline::activate() was only fallible because `launch_wal_receiver` was. `launch_wal_receiver` was fallible only because of some preliminary checks in `WalReceiver::start`. Turns out these checks can be shifted to the type system by delaying creation of the `WalReceiver` struct to the point where we activate the timeline. The changes in this PR were enabled by my previous refactoring that funneled the broker_client from pageserver startup to the activate() call sites. Patch series: - #4316 - #4317 - #4318 - #4319	2023-05-25 20:26:43 +02:00
sharnoff	ae805b985d	Bump vm-builder v0.7.3-alpha3 -> v0.8.0 (#4339 ) Routine `vm-builder` version bump, from autoscaling repo release. You can find the release notes here: https://github.com/neondatabase/autoscaling/releases/tag/v0.8.0 The changes are from v0.7.2; most of them were already included in v0.7.3-alpha3. Of particular note: This (finally) fixes the cgroup issues, so we should now be able to scale up when we're about to run out of memory. NB: This has the effect of limiting the DB's memory usage in a way it wasn't limited before. We may run into issues because of that. There is currently no way to disable that behavior, other than switching the endpoint back to the k8s-pod provisioner.	2023-05-25 09:33:18 -07:00
Joonas Koivunen	85e76090ea	test: fix ancestor is stopping flakiness (#4234 ) Flakiness most likely introduced in #4170, detected in https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4232/4980691289/index.html#suites/542b1248464b42cc5a4560f408115965/18e623585e47af33. Opted to allow it globally because it can happen in other tests as well, basically whenever compaction is enabled and we stop pageserver gracefully.	2023-05-25 16:22:58 +00:00
Alexander Bayandin	08e7d2407b	Storage: use Postgres 15 as default (#2809 )	2023-05-25 15:55:46 +01:00
Alex Chi Z	ab2757f64a	bump dependencies version (#4336 ) proceeding https://github.com/neondatabase/neon/pull/4237, this PR bumps AWS dependencies along with all other dependencies to the latest compatible semver. Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-05-25 10:21:15 -04:00
Christian Schwarz	e5617021a7	refactor: eliminate global storage_broker client state (#4318 ) (This is prep work to make `Timeline::activate` infallible.) This patch removes the global storage_broker client instance from the pageserver codebase. Instead, pageserver startup instantiates it and passes it down to the `Timeline::activate` function, which in turn passes it to the WalReceiver, which is the entity that actually uses it. Patch series: - #4316 - #4317 - #4318 - #4319	2023-05-25 16:47:42 +03:00
Christian Schwarz	de780d2e0f	make TenantState::{Loading,Attaching,Activating} owned by spawn_load / spawn_attach See the Mermaid diagram in the doc comment for the now-possible state transitions. The two core insights / changes are: - spawn_load and spawn_attach own the tenant state until they're done - once load()/attach() calls are done - if they failed, transition them to Broken directly (we know that there's no background activity because we didn't call activate yet) - if they succeed, call activate. We can make it infallible. How? Later. - set_broken() and set_stopping() are changed to wait for spawn_load() / spawn_attach() to finish. This sounds scary because it might hinder detach or shutdown, but actually, concurrent attach+detach, or attach+shutdown, or load+shutdown, or attach+shutdown were just racy. With this change, they're not anymore. We can add a CancellationToken stored in Tenant for load/attach and cancel it from set_stopping() or set_broken() if necessary in the future. So, why can activate() be infallible now: because we declare that spawn_load and spawn_attach own the tenant state until they're done. And we enforce that ownership using the wait_for at the start of set_stopping and set_broken.	2023-05-25 15:02:43 +02:00
Christian Schwarz	f18d9f555b	Revert "Revert "use tokio::sync::watch::Receiver::wait_for"" This reverts commit `eaf270c648`.	2023-05-25 14:58:49 +02:00
Christian Schwarz	05a2fe08d1	Merge branch 'problame/infallible-timeline-activate/4-make-infallible' into problame/async-timeline-get/dont-hold-timelines-lock-inside-tenant-state-send-modify	2023-05-25 14:58:19 +02:00
Christian Schwarz	eaf270c648	Revert "use tokio::sync::watch::Receiver::wait_for" This reverts commit `fe4ef121b6`.	2023-05-25 14:57:41 +02:00
Christian Schwarz	ddad0928c5	Merge branch 'problame/infallible-timeline-activate/3-funnel-storage-broker-client' into problame/infallible-timeline-activate/4-make-infallible	2023-05-25 14:53:32 +02:00
Christian Schwarz	96c550222b	apply heikki's comment suggestion	2023-05-25 14:53:20 +02:00
Christian Schwarz	cf8ff7edad	explainer comment on storage_broker::connect async weirdness	2023-05-25 14:51:48 +02:00
Christian Schwarz	83ba02b431	tenant_status: don't InternalServerError if tenant not found (#4337 ) Note this also changes the status code to the (correct) 404. Not sure if that's relevant to Console. Context: https://neondb.slack.com/archives/C04PSBP2SAF/p1684746238831449?thread_ts=1684742106.169859&cid=C04PSBP2SAF Atop #4300 because it cleans up the mgr::get_tenant() error type and I want eyes on that PR.	2023-05-25 11:38:04 +02:00
Christian Schwarz	37ecebe45b	mgr::get_tenant: distinguished error type (#4300 ) Before this patch, it would use error type `TenantStateError` which has many more error variants than can actually happen with `mgr::get_tenant`. Along the way, I also introduced `SetNewTenantConfigError` because it uses `mgr::get_tenant` and also can only fail in much fewer ways than `TenantStateError` suggests. The new `page_service.rs`'s `GetActiveTimelineError` and `GetActiveTenantError` types were necessary to avoid an `Other` variant on the `GetTenantError`. This patch is a by-product of reading code that subscribes to `Tenant::state` changes. Can't really connect it to any given project.	2023-05-25 11:37:12 +02:00
Sasha Krassovsky	6052ecee07	Add connector extension to send Role/Database updates to console (#3891 ) ## Describe your changes ## Issue ticket number and link ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [x] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-05-25 12:36:57 +03:00
Christian Schwarz	da6573f551	Merge branch 'problame/infallible-timeline-activate/3-funnel-storage-broker-client' into problame/infallible-timeline-activate/4-make-infallible	2023-05-25 10:54:30 +02:00
Christian Schwarz	2fee8c884f	Merge remote-tracking branch 'origin/main' into problame/infallible-timeline-activate/3-funnel-storage-broker-client	2023-05-25 10:54:03 +02:00
Christian Schwarz	e11ba24ec5	tenant loops: operate on the Arc<Tenant> directly (#4298 ) (Instead of going through mgr every iteration.) The `wait_for_active_tenant` function's `wait` argument could be removed because it was only used for the loop that waits for the tenant to show up in the tenants map. Since we're passing the tenant in, we no longer need to get it from the tenants map. NB that there's no guarantee that the tenant object is in the tenants map at the time the background loop function starts running. But the tenant mgr guarantees that it will be quite soon. See `tenant_map_insert` way upwards in the call hierarchy for details. This is prep work to eliminate `subscribe_for_state_updates` (PR #4299 ) Fixes: #3501	2023-05-25 10:49:09 +02:00
Christian Schwarz	fe4ef121b6	use tokio::sync::watch::Receiver::wait_for	2023-05-25 10:44:26 +02:00
Christian Schwarz	641ca994dc	assert_eq suggestion	2023-05-25 09:55:32 +02:00
Alex Chi Z	f276f21636	ci: use eu-central-1 bucket (#4315 ) Probably increase CI success rate. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-05-25 00:00:21 +03:00
Alex Chi Z	7126197000	pagectl: refactor ctl and support dump kv in delta (#4268 ) This PR refactors the original page_binutils with a single tool pagectl, use clap derive for better command line parsing, and adds the dump kv tool to extract information from delta file. This helps me better understand what's inside the page server. We can add support for other types of file and more functionalities in the future. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-05-24 19:36:07 +03:00
Christian Schwarz	413598b19b	fix merge fallout (?)	2023-05-24 17:42:51 +02:00
Christian Schwarz	b345f32e3f	Merge branch 'problame/infallible-timeline-activate/4-make-infallible' into problame/async-timeline-get/dont-hold-timelines-lock-inside-tenant-state-send-modify	2023-05-24 17:25:35 +02:00
Christian Schwarz	69cfa9fe61	launch_wal_receiver: apply joonas's review suggestion (visibility + doc comment)	2023-05-24 17:20:03 +02:00
Christian Schwarz	2c424c8f4e	Revert "activate_timelines counter is now == not_broken_timelines.len()" not_broken_timelines is an iterator, doesn't have `len()`. This reverts commit `4001f441c0`.	2023-05-24 17:19:22 +02:00
Christian Schwarz	4001f441c0	activate_timelines counter is now == not_broken_timelines.len()	2023-05-24 17:14:49 +02:00
Christian Schwarz	ef956c47fc	make it clear that `walreceiver_status` is always used in the branch where it's produced	2023-05-24 17:12:35 +02:00
Christian Schwarz	8606b6abe5	Merge remote-tracking branch 'origin/problame/infallible-timeline-activate/3-funnel-storage-broker-client' into problame/infallible-timeline-activate/4-make-infallible	2023-05-24 17:02:18 +02:00
Christian Schwarz	732f60317b	Merge remote-tracking branch 'origin/main' into problame/infallible-timeline-activate/3-funnel-storage-broker-client	2023-05-24 16:58:25 +02:00
Christian Schwarz	afc48e2cd9	refactor responsibility for tenant/timeline activation (#4317 ) (This is prep work to make `Timeline::activate()` infallible.) The current possibility for failure in `Timeline::activate()` is the broker client's presence / absence. It should be an assert, but we're careful with these. So, I'm planning to pass in the broker client to activate(), thereby eliminating the possibility of its absence. In the unit tests, we don't have a broker client. So, I thought I'd be in trouble because the unit tests also called `activate()` before this PR. However, closer inspection reveals a long-standing FIXME about this, which is addressed by this patch. It turns out that the unit tests don't actually need the background loops to be running. They just need the state value to be `Active`. So, for the tests, we just set it to that value but don't spawn the background loops. We'll need to revisit this if we ever do more Rust unit tests in the future. But right now, this refactoring improves the code, so, let's revisit when we get there. Patch series: - #4316 - #4317 - #4318 - #4319	2023-05-24 16:54:11 +02:00
Christian Schwarz	df52587bef	attach-time tenant config (#4255 ) This PR adds support for supplying the tenant config upon /attach. Before this change, when relocating a tenant using `/detach` and `/attach`, the tenant config after `/attach` would be the default config from `pageserver.toml`. That is undesirable for settings such as the PITR-interval: if the tenant's config on the source was `30 days` and the default config on the attach-side is `7 days`, then the first GC run would eradicate 23 days worth of PITR capability. The API change is backwards-compatible: if the body is empty, we continue to use the default config. We'll remove that capability as soon as the cloud.git code is updated to use attach-time tenant config (https://github.com/neondatabase/neon/issues/4282 keeps track of this). unblocks https://github.com/neondatabase/cloud/issues/5092 fixes https://github.com/neondatabase/neon/issues/1555 part of https://github.com/neondatabase/neon/issues/886 (Tenant Relocation) Implementation ============== The preliminary PRs for this work were (most-recent to least-recent) * https://github.com/neondatabase/neon/pull/4279 * https://github.com/neondatabase/neon/pull/4267 * https://github.com/neondatabase/neon/pull/4252 * https://github.com/neondatabase/neon/pull/4235	2023-05-24 17:46:30 +03:00
Alexander Bayandin	35bb10757d	scripts/ingest_perf_test_result.py: increase connection timeout (#4329 ) ## Problem Sometimes the default connection timeout is not enough to connect to the DB with perf test results, [an example](https://github.com/neondatabase/neon/actions/runs/5064263522/jobs/9091692868#step:10:332). Similar changes were made for similar scripts: - For `scripts/flaky_tests.py` in https://github.com/neondatabase/neon/pull/4096 - For `scripts/ingest_regress_test_result.py` in https://github.com/neondatabase/neon/pull/2367 (from the very beginning) ## Summary of changes - Connection timeout increased to 30s for `scripts/ingest_perf_test_result.py`	2023-05-24 10:11:24 -04:00
Alexander Bayandin	2a3f54002c	test_runner: update dependencies (#4328 ) ## Problem `pytest` 6 truncates error messages and this is not configurable. It's fixed in `pytest` 7, which prints the whole message (the truncation limit is higher) if `--verbose` is set (it's set on CI). ## Summary of changes - `pytest` and `pytest` plugins are updated to their latest versions - linters (`black` and `ruff`) are updated to their latest versions - `mypy` and types are updated to their latest versions, new warnings are fixed - while we're here, allure is updated to its latest version as well	2023-05-24 12:47:01 +01:00
Christian Schwarz	b54431bbd3	pass the BrokerClientChannel by value & clone it as necessary It's a wrapper around an inner Arc anyways Also, this gets rid of the OnceCell	2023-05-24 12:29:05 +02:00
Christian Schwarz	def5eb8542	Merge branch 'problame/infallible-timeline-activate/2-pushup-tenant-and-timeline-activation' into problame/infallible-timeline-activate/3-funnel-storage-broker-client	2023-05-24 11:57:37 +02:00
Christian Schwarz	07da786ed3	apply joonas's suggestion to use parent: None + follows_from	2023-05-24 11:56:26 +02:00
Christian Schwarz	75c3c43b2e	don't unwrap() the `activate()` result in spawn_load / spawn_attach	2023-05-24 11:36:07 +02:00
Christian Schwarz	bdf03eab58	Merge branch 'problame/infallible-timeline-activate/2-pushup-tenant-and-timeline-activation' into problame/infallible-timeline-activate/3-funnel-storage-broker-client	2023-05-24 11:32:38 +02:00
Christian Schwarz	32c85fa87a	Merge remote-tracking branch 'origin/main' into problame/infallible-timeline-activate/2-pushup-tenant-and-timeline-activation	2023-05-24 11:31:00 +02:00
Joonas Koivunen	f3769d45ae	chore: upgrade tokio to 1.28.1 (#4294 ) no major changes, but this is the most recent LTS release and will be required by #4292.	2023-05-24 08:15:39 +03:00
Arseny Sher	c200ebc096	proxy: log endpoint name everywhere. Checking proxy logs for an endpoint is a frequent (often first) operation when investigating user issues; let's remove the annoying extra step of mapping endpoint id -> session id here.	2023-05-24 09:11:23 +04:00
Konstantin Knizhnik	417f37b2e8	Pass set of wanted image layers from GC to compaction (#3673 ) ## Describe your changes Right now the only criterion for image layer generation is the number of delta layers since the last image layer. If we have a "stairs" layout of delta layers (see link below), then it can happen that there are a lot of old delta layers which cannot be reclaimed by GC because they are not fully covered with image layers. This PR constructs a list of "wanted" image layers in GC (the image layers needed to be able to remove old layers) and passes this list to the compaction task, which performs generation of image layers. So now, in addition to the delta count criterion, we also take into account the "wishes" of GC. ## Issue ticket number and link See https://neondb.slack.com/archives/C033RQ5SPDH/p1676914249982519 ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-05-24 08:01:41 +03:00
sharnoff	7f1973f8ac	bump vm-builder, use Neon-specific version (#4155 ) In the v0.6.0 release, vm-builder was changed to be Neon-specific, so it's handling all the stuff that Dockerfile.vm-compute-node used to do. This commit bumps vm-builder to v0.7.3-alpha3.	2023-05-23 15:20:20 -07:00
Christian Schwarz	00f7fc324d	tenant_map_insert: don't expose the vacant entry to the closure (#4316 ) This tightens up the API a little. Byproduct of some refactoring work that I'm doing right now.	2023-05-23 15:16:12 -04:00
Christian Schwarz	b2e0c58a8c	Merge branch 'problame/infallible-timeline-activate/4-make-infallible' into problame/async-timeline-get/dont-hold-timelines-lock-inside-tenant-state-send-modify	2023-05-23 20:44:34 +02:00
Christian Schwarz	94f30f0660	Merge branch 'problame/infallible-timeline-activate/3-funnel-storage-broker-client' into problame/infallible-timeline-activate/4-make-infallible	2023-05-23 20:44:12 +02:00

3255 Commits
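
The attach-time tenant config commit (df52587bef) describes a fallback rule and a PITR arithmetic example that may be easier to follow as code. The sketch below is only an illustration under stated assumptions: `TenantConf`, `effective_conf`, and `pitr_window_lost` are hypothetical names invented here, not the pageserver's actual types or API. It models "empty `/attach` body means default config" and the 30-day versus 7-day example from the commit message.

```rust
use std::time::Duration;

// Hypothetical stand-in for a tenant config; the real pageserver type
// carries many more settings than the PITR interval.
#[derive(Debug, Clone, PartialEq)]
struct TenantConf {
    pitr_interval: Duration,
}

const DAY: Duration = Duration::from_secs(24 * 60 * 60);

// Backwards-compatible rule from the commit message: if the attach request
// carries no config (empty body), keep using the default config.
fn effective_conf(attach_body: Option<TenantConf>, default: &TenantConf) -> TenantConf {
    attach_body.unwrap_or_else(|| default.clone())
}

// How much PITR history the first GC run on the attach side could discard,
// relative to what the tenant had on the source pageserver.
fn pitr_window_lost(source: &TenantConf, effective: &TenantConf) -> Duration {
    source.pitr_interval.saturating_sub(effective.pitr_interval)
}
```

With a source config of `DAY * 30` and a default of `DAY * 7`, `pitr_window_lost` evaluates to `DAY * 23` when no config is supplied on attach, matching the 23 days mentioned in the commit message, and to `Duration::ZERO` when the source config is passed along.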