rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-08 22:50:37 +00:00

Author	SHA1	Message	Date
Christian Schwarz	64efcbf8da	Merge remote-tracking branch 'origin/main' into problame/async-timeline-get/refactor-timeline-initialization-to-avoid-holding-tenants-timelines-lock	2023-06-07 18:04:28 +02:00
Alex Chi Z	2e687bca5b	refactor: use LayerDesc in layer map (part 1) (#4408 ) ## Problem part of https://github.com/neondatabase/neon/issues/4392 ## Summary of changes This PR adds a new HashMap that maps persistent layer desc to the layer object inside LayerMap. Originally I directly went towards adding such layer cache in Timeline, but the changes are too many and cannot be reviewed as a reasonably-sized PR. Therefore, we take this intermediate step to change part of the codebase to use persistent layer desc, and come up with other PRs to move this hash map of layer desc to the timeline struct. Also, file_size is now part of the layer desc. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com> Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>	2023-06-07 18:28:18 +03:00
Dmitry Rodionov	1a1019990a	map TenantState::Broken to TenantAttachmentStatus::Failed (#4371 ) ## Problem Attach failures are not reported in public part of the api (in `attachment_status` field of TenantInfo). ## Summary of changes Expose TenantState::Broken as TenantAttachmentStatus::Failed In the way its written Failed status will be reported even if no attachment happened. (I e if tenant become broken on startup). This is in line with other members. I e Active will be resolved to Attached even if no actual attach took place. This can be tweaked if needed. At the current stage it would be overengineering without clear motivation resolves #4344	2023-06-07 18:25:30 +03:00
Christian Schwarz	a97b21ab13	Merge remote-tracking branch 'origin/main' into problame/async-timeline-get/refactor-timeline-initialization-to-avoid-holding-tenants-timelines-lock	2023-06-07 14:47:19 +02:00
Joonas Koivunen	5761190e0d	feat: three phased startup order (#4399 ) Initial logical size calculation could still hinder our fast startup efforts in #4397. See #4183. In deployment of 2023-06-06 about a 200 initial logical sizes were calculated on hosts which took the longest to complete initial load (12s). Implements the three step/tier initialization ordering described in #4397: 1. load local tenants 2. do initial logical sizes per walreceivers for 10s 3. background tasks Ordering is controlled by: - waiting on `utils::completion::Barrier`s on background tasks - having one attempt for each Timeline to do initial logical size calculation - `pageserver/src/bin/pageserver.rs` releasing background jobs after timeout or completion of initial logical size calculation The timeout is there just to safeguard in case a legitimate non-broken timeline initial logical size calculation goes long. The timeout is configurable, by default 10s, which I think would be fine for production systems. In the test cases I've been looking at, it seems that these steps are completed as fast as possible. Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-06-07 14:29:23 +03:00
Joonas Koivunen	0cef7e977d	refactor: just one way to shutdown a tenant (#4407 ) We have 2 ways of tenant shutdown, we should have just one. Changes are mostly mechanical simple refactorings. Added `warn!` on the "shutdown all remaining tasks" should trigger test failures in the between time of not having solved the "tenant/timeline owns all spawned tasks" issue. Cc: #4327.	2023-06-06 15:30:55 +03:00
Joonas Koivunen	18a9d47f8e	test: restore NotConnected being allowed globally (#4426 ) Flakyness introduced by #4402 evidence [^1]. I had assumed the NotConnected would had been an expected io error, but it's not. Restore the global `allowed_error`. [^1]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4407/5185897757/index.html#suites/82004ab4e3720b47bf78f312dabe7c55/14f636d0ecd3939d/	2023-06-06 13:51:39 +03:00
Joonas Koivunen	77598f5d0a	Better walreceiver logging (#4402 ) walreceiver logs are a bit hard to understand because of partial span usage, extra messages, ignored errors popping up as huge stacktraces. Fixes #3330 (by spans, also demote info -> debug). - arrange walreceivers spans into a hiearchy: - `wal_connection_manager{tenant_id, timeline_id}` -> `connection{node_id}` -> `poller` - unifies the error reporting inside `wal_receiver`: - All ok errors are now `walreceiver connection handling ended: {e:#}` - All unknown errors are still stacktraceful task_mgr reported errors with context `walreceiver connection handling failure` - Remove `connect` special casing, was: `DB connection stream finished` for ok errors - Remove `done replicating` special casing, was `Replication stream finished` for ok errors - lowered log levels for (non-exhaustive list): - `WAL receiver manager started, connecting to broker` (at startup) - `WAL receiver shutdown requested, shutting down` (at shutdown) - `Connection manager loop ended, shutting down` (at shutdown) - `sender is dropped while join handle is still alive` (at lucky shutdown, see #2885) - `timeline entered terminal state {:?}, stopping wal connection manager loop` (at shutdown) - `connected!` (at startup) - `Walreceiver db connection closed` (at disconnects?, was without span) - `Connection cancelled` (at shutdown, was without span) - `observed timeline state change, new state is {new_state:?}` (never after Timeline::activate was made infallible) - changed: - `Timeline dropped state updates sender, stopping wal connection manager loop` - was out of date; sender is not dropped but `Broken \| Stopping` state transition - also made `debug!` - `Timeline dropped state updates sender before becoming active, stopping wal connection manager loop` - was out of date: sender is again not dropped but `Broken \| Stopping` state transition - also made `debug!` - log fixes: - stop double reporting panics via JoinError	2023-06-05 17:35:23 +03:00
Joonas Koivunen	8142edda01	test: Less flaky gc (#4416 ) Solves a flaky test error in the wild[^1] by: - Make the gc shutdown signal reading an `allowed_error` - Note the gc shutdown signal readings as being in `allowed_error`s - Allow passing tenant conf to init_start to avoid unncessary tenants [^1]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-4399/5176432780/index.html#suites/b97efae3a617afb71cb8142f5afa5224/2cd76021ea011f93	2023-06-05 15:43:52 +03:00
Joonas Koivunen	8caef2c0c5	fix: delay `eviction_task` as well (#4397 ) As seen on deployment of 2023-06-01 release, times were improving but there were some outliers caused by: - timelines `eviction_task` starting while activating and running imitation - timelines `initial logical size` calculation This PR fixes it so that `eviction_task` is delayed like other background tasks fixing an oversight from earlier #4372. After this PR activation will be two phases: 1. load and activate tenants AND calculate some initial logical sizes 2. rest of initial logical sizes AND background tasks - compaction, gc, disk usage based eviction, timelines `eviction_task`, consumption metrics	2023-06-05 09:37:53 +03:00
Heikki Linnakangas	9787227c35	Shield HTTP request handlers from async cancellations. (#4314 ) We now spawn a new task for every HTTP request, and wait on the JoinHandle. If Hyper drops the Future, the spawned task will keep running. This protects the rest of the pageserver code from unexpected async cancellations. This creates a CancellationToken for each request and passes it to the handler function. If the HTTP request is dropped by the client, the CancellationToken is signaled. None of the handler functions make use for the CancellationToken currently, but they now they could. The CancellationToken arguments also work like documentation. When you're looking at a function signature and you see that it takes a CancellationToken as argument, it's a nice hint that the function might run for a long time, and won't be async cancelled. The default assumption in the pageserver is now that async functions are not cancellation-safe anyway, unless explictly marked as such, but this is a nice extra reminder. Spawning a task for each request is OK from a performance point of view because spawning is very cheap in Tokio, and none of our HTTP requests are very performance critical anyway. Fixes issue #3478	2023-06-02 08:28:13 -04:00
Alex Chi Z	66cdba990a	refactor: use PersistentLayerDesc for persistent layers (#4398 ) ## Problem Part of https://github.com/neondatabase/neon/issues/4373 ## Summary of changes This PR adds `PersistentLayerDesc`, which will be used in LayerMap mapping and probably layer cache. After this PR and after we change LayerMap to map to layer desc, we can safely drop RemoteLayerDesc. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com> Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>	2023-06-01 22:06:28 +03:00
Alex Chi Z	82484e8241	pgserver: add more metrics for better observability (#4323 ) ## Problem This PR includes doc changes to the current metrics as well as adding new metrics. With the new set of metrics, we can quantitatively analyze the read amp., write amp. and space amp. in the system, when used together with https://github.com/neondatabase/neonbench close https://github.com/neondatabase/neon/issues/4312 ref https://github.com/neondatabase/neon/issues/4347 compaction metrics TBD, a novel idea is to print L0 file number and number of layers in the system, and we can do this in the future when we start working on compaction. ## Summary of changes * Add `READ_NUM_FS_LAYERS` for computing read amp. * Add `MATERIALIZED_PAGE_CACHE_HIT_UPON_REQUEST`. * Add `GET_RECONSTRUCT_DATA_TIME`. GET_RECONSTRUCT_DATA_TIME + RECONSTRUCT_TIME + WAIT_LSN_TIME should be approximately total time of reads. * Add `5.0` and `10.0` to `STORAGE_IO_TIME_BUCKETS` given some fsync runs slow (i.e., > 1s) in some cases. * Some `WAL_REDO` metrics are only used when Postgres is involved in the redo process. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-06-01 21:46:04 +03:00
bojanserafimov	330083638f	Fix stale and misleading comment in LayerMap (#4297 )	2023-06-01 05:04:46 +03:00
Konstantin Knizhnik	952d6e43a2	Add pageserver parameter forced_image_creation_limit which can be used… (#4353 ) This parameter can be use to restrict number of image layers generated because of GC request (wanted image layers). Been set to zero it completely eliminates creation of such image layers. So it allows to avoid extra storage consumption after merging #3673 ## Problem PR #3673 forces generation of missed image layers. So i short term is cause cause increase (in worst case up to two times) size of storage. It was intended (by me) that GC period is comparable with PiTR interval. But looks like it is not the case now - GC is performed much more frequently. It may cause the problem with space exhaustion: GC forces new image creation while large PiTR still prevent GC from collecting old layers. ## Summary of changes Add new pageserver parameter` forced_image_creation_limit` which restrict number of created image layers which are requested by GC. ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2023-05-31 21:37:20 +03:00
bojanserafimov	b6447462dc	Fix layer map correctness bug (#4342 )	2023-05-31 12:23:00 -04:00
Joonas Koivunen	f4db85de40	Continued startup speedup (#4372 ) Startup continues to be slow, work towards to alleviate it. Summary of changes: - pretty the functional improvements from #4366 into `utils::completion::{Completion, Barrier}` - extend "initial load completion" usage up to tenant background tasks - previously only global background tasks - spawn_blocking the tenant load directory traversal - demote some logging - remove some unwraps - propagate some spans to `spawn_blocking` Runtime effects should be major speedup to loading, but after that, the `BACKGROUND_RUNTIME` will be blocked for a long time (minutes). Possible follow-ups: - complete initial tenant sizes before allowing background tasks to block the `BACKGROUND_RUNTIME`	2023-05-30 16:25:07 +03:00
Joonas Koivunen	db14355367	revert: static global init logical size limiter (#4368 ) added in #4366. revert for testing without it; it may have unintenteded side-effects, and it's very difficult to understand the results from the 10k load testing environments. earlier results: https://github.com/neondatabase/neon/pull/4366#issuecomment-1567491064	2023-05-30 10:40:37 +03:00
Joonas Koivunen	cb83495744	try: startup speedup (#4366 ) Startup can take a long time. We suspect it's the initial logical size calculations. Long term solution is to not block the tokio executors but do most of I/O in spawn_blocking. See: #4025, #4183 Short-term solution to above: - Delay global background tasks until initial tenant loads complete - Just limit how many init logical size calculations can we have at the same time to `cores / 2` This PR is for trying in staging.	2023-05-29 21:48:38 +03:00
Christian Schwarz	f4f300732a	refactor TenantState transitions (#4321 ) This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`). The motivation is to avoid locking `Tenant::timelines` in places that can't be `async`, because in #4333 we want to convert Tenant::timelines from `std::sync::Mutex` to `tokio::sync::Mutex`. But, the changes here are useful in general because they clean up & document tenant state transitions. That also paves the way for #4350, which is an alternative to #4333 that refactors the pageserver code so that we can keep the `Tenant::timelines` mutex sync. This patch consists of the following core insights and changes: * spawn_load and spawn_attach own the tenant state until they're done * once load()/attach() calls are done ... * if they failed, transition them to Broken directly (we know that there's no background activity because we didn't call activate yet) * if they succeed, call activate. We can make it infallible. How? Later. * set_broken() and set_stopping() are changed to wait for spawn_load() / spawn_attach() to finish. * This sounds scary because it might hinder detach or shutdown, but actually, concurrent attach+detach, or attach+shutdown, or load+shutdown, or attach+shutdown were just racy before this PR. So, with this change, they're not anymore. In the future, we can add a `CancellationToken` stored in Tenant to cancel `load` and `attach` faster, i.e., make `spawn_load` / `spawn_attach` transition them to Broken state sooner. See the doc comments on TenantState for the state transitions that are now possible. It might seem scary, but actually, this patch reduces the possible state transitions. We introduce a new state `TenantState::Activating` to avoid grabbing the `Tenant::timelines` lock inside the `send_modify` closure. These were the humble beginnings of this PR (see Motivation section), and I think it's still the right thing to have this `Activating` state, even if we decide against async `Tenant::timelines` mutex. The reason is that `send_modify` locks internally, and by moving locking of Tenant::timelines out of the closure, the internal locking of `send_modify` becomes a leaf of the lock graph, and so, we eliminate deadlock risk. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-29 17:52:41 +03:00
Heikki Linnakangas	2d6a022bb8	Don't allow two timeline_delete operations to run concurrently. (#4313 ) If the timeline is already being deleted, return an error. We used to notice the duplicate request and error out in persist_index_part_with_deleted_flag(), but it's better to detect it earlier. Add an explicit lock for the deletion. Note: This doesn't do anything about the async cancellation problem (github issue #3478): if the original HTTP request dropped, because the client disconnected, the timeline deletion stops half-way through the operation. That needs to be fixed, too, but that's a separate story. (This is a simpler replacement for PR #4194. I'm also working on the cancellation shielding, see PR #4314.)	2023-05-27 15:55:43 +03:00
Heikki Linnakangas	2cdf07f12c	Refactor RequestSpan into a function. Previously, you used it like this: \|r\| RequestSpan(my_handler).handle(r) But I don't see the point of the RequestSpan struct. It's just a wrapper around the handler function. With this commit, the call becomes: \|r\| request_span(r, my_handler) Which seems a little simpler. At first I thought that the RequestSpan struct would allow "chaining" other kinds of decorators like RequestSpan, so that you could do something like this: \|r\| CheckPermissions(RequestSpan(my_handler)).handle(r) But it doesn't work like that. If each of those structs wrap a handler function, it would actually look like this: \|r\| CheckPermissions(\|r\| RequestSpan(my_handler).handle(r))).handle(r) This commit doesn't make that kind of chaining any easier, but seems a little more straightforward anyway.	2023-05-27 11:47:22 +03:00
Alex Chi Z	4e359db4c7	pgserver: spawn_blocking in compaction (#4265 ) Compaction is usually a compute-heavy process and might affect other futures running on the thread of the compaction. Therefore, we add `block_in_place` as a temporary solution to avoid blocking other futures on the same thread as compaction in the runtime. As we are migrating towards a fully-async-style pageserver, we can revert this change when everything is async and when we move compaction to a separate runtime. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-05-26 17:15:47 -04:00
Christian Schwarz	9d3267e474	clippy continued	2023-05-26 19:33:11 +02:00
Christian Schwarz	6b25cb5030	clippy	2023-05-26 18:36:13 +02:00
Christian Schwarz	f91ad65fb3	Merge branch 'problame/async-timeline-get/dont-hold-timelines-lock-inside-tenant-state-send-modify' into problame/async-timeline-get/refactor-timeline-initialization-to-avoid-holding-tenants-timelines-lock	2023-05-26 18:24:23 +02:00
Christian Schwarz	9a4789ec73	demote warn line to info-level, as the log line in set_stopping() is also info!() This should fix the faile regress tests that barked on allowed_errors	2023-05-26 18:22:41 +02:00
Christian Schwarz	72159ee686	Merge remote-tracking branch 'origin/main' into problame/async-timeline-get/dont-hold-timelines-lock-inside-tenant-state-send-modify	2023-05-26 18:03:35 +02:00
Christian Schwarz	be74662d05	re-introduce the check for 0 layers, based on cause	2023-05-26 17:52:26 +02:00
Christian Schwarz	e7c4ef9f4f	don't hold TENANTS lock while waiting for set_stopping()	2023-05-26 17:46:09 +02:00
Christian Schwarz	13d3f4c29f	set_stopping(): report in result if not transitioning to Stopping	2023-05-26 17:46:09 +02:00
Christian Schwarz	ba3e3bdddf	clippy	2023-05-26 17:07:15 +02:00
Christian Schwarz	71f9bbef0d	fix the test timeline creation functions	2023-05-26 16:01:45 +02:00
Christian Schwarz	4680f8c60b	finish WIP: keep the real timeline from create_empty_timeline outside of timelines map until it has finished filling	2023-05-26 15:29:19 +02:00
Heikki Linnakangas	a560b28829	Make new tenant/timeline IDs mandatory in create APIs. (#4304 ) We used to generate the ID, if the caller didn't specify it. That's bad practice, however, because network is never fully reliable, so it's possible we create a new tenant but the caller doesn't know about it, and because it doesn't know the tenant ID, it has no way of retrying or checking if it succeeded. To discourage that, make it mandatory. The web control plane has not relied on the auto-generation for a long time.	2023-05-26 16:19:36 +03:00
Christian Schwarz	3c1fc2617c	WIP	2023-05-26 14:24:23 +02:00
Christian Schwarz	60cc197ce3	fix test_timeline_create_break_after_uninit_mark (the refactoring added .context())	2023-05-26 10:19:24 +02:00
Christian Schwarz	609a929968	instrument shutdown_all_tenants code path, include timeline_id in logs if failed to flush This can be extracted into an independent commit.	2023-05-26 10:12:33 +02:00
Christian Schwarz	b09beaa4fe	log while waiting for tenant to finish activation	2023-05-26 09:34:12 +02:00
Christian Schwarz	122e23071b	fix the tests (commenting out too-conservative "Timeline has no ancestor and no layer files" assert)	2023-05-26 09:23:26 +02:00
Christian Schwarz	696c6ed6ff	fix cfg(test) code to the extent that clippy passes	2023-05-26 08:49:42 +02:00
Christian Schwarz	0874e27023	refactor timeline initialization High-level ideas: - placeholder Timeline object in timelines map during a timeline creation - the timeline creations (branch, bootstrap, import_from_basebackup) prepare durable state (on-disk & remote)state, if necessary using _another_ _temporary_ Timeline object - once the timeline creations have prepared the durable state, they use the normal load routine (load_local_timeline) that is also used during pageserver startup - Once the loading is done, we replace the placheolder timeline object with the real one	2023-05-25 23:01:40 +02:00
Christian Schwarz	6fe39ecbf7	add ability to have fake metrics (needed in next patch so we can have to Timeline objects with the same id in memory)	2023-05-25 23:01:40 +02:00
Christian Schwarz	a0c2a85505	timeline_init_and_sync: don't hold Tenant::timelines while load_layer_map This patch inlines `initialize_with_lock` and then reorganizes the code such that we can `load_layer_map` without holding the `Tenant::timelines` lock. As a nice aside, we can get rid of the dummy() uninit mark, which has always been a terrible hack.	2023-05-25 23:01:40 +02:00
Christian Schwarz	dd0f5c4ef3	Merge remote-tracking branch 'origin/main' into problame/async-timeline-get/dont-hold-timelines-lock-inside-tenant-state-send-modify	2023-05-25 22:20:52 +02:00
Christian Schwarz	057cceb559	refactor: make timeline activation infallible (#4319 ) Timeline::activate() was only fallible because `launch_wal_receiver` was. `launch_wal_receiver` was fallible only because of some preliminary checks in `WalReceiver::start`. Turns out these checks can be shifted to the type system by delaying creatinon of the `WalReceiver` struct to the point where we activate the timeline. The changes in this PR were enabled by my previous refactoring that funneled the broker_client from pageserver startup to the activate() call sites. Patch series: - #4316 - #4317 - #4318 - #4319	2023-05-25 20:26:43 +02:00
Alexander Bayandin	08e7d2407b	Storage: use Postgres 15 as default (#2809 )	2023-05-25 15:55:46 +01:00
Christian Schwarz	e5617021a7	refactor: eliminate global storage_broker client state (#4318 ) (This is prep work to make `Timeline::activate` infallible.) This patch removes the global storage_broker client instance from the pageserver codebase. Instead, pageserver startup instantiates it and passes it down to the `Timeline::activate` function, which in turn passes it to the WalReceiver, which is the entity that actually uses it. Patch series: - #4316 - #4317 - #4318 - #4319	2023-05-25 16:47:42 +03:00
Christian Schwarz	de780d2e0f	make TenantState::{Loading,Attaching,Activating} owned by spawn_load / spawn_attach See the Mermaid diagram in the doc comment for the now-possible state transitions. The two core insights / changes are: - spawn_load and spawn_attach own the tenant state until they're done - once load()/attach() calls are done - if they failed, transition them to Broken directly (we know that there's no background activity because we didn't call activate yet) - if they succeed, call activate. We can make it infallible. How? Later. - set_broken() and set_stopping() are changed to wait for spawn_load() / spawn_attach() to finish. This sounds scary because it might hinder detach or shutdown, but actually, concurrent attach+detach, or attach+shutdown, or load+shutdown, or attach+shutdown were just racy. With this change, they're not anymore. We can add a CancellationToken stored in Tenant for load/attach and cancel it from set_stopping() or set_broken() if necessary in the future. So, why can activate() be infallible now: because we declare that spawn_load and spawn_attach own the tenant state until they're done. And we enforce that ownership using the wait_for at the start of set_stopping and set_broken.	2023-05-25 15:02:43 +02:00
Christian Schwarz	f18d9f555b	Revert "Revert "use tokio::sync:⌚:Receiver::wait_for"" This reverts commit `eaf270c648`.	2023-05-25 14:58:49 +02:00

1 2 3 4 5 ...

1394 Commits