rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-28 18:40:38 +00:00

Author	SHA1	Message	Date
Joonas Koivunen	8e6b27bf7c	fix: avoid busy loop on replacement failure (#3613) Add an AtomicBool per RemoteLayer and use it, together with the closed semaphore, to mark that the RemoteLayer is unusable until restart or ignore+load. https://github.com/neondatabase/neon/issues/3533#issuecomment-1431481554	2023-02-17 14:15:29 +02:00
Anastasia Lubennikova	0d3aefb274	Only use active timelines in synthetic_size calculation	2023-02-16 17:58:53 +02:00
Anastasia Lubennikova	d9ba3c5f5e	Revert "Add debug messages around timeline.get_current_logical_size" This reverts commit `a5ce2b5330`.	2023-02-16 17:46:15 +02:00
Joonas Koivunen	0cf7fd0fb8	Compaction with on-demand download (#3598) Repeatedly (twice) try to download the compaction targeted layers before actual compaction. Adds tests for both L0 compaction downloading layers and image creation downloading layers. Image creation support existed already. Fixes #3591 Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-02-16 15:36:13 +02:00
Heikki Linnakangas	ddbdcdddd7	Tenant size calculation: refactor, rewrite, and add SVG (#2817) Refactor the tenant_size_model code. Segment now contains just the minimum amount of information needed to calculate the size. Other information that is useful for building up the segment tree, and for display purposes, is now kept elsewhere. The code in 'main.rs' has a new ScenarioBuilder struct for that. Calculating which Segments are "needed" is now the responsibility of the caller of tenant_size_mode, not part of the calculation itself. So it's up to the caller to make all the decisions with retention periods for each branch. The output of the sizing calculation is now a Vec of SizeResults, rather than a tree. It uses a tree representation internally, when doing the calculation, but it's not exposed to the caller anymore. Refactor the way the recursive calculation is performed. Rewrite the code in size.rs that builds the Segment model. Get rid of the intermediate representation with Update structs. Build the Segments directly, with some local HashMaps and Vecs to track branch points to help with that. retention_period is now an input to gather_inputs(), rather than an output. Update pageserver http API: rename /size endpoint to /synthetic_size with following parameters: - /synthetic_size?inputs_only to get debug info; - /synthetic_size?retention_period=0 to override cutoff that is used to calculate the size; pass header -H "Accept: text/html" to get HTML output, otherwise JSON is returned Update python tests and openapi spec. --------- Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-02-16 10:53:46 +02:00
Anastasia Lubennikova	a5ce2b5330	Add debug messages around timeline.get_current_logical_size	2023-02-15 16:02:02 +02:00
Joonas Koivunen	e6618f1cc0	Update current logical size gauge (#3592) Alternative to #3586. Introduces usage of current_logical_size.current_size as a boundary after which we start to update the metric gauge on ingested wal. Previously any incremented value (ingested wal) would have updated the gauge, but this would have left the metric at zero for timelines which never receive any wal even if size had been calculated. Now the gauge is updated right away as the calculation completes, not requiring any wal to be received.	2023-02-14 13:17:34 +02:00
Christian Schwarz	a4256b3250	allow on-demand downloads in walreceiver connection handler Without this patch, basebackup fails if we evict all layers before that. This slipped in as part of commit `01b4b0c2f3` Author: Christian Schwarz <christian@neon.tech> Date: Fri Jan 13 17:02:22 2023 +0100 Introduce RequestContext	2023-02-09 13:39:04 +01:00
Christian Schwarz	175a577ad4	automatic layer eviction This patch adds a per-timeline periodic task that executes an eviction policy. The eviction policy is configurable per tenant. Two policies exist: - NoEviction (the default one) - LayerAccessThreshold The LayerAccessThreshold policy examines the last access timestamp per layer in the layer map and evicts the layer if that last access is further in the past than a configurable threshold value. This policy kind is evaluated periodically at a configurable period. It logs a summary statistic at `info!()` or `warn!()` level, depending on whether any evictions failed. This feature has no explicit killswitch since it's off by default.	2023-02-09 13:33:55 +01:00
Joonas Koivunen	1fdf01e3bc	fix: readable Debug for Layers (#3575) #3536 added the custom Debug implementations, but its use of the derived Debug on Key led to overly verbose output. Instead of making `Key`'s `Debug` unconditionally or conditionally do the `Display` variant (for table space'd keys), opted to build a newtype to provide `Debug` for `Range<Key>` via `Display` which seemed to work unconditionally. Also orders Key to have: 1. comment, 2. derive, 3. `struct Key`.	2023-02-09 13:55:37 +02:00
Christian Schwarz	446a39e969	make LayerAccesStatFullDetails Copy Method to_api_model renamed to as_api_model because of Clippy complaint: https://rust-lang.github.io/rust-clippy/master/index.html#wrong_self_convention	2023-02-09 12:35:45 +01:00
Joonas Koivunen	f07d6433b6	fix: one leftover Arc::ptr_eq (#3573) @knizhnik noticed that one instance of `Arc::<dyn PersistentLayer>::ptr_eq` was missed in #3558. Now all `ptr_eq` which remain are in comments.	2023-02-09 13:02:07 +02:00
Christian Schwarz	7ed93fff06	refactor: allow for eviction of layers in a batch The auto-eviction PR (#3552) operates in two phases: 1. find candidate layers 2. evict them. For (2), a batch API like the one added in this commit is useful. Note that this PR requires #3558 to be merged first. Otherwise, the tests won't pass.	2023-02-08 14:40:47 +01:00
Joonas Koivunen	a6dffb6ef9	fix: stop using Arc::ptr_eq with dyn Trait (#3558) This changes the way we compare `Arc<dyn PersistentLayer>` in Timeline's `LayerMap` not to use `Arc::ptr_eq` which has been witnessed in development of #3557 to yield wrong results. It gives wrong results because it compares fat pointers, which are `(object, vtable)` tuples for `dyn Trait` and there are no guarantees that the `vtable`s are unique. In this case there were multiple vtables for `RemoteLayer`, which is why the comparison failed in #3557. This is a known issue in Rust, clippy warns against it and Rust std might be moving to the solution which has been reproduced on this PR: compare only object pointers by "casting out" the vtable pointer.	2023-02-08 12:25:25 +00:00
Joonas Koivunen	fcb905f519	Use LayerMap::replace in eviction (#3544) Follow-up to #3536, to actually use the new `Debug` in replacing the layers, and use replacement with manual eviction endpoint. Turns out the two paths share a lot of handling of `Replacement` but didn't unify the two (need 3). There are also upcoming refactorings from other PRs to this.	2023-02-07 11:08:55 +02:00
Christian Schwarz	58fa4f0eb7	maintain access stats for historic layers This patch adds basic access statistics for historic layers and exposes them in the management API's `LayerMapInfo`. We record the accesses in the `{Delta,Image}Layer::load()` function because it's the common path of * page_service (`Timeline::get_reconstruct_data()`) * Compaction (`PersistentLayer::iter()` and `PersistentLayer::key_iter()`) The stats survive residence status changes, and record these as well. When scraping the layer map endpoint to record its evolution over time, one must account for stat resets because they are in-memory only and will reset on pageserver restart. Use the launch timestamp header added by (#3527) to identify pageserver restarts. This is PR https://github.com/neondatabase/neon/pull/3496	2023-02-06 17:01:38 +01:00
Joonas Koivunen	678fe0684f	std::fmt::Debug for Layer implementations (#3536) Follow-up to #3513. This removes the old blanket `std::fmt::Debug` impl on `dyn Layer` which did not seem to be used from anywhere (no compilation errors after removing). Adds `std::fmt::Debug` requirement and implementations for `trait Layer` implementors: - LayerDescriptor (derived) - RemoteLayer (manual) - DeltaLayer (manual) - ImageLayer (manual) Manual implementations are used to skip PageserverConf, tenant and timeline ids, large collections. Adds and adjusts some doc comments to be more rustdoc-like.	2023-02-06 14:21:51 +02:00
Kirill Bulatov	ec3a3aed37	Dump current tenant config (#3534) The PR adds an endpoint to show tenant's current config: `GET /v1/tenant/:tenant_id/config` Tenant's config consists of two parts: tenant overrides (could be changed via other management API requests) and the default part, substituting all missing overrides (constant, hardcoded in pageserver). The API returns the custom overrides and the final tenant config, after applying all the defaults. Along the way, it had to fix two things in the config: * allow to shorten the json version and omit all `null`'s (same as toml serializer behaves by default), and to understand such shortened format when deserialized. A unit test is added * fix a bug, when `PUT /v1/tenant/config` endpoint rewrote the local file with what had come in the request, but updated (not rewriting the old values) the in-memory state instead. That got uncovered during adjusting the e2e test and fixed to do the replacement everywhere, otherwise there's no way to revert existing overrides. Fixes #3471 (commit `dc688affe8`) * fixes https://github.com/neondatabase/neon/issues/3472 by reordering the config saving operations	2023-02-04 01:32:29 +02:00
Joonas Koivunen	f2d89761c2	feat: LayerMap::replace (#3513) Cc: #3486 Adds a method to replace a particular layer in the LayerMap for the purposes of remote layer download and layer eviction. In those use cases the read lock on the layer map needs to be released after the initial search, but other operations could modify the layer map before the replacing thread gets to run. Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>	2023-02-03 15:33:46 +02:00
Kirill Bulatov	f6a10f4693	Use regular layer names instead of special test ones (#3524) We do not need a special enum variant for testing the file names, nor its special handling across the code. Current tests are able to create regular layers with normal layer names, as the PR shows.	2023-02-02 14:52:17 +02:00
Kirill Bulatov	2759f1a22e	Evict layers on demand (#3486) Closes https://github.com/neondatabase/neon/issues/3439 Adds a set of commands to manipulate the layer map: * dump the layer map contents * evict the layer from the layer map (remove the local file, put the remote layer instead in the layer map) * download the layer (operation, reversing the eviction) The commands will change later, when the statistics are added on top, so the swagger schema is not adjusted. The commands might have issues with a large number of layers: no pagination is done for the dump command, eviction and download commands look for the layer to evict/download by iterating all layers sequentially and comparing the layer names. For now, that seems to be tolerable ("big" number of layers is ~2_000) and further experiments are needed. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-02-02 12:14:44 +02:00
Christian Schwarz	f1aece1ba0	add RequestContext plumbing for layer access stats In preparation for #3496 plumb through RequestContext to the data access methods of `PersistentLayer`. This is PR https://github.com/neondatabase/neon/pull/3504	2023-02-01 15:29:01 +02:00
Konstantin Knizhnik	895f929bce	Add layer_map_analyzer tool (#3451) See #3348	2023-01-31 15:50:52 +02:00
Lassi Pölönen	20b38acff0	Replace per timeline `pageserver_storage_operations_seconds` with a global one (#3409) Related to: https://github.com/neondatabase/neon/issues/2848 `pageserver_storage_operations_seconds` is the most expensive metric we have, as there are a lot of tenants/timelines and the histogram had 42 buckets. These are quite sparse too, so instead of having a histogram per timeline, create a new histogram `pageserver_storage_operations_seconds_global` without tenant and timeline dimensions and replace `pageserver_storage_operations_seconds` with sum and counter. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-01-30 17:10:29 +02:00
Kirill Bulatov	c61bc25ef9	Clean up NeedsDownload error (#3464)	2023-01-30 16:08:23 +02:00
Shany Pozin	67d418e91c	Set the last_record_gauge to the value which was persisted in metadata (#3460) Whenever a tenant is detached or the pageserver is restarted, the pageserver_last_record_lsn metric is dropped. This fix restores the value from the metadata whenever the tenant is attached again. Issue: [3571](https://github.com/neondatabase/cloud/issues/3571)	2023-01-29 12:40:50 +02:00
Joonas Koivunen	0ec84e2f1f	Allow creating config for attached tenant (#3446) Currently `attach` doesn't write a tenant config, because we don't back it up in the first place. The current implementation of `Tenant::persist_tenant_config` does not allow changing tenant's configuration through the http api which will fail because the file wasn't created on attach and `OpenOptions::truncate(true).write(true).create_new(false)` is used. I think this patch allows for least controversial middle ground which enables changing tenant configuration even for attached tenants (not just created tenants).	2023-01-27 15:34:59 +02:00
Christian Schwarz	99399c112a	move walreceiver module under timeline Walreceiver is a per-timeline abstraction. Move it there to reflect the hierarchy of abstractions and task_mgr tasks. The code that sets up the global storage_broker client is not timeline-scoped. So, break it out into a separate module. The motivation for this change is to prepare the code base for replacing the task_mgr global task registry with a more ownership-oriented approach to manage task lifetimes. I removed TaskStateUpdate::Init because, after doing the changes, rustc warned that it was never constructed. A quick search through the commit history shows that this has always been true since commit `fb68d01449` Author: Dmitry Rodionov <dmitry@neon.tech> Date: Mon Sep 26 23:57:02 2022 +0300 Preserve task result in TaskHandle by keeping join handle around (#2521) So, the warning is not an indication of some accidental code removal. This is PR: https://github.com/neondatabase/neon/pull/3456	2023-01-27 12:23:17 +01:00
Christian Schwarz	dc64962ffc	tenant::mgr: explicit tracking of initializing & shutting-down states This patch wraps the tenants hashmap into an enum that represents the tenant manager's three major states: - Initializing - Open for business - Shutting down. See the enum doc comments for details. In response, all the users of `TENANTS` are now forced to distinguish those states. The only major change is in `run_if_no_tenant_in_memory`, which, before this patch, was used by the /attach and /load endpoints. This patch rewrites that method under the name `tenant_map_insert`, replacing the anyhow::Result with a std Result and a dedicated error type. Introducing this error type allows using `tenant_map_insert` in `tenant_create`, thereby unifying all code paths that create tenant objects to use `tenant_map_insert`. This is beneficial because we can now systematically prevent tenants from being created, attached, or `/load`ed during pageserver shutdown. The management API remains available, but the endpoints that create new tenants will fail with an error. More work would need to be done to properly distinguish these errors through HTTP status codes such as 503.	2023-01-26 11:24:48 +01:00
bojanserafimov	0a09589403	Increase gc period to 1h (#3432)	2023-01-25 15:18:41 -05:00
Christian Schwarz	01b4b0c2f3	Introduce RequestContext Motivation ========== Layer Eviction Needs Context ---------------------------- Before we start implementing layer eviction, we need to collect some access statistics per layer file or maybe even page. Part of these statistics should be the initiator of a page read request to answer the question of whether it was page_service vs. one of the background loops, and if the latter, which of them? Further, it would be nice to learn more about what activity in the pageserver initiated an on-demand download of a layer file. We will use this information to test out layer eviction policies. Read more about the current plan for layer eviction here: https://github.com/neondatabase/neon/issues/2476#issuecomment-1370822104 task_mgr problems + cancellation + tenant/timeline lifecycle ------------------------------------------------------------ Apart from layer eviction, we have long-standing problems with task_mgr, task cancellation, and various races around tenant / timeline lifecycle transitions. One approach to solve these is to abandon task_mgr in favor of a mechanism similar to Golang's context.Context, albeit extended to support waiting for completion, and specialized to the needs in the pageserver. Heikki solves all of the above at once in PR https://github.com/neondatabase/neon/pull/3228 , which is not yet merged at the time of writing. What Is This Patch About ======================== This patch addresses the immediate needs of layer eviction by introducing a `RequestContext` structure that is plumbed through the pageserver - all the way from the various entrypoints (page_service, management API, tenant background loops) down to Timeline::{get,get_reconstruct_data}. The struct carries a description of the kind of activity that initiated the call. We re-use task_mgr::TaskKind for this. Also, it carries the desired on-demand download behavior of the entrypoint. Timeline::get_reconstruct_data can then log the TaskKind that initiated the on-demand download. I developed this patch by git-checking-out Heikki's big RequestContext PR https://github.com/neondatabase/neon/pull/3228 , then deleting all the functionality that we do not need to address the needs for layer eviction. After that, I added a few things on top: 1. The concept of attached_child and detached_child in preparation for cancellation signalling through RequestContext, which will be added in a future patch. 2. A kill switch to turn DownloadBehavior::Error into a warning. 3. Renamed WalReceiverConnection to WalReceiverConnectionPoller and added an additional TaskKind WalReceiverConnectionHandler. These were necessary to create proper detached_child-type RequestContexts for the various tasks that walreceiver starts. How To Review This Patch ======================== Start your review with the module-level comment in context.rs. It explains the idea of RequestContext, what parts of it are implemented in this patch, and the future plans for RequestContext. Then review the various `task_mgr::spawn` call sites. At each of them, we should be creating a new detached_child RequestContext. Then review the (few) RequestContext::attached_child call sites and ensure that the spawned tasks do not outlive the task that spawns them. If they do, these call sites should use detached_child() instead. Then review the todo_child() call sites and judge whether it's worth the trouble of plumbing through a parent context from the caller(s). Lastly, go through the bulk of mechanical changes that simply forwards the &ctx.	2023-01-25 14:53:30 +01:00
Christian Schwarz	0b673c12d7	timeline: don't transition Active=>Active during pageserver startup Before this patch, when `initialize_with_lock` was called via `timeline_init_and_sync`, we would transition the timeline like so: load_local_timeline/load_remote_timeline: timeline_init_and_sync Timeline::new () => Loading initialize_with_lock: set_state(Active) Loading => Active timeline.activate() Active => Active	2023-01-24 15:56:02 +01:00
Christian Schwarz	7a333cfb12	be noisy about unexpected Timeline state transitions	2023-01-24 15:56:02 +01:00
Christian Schwarz	55c184fcd7	fix some anyhow::Context::context calls that should use with_context(format!(...)) Noticed this while combing through some production logs.	2023-01-24 12:22:33 +01:00
Christian Schwarz	6b6570b580	remove TimelineState::Suspended, introduce TimelineState::Loading The TimelineState::Suspended was dubious to begin with. I suppose that the intention was that timelines could transition back and forth between Active and Suspended states. But practically, the code before this patch never did that. The transitions were: () ==Timeline::new==> Suspended ====> {Active,Broken,Stopping} One exception: Tenant::set_stopping() could transition timelines like so: !Broken ==Tenant::set_stopping()==> Suspended But Tenant itself cannot transition from stopping state to any other state. Thus, this patch removes TimelineState::Suspended and introduces a new state Loading. The aforementioned transitions change as follows: - () ==Timeline::new==> Suspended ====> {Active,Broken,Stopping} + () ==Timeline::new==> Loading ==*==> {Active,Broken,Stopping} - !Broken ==Tenant::set_stopping()==> Suspended + !Broken ==Tenant::set_stopping()==> Stopping Walreceiver's connection manager loop watches TimelineState to decide whether it should retry connecting, or exit. This patch changes the loop to exit when it observes the transition into Stopping state. Walreceiver isn't supposed to be started until the timeline transitions into Active state. So, this patch also adds some warn!() messages in case this happens anyway.	2023-01-23 17:22:49 +01:00
Joonas Koivunen	7704caa3ac	More tenant size fixes (#3410) Small changes, but hopefully this will help with the panic detected in staging, for which we cannot get the debugging information right now (end-of-branch before branch-point).	2023-01-23 17:12:51 +02:00
bojanserafimov	a3d7ad2d52	Implement layer map using immutable BST (#2998)	2023-01-20 16:10:12 -05:00
Anastasia Lubennikova	36f048d6b0	Fix tenant size orphans (#3377) Before only the timelines which have passed the `gc_horizon` were processed which failed with orphans at the tree_sort phase. Example input in added `test_branched_empty_timeline_size` test case. The PR changes iteration to happen through all timelines, and in addition to that, any learned branch points will be calculated as they would have been in the original implementation if the ancestor branch had been over the `gc_horizon`. This also changes how tenants where all timelines are below `gc_horizon` are handled. Previously tenant_size 0 was returned, but now they will have approximately `initdb_lsn` worth of tenant_size. The PR also adds several new tenant size tests that describe various corner cases of branching structure and `gc_horizon` setting. They are currently disabled to not consume time during CI. Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>	2023-01-20 20:21:36 +02:00
Christian Schwarz	8ba1699937	Revert "Use actual temporary dir for pageserver unit tests" This reverts commit `826e89b9ce`. The problem with that commit was that it deletes the TempDir while there are still EphemeralFile instances open. At first I thought this could be fixed by simply adding Handle::current().block_on(task_mgr::shutdown(None, Some(tenant_id), None)) to TenantHarness::drop, but it turned out to be insufficient. So, reverting the commit until we find a proper solution. refs https://github.com/neondatabase/neon/issues/3385	2023-01-19 20:16:56 +01:00
bojanserafimov	a9bd05760f	Improve layer map docstrings (#3382)	2023-01-19 10:29:15 -05:00
Kirill Bulatov	826e89b9ce	Use actual temporary dir for pageserver unit tests	2023-01-18 17:43:27 +02:00
Kirill Bulatov	1ebd145c29	Actualize the comment (#3362) Follow-up of https://github.com/neondatabase/neon/pull/3326#issuecomment-1384265759	2023-01-17 13:30:42 +02:00
Christian Schwarz	58c8c1076c	download_all_remote_layers API: require client to specify max_concurrent_downloads Before this patch, we would start all layer downloads simultaneously. There is at most one download_all_remote_layers task per timeline. Hence, the specified limit is per timeline. There is still no global concurrency limit for layer downloads. We'll have to revisit that at some point and also prioritize on-demand initiated downloads over download_all_remote_layers downloads. But that's for another day.	2023-01-16 19:29:06 +01:00
Anastasia Lubennikova	c6d383e239	code cleanup	2023-01-13 11:51:28 +02:00
Anastasia Lubennikova	5e3e0fbf6f	remove unneeded Cargo.lock changes	2023-01-13 11:51:28 +02:00
Anastasia Lubennikova	26f39c03f2	review code cleanup: - handle errors in calculate_synthetic_size_worker. Don't exit the bgworker if one tenant failed. - add cached_synthetic_tenant_size to cache values calculated by the bgworker - code cleanup: remove unneeded info! messages, clean comments - handle collect_metrics_task() error. Don't exit collect_metrics worker if one task failed. - add unit test to cover case when we have multiple branches at the same lsn	2023-01-13 11:51:28 +02:00
Anastasia Lubennikova	148e020fb9	Fix logical size calculation: sort updates in topological order so that the parent timeline always precedes its children. fixes #3179	2023-01-13 11:51:28 +02:00
Heikki Linnakangas	57a6e931ea	Comment, formatting and other cosmetic cleanup.	2023-01-12 19:05:13 +02:00
Heikki Linnakangas	c1731bc4f0	Push on-demand download into Timeline::get() function itself. This makes Timeline::get() async, and all functions that call it directly or indirectly with it. The with_ondemand_download() mechanism is gone, Timeline::get() now always downloads files, whether you want it or not. That is what all the current callers want, so even though this loses the capability to get a page only if it's already in the pageserver, without downloading, we were not using that capability. There were some places that used 'no_ondemand_download' in the WAL ingestion code that would error out if a layer file was not found locally, but those were dubious. We do actually want to on-demand download in all of those places. Per discussion at https://github.com/neondatabase/neon/pull/3233#issuecomment-1368032358	2023-01-12 11:53:10 +02:00
Christian Schwarz	8eebd5f039	run on-demand compaction in a task_mgr task With this patch, tenant_detach and timeline_delete's task_mgr::shutdown_tasks() call will wait for on-demand compaction to finish. Before this patch, the on-demand compaction would grab the layer_removal_cs after tenant_detach / timeline_delete had removed the timeline directory. This resulted in error No such file or directory (os error 2) NB: I already implemented this pattern for ondemand GC a while back. fixes https://github.com/neondatabase/neon/issues/3136	2023-01-09 19:08:22 +01:00
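
The pointer-comparison problem described in commit a6dffb6ef9 (#3558) can be illustrated with a minimal sketch. The `Layer` trait, the `RemoteLayer` struct, and the `same_object` helper below are hypothetical stand-ins and not the pageserver's actual types or code; the sketch only assumes what the commit message states, namely that the fix compares object pointers after "casting out" the vtable half of the fat pointer.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the pageserver's layer trait.
trait Layer {
    fn name(&self) -> String;
}

// Hypothetical stand-in for a concrete layer type.
struct RemoteLayer {
    name: String,
}

impl Layer for RemoteLayer {
    fn name(&self) -> String {
        self.name.clone()
    }
}

/// Returns true when both `Arc`s point at the same allocation.
///
/// `Arc::as_ptr` yields a fat `*const dyn Layer` (data pointer plus vtable
/// pointer). Casting it to `*const ()` keeps only the data pointer, so the
/// result does not depend on which vtable instance each `Arc` carries.
fn same_object(a: &Arc<dyn Layer>, b: &Arc<dyn Layer>) -> bool {
    let a = Arc::as_ptr(a) as *const ();
    let b = Arc::as_ptr(b) as *const ();
    a == b
}
```

With this helper, a clone of an `Arc<dyn Layer>` compares equal to its original, while two separately allocated layers with identical contents do not, because the comparison is by identity and not by value.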

131 Commits