- add parse_query_param()
- use Cow<> where possible
- move param parsing code to utils::http::request
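A minimal sketch of what such a helper could look like (the exact signature, error type, and where `Cow` ends up in the real `utils::http::request` module may differ):

```rust
use std::borrow::Cow;
use std::str::FromStr;
use hyper::Request;

/// Hypothetical helper: find a query parameter in the request URI and parse it into T.
/// Returns Ok(None) if the parameter is absent.
fn parse_query_param<T: FromStr, B>(request: &Request<B>, name: &str) -> Result<Option<T>, String> {
    let Some(query) = request.uri().query() else {
        return Ok(None);
    };
    for pair in query.split('&') {
        let mut it = pair.splitn(2, '=');
        let (key, value) = (it.next().unwrap_or(""), it.next().unwrap_or(""));
        if key == name {
            // A Cow lets a percent-decoding step borrow the input when no
            // decoding is needed and allocate only when it is.
            let value: Cow<'_, str> = Cow::Borrowed(value);
            return value
                .parse::<T>()
                .map(Some)
                .map_err(|_| format!("failed to parse query param '{name}'"));
        }
    }
    Ok(None)
}
```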
This was originally PR https://github.com/neondatabase/neon/pull/3502
which targeted a different branch.
closes #3510
Related to: https://github.com/neondatabase/neon/issues/2848
`pageserver_storage_operations_seconds` is the most expensive metric we
have, as there are a lot of tenants/timelines and the histogram had 42
buckets. These are quite sparse too, so instead of having a histogram
per timeline, create a new histogram
`pageserver_storage_operations_seconds_global` without tenant and
timeline dimensions, and replace `pageserver_storage_operations_seconds`
with a sum and a counter.
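A rough sketch of the shape of this change, using the `prometheus` crate directly (the `_sum`/`_count` metric names and label sets here are assumptions):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_counter_vec, register_histogram_vec, CounterVec, HistogramVec};

// One global histogram without tenant/timeline labels, so the 42-bucket series
// is not multiplied by the number of timelines.
static STORAGE_TIME_GLOBAL: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "pageserver_storage_operations_seconds_global",
        "Time spent on storage operations",
        &["operation"]
    )
    .expect("failed to register metric")
});

// Per-timeline observability is reduced to what the histogram's _sum series provided;
// a matching per-timeline count counter would be defined the same way.
static STORAGE_TIME_SUM_PER_TIMELINE: Lazy<CounterVec> = Lazy::new(|| {
    register_counter_vec!(
        "pageserver_storage_operations_seconds_sum",
        "Sum of time spent on storage operations",
        &["operation", "tenant_id", "timeline_id"]
    )
    .expect("failed to register metric")
});
```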
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Describe your changes
Added a metric that allows monitoring tenant states.
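A minimal sketch of what such a state metric could look like (the metric name, label, and hook below are assumptions, not necessarily the shipped code):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_gauge_vec, IntGaugeVec};

// One gauge series per tenant state, adjusted on every state transition.
static TENANT_STATE_METRIC: Lazy<IntGaugeVec> = Lazy::new(|| {
    register_int_gauge_vec!(
        "pageserver_tenant_states_count",
        "Number of tenants per state",
        &["state"]
    )
    .expect("failed to register metric")
});

// Hypothetical hook called from the tenant state machine.
fn on_tenant_state_change(old_state: &str, new_state: &str) {
    TENANT_STATE_METRIC.with_label_values(&[old_state]).dec();
    TENANT_STATE_METRIC.with_label_values(&[new_state]).inc();
}
```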
## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3161
## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [X] I have added an e2e test for it.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Describe your changes
Whenever a tenant is detached or the pageserver is restarted, the
`pageserver_last_record_lsn` metric is dropped.
This fix resurrects the value from the metadata whenever the tenant is
attached again.
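A rough sketch of the fix's shape (the gauge registration and helper names are illustrative; the real code reads the LSN from the timeline metadata on attach):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_gauge_vec, IntGaugeVec};

static LAST_RECORD_LSN: Lazy<IntGaugeVec> = Lazy::new(|| {
    register_int_gauge_vec!(
        "pageserver_last_record_lsn",
        "Last record LSN per timeline",
        &["tenant_id", "timeline_id"]
    )
    .expect("failed to register metric")
});

// Hypothetical: called while (re-)attaching a timeline, before WAL streaming resumes,
// so the gauge reflects the on-disk metadata instead of reading 0.
fn restore_last_record_lsn_metric(tenant_id: &str, timeline_id: &str, lsn_from_metadata: u64) {
    LAST_RECORD_LSN
        .with_label_values(&[tenant_id, timeline_id])
        .set(lsn_from_metadata as i64);
}
```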
## Issue ticket number and link
[3571](https://github.com/neondatabase/cloud/issues/3571)
## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
Currently `attach` doesn't write a tenant config file, because we don't back
it up in the first place. As a result, the current implementation of
`Tenant::persist_tenant_config` does not allow changing a tenant's
configuration through the HTTP API: the request fails because the file
wasn't created on attach and
`OpenOptions::truncate(true).write(true).create_new(false)` is used.
I think this patch is the least controversial middle ground, as it
*enables* changing tenant configuration even for attached tenants (not
just created tenants).
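For illustration, one way the write path can tolerate a missing config file (a simplified sketch, not necessarily what the patch does):

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

fn persist_tenant_config(path: &Path, serialized: &[u8]) -> std::io::Result<()> {
    // Before: without create(true), the open fails with NotFound when the file
    // is missing, so the first config change after `attach` cannot succeed.
    //
    // After: create(true) creates the file if it is missing and truncate(true)
    // overwrites it otherwise, so attached tenants can be reconfigured too.
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    file.write_all(serialized)?;
    file.sync_all()
}
```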
Walreceiver is a per-timeline abstraction. Move it there to reflect
the hierarchy of abstractions and task_mgr tasks.
The code that sets up the global storage_broker client
is not timeline-scoped. So, break it out into a separate module.
The motivation for this change is to prepare the code base for replacing
the task_mgr global task registry with a more ownership-oriented
approach to manage task lifetimes.
I removed TaskStateUpdate::Init because, after doing the changes,
rustc warned that it was never constructed.
A quick search through the commit history shows that this
has always been true since
commit fb68d01449
Author: Dmitry Rodionov <dmitry@neon.tech>
Date: Mon Sep 26 23:57:02 2022 +0300
Preserve task result in TaskHandle by keeping join handle around (#2521)
So, the warning is not an indication of some accidental code removal.
This is PR: https://github.com/neondatabase/neon/pull/3456
Change the signature so that it takes an Arc<Timeline> reference to the
source timeline, instead of just the ID. All the callers have an Arc
reference at hand, so this is more convenient for everyone.
Reorder the code a bit and improve the comments, to make it more clear
what it does and why.
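Schematically, the signature change looks like this (the function name and parameter list are a guess for illustration; placeholder types stand in for the pageserver's own):

```rust
use std::sync::Arc;

// Placeholders standing in for the pageserver's Timeline / TimelineId / Lsn types.
pub struct Timeline;
pub type TimelineId = u128;
pub type Lsn = u64;

// Before (roughly): only the ID was passed, and the callee had to look the
// source timeline up again, adding an extra failure path.
pub fn branch_timeline_by_id(_src: TimelineId, _dst: TimelineId, _start_lsn: Lsn) -> anyhow::Result<Arc<Timeline>> {
    unimplemented!("look up the source timeline by ID, then branch from it")
}

// After (roughly): callers already hold an Arc<Timeline>, so they pass it directly.
pub fn branch_timeline(_src: &Arc<Timeline>, _dst: TimelineId, _start_lsn: Lsn) -> anyhow::Result<Arc<Timeline>> {
    unimplemented!("branch directly from the given source timeline")
}
```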
This patch wraps the tenants hashmap in an enum that represents the
tenant manager's three major states:
- Initializing
- Open for business
- Shutting down.
See the enum doc comments for details.
In response, all the users of `TENANTS` are now forced to distinguish
those states.
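A sketch of the shape of such an enum (field types simplified, with placeholders for the pageserver's own `Tenant`/`TenantId` types):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Placeholders standing in for the pageserver's own types.
pub struct Tenant;
pub type TenantId = u128;

/// Sketch: wrapping the map forces every user of TENANTS to handle the lifecycle state.
pub enum TenantsMap {
    /// Pageserver startup has not populated the map yet.
    Initializing,
    /// Normal operation: tenants can be looked up, created, attached, and loaded.
    Open(HashMap<TenantId, Arc<Tenant>>),
    /// Shutdown has begun: existing tenants can still be accessed for draining,
    /// but creating, attaching, or loading new tenants is refused.
    ShuttingDown(HashMap<TenantId, Arc<Tenant>>),
}
```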
The only major change is in `run_if_no_tenant_in_memory`, which,
before this patch, was used by the /attach and /load endpoints.
This patch rewrites that method under the name `tenant_map_insert`,
replacing the anyhow::Result with a std Result and a dedicated error
type.
Introducing this error type allows using `tenant_map_insert` in
`tenant_create`, thereby unifying all code paths that create tenant
objects to use `tenant_map_insert`.
This is beneficial because we can now systematically prevent tenants
from being created, attached, or `/load`ed during pageserver shutdown.
The management API remains available, but the endpoints that create
new tenants will fail with an error.
More work would need to be done to properly distinguish these errors
through HTTP status codes such as 503.
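One way such a dedicated error type could look, so callers can later map shutdown to a 503 (a sketch; the variant names are assumptions):

```rust
/// Sketch of a dedicated error type for tenant_map_insert-style operations.
#[derive(Debug, thiserror::Error)]
enum TenantMapInsertError {
    #[error("pageserver is still initializing")]
    StillInitializing,
    #[error("pageserver is shutting down")]
    ShuttingDown,
    #[error("tenant already exists in the map")]
    TenantAlreadyExists,
    #[error(transparent)]
    Other(#[from] anyhow::Error),
}
```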
Motivation
==========
Layer Eviction Needs Context
----------------------------
Before we start implementing layer eviction, we need to collect some
access statistics per layer file or maybe even page.
Part of these statistics should be the initiator of a page read request
to answer the question of whether it was page_service vs. one of the
background loops, and if the latter, which of them?
Further, it would be nice to learn more about what activity in the pageserver
initiated an on-demand download of a layer file.
We will use this information to test out layer eviction policies.
Read more about the current plan for layer eviction here:
https://github.com/neondatabase/neon/issues/2476#issuecomment-1370822104
task_mgr problems + cancellation + tenant/timeline lifecycle
------------------------------------------------------------
Apart from layer eviction, we have long-standing problems with task_mgr,
task cancellation, and various races around tenant / timeline lifecycle
transitions.
One approach to solve these is to abandon task_mgr in favor of a
mechanism similar to Golang's context.Context, albeit extended to
support waiting for completion, and specialized to the needs in the
pageserver.
Heikki solves all of the above at once in PR
https://github.com/neondatabase/neon/pull/3228 , which is not yet
merged at the time of writing.
What Is This Patch About
========================
This patch addresses the immediate needs of layer eviction by
introducing a `RequestContext` structure that is plumbed through the
pageserver - all the way from the various entrypoints (page_service,
management API, tenant background loops) down to
Timeline::{get,get_reconstruct_data}.
The struct carries a description of the kind of activity that initiated
the call. We re-use task_mgr::TaskKind for this.
Also, it carries the desired on-demand download behavior of the entrypoint.
Timeline::get_reconstruct_data can then log the TaskKind that initiated
the on-demand download.
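The core of the struct, as a simplified sketch following the description above (field names, variants, and constructors may differ from the actual code):

```rust
/// On-demand download policy carried by the context.
#[derive(Clone, Copy, Debug)]
pub enum DownloadBehavior {
    /// Download missing layers as needed.
    Download,
    /// Download, but log a warning (the kill switch mentioned above turns Error into this).
    Warn,
    /// Fail the request instead of downloading.
    Error,
}

/// Stand-in for task_mgr::TaskKind, which the real code re-uses.
#[derive(Clone, Copy, Debug)]
pub enum TaskKind {
    PageRequestHandler,
    Compaction,
    GarbageCollector,
}

pub struct RequestContext {
    pub task_kind: TaskKind,
    pub download_behavior: DownloadBehavior,
}

impl RequestContext {
    /// Child context for work that completes before the parent task exits.
    pub fn attached_child(&self) -> Self {
        Self { task_kind: self.task_kind, download_behavior: self.download_behavior }
    }

    /// Child context for spawned tasks that may outlive the parent task.
    pub fn detached_child(&self, task_kind: TaskKind, download_behavior: DownloadBehavior) -> Self {
        Self { task_kind, download_behavior }
    }
}
```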
I developed this patch by checking out Heikki's big RequestContext
PR https://github.com/neondatabase/neon/pull/3228 , then deleting all
the functionality that is not needed for layer eviction.
After that, I added a few things on top:
1. The concept of attached_child and detached_child in preparation for
cancellation signalling through RequestContext, which will be added in
a future patch.
2. A kill switch to turn DownloadBehavior::Error into a warning.
3. Renamed WalReceiverConnection to WalReceiverConnectionPoller and
added an additional TaskKind, WalReceiverConnectionHandler. These were
necessary to create proper detached_child-type RequestContexts for the
various tasks that walreceiver starts.
How To Review This Patch
========================
Start your review with the module-level comment in context.rs.
It explains the idea of RequestContext, what parts of it are implemented
in this patch, and the future plans for RequestContext.
Then review the various `task_mgr::spawn` call sites. At each of them,
we should be creating a new detached_child RequestContext.
Then review the (few) RequestContext::attached_child call sites and
ensure that the spawned tasks do not outlive the task that spawns them.
If they do, these call sites should use detached_child() instead.
Then review the todo_child() call sites and judge whether it's worth the
trouble of plumbing through a parent context from the caller(s).
Lastly, go through the bulk of mechanical changes that simply forwards
the &ctx.
Before this patch, when `initialize_with_lock` was called via
`timeline_init_and_sync`, we would transition the timeline like so:
load_local_timeline/load_remote_timeline:
  timeline_init_and_sync
    Timeline::new
      () => Loading
    initialize_with_lock:
      set_state(Active)
        Loading => Active
      timeline.activate()
        Active => Active
This makes debugging problematic cases easier in the future, as we can
just request the model inputs and use them locally to reproduce the
issue with the model.
TimelineState::Suspended was dubious to begin with. I suppose
that the intention was that timelines could transition back and
forth between the Active and Suspended states.
But practically, the code before this patch never did that.
The transitions were:
() ==Timeline::new==> Suspended ==*==> {Active,Broken,Stopping}
One exception: Tenant::set_stopping() could transition timelines like
so:
!Broken ==Tenant::set_stopping()==> Suspended
But Tenant itself cannot transition from stopping state to any other
state.
Thus, this patch removes TimelineState::Suspended and introduces a new
state Loading. The aforementioned transitions change as follows:
- () ==Timeline::new==> Suspended ==*==> {Active,Broken,Stopping}
+ () ==Timeline::new==> Loading ==*==> {Active,Broken,Stopping}
- !Broken ==Tenant::set_stopping()==> Suspended
+ !Broken ==Tenant::set_stopping()==> Stopping
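After the change, the state enum looks roughly like this (a sketch; doc comments summarize the transitions above):

```rust
/// Sketch of the timeline lifecycle states after removing Suspended.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimelineState {
    /// Initial state after Timeline::new, before the timeline is activated.
    Loading,
    /// Fully initialized; walreceiver may run and reads are served.
    Active,
    /// Something went irrecoverably wrong; the timeline stays unusable.
    Broken,
    /// Tenant shutdown (or detach) has begun; background loops should exit.
    Stopping,
}
```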
Walreceiver's connection manager loop watches TimelineState to decide
whether it should retry connecting, or exit.
This patch changes the loop to exit when it observes the transition
into Stopping state.
Walreceiver isn't supposed to be started until the timeline transitions
into Active state. So, this patch also adds some warn!() messages
in case this happens anyway.
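As a rough illustration of the loop's exit condition (reusing the `TimelineState` sketch above; the real connection manager is considerably more involved):

```rust
use tokio::sync::watch;

// Sketch: the connection manager loop watches the timeline state and exits on Stopping.
async fn connection_manager_loop(mut state_rx: watch::Receiver<TimelineState>) {
    loop {
        let state = *state_rx.borrow();
        match state {
            TimelineState::Active => {
                // (re)connect to a safekeeper and stream WAL
            }
            TimelineState::Stopping | TimelineState::Broken => {
                // Shutdown (or breakage) observed; stop retrying and exit.
                break;
            }
            TimelineState::Loading => {
                // Not supposed to happen: walreceiver should only start on Active timelines.
                tracing::warn!("walreceiver running on a timeline that is still Loading");
            }
        }
        if state_rx.changed().await.is_err() {
            break; // sender dropped, the timeline is gone
        }
    }
}
```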
Small changes, but hopefully this will help with the panic detected in
staging, for which we cannot get the debugging information right now
(end-of-branch before branch-point).
Before, only the timelines which had passed the `gc_horizon` were
processed, which failed with orphans at the tree_sort phase. An example
input is in the added `test_branched_empty_timeline_size` test case.
The PR changes iteration to go through all timelines, and in
addition, any learned branch points are calculated as they
would have been in the original implementation if the ancestor branch had
been over the `gc_horizon`.
This also changes how tenants where all timelines are below `gc_horizon`
are handled. Previously a tenant_size of 0 was returned, but now they will
have approximately `initdb_lsn` worth of tenant_size.
The PR also adds several new tenant size tests that describe various corner
cases of branching structure and `gc_horizon` setting.
They are currently disabled to not consume time during CI.
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
This reverts commit 826e89b9ce.
The problem with that commit was that it deleted the TempDir while
there were still EphemeralFile instances open.
At first I thought this could be fixed by simply adding
Handle::current().block_on(task_mgr::shutdown(None, Some(tenant_id), None))
to TenantHarness::drop, but it turned out to be insufficient.
So, reverting the commit until we find a proper solution.
refs https://github.com/neondatabase/neon/issues/3385
Before this patch, we would start all layer downloads simultaneously.
There is at most one download_all_remote_layers task per timeline.
Hence, the specified limit is per timeline.
There is still no global concurrency limit for layer downloads.
We'll have to revisit that at some point and also prioritize on-demand
initiated downloads over download_all_remote_layers downloads.
But that's for another day.
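A sketch of how a per-task concurrency limit like this can be enforced with a semaphore (simplified, tokio-based; the helper names are illustrative):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

/// Sketch: download all remote layers for one timeline, at most `limit` at a time.
/// Since there is at most one such task per timeline, the limit is per timeline.
async fn download_all_remote_layers(layer_names: Vec<String>, limit: usize) {
    let semaphore = Arc::new(Semaphore::new(limit));
    let mut tasks = Vec::new();
    for layer in layer_names {
        let sem = Arc::clone(&semaphore);
        tasks.push(tokio::spawn(async move {
            // Each download holds a permit for its duration.
            let _permit = sem.acquire_owned().await.expect("semaphore closed");
            download_layer(&layer).await;
        }));
    }
    for t in tasks {
        let _ = t.await;
    }
}

async fn download_layer(_name: &str) {
    // placeholder for the actual remote-storage download
}
```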
- handle errors in calculate_synthetic_size_worker. Don't exit the bgworker if one tenant failed.
- add cached_synthetic_tenant_size to cache values calculated by the bgworker
- code cleanup: remove unneeded info! messages, clean comments
- handle collect_metrics_task() error. Don't exit collect_metrics worker if one task failed.
- add unit test to cover case when we have multiple branches at the same lsn
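A minimal sketch of the "don't exit the worker if one tenant failed" pattern from the first item above (function and helper names are illustrative):

```rust
// Sketch: iterate over tenants, log failures, and keep the background worker alive.
async fn calculate_synthetic_size_iteration(tenant_ids: Vec<String>) {
    for tenant_id in tenant_ids {
        match calculate_synthetic_size_for_tenant(&tenant_id).await {
            Ok(size) => {
                // Cache the value so consumers can read it cheaply between iterations.
                cache_synthetic_size(&tenant_id, size);
            }
            Err(e) => {
                // One failing tenant must not take down the whole worker.
                tracing::error!("synthetic size calculation failed for tenant {}: {:#}", tenant_id, e);
            }
        }
    }
}

async fn calculate_synthetic_size_for_tenant(_tenant_id: &str) -> anyhow::Result<u64> {
    Ok(0) // placeholder for the real calculation
}

fn cache_synthetic_size(_tenant_id: &str, _size: u64) {
    // placeholder: store into something like cached_synthetic_tenant_size
}
```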