Fixes#3468.
This does change how the panics look, and most importantly, make sure
they are not interleaved with other messages. Adds a `GET /v1/panic`
endpoint for panic testing (useful for sentry dedup and this hook
testing).
The panics are now logged within a new error level span called `panic`
which separates it from other error level events. The panic info is
unpacked into span fields:
- thread=mgmt request worker
- location="pageserver/src/http/routes.rs:898:9"
Co-authored-by: Christian Schwarz <christian@neon.tech>
Repeatedly (twice) try to download the compaction targeted layers before actual
compaction. Adds tests for both L0 compaction downloading layers and image
creation downloading layers. Image creation support existed already.
Fixes#3591
Co-authored-by: Christian Schwarz <christian@neon.tech>
Refactor the tenant_size_model code. Segment now contains just the
minimum amount of information needed to calculate the size. Other
information that is useful for building up the segment tree, and for
display purposes, is now kept elsewhere. The code in 'main.rs' has a new
ScenarioBuilder struct for that.
Calculating which Segments are "needed" is now the responsibility of the
caller of tenant_size_mode, not part of the calculation itself. So it's
up to the caller to make all the decisions with retention periods for
each branch.
The output of the sizing calculation is now a Vec of SizeResults, rather
than a tree. It uses a tree representation internally, when doing the
calculation, but it's not exposed to the caller anymore.
Refactor the way the recursive calculation is performed.
Rewrite the code in size.rs that builds the Segment model. Get rid of
the intermediate representation with Update structs. Build the Segments
directly, with some local HashMaps and Vecs to track branch points to
help with that.
retention_period is now an input to gather_inputs(), rather than an
output.
Update pageserver http API: rename /size endpoint to /synthetic_size
with following parameters:
- /synthetic_size?inputs_only to get debug info;
- /synthetic_size?retention_period=0 to override cutoff that is used to
calculate the size;
pass header -H "Accept: text/html" to get HTML output, otherwise JSON is
returned
Update python tests and openapi spec.
---------
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Alternative to #3586.
Introduces usage of current_logical_size.current_size as a boundary after which we start to update the metric gauge on ingested wal. Previously any incremented value (ingested wal) would had updated the gauge, but this would had left the metric at zero for timelines which never receive any wal even if size had been calculated. Now the gauge is updated right away as the calculation completes, not requiring any wal to be received.
Without this patch, basebackup fails if we evict all layers before that.
This slipped in as part of
commit 01b4b0c2f3
Author: Christian Schwarz <christian@neon.tech>
Date: Fri Jan 13 17:02:22 2023 +0100
Introduce RequestContext
This patch adds a per-timeline periodic task that executes an eviction
policy. The eviction policy is configurable per tenant.
Two policies exist:
- NoEviction (the default one)
- LayerAccessThreshold
The LayerAccessThreshold policy examines the last access timestamp per
layer in the layer map and evicts the layer if that last access is
further in the past than a configurable threshold value.
This policy kind is evaluated periodically at a configurable period.
It logs a summary statistic at `info!()` or `warn!()` level, depending
on whether any evictions failed.
This feature has no explicit killswitch since it's off by default.
#3536 added the custom Debug implementations but it using derived Debug
on Key lead to too verbose output. Instead of making `Key`'s `Debug`
unconditionally or conditionally do the `Display` variant (for table
space'd keys), opted to build a newtype to provide `Debug` for
`Range<Key>` via `Display` which seemed to work unconditionally.
Also orders Key to have: 1. comment, 2. derive, 3. `struct Key`.
The auto-eviction PR (#3552) operates in two phaes:
1. find candidate layers
2. evict them.
For (2), a batch API like the one added in this commit is useful.
Note that this PR requires #3558 to be merged first.
Otherwise, the tests won't pass.
This changes the way we compare `Arc<dyn PersistentLayer>` in Timeline's
`LayerMap` not to use `Arc::ptr_eq` which has been witnessed in
development of #3557 to yield wrong results. It gives wrong results
because it compares fat pointers, which are `(object, vtable)` tuples
for `dyn Trait` and there are no guarantees that the `vtable`s are
unique. As in there were multiple vtables for `RemoteLayer` which is why
the comparison failed in #3557.
This is a known issue in rust, clippy warns against it and rust std
might be moving to the solution which has been reproduced on this PR:
compare only object pointers by "casting out" the vtable pointer.
Follow-up to #3536, to actually use the new `Debug` in replacing the
layers, and use replacement with manual eviction endpoint.
Turns out the two paths share a lot of handling of `Replacement` but
didn't unify the two (need 3). There are also upcoming refactorings
from other PRs to this.
This patch adds basic access statistics for historic layers
and exposes them in the management API's `LayerMapInfo`.
We record the accesses in the `{Delta,Image}Layer::load()` function
because it's the common path of
* page_service (`Timline::get_reconstruct_data()`)
* Compaction (`PersistentLayer::iter()` and `PersistentLayer::key_iter()`)
The stats survive residence status changes, and record these as well.
When scraping the layer map endpoint to record its evolution over time,
one must account for stat resets because they are in-memory only and
will reset on pageserver restart.
Use the launch timestamp header added by (#3527) to identify pageserver restarts.
This is PR https://github.com/neondatabase/neon/pull/3496
Add new pageserver config setting `cached_metric_collection_interval`
with default `1 hour`.
This setting controls how often unchanged cached consumption metrics are sent to
the HTTP endpoint.
This is a workaround for billing service limitations.
fixes#3485
Follow-up to #3513.
This removes the old blanket `std::fmt::Debug` impl on `dyn Layer` which
did not seem to be used from anywhere (no compilation errors after
removing).
Adds `std::fmt::Debug` requirement and implementations for `trait Layer`
implementors:
- LayerDescriptor (derived)
- RemoteLayer (manual)
- DeltaLayer (manual)
- ImageLayer (manual)
Manual implementations are used to skip PageserverConf, tenant and
timeline ids, large collections.
Adds and adjusts some doc comments to be more rustdoc alike.
## Describe your changes
Expose the currently calculated synthetic size as a Prometheus metric
## Issue ticket number and link
#3509
## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
The PR adds an endpoint to show tenant's current config: `GET
/v1/tenant/:tenant_id/config`
Tenant's config consists of two parts: tenant overrides (could be
changed via other management API requests) and the default part,
substituting all missing overrides (constant, hardcoded in pageserver).
The API returns the custom overrides and the final tenant config, after
applying all the defaults.
Along the way, it had to fix two things in the config:
* allow to shorten the json version and omit all `null`'s (same as toml
serializer behaves by default), and to understand such shortened format
when deserialized. A unit test is added
* fix a bug, when `PUT /v1/tenant/config` endpoint rewritten the local
file with what had came in the request, but updating (not rewriting the
old values) the in-memory state instead.
That got uncovered during adjusting the e2e test and fixed to do the
replacement everywhere, otherwise there's no way to revert existing
overrides. Fixes#3471 (commit
dc688affe8)
* fixes https://github.com/neondatabase/neon/issues/3472 by reordering
the config saving operations
This patch adds a LaunchTimestamp type to the `metrics` crate,
along with a `libmetric_` Prometheus metric.
The initial user is pageserver.
In addition to exposing the Prometheus metric, it also reproduces
the launch timestamp as a header in the API responses.
The motivation for this is that we plan to scrape the pageserver's
/v1/tenant/:tenant_id/timeline/:timeline_id/layer
HTTP endpoint over time. It will soon expose access metrics (#3496)
which reset upon process restart. We will use the pageserver's launch
ID to identify a restart between two scrape points.
However, there are other potential uses. For example, we could use
the Prometheus metric to annotate Grafana plots whenever the launch
timestamp changes.
Cc: #3486
Adds a method to replace a particular layer from the LayerMap for the
purposes of remote layer download and layer eviction. In those use cases
read lock on layer map needs to be released after initial search, but
other operations could modify layermap before replacing thread gets to
run.
Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
We do not need special enum variant for testing the file names, neither
its special handling across the code.
Current tests are able to create regular layers with normal layer names,
as the PR shows.
Closes https://github.com/neondatabase/neon/issues/3439
Adds a set of commands to manipulate the layer map:
* dump the layer map contents
* evict the layer form the layer map (remove the local file, put the
remote layer instead in the layer map)
* download the layer (operation, reversing the eviction)
The commands will change later, when the statistics is added on top, so
the swagger schema is not adjusted.
The commands might have issues with big amount of layers: no pagination
is done for the dump command, eviction and download commands look for
the layer to evict/download by iterating all layers sequentially and
comparing the layer names.
For now, that seems to be tolerable ("big" number of layers is ~2_000)
and further experiments are needed.
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
- add parse_query_param()
- use Cow<> where possible
- move param parsing code to utils::http::request
This was originally PR https://github.com/neondatabase/neon/pull/3502
which targeted a different branch.
closes #3510
Related to: https://github.com/neondatabase/neon/issues/2848
`pageserver_storage_operations_seconds` is the most expensive metric we
have, as there are a lot of tenants/timelines and the histogram had 42
buckets. These are quite sparse too, so instead of having a histogram
per timeline, create a new histogram
`pageserver_storage_operations_seconds_global` without tenant and
timeline dimensions and replace `pageserver_storage_operations_seconds`
with sum and counter.
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
## Describe your changes
Added a metric that allow to monitor tenants state
## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3161
## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [X] I have added an e2e test for it.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
## Describe your changes
Whenever a tenant is detached or the pageserver is restarted the
pageserver_last_record_lsn metric is dropped
This fix resurrects the value from the metadata whenever the tenant is
attached again
## Issue ticket number and link
[3571](https://github.com/neondatabase/cloud/issues/3571)
## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
Currently `attach` doesn't write a tenant config, because we don't back
it up in the first place. The current implementation of
`Tenant::persist_tenant_config` does not allow changing tenant's
configuration through the http api which will fail because the file
wasn't created on attach and
`OpenOptions::truncate(true).write(true).create_new(false)` is used.
I think this patch allows for least controversial middle ground which
*enables* changing tenant configuration even for attached tenants (not
just created tenants).
Walreceiver is a per-timeline abstraction. Move it there to reflect
the hierarchy of abstractions and task_mgr tasks.
The code that sets up the global storage_broker client
is not timeline-scoped. So, break it out into a separate module.
The motivation for this change is to prepare the code base for replacing
the task_mgr global task registry with a more ownership-oriented
approach to manage task lifetimes.
I removed TaskStateUpdate::Init because, after doing the changes,
rustc warned that it was never constructed.
A quick search through the commit history shows that this
has always been true since
commit fb68d01449
Author: Dmitry Rodionov <dmitry@neon.tech>
Date: Mon Sep 26 23:57:02 2022 +0300
Preserve task result in TaskHandle by keeping join handle around (#2521)
So, the warning is not an indication of some accidental code removal.
This is PR: https://github.com/neondatabase/neon/pull/3456
Change the signature so that it takes an Arc<Timeline> reference to the
source timeline, instead of just the ID. All the callers have an Arc
reference at hand, so this is more convenient for everyone.
Reorder the code a bit and improve the comments, to make it more clear
what it does and why.
This patch wrap the tenants hashmap into an enum that represents the
tenant manager's three major states:
- Initializing
- Open for business
- Shutting down.
See the enum doc comments for details.
In response, all the users of `TENANTS` are now forced to distinguish
those states.
The only major change is in `run_if_no_tenant_in_memory`, which,
before this patch, was used by the /attach and /load endpoints.
This patch rewrites that method under the name `tenant_map_insert`,
replacing the anyhow::Result with a std Result and a dedicated error
type.
Introducing this error types allows using `tenant_map_insert` in
`tenant_create`, thereby unifying all code paths that create tenants
objects to use `tenant_map_insert`.
This is beneficial because we can now systematically prevent tenants
from being created, attached, or `/load`ed during pageserver shutdown.
The management API remains available, but the endpoints that create
new tenants will fail with an error.
More work would need to be done to properly distinguish these errors
through HTTP status codes such as 503.
Motivation
==========
Layer Eviction Needs Context
----------------------------
Before we start implementing layer eviction, we need to collect some
access statistics per layer file or maybe even page.
Part of these statistics should be the initiator of a page read request
to answer the question of whether it was page_service vs. one of the
background loops, and if the latter, which of them?
Further, it would be nice to learn more about what activity in the pageserver
initiated an on-demand download of a layer file.
We will use this information to test out layer eviction policies.
Read more about the current plan for layer eviction here:
https://github.com/neondatabase/neon/issues/2476#issuecomment-1370822104
task_mgr problems + cancellation + tenant/timeline lifecycle
------------------------------------------------------------
Apart from layer eviction, we have long-standing problems with task_mgr,
task cancellation, and various races around tenant / timeline lifecycle
transitions.
One approach to solve these is to abandon task_mgr in favor of a
mechanism similar to Golang's context.Context, albeit extended to
support waiting for completion, and specialized to the needs in the
pageserver.
Heikki solves all of the above at once in PR
https://github.com/neondatabase/neon/pull/3228 , which is not yet
merged at the time of writing.
What Is This Patch About
========================
This patch addresses the immediate needs of layer eviction by
introducing a `RequestContext` structure that is plumbed through the
pageserver - all the way from the various entrypoints (page_service,
management API, tenant background loops) down to
Timeline::{get,get_reconstruct_data}.
The struct carries a description of the kind of activity that initiated
the call. We re-use task_mgr::TaskKind for this.
Also, it carries the desired on-demand download behavior of the entrypoint.
Timeline::get_reconstruct_data can then log the TaskKind that initiated
the on-demand download.
I developed this patch by git-checking-out Heikki's big RequestContext
PR https://github.com/neondatabase/neon/pull/3228 , then deleting all
the functionality that we do not need to address the needs for layer
eviction.
After that, I added a few things on top:
1. The concept of attached_child and detached_child in preparation for
cancellation signalling through RequestContext, which will be added in
a future patch.
2. A kill switch to turn DownloadBehavior::Error into a warning.
3. Renamed WalReceiverConnection to WalReceiverConnectionPoller and
added an additional TaskKind WalReceiverConnectionHandler.These were
necessary to create proper detached_child-type RequestContexts for the
various tasks that walreceiver starts.
How To Review This Patch
========================
Start your review with the module-level comment in context.rs.
It explains the idea of RequestContext, what parts of it are implemented
in this patch, and the future plans for RequestContext.
Then review the various `task_mgr::spawn` call sites. At each of them,
we should be creating a new detached_child RequestContext.
Then review the (few) RequestContext::attached_child call sites and
ensure that the spawned tasks do not outlive the task that spawns them.
If they do, these call sites should use detached_child() instead.
Then review the todo_child() call sites and judge whether it's worth the
trouble of plumbing through a parent context from the caller(s).
Lastly, go through the bulk of mechanical changes that simply forwards
the &ctx.
Before this patch, when `initialize_with_lock` was called via
`timeline_init_and_sync`, we would transition the timeline like so:
load_local_timeline/load_remote_timeline:
timeline_init_and_sync
Timeline::new
() => Loading
initialize_with_lock:
set_state(Active)
Loading => Active
timeline.activate()
Active => Active