Commit Graph

1166 Commits

Author SHA1 Message Date
Bojan Serafimov
c471c25744 Clone less 2023-02-06 14:42:17 -05:00
Bojan Serafimov
e030830397 WIP 2023-02-06 13:55:53 -05:00
Christian Schwarz
58fa4f0eb7 maintain access stats for historic layers
This patch adds basic access statistics for historic layers
and exposes them in the management API's `LayerMapInfo`.

We record the accesses in the `{Delta,Image}Layer::load()` function
because it's the common path of
* page_service (`Timeline::get_reconstruct_data()`)
* Compaction (`PersistentLayer::iter()` and `PersistentLayer::key_iter()`)

The stats survive residence status changes, and record these as well.

When scraping the layer map endpoint to record its evolution over time,
one must account for stat resets because they are in-memory only and
will reset on pageserver restart.
Use the launch timestamp header added by (#3527) to identify pageserver restarts.
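
A minimal sketch of how a scraper might account for such resets. The `Scrape` type, its field names, and `accesses_between` are hypothetical and only illustrate the idea; they are not part of the pageserver API.

```rust
/// One scrape of a layer's access counter, paired with the launch timestamp
/// the pageserver reported at scrape time. Both field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Scrape {
    pub launch_timestamp: u64,
    pub access_count: u64,
}

/// Accesses observed between two consecutive scrapes of the same layer.
pub fn accesses_between(prev: Scrape, curr: Scrape) -> u64 {
    if prev.launch_timestamp == curr.launch_timestamp {
        // Same process: the in-memory counter only grows.
        curr.access_count.saturating_sub(prev.access_count)
    } else {
        // The pageserver restarted, so the counter started again from zero.
        // Everything in `curr` was accumulated after the restart.
        curr.access_count
    }
}
```

Across a restart the result is only a lower bound, because accesses made between the previous scrape and the restart are lost.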

This is PR https://github.com/neondatabase/neon/pull/3496
2023-02-06 17:01:38 +01:00
Anastasia Lubennikova
877a2d70e3 Periodically send cached consumption metrics (#3520)
Add new pageserver config setting `cached_metric_collection_interval`
with default `1 hour`.
This setting controls how often unchanged cached consumption metrics are sent to
the HTTP endpoint.

This is a workaround for billing service limitations.
fixes #3485
2023-02-06 17:53:10 +02:00
Joonas Koivunen
678fe0684f std::fmt::Debug for Layer implementations (#3536)
Follow-up to #3513.

This removes the old blanket `std::fmt::Debug` impl on `dyn Layer` which
did not seem to be used from anywhere (no compilation errors after
removing).

Adds `std::fmt::Debug` requirement and implementations for `trait Layer`
implementors:
- LayerDescriptor (derived)
- RemoteLayer (manual)
- DeltaLayer (manual)
- ImageLayer (manual)

Manual implementations are used to skip PageserverConf, tenant and
timeline ids, large collections.
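
A sketch of what such a manual implementation can look like. `ExampleLayer` and its fields are hypothetical stand-ins, not the real `DeltaLayer` or `ImageLayer` definitions.

```rust
use std::fmt;

/// Hypothetical stand-in for a layer type; field names are illustrative.
pub struct ExampleLayer {
    pub file_name: String,
    pub file_size: u64,
    /// A large collection that should stay out of debug output.
    pub index: Vec<u64>,
}

impl fmt::Debug for ExampleLayer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("ExampleLayer")
            .field("file_name", &self.file_name)
            .field("file_size", &self.file_size)
            // Skips `index` and marks the output as incomplete with `..`.
            .finish_non_exhaustive()
    }
}
```

Compared with `#[derive(Debug)]`, this keeps log lines short at the cost of having to update the impl by hand when fields change.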

Adds and adjusts some doc comments to be more rustdoc alike.
2023-02-06 14:21:51 +02:00
Shany Pozin
c9821f13e0 Expose the tenant calculated synthetic size as a Prometheus metric (#3541)
## Describe your changes
Expose the currently calculated synthetic size as a Prometheus metric
## Issue ticket number and link
#3509

## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-02-06 09:25:15 +02:00
Kirill Bulatov
ec3a3aed37 Dump current tenant config (#3534)
The PR adds an endpoint to show tenant's current config: `GET
/v1/tenant/:tenant_id/config`

Tenant's config consists of two parts: tenant overrides (could be
changed via other management API requests) and the default part,
substituting all missing overrides (constant, hardcoded in pageserver).
The API returns the custom overrides and the final tenant config, after
applying all the defaults.
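
The two-part config described above can be sketched as optional overrides applied on top of defaults. The struct and field names here are hypothetical and cover only two illustrative settings.

```rust
/// Final config after defaults are applied. Field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EffectiveConfig {
    pub gc_period_secs: u64,
    pub checkpoint_distance: u64,
}

/// Per-tenant overrides; `None` means "use the default".
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ConfigOverrides {
    pub gc_period_secs: Option<u64>,
    pub checkpoint_distance: Option<u64>,
}

impl ConfigOverrides {
    /// Substitutes the default for every missing override.
    pub fn apply_to(&self, defaults: EffectiveConfig) -> EffectiveConfig {
        EffectiveConfig {
            gc_period_secs: self.gc_period_secs.unwrap_or(defaults.gc_period_secs),
            checkpoint_distance: self
                .checkpoint_distance
                .unwrap_or(defaults.checkpoint_distance),
        }
    }
}
```

An endpoint like the one described would return both values: the `ConfigOverrides` as stored and the `EffectiveConfig` computed from them.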

Along the way, the PR had to fix a few things in the config:

* allow shortening the JSON version by omitting all `null`s (the same way the
toml serializer behaves by default), and accept that shortened format when
deserializing. A unit test is added.
* fix a bug where the `PUT /v1/tenant/config` endpoint rewrote the local
file with what came in the request, but only updated the in-memory state
(without replacing the old values).
This was uncovered while adjusting the e2e test and is fixed by doing the
replacement everywhere; otherwise there is no way to revert existing
overrides. Fixes #3471 (commit
dc688affe8)
* fixes https://github.com/neondatabase/neon/issues/3472 by reordering
the config saving operations
2023-02-04 01:32:29 +02:00
Christian Schwarz
87cd2bae77 introduce LaunchTimestamp to identify process restarts
This patch adds a LaunchTimestamp type to the `metrics` crate,
along with a `libmetric_` Prometheus metric.

The initial user is pageserver.
In addition to exposing the Prometheus metric, it also reproduces
the launch timestamp as a header in the API responses.

The motivation for this is that we plan to scrape the pageserver's
/v1/tenant/:tenant_id/timeline/:timeline_id/layer
HTTP endpoint over time. It will soon expose access metrics (#3496)
which reset upon process restart. We will use the pageserver's launch
ID to identify a restart between two scrape points.

However, there are other potential uses. For example, we could use
the Prometheus metric to annotate Grafana plots whenever the launch
timestamp changes.
2023-02-03 18:12:17 +01:00
Joonas Koivunen
f2d89761c2 feat: LayerMap::replace (#3513)
Cc: #3486

Adds a method to replace a particular layer in the LayerMap for the
purposes of remote layer download and layer eviction. In those use cases
the read lock on the layer map needs to be released after the initial
search, but other operations could modify the layer map before the
replacing thread gets to run.
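
The race described above suggests a compare-and-replace operation: swap the entry only if it is still the object found during the initial search. The following is a sketch of that idea over a plain `HashMap`; the function name, the `Replacement` enum, and the string keys are assumptions, and the real `LayerMap::replace` may look different.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Outcome of a replacement attempt (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Replacement {
    Replaced,
    /// The key is no longer in the map.
    NotFound,
    /// Another thread swapped the entry after the initial search.
    Changed,
}

/// Replaces `key` with `new` only if the map still holds `expected`.
pub fn replace_if_unchanged<T>(
    map: &mut HashMap<String, Arc<T>>,
    key: &str,
    expected: &Arc<T>,
    new: Arc<T>,
) -> Replacement {
    match map.get_mut(key) {
        None => Replacement::NotFound,
        Some(current) => {
            // Pointer identity, not value equality: we want the very same
            // object that was seen under the earlier read lock.
            if Arc::ptr_eq(&*current, expected) {
                *current = new;
                Replacement::Replaced
            } else {
                Replacement::Changed
            }
        }
    }
}
```

The caller would take the write lock, call this, and retry or give up on `Changed` or `NotFound`.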

Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2023-02-03 15:33:46 +02:00
Anastasia Lubennikova
83048a4adc Handle errors during metric collection. (#3521)
Don't exit the loop if one of the tenants failed to scrape its metrics.
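
A sketch of the pattern, assuming a hypothetical `scrape` callback that returns a `Result` per tenant; the real collection code and its error type are not shown in this message.

```rust
/// Scrapes every tenant and keeps going when one of them fails.
/// Returns the successful values and the failures separately.
pub fn collect_all<F>(
    tenants: &[&str],
    mut scrape: F,
) -> (Vec<(String, u64)>, Vec<(String, String)>)
where
    F: FnMut(&str) -> Result<u64, String>,
{
    let mut collected = Vec::new();
    let mut failed = Vec::new();
    for &tenant in tenants {
        match scrape(tenant) {
            Ok(value) => collected.push((tenant.to_string(), value)),
            // Record the error and move on to the next tenant instead of
            // returning early from the whole loop.
            Err(err) => failed.push((tenant.to_string(), err)),
        }
    }
    (collected, failed)
}
```
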
fixes #3490
2023-02-03 12:37:34 +02:00
Kirill Bulatov
f6a10f4693 Use regular layer names instead of a special test ones (#3524)
We do not need a special enum variant for testing the file names, nor
its special handling across the code.
Current tests are able to create regular layers with normal layer names,
as the PR shows.
2023-02-02 14:52:17 +02:00
Kirill Bulatov
2759f1a22e Evict layers on demand (#3486)
Closes https://github.com/neondatabase/neon/issues/3439

Adds a set of commands to manipulate the layer map:
* dump the layer map contents
* evict the layer from the layer map (remove the local file, put the
remote layer instead in the layer map)
* download the layer (operation, reversing the eviction)

The commands will change later, when the statistics are added on top, so
the swagger schema is not adjusted.

The commands might have issues with a large number of layers: no pagination
is done for the dump command, eviction and download commands look for
the layer to evict/download by iterating all layers sequentially and
comparing the layer names.
For now, that seems to be tolerable ("big" number of layers is ~2_000)
and further experiments are needed.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-02-02 12:14:44 +02:00
Christian Schwarz
f1aece1ba0 add RequestContext plumbing for layer access stats
In preparation for #3496, plumb through RequestContext to the data
access methods of `PersistentLayer`.

This is PR https://github.com/neondatabase/neon/pull/3504
2023-02-01 15:29:01 +02:00
Christian Schwarz
590695e845 improve query param parsing
- add parse_query_param()
- use Cow<> where possible
- move param parsing code to utils::http::request

This was originally PR https://github.com/neondatabase/neon/pull/3502
which targeted a different branch.

closes #3510
2023-02-01 14:11:12 +01:00
Konstantin Knizhnik
895f929bce Add layer_map_analyzer tool (#3451)
See #3348
2023-01-31 15:50:52 +02:00
Lassi Pölönen
20b38acff0 Replace per timeline pageserver_storage_operations_seconds with a global one (#3409)
Related to: https://github.com/neondatabase/neon/issues/2848

`pageserver_storage_operations_seconds` is the most expensive metric we
have, as there are a lot of tenants/timelines and the histogram had 42
buckets. These are quite sparse too, so instead of having a histogram
per timeline, create a new histogram
`pageserver_storage_operations_seconds_global` without tenant and
timeline dimensions and replace `pageserver_storage_operations_seconds`
with sum and counter.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-01-30 17:10:29 +02:00
Kirill Bulatov
c61bc25ef9 Clean up NeedsDownload error (#3464) 2023-01-30 16:08:23 +02:00
Shany Pozin
ddb9c2fe94 Add metrics for tenants state (#3448)
## Describe your changes
Added a metric that allows monitoring tenants' state
## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3161

## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [X] I have added an e2e test for it.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-01-29 14:04:06 +02:00
Shany Pozin
67d418e91c Set the last_record_gauge to the value which was persisted metadata (#3460)
## Describe your changes
Whenever a tenant is detached or the pageserver is restarted, the
pageserver_last_record_lsn metric is dropped.
This fix resurrects the value from the metadata whenever the tenant is
attached again.
## Issue ticket number and link
[3571](https://github.com/neondatabase/cloud/issues/3571)
## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-01-29 12:40:50 +02:00
Konstantin Knizhnik
c5ca7d0c68 Implement asynchronous pipe for communication with walredo process (#3368)
Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-01-27 18:36:24 +02:00
Joonas Koivunen
0ec84e2f1f Allow creating config for attached tenant (#3446)
Currently `attach` doesn't write a tenant config, because we don't back
it up in the first place. The current implementation of
`Tenant::persist_tenant_config` does not allow changing tenant's
configuration through the http api which will fail because the file
wasn't created on attach and
`OpenOptions::truncate(true).write(true).create_new(false)` is used.

I think this patch allows for least controversial middle ground which
*enables* changing tenant configuration even for attached tenants (not
just created tenants).
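
The failure mode can be reproduced with the standard library alone. In the sketch below, `open_existing_only` mirrors the options quoted above, while `open_or_create` is one possible alternative and an assumption, not necessarily what this patch does.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

/// Same options as quoted above: nothing asks for the file to be created,
/// so opening a missing file fails with `NotFound`.
pub fn open_existing_only(path: &Path) -> io::Result<File> {
    OpenOptions::new()
        .truncate(true)
        .write(true)
        .create_new(false)
        .open(path)
}

/// Hypothetical alternative: create the file when it is missing and
/// truncate it when it exists.
pub fn open_or_create(path: &Path) -> io::Result<File> {
    OpenOptions::new()
        .truncate(true)
        .write(true)
        .create(true)
        .open(path)
}
```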
2023-01-27 15:34:59 +02:00
Christian Schwarz
99399c112a move walreceiver module under timeline
Walreceiver is a per-timeline abstraction. Move it there to reflect
the hierarchy of abstractions and task_mgr tasks.
The code that sets up the global storage_broker client
is not timeline-scoped. So, break it out into a separate module.

The motivation for this change is to prepare the code base for replacing
the task_mgr global task registry with a more ownership-oriented
approach to manage task lifetimes.

I removed TaskStateUpdate::Init because, after doing the changes,
rustc warned that it was never constructed.
A quick search through the commit history shows that this
has always been true since

    commit fb68d01449
    Author: Dmitry Rodionov <dmitry@neon.tech>
    Date:   Mon Sep 26 23:57:02 2022 +0300

        Preserve task result in TaskHandle by keeping join handle around (#2521)

So, the warning is not an indication of some accidental code removal.

This is PR: https://github.com/neondatabase/neon/pull/3456
2023-01-27 12:23:17 +01:00
Heikki Linnakangas
bf63f129ae Make 'branch_timeline' function more clear.
Change the signature so that it takes an Arc<Timeline> reference to the
source timeline, instead of just the ID. All the callers have an Arc
reference at hand, so this is more convenient for everyone.

Reorder the code a bit and improve the comments, to make it more clear
what it does and why.
2023-01-27 02:12:07 +02:00
Christian Schwarz
dc64962ffc tenant::mgr: explicit tracking of initializing & shutting-down states
This patch wraps the tenants hashmap in an enum that represents the
tenant manager's three major states:
- Initializing
- Open for business
- Shutting down.
See the enum doc comments for details.

In response, all the users of `TENANTS` are now forced to distinguish
those states.
The only major change is in `run_if_no_tenant_in_memory`, which,
before this patch, was used by the /attach and /load endpoints.
This patch rewrites that method under the name `tenant_map_insert`,
replacing the anyhow::Result with a std Result and a dedicated error
type.
Introducing this error type allows using `tenant_map_insert` in
`tenant_create`, thereby unifying all code paths that create tenant
objects to use `tenant_map_insert`.

This is beneficial because we can now systematically prevent tenants
from being created, attached, or `/load`ed during pageserver shutdown.
The management API remains available, but the endpoints that create
new tenants will fail with an error.
More work would need to be done to properly distinguish these errors
through HTTP status codes such as 503.
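
A sketch of the three-state wrapper and an insert helper with a dedicated error type. All type, variant, and function names below are hypothetical; the actual enum and `tenant_map_insert` signature are in the patch itself.

```rust
use std::collections::HashMap;

/// The tenant manager's three major states (illustrative names).
pub enum TenantsMap<T> {
    Initializing,
    Open(HashMap<String, T>),
    ShuttingDown(HashMap<String, T>),
}

/// Dedicated error type instead of a catch-all error.
#[derive(Debug, PartialEq, Eq)]
pub enum InsertError {
    StillInitializing,
    ShuttingDown,
    AlreadyExists,
}

/// Inserts a tenant only while the manager is open for business.
pub fn insert_tenant<T>(
    map: &mut TenantsMap<T>,
    id: &str,
    tenant: T,
) -> Result<(), InsertError> {
    match map {
        TenantsMap::Initializing => Err(InsertError::StillInitializing),
        // Refusing here is what prevents new tenants during shutdown.
        TenantsMap::ShuttingDown(_) => Err(InsertError::ShuttingDown),
        TenantsMap::Open(tenants) => {
            if tenants.contains_key(id) {
                return Err(InsertError::AlreadyExists);
            }
            tenants.insert(id.to_string(), tenant);
            Ok(())
        }
    }
}
```

Because every caller has to match on the enum, forgetting to handle the shutdown case becomes a compile-time concern rather than a runtime race.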
2023-01-26 11:24:48 +01:00
bojanserafimov
0a09589403 Increase gc period to 1h (#3432) 2023-01-25 15:18:41 -05:00
Christian Schwarz
01b4b0c2f3 Introduce RequestContext
Motivation
==========

Layer Eviction Needs Context
----------------------------

Before we start implementing layer eviction, we need to collect some
access statistics per layer file or maybe even page.
Part of these statistics should be the initiator of a page read request
to answer the question of whether it was page_service vs. one of the
background loops, and if the latter, which of them?

Further, it would be nice to learn more about what activity in the pageserver
initiated an on-demand download of a layer file.
We will use this information to test out layer eviction policies.

Read more about the current plan for layer eviction here:
https://github.com/neondatabase/neon/issues/2476#issuecomment-1370822104

task_mgr problems + cancellation + tenant/timeline lifecycle
------------------------------------------------------------

Apart from layer eviction, we have long-standing problems with task_mgr,
task cancellation, and various races around tenant / timeline lifecycle
transitions.
One approach to solve these is to abandon task_mgr in favor of a
mechanism similar to Golang's context.Context, albeit extended to
support waiting for completion, and specialized to the needs in the
pageserver.

Heikki solves all of the above at once in PR
https://github.com/neondatabase/neon/pull/3228 , which is not yet
merged at the time of writing.

What Is This Patch About
========================

This patch addresses the immediate needs of layer eviction by
introducing a `RequestContext` structure that is plumbed through the
pageserver - all the way from the various entrypoints (page_service,
management API, tenant background loops) down to
Timeline::{get,get_reconstruct_data}.

The struct carries a description of the kind of activity that initiated
the call. We re-use task_mgr::TaskKind for this.

Also, it carries the desired on-demand download behavior of the entrypoint.
Timeline::get_reconstruct_data can then log the TaskKind that initiated
the on-demand download.
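
A rough sketch of the shape described above. `TaskKind`, `DownloadBehavior::Error`, `attached_child`, and `detached_child` are named in this message; the enum variants other than `Error`, the method signatures, and `describe_download` are assumptions made for illustration.

```rust
/// Kind of activity that initiated a call. The patch reuses
/// `task_mgr::TaskKind`; these two variants are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskKind {
    PageRequestHandler,
    Compaction,
}

/// Desired on-demand download behavior. Only `Error` is named in the
/// message; `Download` is an assumed counterpart.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DownloadBehavior {
    Download,
    Error,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RequestContext {
    task_kind: TaskKind,
    download_behavior: DownloadBehavior,
}

impl RequestContext {
    pub fn new(task_kind: TaskKind, download_behavior: DownloadBehavior) -> Self {
        Self { task_kind, download_behavior }
    }

    /// Context for a task that may outlive its parent: it states its own
    /// kind and download behavior.
    pub fn detached_child(&self, task_kind: TaskKind, behavior: DownloadBehavior) -> Self {
        Self::new(task_kind, behavior)
    }

    /// Context for work that must not outlive the parent: inherits all fields.
    pub fn attached_child(&self) -> Self {
        self.clone()
    }

    pub fn task_kind(&self) -> TaskKind {
        self.task_kind
    }

    pub fn download_behavior(&self) -> DownloadBehavior {
        self.download_behavior
    }
}

/// Message a read path could log when it triggers an on-demand download.
pub fn describe_download(ctx: &RequestContext) -> String {
    format!("on-demand download initiated by {:?}", ctx.task_kind())
}
```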

I developed this patch by git-checking-out Heikki's big RequestContext
PR https://github.com/neondatabase/neon/pull/3228 , then deleting all
the functionality that we do not need to address the needs for layer
eviction.

After that, I added a few things on top:

1. The concept of attached_child and detached_child in preparation for
   cancellation signalling through RequestContext, which will be added in
   a future patch.
2. A kill switch to turn DownloadBehavior::Error into a warning.
3. Renamed WalReceiverConnection to WalReceiverConnectionPoller and
   added an additional TaskKind WalReceiverConnectionHandler. These were
   necessary to create proper detached_child-type RequestContexts for the
   various tasks that walreceiver starts.

How To Review This Patch
========================

Start your review with the module-level comment in context.rs.
It explains the idea of RequestContext, what parts of it are implemented
in this patch, and the future plans for RequestContext.

Then review the various `task_mgr::spawn` call sites. At each of them,
we should be creating a new detached_child RequestContext.

Then review the (few) RequestContext::attached_child call sites and
ensure that the spawned tasks do not outlive the task that spawns them.
If they do, these call sites should use detached_child() instead.

Then review the todo_child() call sites and judge whether it's worth the
trouble of plumbing through a parent context from the caller(s).

Lastly, go through the bulk of mechanical changes that simply forward
the &ctx.
2023-01-25 14:53:30 +01:00
Kirill Bulatov
572332ab50 Tone down page_service timeouts (#3426)
Closes https://github.com/neondatabase/neon/issues/3341
2023-01-25 13:40:08 +02:00
Vadim Kharitonov
bc4f594ed6 Fix Sentry Version 2023-01-25 12:07:38 +01:00
Kirill Bulatov
ea6f41324a Tone down postgres client io errors (#3435)
Closes https://github.com/neondatabase/neon/issues/3343
2023-01-25 10:50:33 +00:00
Kirill Bulatov
1c3636d848 Tone down walreceiver connection timeout errors (#3425)
Closes https://github.com/neondatabase/neon/issues/3342
2023-01-24 18:03:33 +02:00
Kirill Bulatov
0c16ad8591 Tone down broker subscription errors 2023-01-24 17:23:33 +02:00
Christian Schwarz
0b673c12d7 timeline: don't transition Active=>Active during pageserver startup
Before this patch, when `initialize_with_lock` was called via
`timeline_init_and_sync`, we would transition the timeline like so:

    load_local_timeline/load_remote_timeline:
        timeline_init_and_sync
            Timeline::new
                () => Loading
            initialize_with_lock:
                set_state(Active)
                    Loading => Active
        timeline.activate()
            Active => Active
2023-01-24 15:56:02 +01:00
Christian Schwarz
7a333cfb12 be noisy about unexpected Timeline state transitions 2023-01-24 15:56:02 +01:00
Christian Schwarz
f7ec33970a add doc comment that outlines which tokio tasks walreceiver creates 2023-01-24 15:23:48 +01:00
Joonas Koivunen
98d0a0d242 fix(http): omit needless string allocs (#3421)
Drive-by fix noticed while working on #3419.
2023-01-24 14:53:39 +02:00
Joonas Koivunen
f74080cbad feat(http): support ?inputs_only=true for tenant_size (#3419)
This makes debugging problematic cases in the future easier, as we can
just request the model inputs and use them locally to reproduce the issue
with the model.
2023-01-24 13:57:13 +02:00
Christian Schwarz
55c184fcd7 fix some anyhow::Context::context calls that should use with_context(format!(...))
Noticed this while combing through some production logs.
2023-01-24 12:22:33 +01:00
Christian Schwarz
6b6570b580 remove TimelineState::Suspended, introduce TimelineState::Loading
The TimelineState::Suspended was dubious to begin with. I suppose
that the intention was that timelines could transition back and
forth between Active and Suspended states.
But practically, the code before this patch never did that.
The transitions were:

    () ==Timeline::new==> Suspended ==*==> {Active,Broken,Stopping}

One exception: Tenant::set_stopping() could transition timelines like
so:

    !Broken ==Tenant::set_stopping()==> Suspended

But Tenant itself cannot transition from stopping state to any other
state.

Thus, this patch removes TimelineState::Suspended and introduces a new
state Loading. The aforementioned transitions change as follows:

    - () ==Timeline::new==> Suspended ==*==> {Active,Broken,Stopping}
    + () ==Timeline::new==> Loading   ==*==> {Active,Broken,Stopping}

    - !Broken ==Tenant::set_stopping()==> Suspended
    + !Broken ==Tenant::set_stopping()==> Stopping

Walreceiver's connection manager loop watches TimelineState to decide
whether it should retry connecting, or exit.
This patch changes the loop to exit when it observes the transition
into Stopping state.

Walreceiver isn't supposed to be started until the timeline transitions
into Active state. So, this patch also adds some warn!() messages
in case this happens anyways.
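
The transitions above can be sketched as a small state enum. The state names come from this message; the helper functions are hypothetical and only restate the rules in code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimelineState {
    Loading,
    Active,
    Stopping,
    Broken,
}

/// Mirrors `!Broken ==Tenant::set_stopping()==> Stopping`:
/// every state except `Broken` moves to `Stopping`.
pub fn after_set_stopping(state: TimelineState) -> TimelineState {
    match state {
        TimelineState::Broken => TimelineState::Broken,
        _ => TimelineState::Stopping,
    }
}

/// The connection manager loop exits once it observes `Stopping`.
pub fn connection_loop_should_exit(state: TimelineState) -> bool {
    matches!(state, TimelineState::Stopping)
}

/// Walreceiver is only supposed to be started on an `Active` timeline;
/// a caller could emit a warning when this returns false.
pub fn walreceiver_start_is_expected(state: TimelineState) -> bool {
    state == TimelineState::Active
}
```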
2023-01-23 17:22:49 +01:00
Joonas Koivunen
7704caa3ac More tenant size fixes (#3410)
Small changes, but hopefully this will help with the panic detected in
staging, for which we cannot get the debugging information right now
(end-of-branch before branch-point).
2023-01-23 17:12:51 +02:00
Konstantin Knizhnik
5c865f46ba Fix slru_segment_key_range function: segno was assigned to incorrect Key field (#3354) 2023-01-23 10:51:09 +02:00
bojanserafimov
a3d7ad2d52 Implement layer map using immutable BST (#2998) 2023-01-20 16:10:12 -05:00
Anastasia Lubennikova
36f048d6b0 Fix tenant size orphans (#3377)
Before, only the timelines which had passed the `gc_horizon` were
processed, which failed with orphans at the tree_sort phase. Example
input is in the added `test_branched_empty_timeline_size` test case.

The PR changes iteration to happen through all timelines, and in
addition to that, any learned branch points will be calculated as they
would have been in the original implementation if the ancestor branch had
been over the `gc_horizon`.

This also changes how tenants where all timelines are below `gc_horizon`
are handled. Previously tenant_size 0 was returned, but now they will
have approximately `initdb_lsn` worth of tenant_size.

The PR also adds several new tenant size tests that describe various corner
cases of branching structure and `gc_horizon` setting.
They are currently disabled to not consume time during CI.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2023-01-20 20:21:36 +02:00
Joonas Koivunen
58fb6fe861 fix: dont stop pageserver if we fail to calculate synthetic size 2023-01-20 19:55:19 +02:00
Christian Schwarz
8ba1699937 Revert "Use actual temporary dir for pageserver unit tests"
This reverts commit 826e89b9ce.

The problem with that commit was that it deletes the TempDir while
there are still EphemeralFile instances open.

At first I thought this could be fixed by simply adding

  Handle::current().block_on(task_mgr::shutdown(None, Some(tenant_id), None))

to TenantHarness::drop, but it turned out to be insufficient.

So, reverting the commit until we find a proper solution.

refs https://github.com/neondatabase/neon/issues/3385
2023-01-19 20:16:56 +01:00
bojanserafimov
a9bd05760f Improve layer map docstrings (#3382) 2023-01-19 10:29:15 -05:00
Kirill Bulatov
90f66aa51b Enable logs in unit tests 2023-01-18 17:43:27 +02:00
Kirill Bulatov
826e89b9ce Use actual temporary dir for pageserver unit tests 2023-01-18 17:43:27 +02:00
Kirill Bulatov
c6b56d2967 Add more io::Error context when fail to operate on a path (#3254)
I have a test failure that shows 

```
Caused by:
    0: Failed to reconstruct a page image:
    1: Directory not empty (os error 39)
```

but does not really show where exactly that happens.

https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3227/release/3823785365/index.html#categories/c0057473fc9ec8fb70876fd29a171ce8/7088dab272f2c7b7/?attachment=60fe6ed2add4d82d

The PR aims to add more context in debugging that issue.
2023-01-17 22:07:38 +02:00
Kirill Bulatov
1ebd145c29 Actualize the comment (#3362)
Follow-up of
https://github.com/neondatabase/neon/pull/3326#issuecomment-1384265759
2023-01-17 13:30:42 +02:00
Christian Schwarz
48dd9565ac TaskHandle: tone down sender is dropped while join handle is still alive
Rationale: see comments added as part of this commit.

fixes https://github.com/neondatabase/neon/issues/3339
2023-01-17 09:42:22 +01:00