Commit Graph

1158 Commits

Author SHA1 Message Date
Joonas Koivunen
f2d89761c2 feat: LayerMap::replace (#3513)
Cc: #3486

Adds a method to replace a particular layer from the LayerMap for the
purposes of remote layer download and layer eviction. In those use cases
read lock on layer map needs to be released after initial search, but
other operations could modify layermap before replacing thread gets to
run.
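
A minimal sketch of the compare-then-swap idea described above, not the pageserver's actual code. The `Layer` struct, the `Replacement` enum, and the `replace` signature are assumptions made for illustration; only the method name `LayerMap::replace` comes from the commit.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a layer; the real pageserver types are richer.
#[derive(Debug)]
pub struct Layer {
    pub name: String,
    pub is_remote: bool,
}

/// Hypothetical outcome of a replace attempt.
#[derive(Debug, PartialEq)]
pub enum Replacement {
    /// The expected layer was still in the map and has been swapped out.
    Replaced,
    /// No layer with that name is in the map any more.
    NotFound,
    /// A layer with that name exists, but it is not the object we looked up.
    Unexpected,
}

#[derive(Default)]
pub struct LayerMap {
    layers: Vec<Arc<Layer>>,
}

impl LayerMap {
    pub fn insert(&mut self, layer: Arc<Layer>) {
        self.layers.push(layer);
    }

    /// Swap `expected` for `new` only if the map still holds the very same
    /// object (pointer equality) that the caller found in its earlier search.
    pub fn replace(&mut self, expected: &Arc<Layer>, new: Arc<Layer>) -> Replacement {
        match self.layers.iter_mut().find(|l| l.name == expected.name) {
            None => Replacement::NotFound,
            Some(slot) => {
                if Arc::ptr_eq(slot, expected) {
                    *slot = new;
                    Replacement::Replaced
                } else {
                    Replacement::Unexpected
                }
            }
        }
    }
}
```

A caller would search under the read lock, release it, do the slow download or eviction work, then take the write lock and call `replace`. Any result other than `Replaced` means another operation modified the map in between.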

Co-authored-by: bojanserafimov <bojan.serafimov7@gmail.com>
2023-02-03 15:33:46 +02:00
Anastasia Lubennikova
83048a4adc Handle errors during metric collection. (#3521)
Don't exit the loop if one of the tenants failed to scrape its metrics.
fixes #3490
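
The behavior described above (keep looping when one tenant fails) can be sketched as follows. The `collect_all` function and its closure-based `scrape` parameter are hypothetical names, not taken from the pageserver.

```rust
/// Scrape every tenant; a failure is recorded and the loop moves on
/// instead of returning early.
/// Returns (collected values, failures), each tagged with the tenant id.
pub fn collect_all<F>(
    tenants: &[u32],
    mut scrape: F,
) -> (Vec<(u32, u64)>, Vec<(u32, String)>)
where
    F: FnMut(u32) -> Result<u64, String>,
{
    let mut collected = Vec::new();
    let mut failed = Vec::new();
    for &tenant in tenants {
        match scrape(tenant) {
            Ok(value) => collected.push((tenant, value)),
            // Do not `return` or `break` here: the remaining tenants
            // should still be scraped.
            Err(err) => failed.push((tenant, err)),
        }
    }
    (collected, failed)
}
```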
2023-02-03 12:37:34 +02:00
Kirill Bulatov
f6a10f4693 Use regular layer names instead of a special test ones (#3524)
We do not need a special enum variant for testing the file names, nor
its special handling across the code.
Current tests are able to create regular layers with normal layer names,
as the PR shows.
2023-02-02 14:52:17 +02:00
Kirill Bulatov
2759f1a22e Evict layers on demand (#3486)
Closes https://github.com/neondatabase/neon/issues/3439

Adds a set of commands to manipulate the layer map:
* dump the layer map contents
* evict the layer from the layer map (remove the local file, put the
remote layer in the layer map instead)
* download the layer (operation, reversing the eviction)

The commands will change later, when statistics are added on top, so
the swagger schema is not adjusted.

The commands might have issues with a large number of layers: no pagination
is done for the dump command, and the eviction and download commands look for
the layer to evict/download by iterating over all layers sequentially and
comparing the layer names.
For now, that seems to be tolerable (a "big" number of layers is ~2_000),
and further experiments are needed.

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-02-02 12:14:44 +02:00
Christian Schwarz
f1aece1ba0 add RequestContext plumbing for layer access stats
In preparation for #3496, plumb through RequestContext to the data
access methods of `PersistentLayer`.

This is PR https://github.com/neondatabase/neon/pull/3504
2023-02-01 15:29:01 +02:00
Christian Schwarz
590695e845 improve query param parsing
- add parse_query_param()
- use Cow<> where possible
- move param parsing code to utils::http::request
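
A rough sketch of query parameter parsing that borrows where it can, in the spirit of the bullets above. The signatures are assumptions; only the name `parse_query_param` appears in the commit, and `get_query_param` is a hypothetical helper. Percent-decoding is deliberately left out.

```rust
use std::borrow::Cow;
use std::str::FromStr;

/// Return the raw value of `name` from a query string such as "a=1&b=two".
/// Borrows from the input when the value can be used as-is and allocates
/// only when a '+' has to be rewritten into a space.
pub fn get_query_param<'a>(query: &'a str, name: &str) -> Option<Cow<'a, str>> {
    query.split('&').find_map(|pair| {
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        if key != name {
            return None;
        }
        if value.contains('+') {
            Some(Cow::Owned(value.replace('+', " ")))
        } else {
            Some(Cow::Borrowed(value))
        }
    })
}

/// Parse an optional query parameter into any `FromStr` type.
/// `Ok(None)` means the parameter was absent; `Err` means it was present
/// but could not be parsed.
pub fn parse_query_param<T: FromStr>(query: &str, name: &str) -> Result<Option<T>, String> {
    match get_query_param(query, name) {
        None => Ok(None),
        Some(raw) => raw
            .parse::<T>()
            .map(Some)
            .map_err(|_| format!("invalid value for query parameter '{name}': '{raw}'")),
    }
}
```

Separating "absent" from "malformed" lets a handler fall back to a default in the first case and reject the request in the second.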

This was originally PR https://github.com/neondatabase/neon/pull/3502
which targeted a different branch.

closes #3510
2023-02-01 14:11:12 +01:00
Konstantin Knizhnik
895f929bce Add layer_map_analyzer tool (#3451)
See #3348
2023-01-31 15:50:52 +02:00
Lassi Pölönen
20b38acff0 Replace per timeline pageserver_storage_operations_seconds with a global one (#3409)
Related to: https://github.com/neondatabase/neon/issues/2848

`pageserver_storage_operations_seconds` is the most expensive metric we
have, as there are a lot of tenants/timelines and the histogram had 42
buckets. These are quite sparse too, so instead of having a histogram
per timeline, create a new histogram
`pageserver_storage_operations_seconds_global` without tenant and
timeline dimensions and replace `pageserver_storage_operations_seconds`
with sum and counter.
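
To illustrate why this reduces the number of exported series, here is a small sketch: one global histogram plus a (sum, count) pair per timeline. The type and method names are hypothetical, and four buckets stand in for the 42 mentioned above.

```rust
use std::collections::HashMap;

/// Upper bounds (seconds) of the global histogram buckets.
/// Four buckets keep the sketch short; the commit mentions 42.
const BUCKETS: [f64; 4] = [0.001, 0.01, 0.1, 1.0];

#[derive(Default)]
pub struct StorageOpMetrics {
    /// One histogram for the whole process; the last slot is the overflow bucket.
    global_buckets: [u64; BUCKETS.len() + 1],
    /// Per (tenant, timeline): only a running sum and an observation count.
    per_timeline: HashMap<(String, String), (f64, u64)>,
}

impl StorageOpMetrics {
    pub fn observe(&mut self, tenant: &str, timeline: &str, seconds: f64) {
        let idx = BUCKETS
            .iter()
            .position(|&upper| seconds <= upper)
            .unwrap_or(BUCKETS.len());
        self.global_buckets[idx] += 1;
        let entry = self
            .per_timeline
            .entry((tenant.to_string(), timeline.to_string()))
            .or_insert((0.0, 0));
        entry.0 += seconds;
        entry.1 += 1;
    }

    pub fn global_bucket_counts(&self) -> &[u64] {
        &self.global_buckets
    }

    pub fn timeline_totals(&self, tenant: &str, timeline: &str) -> Option<(f64, u64)> {
        self.per_timeline
            .get(&(tenant.to_string(), timeline.to_string()))
            .copied()
    }

    /// Number of values this sketch would export: the global buckets plus
    /// two values (sum and count) per timeline.
    pub fn series_count(&self) -> usize {
        self.global_buckets.len() + 2 * self.per_timeline.len()
    }
}
```

With a per-timeline histogram the exported series grow as buckets × timelines; in this layout they grow as buckets + 2 × timelines.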

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
2023-01-30 17:10:29 +02:00
Kirill Bulatov
c61bc25ef9 Clean up NeedsDownload error (#3464) 2023-01-30 16:08:23 +02:00
Shany Pozin
ddb9c2fe94 Add metrics for tenants state (#3448)
## Describe your changes
Added a metric that allows monitoring tenant state
## Issue ticket number and link
https://github.com/neondatabase/neon/issues/3161

## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [X] I have added an e2e test for it.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-01-29 14:04:06 +02:00
Shany Pozin
67d418e91c Set the last_record_gauge to the value which was persisted metadata (#3460)
## Describe your changes
Whenever a tenant is detached or the pageserver is restarted, the
pageserver_last_record_lsn metric is dropped.
This fix restores the value from the metadata whenever the tenant is
attached again.
## Issue ticket number and link
[3571](https://github.com/neondatabase/cloud/issues/3571)
## Checklist before requesting a review
- [X] I have performed a self-review of my code.
- [ ] If it is a core feature, I have added thorough tests.
- [ ] Do we need to implement analytics? if so did you add the relevant
metrics to the dashboard?
- [ ] If this PR requires public announcement, mark it with
/release-notes label and add several sentences in this section.
2023-01-29 12:40:50 +02:00
Konstantin Knizhnik
c5ca7d0c68 Implement asynchronous pipe for communication with walredo process (#3368)
Co-authored-by: Christian Schwarz <christian@neon.tech>
2023-01-27 18:36:24 +02:00
Joonas Koivunen
0ec84e2f1f Allow creating config for attached tenant (#3446)
Currently `attach` doesn't write a tenant config, because we don't back
it up in the first place. As a result, changing a tenant's configuration
through the HTTP API fails for attached tenants: the current
implementation of `Tenant::persist_tenant_config` opens the file with
`OpenOptions::truncate(true).write(true).create_new(false)`, and the file
was never created on attach.

I think this patch is the least controversial middle ground: it
*enables* changing tenant configuration even for attached tenants (not
just created tenants).
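
The failure mode can be reproduced with the standard library alone. In the sketch below, `overwrite_existing` uses the option combination quoted in the commit message, while `overwrite_or_create` shows one possible way to tolerate a missing file. Both function names are hypothetical, and the second is not necessarily how the patch solves it.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Same options as quoted above: neither `create` nor `create_new` is set
/// to true, so opening fails with `NotFound` when the file does not exist.
pub fn overwrite_existing(path: &Path, contents: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .truncate(true)
        .write(true)
        .create_new(false)
        .open(path)?;
    file.write_all(contents)
}

/// Assumed alternative: `create(true)` makes the file if it is missing
/// and truncates it if it already exists.
pub fn overwrite_or_create(path: &Path, contents: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .truncate(true)
        .write(true)
        .create(true)
        .open(path)?;
    file.write_all(contents)
}
```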
2023-01-27 15:34:59 +02:00
Christian Schwarz
99399c112a move walreceiver module under timeline
Walreceiver is a per-timeline abstraction. Move it there to reflect
the hierarchy of abstractions and task_mgr tasks.
The code that sets up the global storage_broker client
is not timeline-scoped. So, break it out into a separate module.

The motivation for this change is to prepare the code base for replacing
the task_mgr global task registry with a more ownership-oriented
approach to manage task lifetimes.

I removed TaskStateUpdate::Init because, after doing the changes,
rustc warned that it was never constructed.
A quick search through the commit history shows that this
has always been true since

    commit fb68d01449
    Author: Dmitry Rodionov <dmitry@neon.tech>
    Date:   Mon Sep 26 23:57:02 2022 +0300

        Preserve task result in TaskHandle by keeping join handle around (#2521)

So, the warning is not an indication of some accidental code removal.

This is PR: https://github.com/neondatabase/neon/pull/3456
2023-01-27 12:23:17 +01:00
Heikki Linnakangas
bf63f129ae Make 'branch_timeline' function more clear.
Change the signature so that it takes an Arc<Timeline> reference to the
source timeline, instead of just the ID. All the callers have an Arc
reference at hand, so this is more convenient for everyone.

Reorder the code a bit and improve the comments, to make it more clear
what it does and why.
2023-01-27 02:12:07 +02:00
Christian Schwarz
dc64962ffc tenant::mgr: explicit tracking of initializing & shutting-down states
This patch wraps the tenants hashmap in an enum that represents the
tenant manager's three major states:
- Initializing
- Open for business
- Shutting down.
See the enum doc comments for details.

In response, all the users of `TENANTS` are now forced to distinguish
those states.
The only major change is in `run_if_no_tenant_in_memory`, which,
before this patch, was used by the /attach and /load endpoints.
This patch rewrites that method under the name `tenant_map_insert`,
replacing the anyhow::Result with a std Result and a dedicated error
type.
Introducing this error type allows using `tenant_map_insert` in
`tenant_create`, thereby unifying all code paths that create tenant
objects to use `tenant_map_insert`.

This is beneficial because we can now systematically prevent tenants
from being created, attached, or `/load`ed during pageserver shutdown.
The management API remains available, but the endpoints that create
new tenants will fail with an error.
More work would need to be done to properly distinguish these errors
through HTTP status codes such as 503.
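
A simplified sketch of the state enum and the insert path described in this commit message. The variant names, the error type, and the `begin_shutdown` helper are assumptions for illustration; the real `tenant_map_insert` has a different signature.

```rust
use std::collections::HashMap;

pub type TenantId = u64;

/// Hypothetical placeholder for the real tenant object.
#[derive(Debug)]
pub struct Tenant {
    pub id: TenantId,
}

/// The tenant manager's three major states.
pub enum TenantsMap {
    Initializing,
    Open(HashMap<TenantId, Tenant>),
    ShuttingDown(HashMap<TenantId, Tenant>),
}

/// Dedicated error type instead of an opaque anyhow error.
#[derive(Debug, PartialEq)]
pub enum TenantMapInsertError {
    StillInitializing,
    ShuttingDown,
    TenantAlreadyExists(TenantId),
}

/// Every code path that creates a tenant goes through here, so creation
/// is refused uniformly outside the `Open` state.
pub fn tenant_map_insert(map: &mut TenantsMap, id: TenantId) -> Result<(), TenantMapInsertError> {
    match map {
        TenantsMap::Initializing => Err(TenantMapInsertError::StillInitializing),
        TenantsMap::ShuttingDown(_) => Err(TenantMapInsertError::ShuttingDown),
        TenantsMap::Open(tenants) => {
            if tenants.contains_key(&id) {
                return Err(TenantMapInsertError::TenantAlreadyExists(id));
            }
            tenants.insert(id, Tenant { id });
            Ok(())
        }
    }
}

/// Move the existing tenants into the `ShuttingDown` state.
pub fn begin_shutdown(map: &mut TenantsMap) {
    let old = std::mem::replace(map, TenantsMap::Initializing);
    *map = match old {
        TenantsMap::Open(t) | TenantsMap::ShuttingDown(t) => TenantsMap::ShuttingDown(t),
        TenantsMap::Initializing => TenantsMap::ShuttingDown(HashMap::new()),
    };
}
```

Because the error is a plain enum, an HTTP handler could later map `ShuttingDown` to a 503 with a simple `match`.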
2023-01-26 11:24:48 +01:00
bojanserafimov
0a09589403 Increase gc period to 1h (#3432) 2023-01-25 15:18:41 -05:00
Christian Schwarz
01b4b0c2f3 Introduce RequestContext
Motivation
==========

Layer Eviction Needs Context
----------------------------

Before we start implementing layer eviction, we need to collect some
access statistics per layer file or maybe even page.
Part of these statistics should be the initiator of a page read request
to answer the question of whether it was page_service vs. one of the
background loops, and if the latter, which of them?

Further, it would be nice to learn more about what activity in the pageserver
initiated an on-demand download of a layer file.
We will use this information to test out layer eviction policies.

Read more about the current plan for layer eviction here:
https://github.com/neondatabase/neon/issues/2476#issuecomment-1370822104

task_mgr problems + cancellation + tenant/timeline lifecycle
------------------------------------------------------------

Apart from layer eviction, we have long-standing problems with task_mgr,
task cancellation, and various races around tenant / timeline lifecycle
transitions.
One approach to solve these is to abandon task_mgr in favor of a
mechanism similar to Golang's context.Context, albeit extended to
support waiting for completion, and specialized to the needs in the
pageserver.

Heikki solves all of the above at once in PR
https://github.com/neondatabase/neon/pull/3228 , which is not yet
merged at the time of writing.

What Is This Patch About
========================

This patch addresses the immediate needs of layer eviction by
introducing a `RequestContext` structure that is plumbed through the
pageserver - all the way from the various entrypoints (page_service,
management API, tenant background loops) down to
Timeline::{get,get_reconstruct_data}.

The struct carries a description of the kind of activity that initiated
the call. We re-use task_mgr::TaskKind for this.

Also, it carries the desired on-demand download behavior of the entrypoint.
Timeline::get_reconstruct_data can then log the TaskKind that initiated
the on-demand download.
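
As a rough illustration of the two pieces of information the struct carries, consider the sketch below. The `TaskKind` variants, the `Download` and `Warn` variants of `DownloadBehavior`, and the `on_missing_layer` function are all hypothetical; see context.rs in the patch for the real definitions.

```rust
/// Hypothetical subset of task kinds that can initiate a page read.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskKind {
    PageRequestHandler,
    MgmtRequest,
    Compaction,
}

/// What the entrypoint wants to happen when a needed layer is not local.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DownloadBehavior {
    Download,
    Warn,
    Error,
}

#[derive(Debug)]
pub struct RequestContext {
    task_kind: TaskKind,
    download_behavior: DownloadBehavior,
}

impl RequestContext {
    pub fn new(task_kind: TaskKind, download_behavior: DownloadBehavior) -> Self {
        Self { task_kind, download_behavior }
    }

    pub fn task_kind(&self) -> TaskKind {
        self.task_kind
    }
}

/// Stand-in for the point on the read path where a layer is found to be
/// remote. The returned message names the initiating task kind.
pub fn on_missing_layer(ctx: &RequestContext, layer: &str) -> Result<String, String> {
    let kind = ctx.task_kind();
    match ctx.download_behavior {
        DownloadBehavior::Download => Ok(format!("{kind:?} initiated on-demand download of {layer}")),
        DownloadBehavior::Warn => Ok(format!("warning: {kind:?} unexpectedly needs {layer}, downloading")),
        DownloadBehavior::Error => Err(format!("{kind:?} is not allowed to download {layer}")),
    }
}
```

Since the context is passed by reference down the call chain, the read path can attribute a download to its initiator without any global state.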

I developed this patch by git-checking-out Heikki's big RequestContext
PR https://github.com/neondatabase/neon/pull/3228 , then deleting all
the functionality that we do not need to address the needs for layer
eviction.

After that, I added a few things on top:

1. The concept of attached_child and detached_child in preparation for
   cancellation signalling through RequestContext, which will be added in
   a future patch.
2. A kill switch to turn DownloadBehavior::Error into a warning.
3. Renamed WalReceiverConnection to WalReceiverConnectionPoller and
   added an additional TaskKind WalReceiverConnectionHandler. These were
   necessary to create proper detached_child-type RequestContexts for the
   various tasks that walreceiver starts.

How To Review This Patch
========================

Start your review with the module-level comment in context.rs.
It explains the idea of RequestContext, what parts of it are implemented
in this patch, and the future plans for RequestContext.

Then review the various `task_mgr::spawn` call sites. At each of them,
we should be creating a new detached_child RequestContext.

Then review the (few) RequestContext::attached_child call sites and
ensure that the spawned tasks do not outlive the task that spawns them.
If they do, these call sites should use detached_child() instead.

Then review the todo_child() call sites and judge whether it's worth the
trouble of plumbing through a parent context from the caller(s).

Lastly, go through the bulk of mechanical changes that simply forward
the &ctx.
2023-01-25 14:53:30 +01:00
Kirill Bulatov
572332ab50 Tone down page_service timeouts (#3426)
Closes https://github.com/neondatabase/neon/issues/3341
2023-01-25 13:40:08 +02:00
Vadim Kharitonov
bc4f594ed6 Fix Sentry Version 2023-01-25 12:07:38 +01:00
Kirill Bulatov
ea6f41324a Tone down postgres client io errors (#3435)
Closes https://github.com/neondatabase/neon/issues/3343
2023-01-25 10:50:33 +00:00
Kirill Bulatov
1c3636d848 Tone down walreceiver connection timeout errors (#3425)
Closes https://github.com/neondatabase/neon/issues/3342
2023-01-24 18:03:33 +02:00
Kirill Bulatov
0c16ad8591 Tone down broker subscription errors 2023-01-24 17:23:33 +02:00
Christian Schwarz
0b673c12d7 timeline: don't transition Active=>Active during pageserver startup
Before this patch, when `initialize_with_lock` was called via
`timeline_init_and_sync`, we would transition the timeline like so:

    load_local_timeline/load_remote_timeline:
        timeline_init_and_sync
            Timeline::new
                () => Loading
            initialize_with_lock:
                set_state(Active)
                    Loading => Active
        timeline.activate()
            Active => Active
2023-01-24 15:56:02 +01:00
Christian Schwarz
7a333cfb12 be noisy about unexpected Timeline state transitions 2023-01-24 15:56:02 +01:00
Christian Schwarz
f7ec33970a add doc comment that outlines which tokio tasks walreceiver creates 2023-01-24 15:23:48 +01:00
Joonas Koivunen
98d0a0d242 fix(http): omit needless string allocs (#3421)
Drive-by fix noticed while working on #3419.
2023-01-24 14:53:39 +02:00
Joonas Koivunen
f74080cbad feat(http): support ?inputs_only=true for tenant_size (#3419)
This makes debugging problematic cases easier in the future, as we can
just request the model inputs and use them locally to reproduce the issue
with the model.
2023-01-24 13:57:13 +02:00
Christian Schwarz
55c184fcd7 fix some anyhow::Context::context calls that should use with_context(format!(...))
Noticed this while combing through some production logs.
2023-01-24 12:22:33 +01:00
Christian Schwarz
6b6570b580 remove TimelineState::Suspended, introduce TimelineState::Loading
The TimelineState::Suspended was dubious to begin with. I suppose
that the intention was that timelines could transition back and
forth between Active and Suspended states.
But practically, the code before this patch never did that.
The transitions were:

    () ==Timeline::new==> Suspended ==*==> {Active,Broken,Stopping}

One exception: Tenant::set_stopping() could transition timelines like
so:

    !Broken ==Tenant::set_stopping()==> Suspended

But Tenant itself cannot transition from stopping state to any other
state.

Thus, this patch removes TimelineState::Suspended and introduces a new
state Loading. The aforementioned transitions change as follows:

    - () ==Timeline::new==> Suspended ==*==> {Active,Broken,Stopping}
    + () ==Timeline::new==> Loading   ==*==> {Active,Broken,Stopping}

    - !Broken ==Tenant::set_stopping()==> Suspended
    + !Broken ==Tenant::set_stopping()==> Stopping

Walreceiver's connection manager loop watches TimelineState to decide
whether it should retry connecting, or exit.
This patch changes the loop to exit when it observes the transition
into Stopping state.
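
The transitions above can be written down as a small state enum. This is a sketch, not the pageserver's code: `LoopAction` and `connection_manager_step` are hypothetical, and exiting on `Broken` is an assumption, since the message only specifies the `Stopping` case.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimelineState {
    /// State right after Timeline::new (replaces the old Suspended).
    Loading,
    Active,
    Stopping,
    Broken,
}

/// `!Broken ==Tenant::set_stopping()==> Stopping`
pub fn set_stopping(state: TimelineState) -> TimelineState {
    match state {
        TimelineState::Broken => TimelineState::Broken,
        _ => TimelineState::Stopping,
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum LoopAction {
    /// Not active yet: do not connect, wait for the next state change.
    Wait,
    /// Timeline is active: (re)try connecting.
    Connect,
    /// Leave the connection manager loop.
    Exit,
}

/// Decide what the connection manager loop does for an observed state.
pub fn connection_manager_step(state: TimelineState) -> LoopAction {
    match state {
        TimelineState::Loading => LoopAction::Wait,
        TimelineState::Active => LoopAction::Connect,
        TimelineState::Stopping => LoopAction::Exit,
        // Assumption: a broken timeline is treated like a stopping one.
        TimelineState::Broken => LoopAction::Exit,
    }
}
```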

Walreceiver isn't supposed to be started until the timeline transitions
into Active state. So, this patch also adds some warn!() messages
in case this happens anyway.
2023-01-23 17:22:49 +01:00
Joonas Koivunen
7704caa3ac More tenant size fixes (#3410)
Small changes, but hopefully this will help with the panic detected in
staging, for which we cannot get the debugging information right now
(end-of-branch before branch-point).
2023-01-23 17:12:51 +02:00
Konstantin Knizhnik
5c865f46ba Fix slru_segment_key_range function: segno was assigned to incorrect Key field (#3354) 2023-01-23 10:51:09 +02:00
bojanserafimov
a3d7ad2d52 Implement layer map using immutable BST (#2998) 2023-01-20 16:10:12 -05:00
Anastasia Lubennikova
36f048d6b0 Fix tenant size orphans (#3377)
Previously, only the timelines that had passed the `gc_horizon` were
processed, which failed with orphans at the tree_sort phase. An example
input is in the added `test_branched_empty_timeline_size` test case.

The PR changes iteration to happen through all timelines, and in
addition to that, any learned branch points will be calculated as they
would have been in the original implementation if the ancestor branch had
been over the `gc_horizon`.

This also changes how tenants where all timelines are below `gc_horizon`
are handled. Previously tenant_size 0 was returned, but now they will
have approximately `initdb_lsn` worth of tenant_size.

The PR also adds several new tenant size tests that describe various corner
cases of branching structure and `gc_horizon` setting.
They are currently disabled to not consume time during CI.

Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
2023-01-20 20:21:36 +02:00
Joonas Koivunen
58fb6fe861 fix: dont stop pageserver if we fail to calculate synthetic size 2023-01-20 19:55:19 +02:00
Christian Schwarz
8ba1699937 Revert "Use actual temporary dir for pageserver unit tests"
This reverts commit 826e89b9ce.

The problem with that commit was that it deletes the TempDir while
there are still EphemeralFile instances open.

At first I thought this could be fixed by simply adding

  Handle::current().block_on(task_mgr::shutdown(None, Some(tenant_id), None))

to TenantHarness::drop, but it turned out to be insufficient.

So, reverting the commit until we find a proper solution.

refs https://github.com/neondatabase/neon/issues/3385
2023-01-19 20:16:56 +01:00
bojanserafimov
a9bd05760f Improve layer map docstrings (#3382) 2023-01-19 10:29:15 -05:00
Kirill Bulatov
90f66aa51b Enable logs in unit tests 2023-01-18 17:43:27 +02:00
Kirill Bulatov
826e89b9ce Use actual temporary dir for pageserver unit tests 2023-01-18 17:43:27 +02:00
Kirill Bulatov
c6b56d2967 Add more io::Error context when fail to operate on a path (#3254)
I have a test failure that shows 

```
Caused by:
    0: Failed to reconstruct a page image:
    1: Directory not empty (os error 39)
```

but does not really show where exactly that happens.

https://neon-github-public-dev.s3.amazonaws.com/reports/pr-3227/release/3823785365/index.html#categories/c0057473fc9ec8fb70876fd29a171ce8/7088dab272f2c7b7/?attachment=60fe6ed2add4d82d

The PR aims to add more context in debugging that issue.
2023-01-17 22:07:38 +02:00
Kirill Bulatov
1ebd145c29 Actualize the comment (#3362)
Follow-up of
https://github.com/neondatabase/neon/pull/3326#issuecomment-1384265759
2023-01-17 13:30:42 +02:00
Christian Schwarz
48dd9565ac TaskHandle: tone down sender is dropped while join handle is still alive
Rationale: see comments added as part of this commit.

fixes https://github.com/neondatabase/neon/issues/3339
2023-01-17 09:42:22 +01:00
Christian Schwarz
58c8c1076c download_all_remote_layers API: require client to specify max_concurrent_downloads
Before this patch, we would start all layer downloads simultaneously.

There is at most one download_all_remote_layers task per timeline.
Hence, the specified limit is per timeline.

There is still no global concurrency limit for layer downloads.
We'll have to revisit that at some point and also prioritize on-demand
initiated downloads over download_all_remote_layers downloads.
But that's for another day.
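
A per-call concurrency cap can be sketched with scoped threads from the standard library. The pageserver itself is async, so this is only a thread-based stand-in for the idea; `download_all` and its closure parameter are hypothetical, and only the parameter name `max_concurrent_downloads` comes from the commit.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Run `download` for every layer with at most `max_concurrent_downloads`
/// downloads in flight. Returns the peak number observed in flight.
pub fn download_all<F>(layers: Vec<String>, max_concurrent_downloads: usize, download: F) -> usize
where
    F: Fn(&str) + Sync,
{
    assert!(max_concurrent_downloads > 0, "the limit must be at least 1");
    let queue = Mutex::new(VecDeque::from(layers));
    let in_flight = AtomicUsize::new(0);
    let peak = AtomicUsize::new(0);

    // The number of workers is the limit: each worker handles one layer
    // at a time and pulls the next one from the shared queue.
    thread::scope(|s| {
        for _ in 0..max_concurrent_downloads {
            s.spawn(|| loop {
                // The lock guard is dropped at the end of this statement.
                let next = queue.lock().unwrap().pop_front();
                let Some(layer) = next else { break };
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                download(&layer);
                in_flight.fetch_sub(1, Ordering::SeqCst);
            });
        }
    });
    peak.load(Ordering::SeqCst)
}
```

Calling this once per timeline yields a per-timeline limit only. A global limit would need a shared counter or semaphore across calls, which matches the open item noted above.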
2023-01-16 19:29:06 +01:00
Joonas Koivunen
a8a9bee602 walredo: simple tests and bench updates (#3045)
Separated from #2875.

The microbenchmark has been validated to show a similar difference to
the larger-scale OLTP benchmark.
2023-01-16 18:24:45 +02:00
Anastasia Lubennikova
2cbe84b78f Proxy metrics (#3290)
Implement proxy metrics collection.
Only collect metric for outbound traffic.

Add proxy CLI parameters:
- metric-collection-endpoint
- metric-collection-interval.

Add test_proxy_metric_collection test.

Move shared consumption metrics code to libs/consumption_metrics.
Refactor the code.
2023-01-16 15:17:28 +00:00
Kirill Bulatov
bce4233d3a Rework Cargo.toml dependencies (#3322)
* Use workspace variables from cargo, coming with rustc
[1.64](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1640-2022-09-22)

See
https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-package-table
and
https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-dependencies-table
sections.

Now, all dependencies in all non-root `Cargo.toml` files are defined as 
```
clap.workspace = true
```

or, when extra features are needed, as
```
bytes = {workspace = true, features = ['serde'] }
```

With the actual declarations (with shared features and version
numbers/file paths/etc.) in the root Cargo.toml.
Features are additive:

https://doc.rust-lang.org/nightly/cargo/reference/specifying-dependencies.html#inheriting-a-dependency-from-a-workspace

* Uses the mechanism above to set a common edition (2021) and license across the
workspace

* Mechanically bumps a few dependencies

* Updates hakari format, as it suggested:
```
work/neon/neon kb/cargo-templated ❯ cargo hakari generate
info: no changes detected
info: new hakari format version available: 3 (current: 2)
(add or update `dep-format-version = "3"` in hakari.toml, then run `cargo hakari generate && cargo hakari manage-deps`)
```
2023-01-13 18:13:34 +02:00
Kirill Bulatov
99808558de Avoid duplicate timeline insert (#3326)
`initialize_with_lock` inserts `Arc<Timeline>` before returning it:
c1731bc4f0/pageserver/src/tenant.rs (L222)

but `setup_timeline` function did another insert, which got removed in this PR:
c1731bc4f0/pageserver/src/tenant.rs (L486)


On top, a better comment and function renames are added.
2023-01-13 12:05:54 +00:00
Anastasia Lubennikova
c6d383e239 code cleanup 2023-01-13 11:51:28 +02:00
Anastasia Lubennikova
5e3e0fbf6f remove unneeded Cargo.lock changes 2023-01-13 11:51:28 +02:00
Anastasia Lubennikova
26f39c03f2 review code cleanup:
- handle errors in calculate_synthetic_size_worker. Don't exit the bgworker if one tenant failed.

- add cached_synthetic_tenant_size to cache values calculated by the bgworker

- code cleanup: remove unneeded info! messages, clean comments

- handle collect_metrics_task() error. Don't exit collect_metrics worker if one task failed.

- add unit test to cover the case when we have multiple branches at the same lsn
2023-01-13 11:51:28 +02:00