Commit Graph

117 Commits

Author SHA1 Message Date
Heikki Linnakangas
b513619503 Remove obsolete 'awaits_download' field.
It used to be a separate piece of state, but after 9a6c0be823 it's just
an alias for the Tenant being in Attaching state. It was only used in
one assertion in a test, but that check doesn't make sense anymore, so
just remove it.

Fixes https://github.com/neondatabase/neon/issues/2930
2022-12-07 13:13:54 +02:00
Kirill Bulatov
d6bfe955c6 Add commands to unload and load the tenant in memory (#2977)
Closes https://github.com/neondatabase/neon/issues/2537

Follow-up of https://github.com/neondatabase/neon/pull/2950
With the new model that prevents attaching without the remote storage,
it has started to be even more odd to add attach-with-files
functionality (in addition to the issues raised previously).

Adds two separate commands:
* `POST {tenant_id}/ignore` that places a mark file to skip such tenant
on every start and removes it from memory
* `POST {tenant_id}/schedule_load` that tries to load a tenant from
local FS similar to what pageserver does now on startup, but without
directory removals
2022-12-06 15:30:02 +00:00
Kirill Bulatov
38af453553 Use async RwLock around tenants (#3009)
A step towards more async code in our repo, to help avoid most of the
odd blocking calls, that might deadlock, as mentioned in
https://github.com/neondatabase/neon/issues/2975
2022-12-05 22:48:45 +02:00
Heikki Linnakangas
9a6c0be823 storage_sync2
The code in this change was extracted from PR #2595, i.e., Heikki’s draft
PR for on-demand download.

High-Level Changes

- storage_sync module rewrite
- Changes to Tenant Loading
- Changes to Timeline States
- Crash-safe & Resumable Tenant Attach

There are several follow-up work items planned.
Refer to the Epic issue on GitHub:
https://github.com/neondatabase/neon/issues/2029

Metadata:

closes https://github.com/neondatabase/neon/pull/2785

unsquashed history of this patch: archive/pr-2785-storage-sync2/pre-squash

Co-authored-by: Dmitry Rodionov <dmitry@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>

===============================================================================

storage_sync module rewrite
===========================

The storage_sync code is rewritten. New module name is storage_sync2, mostly to
make a more reasonable git diff.

The updated block comment in storage_sync2.rs describes the changes quite well,
so, we will not reproduce that comment here. TL;DR:
- Global sync queue and RemoteIndex are replaced with per-timeline
  `RemoteTimelineClient` structure that contains a queue for UploadOperations
  to ensure proper ordering and necessary metadata.
- Before deleting local layer files, wait for ongoing UploadOps to finish
  (wait_completion()).
- Download operations are not queued and executed immediately.

Changes to Tenant Loading
=========================

Initial sync part was rewritten as well and represents the other major change
that serves as a foundation for on-demand downloads. Routines for attaching and
loading shifted directly to Tenant struct and now are asynchronous and spawned
into the background.

Since this patch doesn’t introduce on-demand download of layers we fully
synchronize with the remote during pageserver startup. See details in
`Timeline::reconcile_with_remote` and `Timeline::download_missing`.

Changes to Tenant States
========================

The “Active” state has lost its “background_jobs_running: bool” member. That
variable indicated whether the GC & Compaction background loops are spawned or
not. With this patch, they are now always spawned. Unit tests (#[test]) use the
TenantConf::{gc_period,compaction_period} to disable their effect (15db566).

This patch introduces a new tenant state, “Attaching”. A tenant that is being
attached starts in this state and transitions to “Active” once it finishes
download.

The `GET /tenant` endpoints returns `TenantInfo::has_in_progress_downloads`. We
derive the value for that field from the tenant state now, to remain
backwards-compatible with cloud.git. We will remove that field when we switch
to on-demand downloads.

Changes to Timeline States
==========================

The TimelineInfo::awaits_download field is now equivalent to the tenant being
in Attaching state.  Previously, download progress was tracked per timeline.
With this change, it’s only tracked per tenant. When on-demand downloads
arrive, the field will be completely obsolete.  Deprecation is tracked in
isuse #2930.

Crash-safe & Resumable Tenant Attach
====================================

Previously, the attach operation was not persistent. I.e., when tenant attach
was interrupted by a crash, the pageserver would not continue attaching after
pageserver restart. In fact, the half-finished tenant directory on disk would
simply be skipped by tenant_mgr because it lacked the metadata file (it’s
written last). This patch introduces an “attaching” marker file inside that is
present inside the tenant directory while the tenant is attaching. During
pageserver startup, tenant_mgr will resume attach if that file is present. If
not, it assumes that the local tenant state is consistent and tries to load the
tenant. If that fails, the tenant transitions into Broken state.
2022-11-29 18:55:20 +01:00
Heikki Linnakangas
1f1324ebed Require tenant to be active when calculating tenant size.
It's not clear if the calculation would work or make sense, if the
tenant is only partially loaded. Let's play it safe, and require it to
be Active.
2022-11-29 14:10:45 +02:00
Joonas Koivunen
f277140234 Small fixes (#2949)
Nothing interesting in these changes. Passing through the
RUST_BACKTRACE=full will hopefully save someone else panick reproduction
time.

Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
2022-11-29 10:29:25 +02:00
Egor Suvorov
1ca76776d0 pageserver: require management permissions on HTTP /status 2022-11-25 04:17:42 +03:00
Egor Suvorov
2ce5d8137d Separate permission checks for Pageserver and Safekeeper
There will be different scopes for those two, so authorization code should be different.

The `check_permission` function is now not in the shared library. Its implementation
is very similar to the one which will be added for Safekeeper. In fact, we may reuse
the same existing root-like 'PageServerApi' scope, but I would prefer to have separate
root-like scopes for services.

Also, generate_management_token in tests is generate_pageserver_token now.
2022-11-25 04:17:42 +03:00
Egor Suvorov
b6989e8928 pageserver: make wal_source_connstring: String a 'wal_source_connconf: PgConnectionConfig` 2022-11-24 14:02:23 +03:00
Alexey Kondratov
4bf3087aed [pageserver] list latest_gc_cutoff_lsn in the OpenAPI spec (#2894)
It seems that it's present in the API response for quite a while. It's
just not listed in the spec, fix it.
2022-11-22 21:10:49 +01:00
bojanserafimov
7fd88fab59 Trace read requests (#2762) 2022-11-10 16:43:04 -05:00
Heikki Linnakangas
7b7f84f1b4 Refactor layer flushing task
Extracted from https://github.com/neondatabase/neon/pull/2595
2022-11-06 13:42:18 +02:00
Joonas Koivunen
cf68963b18 Add initial tenant sizing model and a http route to query it (#2714)
Tenant size information is gathered by using existing parts of
`Tenant::gc_iteration` which are now separated as
`Tenant::refresh_gc_info`. `Tenant::refresh_gc_info` collects branch
points, and invokes `Timeline::update_gc_info`; nothing was supposed to
be changed there. The gathered branch points (through Timeline's
`GcInfo::retain_lsns`), `GcInfo::horizon_cutoff`, and
`GcInfo::pitr_cutoff` are used to build up a Vec of updates fed into the
`libs/tenant_size_model` to calculate the history size.

The gathered information is now exposed using `GET
/v1/tenant/{tenant_id}/size`, which which will respond with the actual
calculated size. Initially the idea was to have this delivered as tenant
background task and exported via metric, but it might be too
computationally expensive to run it periodically as we don't yet know if
the returned values are any good.

Adds one new metric:
- pageserver_storage_operations_seconds with label `logical_size`
    - separating from original `init_logical_size`

Adds a pageserver wide configuration variable:
- `concurrent_tenant_size_logical_size_queries` with default 1

This leaves a lot of TODO's, tracked on issue #2748.
2022-11-03 12:39:19 +00:00
Christian Schwarz
b154992510 timeline_list_handler: avoid spawn_blocking
As per https://github.com/neondatabase/neon/issues/2731#issuecomment-1299335813

refs https://github.com/neondatabase/neon/issues/2731
2022-11-02 16:22:58 +01:00
Christian Schwarz
590f894db8 tenant_status: remove unnecessary spawn_blocking
The spawn_blocking is pointless in this cases: get_tenant is not
expected to block for any meaningful amount of time. There are
get_tenant calls in most other functions in the file too, and they don't
bother with spawn_blocking. Let's remove the spawn_blocking from
tenant_status, too, to be consistent.

fixes https://github.com/neondatabase/neon/issues/2731
2022-11-02 16:22:58 +01:00
Kirill Bulatov
5928cb33c5 Introduce timeline state (#2651)
Similar to https://github.com/neondatabase/neon/pull/2395, introduces a state field in Timeline, that's possible to subscribe to.

Adjusts

* walreceiver to not to have any connections if timeline is not Active
* remote storage sync to not to schedule uploads if timeline is Broken
* not to create timelines if a tenant/timeline is broken
* automatically switches timelines' states based on tenant state

Does not adjust timeline's gc, checkpointing and layer flush behaviour much, since it's not safe to cancel these processes abruptly and there's task_mgr::shutdown_tasks that does similar thing.
2022-10-21 15:51:48 +00:00
bojanserafimov
7734929a82 Remove stale todos (#2630) 2022-10-19 22:59:22 +00:00
Joonas Koivunen
c709354579 Add layer sizes to index_part.json (#2582)
This is the first step in verifying layer files. Next up on the road is
hashing the files and verifying the hashes.

The metadata additions do not require any migration. The idea is that
the change is backward and forward-compatible with regard to
`index_part.json` due to the softness of JSON schema and the
deserialization options in use.

New types added:

- LayerFileMetadata for tracking the file metadata
    - starting with only the file size
    - in future hopefully a sha256 as well
- IndexLayerMetadata, the serialized counterpart of LayerFileMetadata

LayerFileMetadata needing to have all fields Option is a problem but
that is not possible to handle without conflicting a lot more with other
ongoing work.

Co-authored-by: Kirill Bulatov <kirill@neon.tech>
2022-10-17 12:21:04 +03:00
Heikki Linnakangas
9c24de254f Add description and license fields to OpenAPI spec.
These were added earlier to the control plane's copy of this file.
This is the master version of this file, so let's keep it in sync.
2022-10-14 18:37:58 +03:00
Heikki Linnakangas
538876650a Merge 'local' and 'remote' parts of TimelineInfo into one struct.
The 'local' part was always filled in, so that was easy to merge into
into the TimelineInfo itself. 'remote' only contained two fields,
'remote_consistent_lsn' and 'awaits_download'. I made
'remote_consistent_lsn' an optional field, and 'awaits_download' is now
false if the timeline is not present remotely.

However, I kept stub versions of the 'local' and 'remote' structs for
backwards-compatibility, with a few fields that are actively used by
the control plane. They just duplicate the fields from TimelineInfo
now. They can be removed later, once the control plane has been
updated to use the new fields.
2022-10-14 18:37:14 +03:00
Heikki Linnakangas
500239176c Make TimelineInfo.local field mandatory.
It was only None when you queried the status of a timeline with
'timeline_detail' mgmt API call, and it was still being downloaded. You
can check for that status with the 'tenant_status' API call instead,
checking for has_in_progress_downloads field.

Anothere case was if an error happened while trying to get the current
logical size, in a 'timeline_detail' request. It might make sense to
tolerate such errors, and leave the fields we cannot fill in as empty,
None, 0 or similar, but it doesn't make sense to me to leave the whole
'local' struct empty in tht case.
2022-10-14 18:37:14 +03:00
Heikki Linnakangas
47366522a8 Make the return type of 'list_timelines' simpler.
It's enough to return just the Timeline references. You can get the
timeline's ID easily from Timeline.
2022-10-11 17:47:41 +03:00
Kirill Bulatov
01d2c52c82 Tidy up feature reporting 2022-10-09 08:21:11 +03:00
Andrés
47bae68a2e Make get_lsn_by_timestamp available in mgmt API (#2536) (#2560)
Co-authored-by: andres <andres.rodriguez@outlook.es>
2022-10-06 12:42:50 +03:00
sharnoff
580584c8fc Remove control_plane deps on pageserver/safekeeper (#2513)
Creates new `pageserver_api` and `safekeeper_api` crates to serve as the
shared dependencies. Should reduce both recompile times and cold compile
times.

Decreases the size of the optimized `neon_local` binary: 380M -> 179M.
No significant changes for anything else (mostly as expected).
2022-10-04 11:14:45 -07:00
Kirill Bulatov
d823e84ed5 Allow attaching tenants with zero timelines 2022-10-04 18:13:51 +03:00
Anastasia Lubennikova
ed6b75e301 show pg_version in create_timeline info span 2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
862902f9e5 Update readme and openapi spec 2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
5dddeb8d88 Use non-versioned pg_distrib dir 2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
03c606f7c5 Pass pg_version parameter to timeline import command.
Add pg_version field to LocalTimelineInfo.
Use pg_version in the export_import_between_pageservers script
2022-09-22 14:15:13 +03:00
Anastasia Lubennikova
86bf491981 Support pg 15
- Split postgres_ffi into two version specific files.
- Preserve pg_version in timeline metadata.
- Use pg_version in safekeeper code. Check for postgres major version mismatch.
- Clean up the code to use DEFAULT_PG_VERSION constant everywhere, instead of hardcoding.

-  Parameterize python tests: use DEFAULT_PG_VERSION env and pg_version fixture.
   To run tests using a specific PostgreSQL version, pass the DEFAULT_PG_VERSION environment variable:
   'DEFAULT_PG_VERSION='15' ./scripts/pytest test_runner/regress'
 Currently don't all tests pass, because rust code relies on the default version of PostgreSQL in a few places.
2022-09-22 14:15:13 +03:00
sharnoff
6f949e1556 Improve pageserver/safekeepeer HTTP API errors (#2461)
Part of the general work on improving pageserver logs.

Brief summary of changes:

* Remove `ApiError::from_err`
* Remove `impl From<anyhow::Error> for ApiError`
* Convert `ApiError::{BadRequest, NotFound}` to use `anyhow::Error`
  * Note: `NotFound` has more verbose formatting because it's more
    likely to have useful information for the receiving "user"
* Explicitly convert from `tokio::task::JoinError`s into
  `InternalServerError`s where appropriate

Also note: many of the places where errors were implicitly converted to
500s have now been updated to return a more appropriate error. Some
places where it's not yet possible to distinguish the error types have
been left as 500s.
2022-09-20 17:02:10 -07:00
Kirill Bulatov
6fc719db13 Merge timelines.rs with tenant.rs 2022-09-20 23:43:52 +03:00
sharnoff
4a3b3ff11d Move testing pageserver libpq cmds to HTTP api (#2429)
Closes #2422.

The APIs have been feature gated with the `testing_api!` macro so that
they return 400s when support hasn't been compiled in.
2022-09-20 11:28:12 -07:00
Kirill Bulatov
b8eb908a3d Rename old project name references 2022-09-14 08:14:05 +03:00
Dmitry Rodionov
1d53173e62 update openapi spec (tenant state has changed) 2022-09-13 21:53:59 +03:00
Kirill Bulatov
1a8c8b04d7 Merge Repository and Tenant entities, rework tenant background jobs 2022-09-13 15:39:39 +03:00
Heikki Linnakangas
40c845e57d Switch to async for all concurrency in the pageserver.
Instead of spawning helper threads, we now use Tokio tasks. There
are multiple Tokio runtimes, for different kinds of tasks. One for
serving libpq client connections, another for background operations
like GC and compaction, and so on. That's not strictly required, we
could use just one runtime, but with this you can still get an
overview of what's happening with "top -H".

There's one subtle behavior in how TenantState is updated. Before this
patch, if you deleted all timelines from a tenant, its GC and
compaction loops were stopped, and the tenant went back to Idle
state. We no longer do that. The empty tenant stays Active. The
changes to test_tenant_tasks.py are related to that.

There's still plenty of synchronous code and blocking. For example, we
still use blocking std::io functions for all file I/O, and the
communication with WAL redo processes is still uses low-level unix
poll(). We might want to rewrite those later, but this will do for
now. The model is that local file I/O is considered to be fast enough
that blocking - and preventing other tasks running in the same thread -
is acceptable.
2022-09-12 14:21:00 +03:00
Kirill Bulatov
c9e7c2f014 Ensure all temporary and empty directories and files are cleansed on pageserver startup 2022-09-09 16:36:45 +03:00
Heikki Linnakangas
35b4816f09 Turn GenericRemoteStorage into just a newtype around 'Arc<dyn RemoteStorage>'
We had a pattern like this:

    match remote_storage {
        GenericRemoteStorage::Local(storage) => {
            let source = storage.remote_object_id(&file_path)?;
            ...
            storage
                .function(&source, ...)
                .await
       },
       GenericRemoteStorage::S3(storage) => {
	    ... exact same code as for the Local case ...
       },

This removes the code duplication, by allowing you to call the functions
directly on GenericRemoteStorage.

Also change RemoveObjectId to be just a type alias for String. Now that
the callers of GenericRemoteStorage functions don't know whether they're
dealing with the LocalFs or S3 implementation, RemoveObjectId must be the
same type for both.
2022-09-08 19:59:42 +03:00
Heikki Linnakangas
65b592d4bd Remove deprecated management API for timeline detach.
It is no longer used anywhere.
2022-09-06 18:54:04 +03:00
Heikki Linnakangas
7a3e8bb7fb Make tracing span names consistent for mgmt API handlers. 2022-09-05 11:02:13 +03:00
Kirill Bulatov
2db20e5587 Remove [Un]Loaded timeline code (#2359) 2022-09-02 14:31:28 +03:00
Kirill Bulatov
f78a542cba Calculate timeline initial logical size in the background
Start the calculation on the first size request, return
partially calculated size during calculation, retry if failed.

Remove "fast" size init through the ancestor: the current approach is
fast enough for now and there are better ways to optimize the
calculation via incremental ancestor size computation
2022-09-02 14:31:28 +03:00
Kirill Bulatov
a4803233bb Remove RemoteObjectName and many remote storage generics in pageserver (#2360) 2022-08-30 22:19:52 +03:00
Heikki Linnakangas
1a666a01d6 Improve comments a little. 2022-08-23 12:58:54 +03:00
Heikki Linnakangas
d110d2c2fd Reorder permission checks in HTTP API call handlers.
Every handler function now follows the same pattern:

1. extract parameters from the call
2. check permissions
3. execute command.

Previously, we extracted some parameters before permission check and
some after. Let's be consistent.
2022-08-23 12:14:06 +03:00
Heikki Linnakangas
aaa60c92ca Use u64/i64 for logical size, comment on why to use signed i64.
usize/isize type corresponds to the CPU architecture's pointer width,
i.e. 64 bits on a 64-bit platform and 32 bits on a 32-bit platform.
The logical size of a database has nothing to do with the that, so
u64/i64 is more appropriate.

It doesn't make any difference in practice as long as you're on a
64-bit platform, and it's hard to imagine anyone wanting to run the
pageserver on a 32-bit platform, but let's be tidy.

Also add a comment on why we use signed i64 for the logical size
variable, even though size should never be negative. I'm not sure the
reasons are very good, but at least this documents them, and hints at
some possible better solutions.
2022-08-19 16:44:16 +03:00
Kirill Bulatov
c19b4a65f9 Remove Repository trait, rename LayeredRepository struct into Repository 2022-08-19 16:40:37 +03:00
Kirill Bulatov
8043612334 Remove Timeline trait, rename LayeredTimeline struct into Timeline 2022-08-19 16:40:37 +03:00