rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-08 22:12:56 +00:00

Author	SHA1	Message	Date
John Spray	85cd97af61	pageserver: add `InProgress` tenant map state, use a sync lock for the map (#5367 ) ## Problem Follows on from #5299 - We didn't have a generic way to protect a tenant undergoing changes: `Tenant` had states, but for our arbitrary transitions between secondary/attached, we need a general way to say "reserve this tenant ID, and don't allow any other ops on it, but don't try and report it as being in any particular state". - The TenantsMap structure was behind an async RwLock, but it was never correct to hold it across await points: that would block any other changes for all tenants. ## Summary of changes - Add the `TenantSlot::InProgress` value. This means: - Incoming administrative operations on the tenant should retry later - Anything trying to read the live state of the tenant (e.g. a page service reader) should retry later or block. - Store TenantsMap in `std::sync::RwLock` - Provide an extended `get_active_tenant_with_timeout` for page_service to use, which will wait on InProgress slots as well as non-active tenants. Closes: https://github.com/neondatabase/neon/issues/5378 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-11-06 14:03:22 +00:00
John Spray	6defa2b5d5	pageserver: add `Gate` as a partner to CancellationToken for safe shutdown of `Tenant` & `Timeline` (#5711 ) ## Problem When shutting down a Tenant, it isn't just important to cause any background tasks to stop. It's also important to wait until they have stopped before declaring shutdown complete, in cases where we may re-use the tenant's local storage for something else, such as running in secondary mode, or creating a new tenant with the same ID. ## Summary of changes A `Gate` class is added, inspired by [seastar::gate](https://docs.seastar.io/master/classseastar_1_1gate.html). For types that have an important lifetime that corresponds to some physical resource, use of a Gate as well as a CancellationToken provides a robust pattern for async requests & shutdown: - Requests must always acquire the gate as long as they are using the object - Shutdown must set the cancellation token, and then `close()` the gate to wait for requests in progress before returning. This is not for memory safety: it's for expressing the difference between "Arc<Tenant> exists", and "This tenant's files on disk are eligible to be read/written". - Both Tenant and Timeline get a Gate & CancellationToken. - The Timeline gate is held during eviction of layers, and during page_service requests. - Existing cancellation support in page_service is refined to use the timeline-scope cancellation token instead of a process-scope cancellation token. This replaces the use of `task_mgr::associate_with`: tasks no longer change their tenant/timelineidentity after being spawned. The Tenant's Gate is not yet used, but will be important for Tenant-scoped operations in secondary mode, where we must ensure that our secondary-mode downloads for a tenant are gated wrt the activity of an attached Tenant. This is part of a broader move away from using the global-state driven `task_mgr` shutdown tokens: - less global state where we rely on implicit knowledge of what task a given function is running in, and more explicit references to the cancellation token that a particular function/type will respect, making shutdown easier to reason about. - eventually avoid the big global TASKS mutex. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-06 12:39:20 +00:00
duguorong009	b3d3a2587d	feat: improve the serde impl for several types(`Lsn`, `TenantId`, `TimelineId` ...) (#5335 ) Improve the serde impl for several types (`Lsn`, `TenantId`, `TimelineId`) by making them sensitive to `Serializer::is_human_readadable` (true for json, false for bincode). Fixes #3511 by: - Implement the custom serde for `Lsn` - Implement the custom serde for `Id` - Add the helper module `serde_as_u64` in `libs/utils/src/lsn.rs` - Remove the unnecessary attr `#[serde_as(as = "DisplayFromStr")]` in all possible structs Additionally some safekeeper types gained serde tests. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-06 11:40:03 +02:00
Christian Schwarz	7ebe9ca1ac	pageserver: `/attach`: clarify semantics of 409 (#5698 ) context: https://app.incident.io/neondb/incidents/75 specifically: https://neondb.slack.com/archives/C0634NXQ6E7/p1698422852902959?thread_ts=1698419362.155059&cid=C0634NXQ6E7	2023-10-31 09:32:58 +01:00
John Spray	c13e932c3b	pageserver: add generation fields in openapi spec (#5690 ) These optional fields have existed for as while, but weren't mentioned in `openapi_spec.yaml` yet.	2023-10-27 14:20:04 +01:00
Joonas Koivunen	c508d3b5fa	reimpl Layer, remove remote layer, trait Layer, trait PersistentLayer (#4938 ) Implement a new `struct Layer` abstraction which manages downloadness internally, requiring no LayerMap locking or rewriting to download or evict providing a property "you have a layer, you can read it". The new `struct Layer` provides ability to keep the file resident via a RAII structure for new layers which still need to be uploaded. Previous solution solved this `RemoteTimelineClient::wait_completion` which lead to bugs like #5639. Evicting or the final local deletion after garbage collection is done using Arc'd value `Drop`. With a single `struct Layer` the closed open ended `trait Layer`, `trait PersistentLayer` and `struct RemoteLayer` are removed following noting that compaction could be simplified by simply not using any of the traits in between: #4839. The new `struct Layer` is a preliminary to remove `Timeline::layer_removal_cs` documented in #4745. Preliminaries: #4936, #4937, #5013, #5014, #5022, #5033, #5044, #5058, #5059, #5061, #5074, #5103, epic #5172, #5645, #5649. Related split off: #5057, #5134.	2023-10-26 12:36:38 +03:00
Arpad Müller	a673e4e7a9	Optionally return json from get_lsn_by_timestamp (#5608 ) This does two things: first a minor refactor to not use HTTP/1.x style header names and also to not panic if some certain requests had no "Accept" header. As a second thing, it addresses the third bullet point from #3689: > Change `get_lsn_by_timestamp` API method to return LSN even if we only found commit before the specified timestamp. This is done by adding a version parameter to the `get_lsn_by_timestamp` API call and making its behaviour depend on the version number. Part of #3414 (but doesn't address it in its entirety). --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-10-25 18:46:34 +02:00
Arpad Müller	f842b22b90	Add endpoint for querying time info for lsn (#5497 ) ## Problem See #5468. ## Summary of changes Add a new `get_timestamp_of_lsn` endpoint, returning the timestamp associated with the given lsn. Fixes #5468. --------- Co-authored-by: Shany Pozin <shany@neon.tech>	2023-10-19 04:50:49 +02:00
John Spray	b06dffe3dc	pageserver: fixes to `/location_config` API (#5548 ) ## Problem I found some issues with the `/location_config` API when writing new tests. ## Summary of changes - Calling the API with the "Detached" state is now idempotent. - `Tenant::spawn_attach` now takes a boolean to indicate whether to expect a marker file. Marker files are used in the old attach path, but not in the new location conf API. They aren't needed because in the New World, the choice of whether to attach via remote state ("attach") or to trust local state ("load") will be revised to cope with the transitions between secondary & attached (see https://github.com/neondatabase/neon/issues/5550). It is okay to merge this change ahead of that ticket, because the API is not used in the wild yet. - Instead of using `schedule_local_tenant_processing`, the location conf API handler does its own directory creation and calls `spawn_attach` directly. - A new `unsafe_create_dir_all` is added. This differs from crashsafe::create_dir_all in two ways: - It is intentionally not crashsafe, because in the location conf API we are no longer using directory or config existence as the signal for any important business logic. - It is async and uses `tokio::fs`.	2023-10-17 10:21:31 +02:00
Christian Schwarz	dd6990567f	walredo: apply_batch_postgres: get a backtrace whenever it encounters an error (#5541 ) For 2 weeks we've seen rare, spurious, not-reproducible page reconstruction failures with PG16 in prod. One of the commits we deployed this week was Commit commit `fc467941f9` Author: Joonas Koivunen <joonas@neon.tech> Date: Wed Oct 4 16:19:19 2023 +0300 walredo: log retryed error (#546) With the logs from that commit, we learned that some read() or write() system call that walredo does fails with `EAGAIN`, aka `Resource temporarily unavailable (os error 11)`. But we have no idea where exactly in the code we get back that error. So, use anyhow instead of fake std::io::Error's as an easy way to get a backtrace when the error happens, and change the logging to print that backtrace (i.e., use `{:?}` instead of `utils::error::report_compact_sources(e)`). The `WalRedoError` type had to go because we add additional `.context()` further up the call chain before we `{:?}`-print it. That additional `.context()` further up doesn't see that there's already an anyhow::Error inside the `WalRedoError::ApplyWalRecords` variant, and hence captures another backtrace and prints that one on `{:?}`-print instead of the original one inside `WalRedoError::ApplyWalRecords`. If we ever switch back to `report_compact_sources`, we should make sure we have some other way to uniquely identify the places where we return an error in the error message.	2023-10-13 14:08:23 +00:00
John Spray	39e144696f	pageserver: clean up `mgr.rs` types that needn't be public (#5529 ) ## Problem These types/functions are public and it prevents clippy from catching unused things. ## Summary of changes Move to `pub(crate)` and remove the error enum that becomes clearly unused as a result.	2023-10-11 11:50:16 +00:00
Arseny Sher	685add2009	Enable /metrics without auth. To enable auth faster.	2023-10-10 20:06:25 +03:00
John Spray	acefee9a32	pageserver: flush deletion queue on detach (#5452 ) ## Problem If a caller detaches a tenant and then attaches it again, pending deletions from the old attachment might not have happened yet. This is not a correctness problem, but it causes: - Risk of leaking some objects in S3 - Some warnings from the deletion queue when pending LSN updates and pending deletions don't pass validation. ## Summary of changes - Deletion queue now uses UnboundedChannel so that the push interfaces don't have to be async. - This was pulled out of https://github.com/neondatabase/neon/pull/5397, where it is also useful to be able to drive the queue from non-async contexts. - Why is it okay for this to be unbounded? The only way the unbounded-ness of the channel can become a problem is if writing out deletion lists can't keep up, but if the system were that overloaded then the code generating deletions (GC, compaction) would also be impacted. - DeletionQueueClient gets a new `flush_advisory` function, which is like flush_execute, but doesn't wait for completion: this is appropriate for use in contexts where we would like to encourage the deletion queue to flush, but don't need to block on it. - This function is also expected to be useful in next steps for seamless migration, where the option to flush to S3 while transitioning into AttachedStale will also include flushing deletion queue, but we wouldn't want to block on that flush. - The tenant_detach code in mgr.rs invokes flush_advisory after stopping the `Tenant` object. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-10-10 10:46:24 +01:00
Joonas Koivunen	4772cd6c93	fix: deny branching, starting compute from not yet uploaded timelines (#5484 ) Part of #5172. First commits show that we used to allow starting up a compute or creating a branch off a not yet uploaded timeline. This PR moves activation of a timeline to happen after initial layer file(s) (if any) and `index_part.json` have been uploaded. Simply moving activation to be after downloads have finished works because we now spawn a task per http request handler. Current behaviour of uploading on the timelines on next startup is kept, to be removed later as part of #5172. Adds: - `NeonCli.map_branch` and corresponding `neon_local` implementation: allow creating computes for timelines managed via pageserver http client/api - possibly duplicate tests (I did not want to search for, will cleanup in a follow-up if these duplicated) Changes: - make `wait_until_tenant_state` return immediatedly on `Broken` and not wait more	2023-10-09 17:03:38 +03:00
Joonas Koivunen	45f98dd018	debug_tool: get page at lsn and keyspace via http api (#5057 ) If there are any layermap or layer file related problems, having a reproducable `get_page@lsn` easily usable for fast debugging iteration is helpful. Split off from #4938. Later evolved to add http apis for: - `get_page@lsn` at `/v1/tenant/:tenant_id/timeline/:timeline_id/get?key=<hex>&lsn=<lsn string>` - collecting the keyspace at `/v1/tenant/:tenant_id/timeline/:timeline_id/keyspace?[at_lsn=<lsn string>]` - defaults to `last_record_lsn` collecting the keyspace seems to yield some ranges for which there is no key.	2023-10-06 12:17:38 +01:00
John Spray	bdfe27f3ac	swagger: add a 503 definition to each endpoint (#5476 ) ## Problem The control plane doesn't have generic handling for this. ## Summary of changes Add a 503 response to every endpoint.	2023-10-06 11:31:49 +01:00
Joonas Koivunen	a15f9b3baa	pageserver: Tune 503 Resource unavailable (#5489 ) 503 Resource Unavailable appears as error in logs, but is not really an error which should ever fail a test on, or even log an error in prod, [evidence]. Changes: - log 503 as `info!` level - use `Cow<'static, str>` instead of `String` - add an additional `wait_until_tenant_active` in `test_actually_duplicate_l1` We ought to have in tests "wait for tenants to complete loading" but this is easier to implement for now. [evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5485/6423110295/index.html#/testresult/182de66203864fc0	2023-10-06 09:59:14 +01:00
John Spray	baa5fa1e77	pageserver: location configuration API, attachment modes, secondary locations (#5299 ) ## Problem These changes are part of building seamless tenant migration, as described in the RFC: - https://github.com/neondatabase/neon/pull/5029 ## Summary of changes - A new configuration type `LocationConf` supersedes `TenantConfOpt` for storing a tenant's configuration in the pageserver repo dir. It contains `TenantConfOpt`, as well as a new `mode` attribute that describes what kind of location this is (secondary, attached, attachment mode etc). It is written to a file called `config-v1` instead of `config` -- this prepares us for neatly making any other profound changes to the format of the file in future. Forward compat for existing pageserver code is achieved by writing out both old and new style files. Backward compat is achieved by checking for the old-style file if the new one isn't found. - The `TenantMap` type changes, to hold `TenantSlot` instead of just `Tenant`. The `Tenant` type continues to be used for attached tenants only. Tenants in other states (such as secondaries) are represented by a different variant of `TenantSlot`. - Where `Tenant` & `Timeline` used to hold an Arc<Mutex<TenantConfOpt>>, they now hold a reference to a AttachedTenantConf, which includes the extra information from LocationConf. This enables them to know the current attachment mode. - The attachment mode is used as an advisory input to decide whether to do compaction and GC (AttachedStale is meant to avoid doing uploads, AttachedMulti is meant to avoid doing deletions). - A new HTTP API is added at `PUT /tenants/<tenant_id>/location_config` to drive new location configuration. This provides a superset of the functionality of attach/detach/load/ignore: - Attaching a tenant is just configuring it in an attached state - Detaching a tenant is configuring it to a detached state - Loading a tenant is just the same as attaching it - Ignoring a tenant is the same as configuring it into Secondary with warm=false (i.e. retain the files on disk but do nothing else). Caveats: - AttachedMulti tenants don't do compaction in this PR, but they do in the follow on #5397 - Concurrent updates to the `location_config` API are not handled elegantly in this PR, a better mechanism is added in the follow on https://github.com/neondatabase/neon/pull/5367 - Secondary mode is just a placeholder in this PR: the code to upload heatmaps and do downloads on secondary locations will be added in a later PR (but that shouldn't change any external interfaces) Closes: https://github.com/neondatabase/neon/issues/5379 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-10-05 09:55:10 +01:00
John Spray	ace0c775fc	pageserver: prefer 503 to 500 for transient unavailability (#5439 ) ## Problem The 500 status code should only be used for bugs or unrecoverable failures: situations we did not expect. Currently, the pageserver is misusing this response code for some situations that are totally normal, like requests targeting tenants that are in the process of activating. The 503 response is a convenient catch-all for "I can't right now, but I will be able to". ## Summary of changes - Change some transient availability error conditions to return 503 instead of 500 - Update the HTTP client configuration in integration tests to retry on 503 After these changes, things like creating a tenant and then trying to create a timeline within it will no longer require carefully checking its status first, or retrying on 500s. Instead, a client which is properly configured to retry on 503 can quietly handle such situations.	2023-10-03 17:00:55 +01:00
John Spray	ba92668e37	pageserver: deletion queue & generation validation for deletions (#5207 ) ## Problem Pageservers must not delete objects or advertise updates to remote_consistent_lsn without checking that they hold the latest generation for the tenant in question (see [the RFC]( https://github.com/neondatabase/neon/blob/main/docs/rfcs/025-generation-numbers.md)) In this PR: - A new "deletion queue" subsystem is introduced, through which deletions flow - `RemoteTimelineClient` is modified to send deletions through the deletion queue: - For GC & compaction, deletions flow through the full generation verifying process - For timeline deletions, deletions take a fast path that bypasses generation verification - The `last_uploaded_consistent_lsn` value in `UploadQueue` is replaced with a mechanism that maintains a "projected" lsn (equivalent to the previous property), and a "visible" LSN (which is the one that we may share with safekeepers). - Until `control_plane_api` is set, all deletions skip generation validation - Tests are introduced for the new functionality in `test_pageserver_generations.py` Once this lands, if a pageserver is configured with the `control_plane_api` configuration added in https://github.com/neondatabase/neon/pull/5163, it becomes safe to attach a tenant to multiple pageservers concurrently. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-09-26 16:11:55 +01:00
duguorong009	706977fb77	fix(pageserver): add the walreceiver state to tenant timeline GET api endpoint (#5196 ) Add a `walreceiver_state` field to `TimelineInfo` (response of `GET /v1/tenant/:tenant_id/timeline/:timeline_id`) and while doing that, refactor out a common `Timeline::walreceiver_state(..)`. No OpenAPI changes, because this is an internal debugging addition. Fixes #3115. Co-authored-by: Joonas Koivunen <joonas.koivunen@gmail.com>	2023-09-07 14:17:18 +03:00
duguorong009	31e1568dee	refactor(pageserver): refactor pageserver router state creation (#5165 ) Fixes #3894 by: - Refactor the pageserver router creation flow - Create the router state in `pageserver/src/bin/pageserver.rs`	2023-09-06 21:31:49 +03:00
John Spray	61d661a6c3	pageserver: generation number fetch on startup and use in /attach (#5163 ) ## Problem - #5050 Closes: https://github.com/neondatabase/neon/issues/5136 ## Summary of changes - A new configuration property `control_plane_api` controls other functionality in this PR: if it is unset (default) then everything still works as it does today. - If `control_plane_api` is set, then on startup we call out to control plane `/re-attach` endpoint to discover our attachments and their generations. If an attachment is missing from the response we implicitly detach the tenant. - Calls to pageserver `/attach` API may include a `generation` parameter. If `control_plane_api` is set, then this parameter is mandatory. - RemoteTimelineClient's loading of index_part.json is generation-aware, and will try to load the index_part with the most recent generation <= its own generation. - The `neon_local` testing environment now includes a new binary `attachment_service` which implements the endpoints that the pageserver requires to operate. This is on by default if running `cargo neon` by hand. In `test_runner/` tests, it is off by default: existing tests continue to run with in the legacy generation-less mode. Caveats: - The re-attachment during startup assumes that we are only re-attaching tenants that have previously been attached, and not totally new tenants -- this relies on the control plane's attachment logic to keep retrying so that we should eventually see the attach API call. That's important because the `/re-attach` API doesn't tell us which timelines we should attach -- we still use local disk state for that. Ref: https://github.com/neondatabase/neon/issues/5173 - Testing: generations are only enabled for one integration test right now (test_pageserver_restart), as a smoke test that all the machinery basically works. Writing fuller tests that stress tenant migration will come later, and involve extending our test fixtures to deal with multiple pageservers. - I'm not in love with "attachment_service" as a name for the neon_local component, but it's not very important because we can easily rename these test bits whenever we want. - Limited observability when in re-attach on startup: when I add generation validation for deletions in a later PR, I want to wrap up the control plane API calls in some small client class that will expose metrics for things like errors calling the control plane API, which will act as a strong red signal that something is not right. Co-authored-by: Christian Schwarz <christian@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-09-06 14:44:48 +01:00
Joonas Koivunen	368ee6c8ca	refactor: failpoint support (#5033 ) - move them to pageserver which is the only dependant on the crate fail - "move" the exported macro to the new module - support at init time the same failpoints as runtime Found while debugging test failures and making tests more repeatable by allowing "exit" from pageserver start via environment variables. Made those changes to `test_gc_cutoff.py`. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-08-19 01:01:44 +03:00
Dmitry Rodionov	30888a24d9	Avoid flakiness in test_timeline_delete_fail_before_local_delete (#5032 ) The problem was that timeline detail can return timelines in not only active state. And by the time request comes timeline deletion can still be in progress if we're unlucky (test execution happened to be slower for some reason) Reference for failed test run https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5022/5891420105/index.html#suites/f588e0a787c49e67b29490359c589fae/dab036e9bd673274 The error was `Exception: detail succeeded (it should return 404)` reported by @koivunej	2023-08-18 20:49:11 +03:00
Dmitry Rodionov	c58b22bacb	Delete tenant's data from s3 (#4855 ) ## Summary of changes For context see https://github.com/neondatabase/neon/blob/main/docs/rfcs/022-pageserver-delete-from-s3.md Create Flow to delete tenant's data from pageserver. The approach heavily mimics previously implemented timeline deletion implemented mostly in https://github.com/neondatabase/neon/pull/4384 and followed up in https://github.com/neondatabase/neon/pull/4552 For remaining deletion related issues consult with deletion project here: https://github.com/orgs/neondatabase/projects/33 resolves #4250 resolves https://github.com/neondatabase/neon/issues/3889 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-08-10 18:53:16 +03:00
Joonas Koivunen	a25504deae	Limit concurrent compactions (#4777 ) Compactions can create a lot of concurrent work right now with #4265. Limit compactions to use at most 6/8 background runtime threads.	2023-07-25 10:19:04 +03:00
arpad-m	982fce1e72	Fix rustdoc warnings and test cargo doc in CI (#4711 ) ## Problem `cargo +nightly doc` is giving a lot of warnings: broken links, naked URLs, etc. ## Summary of changes * update the `proc-macro2` dependency so that it can compile on latest Rust nightly, see https://github.com/dtolnay/proc-macro2/pull/391 and https://github.com/dtolnay/proc-macro2/issues/398 * allow the `private_intra_doc_links` lint, as linking to something that's private is always more useful than just mentioning it without a link: if the link breaks in the future, at least there is a warning due to that. Also, one might enable [`--document-private-items`](https://doc.rust-lang.org/cargo/commands/cargo-doc.html#documentation-options) in the future and make these links work in general. * fix all the remaining warnings given by `cargo +nightly doc` * make it possible to run `cargo doc` on stable Rust by updating `opentelemetry` and associated crates to version 0.19, pulling in a fix that previously broke `cargo doc` on stable: https://github.com/open-telemetry/opentelemetry-rust/pull/904 * Add `cargo doc` to CI to ensure that it won't get broken in the future. Fixes #2557 ## Future work * Potentially, it might make sense, for development purposes, to publish the generated rustdocs somewhere, like for example [how the rust compiler does it](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/index.html). I will file an issue for discussion.	2023-07-15 05:11:25 +03:00
Joonas Koivunen	87dd37a2f2	pageserver: Align tenant, timeline id names in spans (#4687 ) Uses `(tenant\|timeline)_id`. Not a statement about endorsing this naming style but it is better to be aligned.	2023-07-12 16:58:40 +03:00
Christian Schwarz	24eaa3b7ca	timeline creation: reflect failures due to ancestor LSN issues in status code (#4600 ) Before, it was a `500` and control plane would retry, wereas it actually should have stopped retrying. (Stacked on top of https://github.com/neondatabase/neon/pull/4597 ) fixes https://github.com/neondatabase/neon/issues/4595 part of https://github.com/neondatabase/cloud/issues/5626 --------- Co-authored-by: Shany Pozin <shany@neon.tech>	2023-07-03 15:21:10 +03:00
Christian Schwarz	f558f88a08	refactor: distinguished error type for timeline creation failure (#4597 ) refs https://github.com/neondatabase/neon/issues/4595	2023-06-30 14:53:21 +02:00
Dmitry Rodionov	681c6910c2	Straighten the spec for timeline delete (#4538 ) ## Problem Lets keep 500 for unusual stuff that is not considered normal. Came up during one of the discussions around console logs now seeing this 500's. ## Summary of changes - Return 409 Conflict instead of 500 - Remove 200 ok status because it is not used anymore	2023-06-27 13:56:32 +03:00
Christian Schwarz	15456625c2	don't use MGMT_REQUEST_RUNTIME for consumption metrics synthetic size worker (#4560 ) The consumption metrics synthetic size worker does logical size calculation. Logical size calculation currently does synchronous disk IO. This blocks the MGMT_REQUEST_RUNTIME's executor threads, starving other futures. While there's work on the way to move the synchronous disk IO into spawn_blocking, the quickfix here is to use the BACKGROUND_RUNTIME instead of MGMT_REQUEST_RUNTIME. Actually it's not just a quickfix. We simply shouldn't be blocking MGMT_REQUEST_RUNTIME executor threads on CPU or sync disk IO. That work isn't done yet, as many of the mgmt tasks still _do_ disk IO. But it's not as intensive as the logical size calculations that we're fixing here. While we're at it, fix disk-usage-based eviction in a similar way. It wasn't the culprit here, according to prod logs, but it can theoretically be a little CPU-intensive. More context, including graphs from Prod: https://neondb.slack.com/archives/C03F5SM1N02/p1687541681336949	2023-06-23 15:40:36 -04:00
Dmitry Rodionov	75d583c04a	Tenant::load: fix uninit timeline marker processing (#4458 ) ## Problem During timeline creation we create special mark file which presense indicates that initialization didnt complete successfully. In case of a crash restart we can remove such half-initialized timeline and following retry from control plane side should perform another attempt. So in case of a possible crash restart during initial loading we have following picture: ``` timelines \| - <timeline_id>___uninit \| - <timeline_id> \| - \| <timeline files> ``` We call `std::fs::read_dir` to walk files in `timelines` directory one by one. If we see uninit file we proceed with deletion of both, timeline directory and uninit file. If we see timeline we check if uninit file exists and do the same cleanup. But in fact its possible to get both branches to be true at the same time. Result of readdir doesnt reflect following directory state modifications. So you can still get "valid" entry on the next iteration of the loop despite the fact that it was deleted in one of the previous iterations of the loop. To see that you can apply the following patch (it disables uninit mark cleanup on successful timeline creation): ```diff diff --git a/pageserver/src/tenant.rs b/pageserver/src/tenant.rs index 4beb2664..b3cdad8f 100644 --- a/pageserver/src/tenant.rs +++ b/pageserver/src/tenant.rs @@ -224,11 +224,6 @@ impl UninitializedTimeline<'_> { ) })?; } - uninit_mark.remove_uninit_mark().with_context(\|\| { - format!( - "Failed to remove uninit mark file for timeline {tenant_id}/{timeline_id}" - ) - })?; v.insert(Arc::clone(&new_timeline)); new_timeline.maybe_spawn_flush_loop(); ``` And perform the following steps: ```bash neon_local init neon_local start neon_local tenant create neon_local stop neon_local start ``` The error is: ```log INFO load{tenant_id=X}:blocking: Found an uninit mark file .neon/tenants/X/timelines/Y.___uninit, removing the timeline and its uninit mark 2023-06-09T18:43:41.664247Z ERROR load{tenant_id=X}: load failed, setting tenant state to Broken: failed to load metadata Caused by: 0: Failed to read metadata bytes from path .neon/tenants/X/timelines/Y/metadata 1: No such file or directory (os error 2) ``` So uninit mark got deleted together with timeline directory but we still got directory entry for it and tried to load it. The bug prevented tenant from being successfully loaded. ## Summary of changes Ideally I think we shouldnt place uninit marks in the same directory as timeline directories but move them to separate directory and gather them as an input to actual listing, but that would be sort of an on-disk format change, so just check whether entries are still valid before operating on them.	2023-06-21 14:25:58 +03:00
Christian Schwarz	3693d1f431	turn Timeline::layers into tokio::sync::RwLock (#4441 ) This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`). # Full Stack Of Preliminary PRs Thanks to the countless preliminary PRs, this conversion is relatively straight-forward. 1. Clean-ups * https://github.com/neondatabase/neon/pull/4316 * https://github.com/neondatabase/neon/pull/4317 * https://github.com/neondatabase/neon/pull/4318 * https://github.com/neondatabase/neon/pull/4319 * https://github.com/neondatabase/neon/pull/4321 * Note: these were mostly to find an alternative to #4291, which I thought we'd need in my original plan where we would need to convert `Tenant::timelines` into an async locking primitive (#4333). In reviews, we walked away from that, but these cleanups were still quite useful. 2. https://github.com/neondatabase/neon/pull/4364 3. https://github.com/neondatabase/neon/pull/4472 4. https://github.com/neondatabase/neon/pull/4476 5. https://github.com/neondatabase/neon/pull/4477 6. https://github.com/neondatabase/neon/pull/4485 # Significant Changes In This PR ## `compact_level0_phase1` & `create_delta_layer` This commit partially reverts "pgserver: spawn_blocking in compaction (#4265)" `4e359db4c7`. Specifically, it reverts the `spawn_blocking`-ificiation of `compact_level0_phase1`. If we didn't revert it, we'd have to use `Timeline::layers.blocking_read()` inside `compact_level0_phase1`. That would use up a thread in the `spawn_blocking` thread pool, which is hard-capped. I considered wrapping the code that follows the second `layers.read().await` into `spawn_blocking`, but there are lifetime issues with `deltas_to_compact`. Also, this PR switches the `create_delta_layer` _function_ back to async, and uses `spawn_blocking` inside to run the code that does sync IO, while keeping the code that needs to lock `Timeline::layers` async. ## `LayerIter` and `LayerKeyIter` `Send` bounds I had to add a `Send` bound on the `dyn` type that `LayerIter` and `LayerKeyIter` wrap. Why? Because we now have the second `layers.read().await` inside `compact_level0_phase`, and these iterator instances are held across that await-point. More background: https://github.com/neondatabase/neon/pull/4462#issuecomment-1587376960 ## `DatadirModification::flush` Needed to replace the `HashMap::retain` with a hand-rolled variant because `TimelineWriter::put` is now async.	2023-06-13 18:38:41 +02:00
Dmitry Rodionov	d53f9ab3eb	delete timelines from s3 (#4384 ) Delete data from s3 when timeline deletion is requested ## Summary of changes UploadQueue is altered to support scheduling of delete operations in stopped state. This looks weird, and I'm thinking whether there are better options/refactorings for upload client to make it look better. Probably can be part of https://github.com/neondatabase/neon/issues/4378 Deletion is implemented directly in existing endpoint because changes are not that significant. If we want more safety we can separate those or create feature flag for new behavior. resolves [#4193](https://github.com/neondatabase/neon/issues/4193) --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-06-08 15:01:22 +03:00
Dmitry Rodionov	8560a98d68	fix openapi spec to pass swagger editor validation (#4445 ) There shouldnt be a dash before `type: object`. Also added description.	2023-06-08 13:25:30 +03:00
Dmitry Rodionov	1a1019990a	map TenantState::Broken to TenantAttachmentStatus::Failed (#4371 ) ## Problem Attach failures are not reported in public part of the api (in `attachment_status` field of TenantInfo). ## Summary of changes Expose TenantState::Broken as TenantAttachmentStatus::Failed In the way its written Failed status will be reported even if no attachment happened. (I e if tenant become broken on startup). This is in line with other members. I e Active will be resolved to Attached even if no actual attach took place. This can be tweaked if needed. At the current stage it would be overengineering without clear motivation resolves #4344	2023-06-07 18:25:30 +03:00
Heikki Linnakangas	9787227c35	Shield HTTP request handlers from async cancellations. (#4314 ) We now spawn a new task for every HTTP request, and wait on the JoinHandle. If Hyper drops the Future, the spawned task will keep running. This protects the rest of the pageserver code from unexpected async cancellations. This creates a CancellationToken for each request and passes it to the handler function. If the HTTP request is dropped by the client, the CancellationToken is signaled. None of the handler functions make use for the CancellationToken currently, but they now they could. The CancellationToken arguments also work like documentation. When you're looking at a function signature and you see that it takes a CancellationToken as argument, it's a nice hint that the function might run for a long time, and won't be async cancelled. The default assumption in the pageserver is now that async functions are not cancellation-safe anyway, unless explictly marked as such, but this is a nice extra reminder. Spawning a task for each request is OK from a performance point of view because spawning is very cheap in Tokio, and none of our HTTP requests are very performance critical anyway. Fixes issue #3478	2023-06-02 08:28:13 -04:00
Christian Schwarz	f4f300732a	refactor TenantState transitions (#4321 ) This is preliminary work for/from #4220 (async `Layer::get_value_reconstruct_data`). The motivation is to avoid locking `Tenant::timelines` in places that can't be `async`, because in #4333 we want to convert Tenant::timelines from `std::sync::Mutex` to `tokio::sync::Mutex`. But, the changes here are useful in general because they clean up & document tenant state transitions. That also paves the way for #4350, which is an alternative to #4333 that refactors the pageserver code so that we can keep the `Tenant::timelines` mutex sync. This patch consists of the following core insights and changes: * spawn_load and spawn_attach own the tenant state until they're done * once load()/attach() calls are done ... * if they failed, transition them to Broken directly (we know that there's no background activity because we didn't call activate yet) * if they succeed, call activate. We can make it infallible. How? Later. * set_broken() and set_stopping() are changed to wait for spawn_load() / spawn_attach() to finish. * This sounds scary because it might hinder detach or shutdown, but actually, concurrent attach+detach, or attach+shutdown, or load+shutdown, or attach+shutdown were just racy before this PR. So, with this change, they're not anymore. In the future, we can add a `CancellationToken` stored in Tenant to cancel `load` and `attach` faster, i.e., make `spawn_load` / `spawn_attach` transition them to Broken state sooner. See the doc comments on TenantState for the state transitions that are now possible. It might seem scary, but actually, this patch reduces the possible state transitions. We introduce a new state `TenantState::Activating` to avoid grabbing the `Tenant::timelines` lock inside the `send_modify` closure. These were the humble beginnings of this PR (see Motivation section), and I think it's still the right thing to have this `Activating` state, even if we decide against async `Tenant::timelines` mutex. The reason is that `send_modify` locks internally, and by moving locking of Tenant::timelines out of the closure, the internal locking of `send_modify` becomes a leaf of the lock graph, and so, we eliminate deadlock risk. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-05-29 17:52:41 +03:00
Heikki Linnakangas	2cdf07f12c	Refactor RequestSpan into a function. Previously, you used it like this: \|r\| RequestSpan(my_handler).handle(r) But I don't see the point of the RequestSpan struct. It's just a wrapper around the handler function. With this commit, the call becomes: \|r\| request_span(r, my_handler) Which seems a little simpler. At first I thought that the RequestSpan struct would allow "chaining" other kinds of decorators like RequestSpan, so that you could do something like this: \|r\| CheckPermissions(RequestSpan(my_handler)).handle(r) But it doesn't work like that. If each of those structs wrap a handler function, it would actually look like this: \|r\| CheckPermissions(\|r\| RequestSpan(my_handler).handle(r))).handle(r) This commit doesn't make that kind of chaining any easier, but seems a little more straightforward anyway.	2023-05-27 11:47:22 +03:00
Heikki Linnakangas	a560b28829	Make new tenant/timeline IDs mandatory in create APIs. (#4304 ) We used to generate the ID, if the caller didn't specify it. That's bad practice, however, because network is never fully reliable, so it's possible we create a new tenant but the caller doesn't know about it, and because it doesn't know the tenant ID, it has no way of retrying or checking if it succeeded. To discourage that, make it mandatory. The web control plane has not relied on the auto-generation for a long time.	2023-05-26 16:19:36 +03:00
Christian Schwarz	e5617021a7	refactor: eliminate global storage_broker client state (#4318 ) (This is prep work to make `Timeline::activate` infallible.) This patch removes the global storage_broker client instance from the pageserver codebase. Instead, pageserver startup instantiates it and passes it down to the `Timeline::activate` function, which in turn passes it to the WalReceiver, which is the entity that actually uses it. Patch series: - #4316 - #4317 - #4318 - #4319	2023-05-25 16:47:42 +03:00
Christian Schwarz	83ba02b431	tenant_status: don't InternalServerError if tenant not found (#4337 ) Note this also changes the status code to the (correct) 404. Not sure if that's relevant to Console. Context: https://neondb.slack.com/archives/C04PSBP2SAF/p1684746238831449?thread_ts=1684742106.169859&cid=C04PSBP2SAF Atop #4300 because it cleans up the mgr::get_tenant() error type and I want eyes on that PR.	2023-05-25 11:38:04 +02:00
Christian Schwarz	37ecebe45b	mgr::get_tenant: distinguished error type (#4300 ) Before this patch, it would use error type `TenantStateError` which has many more error variants than can actually happen with `mgr::get_tenant`. Along the way, I also introduced `SetNewTenantConfigError` because it uses `mgr::get_tenant` and also can only fail in much fewer ways than `TenantStateError` suggests. The new `page_service.rs`'s `GetActiveTimelineError` and `GetActiveTenantError` types were necessary to avoid an `Other` variant on the `GetTenantError`. This patch is a by-product of reading code that subscribes to `Tenant::state` changes. Can't really connect it to any given project.	2023-05-25 11:37:12 +02:00
Christian Schwarz	df52587bef	attach-time tenant config (#4255 ) This PR adds support for supplying the tenant config upon /attach. Before this change, when relocating a tenant using `/detach` and `/attach`, the tenant config after `/attach` would be the default config from `pageserver.toml`. That is undesirable for settings such as the PITR-interval: if the tenant's config on the source was `30 days` and the default config on the attach-side is `7 days`, then the first GC run would eradicate 23 days worth of PITR capability. The API change is backwards-compatible: if the body is empty, we continue to use the default config. We'll remove that capability as soon as the cloud.git code is updated to use attach-time tenant config (https://github.com/neondatabase/neon/issues/4282 keeps track of this). unblocks https://github.com/neondatabase/cloud/issues/5092 fixes https://github.com/neondatabase/neon/issues/1555 part of https://github.com/neondatabase/neon/issues/886 (Tenant Relocation) Implementation ============== The preliminary PRs for this work were (most-recent to least-recent) * https://github.com/neondatabase/neon/pull/4279 * https://github.com/neondatabase/neon/pull/4267 * https://github.com/neondatabase/neon/pull/4252 * https://github.com/neondatabase/neon/pull/4235	2023-05-24 17:46:30 +03:00
Christian Schwarz	b391c94440	tenant create / update-config: reject unknown fields (#4267 ) This PR enforces that the tenant create / update-config APIs reject requests with unknown fields. This is a desirable property because some tenant config settings control the lifetime of user data (e.g., GC horizon or PITR interval). Suppose we inadvertently rename the `pitr_interval` field in the Rust code. Then, right now, a client that still uses the old name will send a tenant config request to configure a new PITR interval. Before this PR, we would accept such a request, ignore the old name field, and use the pageserver.toml default value for what the new PITR interval is. With this PR, we will instead reject such a request. One might argue that the client could simply check whether the config it sent has been applied, using the `/v1/tenant/.../config` endpoint. That is correct for tenant create and update-config. But, attach will soon [^1] grow the ability to have attach-time config as well. If we ignore unknown fields and fall back to global defaults in that case, we risk data loss. Example: 1. Default PITR in pageservers is 7 days. 2. Create a tenant and set its PITR to 30 days. 3. For 30 days, fill the tenant continuously with data. 4. Detach the tenant. 5. Attach tenant. Attach must use the 30-day PITR setting in this scenario. If it were to fall back to the 7-day default value, we would lose 23 days of PITR capability for the tenant. So, the PR that adds attach-time tenant config will build on the (clunky) infrastructure added in this PR [^1]: https://github.com/neondatabase/neon/pull/4255 Implementation Notes ==================== This could have been a simple `#[serde(deny_unknown_fields)]` but sadly, that is documented- but silent-at-compile-time-incompatible with `#[serde(flatten)]`. But we are still using this by adding on outer struct and use unit tests to ensure it is correct. `neon_local tenant config` now uses the `.remove()` pattern + bail if there are leftover config args. That's in line with what `neon_local tenant create` does. We should dedupe that logic in a future PR. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com> Co-authored-by: Alex Chi <iskyzh@gmail.com>	2023-05-18 21:16:09 -04:00
Christian Schwarz	89307822b0	mgmt api: share a single tenant config model struct in Rust and OpenAPI (#4252 ) This is prep for https://github.com/neondatabase/neon/pull/4255 [1/X] OpenAPI: share a single definition of TenantConfig DRYs up the pageserver OpenAPI YAML's representation of tenant config. All the fields of tenant config are now located in a model schema called TenantConfig. The tenant create & config-change endpoints have separate schemas, TenantCreateInfo and TenantConfigureArg, respectively. These schemas inherit from TenantConfig, using allOf 1. The tenant config-GET handler's response was previously named TenantConfig. It's now named TenantConfigResponse. None of these changes affect how the request looks on the wire. The generated Go code will change for Console because the OpenAPI code generator maps `allOf` to a Go struct embedding. Luckily, usage of tenant config in Console is still very lightweigt, but that will change in the near future. So, this is a good chance to set things straight. The console changes are tracked in https://github.com/neondatabase/cloud/pull/5046 [2/x]: extract the tenant config parts of create & config requests [3/x]: code movement: move TenantConfigRequestConfig next to TenantCreateRequestConfig [4/x] type-alias TenantConfigRequestConfig = TenantCreateRequestConfig; They are exactly the same. [5/x] switch to qualified use for tenant create/config request api models [6/x] rename models::TenantConfig{RequestConfig,} and remove the alias [7/x] OpenAPI: sync tenant create & configure body names from Rust code [8/x]: dedupe the two TryFrom<...> for TenantConfOpt impls The only difference is that the TenantConfigRequest impl does ``` tenant_conf.max_lsn_wal_lag = request_data.max_lsn_wal_lag; tenant_conf.trace_read_requests = request_data.trace_read_requests; ``` and the TenantCreateRequest impl does ``` if let Some(max_lsn_wal_lag) = request_data.max_lsn_wal_lag { tenant_conf.max_lsn_wal_lag = Some(max_lsn_wal_lag); } if let Some(trace_read_requests) = request_data.trace_read_requests { tenant_conf.trace_read_requests = Some(trace_read_requests); } ``` As far as I can tell, these are identical.	2023-05-17 12:31:17 +02:00
Christian Schwarz	4431779e32	refactor: attach: use create_tenant_files + schedule_local_tenant_processing (#4235 ) With this patch, the attach handler now follows the same pattern as tenant create with regards to instantiation of the new tenant: 1. Prepare on-disk state using `create_tenant_files`. 2. Use the same code path as pageserver startup to load it into memory and start background loops (`schedule_local_tenant_processing`). It's a bit sad we can't use the `PageServerConfig::tenant_attaching_mark_file_path` method inside `create_tenant_files` because it operates in a temporary directory. However, it's a small price to pay for the gained simplicity. During implementation, I noticed that we don't handle failures post `create_tenant_files` well. I left TODO comments in the code linking to the issue that I created for this [^1]. Also, I'll dedupe the spawn_load and spawn_attach code in a future commit. refs https://github.com/neondatabase/neon/issues/1555 part of https://github.com/neondatabase/neon/issues/886 (Tenant Relocation) [^1]: https://github.com/neondatabase/neon/issues/4233	2023-05-16 12:53:17 -04:00
Dmitry Rodionov	b7db62411b	Make storage time operations an enum instead of an array (#4238 ) Use an enum instead of an array. Before that there was no connection between definition of the metric and point where it was used aside from matching string literals. Now its possible to use IDE features to check for references. Also this allows to avoid mismatch between set of metrics that was defined and set of metrics that was actually used What is interesting is that `init logical size` case is not used. I think `LogicalSize` is a duplicate of `InitLogicalSize`. So removed the latter.	2023-05-16 16:54:29 +03:00

1 2 3 4 5

218 Commits