rust/neon - neon - Gitea: Git with a cup of tea


mirror of https://github.com/neondatabase/neon.git synced 2026-05-22 15:41:15 +00:00

Author	SHA1	Message	Date
Joonas Koivunen	4be6bc7251	refactor: remove unnecessary unsafe (#5802 ) unsafe impls for `Send` and `Sync` should not be added by default. In the case of `SlotGuard`, removing them does not cause any issues, as the compiler automatically derives those. This PR adds a requirement to document the unsafety (see [clippy::undocumented_unsafe_blocks]) and opportunistically adds `#![deny(unsafe_code)]` to most places where we don't have unsafe code right now. TRPL on Send and Sync: https://doc.rust-lang.org/book/ch16-04-extensible-concurrency-sync-and-send.html [clippy::undocumented_unsafe_blocks]: https://rust-lang.github.io/rust-clippy/master/#/undocumented_unsafe_blocks	2023-11-07 10:26:25 +00:00
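The point that auto traits are derived can be checked at compile time. Below is a minimal std-only sketch. `ExampleGuard` is a hypothetical type, not neon's `SlotGuard`. All of its fields are `Send + Sync`, so no `unsafe impl` is needed. The `const` assertion is a common idiom rather than something taken from the commit. `#[deny(unsafe_code)]` is applied to a module here instead of the crate-level `#![deny(unsafe_code)]` the commit mentions.

```rust
#[deny(unsafe_code)] // module-level form of the crate-wide `#![deny(unsafe_code)]`
pub mod guard {
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::Arc;

    /// Hypothetical guard: every field is `Send + Sync`, so the guard is too.
    pub struct ExampleGuard {
        slot_id: u64,
        holders: Arc<AtomicUsize>,
    }

    impl ExampleGuard {
        pub fn new(slot_id: u64, holders: Arc<AtomicUsize>) -> Self {
            holders.fetch_add(1, Ordering::SeqCst);
            ExampleGuard { slot_id, holders }
        }

        pub fn slot_id(&self) -> u64 {
            self.slot_id
        }
    }

    impl Drop for ExampleGuard {
        fn drop(&mut self) {
            self.holders.fetch_sub(1, Ordering::SeqCst);
        }
    }

    /// Compiles only if `T: Send + Sync`; no runtime cost.
    pub const fn assert_send_sync<T: Send + Sync>() {}

    // If a field ever stops being `Send`/`Sync`, this fails to compile,
    // which is the cue to investigate instead of adding an `unsafe impl`.
    const _: () = assert_send_sync::<ExampleGuard>();
}
```

Because the guard is `Send`, it can be moved into another thread without any unsafe code.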
Arpad Müller	b09a851705	Make azure blob storage not do extra metadata requests (#5777 ) Load the metadata from the returned `GetBlobResponse` and avoid downloading it via a separate request. As it turns out, the SDK does return the metadata: https://github.com/Azure/azure-sdk-for-rust/issues/1439 . This PR will reduce the number of requests to Azure caused by downloads. Fixes #5571	2023-11-06 15:16:55 +00:00
John Spray	85cd97af61	pageserver: add `InProgress` tenant map state, use a sync lock for the map (#5367 ) ## Problem Follows on from #5299 - We didn't have a generic way to protect a tenant undergoing changes: `Tenant` had states, but for our arbitrary transitions between secondary/attached, we need a general way to say "reserve this tenant ID, and don't allow any other ops on it, but don't try and report it as being in any particular state". - The TenantsMap structure was behind an async RwLock, but it was never correct to hold it across await points: that would block any other changes for all tenants. ## Summary of changes - Add the `TenantSlot::InProgress` value. This means: - Incoming administrative operations on the tenant should retry later - Anything trying to read the live state of the tenant (e.g. a page service reader) should retry later or block. - Store TenantsMap in `std::sync::RwLock` - Provide an extended `get_active_tenant_with_timeout` for page_service to use, which will wait on InProgress slots as well as non-active tenants. Closes: https://github.com/neondatabase/neon/issues/5378 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-11-06 14:03:22 +00:00
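The `InProgress` reservation idea can be sketched as follows: a map behind a synchronous `std::sync::RwLock` whose guard is always released before a method returns. This is an illustration only. The reduced `TenantSlot` variants, the `u64` tenant ids and the `try_begin`/`complete`/`get_attached` helpers are hypothetical and are not the pageserver's actual API.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical, heavily simplified slot states.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TenantSlot {
    Attached { generation: u32 },
    Secondary,
    InProgress,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GetTenantError {
    NotFound,
    RetryLater,
}

/// Every method takes the sync lock, does its work and releases it before
/// returning, so the lock is never held across an `.await`.
pub struct TenantsMap {
    slots: RwLock<HashMap<u64, TenantSlot>>,
}

impl TenantsMap {
    pub fn new() -> Self {
        TenantsMap { slots: RwLock::new(HashMap::new()) }
    }

    /// Reserve a tenant id; `false` means another operation holds it (retry later).
    pub fn try_begin(&self, tenant_id: u64) -> bool {
        let mut slots = self.slots.write().unwrap();
        if matches!(slots.get(&tenant_id), Some(TenantSlot::InProgress)) {
            return false;
        }
        slots.insert(tenant_id, TenantSlot::InProgress);
        true
    }

    /// Finish the operation by replacing the `InProgress` marker.
    pub fn complete(&self, tenant_id: u64, slot: TenantSlot) {
        self.slots.write().unwrap().insert(tenant_id, slot);
    }

    /// Readers treat `InProgress` as "retry later", not as a missing tenant.
    pub fn get_attached(&self, tenant_id: u64) -> Result<u32, GetTenantError> {
        match self.slots.read().unwrap().get(&tenant_id) {
            Some(TenantSlot::Attached { generation }) => Ok(*generation),
            Some(TenantSlot::InProgress) => Err(GetTenantError::RetryLater),
            Some(TenantSlot::Secondary) | None => Err(GetTenantError::NotFound),
        }
    }
}
```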
John Spray	6defa2b5d5	pageserver: add `Gate` as a partner to CancellationToken for safe shutdown of `Tenant` & `Timeline` (#5711 ) ## Problem When shutting down a Tenant, it isn't just important to cause any background tasks to stop. It's also important to wait until they have stopped before declaring shutdown complete, in cases where we may re-use the tenant's local storage for something else, such as running in secondary mode, or creating a new tenant with the same ID. ## Summary of changes A `Gate` class is added, inspired by [seastar::gate](https://docs.seastar.io/master/classseastar_1_1gate.html). For types that have an important lifetime that corresponds to some physical resource, use of a Gate as well as a CancellationToken provides a robust pattern for async requests & shutdown: - Requests must always acquire the gate as long as they are using the object - Shutdown must set the cancellation token, and then `close()` the gate to wait for requests in progress before returning. This is not for memory safety: it's for expressing the difference between "Arc<Tenant> exists", and "This tenant's files on disk are eligible to be read/written". - Both Tenant and Timeline get a Gate & CancellationToken. - The Timeline gate is held during eviction of layers, and during page_service requests. - Existing cancellation support in page_service is refined to use the timeline-scope cancellation token instead of a process-scope cancellation token. This replaces the use of `task_mgr::associate_with`: tasks no longer change their tenant/timeline identity after being spawned. The Tenant's Gate is not yet used, but will be important for Tenant-scoped operations in secondary mode, where we must ensure that our secondary-mode downloads for a tenant are gated wrt the activity of an attached Tenant. 
This is part of a broader move away from using the global-state driven `task_mgr` shutdown tokens: - less global state where we rely on implicit knowledge of what task a given function is running in, and more explicit references to the cancellation token that a particular function/type will respect, making shutdown easier to reason about. - eventually avoid the big global TASKS mutex. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-06 12:39:20 +00:00
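The gate pattern described above can be approximated with threads and std primitives. In this sketch, requests hold a guard while using the resource, and shutdown first signals cancellation and then waits in `close()` until every guard is dropped. It is not neon's `Gate`, which is async. The `AtomicBool` stands in for a `CancellationToken`, and all names here are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};

#[derive(Default)]
struct GateState {
    open_guards: usize,
    closing: bool,
}

/// Users hold a guard while touching the resource; `close()` refuses new
/// entries and blocks until all existing guards are dropped.
#[derive(Default)]
pub struct Gate {
    state: Mutex<GateState>,
    drained: Condvar,
}

#[derive(Debug, PartialEq, Eq)]
pub struct GateClosed;

/// Held for as long as a request uses the gated resource.
pub struct GateGuard {
    gate: Arc<Gate>,
}

impl Gate {
    pub fn enter(self: &Arc<Self>) -> Result<GateGuard, GateClosed> {
        let mut st = self.state.lock().unwrap();
        if st.closing {
            return Err(GateClosed);
        }
        st.open_guards += 1;
        Ok(GateGuard { gate: Arc::clone(self) })
    }

    /// Wait until every outstanding guard has been dropped.
    pub fn close(&self) {
        let mut st = self.state.lock().unwrap();
        st.closing = true;
        while st.open_guards > 0 {
            st = self.drained.wait(st).unwrap();
        }
    }
}

impl Drop for GateGuard {
    fn drop(&mut self) {
        let mut st = self.gate.state.lock().unwrap();
        st.open_guards -= 1;
        if st.open_guards == 0 {
            self.gate.drained.notify_all();
        }
    }
}

/// Shutdown order from the commit: signal cancellation first, then close the gate.
pub fn shutdown(cancel: &AtomicBool, gate: &Gate) {
    cancel.store(true, Ordering::SeqCst);
    gate.close();
}
```

Setting the cancellation flag before closing matters. Otherwise `close()` could wait forever on a request that never learns it should stop.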
duguorong009	b3d3a2587d	feat: improve the serde impl for several types (`Lsn`, `TenantId`, `TimelineId` ...) (#5335 ) Improve the serde impl for several types (`Lsn`, `TenantId`, `TimelineId`) by making them sensitive to `Serializer::is_human_readable` (true for json, false for bincode). Fixes #3511 by: - Implement the custom serde for `Lsn` - Implement the custom serde for `Id` - Add the helper module `serde_as_u64` in `libs/utils/src/lsn.rs` - Remove the unnecessary attr `#[serde_as(as = "DisplayFromStr")]` in all possible structs Additionally some safekeeper types gained serde tests. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-06 11:40:03 +02:00
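To show what "sensitive to `is_human_readable`" means, here is a serde-free sketch: one value with two encodings. The `bool` argument plays the role of `Serializer::is_human_readable()`. `Encoded`, `encode` and `decode` are hypothetical names. The `X/Y` hex display follows the usual Postgres LSN notation. The big-endian byte layout is an arbitrary choice for illustration, not bincode's wire format.

```rust
use std::fmt;

/// Hypothetical stand-in for an LSN: a 64-bit WAL position.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Lsn(pub u64);

impl fmt::Display for Lsn {
    // Postgres-style high/low 32-bit halves in hex, e.g. "1/10".
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:X}/{:X}", self.0 >> 32, self.0 & 0xffff_ffff)
    }
}

/// What a serializer would receive in each mode.
#[derive(Debug, PartialEq, Eq)]
pub enum Encoded {
    Text(String),    // human-readable formats such as JSON
    Binary([u8; 8]), // compact binary formats
}

pub fn encode(lsn: Lsn, human_readable: bool) -> Encoded {
    if human_readable {
        Encoded::Text(lsn.to_string())
    } else {
        Encoded::Binary(lsn.0.to_be_bytes())
    }
}

pub fn decode(encoded: &Encoded) -> Option<Lsn> {
    match encoded {
        Encoded::Text(s) => {
            let (hi, lo) = s.split_once('/')?;
            let hi = u64::from_str_radix(hi, 16).ok()?;
            let lo = u64::from_str_radix(lo, 16).ok()?;
            if hi > 0xffff_ffff || lo > 0xffff_ffff {
                return None;
            }
            Some(Lsn((hi << 32) | lo))
        }
        Encoded::Binary(bytes) => Some(Lsn(u64::from_be_bytes(*bytes))),
    }
}
```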
duguorong009	09b5954526	refactor: use streaming in safekeeper `/v1/debug_dump` http response (#5731 ) - Update the handler for `/v1/debug_dump` http response in safekeeper - Update the `debug_dump::build()` to use the streaming in JSON build process	2023-11-05 10:16:54 +00:00
John Spray	306c4f9967	s3_scrubber: prepare for scrubbing buckets with generation-aware content (#5700 ) ## Problem The scrubber didn't know how to find the latest index_part when generations were in use. ## Summary of changes - Teach the scrubber to do the same dance that pageserver does when finding the latest index_part.json - Teach the scrubber how to understand layer files with generation suffixes. - General improvement to testability: scan_metadata has a machine readable output that the testing `S3Scrubber` wrapper can read. - Existing test coverage of scrubber was false-passing because it just didn't see any data due to prefixing of data in the bucket. Fix that. This is incremental improvement: the more confidence we can have in the scrubber, the more we can use it in integration tests to validate the state of remote storage. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-11-03 17:36:02 +00:00
Joonas Koivunen	27bdbf5e36	chore(layer): restore logging, doc changes (#5766 ) Some of the log messages were lost with #4938. This PR adds some of them back, most notably: - starting an on-demand download - successful completion of an on-demand download - ability to see when there were many waiters for the layer download - "unexpectedly on-demand downloading ..." is now `info!` Additionally, some rare events that should never happen are logged as errors.	2023-11-02 19:05:33 +00:00
Em Sharnoff	367971a0e9	vm-monitor: Remove support for file cache in tmpfs (#5617 ) ref neondatabase/cloud#7516. We switched everything over to file cache on disk, now time to remove support for having it in tmpfs.	2023-11-02 16:06:16 +00:00
Joonas Koivunen	2dca4c03fc	feat(layer): cancellable get_or_maybe_download (#5744 ) With the layer implementation as was done in #4938, it is possible via cancellation to cause two concurrent downloads on the same path, due to how `RemoteTimelineClient::download_remote_layer` does tempfiles. Thread the init semaphore through the spawned task of downloading to make this impossible to happen.	2023-11-02 08:06:32 +00:00
Conrad Ludgate	d8c21ec70d	fix nightly 1.75 (#5719 ) ## Problem Neon doesn't compile on nightly and had numerous clippy complaints. ## Summary of changes 1. Fixed troublesome dependency 2. Fixed or ignored the lints where appropriate	2023-10-30 16:43:06 +00:00
Konstantin Knizhnik	ad99fa5f03	Grant BYPASSRLS and REPLICATION to existing roles (#5657 ) ## Problem Roles need the REPLICATION privilege to be usable for logical replication. New roles are created with this option. This PR tries to update existing roles. ## Summary of changes Update roles in `handle_roles` method --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-10-30 15:29:25 +00:00
Joonas Koivunen	2bd79906d9	fix: possible page_service hang on cancel (#5696 ) Fixes #5341, one more suspected case, see: https://github.com/neondatabase/neon/issues/5341#issuecomment-1783052379 - races `MaybeWriteOnly::shutdown` with cancellation - switches to using `AsyncWriteExt::write_buf` - notes cancellation safety for shutdown	2023-10-27 19:09:34 +01:00
Gleb Novikov	a5292f7e67	Some minor renames in attachment service API (#5687 )	2023-10-27 12:36:34 +01:00
duguorong009	39f8fd6945	feat: add `build_tag` env support for `set_build_info_metric` (#5576 ) - Add a new util `project_build_tag` macro, similar to `project_git_version` - Update `set_build_info_metric` to accept and make use of `build_tag` info - Update all code that uses `set_build_info_metric`	2023-10-27 10:47:11 +01:00
John Spray	de90bf4663	pageserver: always load remote metadata (no more `spawn_load`) (#5580 ) ## Problem The pageserver had two ways of loading a tenant: - `spawn_load` would trust on-disk content to reflect all existing timelines - `spawn_attach` would list timelines in remote storage. It was incorrect for `spawn_load` to trust local disk content, because it doesn't know if the tenant might have been attached and written somewhere else. Making this correct would require some generation number checks, but the payoff is to avoid one S3 op per tenant at startup, so it's not worth the complexity -- it is much simpler to have one way to load a tenant. ## Summary of changes - `Tenant` objects are always created with `Tenant::spawn`: there is no more distinction between "load" and "attach". - The ability to run without remote storage (for `neon_local`) is preserved by adding a branch inside `attach` that uses a fallback `load_local` if no remote_storage is present. - Fix attaching a tenant when it has a timeline with no IndexPart: this can occur if a newly created timeline manages to upload a layer before it has uploaded an index. - The attach marker file that used to indicate whether a tenant should be "loaded" or "attached" is no longer needed, and is removed. - The GenericRemoteStorage interface gets a `list()` method that maps more directly to what ListObjects does, returning both keys and common prefixes. The existing `list_files` and `list_prefixes` methods are just calls into `list()` now -- these can be removed later if we would like to shrink the interface a bit. - The remote deletion marker is moved into `timelines/` and detected as part of listing timelines rather than as a separate GET request. If any existing tenants have a marker in the old location (unlikely, only happens if something crashes mid-delete), then they will rely on the control plane retrying to complete their deletion. 
- Revise S3 calls for timeline listing and tenant load to take a cancellation token, and retry forever: it never makes sense to make a Tenant broken because of a transient S3 issue. ## Breaking changes - The remote deletion marker is moved from `deleted` to `timelines/deleted` within the tenant prefix. Markers in the old location will be ignored: it is the control plane's responsibility to retry deletions until they succeed. Markers in the new location will be tolerated by the previous release of pageserver via https://github.com/neondatabase/neon/pull/5632 - The local `attaching` marker file is no longer written. Therefore, if the pageserver is downgraded after running this code, the old pageserver will not be able to distinguish between partially attached tenants and fully attached tenants. This would only impact tenants that were partway through attaching at the moment of downgrade. In the unlikely event that we do experience an incident that prompts us to roll back, we may check for attach operations in flight, and manually insert `attaching` marker files as needed. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-10-26 14:48:44 +01:00
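The ListObjects-style `list()` described in this commit returns leaf keys and common prefixes. A rough in-memory version of that behavior is sketched below. The flat key layout, the `Listing` struct and the `list_prefixes` wrapper are hypothetical illustrations, not the `GenericRemoteStorage` signatures. Note how a `timelines/deleted` marker shows up as a plain key while each timeline shows up as a prefix.

```rust
use std::collections::BTreeSet;

/// Leaf keys directly under the prefix, plus common prefixes one level down.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Listing {
    pub keys: Vec<String>,
    pub prefixes: Vec<String>,
}

/// Hypothetical delimiter-aware listing over a flat key space.
pub fn list(all_keys: &[&str], prefix: &str, delimiter: char) -> Listing {
    let mut keys = Vec::new();
    let mut prefixes = BTreeSet::new();
    for key in all_keys {
        let Some(rest) = key.strip_prefix(prefix) else {
            continue; // outside the requested prefix
        };
        match rest.find(delimiter) {
            // Something deeper: report only the first segment as a common prefix.
            Some(idx) => {
                let end = idx + delimiter.len_utf8();
                prefixes.insert(format!("{prefix}{}", &rest[..end]));
            }
            // A leaf directly under the prefix.
            None => keys.push(key.to_string()),
        }
    }
    keys.sort();
    Listing { keys, prefixes: prefixes.into_iter().collect() }
}

/// An older, narrower helper can then be a thin wrapper around `list()`.
pub fn list_prefixes(all_keys: &[&str], prefix: &str) -> Vec<String> {
    list(all_keys, prefix, '/').prefixes
}
```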
Joonas Koivunen	c508d3b5fa	reimpl Layer, remove remote layer, trait Layer, trait PersistentLayer (#4938 ) Implement a new `struct Layer` abstraction which manages downloadness internally, requiring no LayerMap locking or rewriting to download or evict, and providing the property "you have a layer, you can read it". The new `struct Layer` provides the ability to keep the file resident via a RAII structure for new layers which still need to be uploaded. The previous solution solved this with `RemoteTimelineClient::wait_completion`, which led to bugs like #5639. Eviction, as well as the final local deletion after garbage collection, is done in the `Drop` of the Arc'd value. With a single `struct Layer`, the closed open ended `trait Layer`, `trait PersistentLayer` and `struct RemoteLayer` are removed, following the observation that compaction could be simplified by simply not using any of the traits in between: #4839. The new `struct Layer` is a prerequisite for removing `Timeline::layer_removal_cs`, documented in #4745. Preliminaries: #4936, #4937, #5013, #5014, #5022, #5033, #5044, #5058, #5059, #5061, #5074, #5103, epic #5172, #5645, #5649. Related split off: #5057, #5134.	2023-10-26 12:36:38 +03:00
Arpad Müller	4bef977c56	Use tuples instead of manual comparison chain (#5637 ) Makes code a little bit simpler	2023-10-24 17:16:23 +00:00
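The tuple-comparison simplification relies on tuples comparing lexicographically, field by field. The sketch below contrasts a manual comparison chain with the tuple form. `SortKey` and its fields are hypothetical; the commit does not say which struct was changed.

```rust
use std::cmp::Ordering;

/// Hypothetical sort key, for illustration only.
#[derive(Debug, Clone, Copy)]
pub struct SortKey {
    pub key_start: u64,
    pub lsn_start: u64,
    pub is_delta: bool,
}

/// Before: a hand-written comparison chain.
pub fn cmp_manual(a: &SortKey, b: &SortKey) -> Ordering {
    match a.key_start.cmp(&b.key_start) {
        Ordering::Equal => match a.lsn_start.cmp(&b.lsn_start) {
            Ordering::Equal => a.is_delta.cmp(&b.is_delta),
            other => other,
        },
        other => other,
    }
}

/// After: tuples compare lexicographically in the same field order.
pub fn cmp_tuple(a: &SortKey, b: &SortKey) -> Ordering {
    (a.key_start, a.lsn_start, a.is_delta).cmp(&(b.key_start, b.lsn_start, b.is_delta))
}
```

`Ordering::then_with` is another std option when building a tuple is inconvenient, for example when some fields need custom comparisons.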
Arpad Müller	1e250cd90a	Cleanup in azure_upload_download_works test (#5636 ) The `azure_upload_download_works` test is not cleaning up after itself, leaving behind the files it is uploading. I found these files when looking at the contents of the bucket in #5627. We now clean up the file we uploaded earlier, as the other tests already do. Follow-up of #5546	2023-10-23 19:08:56 +01:00
Em Sharnoff	2cf6a47cca	vm-monitor: Deny not fail downscale if no memory stats yet (#5606 ) Fixes an issue we observed on staging that happens when the autoscaler-agent attempts to immediately downscale the VM after binding, which is typical for pooled computes. The issue was occurring because the autoscaler-agent was requesting downscaling before the vm-monitor had gathered sufficient cgroup memory stats to be confident in approving it. When the vm-monitor returned an internal error instead of denying downscaling, the autoscaler-agent retried the connection and immediately hit the same issue (in part because cgroup stats are collected per-connection, rather than globally).	2023-10-19 19:09:37 +01:00
Em Sharnoff	2c8741a5ed	vm-monitor: Log full error on message handling failure (#5604 ) There's currently an issue with the vm-monitor on staging that's not really feasible to debug because the current display impl gives no context to the errors (just says "failed to downscale"). Logging the full error should help. For communications with the autoscaler-agent, it's ok to only provide the outermost cause, because we can cross-reference with the VM logs. At some point in the future, we may want to change that.	2023-10-19 18:10:33 +02:00
Arthur Petukhovsky	66f8f5f1c8	Call walproposer from Rust (#5403 ) Create Rust bindings for C functions from walproposer. This allows writing better tests with real walproposer code without spawning multiple processes and starting up the whole environment. A `make walproposer-lib` stage was added to build static libraries `libwalproposer.a`, `libpgport.a`, `libpgcommon.a`. These libraries can be statically linked to any executable to call walproposer functions. `libs/walproposer/src/walproposer.rs` contains `test_simple_sync_safekeepers` to test that walproposer can be called from Rust to emulate sync_safekeepers logic. It can also be used as a usage example.	2023-10-19 14:17:15 +01:00
Arpad Müller	b1d6af5ebe	Azure blobs: Simplify error conversion by addition of to_download_error (#5575 ) There is a bunch of duplication and manual Result handling that can be simplified by moving the error conversion into a shared function, using `map_err`, and the question mark operator.	2023-10-19 14:31:09 +02:00
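The refactoring pattern here is a single conversion function plus `map_err` and `?`. It can be sketched with std only. `SdkError`, `DownloadError` and the fake `sdk_get_blob` call are hypothetical stand-ins, not the Azure SDK's or remote_storage's real types. Only the name `to_download_error` comes from the commit title, and its signature here is assumed.

```rust
/// Hypothetical error type of an SDK call.
#[derive(Debug)]
pub enum SdkError {
    NotFound,
    Http(u16, String),
}

/// Hypothetical crate-level download error.
#[derive(Debug, PartialEq, Eq)]
pub enum DownloadError {
    NotFound,
    Other(String),
}

/// The one place where SDK errors are translated.
pub fn to_download_error(e: SdkError) -> DownloadError {
    match e {
        SdkError::NotFound => DownloadError::NotFound,
        SdkError::Http(code, msg) => DownloadError::Other(format!("HTTP {code}: {msg}")),
    }
}

/// Fake SDK call, used only for this illustration.
fn sdk_get_blob(name: &str) -> Result<Vec<u8>, SdkError> {
    match name {
        "missing" => Err(SdkError::NotFound),
        "broken" => Err(SdkError::Http(500, "internal".to_string())),
        other => Ok(other.as_bytes().to_vec()),
    }
}

/// Call sites shrink to `map_err` + `?` instead of a manual `match` on the `Result`.
pub fn download(name: &str) -> Result<Vec<u8>, DownloadError> {
    let body = sdk_get_blob(name).map_err(to_download_error)?;
    Ok(body)
}
```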
Arpad Müller	f842b22b90	Add endpoint for querying time info for lsn (#5497 ) ## Problem See #5468. ## Summary of changes Add a new `get_timestamp_of_lsn` endpoint, returning the timestamp associated with the given lsn. Fixes #5468. --------- Co-authored-by: Shany Pozin <shany@neon.tech>	2023-10-19 04:50:49 +02:00
Konstantin Knizhnik	5c88213eaf	Logical replication (#5271 ) ## Problem See https://github.com/neondatabase/company_projects/issues/111 ## Summary of changes Save logical replication files in WAL at the compute and include them in basebackup at the page server. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2023-10-18 16:42:22 +03:00
John Spray	607d19f0e0	pageserver: clean up page service Result handling for shutdown/disconnect (#5504 ) ## Problem - QueryError always logged at error severity, even though disconnections are not true errors. - QueryError type is not expressive enough to distinguish actual errors from shutdowns. - In some functions we're returning Ok(()) on shutdown, in others we're returning an error ## Summary of changes - Add QueryError::Shutdown and use it in places we check for cancellation - Adopt consistent Result behavior: disconnects and shutdowns are always QueryError, not ok - Transform shutdown+disconnect errors to Ok(()) at the very top of the task that runs query handler - Use the postgres protocol error code for "admin shutdown" in responses to clients when we are shutting down. Closes: #5517	2023-10-18 13:28:38 +01:00
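The Result-handling policy in this commit can be outlined in std-only form: inner code always reports shutdown and disconnect as `Err`, and only the top of the connection task maps them to `Ok(())`. The `QueryError` variants mirror the commit's description, but `handle_query`, `sqlstate` and `run_connection_task` are hypothetical. The SQLSTATE values are the standard Postgres codes for `admin_shutdown` (57P01) and `internal_error` (XX000).

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum QueryError {
    /// The client went away: not a server bug.
    Disconnected,
    /// Cancellation observed because we are shutting down.
    Shutdown,
    /// A genuine error.
    Other(String),
}

/// Hypothetical handler: shutdown and disconnect are always reported as `Err`.
pub fn handle_query(query: &str, cancelled: bool) -> Result<String, QueryError> {
    if cancelled {
        return Err(QueryError::Shutdown);
    }
    match query {
        "" => Err(QueryError::Disconnected),
        "bad" => Err(QueryError::Other("parse error".to_string())),
        q => Ok(format!("ok: {q}")),
    }
}

/// SQLSTATE sent to the client; a disconnected client gets nothing.
pub fn sqlstate(e: &QueryError) -> Option<&'static str> {
    match e {
        QueryError::Shutdown => Some("57P01"),    // admin_shutdown
        QueryError::Other(_) => Some("XX000"),    // internal_error
        QueryError::Disconnected => None,
    }
}

/// Top of the connection task: the only place expected cases become `Ok(())`.
pub fn run_connection_task(query: &str, cancelled: bool) -> Result<(), QueryError> {
    match handle_query(query, cancelled) {
        Ok(_) => Ok(()),
        Err(QueryError::Shutdown | QueryError::Disconnected) => Ok(()),
        Err(e) => Err(e),
    }
}
```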
Em Sharnoff	9fe5cc6a82	vm-monitor: Switch from memory.high to polling memory.stat (#5524 ) tl;dr it's really hard to avoid throttling from memory.high, and it counts tmpfs & page cache usage, so it's also hard to make sense of. In the interest of fixing things quickly with something that should be good enough, this PR switches to instead periodically fetch memory statistics from the cgroup's memory.stat and use that data to determine if and when we should upscale. This PR fixes #5444, which has a lot more detail on the difficulties we've hit with memory.high. This PR also supersedes #5488.	2023-10-17 15:30:40 -07:00
Arpad Müller	093f8c5f45	Update rust to 1.73.0 (#5574 ) [Release notes](https://blog.rust-lang.org/2023/10/05/Rust-1.73.0.html)	2023-10-17 13:13:12 +01:00
Arpad Müller	00c71bb93a	Also try to login to Azure via SDK provided methods (#5573 ) ## Problem We ideally use the Azure SDK's way of obtaining authorization, as pointed out in https://github.com/neondatabase/neon/pull/5546#discussion_r1360619178 . ## Summary of changes This PR adds support for Azure SDK based authentication, using [DefaultAzureCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.DefaultAzureCredential.html), which tries the following credentials: * [EnvironmentCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.EnvironmentCredential.html), reading from various env vars * [ImdsManagedIdentityCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.ImdsManagedIdentityCredential.html), using managed identity * [AzureCliCredential](https://docs.rs/azure_identity/0.16.1/azure_identity/struct.AzureCliCredential.html), using Azure CLI closes #5566.	2023-10-17 11:59:57 +01:00
Arpad Müller	3666df6342	azure_blob.rs: use division instead of left shift (#5572 ) Should have been a right shift, but I did a left shift. It's constant-folded anyway, so we just use a division.	2023-10-16 19:52:07 +01:00
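The bug in this commit comes down to the difference between `<<` and `>>` on unsigned integers. The helpers below are hypothetical, since the commit does not show the actual expression in `azure_blob.rs`. For unsigned values, `n / 2` and `n >> 1` are identical, and optimized builds fold either form for constants. That makes the clearer division the safer choice.

```rust
/// Halving via division: the readable form.
pub const fn half(n: u64) -> u64 {
    n / 2
}

/// Equivalent right shift for unsigned integers.
pub const fn half_via_shift(n: u64) -> u64 {
    n >> 1
}

/// The mistake: a left shift doubles the value instead of halving it.
pub const fn buggy_half(n: u64) -> u64 {
    n << 1
}
```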
Alexey Kondratov	0ca342260c	[compute_ctl+pgxn] Handle invalid databases after failed drop (#5561 ) ## Problem In `89275f6c1e` we fixed an issue, when we were dropping db in Postgres even though cplane request failed. Yet, it introduced a new problem that we now de-register db in cplane even if we didn't actually drop it in Postgres. ## Summary of changes Here we revert extension change, so we now again may leave db in invalid state after failed drop. Instead, `compute_ctl` is now responsible for cleaning up invalid databases during full configuration. Thus, there are two ways of recovering from failed DROP DATABASE: 1. User can just repeat DROP DATABASE, same as in Vanilla Postgres. 2. If they didn't, then on next full configuration (dbs / roles changes in the API; password reset; or data availability check) invalid db will be cleaned up in the Postgres and re-created by `compute_ctl`. So again it follows pretty much the same semantics as Vanilla Postgres -- you need to drop it again after failed drop. That way, we have a recovery trajectory for both problems. See this commit for info about `invalid` db state: `a4b4cc1d60` According to it: > An invalid database cannot be connected to anymore, but can still be dropped. While on it, this commit also fixes another issue, when `compute_ctl` was trying to connect to databases with `ALLOW CONNECTIONS false`. Now it will just skip them. Fixes #5435	2023-10-16 20:46:45 +02:00
Arpad Müller	e09d5ada6a	Azure blob storage support (#5546 ) Adds prototype-level support for [Azure blob storage](https://azure.microsoft.com/en-us/products/storage/blobs). Some corners were cut, see the TODOs and the followup issue #5567 for details. Steps to try it out: * Create a storage account with block blobs (this is a per-storage account setting). * Create a container inside that storage account. * Set the appropriate env vars: `AZURE_STORAGE_ACCOUNT, AZURE_STORAGE_ACCESS_KEY, REMOTE_STORAGE_AZURE_CONTAINER, REMOTE_STORAGE_AZURE_REGION` * Set the env var `ENABLE_REAL_AZURE_REMOTE_STORAGE=y` and run `cargo test -p remote_storage azure` Fixes #5562	2023-10-16 17:37:09 +02:00
John Spray	e0c8ad48d4	remote_storage: log detail errors in delete_objects (#5530 ) ## Problem When we got an error in the payload of a DeleteObjects response, we only logged how many errors, not what they were. ## Summary of changes Log up to 10 specific errors. We do not log all of them because that would be up to 1000 log lines per request.	2023-10-11 13:22:00 +01:00
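Capping error detail per request can be done with `Iterator::take`. To stay testable, this sketch returns the lines it would log instead of calling a logging macro. `DeleteObjectError`, `delete_error_log_lines` and the summary wording are hypothetical. The limit of 10 comes from the commit, and the 1000-key upper bound is S3's limit for a single DeleteObjects request.

```rust
/// Hypothetical per-key error from a DeleteObjects response.
#[derive(Debug)]
pub struct DeleteObjectError {
    pub key: String,
    pub code: String,
}

/// Cap on detailed lines per request (a single request can cover up to 1000 keys).
pub const MAX_LOGGED_ERRORS: usize = 10;

/// A summary line, at most `MAX_LOGGED_ERRORS` detail lines, then a count of the rest.
pub fn delete_error_log_lines(errors: &[DeleteObjectError]) -> Vec<String> {
    let mut lines = vec![format!("DeleteObjects returned {} errors", errors.len())];
    lines.extend(
        errors
            .iter()
            .take(MAX_LOGGED_ERRORS)
            .map(|e| format!("failed to delete {}: {}", e.key, e.code)),
    );
    if errors.len() > MAX_LOGGED_ERRORS {
        lines.push(format!("... and {} more", errors.len() - MAX_LOGGED_ERRORS));
    }
    lines
}
```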
John Spray	7eaa7a496b	pageserver: cancellation handling in writes to postgres client socket (#5503 ) ## Problem Writes to the postgres client socket from the page server were not wrapped in cancellation handling, so a stuck client connection could prevent tenant shutdown. ## Summary of changes In all the places where we call flush() to write to the socket, we should be respecting the cancellation token for the task. In this PR, I explicitly pass around a CancellationToken rather than doing inline `task_mgr::shutdown_token` calls, to avoid coupling it to the global task_mgr state and make it easier to refactor later. I have some follow-on commits that add a Shutdown variant to QueryError and use it more extensively, but that's pure refactor so will keep separate from this bug fix PR. Closes: https://github.com/neondatabase/neon/issues/5341	2023-10-09 15:54:17 +01:00
Shany Pozin	010b4d0d5c	Move ApiError 404 to info level (#5501 ) ## Problem Moving ApiError 404 to info level logging (see https://github.com/neondatabase/neon/pull/5489#issuecomment-1750211212)	2023-10-09 13:54:46 +03:00
Arpad Müller	607b185a49	Fix 1.73.0 clippy lints (#5494 ) Doesn't do an upgrade of rustc to 1.73.0 as we want to wait for the cargo response of the curl CVE before updating. In preparation for an update, we address the clippy lints that are newly firing in 1.73.0.	2023-10-06 14:17:19 +01:00
Joonas Koivunen	a15f9b3baa	pageserver: Tune 503 Resource unavailable (#5489 ) 503 Resource Unavailable appears as an error in logs, but it is not really an error that a test should ever fail on, or that should even be logged as an error in prod ([evidence]). Changes: - log 503 as `info!` level - use `Cow<'static, str>` instead of `String` - add an additional `wait_until_tenant_active` in `test_actually_duplicate_l1` We ought to have a "wait for tenants to complete loading" step in tests, but this is easier to implement for now. [evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5485/6423110295/index.html#/testresult/182de66203864fc0	2023-10-06 09:59:14 +01:00
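Switching from `String` to `Cow<'static, str>` lets fixed messages avoid allocation while formatted messages are still accepted. The trimmed-down `ApiError` below is hypothetical, not neon's `http` crate type. Its `log_level` helper returns a string only so the "503 is logged at info" policy can be checked without a logging framework.

```rust
use std::borrow::Cow;

/// Hypothetical, trimmed-down API error.
#[derive(Debug)]
pub enum ApiError {
    /// Static messages stay `Cow::Borrowed` (no allocation); formatted ones are `Cow::Owned`.
    ResourceUnavailable(Cow<'static, str>),
    InternalServerError(String),
}

impl ApiError {
    pub fn status(&self) -> u16 {
        match self {
            ApiError::ResourceUnavailable(_) => 503,
            ApiError::InternalServerError(_) => 500,
        }
    }

    /// 503 is an expected, transient condition, so it is logged at info level.
    pub fn log_level(&self) -> &'static str {
        match self {
            ApiError::ResourceUnavailable(_) => "info",
            ApiError::InternalServerError(_) => "error",
        }
    }
}
```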
John Spray	baa5fa1e77	pageserver: location configuration API, attachment modes, secondary locations (#5299 ) ## Problem These changes are part of building seamless tenant migration, as described in the RFC: - https://github.com/neondatabase/neon/pull/5029 ## Summary of changes - A new configuration type `LocationConf` supersedes `TenantConfOpt` for storing a tenant's configuration in the pageserver repo dir. It contains `TenantConfOpt`, as well as a new `mode` attribute that describes what kind of location this is (secondary, attached, attachment mode etc). It is written to a file called `config-v1` instead of `config` -- this prepares us for neatly making any other profound changes to the format of the file in future. Forward compat for existing pageserver code is achieved by writing out both old and new style files. Backward compat is achieved by checking for the old-style file if the new one isn't found. - The `TenantMap` type changes to hold `TenantSlot` instead of just `Tenant`. The `Tenant` type continues to be used for attached tenants only. Tenants in other states (such as secondaries) are represented by a different variant of `TenantSlot`. - Where `Tenant` & `Timeline` used to hold an Arc<Mutex<TenantConfOpt>>, they now hold a reference to an AttachedTenantConf, which includes the extra information from LocationConf. This enables them to know the current attachment mode. - The attachment mode is used as an advisory input to decide whether to do compaction and GC (AttachedStale is meant to avoid doing uploads, AttachedMulti is meant to avoid doing deletions). - A new HTTP API is added at `PUT /tenants/<tenant_id>/location_config` to drive new location configuration. 
This provides a superset of the functionality of attach/detach/load/ignore: - Attaching a tenant is just configuring it in an attached state - Detaching a tenant is configuring it to a detached state - Loading a tenant is just the same as attaching it - Ignoring a tenant is the same as configuring it into Secondary with warm=false (i.e. retain the files on disk but do nothing else). Caveats: - AttachedMulti tenants don't do compaction in this PR, but they do in the follow on #5397 - Concurrent updates to the `location_config` API are not handled elegantly in this PR, a better mechanism is added in the follow on https://github.com/neondatabase/neon/pull/5367 - Secondary mode is just a placeholder in this PR: the code to upload heatmaps and do downloads on secondary locations will be added in a later PR (but that shouldn't change any external interfaces) Closes: https://github.com/neondatabase/neon/issues/5379 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-10-05 09:55:10 +01:00
John Spray	c5ea91f831	pageserver: fix loading control plane JWT token (#5470 ) ## Problem In #5383 this configuration was added, but it missed the parts of the Builder class that let it actually be used. ## Summary of changes Add `control_plane_api_token` hooks to PageserverConfigBuilder	2023-10-05 01:31:17 +01:00
Em Sharnoff	6489a4ea40	vm-monitor: Remove mem::forget of tokio::sync::mpsc::Sender (#5441 ) If the cgroup integration was not enabled, this would cause compute_ctl to leak memory. Thankfully, we never use vm-monitor without the cgroup handling enabled, so this wasn't actually impacting us, but... it still looked suspicious, so figured it was worth changing.	2023-10-04 15:08:10 -07:00
duguorong009	25a37215f3	fix: replace all `std::PathBuf`s with `camino::Utf8PathBuf` (#5352 ) Fixes #4689 by replacing all of `std::Path` , `std::PathBuf` with `camino::Utf8Path`, `camino::Utf8PathBuf` in - pageserver - safekeeper - control_plane - libs/remote_storage Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-10-04 17:52:23 +03:00
John Spray	ace0c775fc	pageserver: prefer 503 to 500 for transient unavailability (#5439 ) ## Problem The 500 status code should only be used for bugs or unrecoverable failures: situations we did not expect. Currently, the pageserver is misusing this response code for some situations that are totally normal, like requests targeting tenants that are in the process of activating. The 503 response is a convenient catch-all for "I can't right now, but I will be able to". ## Summary of changes - Change some transient availability error conditions to return 503 instead of 500 - Update the HTTP client configuration in integration tests to retry on 503 After these changes, things like creating a tenant and then trying to create a timeline within it will no longer require carefully checking its status first, or retrying on 500s. Instead, a client which is properly configured to retry on 503 can quietly handle such situations.	2023-10-03 17:00:55 +01:00
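The client side of "properly configured to retry on 503" can be sketched as a retry loop. It retries only on 503 and returns any other status immediately. `call_with_retry` is hypothetical and is not the integration-test HTTP client configuration the commit changed. A real client would also sleep with backoff between attempts.

```rust
/// Retry only on 503 ("not right now, but soon"), up to `max_attempts`.
/// Returns the final status and how many attempts were made.
pub fn call_with_retry<F>(mut request: F, max_attempts: u32) -> (u16, u32)
where
    F: FnMut() -> u16,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        let status = request();
        if status != 503 || attempts >= max_attempts {
            return (status, attempts);
        }
        // A real client would sleep with backoff here before retrying.
    }
}
```

With this policy a 500 surfaces immediately as a real failure, while a tenant that is still activating is simply waited out.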
Christian Schwarz	de0e96d2be	remote_storage: separate semaphores for read and write ops (#5440 ) Before this PR, a compaction that queues a lot of uploads could grab all the semaphore permits. Any readers that need on-demand downloads would queue up, causing getpage@lsn outliers. Internal context: https://neondb.slack.com/archives/C05NXJFNRPA/p1696264359425419?thread_ts=1696250393.840899&cid=C05NXJFNRPA	2023-10-03 11:22:11 +03:00
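The effect of separate read and write permit pools can be demonstrated with a tiny non-blocking std semaphore. The real code presumably uses an async semaphore. `Semaphore`, `OpKind` and `ConcurrencyLimits` here are hypothetical and not the remote_storage types.

```rust
use std::sync::Mutex;

/// Tiny non-blocking counting semaphore (std-only stand-in for an async one).
pub struct Semaphore {
    available: Mutex<usize>,
}

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Semaphore { available: Mutex::new(permits) }
    }

    pub fn try_acquire(&self) -> bool {
        let mut available = self.available.lock().unwrap();
        if *available == 0 {
            return false;
        }
        *available -= 1;
        true
    }

    pub fn release(&self) {
        *self.available.lock().unwrap() += 1;
    }
}

#[derive(Debug, Clone, Copy)]
pub enum OpKind {
    Read,
    Write,
}

/// Separate pools: a burst of uploads cannot consume the permits downloads need.
pub struct ConcurrencyLimits {
    read: Semaphore,
    write: Semaphore,
}

impl ConcurrencyLimits {
    pub fn new(read_permits: usize, write_permits: usize) -> Self {
        ConcurrencyLimits {
            read: Semaphore::new(read_permits),
            write: Semaphore::new(write_permits),
        }
    }

    fn pool(&self, kind: OpKind) -> &Semaphore {
        match kind {
            OpKind::Read => &self.read,
            OpKind::Write => &self.write,
        }
    }

    pub fn try_start(&self, kind: OpKind) -> bool {
        self.pool(kind).try_acquire()
    }

    pub fn finish(&self, kind: OpKind) {
        self.pool(kind).release();
    }
}
```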
Conrad Ludgate	528fb1bd81	proxy: metrics2 (#5179 ) ## Problem We need to always count metrics while a connection is open, not only when the transfer is 0. We also need to count bytes usage for HTTP. ## Summary of changes New structure for usage metrics: a `DashMap<Ids, Arc<Counters>>`. If the Arc has 1 owner (the map), then I can conclude that no connections are open. If the counters have a non-zero "open_connections", then I can conclude a new connection was opened in the last interval and should be reported on. Also, keep count of how many bytes were processed for HTTP and report them here.	2023-09-28 11:38:26 +01:00
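The "Arc owned only by the map means no open connections" trick can be shown with std types. `Mutex<HashMap<..>>` replaces `DashMap` so the sketch has no dependencies, and a plain endpoint string replaces `Ids`. The field and method names are hypothetical. Registration and collection take the same lock, so the strong-count check cannot race with a new connection being registered.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Per-endpoint counters shared between the map and every open connection.
#[derive(Default)]
pub struct Counters {
    pub bytes: AtomicU64,
    pub opened_since_report: AtomicUsize,
}

/// Std stand-in for `DashMap<Ids, Arc<Counters>>`.
#[derive(Default)]
pub struct UsageMetrics {
    map: Mutex<HashMap<String, Arc<Counters>>>,
}

impl UsageMetrics {
    /// Called when a connection opens; the connection keeps the returned `Arc` alive.
    pub fn register(&self, endpoint: &str) -> Arc<Counters> {
        let mut map = self.map.lock().unwrap();
        let counters = Arc::clone(map.entry(endpoint.to_string()).or_default());
        counters.opened_since_report.fetch_add(1, Ordering::Relaxed);
        counters
    }

    /// Report endpoints that had a connection open during the interval,
    /// and drop entries whose only remaining owner is the map itself.
    pub fn collect(&self) -> Vec<(String, u64)> {
        let mut map = self.map.lock().unwrap();
        let mut report = Vec::new();
        map.retain(|endpoint, counters| {
            let bytes = counters.bytes.swap(0, Ordering::Relaxed);
            let opened = counters.opened_since_report.swap(0, Ordering::Relaxed);
            // strong_count == 1: only the map holds it, so no connection is open.
            let still_open = Arc::strong_count(counters) > 1;
            if opened > 0 || still_open {
                report.push((endpoint.clone(), bytes));
            }
            still_open
        });
        report.sort();
        report
    }
}
```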
Em Sharnoff	48e85460fc	vm-monitor: Unset memory.high on start + refactor cgroup handling (#5348 ) ## Problem Over the past couple days, we've had a couple VMs hit issues with postgres getting hit by memory.high throttling, even after #5303 was supposed to fix that. The tl;dr of those issues is that because vm-monitor startup sets the file cache size first, before interacting with the cgroup, cgroup throttling can mean we time out connecting to the file cache and never reset the cgroup, even if memory has been upscaled since then. See e.g.: - https://neondb.slack.com/archives/C03F5SM1N02/p1695218132208249 - https://neondb.slack.com/archives/C03F5SM1N02/p1695314613696659 ## Summary of changes This PR adds an additional step into vm-monitor startup, where we first set the cgroup's memory.high value to 'max', removing the capacity for throttling. This is preferable to just setting memory.high before the file cache, because it's theoretically possible that the new value of memory.high could still be less than the current memory usage, in which case postgres could continue to be throttled without sufficient memory events to relieve that. Implementing this properly involved adding a method to our internal cgroup interface, and it seemed like there was duplicated functionality there, so this PR unifies that as well, making things a bit more consistent.	2023-09-27 21:27:23 -07:00
John Spray	2cced770da	pageserver: add control_plane_api_token config (#5383 ) ## Problem Control plane API calls in prod will need authentication. ## Summary of changes `control_plane_api_token` config is loaded and set as HTTP `Authorization` header. Closes: https://github.com/neondatabase/neon/issues/5139	2023-09-27 13:12:13 +01:00
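A minimal sketch of turning `control_plane_api_token` into an `Authorization` header, using std types only. The `ControlPlaneConfig` struct and `control_plane_headers` function are hypothetical, and the `Bearer` scheme is an assumption: the commit only says the token is set as the HTTP `Authorization` header.

```rust
/// Hypothetical subset of the pageserver config relevant here.
pub struct ControlPlaneConfig {
    pub control_plane_api: Option<String>,
    pub control_plane_api_token: Option<String>,
}

/// Build the headers for a control plane API request. The Authorization
/// header is only added when a token is configured.
pub fn control_plane_headers(cfg: &ControlPlaneConfig) -> Vec<(String, String)> {
    let mut headers = vec![("Content-Type".to_string(), "application/json".to_string())];
    if let Some(token) = &cfg.control_plane_api_token {
        // Assumption: the token is sent using the "Bearer" scheme.
        headers.push(("Authorization".to_string(), format!("Bearer {token}")));
    }
    headers
}
```

Leaving the token optional keeps unauthenticated setups (e.g. local test environments) working unchanged.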
John Spray	ba92668e37	pageserver: deletion queue & generation validation for deletions (#5207 ) ## Problem Pageservers must not delete objects or advertise updates to remote_consistent_lsn without checking that they hold the latest generation for the tenant in question (see [the RFC](https://github.com/neondatabase/neon/blob/main/docs/rfcs/025-generation-numbers.md)) In this PR: - A new "deletion queue" subsystem is introduced, through which deletions flow - `RemoteTimelineClient` is modified to send deletions through the deletion queue: - For GC & compaction, deletions flow through the full generation verifying process - For timeline deletions, deletions take a fast path that bypasses generation verification - The `last_uploaded_consistent_lsn` value in `UploadQueue` is replaced with a mechanism that maintains a "projected" lsn (equivalent to the previous property), and a "visible" LSN (which is the one that we may share with safekeepers). - Until `control_plane_api` is set, all deletions skip generation validation - Tests are introduced for the new functionality in `test_pageserver_generations.py` Once this lands, if a pageserver is configured with the `control_plane_api` configuration added in https://github.com/neondatabase/neon/pull/5163, it becomes safe to attach a tenant to multiple pageservers concurrently. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-09-26 16:11:55 +01:00
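The deletion paths listed in #5207 can be sketched as a small in-memory queue. This is a hypothetical model, not the pageserver's `DeletionQueue`: the types and method names are illustrative, the control plane's answer is modelled as a plain map, and dropping stale-generation deletions (rather than, say, retrying them) is an assumption of the sketch.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct TenantId(pub u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Generation(pub u32);

#[derive(Clone, Debug)]
pub struct DeletionRequest {
    pub tenant: TenantId,
    /// Generation the tenant was attached with when the deletion was enqueued.
    pub generation: Generation,
    pub objects: Vec<String>,
}

#[derive(Default)]
pub struct DeletionQueue {
    pending: Vec<DeletionRequest>,
    /// Objects actually deleted (stands in for remote storage calls).
    pub executed: Vec<String>,
}

impl DeletionQueue {
    /// GC/compaction path: deletions wait for generation validation.
    pub fn push(&mut self, req: DeletionRequest) {
        self.pending.push(req);
    }

    /// Timeline-deletion fast path: bypasses generation validation.
    pub fn push_immediate(&mut self, objects: Vec<String>) {
        self.executed.extend(objects);
    }

    /// `latest` is None when control_plane_api is unset (validation skipped);
    /// otherwise it maps each tenant to the latest generation the control
    /// plane knows about. Returns the objects NOT deleted because the
    /// request's generation is stale.
    pub fn flush(&mut self, latest: Option<&HashMap<TenantId, Generation>>) -> Vec<String> {
        let mut dropped = Vec::new();
        for req in self.pending.drain(..) {
            let valid = match latest {
                None => true,
                Some(map) => map.get(&req.tenant) == Some(&req.generation),
            };
            if valid {
                self.executed.extend(req.objects);
            } else {
                dropped.extend(req.objects);
            }
        }
        dropped
    }
}
```

Batching validation in `flush` rather than per `push` reflects the motivation for a queue: one control plane round-trip can validate many pending deletions.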
Joonas Koivunen	16f0622222	fix: real_s3 flakiness with rust tests (#5386 ) Fixes #5072. See proof from https://github.com/neondatabase/neon/issues/5072#issuecomment-1735580798. Turns out multiple threads can get the same nanoseconds since epoch, so switch to using millis (for finding the prefix later on) and randomness via `thread_rng` (to protect against adversarial CI runners). Also changes the "per test looking alike" prefix to a more "general" prefix.	2023-09-26 15:59:25 +01:00
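The prefix scheme from #5386 (millisecond timestamp plus a random component) can be sketched as below. To stay dependency-free, this sketch swaps `rand::thread_rng` for the standard library's randomly keyed `RandomState` hasher; the prefix layout and the function names are hypothetical.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::time::{SystemTime, UNIX_EPOCH};

/// Random 64-bit value from std's randomly keyed hasher
/// (a std-only stand-in for `rand::thread_rng`).
fn random_u64() -> u64 {
    let mut hasher = RandomState::new().build_hasher();
    hasher.write_u64(0);
    hasher.finish()
}

/// Hypothetical layout: "{base}/{millis since epoch}-{random hex}/".
/// Two threads in the same millisecond still get distinct prefixes.
pub fn unique_test_prefix(base: &str) -> String {
    let millis = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_millis();
    format!("{base}/{millis}-{:016x}/", random_u64())
}

/// Recover the timestamp, e.g. to find and clean up old prefixes later.
pub fn prefix_millis(prefix: &str, base: &str) -> Option<u128> {
    prefix.strip_prefix(base)?.strip_prefix('/')?.split('-').next()?.parse().ok()
}
```

Keeping the timestamp first makes prefixes sort chronologically, while the random suffix alone carries the uniqueness guarantee.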
Joonas Koivunen	5d8597c2f0	refactor(consumption_metrics): post-split cleanup (#5327 ) Split off from #5297. Builds upon #5326. Handles original review comments which I did not move to earlier split PRs. Completes test support for verifying events by notifying of the last batch of events. Adds cleaning up of tempfiles left because of an unlucky shutdown or SIGKILL. Finally closes #5175. Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-09-18 23:30:01 +03:00
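Tempfiles like the ones #5327 cleans up typically come from a write-to-temp-then-rename pattern interrupted by a shutdown or SIGKILL. The sketch below assumes that pattern; the `___temp` suffix and both function names are assumptions for illustration, not confirmed details of the consumption metrics code.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Assumed suffix for in-progress writes.
const TEMP_SUFFIX: &str = "___temp";

fn temp_path(path: &Path) -> PathBuf {
    let mut s = path.as_os_str().to_owned();
    s.push(TEMP_SUFFIX);
    PathBuf::from(s)
}

/// Write to "<path>___temp", then rename over <path>. A kill between the
/// two steps leaves only the tempfile behind, never a half-written <path>.
pub fn write_atomically(path: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp = temp_path(path);
    fs::write(&tmp, contents)?;
    fs::rename(&tmp, path)
}

/// On startup, remove regular files left behind by interrupted writes.
pub fn cleanup_leftover_tempfiles(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut removed = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let is_temp = path
            .file_name()
            .and_then(|n| n.to_str())
            .map_or(false, |n| n.ends_with(TEMP_SUFFIX));
        if is_temp && entry.file_type()?.is_file() {
            fs::remove_file(&path)?;
            removed.push(path);
        }
    }
    Ok(removed)
}
```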
Em Sharnoff	722e5260bf	vm-monitor: Don't set cgroup memory.max (#5333 ) All it does is make postgres OOM more often (which, tbf, means we're less likely to have e.g. compute_ctl get OOM-killed, but that tradeoff isn't worth it). Internally, this means removing all references to `memory.max` and the places where we calculate or store the intended value. As discussed in the sync earlier. ref: - https://neondb.slack.com/archives/C03H1K0PGKH/p1694698949252439?thread_ts=1694505575.693449&cid=C03H1K0PGKH - https://neondb.slack.com/archives/C03H1K0PGKH/p1695049198622759	2023-09-18 17:47:48 +00:00

394 Commits