rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-06-01 20:40:37 +00:00

Author	SHA1	Message	Date
John Spray	e89e41f8ba	tests: update for tenant generations (#5449 ) ## Problem Some existing tests are written in a way that's incompatible with tenant generations. ## Summary of changes Update all the tests that need updating: this is things like calling through the NeonPageserver.tenant_attach helper to get a generation number, instead of calling directly into the pageserver API. There are various more subtle cases.	2023-12-07 12:27:16 +00:00
Heikki Linnakangas	31be301ef3	Make simple_rcu::RcuWaitList::wait() async (#6046 ) The gc_timeline() function is async, but it calls the synchronous wait() function. In the worst case, that could lead to a deadlock by using up all tokio executor threads. In the passing, fix a few typos in comments. Fixes issue #6045. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-12-07 10:20:40 +02:00
Joonas Koivunen	a3c7d400b4	fix: avoid allocations with logging a slug (#6047 ) to_string forces allocating a less than pointer sized string (costing on stack 4 usize), using a Display formattable slug saves that. the difference seems small, but at the same time, we log these a lot.	2023-12-07 07:25:22 +00:00
John Spray	61fe9d360d	pageserver: add Key->Shard mapping logic & use it in page service (#5980 ) ## Problem When a pageserver receives a page service request identified by TenantId, it must decide which `Tenant` object to route it to. As in earlier PRs, this stuff is all a no-op for tenants with a single shard: calls to `is_key_local` always return true without doing any hashing on a single-shard ShardIdentity. Closes: https://github.com/neondatabase/neon/issues/6026 ## Summary of changes - Carry immutable `ShardIdentity` objects in Tenant and Timeline. These provide the information that Tenants/Timelines need to figure out which shard is responsible for which Key. - Augment `get_active_tenant_with_timeout` to take a `ShardSelector` specifying how the shard should be resolved for this tenant. This mode depends on the kind of request (e.g. basebackups always go to shard zero). - In `handle_get_page_at_lsn_request`, handle the case where the Timeline we looked up at connection time is not the correct shard for the page being requested. This can happen whenever one node holds multiple shards for the same tenant. This is currently written as a "slow path" with the optimistic expectation that usually we'll run with one shard per pageserver, and the Timeline resolved at connection time will be the one serving page requests. There is scope for optimization here later, to avoid doing the full shard lookup for each page. - Omit consumption metrics from nonzero shards: only the 0th shard is responsible for tracing accurate relation sizes. Note to reviewers: - Testing of these changes is happening separately on the `jcsp/sharding-pt1` branch, where we have hacked neon_local etc needed to run a test_pg_regress. - The main caveat to this implementation is that page service connections still look up one Timeline when the connection is opened, before they know which pages are going to be read. If there is one shard per pageserver then this will always also be the Timeline that serves page requests. However, if multiple shards are on one pageserver then get page requests will incur the cost of looking up the correct Timeline on each getpage request. We may look to improve this in future with a "sticky" timeline per connection handler so that subsequent requests for the same Timeline don't have to look up again, and/or by having postgres pass a shard hint when connecting. This is tracked in the "Loose ends" section of https://github.com/neondatabase/neon/issues/5507	2023-12-05 12:01:55 +00:00
Conrad Ludgate	f60e49fe8e	proxy: fix panic in startup packet (#6032 ) ## Problem Panic when less than 8 bytes is presented in a startup packet. ## Summary of changes We need there to be a 4 byte message code, so the expected min length is 8.	2023-12-05 11:24:16 +01:00
Alexey Kondratov	85d08581ed	[compute_ctl] Introduce feature flags in the compute spec (#6016 ) ## Problem In the past we've rolled out all new `compute_ctl` functionality right to all users, which could be risky. I want to have a more fine-grained control over what we enable, in which env and to which users. ## Summary of changes Add an option to pass a list of feature flags to `compute_ctl`. If not passed, it defaults to an empty list. Any unknown flags are ignored. This allows us to release new experimental features safer, as we can then flip the flag for one specific user, only Neon employees, free / pro / etc. users and so on. Or control it per environment. In the current implementation feature flags are passed via compute spec, so they do not allow controlling behavior of `empty` computes. For them, we can either stick with the previous approach, i.e. add separate cli args or introduce a more generic `--features` cli argument.	2023-12-04 19:54:18 +01:00
John Spray	1d81e70d60	pageserver: tweak logs for index_part loading (#6005 ) ## Problem On pageservers upgraded to enable generations, these INFO level logs were rather frequent. If a tenant timeline hasn't written new layers since the upgrade, it will emit the "No index_part.json*" log every time it starts. ## Summary of changes - Downgrade two log lines from info to debug - Add a tiny unit test that I wrote for sanity-checking that there wasn't something wrong with our Generation-comparing logic when loading index parts.	2023-12-04 09:57:47 +00:00
Christian Schwarz	ce1652990d	logical size: better represent level of accuracy in the type system (#5999 ) I would love to not expose the in-accurate value int he mgmt API at all, and in fact control plane doesn't use it [^1]. But our tests do, and I have no desire to change them at this time. [^1]: https://github.com/neondatabase/cloud/pull/8317	2023-12-01 14:16:29 +01:00
Arpad Müller	b71b8ecfc2	Add existing_initdb_timeline_id param to timeline creation (#5912 ) This PR adds an `existing_initdb_timeline_id` option to timeline creation APIs, taking an optional timeline ID. Follow-up of #5390. If the `existing_initdb_timeline_id` option is specified via the HTTP API, the pageserver downloads the existing initdb archive from the given timeline ID and extracts it, instead of running initdb itself. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-11-30 22:32:04 +01:00
John Khvatov	3e094e90d7	update aws sdk to 1.0.x (#5976 ) This change will be useful for experimenting with S3 performance. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-30 14:17:58 +02:00
Rahul Modpur	50d959fddc	refactor: use serde for TenantConf deserialization Fixes: #5300 (#5310 ) Remove handcrafted TenantConf deserialization code. Use `serde_path_to_error` to include the field which failed parsing. Leaves the duplicated TenantConf in pageserver and models, does not touch PageserverConf handcrafted deserialization. Error change: - before change: "configure option `checkpoint_distance` cannot be negative" - after change: "`checkpoint_distance`: invalid value: integer `-1`, expected u64" Fixes: #5300 Cc: #3682 --------- Signed-off-by: Rahul Modpur <rmodpur2@gmail.com> Co-authored-by: Shany Pozin <shany@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-30 12:47:13 +02:00
John Spray	9e55ad4796	pageserver: refactor TenantId to TenantShardId in Tenant & Timeline (#5957 ) (includes two preparatory commits from https://github.com/neondatabase/neon/pull/5960) ## Problem To accommodate multiple shards in the same tenant on the same pageserver, we must include the full TenantShardId in local paths. That means that all code touching local storage needs to see the TenantShardId. ## Summary of changes - Replace `tenant_id: TenantId` with `tenant_shard_id: TenantShardId` on Tenant, Timeline and RemoteTimelineClient. - Use TenantShardId in helpers for building local paths. - Update all the relevant call sites. This doesn't update absolutely everything: things like PageCache, TaskMgr, WalRedo are still shard-naive. The purpose of this PR is to update the core types so that others code can be added/updated incrementally without churning the most central shared types.	2023-11-29 14:52:35 +00:00
John Spray	1ab0cfc8cb	pageserver: add sharding metadata to `LocationConf` (#5932 ) ## Problem The TenantShardId in API URLs is sufficient to uniquely identify a tenant shard, but not for it to function: it also needs to know its full sharding configuration (stripe size, layout version) in order to map keys to shards. ## Summary of changes - Introduce ShardIdentity: this is the superset of ShardIndex (#5924 ) that is required for translating keys to shard numbers. - Include ShardIdentity as an optional attribute of LocationConf - Extend the public `LocationConfig` API structure with a flat representation of shard attributes. The net result is that at the point we construct a `Tenant`, we have a `ShardIdentity` (inside LocationConf). This enables the next steps to actually use the ShardIdentity to split WAL and validate that page service requires are reaching the correct shard.	2023-11-28 13:14:51 +00:00
John Spray	ca469be1cf	pageserver: add shard indices to layer metadata (#5928 ) ## Problem For sharded tenants, the layer keys must include the shard number and shard count, to disambiguate keys written by different shards in the same tenant (shard number), and disambiguate layers written before and after splits (shard count). Closes: https://github.com/neondatabase/neon/issues/5924 ## Summary of changes There are no functional changes in this PR: everything behaves the same for the default ShardIndex::unsharded() value. Actual construct of sharded tenants will come next. - Add a ShardIndex type: this is just a wrapper for a ShardCount and ShardNumber. This is a subset of ShardIdentity: whereas ShardIdentity contains enough information to filter page keys, ShardIndex contains just enough information to construct a remote key. ShardIndex has a compact encoding, the same as the shard part of TenantShardId. - Store the ShardIndex as part of IndexLayerMetadata, if it is set to a different value than ShardIndex::unsharded. - Update RemoteTimelineClient and DeletionQueue to construct paths using the layer metadata. Deletion code paths that previously just passed a `Generation` now pass a full `LayerFileMetadata` to capture the shard as well. Notes to reviewers: - In deletion code paths, I could have used a (Generation, ShardIndex) instead of the full LayerFileMetadata. I opted for the full object partly for brevity, and partly because in future when we add checksums the deletion code really will care about the full metadata in order to validate that it is deleting what was intended. - While ShardIdentity and TenantShardId could both use a ShardIndex, I find that they read more cleanly as "flat" structs that spell out the shard count and number field separately. Serialization code would need writing out by hand anyway, because TenantShardId's serialized form is not a serde struct-style serialization. - ShardIndex doesn't _have_ to exist (we could use ShardIdentity everywhere), but it is a worthwhile optimization, as we will have many copies of this as part of layer metadata. In future the size difference betweedn ShardIndex and ShardIdentity may become larger if we implement more sophisticated key distribution mechanisms (i.e. new values of ShardIdentity::layout). --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-11-28 11:47:25 +00:00
Arpad Müller	54327bbeec	Upload initdb results to S3 (#5390 ) ## Problem See #2592 ## Summary of changes Compresses the results of initdb into a .tar.zst file and uploads them to S3, to enable usage in recovery from lsn. Generations should not be involved I think because we do this only once at the very beginning of a timeline. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-23 18:11:52 +00:00
Christian Schwarz	9e3c07611c	logging: support output to stderr (#5896 ) (part of the getpage benchmarking epic #5771) The plan is to make the benchmarking tool log on stderr and emit results as JSON on stdout. That way, the test suite can simply take captures stdout and json.loads() it, while interactive users of the benchmarking tool have a reasonable experience as well. Existing logging users continue to print to stdout, so, this change should be a no-op functionally and performance-wise.	2023-11-22 11:08:35 +00:00
John Spray	ab631e6792	pageserver: make TenantsMap shard-aware (#5819 ) ## Problem When using TenantId as the key, we are unable to handle multiple tenant shards attached to the same pageserver for the same tenant ID. This is an expected scenario if we have e.g. 8 shards and 5 pageservers. ## Summary of changes - TenantsMap is now a BTreeMap instead of a HashMap: this enables looking up by range. In future, we will need this for page_service, as incoming requests will just specify the Key, and we'll have to figure out which shard to route it to. - A new key type TenantShardId is introduced, to act as the key in TenantsMap, and as the id type in external APIs. Its human readable serialization is backward compatible with TenantId, and also forward-compatible as long as sharding is not actually used (when we construct a TenantShardId with ShardCount(0), it serializes to an old-fashioned TenantId). - Essential tenant APIs are updated to accept TenantShardIds: tenant/timeline create, tenant delete, and /location_conf. These are the APIs that will enable driving sharded tenants. Other apis like /attach /detach /load /ignore will not work with sharding: those will soon be deprecated and replaced with /location_conf as part of the live migration work. Closes: #5787	2023-11-15 23:20:21 +02:00
John Spray	e0821e1eab	pageserver: refined Timeline shutdown (#5833 ) ## Problem We have observed the shutdown of a timeline taking a long time when a deletion arrives at a busy time for the system. This suggests that we are not respecting cancellation tokens promptly enough. ## Summary of changes - Refactor timeline shutdown so that rather than having a shutdown() function that takes a flag for optionally flushing, there are two distinct functions, one for graceful flushing shutdown, and another that does the "normal" shutdown where we're just setting a cancellation token and then tearing down as fast as we can. This makes things a bit easier to reason about, and enables us to remove the hand-written variant of shutdown that was maintained in `delete.rs` - Layer flush task checks cancellation token more carefully - Logical size calculation's handling of cancellation tokens is simplified: rather than passing one in, it respects the Timeline's cancellation token. This PR doesn't touch RemoteTimelineClient, which will be a key thing to fix as well, so that a slow remote storage op doesn't hold up shutdown.	2023-11-09 16:02:59 +00:00
John Spray	9c30883c4b	remote_storage: use S3 SDK's adaptive retry policy (#5813 ) ## Problem Currently, we aren't doing any explicit slowdown in response to 429 responses. Recently, as we hit remote storage a bit harder (pageserver does more ListObjectsv2 requests than it used to since #5580 ), we're seeing storms of 429 responses that may be the result of not just doing too may requests, but continuing to do those extra requests without backing off any more than our usual backoff::exponential. ## Summary of changes Switch from AWS's "Standard" retry policy to "Adaptive" -- docs describe this as experimental but it has been around for a long time. The main difference between Standard and Adaptive is that Adaptive rate-limits the client in response to feedback from the server, which is meant to avoid scenarios where the client would otherwise repeatedly hit throttling responses.	2023-11-09 13:50:13 +00:00
Arthur Petukhovsky	0495798591	Fix walproposer build on aarch64 (#5827 ) There was a compilation error due to `std::ffi::c_char` being different type on different platforms. Clippy also complained due to a similar reason.	2023-11-09 13:05:17 +00:00
Sasha Krassovsky	87389bc933	Add test simulating bad connection between pageserver and compute (#5728 ) ## Problem We have a funny 3-day timeout for connections between the compute and pageserver. We want to get rid of it, so to do that we need to make sure the compute is resilient to connection failures. Closes: https://github.com/neondatabase/neon/issues/5518 ## Summary of changes This test makes the pageserver randomly drop the connection if the failpoint is enabled, and ensures we can keep querying the pageserver. This PR also reduces the default timeout to 10 minutes from 3 days.	2023-11-08 19:48:57 +00:00
Arpad Müller	ea118a238a	JWT logging improvements (#5823 ) * lower level on auth success from info to debug (fixes #5820) * don't log stacktraces on auth errors (as requested on slack). we do this by introducing an `AuthError` type instead of using `anyhow` and `bail`. * return errors that have been censored for improved security.	2023-11-08 16:56:53 +00:00
Christian Schwarz	e9b227a11e	cleanup unused RemoteStorage fields (#5830 ) Found this while working on #5771	2023-11-08 16:54:33 +00:00
John Spray	40441f8ada	pageserver: use `Gate` for stronger safety check in `SlotGuard` (#5793 ) ## Problem #5711 and #5367 raced -- the `SlotGuard` type needs `Gate` to properly enforce its invariant that we may not drop an `Arc<Tenant>` from a slot. ## Summary of changes Replace the TODO with the intended check of Gate.	2023-11-08 13:00:11 +00:00
Em Sharnoff	acef742a6e	vm-monitor: Remove dependency on workspace_hack (#5752 ) neondatabase/autoscaling builds libs/vm-monitor during CI because it's a necessary component of autoscaling. workspace_hack includes a lot of crates that are not necessary for vm-monitor, which artificially inflates the build time on the autoscaling side, so hopefully removing the dependency should speed things up. Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-07 09:41:20 -08:00
Arpad Müller	e310533ed3	Support JWT key reload in pageserver (#5594 ) ## Problem For quickly rotating JWT secrets, we want to be able to reload the JWT public key file in the pageserver, and also support multiple JWT keys. See #4897. ## Summary of changes * Allow directories for the `auth_validation_public_key_path` config param instead of just files. for the safekeepers, all of their config options also support multiple JWT keys. * For the pageservers, make the JWT public keys easily globally swappable by using the `arc-swap` crate. * Add an endpoint to the pageserver, triggered by a POST to `/v1/reload_auth_validation_keys`, that reloads the JWT public keys from the pre-configured path (for security reasons, you cannot upload any keys yourself). Fixes #4897 --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-07 15:43:29 +01:00
Joonas Koivunen	4be6bc7251	refactor: remove unnecessary unsafe (#5802 ) unsafe impls for `Send` and `Sync` should not be added by default. in the case of `SlotGuard` removing them does not cause any issues, as the compiler automatically derives those. This PR adds requirement to document the unsafety (see [clippy::undocumented_unsafe_blocks]) and opportunistically adds `#![deny(unsafe_code)]` to most places where we don't have unsafe code right now. TRPL on Send and Sync: https://doc.rust-lang.org/book/ch16-04-extensible-concurrency-sync-and-send.html [clippy::undocumented_unsafe_blocks]: https://rust-lang.github.io/rust-clippy/master/#/undocumented_unsafe_blocks	2023-11-07 10:26:25 +00:00
Arpad Müller	b09a851705	Make azure blob storage not do extra metadata requests (#5777 ) Load the metadata from the returned `GetBlobResponse` and avoid downloading it via a separate request. As it turns out, the SDK does return the metadata: https://github.com/Azure/azure-sdk-for-rust/issues/1439 . This PR will reduce the number of requests to Azure caused by downloads. Fixes #5571	2023-11-06 15:16:55 +00:00
John Spray	85cd97af61	pageserver: add `InProgress` tenant map state, use a sync lock for the map (#5367 ) ## Problem Follows on from #5299 - We didn't have a generic way to protect a tenant undergoing changes: `Tenant` had states, but for our arbitrary transitions between secondary/attached, we need a general way to say "reserve this tenant ID, and don't allow any other ops on it, but don't try and report it as being in any particular state". - The TenantsMap structure was behind an async RwLock, but it was never correct to hold it across await points: that would block any other changes for all tenants. ## Summary of changes - Add the `TenantSlot::InProgress` value. This means: - Incoming administrative operations on the tenant should retry later - Anything trying to read the live state of the tenant (e.g. a page service reader) should retry later or block. - Store TenantsMap in `std::sync::RwLock` - Provide an extended `get_active_tenant_with_timeout` for page_service to use, which will wait on InProgress slots as well as non-active tenants. Closes: https://github.com/neondatabase/neon/issues/5378 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-11-06 14:03:22 +00:00
John Spray	6defa2b5d5	pageserver: add `Gate` as a partner to CancellationToken for safe shutdown of `Tenant` & `Timeline` (#5711 ) ## Problem When shutting down a Tenant, it isn't just important to cause any background tasks to stop. It's also important to wait until they have stopped before declaring shutdown complete, in cases where we may re-use the tenant's local storage for something else, such as running in secondary mode, or creating a new tenant with the same ID. ## Summary of changes A `Gate` class is added, inspired by [seastar::gate](https://docs.seastar.io/master/classseastar_1_1gate.html). For types that have an important lifetime that corresponds to some physical resource, use of a Gate as well as a CancellationToken provides a robust pattern for async requests & shutdown: - Requests must always acquire the gate as long as they are using the object - Shutdown must set the cancellation token, and then `close()` the gate to wait for requests in progress before returning. This is not for memory safety: it's for expressing the difference between "Arc<Tenant> exists", and "This tenant's files on disk are eligible to be read/written". - Both Tenant and Timeline get a Gate & CancellationToken. - The Timeline gate is held during eviction of layers, and during page_service requests. - Existing cancellation support in page_service is refined to use the timeline-scope cancellation token instead of a process-scope cancellation token. This replaces the use of `task_mgr::associate_with`: tasks no longer change their tenant/timelineidentity after being spawned. The Tenant's Gate is not yet used, but will be important for Tenant-scoped operations in secondary mode, where we must ensure that our secondary-mode downloads for a tenant are gated wrt the activity of an attached Tenant. This is part of a broader move away from using the global-state driven `task_mgr` shutdown tokens: - less global state where we rely on implicit knowledge of what task a given function is running in, and more explicit references to the cancellation token that a particular function/type will respect, making shutdown easier to reason about. - eventually avoid the big global TASKS mutex. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-06 12:39:20 +00:00
duguorong009	b3d3a2587d	feat: improve the serde impl for several types(`Lsn`, `TenantId`, `TimelineId` ...) (#5335 ) Improve the serde impl for several types (`Lsn`, `TenantId`, `TimelineId`) by making them sensitive to `Serializer::is_human_readadable` (true for json, false for bincode). Fixes #3511 by: - Implement the custom serde for `Lsn` - Implement the custom serde for `Id` - Add the helper module `serde_as_u64` in `libs/utils/src/lsn.rs` - Remove the unnecessary attr `#[serde_as(as = "DisplayFromStr")]` in all possible structs Additionally some safekeeper types gained serde tests. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-06 11:40:03 +02:00
duguorong009	09b5954526	refactor: use streaming in safekeeper `/v1/debug_dump` http response (#5731 ) - Update the handler for `/v1/debug_dump` http response in safekeeper - Update the `debug_dump::build()` to use the streaming in JSON build process	2023-11-05 10:16:54 +00:00
John Spray	306c4f9967	s3_scrubber: prepare for scrubbing buckets with generation-aware content (#5700 ) ## Problem The scrubber didn't know how to find the latest index_part when generations were in use. ## Summary of changes - Teach the scrubber to do the same dance that pageserver does when finding the latest index_part.json - Teach the scrubber how to understand layer files with generation suffixes. - General improvement to testability: scan_metadata has a machine readable output that the testing `S3Scrubber` wrapper can read. - Existing test coverage of scrubber was false-passing because it just didn't see any data due to prefixing of data in the bucket. Fix that. This is incremental improvement: the more confidence we can have in the scrubber, the more we can use it in integration tests to validate the state of remote storage. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2023-11-03 17:36:02 +00:00
Joonas Koivunen	27bdbf5e36	chore(layer): restore logging, doc changes (#5766 ) Some of the log messages were lost with the #4938. This PR adds some of them back, most notably: - starting to on-demand download - successful completion of on-demand download - ability to see when there were many waiters for the layer download - "unexpectedly on-demand downloading ..." is now `info!` Additionally some rare events are logged as error, which should never happen.	2023-11-02 19:05:33 +00:00
Em Sharnoff	367971a0e9	vm-monitor: Remove support for file cache in tmpfs (#5617 ) ref neondatabase/cloud#7516. We switched everything over to file cache on disk, now time to remove support for having it in tmpfs.	2023-11-02 16:06:16 +00:00
Joonas Koivunen	2dca4c03fc	feat(layer): cancellable get_or_maybe_download (#5744 ) With the layer implementation as was done in #4938, it is possible via cancellation to cause two concurrent downloads on the same path, due to how `RemoteTimelineClient::download_remote_layer` does tempfiles. Thread the init semaphore through the spawned task of downloading to make this impossible to happen.	2023-11-02 08:06:32 +00:00
Conrad Ludgate	d8c21ec70d	fix nightly 1.75 (#5719 ) ## Problem Neon doesn't compile on nightly and had numerous clippy complaints. ## Summary of changes 1. Fixed troublesome dependency 2. Fixed or ignored the lints where appropriate	2023-10-30 16:43:06 +00:00
Konstantin Knizhnik	ad99fa5f03	Grant BYPASSRLS and REPLICATION to exited roles (#5657 ) ## Problem Role need to have REPLICATION privilege to be able to used for logical replication. New roles are created with this option. This PR tries to update existed roles. ## Summary of changes Update roles in `handle_roles` method ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>	2023-10-30 15:29:25 +00:00
Joonas Koivunen	2bd79906d9	fix: possible page_service hang on cancel (#5696 ) Fixes #5341, one more suspected case, see: https://github.com/neondatabase/neon/issues/5341#issuecomment-1783052379 - races `MaybeWriteOnly::shutdown` with cancellation - switches to using `AsyncWriteExt::write_buf` - notes cancellation safety for shutdown	2023-10-27 19:09:34 +01:00
Gleb Novikov	a5292f7e67	Some minor renames in attachment service API (#5687 ) ## Problem ## Summary of changes ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] ~~If it is a core feature, I have added thorough tests.~~ - [ ] ~~Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?~~ - [ ] ~~If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.~~ ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2023-10-27 12:36:34 +01:00
duguorong009	39f8fd6945	feat: add `build_tag` env support for `set_build_info_metric` (#5576 ) - Add a new util `project_build_tag` macro, similar to `project_git_version` - Update the `set_build_info_metric` to accept and make use of `build_tag` info - Update all codes which use the `set_build_info_metric`	2023-10-27 10:47:11 +01:00
John Spray	de90bf4663	pageserver: always load remote metadata (no more `spawn_load`) (#5580 ) ## Problem The pageserver had two ways of loading a tenant: - `spawn_load` would trust on-disk content to reflect all existing timelines - `spawn_attach` would list timelines in remote storage. It was incorrect for `spawn_load` to trust local disk content, because it doesn't know if the tenant might have been attached and written somewhere else. To make this correct would requires some generation number checks, but the payoff is to avoid one S3 op per tenant at startup, so it's not worth the complexity -- it is much simpler to have one way to load a tenant. ## Summary of changes - `Tenant` objects are always created with `Tenant::spawn`: there is no more distinction between "load" and "attach". - The ability to run without remote storage (for `neon_local`) is preserved by adding a branch inside `attach` that uses a fallback `load_local` if no remote_storage is present. - Fix attaching a tenant when it has a timeline with no IndexPart: this can occur if a newly created timeline manages to upload a layer before it has uploaded an index. - The attach marker file that used to indicate whether a tenant should be "loaded" or "attached" is no longer needed, and is removed. - The GenericRemoteStorage interface gets a `list()` method that maps more directly to what ListObjects does, returning both keys and common prefixes. The existing `list_files` and `list_prefixes` methods are just calls into `list()` now -- these can be removed later if we would like to shrink the interface a bit. - The remote deletion marker is moved into `timelines/` and detected as part of listing timelines rather than as a separate GET request. If any existing tenants have a marker in the old location (unlikely, only happens if something crashes mid-delete), then they will rely on the control plane retrying to complete their deletion. - Revise S3 calls for timeline listing and tenant load to take a cancellation token, and retry forever: it never makes sense to make a Tenant broken because of a transient S3 issue. ## Breaking changes - The remote deletion marker is moved from `deleted` to `timelines/deleted` within the tenant prefix. Markers in the old location will be ignored: it is the control plane's responsibility to retry deletions until they succeed. Markers in the new location will be tolerated by the previous release of pageserver via https://github.com/neondatabase/neon/pull/5632 - The local `attaching` marker file is no longer written. Therefore, if the pageserver is downgraded after running this code, the old pageserver will not be able to distinguish between partially attached tenants and fully attached tenants. This would only impact tenants that were partway through attaching at the moment of downgrade. In the unlikely even t that we do experience an incident that prompts us to roll back, then we may check for attach operations in flight, and manually insert `attaching` marker files as needed. --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-10-26 14:48:44 +01:00
Joonas Koivunen	c508d3b5fa	reimpl Layer, remove remote layer, trait Layer, trait PersistentLayer (#4938 ) Implement a new `struct Layer` abstraction which manages downloadness internally, requiring no LayerMap locking or rewriting to download or evict providing a property "you have a layer, you can read it". The new `struct Layer` provides ability to keep the file resident via a RAII structure for new layers which still need to be uploaded. Previous solution solved this `RemoteTimelineClient::wait_completion` which lead to bugs like #5639. Evicting or the final local deletion after garbage collection is done using Arc'd value `Drop`. With a single `struct Layer` the closed open ended `trait Layer`, `trait PersistentLayer` and `struct RemoteLayer` are removed following noting that compaction could be simplified by simply not using any of the traits in between: #4839. The new `struct Layer` is a preliminary to remove `Timeline::layer_removal_cs` documented in #4745. Preliminaries: #4936, #4937, #5013, #5014, #5022, #5033, #5044, #5058, #5059, #5061, #5074, #5103, epic #5172, #5645, #5649. Related split off: #5057, #5134.	2023-10-26 12:36:38 +03:00
Arpad Müller	4bef977c56	Use tuples instead of manual comparison chain (#5637 ) Makes code a little bit simpler	2023-10-24 17:16:23 +00:00
Arpad Müller	1e250cd90a	Cleanup in azure_upload_download_works test (#5636 ) The `azure_upload_download_works` test is not cleaning up after itself, leaving behind the files it is uploading. I found these files when looking at the contents of the bucket in #5627. We now clean up the file we uploaded before, like the other tests do it as well. Follow-up of #5546	2023-10-23 19:08:56 +01:00
Em Sharnoff	2cf6a47cca	vm-monitor: Deny not fail downscale if no memory stats yet (#5606 ) Fixes an issue we observed on staging that happens when the autoscaler-agent attempts to immediately downscale the VM after binding, which is typical for pooled computes. The issue was occurring because the autoscaler-agent was requesting downscaling before the vm-monitor had gathered sufficient cgroup memory stats to be confident in approving it. When the vm-monitor returned an internal error instead of denying downscaling, the autoscaler-agent retried the connection and immediately hit the same issue (in part because cgroup stats are collected per-connection, rather than globally).	2023-10-19 19:09:37 +01:00
Em Sharnoff	2c8741a5ed	vm-monitor: Log full error on message handling failure (#5604 ) There's currently an issue with the vm-monitor on staging that's not really feasible to debug because the current display impl gives no context to the errors (just says "failed to downscale"). Logging the full error should help. For communications with the autoscaler-agent, it's ok to only provide the outermost cause, because we can cross-reference with the VM logs. At some point in the future, we may want to change that.	2023-10-19 18:10:33 +02:00
Arthur Petukhovsky	66f8f5f1c8	Call walproposer from Rust (#5403 ) Create Rust bindings for C functions from walproposer. This allows to write better tests with real walproposer code without spawning multiple processes and starting up the whole environment. `make walproposer-lib` stage was added to build static libraries `libwalproposer.a`, `libpgport.a`, `libpgcommon.a`. These libraries can be statically linked to any executable to call walproposer functions. `libs/walproposer/src/walproposer.rs` contains `test_simple_sync_safekeepers` to test that walproposer can be called from Rust to emulate sync_safekeepers logic. It can also be used as a usage example.	2023-10-19 14:17:15 +01:00
Arpad Müller	b1d6af5ebe	Azure blobs: Simplify error conversion by addition of to_download_error (#5575 ) There is a bunch of duplication and manual Result handling that can be simplified by moving the error conversion into a shared function, using `map_err`, and the question mark operator.	2023-10-19 14:31:09 +02:00
Arpad Müller	f842b22b90	Add endpoint for querying time info for lsn (#5497 ) ## Problem See #5468. ## Summary of changes Add a new `get_timestamp_of_lsn` endpoint, returning the timestamp associated with the given lsn. Fixes #5468. --------- Co-authored-by: Shany Pozin <shany@neon.tech>	2023-10-19 04:50:49 +02:00

1 2 3 4 5 ...

420 Commits