`pgxn/` also contains WAL proposer code, so the safekeeper team should also
be able to approve modifications to this directory.
Signed-off-by: Alex Chi Z <chi@neon.tech>
Before this PR, cancellation of `LayerInner::get_or_maybe_download` could
happen after the layer file had been downloaded to the filesystem but before
the internal `LayerInner::inner` had been set or the state initialized. With
the detached init support introduced in #7135 and put in place in #7152, we
can now initialize the internal state in the spawned task after a successful
download.
The next PR will fix the remaining problems that this PR leaves:
- `Layer::keep_resident` is still used, because
  `Layer::get_or_maybe_download` always cancels an eviction, even when
  canceled
Split off from #7030. Stacked on top of #7152. Cc: #5331.
- Enable debug logs for this test
- Add some debug logging detail in downloader.rs
- Add an info-level message in scheduler.rs that makes it obvious if a
command is waiting for an existing task rather than spawning a new one.
This is the second part of the work towards fixing `Layer::keep_resident` so
that it does not need to repair the internal state. #7135 added a nicer API for
initialization. This PR uses it to remove a few indentation levels and
the loop construction. The next PR #7175 will use the refactorings done
in this PR, and always initialize the internal state after a download.
Cc: #5331
Since #6115 made `get_value_reconstruct_data` and friends more frequently
used, we should not have needless INFO-level span creation near hot paths. In
our prod configuration, INFO spans are always created, but in practice very
rarely is anything at INFO level logged underneath them.
`ResidentLayer::load_keys` is only used during compaction, so it is not
that hot, but this aligns the access paths and their span usage.
This PR changes the span level to debug to align with the others, and adds the
previously missing layer name to the error.
Split off from #7030.
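A minimal illustration of the span-level change; the helper function and its `layer_name` parameter are made up for the example, the real code constructs the span inline:
```rust
use tracing::Span;

// A debug-level span is not created when the subscriber's max level is INFO,
// so hot paths skip the span-creation cost that an info_span! would pay.
fn load_keys_span(layer_name: &str) -> Span {
    tracing::debug_span!("load_keys", layer = %layer_name)
}
```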
## Problem
Faster sha2 hashing.
## Summary of changes
Enable the `asm` feature for sha2. This feature will be the default in sha2
0.11, so we might as well lean into it now. It provides a noticeable speed
boost on macOS aarch64; I haven't tested on x86 though.
Warm-up (and the "tenant startup complete" metric update) happens in
a background tokio task. The tenant map is updated eagerly, which can happen
before the background task finishes.
The test assumed that if the tenant map was updated, then the metric
should reflect that. That's not the case, so we tweak the test to wait
for the metric.
Fixes https://github.com/neondatabase/neon/issues/7158
This is a mixed bag of changes split out for separate review while
working on other things, and batched together to reduce load on CI
runners. Each commit stands alone for review purposes:
- do_tenant_shard_split was a long function and had a synchronous
validation phase at the start that could readily be pulled out into a
separate function. This also avoids the special casing of
ApiError::BadRequest when deciding whether an abort is needed on errors
- Add a 'describe' API (GET on tenant ID) that will enable storcon-cli
to see what's going on with a tenant
- the 'locate' API wasn't really meant for use in the field. It's for
tests: demote it to the /debug/ prefix
- The `Single` placement policy was a redundant duplicate of Double(0),
and Double was a bad name. Rename it Attached.
(https://github.com/neondatabase/neon/issues/7107)
- Some neon_local commands were added for debug/demos, which are now
replaced by commands in storcon-cli (#7114). Even though that's not
merged yet, we don't need the neon_local ones any more.
Closes https://github.com/neondatabase/neon/issues/7107
## Backward compat of Single/Double -> `Attached(n)` change
A database migration is used to convert any existing values.
e2e tests cannot run on macOS unless the file engine env var is
supplied.
```
./scripts/pytest test_runner/regress/test_neon_superuser.py -s
```
will fail because tokio-epoll-uring is not supported there.
This is because we persist the file engine config by default. In this
pull request, we only persist it when someone specifies it explicitly, so that
the pageserver can use its platform-specific default config.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
As with the pageserver, we should fail tests that emit unexpected log
errors/warnings.
## Summary of changes
- Refactor existing log checks to be reusable
- Run log checks for attachment_service
- Add allow lists as needed.
Add shard_number to PageserverFeedback and parse it on the compute side.
When compute receives a new ps_feedback, it calculates the minimum LSNs among
the feedback from all shards and uses those LSNs for backpressure.
Add `test_sharding_backpressure` to verify that backpressure slows down
compute to wait for the slowest shard.
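A rough sketch of the aggregation on the compute side; the struct and field names are illustrative, not the actual `PageserverFeedback` layout:
```rust
/// Per-shard feedback reduced to the LSNs relevant for backpressure.
#[derive(Clone, Copy)]
struct ShardFeedback {
    last_received_lsn: u64,
    disk_consistent_lsn: u64,
    remote_consistent_lsn: u64,
}

/// Take the minimum of each LSN across all shards, so that backpressure is
/// driven by the slowest shard.
fn aggregate(feedbacks: &[ShardFeedback]) -> Option<ShardFeedback> {
    feedbacks.iter().copied().reduce(|a, b| ShardFeedback {
        last_received_lsn: a.last_received_lsn.min(b.last_received_lsn),
        disk_consistent_lsn: a.disk_consistent_lsn.min(b.disk_consistent_lsn),
        remote_consistent_lsn: a.remote_consistent_lsn.min(b.remote_consistent_lsn),
    })
}
```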
Manual testing of the changes in #7160 revealed that, if the
thread-local destructor ever runs (it apparently doesn't in our test
suite runs, otherwise #7160 would not have auto-merged), we can
encounter an `abort()` due to a double-panic in the tracing code.
This github comment here contains the stack trace:
https://github.com/neondatabase/neon/pull/7160#issuecomment-2003778176
This PR reverts #7160 and uses an atomic counter to identify the
thread-local in log messages, instead of the memory address of the
thread-local, which may be re-used.
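A sketch of the counter-based id, assuming a simplified `ThreadLocalState`; the real struct carries more than just the id:
```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Monotonically increasing id: unlike the memory address of the thread-local,
// it is never re-used after the destructor runs.
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

struct ThreadLocalState {
    id: u64,
}

impl ThreadLocalState {
    fn new() -> Self {
        Self {
            id: NEXT_ID.fetch_add(1, Ordering::Relaxed),
        }
    }
}

thread_local! {
    static STATE: ThreadLocalState = ThreadLocalState::new();
}

fn main() {
    // Log messages can now reference the id instead of a pointer value.
    STATE.with(|s| println!("thread-local id: {}", s.id));
}
```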
With `immediate_gc`, the span only covered the `gc_iteration`; make it
cover the whole needlessly spawned task, which also waits for layer
drops and does stray logging in tests.
Also clarify some comments while we are here.
Fixes: #6910
PR #7141 added the log message
```
ThreadLocalState is being dropped and id might be re-used in the future
```
which was supposed to be emitted when the thread-local is destroyed.
Instead, it was emitted on _each_ call to `thread_local_system()`,
i.e., on each tokio-epoll-uring operation.
Testing
-------
Reproduced the issue locally and verified that this PR fixes the issue.
## Problem
Followup to https://github.com/neondatabase/neon/pull/6725
In that PR, code for purging local files from a tenant shard was
duplicated.
## Summary of changes
- Refactor detach code into TenantManager
- The `spawn_background_purge` method can now be shared between detach and
split operations
refs https://github.com/neondatabase/neon/issues/7136
Problem
-------
Before this PR, we were using
`tokio_epoll_uring::thread_local_system()`,
which panics on `tokio_epoll_uring::System::launch()` failure.
As we've learned in [the
past](https://github.com/neondatabase/neon/issues/6373#issuecomment-1905814391),
some older Linux kernels account io_uring instances as locked memory.
And while we've raised the limit in prod considerably, we did hit it
once on 2024-03-11 16:30 UTC.
That was after we enabled tokio-epoll-uring fleet-wide, but before
we had shipped release-5090 (c6ed86d3d0)
which did away with the last mass-creation of tokio-epoll-uring
instances as per
commit 3da410c8fe
Author: Christian Schwarz <christian@neon.tech>
Date: Tue Mar 5 10:03:54 2024 +0100
tokio-epoll-uring: use it on the layer-creating code paths (#6378)
Nonetheless, it highlighted that panicking in this situation is probably
not ideal, as it can leave the pageserver process in a semi-broken
state.
Further, due to low sampling rate of Prometheus metrics, we don't know
much about the circumstances of this failure instance.
Solution
--------
This PR implements a custom thread_local_system() that is
pageserver-aware
and will do the following on failure:
- dump relevant stats to `tracing!`, hopefully they will be useful to
understand the circumstances better
- if it's the locked memory failure (or any other ENOMEM): retry with
exponential back-off, capped at 3s
- on any other failure: abort() the process
- add metric counters so we can create an alert
This makes sense in the production environment where we know that
_usually_, there's ample locked memory allowance available, and we know
the failure rate is rare.
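A sketch of the retry loop under the assumptions above; `launch_with_retry` and `is_retryable` are made-up names, and the real code also dumps stats and records metrics:
```rust
use std::time::Duration;

/// Retry retryable launch failures with exponential back-off capped at 3s;
/// abort on anything else so we don't run in a semi-broken state.
async fn launch_with_retry<T, E: std::fmt::Display>(
    mut launch: impl FnMut() -> Result<T, E>,
    is_retryable: impl Fn(&E) -> bool,
) -> T {
    let mut backoff = Duration::from_millis(100);
    loop {
        match launch() {
            Ok(system) => return system,
            Err(e) if is_retryable(&e) => {
                eprintln!("launch failed, retrying in {backoff:?}: {e}");
                tokio::time::sleep(backoff).await;
                backoff = std::cmp::min(backoff * 2, Duration::from_secs(3));
            }
            Err(e) => {
                eprintln!("launch failed fatally: {e}");
                std::process::abort();
            }
        }
    }
}
```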
## Problem
The existing secondary download API relied on the caller to wait as long
as it took to complete -- for large shards that could be a long time, so
typical clients that might have a baked-in ~30s timeout would have a
problem.
## Summary of changes
- Take a `wait_ms` query parameter to instruct the pageserver how long
to wait: if the download isn't complete within this duration, then 201 is
returned instead of 200 (see the sketch after this list).
- For both 200 and 201 responses, include a response body describing
download progress in terms of layers and bytes. This is sufficient for
the caller to track how much data is being transferred and log/present
that status.
- In storage controller live migrations, use this API to apply a much
longer outer timeout, with smaller individual per-request timeouts, and
log the progress of the downloads.
- Add a test that injects layer download delays to exercise the new
behavior
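A minimal sketch of the `wait_ms` behaviour; `secondary_download_status` and `DownloadProgress` are illustrative names, not the actual handler or response types:
```rust
use std::time::Duration;
use tokio::time::timeout;

/// Hypothetical progress summary returned with both 200 and 201 responses.
#[derive(Debug)]
struct DownloadProgress {
    layers_downloaded: u64,
    layers_total: u64,
    bytes_downloaded: u64,
    bytes_total: u64,
}

/// Wait up to `wait_ms` for the download to finish; report 200 if it
/// completed within the window, 201 otherwise.
async fn secondary_download_status(
    download: impl std::future::Future<Output = ()>,
    wait_ms: u64,
    progress: impl Fn() -> DownloadProgress,
) -> (u16, DownloadProgress) {
    match timeout(Duration::from_millis(wait_ms), download).await {
        Ok(()) => (200, progress()),        // download finished in time
        Err(_elapsed) => (201, progress()), // still running: caller polls again
    }
}
```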
# Problem
On-demand downloads are still using `tokio::fs`, which we know is
inefficient.
# Changes
- Add `pagebench ondemand-download-churn` to quantify on-demand download
throughput
- Requires dumping layer map, which required making `history_buffer`
impl `Deserialize`
- Implement an equivalent of `tokio::io::copy_buf` for owned buffers =>
`owned_buffers_io` module and children (see the sketch below).
- Make layer file download sensitive to `io_engine::get()`, using
VirtualFile + above copy loop
- For this, I had to move some code into the `retry_download`, e.g.,
`sync_all()` call.
Drive-by:
- fix missing escaping in `scripts/ps_ec2_setup_instance_store`
- if we failed in retry_download to create a file, we'd try to remove
it, encounter `NotFound`, and `abort()` the process using
`on_fatal_io_error`. This PR treats `NotFound` as a success.
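A minimal sketch of such an owned-buffers copy loop; the `OwnedRead`/`OwnedWrite` traits are illustrative stand-ins, not the actual `owned_buffers_io` API:
```rust
/// Illustrative traits for owned-buffer IO: the buffer is passed by value
/// (as owned-buffer APIs like tokio-epoll-uring require) and handed back.
trait OwnedRead {
    async fn read_owned(&mut self, buf: Vec<u8>) -> std::io::Result<(Vec<u8>, usize)>;
}
trait OwnedWrite {
    async fn write_all_owned(&mut self, buf: Vec<u8>, len: usize) -> std::io::Result<Vec<u8>>;
}

/// Copy everything from `reader` to `writer`, reusing one owned buffer.
async fn copy_owned<R, W>(reader: &mut R, writer: &mut W) -> std::io::Result<u64>
where
    R: OwnedRead,
    W: OwnedWrite,
{
    let mut buf = vec![0u8; 64 * 1024];
    let mut copied = 0u64;
    loop {
        let (b, n) = reader.read_owned(buf).await?;
        if n == 0 {
            return Ok(copied);
        }
        // The writer consumes the buffer by value and returns it once the
        // write completed, so it can be reused for the next read.
        buf = writer.write_all_owned(b, n).await?;
        copied += n as u64;
    }
}
```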
# Testing
Functional
- The copy loop is generic & unit tested.
Performance
- Used the `ondemand-download-churn` benchmark to manually test against
real S3.
- Results (public Notion page):
https://neondatabase.notion.site/Benchmarking-tokio-epoll-uring-on-demand-downloads-2024-04-15-newer-code-03c0fdc475c54492b44d9627b6e4e710?pvs=4
- Performance is equivalent at low concurrency. Jumpier situation at
high concurrency, but, still less CPU / throughput with
tokio-epoll-uring.
- It’s a win.
# Future Work
Turn the manual performance testing described in the above results
document into a performance regression test:
https://github.com/neondatabase/neon/issues/7146
## Problem
Tenant deletion had a couple of TODOs where we weren't using proper
cancellation tokens that would have aborted the deletions during process
shutdown.
## Summary of changes
- Refactor enough that deletion/shutdown code has access to the
TenantManager's cancellation token
- Use that cancellation token in tenant deletion instead of dummy
tokens.
fixes https://github.com/neondatabase/neon/issues/7116
Changes:
- refactor PageServerConfigBuilder: support not-set values
- implement runtime feature test
- use runtime feature test to determine `virtual_file_io_engine` if not
explicitly configured in the config
- log the effective engine at startup
- drive-by: improve assertion messages in `test_pageserver_init_node_id`
This needed a tiny bit of tokio-epoll-uring work, hence bumping it.
Changelog:
```
git log --no-decorate --oneline --reverse 868d2c42b5d54ca82fead6e8f2f233b69a540d3e..342ddd197a060a8354e8f11f4d12994419fff939
c7a74c6 Bump mio from 0.8.8 to 0.8.11
4df3466 Bump mio from 0.8.8 to 0.8.11 (#47)
342ddd1 lifecycle: expose `LaunchResult` enum (#49)
```
## Problem
See:
- https://github.com/neondatabase/neon/issues/6374
## Summary of changes
Whereas previously we calculated synthetic size from the gc_horizon or
the pitr_interval (whichever is the lower LSN), now we ignore gc_horizon
and exclusively start from the `pitr_interval`. This is a more generous
calculation for billing, where we do not charge users for data retained
due to gc_horizon.
These test runs usually take 20-30 minutes. If something hangs, we see
actions proceeding for several hours: it's more convenient to have them
time out sooner so that we notice the hang faster.
Aiming for the design where `heavier_once_cell::OnceCell` is initialized
by a future factory led to awkwardness with how
`LayerInner::get_or_maybe_download` looks right now with the `loop`. The
loop helps with two situations:
- an eviction has been scheduled but has not yet happened, and a read
access should cancel the eviction
- a previous `LayerInner::get_or_maybe_download` that canceled a pending
eviction was canceled leaving the `heavier_once_cell::OnceCell`
uninitialized but needing repair by the next
`LayerInner::get_or_maybe_download`
By instead supporting detached initialization in
`heavier_once_cell::OnceCell` via an `OnceCell::get_or_detached_init`,
we can fix what the monolithic #7030 does:
- the spawned-off download task initializes the
`heavier_once_cell::OnceCell` regardless of whether the caller that started
the download was canceled
- a canceled `LayerInner::get_or_maybe_download` no longer stops
eviction but can win it if not canceled
Split off from #7030.
Cc: #5331
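A rough illustration of the detached-init idea, using `tokio::sync::OnceCell` rather than the crate's `heavier_once_cell` (whose `get_or_detached_init` API differs): the spawned task completes the initialization even if the caller that requested the download is cancelled.
```rust
use std::sync::Arc;
use tokio::sync::OnceCell;

async fn get_or_download(cell: Arc<OnceCell<Vec<u8>>>) -> Option<Vec<u8>> {
    if let Some(v) = cell.get() {
        return Some(v.clone());
    }
    let cell2 = Arc::clone(&cell);
    // Detached: the init runs in its own task, so dropping/cancelling the
    // caller does not abort the download or leave the cell uninitialized.
    let handle = tokio::spawn(async move {
        let bytes = vec![0u8; 1024]; // stand-in for the actual layer download
        let _ = cell2.set(bytes);
    });
    // The caller may still wait for the result, but cancellation here only
    // cancels the wait, not the initialization.
    let _ = handle.await;
    cell.get().cloned()
}
```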
Split off from #7030:
- each early exit is counted as canceled init, even though it most
likely was just `LayerInner::keep_resident` doing the no-download repair
check
- `downloaded_after` could have been accounted for multiple times, and
also when repairing to match on-disk state
Cc: #5331
Switched the order; doing https://github.com/neondatabase/neon/pull/6139
first means we can remove the uninit marker afterwards.
## Problem
Previously, existence of a timeline directory was treated as evidence of
the timeline's logical existence. That is no longer the case since we
treat remote storage as the source of truth on each startup: we can
therefore do without this mark file.
The mark file had also been used as a pseudo-lock to guard against
concurrent creations of the same TimelineId -- now that persistence is
no longer required, this is a bit unwieldy.
In #6139 the `Tenant::timelines_creating` was added to protect against
concurrent creations on the same TimelineId, making the uninit mark file
entirely redundant.
## Summary of changes
- Code that writes & reads the mark file is removed
- Some nearby `pub` definitions are amended to `pub(crate)`
- `test_duplicate_creation` is added to demonstrate that mutual
exclusion of creations still works.
## Problem
These fields were only optional for the convenience of the `local_fs`
test helper -- real remote storage backends provide them. It complicated
any code that actually wanted to use them for anything.
## Summary of changes
- Make these fields non-optional
- For azure/S3 it is an error if the server doesn't provide them
- For local_fs, use random strings as etags and the file's mtime for
last_modified.
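A sketch of how `local_fs` can synthesize the now-mandatory metadata; the function is illustrative, not the actual implementation:
```rust
use std::path::Path;
use std::time::SystemTime;

/// Use a random string as the etag and the file's mtime as last_modified.
fn local_fs_metadata(path: &Path) -> std::io::Result<(String, SystemTime)> {
    let etag = format!("{:016x}", rand::random::<u64>());
    let last_modified = std::fs::metadata(path)?.modified()?;
    Ok((etag, last_modified))
}
```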
We need to shard our Tenants to support larger databases without those
large databases dominating our pageservers and/or requiring dedicated
pageservers.
This RFC aims to define an initial capability that will permit creating
large-capacity databases using a static configuration
defined at time of Tenant creation.
Online re-sharding is deferred as future work, as is offloading layers
for historical reads. However, both of these capabilities would be
implementable without further changes to the control plane or compute:
this RFC aims to define the cross-component work needed to bootstrap
sharding end-to-end.
## Problem
We have no regression tests for the websocket flow
## Summary of changes
Add a hacky implementation of the postgres protocol over websockets just
to verify the protocol behaviour does not regress over time.
This pull request disables neon extension auto-upgrade to help make the next
compute image upgrade smooth.
## Summary of changes
We have two places to auto-upgrade neon extension: during compute spec
update, and when the compute node starts. The compute spec update logic
is always there, and the compute node start logic is added in
https://github.com/neondatabase/neon/pull/7029. In this pull request, we
disable both of them, so that we can still roll back to an older version
of compute before figuring out the best way of extension
upgrade-downgrade. https://github.com/neondatabase/neon/issues/6936
We will enable auto-upgrade in the next release following this one.
There are no other extension upgrades since release 4917, and therefore,
after this pull request, it would be safe to revert to release 4917.
Impact:
* Projects created after unpinning the compute image -> if we need to
roll back, **they will be stuck**, because the default neon extension
version is 1.3. We need to manually pin the compute image version if that
happens.
* Projects already stuck on staging because the extension cannot be
downgraded -> I don't know their current status; maybe they are already
running the latest compute image?
* Other projects -> can be rolled back to release 4917.
Signed-off-by: Alex Chi Z <chi@neon.tech>
We have been using #5681 for quite some time, and at least since #6931
the tests have assumed `cargo-nextest` to work around our use of global
statics. Unlike `cargo test`, `cargo nextest run` runs each test
as a separate process that can be timed out.
Add a mention of using `cargo-nextest` in the top-level README.md.
Sub-crates can still declare they support `cargo test`, like
`compute_tools/README.md` does.
A node with a bad DNS configuration can register itself with the storage
controller, and the controller will try and schedule work onto the node,
but never succeed because it can't reach the node.
The DNS case is a special case of asymmetric network issues. The general
case isn't covered here, but it might make sense to tighten it up after
#6844 merges: then we can avoid assuming a node is immediately
available in re_attach.
## Problem
If a pageserver was offline when the storage controller started, there
was no mechanism to update the
storage controller state when the pageserver becomes active.
## Summary of changes
* Add a heartbeater module. The heartbeater must be driven by an
external loop.
* Integrate the heartbeater into the service (see the sketch after this list):
  - Extend the types used by the service and scheduler to keep track of a
    node's utilisation score.
  - Add a background loop to drive the heartbeater and update the state
    based on the deltas it generates.
  - Do an initial round of heartbeats at start-up.
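A sketch of the driving loop, with made-up names (`NodeDelta`, `heartbeat_all_nodes`) standing in for the real heartbeater types:
```rust
use std::time::Duration;
use tokio_util::sync::CancellationToken;

/// Hypothetical per-node state change produced by one heartbeat round.
enum NodeDelta {
    Online { utilization: u64 },
    Offline,
}

/// Stand-in for polling every registered pageserver; the real heartbeater
/// issues HTTP requests and collects utilisation scores.
async fn heartbeat_all_nodes() -> Vec<NodeDelta> {
    vec![NodeDelta::Online { utilization: 0 }]
}

/// The external loop that drives the heartbeater and applies its deltas.
async fn heartbeat_loop(cancel: CancellationToken) {
    let mut ticker = tokio::time::interval(Duration::from_secs(5));
    loop {
        tokio::select! {
            _ = cancel.cancelled() => break,
            _ = ticker.tick() => {
                for delta in heartbeat_all_nodes().await {
                    match delta {
                        // update the scheduler's view of the node
                        NodeDelta::Online { utilization: _ } => {}
                        // mark the node unavailable; reconcile when it returns
                        NodeDelta::Offline => {}
                    }
                }
            }
        }
    }
}
```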
# Problem
While investigating #7124, I noticed that the benchmark was always using
the `DEFAULT_*` `virtual_file_io_engine`, i.e., `tokio-epoll-uring` as
of https://github.com/neondatabase/neon/pull/7077.
The fundamental problem is that the `control_plane` code has its own
view of `PageServerConfig`, which, I believe, will always be a subset of
the real pageserver's `pageserver/src/config.rs`.
For the `virtual_file_io_engine` and `get_vectored_impl` parametrization
of the test suite, we were constructing a dict on the Python side that
contained these parameters, then handed it to
`control_plane::PageServerConfig`'s derived `serde::Deserialize`.
The default in serde is to ignore unknown fields, so the Deserialize
impl silently ignored these fields.
In consequence, the fields weren't propagated to the `pageserver --init`
call, and the tests ended up using the
`pageserver/src/config.rs::DEFAULT_` values for the respective options
all the time.
Tests that explicitly used overrides in `env.pageserver.start()` and
similar were not affected by this.
But it means that all the test suite runs with these parametrizations
didn't properly exercise the code path.
# Changes
- use `serde(deny_unknown_fields)` to expose the problem (see the sketch after this list)
- With this change, the Python tests that override
`virtual_file_io_engine` and
`get_vectored_impl` fail on `pageserver --init`, exposing the problem.
- use destructuring to uncover the issue in the future
- fix the issue by adding the missing fields to the `control_plane`
crate's `PageServerConf`
- A better solution would be for the control plane to re-use a struct
provided
by the pageserver crate, so that everything is in one place in
`pageserver/src/config.rs`, but our config parsing code is (almost)
beyond repair anyway.
- fix the `pageserver_virtual_file_io_engine` to be responsive to the
env var
- => required to make parametrization work in benchmarks
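A small sketch of what `deny_unknown_fields` buys us, with a trimmed-down, illustrative field set (assumes serde and serde_json):
```rust
use serde::Deserialize;

/// With `deny_unknown_fields`, a key that the control_plane struct does not
/// know about becomes a hard error instead of being silently dropped.
#[derive(Deserialize)]
#[serde(deny_unknown_fields)]
#[allow(dead_code)]
struct PageServerConf {
    listen_pg_addr: Option<String>,
    virtual_file_io_engine: Option<String>,
    get_vectored_impl: Option<String>,
}

fn main() {
    // Previously this unknown key would have been ignored; now it fails loudly.
    let err = serde_json::from_str::<PageServerConf>(r#"{"unknown_key": 1}"#).unwrap_err();
    println!("{err}");
}
```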
# Testing
Before merging this PR, I re-ran the regression tests & CI with the full
matrix of `virtual_file_io_engine` and `tokio-epoll-uring`, see
9c7ea364e0
## Problem
Shard splits worked, but weren't safe against failures (e.g. node crash
during split) yet.
Related: #6676
## Summary of changes
- Introduce async rwlocks at the scope of Tenant and Node (see the sketch
after this list):
  - the exclusive tenant lock is used to protect splits
  - the exclusive node lock is used to protect the new reconciliation process
    that happens when setting a node active
  - exclusive locks are used in both cases when doing persistent updates (e.g.
    node scheduling conf) where the update to DB & in-memory state needs to
    be atomic.
- Add failpoints to shard splitting in control plane and pageserver
code.
- Implement error handling in control plane for shard splits: this
detaches child shards and ensures parent shards are re-attached.
- Crash-safety for storage controller restarts requires little effort:
we already reconcile with nodes over a storage controller restart, so as
long as we reset any incomplete splits in the DB on restart (added in
this PR), things are implicitly cleaned up.
- Implement reconciliation with offline nodes before they transition to
active:
  - (in this context reconciliation means something like
    startup_reconcile, not literally the Reconciler)
  - This covers cases where split abort cannot reach a node to clean it
    up: the cleanup will eventually happen when the node is marked active,
    as part of reconciliation.
  - This also covers the case where a node was unavailable when the
    storage controller started, but becomes available later: previously this
    allowed it to skip the startup reconcile.
- Storage controller now terminates on panics. We only use panics for
true "should never happen" assertions, and these cases can leave us in
an unusable state if we keep running (e.g. panicking in a shard split).
In the unlikely event that we get into a crashloop as a result, we'll
rely on kubernetes to back us off.
- Add `test_sharding_split_failures` which exercises a variety of
failure cases during shard split.
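A sketch of the per-tenant exclusive lock used to protect splits; the map and types are illustrative, not the storage controller's actual structures:
```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use tokio::sync::RwLock;

/// Lazily created per-tenant async rwlocks.
struct TenantLocks {
    locks: Mutex<HashMap<String, Arc<RwLock<()>>>>,
}

impl TenantLocks {
    fn for_tenant(&self, tenant_id: &str) -> Arc<RwLock<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry(tenant_id.to_string())
            .or_insert_with(|| Arc::new(RwLock::new(())))
            .clone()
    }

    async fn split(&self, tenant_id: &str) {
        // Exclusive: no other mutation of this tenant runs while we split.
        let lock = self.for_tenant(tenant_id);
        let _guard = lock.write().await;
        // ... validate, persist "split in progress", split, clear the flag ...
    }
}
```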
## Problem
hyper auto-cancels the request futures on connection close.
`sql_over_http::handle` is not 'drop cancel safe', so we need to do some
other work to make sure connections and queries are handled in the right way.
## Summary of changes
1. tokio::spawn the request handler to resolve the initial cancel-safety
issue
2. share a cancellation token, and cancel it when the request `Service`
is dropped (see the sketch after this list).
3. Add a new log span to be able to track the HTTP connection lifecycle.
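A sketch of points 1 and 2; `serve_request` and `run_query` are placeholders for the proxy's actual handler:
```rust
use tokio_util::sync::CancellationToken;

async fn serve_request(cancel: CancellationToken) -> Result<String, &'static str> {
    let handler = tokio::spawn(async move {
        tokio::select! {
            _ = cancel.cancelled() => Err("connection closed"),
            result = run_query() => result,
        }
    });
    // Even if hyper drops the request future, the spawned task keeps running
    // until the query finishes or the shared token is cancelled.
    handler.await.map_err(|_| "handler panicked")?
}

/// Stand-in for the actual SQL-over-HTTP work.
async fn run_query() -> Result<String, &'static str> {
    Ok("ok".to_string())
}
```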
## Problem
Before this PR, `Timeline::get_vectored` would be throttled twice if the
sequential option was enabled or if validation was enabled.
Also, `pageserver_get_vectored_seconds` included the time spent in the
throttle, which turns out to be undesirable for what we use that metric
for.
## Summary of changes
Double-throttle:
* Add `Timeline::get0` method which is unthrottled.
* Use that method from within the `Timeline::get_vectored` code path.
Metric:
* return throttled time from `throttle()` method
* deduct the value from the observed time
* globally rate-limited logging of duration subtraction errors, like in
all other places that do the throttled-time deduction from observations
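A sketch of the deduction; in the real code the logging is globally rate-limited and the value feeds the histogram rather than being returned:
```rust
use std::time::Duration;

/// Deduct the time spent in the throttle from the observed wall-clock latency
/// before recording the metric.
fn observe_unthrottled(observed: Duration, throttled: Duration) -> Option<Duration> {
    match observed.checked_sub(throttled) {
        Some(unthrottled) => Some(unthrottled),
        None => {
            // Clock granularity can make throttled > observed; log and skip.
            eprintln!("throttled time {throttled:?} exceeds observed {observed:?}");
            None
        }
    }
}
```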
The `tenant_id` in `TenantLocationConfigRequest` in the
`location_config` endpoint was only used in the storage
controller/attachment service, and there it was only used for assertions
and the creation part.
Currently, the flushing operation could flush multiple frozen layers to
the disk and store the aggregate time in the histogram. The result is a
bimodal distribution of short flushes and flushes of over 1000 seconds.
Change it so that we record how long each individual layer flush takes.
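A sketch of the change: one histogram sample per flushed layer instead of one sample for the whole batch (`flush_one_layer` and `record` are placeholders for the real flush routine and metric):
```rust
use std::time::Instant;

async fn flush_frozen_layers(layers: Vec<String>) {
    for layer in layers {
        let started = Instant::now();
        flush_one_layer(&layer).await;
        // One observation per layer, so the histogram reflects single-layer
        // flush latency rather than the aggregate of a whole batch.
        record(started.elapsed().as_secs_f64());
    }
}

async fn flush_one_layer(_layer: &str) {}
fn record(_seconds: f64) {}
```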
## Problem
Currently, cplane communication is part of the latency monitoring, which
doesn't allow setting up proper alerting based on proxy latency alone.
## Summary of changes
Added a dimension to exclude cplane latency.
## Problem
* quotes in the serialized string
* no status if the connection is from the local cache
## Summary of changes
* remove the quotes
* report warm if the connection is from the local cache