rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-16 09:52:54 +00:00

Author	SHA1	Message	Date
John Spray	6313f1fa7a	tests: tolerate transient unavailability in test_sharding_split_failures (#7223 ) ## Problem While most forms of split rollback don't interrupt clients, there are a couple of cases that do -- this interruption is brief, driven by the time it takes the controller to kick off Reconcilers during the async abort of the split, so it's operationally fine, but can trip up a test. - #7148 ## Summary of changes - Relax test check to require that the tenant is eventually available after split failure, rather than immediately. In the vast majority of cases this will pass on the first iteration.	2024-03-26 09:56:47 +00:00
John Spray	59cdee749e	storage controller: fixes to secondary location handling (#7169 ) Stacks on: - https://github.com/neondatabase/neon/pull/7165 Fixes while working on background optimization of scheduling after a split: - When a tenant has secondary locations, we weren't detaching the parent shards' secondary locations when doing a split - When a reconciler detaches a location, it was feeding back a locationconf with `Detached` mode in its `observed` object, whereas it should omit that location. This could cause the background reconcile task to keep kicking off no-op reconcilers forever (harmless but annoying). - During shard split, we were scheduling secondary locations for the child shards, but no reconcile was run for these until the next time the background reconcile task ran. Creating these ASAP is useful, because they'll be used shortly after a shard split as the destination locations for migrating the new shards to different nodes.	2024-03-21 12:06:57 +00:00
Vlad Lazar	c75b584430	storage_controller: add metrics (#7178 ) ## Problem Storage controller had basically no metrics. ## Summary of changes 1. Migrate the existing metrics to use Conrad's [`measured`](https://docs.rs/measured/0.0.14/measured/) crate. 2. Add metrics for incoming http requests 3. Add metrics for outgoing http requests to the pageserver 4. Add metrics for outgoing pass through requests to the pageserver 5. Add metrics for database queries Note that the metrics response for the attachment service does not use chunked encoding like the rest of the metrics endpoints. Conrad has kindly extended the crate such that it can now be done. Let's leave it for a follow-up since the payload shouldn't be that big at this point. Fixes https://github.com/neondatabase/neon/issues/6875	2024-03-21 12:00:20 +00:00
John Spray	a5d5c2a6a0	storage controller: tech debt (#7165 ) This is a mixed bag of changes split out for separate review while working on other things, and batched together to reduce load on CI runners. Each commits stands alone for review purposes: - do_tenant_shard_split was a long function and had a synchronous validation phase at the start that could readily be pulled out into a separate function. This also avoids the special casing of ApiError::BadRequest when deciding whether an abort is needed on errors - Add a 'describe' API (GET on tenant ID) that will enable storcon-cli to see what's going on with a tenant - the 'locate' API wasn't really meant for use in the field. It's for tests: demote it to the /debug/ prefix - The `Single` placement policy was a redundant duplicate of Double(0), and Double was a bad name. Rename it Attached. (https://github.com/neondatabase/neon/issues/7107) - Some neon_local commands were added for debug/demos, which are now replaced by commands in storcon-cli (#7114 ). Even though that's not merged yet, we don't need the neon_local ones any more. Closes https://github.com/neondatabase/neon/issues/7107 ## Backward compat of Single/Double -> `Attached(n)` change A database migration is used to convert any existing values.	2024-03-19 16:08:20 +00:00
John Spray	b80704cd34	tests: log hygiene checks for storage controller (#6710 ) ## Problem As with the pageserver, we should fail tests that emit unexpected log errors/warnings. ## Summary of changes - Refactor existing log checks to be reusable - Run log checks for attachment_service - Add allow lists as needed.	2024-03-19 10:30:33 +00:00
Arthur Petukhovsky	ad5efb49ee	Support backpressure for sharding (#7100 ) Add shard_number to PageserverFeedback and parse it on the compute side. When compute receives a new ps_feedback, it calculates min LSNs among feedbacks from all shards, and uses those LSNs for backpressure. Add `test_sharding_backpressure` to verify that backpressure slows down compute to wait for the slowest shard.	2024-03-18 21:54:44 +00:00
John Spray	6443dbef90	tests: extend log allow list for test_sharding_split_failures (#7134 ) Failure types that panic the storage controller can cause unlucky pageservers to emit log warnings that they can't reach the generation validation API: https://neon-github-public-dev.s3.amazonaws.com/reports/main/8284495687/index.html Tolerate this log message: it's an expected behavior.	2024-03-15 13:18:12 +00:00
John Spray	44f42627dd	pageserver/controller: error handling for shard splitting (#7074 ) ## Problem Shard splits worked, but weren't safe against failures (e.g. node crash during split) yet. Related: #6676 ## Summary of changes - Introduce async rwlocks at the scope of Tenant and Node: - exclusive tenant lock is used to protect splits - exclusive node lock is used to protect new reconciliation process that happens when setting node active - exclusive locks used in both cases when doing persistent updates (e.g. node scheduling conf) where the update to DB & in-memory state needs to be atomic. - Add failpoints to shard splitting in control plane and pageserver code. - Implement error handling in control plane for shard splits: this detaches child chards and ensures parent shards are re-attached. - Crash-safety for storage controller restarts requires little effort: we already reconcile with nodes over a storage controller restart, so as long as we reset any incomplete splits in the DB on restart (added in this PR), things are implicitly cleaned up. - Implement reconciliation with offline nodes before they transition to active: - (in this context reconciliation means something like startup_reconcile, not literally the Reconciler) - This covers cases where split abort cannot reach a node to clean it up: the cleanup will eventually happen when the node is marked active, as part of reconciliation. - This also covers the case where a node was unavailable when the storage controller started, but becomes available later: previously this allowed it to skip the startup reconcile. - Storage controller now terminates on panics. We only use panics for true "should never happen" assertions, and these cases can leave us in an un-usable state if we keep running (e.g. panicking in a shard split). In the unlikely event that we get into a crashloop as a result, we'll rely on kubernetes to back us off. - Add `test_sharding_split_failures` which exercises a variety of failure cases during shard split.	2024-03-14 09:11:57 +00:00
John Spray	1b41db8bdd	pageserver: enable setting stripe size inline with split request. (#7093 ) ## Summary - Currently we can set stripe size at tenant creation, but it doesn't mean anything until we have multiple shards - When onboarding an existing tenant, it will always get a default shard stripe size, so we would like to be able to pick the actual stripe size at the point we split. ## Why do this inline with a split? The alternative to this change would be to have a separate endpoint on the storage controller for setting the stripe size on a tenant, and only permit writes to that endpoint when the tenant has only a single shard. That would work, but be a little bit more work for a client, and not appreciably simpler (instead of having a special argument to the split functions, we'd have a special separate endpoint, and a requirement that the controller must sync its config down to the pageserver before calling the split API). Either approach would work, but this one feels a bit more robust end-to-end: the split API is the _very last moment_ that the stripe size is mutable, so if we aim to set it before splitting, it makes sense to do it as part of the same operation.	2024-03-12 20:41:08 +00:00
John Spray	89cf714890	tests/neon_local: rename "attachment service" -> "storage controller" (#7087 ) Not a user-facing change, but can break any existing `.neon` directories created by neon_local, as the name of the database used by the storage controller changes. This PR changes all the locations apart from the path of `control_plane/attachment_service` (waiting for an opportune moment to do that one, because it's the most conflict-ish wrt ongoing PRs like #6676 )	2024-03-12 11:36:27 +00:00
John Spray	b5246753bf	storage controller: miscellaneous improvements (#6800 ) - Add some context to logs - Add tests for pageserver restarts when managed by storage controller - Make /location_config tolerate compute hook failures on shard creations, not just modifications.	2024-02-22 09:33:40 +00:00
John Spray	afda4420bd	test_sharding_ingress: bigger data, skip in debug mode (#6859 ) ## Problem Accidentally merged #6852 without this test stability change. The test as-written could sometimes fail on debug-pg14. ## Summary of changes - Write more data so that the test can more reliably assert on the ratio of total layers to small layers - Skip the test in debug mode, since writing any more than a tiny bit of data tends to result in a flaky test in the much slower debug environment.	2024-02-21 17:03:55 +00:00
John Spray	84f027357d	pageserver: adjust checkpoint distance for sharded tenants (#6852 ) ## Problem Where the stripe size is the same order of magnitude as the checkpoint distance (such as with default settings), tenant shards can easily pass through `checkpoint_distance` bytes of LSN without actually ingesting anything. This results in emitting many tiny L0 delta layers. ## Summary of changes - Multiply checkpoint distance by shard count before comparing with LSN distance. This is a heuristic and does not guarantee that we won't emit small layers, but it fixes the issue for typical cases where the writes in a (checkpoint_distance * shard_count) range of LSN bytes are somewhat distributed across shards. - Add a test that checks the size of layers after ingesting to a sharded tenant; this fails before the fix. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-02-21 14:12:35 +00:00
John Spray	0c105ef352	storage controller: debug observability endpoints and self-test (#6820 ) This PR stacks on https://github.com/neondatabase/neon/pull/6814 Observability: - Because we only persist a subset of our state, and our external API is pretty high level, it can be hard to get at the detail of what's going on internally (e.g. the IntentState of a shard). - Add debug endpoints for getting a full dump of all TenantState and SchedulerNode objects - Enrich the /control/v1/node listing endpoint to include full in-memory detail of `Node` rather than just the `NodePersistence` subset Consistency checks: - The storage controller maintains separate in-memory and on-disk states, by design. To catch subtle bugs, it is useful to occasionally cross-check these. - The Scheduler maintains reference counts for shard->node relationships, which could drift if there was a bug in IntentState: exhausively cross check them in tests.	2024-02-19 20:29:23 +00:00
John Spray	4f7704af24	storage controller: fix spurious reconciles after pageserver restarts (#6814 ) ## Problem When investigating test failures (https://github.com/neondatabase/neon/issues/6813) I noticed we were doing a bunch of Reconciler runs right after splitting a tenant. It's because the splitting test does a pageserver restart, and there was a bug in /re-attach handling, where we would update the generation correctly in the database and intent state, but not observed state, thereby triggering a reconciliation on the next call to maybe_reconcile. This didn't break anything profound (underlying rules about generations were respected), but caused the storage controller to do an un-needed extra round of bumping the generation and reconciling. ## Summary of changes - Start adding metrics to the storage controller - Assert on the number of reconciles done in test_sharding_split_smoke - Fix /re-attach to update `observed` such that we don't spuriously re-reconcile tenants.	2024-02-19 17:44:20 +00:00
John Spray	5fa747e493	pageserver: shard splitting refinements (parent deletion, hard linking) (#6725 ) ## Problem - We weren't deleting parent shard contents once the split was done - Re-downloading layers into child shards is wasteful ## Summary of changes - Hard-link layers into child chart local storage during split - Delete parent shards content at the end --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-02-15 10:21:53 +02:00
John Spray	8d98981fe5	tests: deflake test_sharding_split_unsharded (#6699 ) ## Problem This test was a subset of the larger sharding test, and it missed the validate() call on workload that was implicitly waiting for a tenant to become active before trying to split it. It could therefore fail to split due to tenant not yet being active. ## Summary of changes - Insert .validate() call, and move the Workload setup to after the check of shard ID (as the shard ID check should pass immediately)	2024-02-09 13:20:04 +00:00
John Spray	951c9bf4ca	control_plane: fix shard splitting on unsharded tenant (#6689 ) ## Problem Previous test started with a new-style TenantShardId with a non-zero ShardCount. We also need to handle the case of a ShardCount() (aka `unsharded`) parent shard. A followup PR will refactor ShardCount to make its inner value private and thereby make this kind of mistake harder ## Summary of changes - Fix a place we were incorrectly treating a ShardCount as a number of shards rather than as thing that can be zero or the number of shards. - Add a test for this case.	2024-02-09 10:12:40 +00:00
John Spray	af91a28936	pageserver: shard splitting (#6379 ) ## Problem One doesn't know at tenant creation time how large the tenant will grow. We need to be able to dynamically adjust the shard count at runtime. This is implemented as "splitting" of shards into smaller child shards, which cover a subset of the keyspace that the parent covered. Refer to RFC: https://github.com/neondatabase/neon/pull/6358 Part of epic: #6278 ## Summary of changes This PR implements the happy path (does not cleanly recover from a crash mid-split, although won't lose any data), without any optimizations (e.g. child shards re-download their own copies of layers that the parent shard already had on local disk) - Add `/v1/tenant/:tenant_shard_id/shard_split` API to pageserver: this copies the shard's index to the child shards' paths, instantiates child `Tenant` object, and tears down parent `Tenant` object. - Add `splitting` column to `tenant_shards` table. This is written into an existing migration because we haven't deployed yet, so don't need to cleanly upgrade. - Add `/control/v1/tenant/:tenant_id/shard_split` API to attachment_service, - Add `test_sharding_split_smoke` test. This covers the happy path: future PRs will add tests that exercise failure cases.	2024-02-08 15:35:13 +00:00
John Spray	55b7cde665	tests: add basic coverage for sharding (#6380 ) ## Problem The support for sharding in the pageserver was written before https://github.com/neondatabase/neon/pull/6205 landed, so when it landed we couldn't directly test sharding. ## Summary of changes - Add `test_sharding_smoke` which tests the basics of creating a sharding tenant, creating a timeline within it, checking that data within it is distributed. - Add modes to pg_regress tests for running with 4 shards as well as with 1.	2024-01-26 14:40:47 +00:00

20 Commits