Mirror of https://github.com/neondatabase/neon.git, synced 2026-01-15 17:32:56 +00:00
## Problem

Shard splits worked, but were not yet safe against failures (e.g. a node crashing mid-split).

Related: #6676

## Summary of changes

- Introduce async rwlocks at the scope of Tenant and Node (see the first sketch after this list):
  - An exclusive tenant lock protects splits.
  - An exclusive node lock protects the new reconciliation process that runs when setting a node active.
  - In both cases, exclusive locks are used when doing persistent updates (e.g. node scheduling conf) where the update to the DB and to in-memory state must be atomic.
- Add failpoints to shard splitting in control plane and pageserver code.
- Implement error handling in the control plane for shard splits: this detaches child shards and ensures parent shards are re-attached.
- Crash safety across storage controller restarts requires little extra effort: we already reconcile with nodes over a storage controller restart, so as long as we reset any incomplete splits in the DB on restart (added in this PR), things are implicitly cleaned up.
- Implement reconciliation with offline nodes before they transition to active (see the second sketch after this list):
  - In this context, reconciliation means something like startup_reconcile, not literally the Reconciler.
  - This covers cases where a split abort cannot reach a node to clean it up: the cleanup will eventually happen when the node is marked active, as part of reconciliation.
  - This also covers the case where a node was unavailable when the storage controller started but becomes available later: previously this allowed the node to skip the startup reconcile.
- The storage controller now terminates on panics. We only use panics for true "should never happen" assertions, and these cases can leave us in an unusable state if we keep running (e.g. panicking mid shard split). In the unlikely event that we get into a crashloop as a result, we'll rely on kubernetes to back us off.
- Add `test_sharding_split_failures`, which exercises a variety of failure cases during shard split.
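To make the locking and abort behaviour concrete, here is a minimal Python/asyncio sketch of the pattern described above. The actual implementation is Rust inside the storage controller; `persist_split_begin`, `pageservers_split`, `abort_split`, and `persist_split_commit` are hypothetical stand-ins for the real calls, not APIs from this PR:

```python
import asyncio
from collections import defaultdict


# Hypothetical stand-ins for the real control-plane operations.
async def persist_split_begin(tenant_id: str, new_shard_count: int) -> None: ...
async def pageservers_split(tenant_id: str, new_shard_count: int) -> None: ...
async def abort_split(tenant_id: str) -> None: ...
async def persist_split_commit(tenant_id: str) -> None: ...


class TenantLocks:
    """One exclusive async lock per tenant; a split holds it for its whole duration."""

    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    def exclusive(self, tenant_id: str) -> asyncio.Lock:
        return self._locks[tenant_id]


async def shard_split(locks: TenantLocks, tenant_id: str, new_shard_count: int) -> None:
    async with locks.exclusive(tenant_id):
        # Persist "split in progress" first: if we crash after this point, the
        # controller finds the incomplete split on restart and rolls it back.
        await persist_split_begin(tenant_id, new_shard_count)
        try:
            # Ask the pageservers to split; failpoints injected around here are
            # what exercise the error path in tests.
            await pageservers_split(tenant_id, new_shard_count)
        except Exception:
            # Abort: detach child shards, re-attach parent shards, reset DB state.
            await abort_split(tenant_id)
            raise
        # Only after the split succeeded everywhere do we commit it to the DB,
        # atomically with the in-memory update (we still hold the exclusive lock).
        await persist_split_commit(tenant_id)
```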
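In the same hedged spirit, a rough sketch of the activation-time reconcile for an offline node coming back, where `list_locations`, `expects`, and `detach` are again hypothetical names for illustration:

```python
async def activate_node(node, locks: TenantLocks, controller) -> None:
    # Hold the node-scoped exclusive lock while reconciling and flipping state.
    async with locks.exclusive(node.id):
        # Ask the pageserver what it actually has attached (hypothetical API).
        observed = await node.list_locations()
        for tenant_shard_id in observed:
            # Anything the controller does not expect on this node, e.g. an
            # orphaned child shard left by an unreachable split abort, is
            # detached here before the node is allowed to go active.
            if not controller.expects(tenant_shard_id, node):
                await node.detach(tenant_shard_id)
        node.set_active()
```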
```python
import concurrent.futures
from typing import Any

import pytest
from werkzeug.wrappers.request import Request
from werkzeug.wrappers.response import Response

from fixtures.log_helper import log
from fixtures.types import TenantId


class ComputeReconfigure:
    def __init__(self, server):
        self.server = server
        self.control_plane_compute_hook_api = f"http://{server.host}:{server.port}/notify-attach"
        self.workloads = {}

    def register_workload(self, workload):
        self.workloads[workload.tenant_id] = workload


@pytest.fixture(scope="function")
def compute_reconfigure_listener(make_httpserver):
    """
    This fixture exposes an HTTP listener for the storage controller to submit
    compute notifications to us, instead of updating neon_local endpoints itself.

    Although the storage controller can use neon_local directly, this causes problems
    when the test is also concurrently modifying endpoints. Instead, configure the
    storage controller to send notifications up to this test code, which routes all
    endpoint updates through Workload, which has a mutex to make concurrent updates safe.
    """
    server = make_httpserver

    self = ComputeReconfigure(server)

    # Do neon_local endpoint reconfiguration in the background so that we can
    # accept a healthy rate of calls into notify-attach.
    reconfigure_threads = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def handler(request: Request):
        assert request.json is not None
        body: dict[str, Any] = request.json
        log.info(f"notify-attach request: {body}")

        try:
            workload = self.workloads[TenantId(body["tenant_id"])]
        except KeyError:
            pass
        else:
            # This causes the endpoint to query the storage controller for its
            # location, which is redundant since we already have it here, but it
            # avoids extending the neon_local CLI to take full lists of locations.
            reconfigure_threads.submit(lambda workload=workload: workload.reconfigure())  # type: ignore[no-any-return]

        return Response(status=200)

    self.server.expect_request("/notify-attach", method="PUT").respond_with_handler(handler)

    yield self
    reconfigure_threads.shutdown()
    server.clear()
```
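For context, here is a minimal sketch of how a test can wire this fixture up. The names `neon_env_builder`, `Workload`, and `tenant_shard_split` follow the shape of neon's test fixtures, but treat the exact names and signatures as assumptions rather than verbatim from this PR:

```python
# Hypothetical usage sketch: assumes neon's neon_env_builder fixture, the
# Workload helper, and a tenant_shard_split client method on the controller.
def test_split_notifies_compute(neon_env_builder, compute_reconfigure_listener):
    # Route the storage controller's compute notifications to the fixture's
    # HTTP listener instead of letting it drive neon_local endpoints directly.
    neon_env_builder.control_plane_compute_hook_api = (
        compute_reconfigure_listener.control_plane_compute_hook_api
    )
    env = neon_env_builder.init_start()

    workload = Workload(env, env.initial_tenant, env.initial_timeline)
    workload.init()
    workload.write_rows(256)
    # Register so that notify-attach calls for this tenant trigger reconfiguration.
    compute_reconfigure_listener.register_workload(workload)

    # Split the tenant: each resulting notify-attach lands on the fixture, which
    # reconfigures the endpoint in the background; reads should still validate.
    env.storage_controller.tenant_shard_split(env.initial_tenant, shard_count=4)
    workload.validate()
```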