rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-09 06:22:57 +00:00

Author	SHA1	Message	Date
John Spray	d04af08567	control_plane: storage controller secrets by env (#6952 ) ## Problem Sometimes folks prefer not to expose secrets as CLI args. ## Summary of changes - Add ability to load secrets from environment variables. We can eventually remove the AWS SM code path here if nobody is using it -- we don't need to maintain three ways to load secrets.	2024-02-29 10:00:01 +00:00
Vlad Lazar	c3a40a06f3	test: wait for storage controller readiness (#6930 ) ## Problem Starting up the pageserver before the storage controller is ready can lead to a round of reconciliation, which leads to the previous tenant being shut down. This disturbs some tests. ## Summary of changes Wait for the storage controller to become ready on neon env start-up. Closes https://github.com/neondatabase/neon/issues/6724	2024-02-28 09:52:22 +00:00
John Spray	a8ec18c0f4	refactor: move storage controller API structs into pageserver_api (#6927 ) ## Problem This is a precursor to adding a convenience CLI for the storage controller. ## Summary of changes - move controller api structs into pageserver_api::controller_api to make them visible to other crates - rename pageserver_api::control_api to pageserver_api::upcall_api to match the /upcall/v1/ naming in the storage controller. Why here rather than a totally separate crate? It's convenient to have all the pageserver-related stuff in one place, and if we ever wanted to move it to a different crate it's super easy to do that later.	2024-02-27 17:24:01 +00:00
Arpad Müller	045bc6af8b	Add new compaction abstraction, simulator, and implementation. (#6830 ) Rebased version of #5234, part of #6768 This consists of three parts: 1. A refactoring and new contract for implementing and testing compaction. The logic is now in a separate crate, with no dependency on the 'pageserver' crate. It defines an interface that the real pageserver must implement, in order to call the compaction algorithm. The interface models things like delta and image layers, but just the parts that the compaction algorithm needs to make decisions. That makes it easier unit test the algorithm and experiment with different implementations. I did not convert the current code to the new abstraction, however. When compaction algorithm is set to "Legacy", we just use the old code. It might be worthwhile to convert the old code to the new abstraction, so that we can compare the behavior of the new algorithm against the old one, using the same simulated cases. If we do that, have to be careful that the converted code really is equivalent to the old. This inclues only trivial changes to the main pageserver code. All the new code is behind a tenant config option. So this should be pretty safe to merge, even if the new implementation is buggy, as long as we don't enable it. 2. A new compaction algorithm, implemented using the new abstraction. The new algorithm is tiered compaction. It is inspired by the PoC at PR #4539, although I did not use that code directly, as I needed the new implementation to fit the new abstraction. The algorithm here is less advanced, I did not implement partial image layers, for example. I wanted to keep it simple on purpose, so that as we add bells and whistles, we can see the effects using the included simulator. One difference to #4539 and your typical LSM tree implementations is how we keep track of the LSM tree levels. This PR doesn't have a permanent concept of a level, tier or sorted run at all. There are just delta and image layers. However, when compaction starts, we look at the layers that exist, and arrange them into levels, depending on their shapes. That is ephemeral: when the compaction finishes, we forget that information. This allows the new algorithm to work without any extra bookkeeping. That makes it easier to transition from the old algorithm to new, and back again. There is just a new tenant config option to choose the compaction algorithm. The default is "Legacy", meaning the current algorithm in 'main'. If you set it to "Tiered", the new algorithm is used. 3. A simulator, which implements the new abstraction. The simulator can be used to analyze write and storage amplification, without running a test with the full pageserver. It can also draw an SVG animation of the simulation, to visualize how layers are created and deleted. To run the simulator: cargo run --bin compaction-simulator run-suite --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-02-27 17:15:46 +01:00
Vlad Lazar	5accf6e24a	attachment_service: JWT auth enforcement (#6897 ) ## Problem Attachment service does not do auth based on JWT scopes. ## Summary of changes Do JWT based permission checking for requests coming into the attachment service. Requests into the attachment service must use different tokens based on the endpoint: * `/control` and `/debug` require `admin` scope * `/upcall` requires `generations_api` scope * `/v1/...` requires `pageserverapi` scope Requests into the pageserver from the attachment service must use `pageserverapi` scope.	2024-02-26 18:17:06 +00:00
Christian Schwarz	dedf66ba5b	remove `gc_feedback` mechanism (#6863 ) It's been dead-code-at-runtime for 9 months, let's remove it. We can always re-introduce it at a later point. Came across this while working on #6861, which will touch `time_for_new_image_layer`. This is an opporunity to make that function simpler.	2024-02-26 10:05:24 +01:00
Anastasia Lubennikova	a12e4261a3	Add neon.primary_is_running GUC. (#6705 ) We set it for neon replica, if primary is running. Postgres uses this GUC at the start, to determine if replica should wait for RUNNING_XACTS from primary or not. Corresponding cloud PR is https://github.com/neondatabase/cloud/pull/10183 * Add test hot-standby replica startup. * Extract oldest_running_xid from XlRunningXits WAL records. --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@garret.ru> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-02-23 13:56:41 +00:00
John Spray	9c6145f0a9	control_plane: fix a compilation error from racing PRs (#6882 ) Merge of two green PRs raced, and ended up with a non-compiling result.	2024-02-22 16:51:46 +00:00
John Spray	cf3baf6039	storage controller: fix consistency check (#6855 ) - Some checks weren't properly returning an error when they failed - TenantState::to_persistent wasn't setting generation_pageserver properly - Changes to node scheduling policy weren't being persisted.	2024-02-22 14:10:49 +00:00
John Spray	9c48b5c4ab	controller: improved handling of offline nodes (#6846 ) Stacks on https://github.com/neondatabase/neon/pull/6823 - Pending a heartbeating mechanism (#6844 ), use /re-attach calls as a cue to mark an offline node as active, so that a node which is unavailable during controller startup doesn't require manual intervention if it later starts/restarts. - Tweak scheduling logic so that when we schedule the attached location for a tenant, we prefer to select from secondary locations rather than picking a fresh one. This is an interim state until we implement #6844 and full chaos testing for handling failures.	2024-02-22 14:01:06 +00:00
John Spray	b5246753bf	storage controller: miscellaneous improvements (#6800 ) - Add some context to logs - Add tests for pageserver restarts when managed by storage controller - Make /location_config tolerate compute hook failures on shard creations, not just modifications.	2024-02-22 09:33:40 +00:00
Arpad Müller	20fff05699	Remove stray del and TODO (#6867 ) The TODO has made it into #6821. I originally just put it there for bookmarking purposes. The `del` has been added by #6818 but is also redundant.	2024-02-21 19:39:14 +00:00
Arpad Müller	4de2f0f3e0	Implement a sharded time travel recovery endpoint (#6821 ) The sharding service didn't have support for S3 disaster recovery. This PR adds a new endpoint to the attachment service, which is slightly different from the endpoint on the pageserver, in that it takes the shard count history of the tenant as json parameters: we need to do time travel recovery for both the shard count at the target time and the shard count at the current moment in time, as well as the past shard counts that either still reference. Fixes #6604, part of https://github.com/neondatabase/cloud/issues/8233 --------- Co-authored-by: John Spray <john@neon.tech>	2024-02-21 16:35:37 +01:00
John Spray	e7452d3756	storage controller: concurrency + deadlines during startup reconcile (#6823 ) ## Problem During startup_reconcile we do a couple of potentially-slow things: - Calling out to all nodes to read their locations - Calling out to the cloud control plane to notify it of all tenants' attached nodes The read of node locations was not being done concurrently across nodes, and neither operation was bounded by a well defined deadline. ## Summary of changes - Refactor the async parts of startup_reconcile into separate functions - Add concurrency and deadline to `scan_node_locations` - Add deadline to `compute_notify_many` - Run `cleanup_locations` in the background: there's no need for startup_reconcile to wait for this to complete.	2024-02-21 09:54:25 +00:00
Alex Chi Z	3882f57001	neon_local: add flag to create test user and database (#6848 ) This pull request adds two flags: `--update-catalog true` for `endpoint create`, and `--create-test-user true` for `endpoint start`. The former enables catalog updates for neon_superuser permission and many other things, while the latter adds the user `test` and the database `neondb` when setting up the database. A combination of these two flags will create a Postgres similar to the production environment so that it would be easier for us to test if extensions behave correctly when added to Neon Postgres. Example output: ``` ❯ cargo neon endpoint start main --create-test-user true Finished dev [unoptimized + debuginfo] target(s) in 0.22s Running `target/debug/neon_local endpoint start main --create-test-user true` Starting existing endpoint main... Starting postgres node at 'postgresql://cloud_admin@127.0.0.1:55432/postgres' Also at 'postgresql://user@127.0.0.1:55432/neondb' ``` --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-02-21 00:20:42 +00:00
Nikita Kalyanov	cbb599f353	Add /terminate API (#6745 ) this is to speed up suspends, see https://github.com/neondatabase/cloud/issues/10284 ## Problem ## Summary of changes ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist	2024-02-20 19:42:36 +02:00
John Spray	02a8b7fbe0	storage controller: issue timeline create/delete calls concurrently (#6827 ) ## Problem Timeline creation is meant to be very fast: it should only take approximately on S3 PUT latency. When we have many shards in a tenant, we should preserve that responsiveness. ## Summary of changes - Issue create/delete pageserver API calls concurrently across all >0 shards - During tenant deletion, delete shard zero last, separately, to avoid confusing anything using GETs on the timeline. - Return 201 instead of 200 on creations to make cloud control plane happy --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-02-20 10:13:21 +00:00
John Spray	0c105ef352	storage controller: debug observability endpoints and self-test (#6820 ) This PR stacks on https://github.com/neondatabase/neon/pull/6814 Observability: - Because we only persist a subset of our state, and our external API is pretty high level, it can be hard to get at the detail of what's going on internally (e.g. the IntentState of a shard). - Add debug endpoints for getting a full dump of all TenantState and SchedulerNode objects - Enrich the /control/v1/node listing endpoint to include full in-memory detail of `Node` rather than just the `NodePersistence` subset Consistency checks: - The storage controller maintains separate in-memory and on-disk states, by design. To catch subtle bugs, it is useful to occasionally cross-check these. - The Scheduler maintains reference counts for shard->node relationships, which could drift if there was a bug in IntentState: exhausively cross check them in tests.	2024-02-19 20:29:23 +00:00
John Spray	4f7704af24	storage controller: fix spurious reconciles after pageserver restarts (#6814 ) ## Problem When investigating test failures (https://github.com/neondatabase/neon/issues/6813) I noticed we were doing a bunch of Reconciler runs right after splitting a tenant. It's because the splitting test does a pageserver restart, and there was a bug in /re-attach handling, where we would update the generation correctly in the database and intent state, but not observed state, thereby triggering a reconciliation on the next call to maybe_reconcile. This didn't break anything profound (underlying rules about generations were respected), but caused the storage controller to do an un-needed extra round of bumping the generation and reconciling. ## Summary of changes - Start adding metrics to the storage controller - Assert on the number of reconciles done in test_sharding_split_smoke - Fix /re-attach to update `observed` such that we don't spuriously re-reconcile tenants.	2024-02-19 17:44:20 +00:00
John Spray	7e4280955e	control_plane/attachment_service: improve Scheduler (#6633 ) ## Problem One of the major shortcuts in the initial version of this code was to construct a fresh `Scheduler` each time we need it, which is an O(N^2) cost as the tenant count increases. ## Summary of changes - Keep `Scheduler` alive through the lifetime of ServiceState - Use `IntentState` as a reference tracking helper, updating Scheduler refcounts as nodes are added/removed from the intent. There is an automated test that checks things don't get pathologically slow with thousands of shards, but it's not included in this PR because tests that implicitly test the runner node performance take some thought to stabilize/land in CI.	2024-02-19 14:12:20 +00:00
Christian Schwarz	ca07fa5f8b	per-TenantShard read throttling (#6706 )	2024-02-16 21:26:59 +01:00
John Spray	f2e5212fed	storage controller: background reconcile, graceful shutdown, better logging (#6709 ) ## Problem Now that the storage controller is working end to end, we start burning down the robustness aspects. ## Summary of changes - Add a background task that periodically calls `reconcile_all`. This ensures that if earlier operations couldn't succeed (e.g. because a node was unavailable), we will eventually retry. This is a naive initial implementation can start an unlimited number of reconcile tasks: limiting reconcile concurrency is a later item in #6342 - Add a number of tracing spans in key locations: each background task, each reconciler task. - Add a top level CancellationToken and Gate, and use these to implement a graceful shutdown that waits for tasks to shut down. This is not bulletproof yet, because within these tasks we have remote HTTP calls that aren't wrapped in cancellation/timeouts, but it creates the structure, and if we don't shutdown promptly then k8s will kill us. - To protect shard splits from background reconciliation, expose the `SplitState` in memory and use it to guard any APIs that require an attached tenant.	2024-02-16 13:00:53 +00:00
John Spray	6b980f38da	libs: refactor ShardCount.0 to private (#6690 ) ## Problem The ShardCount type has a magic '0' value that represents a legacy single-sharded tenant, whose TenantShardId is formatted without a `-0001` suffix (i.e. formatted as a traditional TenantId). This was error-prone in code locations that wanted the actual number of shards: they had to handle the 0 case specially. ## Summary of changes - Make the internal value of ShardCount private, and expose `count()` and `literal()` getters so that callers have to explicitly say whether they want the literal value (e.g. for storing in a TenantShardId), or the actual number of shards in the tenant. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-02-15 21:59:39 +00:00
Arpad Müller	a2d0d44b42	Remove unused allow's (#6760 ) These allow's became redundant some time ago so remove them, or address them if addressing is very simple.	2024-02-14 18:16:05 +00:00
John Spray	12b39c9db9	control_plane: add debug APIs for force-dropping tenant/node (#6702 ) ## Problem When debugging/supporting this service, we sometimes need it to just forget about a tenant or node, e.g. because of an issue cleanly tearing them down. For example, if I create a tenant with a PlacementPolicy that can't be scheduled on the nodes we have, we would never be able to schedule it for a DELETE to work. ## Summary of changes - Add APIs for dropping nodes and tenants that do no teardown other than removing the entity from the DB and removing any references to it.	2024-02-10 11:56:52 +00:00
John Spray	89a5c654bf	control_plane: follow up for embedded migrations (#6647 ) ## Problem In https://github.com/neondatabase/neon/pull/6637, we remove the need to run migrations externally, but for compat tests to work we can't remove those invocations from the neon_local binary. Once that previous PR merges, we can make the followup changes without upsetting compat tests.	2024-02-09 14:26:50 +00:00
Heikki Linnakangas	84a0e7b022	tests: Allow setting shutdown mode separately from 'destroy' flag In neon_local, the default mode is now always 'fast', regardless of 'destroy'. You can override it with the "neon_local endpoint stop --mode=immediate" flag. In python tests, we still default to 'immediate' mode when using the stop_and_destroy() function, and 'fast' with plain stop(). I kept that to avoid changing behavior in existing tests. I don't think existing tests depend on it, but I wasn't 100% certain.	2024-02-09 15:56:41 +02:00
John Spray	951c9bf4ca	control_plane: fix shard splitting on unsharded tenant (#6689 ) ## Problem Previous test started with a new-style TenantShardId with a non-zero ShardCount. We also need to handle the case of a ShardCount() (aka `unsharded`) parent shard. A followup PR will refactor ShardCount to make its inner value private and thereby make this kind of mistake harder ## Summary of changes - Fix a place we were incorrectly treating a ShardCount as a number of shards rather than as thing that can be zero or the number of shards. - Add a test for this case.	2024-02-09 10:12:40 +00:00
Arpad Müller	c0e0fc8151	Update Rust to 1.76.0 (#6683 ) [Release notes](https://github.com/rust-lang/rust/releases/tag/1.75.0).	2024-02-08 19:57:02 +01:00
John Spray	e8d2843df6	storage controller: improved handling of node availability on restart (#6658 ) - Automatically set a node's availability to Active if it is responsive in startup_reconcile - Impose a 5s timeout of HTTP request to list location conf, so that an unresponsive node can't hang it for minutes - Do several retries if the request fails with a retryable error, to be tolerant of concurrent pageserver & storage controller restarts - Add a readiness hook for use with k8s so that we can tell when the startup reconciliaton is done and the service is fully ready to do work. - Add /metrics to the list of un-authenticated endpoints (this is unrelated but we're touching the line in this PR already, and it fixes auth error spam in deployed container.) - A test for the above. Closes: #6670	2024-02-08 18:00:53 +00:00
John Spray	af91a28936	pageserver: shard splitting (#6379 ) ## Problem One doesn't know at tenant creation time how large the tenant will grow. We need to be able to dynamically adjust the shard count at runtime. This is implemented as "splitting" of shards into smaller child shards, which cover a subset of the keyspace that the parent covered. Refer to RFC: https://github.com/neondatabase/neon/pull/6358 Part of epic: #6278 ## Summary of changes This PR implements the happy path (does not cleanly recover from a crash mid-split, although won't lose any data), without any optimizations (e.g. child shards re-download their own copies of layers that the parent shard already had on local disk) - Add `/v1/tenant/:tenant_shard_id/shard_split` API to pageserver: this copies the shard's index to the child shards' paths, instantiates child `Tenant` object, and tears down parent `Tenant` object. - Add `splitting` column to `tenant_shards` table. This is written into an existing migration because we haven't deployed yet, so don't need to cleanly upgrade. - Add `/control/v1/tenant/:tenant_id/shard_split` API to attachment_service, - Add `test_sharding_split_smoke` test. This covers the happy path: future PRs will add tests that exercise failure cases.	2024-02-08 15:35:13 +00:00
John Spray	3bd2a4fd56	control_plane: avoid feedback loop with /location_config if compute hook fails. (#6668 ) ## Problem The existing behavior isn't exactly incorrect, but is operationally risky: if the control plane compute hook breaks, then all the control plane operations trying to call /location_config will end up retrying forever, which could put more load on the system. ## Summary of changes - Treat 404s as fatal errors to do fewer retries: a 404 either indicates we have the wrong URL, or some control plane bug is failing to recognize our tenant ID as existing. - Do not return an error on reconcilation errors in a non-creating /location_config response: this allows the control plane to finish its Operation (and we will eventually retry the compute notification later)	2024-02-07 19:14:18 +00:00
John Spray	090a789408	storage controller: use PUT instead of POST (#6659 ) This was a typo, the server expects PUT.	2024-02-07 13:24:10 +00:00
John Spray	3d4fe205ba	control_plane/attachment_service: database connection pool (#6622 ) ## Problem This is mainly to limit our concurrency, rather than to speed up requests (I was doing some sanity checks on performance of the service with thousands of shards) ## Summary of changes - Enable the `diesel:r2d2` feature, which provides an async connection pool - Acquire a connection before entering spawn_blocking for a database transaction (recall that diesel's interface is sync) - Set a connection pool size of 99 to fit within default postgres limit (100) - Also set the tokio blocking thread count to accomodate the same number of blocking tasks (the only thing we use spawn_blocking for is database calls).	2024-02-07 13:08:09 +00:00
John Spray	4f57dc6cc6	control_plane/attachment_service: take public key as value (#6651 ) It's awkward to point to a file when doing some kinds of ad-hoc deployment (like right now, when I'm hacking a helm chart having not quite hooked up secrets properly yet). We take all the rest of the secrets as CLI args directly, so let's do the same for public key.	2024-02-06 19:08:39 +00:00
John Spray	431f4234d4	storage controller: embed database migrations in binary (#6637 ) ## Problem We don't have a neat way to carry around migration .sql files during deploy, and in any case would prefer to avoid depending on diesel CLI to deploy. ## Summary of changes - Use `diesel_migrations` crate to embed migrations in our binary - Run migrations on startup - Drop the diesel dependency in the `neon_local` binary, as the attachment_service binary just needs the database to exist. Do database creation with a simple `createdb`. Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2024-02-06 10:07:10 +00:00
Joonas Koivunen	947165788d	refactor: needless cancellation token cloning (#6618 ) The solution we ended up for `backoff::retry` requires always cloning of cancellation tokens even though there is just `.await`. Fix that, and also turn the return type into `Option<Result<T, E>>` avoiding the need for the `E::cancelled()` fn passed in. Cc: #6096	2024-02-06 09:39:06 +02:00
John Spray	8e114bd610	control_plane/attachment_service: make --database-url optional (#6636 ) ## Problem This change was left out of #6585 accidentally -- just forgot to push the very last version of my branch. Now that we can load database url from Secrets Manager, we don't always need it on the CLI any more. We should let the user omit it instead of passing `--database-url ""` ## Summary of changes - Make `--database-url` optional	2024-02-05 20:31:55 +01:00
John Spray	cb7c89332f	control_plane: fix tenant GET, clean up endpoints (#6553 ) Cleanups from https://github.com/neondatabase/neon/pull/6394 - There was a rogue `*` breaking the `GET /tenant/:tenant_id`, which passes through to shard zero - There was a duplicate migrate endpoint - There are un-prefixed API endpoints that were only needed for compat tests and can now be removed.	2024-02-05 14:29:05 +00:00
John Spray	786e9cf75b	control_plane: implement HTTP compute hook for attachment service (#6471 ) ## Problem When we change which physical pageservers a tenant is attached to, we must update the control plane so that it can update computes. This will be done via an HTTP hook, as described in https://www.notion.so/neondatabase/Sharding-Service-Control-Plane-interface-6de56dd310a043bfa5c2f5564fa98365#1fe185a35d6d41f0a54279ac1a41bc94 ## Summary of changes - Optional CLI args `--control-plane-jwt-token` and `-compute-hook-url` are added. If these are set, then we will use this HTTP endpoint, instead of trying to use neon_local LocalEnv to update compute configuration. - Implement an HTTP-driven version of ComputeHook that calls into the configured URL - Notify for all tenants on startup, to ensure that we don't miss notifications if we crash partway through a change, and carry a `pending_compute_notification` flag at runtime to allow notifications to fail without risking never sending the update. - Add a test for all this One might wonder: why not do a "forever" retry for compute hook notifications, rather than carrying a flag on the shard to call reconcile() again later. The reason is that we will later limit concurreny of reconciles, when dealing with larger numbers of shards, and if reconcile is stuck waiting for the control plane to accept a notification request, it could jam up the whole system and prevent us making other changes. Anyway: from the perspective of the outside world, we _do_ retry forever, but we don't retry forever within a given Reconciler lifetime. The `pending_compute_notification` logic is predicated on later adding a background task that just calls `Service::reconcile_all` on a schedule to make sure that anything+everything that can fail a Reconciler::reconcile call will eventually be retried.	2024-02-02 19:22:03 +00:00
John Spray	7e2436695d	storage controller: use AWS Secrets Manager for database URL, etc (#6585 ) ## Problem Passing secrets in via CLI/environment is awkward when using helm for deployment, and not ideal for security (secrets may show up in ps, /proc). We can bypass these issues by simply connecting directly to the AWS Secrets Manager service at runtime. ## Summary of changes - Add dependency on aws-sdk-secretsmanager - Update other aws dependencies to latest, to match transitive dependency versions - Add `Secrets` type in attachment service, using AWS SDK to load if secrets are not provided on the command line.	2024-02-02 16:57:11 +00:00
Arpad Müller	527cdbc010	Don't require AWS access keys for S3 pytests (#6556 ) Don't require AWS access keys (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) for S3 usage in the pytests, and also allow AWS_PROFILE to be passed. One of the two methods is required however. This allows local development like: ``` aws sso login --profile dev export ENABLE_REAL_S3_REMOTE_STORAGE=nonempty REMOTE_STORAGE_S3_REGION=eu-central-1 REMOTE_STORAGE_S3_BUCKET=neon-github-ci-tests AWS_PROFILE=dev cargo build_testing && RUST_BACKTRACE=1 ./scripts/pytest -k debug-pg16 test_runner/regress/test_tenant_delete.py::test_tenant_delete_smoke ``` related earlier PR for the cargo unit tests of the `remote_storage` crate: #6202 --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2024-02-01 20:18:07 +00:00
Konstantin Knizhnik	9a9d9beaee	Download SLRU segments on demand (#6151 ) ## Problem See https://github.com/neondatabase/cloud/issues/8673 ## Summary of changes Download missed SLRU segments from page server ## Checklist before requesting a review - [ ] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. ## Checklist before merging - [ ] Do not forget to reformat commit message to not include the above checklist --------- Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-01-31 21:39:18 +02:00
John Spray	2bfc831c60	control_plane/attachment_service: make --path optional (#6545 ) ## Problem The `--path` argument is only used in testing, for compat tests that use a JSON snapshot of state rather than the postgres database. In regular deployments, it should be omitted (currently one has to specify `--path ""`) ## Summary of changes Make `--path` optional.	2024-01-31 17:02:41 +00:00
John Spray	4010adf653	control_plane/attachment_service: complete APIs (#6394 ) Depends on: https://github.com/neondatabase/neon/pull/6468 ## Problem The sharding service will be used as a "virtual pageserver" by the control plane -- so it needs the set of pageserver APIs that the control plane uses, and to present them under identical URLs, including prefix (/v1). ## Summary of changes - Add missing APIs: - Tenant deletion - Timeline deletion - Node list (used in test now, later in tools) - `/location_config` API (for migrating tenants into the sharding service) - Rework attachment service URLs: - `/v1` prefix is used for pageserver-compatible APIs - `/upcall/v1` prefix is used for APIs that are called by the pageserver (re-attach and validate) - `/debug/v1` prefix is used for endpoints that are for testing - `/control/v1` prefix is used for new sharding service APIs that do not mimic a pageserver API, such as registering and configuring nodes. - Add test_sharding_service. The sharding service already had some collateral coverage from its use in general tests, but this is the first dedicated testing for it.	2024-01-31 12:23:06 +00:00
Christian Schwarz	3a36a0a227	fix(test suite): some tests leak child processes (#6497 )	2024-01-26 18:23:53 +00:00
John Spray	58f6cb649e	control_plane: database persistence for attachment_service (#6468 ) ## Problem Spun off from https://github.com/neondatabase/neon/pull/6394 -- this PR is just the persistence parts and the changes that enable it to work nicely ## Summary of changes - Revert #6444 and #6450 - In neon_local, start a vanilla postgres instance for the attachment service to use. - Adopt `diesel` crate for database access in attachment service. This uses raw SQL migrations as the source of truth for the schema, so it's a soft dependency: we can switch libraries pretty easily. - Rewrite persistence.rs to use postgres (via diesel) instead of JSON. - Preserve JSON read+write at startup and shutdown: this enables using the JSON format in compatibility tests, so that we don't have to commit to our DB schema yet. - In neon_local, run database creation + migrations before starting attachment service - Run the initial reconciliation in Service::spawn in the background, so that the pageserver + attachment service don't get stuck waiting for each other to start, when restarting both together in a test.	2024-01-26 17:20:44 +00:00
Christian Schwarz	689ad72e92	fix(neon_local): leaks child process if it fails to start & pass checks (#6474 ) refs https://github.com/neondatabase/neon/issues/6473 Before this PR, if process_started() didn't return Ok(true) until we ran out of retries, we'd return an error but leave the process running. Try it by adding a 20s sleep to the pageserver `main()`, e.g., right before we claim the pidfile. Without this PR, output looks like so: ``` (.venv) cs@devvm-mbp:[~/src/neon-work-2]: ./target/debug/neon_local start Starting neon broker at 127.0.0.1:50051. storage_broker started, pid: 2710939 . attachment_service started, pid: 2710949 Starting pageserver node 1 at '127.0.0.1:64000' in ".neon/pageserver_1"..... pageserver has not started yet, continuing to wait..... pageserver 1 start failed: pageserver did not start in 10 seconds No process is holding the pidfile. The process must have already exited. Leave in place to avoid race conditions: ".neon/pageserver_1/pageserver.pid" No process is holding the pidfile. The process must have already exited. Leave in place to avoid race conditions: ".neon/safekeepers/sk1/safekeeper.pid" Stopping storage_broker with pid 2710939 immediately....... storage_broker has not stopped yet, continuing to wait..... neon broker stop failed: storage_broker with pid 2710939 did not stop in 10 seconds Stopping attachment_service with pid 2710949 immediately....... attachment_service has not stopped yet, continuing to wait..... attachment service stop failed: attachment_service with pid 2710949 did not stop in 10 seconds ``` and we leak the pageserver process ``` (.venv) cs@devvm-mbp:[~/src/neon-work-2]: ps aux \| grep pageserver cs 2710959 0.0 0.2 2377960 47616 pts/4 Sl 14:36 0:00 /home/cs/src/neon-work-2/target/debug/pageserver -D .neon/pageserver_1 -c id=1 -c pg_distrib_dir='/home/cs/src/neon-work-2/pg_install' -c http_auth_type='Trust' -c pg_auth_type='Trust' -c listen_http_addr='127.0.0.1:9898' -c listen_pg_addr='127.0.0.1:64000' -c broker_endpoint='http://127.0.0.1:50051/' -c control_plane_api='http://127.0.0.1:1234/' -c remote_storage={local_path='../local_fs_remote_storage/pageserver'} ``` After this PR, there is no leaked process.	2024-01-25 19:20:02 +01:00
John Spray	a72af29d12	control_plane/attachment_service: implement PlacementPolicy::Detached (#6458 ) ## Problem The API for detaching things wasn't implement yet, but one could hit this case indirectly from tests when using attach-hook, and find tenants unexpectedly attached again because their policy remained Single. ## Summary of changes Add PlacementPolicy::Detached, and: - add the behavior for it in schedule() - in tenant_migrate, refuse if the policy is detached - automatically set this policy in attach-hook if the caller has specified pageserver=null.	2024-01-24 12:49:30 +01:00
Christian Schwarz	743f6dfb9b	fix(attachment_service): corrupted attachments.json when parallel requests (#6450 ) The pagebench integration PR (#6214) issues attachment requests in parallel. We observed corrupted attachments.json from time to time, especially in the test cases with high tenant counts. The atomic overwrite added in #6444 exposed the root cause cleanly: the `.commit()` calls of two request handlers could interleave or be reordered. See also: https://github.com/neondatabase/neon/pull/6444#issuecomment-1906392259 This PR makes changes to the `persistence` module to fix above race: - mpsc queue for PendingWrites - one writer task performs the writes in mpsc queue order - request handlers that need to do writes do it using the new `mutating_transaction` function. `mutating_transaction`, while holding the lock, does the modifications, serializes the post-modification state, and pushes that as a `PendingWrite` into the mpsc queue. It then release the lock and `await`s the completion of the write. The writer tasks executes the `PendingWrites` in queue order. Once the write has been executed, it wakes the writing tokio task.	2024-01-23 19:14:32 +00:00

... 4 5 6 7 8 ...

641 Commits