rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-07 21:42:56 +00:00

Author	SHA1	Message	Date
John Spray	7eaa7a496b	pageserver: cancellation handling in writes to postgres client socket (#5503 ) ## Problem Writes to the postgres client socket from the page server were not wrapped in cancellation handling, so a stuck client connection could prevent tenant shutdowwn. ## Summary of changes All the places we call flush() to write to the socket, we should be respecting the cancellation token for the task. In this PR, I explicitly pass around a CancellationToken rather than doing inline `task_mgr::shutdown_token` calls, to avoid coupling it to the global task_mgr state and make it easier to refactor later. I have some follow-on commits that add a Shutdown variant to QueryError and use it more extensively, but that's pure refactor so will keep separate from this bug fix PR. Closes: https://github.com/neondatabase/neon/issues/5341	2023-10-09 15:54:17 +01:00
Joonas Koivunen	4772cd6c93	fix: deny branching, starting compute from not yet uploaded timelines (#5484 ) Part of #5172. First commits show that we used to allow starting up a compute or creating a branch off a not yet uploaded timeline. This PR moves activation of a timeline to happen after initial layer file(s) (if any) and `index_part.json` have been uploaded. Simply moving activation to be after downloads have finished works because we now spawn a task per http request handler. Current behaviour of uploading on the timelines on next startup is kept, to be removed later as part of #5172. Adds: - `NeonCli.map_branch` and corresponding `neon_local` implementation: allow creating computes for timelines managed via pageserver http client/api - possibly duplicate tests (I did not want to search for, will cleanup in a follow-up if these duplicated) Changes: - make `wait_until_tenant_state` return immediatedly on `Broken` and not wait more	2023-10-09 17:03:38 +03:00
Shany Pozin	010b4d0d5c	Move ApiError 404 to info level (#5501 ) ## Problem Moving ApiError 404 to info level logging (see https://github.com/neondatabase/neon/pull/5489#issuecomment-1750211212)	2023-10-09 13:54:46 +03:00
Rahul Modpur	477cb3717b	Fix neon_local pageserver status command (#5475 ) ## Problem Fix neon_local pageserver status command #5430 ## Summary of changes Fix clap config for pageserver status subcommand ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section. Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>	2023-10-09 09:13:57 +01:00
John Spray	ea5a97e7b4	pageserver: implement emergency mode for operating without control plane (#5469 ) ## Problem Pageservers with `control_plane_api` configured require a control plane to start up: in an incident this might be a problem. ## Summary of changes Note to reviewers: most of the code churn in mgr.rs is the refactor commit that enables the later emergency mode commit: you may want to review commits separately. - Add `control_plane_emergency_mode` configuration property - Refactor init_tenant_mgr to separate loading configurations from the main loop where we construct Tenant, so that the generations fetch can peek at the configs in emergency mode. - During startup, in emergency mode, attach any tenants that were attached on their last run, using the same generation number. Closes: #5381 Closes: https://github.com/neondatabase/neon/issues/5492	2023-10-06 17:25:21 +01:00
John Spray	547914fe19	pageserver: adjust timeline deletion for generations (#5453 ) ## Problem Spun off from https://github.com/neondatabase/neon/pull/5449 Timeline deletion does the following: 1. Delete layers referenced in the index 2. Delete everything else in the timeline prefix, except the index 3. Delete the index. When generations were added, the filter in step 2 got outdated, such that the index objects were deleted along with everything else at step 2. That didn't really break anything, but it makes an automated test unhappy and is a violation of the original intent of the code, which presumably intends to upload an invariant that as long as any objects for a timeline exist, the index exists. (Eventually, this index-object-last complexity can go away: when we do https://github.com/neondatabase/neon/issues/5080, there is no need to keep the index_part around, as deletions can always be retried any time any where.) ## Summary of changes After object listing, split the listed objects into layers and index objects. Delete the layers first, then the index objects.	2023-10-06 16:15:18 +00:00
Arpad Müller	607b185a49	Fix 1.73.0 clippy lints (#5494 ) Doesn't do an upgrade of rustc to 1.73.0 as we want to wait for the cargo response of the curl CVE before updating. In preparation for an update, we address the clippy lints that are newly firing in 1.73.0.	2023-10-06 14:17:19 +01:00
Christian Schwarz	bfba5e3aca	page_cache: ensure forward progress on miss (#5482 ) Problem ======= Prior to this PR, when we had a cache miss, we'd get back a write guard, fill it, the drop it and retry the read from cache. If there's severe contention for the cache, it could happen that the just-filled data gets evicted before our retry, resulting in lost work and no forward progress. Solution ======== This PR leverages the now-available `tokio::sync::RwLockWriteGuard`'s `downgrade()` functionality to turn the filled slot write guard into a read guard. We don't drop the guard at any point, so, forward progress is ensured. Refs ==== Stacked atop https://github.com/neondatabase/neon/pull/5480 part of https://github.com/neondatabase/neon/issues/4743 specifically part of https://github.com/neondatabase/neon/issues/5479	2023-10-06 13:41:13 +01:00
Christian Schwarz	ecc7a9567b	page_cache: inline `{,try_}lock_for_write` into `memorize_materialized_page` (#5480 ) Motivation ========== It's the only user, and the name of `_for_write` is wrong as of commit `7a63685cde` Author: Christian Schwarz <christian@neon.tech> Date: Fri Aug 18 19:31:03 2023 +0200 simplify page-caching of EphemeralFile (#4994) Notes ===== This also allows us to get rid of the WriteBufResult type. Also rename `search_mapping_for_write` to `search_mapping_exact`. It makes more sense that way because there is `_for_write`-locking anymore. Refs ==== part of https://github.com/neondatabase/neon/issues/4743 specifically https://github.com/neondatabase/neon/issues/5479 this is prep work for https://github.com/neondatabase/neon/pull/5482	2023-10-06 13:38:02 +02:00
Joonas Koivunen	45f98dd018	debug_tool: get page at lsn and keyspace via http api (#5057 ) If there are any layermap or layer file related problems, having a reproducable `get_page@lsn` easily usable for fast debugging iteration is helpful. Split off from #4938. Later evolved to add http apis for: - `get_page@lsn` at `/v1/tenant/:tenant_id/timeline/:timeline_id/get?key=<hex>&lsn=<lsn string>` - collecting the keyspace at `/v1/tenant/:tenant_id/timeline/:timeline_id/keyspace?[at_lsn=<lsn string>]` - defaults to `last_record_lsn` collecting the keyspace seems to yield some ranges for which there is no key.	2023-10-06 12:17:38 +01:00
John Spray	bdfe27f3ac	swagger: add a 503 definition to each endpoint (#5476 ) ## Problem The control plane doesn't have generic handling for this. ## Summary of changes Add a 503 response to every endpoint.	2023-10-06 11:31:49 +01:00
Joonas Koivunen	a15f9b3baa	pageserver: Tune 503 Resource unavailable (#5489 ) 503 Resource Unavailable appears as error in logs, but is not really an error which should ever fail a test on, or even log an error in prod, [evidence]. Changes: - log 503 as `info!` level - use `Cow<'static, str>` instead of `String` - add an additional `wait_until_tenant_active` in `test_actually_duplicate_l1` We ought to have in tests "wait for tenants to complete loading" but this is easier to implement for now. [evidence]: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5485/6423110295/index.html#/testresult/182de66203864fc0	2023-10-06 09:59:14 +01:00
Alexander Bayandin	ce92638185	test_runner: allow race in test_tenant_delete_is_resumed_on_attach (#5478 ) ## Problem `test_tenant_delete_is_resumed_on_attach` is flaky ## Summary of changes - Allow race in `test_tenant_delete_is_resumed_on_attach` - Cleanup `allowed_errors` in the file a bit	2023-10-06 09:49:31 +01:00
Joonas Koivunen	a3c82f19b8	tests: prettier subprocess output in test log (#5485 ) Clean subprocess output so that: - one line of output is just one line without a linebreak - like shells handle `echo subshell says: $(echo foo)` - multiple lines are indented like other pytest output - error output is dedented and then indented to be like other pytest output Minor readability changes remove friction.	2023-10-05 20:15:55 +00:00
Arthur Petukhovsky	8b15252f98	Move walproposer state into struct (#5364 ) This patch extracts all postgres-dependent functions in a separate `walproposer_api` functions struct. It helps to compile walproposer as static library without compiling all other postgres server code. This is useful to allow calling walproposer C code from Rust, or linking this library with anything else. All global variables containing walproposer state were extracted to a separate `WalProposer` struct. This makes it possible to run several walproposers in the same process, in separate threads. There were no logic changes and PR mostly consists of shuffling functions between several files. We have a good test coverage for walproposer code and I've seen no issues with tests while I was refactoring it, so I don't expect any issues after merge. ref https://github.com/neondatabase/neon/issues/547 --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2023-10-05 18:48:01 +01:00
Alexander Bayandin	522aaca718	Temporary deploy staging preprod region from main (#5477 ) ## Problem Stating preprod region can't use `release-XXX` right now, the config is unified across all regions, it supports only `XXX`. Ref https://neondb.slack.com/archives/C03H1K0PGKH/p1696506459720909?thread_ts=1696437812.365249&cid=C03H1K0PGKH ## Summary of changes - Deploy staging-preprod from main	2023-10-05 14:02:20 +00:00
John Spray	7cbb39063a	tests: stabilize + extend deletion queue recovery test (#5457 ) ## Problem This test was unstable when run in parallel with lots of others: if the pageserver stayed up long enough for some of the deletions to get validated, they won't be discarded on restart the way the test expects when keep_attachment=True. This was a test bug, not a pageserver bug. ## Summary of changes - Add failpoints to control plane api client - Use failpoint to pause validation in the test to cover the case where it had been flaky - Add a metric for the number of deleted keys validated - Add a permutation to the test to additionally exercise the case where we _do_ validate lists before restart: this is a coverage enhancement that seemed sensible when realizing that the test was relying on nothing being validated before restart. - the test will now always enter the restart with nothing or everything validated.	2023-10-05 11:22:05 +01:00
John Spray	baa5fa1e77	pageserver: location configuration API, attachment modes, secondary locations (#5299 ) ## Problem These changes are part of building seamless tenant migration, as described in the RFC: - https://github.com/neondatabase/neon/pull/5029 ## Summary of changes - A new configuration type `LocationConf` supersedes `TenantConfOpt` for storing a tenant's configuration in the pageserver repo dir. It contains `TenantConfOpt`, as well as a new `mode` attribute that describes what kind of location this is (secondary, attached, attachment mode etc). It is written to a file called `config-v1` instead of `config` -- this prepares us for neatly making any other profound changes to the format of the file in future. Forward compat for existing pageserver code is achieved by writing out both old and new style files. Backward compat is achieved by checking for the old-style file if the new one isn't found. - The `TenantMap` type changes, to hold `TenantSlot` instead of just `Tenant`. The `Tenant` type continues to be used for attached tenants only. Tenants in other states (such as secondaries) are represented by a different variant of `TenantSlot`. - Where `Tenant` & `Timeline` used to hold an Arc<Mutex<TenantConfOpt>>, they now hold a reference to a AttachedTenantConf, which includes the extra information from LocationConf. This enables them to know the current attachment mode. - The attachment mode is used as an advisory input to decide whether to do compaction and GC (AttachedStale is meant to avoid doing uploads, AttachedMulti is meant to avoid doing deletions). - A new HTTP API is added at `PUT /tenants/<tenant_id>/location_config` to drive new location configuration. This provides a superset of the functionality of attach/detach/load/ignore: - Attaching a tenant is just configuring it in an attached state - Detaching a tenant is configuring it to a detached state - Loading a tenant is just the same as attaching it - Ignoring a tenant is the same as configuring it into Secondary with warm=false (i.e. retain the files on disk but do nothing else). Caveats: - AttachedMulti tenants don't do compaction in this PR, but they do in the follow on #5397 - Concurrent updates to the `location_config` API are not handled elegantly in this PR, a better mechanism is added in the follow on https://github.com/neondatabase/neon/pull/5367 - Secondary mode is just a placeholder in this PR: the code to upload heatmaps and do downloads on secondary locations will be added in a later PR (but that shouldn't change any external interfaces) Closes: https://github.com/neondatabase/neon/issues/5379 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-10-05 09:55:10 +01:00
Conrad Ludgate	c216b16b0f	proxy: fix memory leak (#5472 ) ## Problem these JoinSets live for the duration of the process. they might have many millions of connections spawned on them and they never get cleared. Fixes #4672 ## Summary of changes Drain the connections as we go	2023-10-05 07:30:28 +01:00
John Spray	c5ea91f831	pageserver: fix loading control plane JWT token (#5470 ) ## Problem In #5383 this configuration was added, but it missed the parts of the Builder class that let it actually be used. ## Summary of changes Add `control_plane_api_token` hooks to PageserverConfigBuilder	2023-10-05 01:31:17 +01:00
Em Sharnoff	6489a4ea40	vm-monitor: Remove mem::forget of tokio::sync::mpsc::Sender (#5441 ) If the cgroup integration was not enabled, this would cause compute_ctl to leak memory. Thankfully, we never use vm-monitor without the cgroup handling enabled, so this wasn't actually impacting us, but... it still looked suspicious, so figured it was worth changing.	2023-10-04 15:08:10 -07:00
Arthur Petukhovsky	f8a7498965	Wait for sk tli init in test_timeline_status (#5467 ) Fix #5447	2023-10-04 22:53:34 +01:00
Joonas Koivunen	7dce62a9ee	test: duplicate L1 layer (#5412 ) We overwrite L1 layers if compaction gets interrupted. We did not have a test showing that we do in fact do this. The test might be a bit flaky due to timestamp usage, but separating for smaller diff in as part of #5172. Also removes an unrelated 200s pgbench from the test suite.	2023-10-04 16:52:32 +01:00
Alexander Bayandin	7a2cafb34d	Use zstd to compress large allure artifacts (#5458 ) ## Problem - Because we compress artifacts file by file, we don't need to put them into `tar` containers (ie instead of `tar.gz` we can use just `gz`). - Pythons gz single-threaded and pretty slow. A benchmark has shown ~20 times speedup (19.876176291 vs 0.8748335830000009) on my laptop (for a pageserver.log size is 1.3M) ## Summary of changes - Replace tarfile with zstandart - Update allure to 2.24.0	2023-10-04 16:20:16 +01:00
duguorong009	25a37215f3	fix: replace all `std::PathBuf`s with `camino::Utf8PathBuf` (#5352 ) Fixes #4689 by replacing all of `std::Path` , `std::PathBuf` with `camino::Utf8Path`, `camino::Utf8PathBuf` in - pageserver - safekeeper - control_plane - libs/remote_storage Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-10-04 17:52:23 +03:00
Conrad Ludgate	f002b1a219	proxy: http limits (#5460 ) ## Problem 1MB request body is apparently too small for some clients ## Summary of changes Update to 10 MB request body. Also revert the removal of response limits while we don't have streaming support.	2023-10-04 15:01:05 +01:00
Joonas Koivunen	fc467941f9	walredo: log retryed error (#5462 ) We currently lose the actual reason the first walredo attempt failed. Together with implicit retry making it difficult to eyeball what is happening. PR version keeps the logging the same error message twice, which is what we've been doing all along. However correlating the retrying case and the finally returned error is difficult, because the actual error message was left out before this PR. Lastly, log the final error we present to postgres in the same span, not outside it. Additionally, suppress the stacktrace as the comment suggested.	2023-10-04 14:19:19 +01:00
Christian Schwarz	25bf791568	metrics: distinguish page reconstruction success & failure (#5463 ) Here's the existing dashboards that use the metric: https://github.com/search?q=repo%3Aneondatabase%2Fgrafana-dashboard-export%20pageserver_getpage_reconstruct_seconds&type=code Looks like only `_count` and `_sum` values are used currently. We can fix them up easily post merge. I think the histogram is worth keeping, though. follow-up to https://github.com/neondatabase/neon/pull/5459#pullrequestreview-1657072882 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-10-04 13:40:00 +01:00
Joonas Koivunen	dee2bcca44	fix: time the reconstruction, not future creation (#5459 ) `pageserver_getpage_reconstruct_seconds` histogram had been only recording the time it takes to create a future, not await on it. Since: `eb0a698adc`.	2023-10-04 11:01:07 +01:00
Joonas Koivunen	db8ff9d64b	testing: record walredo failures to test reports (#5451 ) We have rare walredo failures with pg16. Let's introduce recording of failing walredo input in `#[cfg(feature = "testing")]`. There is additional logging (the value reconstruction path logging usually shown with not found keys), keeping it for `#[cfg(features = "testing")]`. Cc: #5404.	2023-10-04 11:24:30 +03:00
Rahul Modpur	af6a20dfc2	Improve CrashsafeOverwriteError source printing (#5410 ) ## Problem Duplication of error in log Fixes #5366 ## Summary of changes Removed `{0}` from error description above each enum due to presence of `#[source]` to avoid duplication Signed-off-by: Rahul Modpur <rmodpur2@gmail.com>	2023-10-04 02:38:42 +02:00
Alexander Bayandin	fec94ad5b3	Update checksums for pg_jsonschema & pg_graphql (#5455 ) ## Problem Folks have re-taged releases for `pg_jsonschema` and `pg_graphql` (to increase timeouts on their CI), for us, these are a noop changes, but unfortunately, this will cause our builds to fail due to checksums mismatch (this might not strike right away because of the build cache). - `8ba7c7be9d` - `aa7509370a` ## Summary of changes - `pg_jsonschema` update checksum - `pg_graphql` update checksum	2023-10-03 18:42:39 +01:00
John Spray	ace0c775fc	pageserver: prefer 503 to 500 for transient unavailability (#5439 ) ## Problem The 500 status code should only be used for bugs or unrecoverable failures: situations we did not expect. Currently, the pageserver is misusing this response code for some situations that are totally normal, like requests targeting tenants that are in the process of activating. The 503 response is a convenient catch-all for "I can't right now, but I will be able to". ## Summary of changes - Change some transient availability error conditions to return 503 instead of 500 - Update the HTTP client configuration in integration tests to retry on 503 After these changes, things like creating a tenant and then trying to create a timeline within it will no longer require carefully checking its status first, or retrying on 500s. Instead, a client which is properly configured to retry on 503 can quietly handle such situations.	2023-10-03 17:00:55 +01:00
dependabot[bot]	78dde31827	build(deps): bump urllib3 from 1.26.11 to 1.26.17 (#5442 )	2023-10-03 11:50:27 +01:00
Christian Schwarz	de0e96d2be	remote_storage: separate semaphores for read and write ops (#5440 ) Before this PR, a compaction that queues a lot of uploads could grab all the semaphore permits. Any readers that need on-demand downloads would queue up, causing getpage@lsn outliers. Internal context: https://neondb.slack.com/archives/C05NXJFNRPA/p1696264359425419?thread_ts=1696250393.840899&cid=C05NXJFNRPA	2023-10-03 11:22:11 +03:00
Alexander Bayandin	00369c8c2a	Update pg_jsonschema & pg_grapgql extensions (#5438 ) - Update `pg_jsonschema` to 0.2.0 with Postgres 16 support - Update `pg_grapgql` to 1.4.0 with Postgres 16 support - Remove `pgx` (old name of `pgrx`) layer from Dockerfile	2023-10-02 23:50:27 +01:00
Vadim Kharitonov	c1dcf61ca2	Update pgx-ulid extension (#5382 ) - Update `pgx-ulid` from 0.1.0 to 0.1.3, and add it to Postgres 16 - Add `pg_tiktoken` to Postgres 16 image Closes #5374 --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2023-10-02 15:52:45 +01:00
Sasha Krassovsky	89275f6c1e	Fix invalid database resulting from failed DROP DB (#5423 ) ## Problem If the control plane happened to respond to a DROP DATABASE request with a non-200 response, we'd abort the DROP DATABASE transaction in the usual spot. However, Postgres for some reason actually performs the drop inside of `standard_ProcessUtility`. As such, the database was left in a weird state after aborting the transaction. We had test coverage of a failed CREATE DATABASE but not a failed DROP DATABASE. ## Summary of changes Since DROP DATABASE can't be inside of a transaction block, we can just forward the DDL changes to the control plane inside of `ProcessUtility_hook`, and if we respond with 500 bail out of `ProcessUtility` before we perform the drop. This change also adds a test, which reproduced the invalid database issue before the fix was applied.	2023-09-29 19:39:28 +01:00
Christian Schwarz	c07eef8ea5	page_cache: find_victim: don't spin while there's no chance for a slot (#5319 ) It is wasteful to cycle through the page cache slots trying to find a victim slot if all the slots are currently un-evictable because a read / write guard is alive. We suspect this wasteful cycling to be the root cause for an "indigestion" we observed in staging (#5291). The hypothesis is that we `.await` after we get ahold of a read / write guard, and that tokio actually deschedules us in favor of another future. If that other future then needs a page slot, it can't get ours because we're holding the guard. Repeat this, and eventually, the other future(s) will find themselves doing `find_victim` until they hit `exceeded evict iter limit`. The `find_victim` is wasteful and CPU-starves the futures that are already holding the read/write guard. A `yield` inside `find_victim` could mitigate the starvation, but wouldn't fix the wasting of CPU cycles. So instead, this PR queues waiters behind a tokio semaphore that counts evictable slots. The downside is that this stops the clock page replacement if we have 0 evictable slots. Also, as explained by the big block comment in `find_victims`, the semaphore doesn't fully prevent starvation because because we can't make tokio prioritize those tasks executing `find_victim` that have been trying the longest. Implementation =============== We need to acquire the semaphore permit before locking the slot. Otherwise, we could deadlock / discover that all permits are gone and would have to relinquish the slot, having moved forward the Clock LRU without making progress. The downside is that, we never get full throughput for read-heavy workloads, because, until the reader coalesces onto an existing permit, it'll hold its own permit. Addendum To Root-Cause Analysis In #5291 ======================================== Since merging that PR, @arpad-m pointed out that we couldn't have reached the `slot.write().await` with his patches because the VirtualFile slots can't have all been write-locked, because we only hold them locked while the IO is ongoing, and the IO is still done with synchronous system calls in that patch set, so, we can have had at most $number_of_executor_threads locked at any given time. I count 3 tokio runtimes that do `Timeline::get`, each with 8 executor threads in our deployment => $number_of_executor_threads = 3*8 = 24 . But the virtual file cache has 100 slots. We both agree that nothing changed about the core hypothesis, i.e., additional await points inside VirtualFile caused higher concurrency resulting in exhaustion of page cache slots. But we'll need to reproduce the issue and investigate further to truly understand the root cause, or find out that & why we were indeed using 100 VirtualFile slots. TODO: could it be compaction that needs to hold guards of many VirtualFile's in its iterators?	2023-09-29 20:03:56 +02:00
Alexander Bayandin	86dd28d4fb	Bump hermit-abi & num_cpus packages (#5427 ) ## Problem I've noticed that `hermit-abi` 0.3.1 [1] has been yanked from crates.io (looks like nothing too bad [2]). Also, we have 2 versions of `hermit-api` in dependencies (0.3.* and 0.2.*), update `num-cpus` to use the latest `hermit-api` 0.3.3. - [1] https://crates.io/crates/hermit-abi/0.3.1 - [2] https://github.com/hermit-os/hermit-rs/issues/436 ## Summary of changes - `cargo update -p num-cpus` - `cargo update -p hermit-abi` - Unignore `RUSTSEC-2023-0052` in `deny.toml` (it has been fixed in https://github.com/neondatabase/neon/pull/5069)	2023-09-29 12:57:45 +01:00
Conrad Ludgate	fd20bbc6cb	proxy: log params when no endpoint (#5418 ) ## Problem Our SNI error dashboard features IP addresses but it's not immediately clear who that is still (#5369) ## Summary of changes Log some startup params with this error	2023-09-29 09:40:27 +01:00
John Spray	6a1903987a	tests: use approximate equality in test_get_tenant_size_with_multiple_branches (#5411 ) ## Problem This test has been flaky for a long time. As far as I can tell, the test was simply wrong to expect postgres activity to result in deterministic sizes: making the match fuzzy is not a hack, it's just matching the reality that postgres doesn't promise to write exactly the same number of pages every time it runs a given query. ## Summary of changes Equalities now tolerate up to 4 pages different. This is big enough to tolerate the deltas we've seen in practice. Closes: https://github.com/neondatabase/neon/issues/2962	2023-09-29 09:15:43 +01:00
John Spray	1881373ec4	Update CODEOWNERS (#5421 ) It is usually not intended to notify a random member of the compute team for pageserver PRs. Leaving the notification of the storage team in place, because this serves a purpose when some external contributor opens a PR and isn't sure who to ask.	2023-09-28 17:34:51 +01:00
John Spray	ca3ca2bb9c	pageserver: don't try and recover deletion queue if no remote storage (#5419 ) ## Problem Because `neon_local` by default runs with no remote storage, it was not running the deletion queue workers, and the attempt to call into `recover()` was failing. This is a bogus configuration that will go away when we make remote storage mandatory. ## Summary of changes Don't try and do deletion queue recovery when remote storage is disabled. The reason we don't just unset `control_plane_api` to avoid this is that generations will soon become mandatory, irrespective of when we make remote storage mandatory.	2023-09-28 17:20:34 +01:00
Em Sharnoff	b497d0094e	file cache: Remove free space monitor (#5406 ) This effectively reverts #3832. There's a couple issues we just discovered with the free space monitor, and to my knowledge, the fact we're putting the file cache on a separate filesystem (even when on disk) that's guaranteed to have more room than the maximum size means that this free space monitor should have no effect. More details: 1. The control plane sets the maximum file cache size based on max CU 2. The control plane sets the size of the filesystem underlying the file cache based on the maximum user selectable CU (or, if the endpoint is larger, then that size), so that there's always enough room 3. If postmaster gets SIGKILL'd, then the free space monitor process does not exit 4. If the free space monitor is acting on the cache file but not subject to locking or up-to-date metadata from a newer postgres instance, then this could lead to data corruption. So, in practice I belive the risk of data corruption is low but not nothing, and given the issues we hit because of (3), and given that this the free space monitor shouldn't be necessary because of (1) and (2), it's best to just remove it outright. See also: neondatabase/autoscaling#534, #5405	2023-09-28 06:47:44 -07:00
Conrad Ludgate	528fb1bd81	proxy: metrics2 (#5179 ) ## Problem We need to count metrics always when a connection is open. Not only when the transfer is 0. We also need to count bytes usage for HTTP. ## Summary of changes New structure for usage metrics. A `DashMap<Ids, Arc<Counters>>`. If the arc has 1 owner (the map) then I can conclude that no connections are open. If the counters has "open_connections" non zero, then I can conclude a new connection was opened in the last interval and should be reported on. Also, keep count of how many bytes processed for HTTP and report it here.	2023-09-28 11:38:26 +01:00
Joonas Koivunen	af28362a47	tests: Default to LOCAL_FS for pageserver remote storage (#5402 ) Part of #5172. Builds upon #5243, #5298. Includes the test changes: - no more RemoteStorageKind.NOOP - no more testing of pageserver without remote storage - benchmarks now use LOCAL_FS as well Support for running without RemoteStorage is still kept but in practice, there are no tests and should not be any tests. Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-09-28 12:25:20 +03:00
John Spray	6b4bb91d0a	docs/rfcs: add RFC for fast tenant migration/failover (#5029 ) ## Problem Currently we don't have a way to migrate tenants from one pageserver to another without a risk of gap in availability. ## Summary of changes This follows on from https://github.com/neondatabase/neon/pull/4919 Migrating tenants between pageservers is essential to operating a service at scale, in several contexts: 1. Responding to a pageserver node failure by migrating tenants to other pageservers 2. Balancing load and capacity across pageservers, for example when a user expands their database and they need to migrate to a pageserver with more capacity. 3. Restarting pageservers for upgrades and maintenance Currently, a tenant may migrated by attaching to a new node, re-configuring endpoints to use the new node, and then later detaching from the old node. This is safe once [generation numbers](025-generation-numbers.md) are implemented, but does meet our seamless/fast/efficient goals: Co-authored-by: Christian Schwarz <christian@neon.tech>	2023-09-28 10:07:11 +01:00
Em Sharnoff	5fdc80db03	Bump vm-builder v0.17.11 -> v0.17.12 (#5407 ) Only relevant change is neondatabase/autoscaling#534 - refer there for more details.	2023-09-28 09:52:39 +02:00
Em Sharnoff	48e85460fc	vm-monitor: Unset memory.high on start + refactor cgroup handling (#5348 ) ## Problem Over the past couple days, we've had a couple VMs hit issues with postgres getting hit by memory.high throttling, even after #5303 was supposed to fix that. The tl;dr of those issues is that because vm-monitor startup sets the file cache size first, before interacting with the cgroup, cgroup throttling can mean we timeout connecting to the file cache and never reset the cgroup, even if memory has been upscaled since then. See e.g.: - https://neondb.slack.com/archives/C03F5SM1N02/p1695218132208249 - https://neondb.slack.com/archives/C03F5SM1N02/p1695314613696659 ## Summary of changes This PR adds an additional step into vm-monitor startup, where we first set the cgroup's memory.high value to 'max', removing the capacity for throttling. This preferable to just setting memory.high before the file cache, because it's theoretically possible that the new value of memory.high could still be less than the current memory usage, in which case postgres could continue to be throttled without sufficient memory events to relieve that. Implementing this properly involved adding a method to our internal cgroup interface, and it seemed like there was duplicated functionality there, so this PR unifies that as well, making things a bit more consistent.	2023-09-27 21:27:23 -07:00

1 2 3 4 5 ...

3834 Commits