rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-05-29 19:10:38 +00:00

Author	SHA1	Message	Date
Mikhail Kot	6138d61592	Object storage proxy (#11357 ) Service targeted for storing and retrieving LFC prewarm data. Can be used for proxying S3 access for Postgres extensions like pg_mooncake as well. Requests must include a Bearer JWT token. Token is validated using a pemfile (should be passed in infra/). Note: app is not tolerant of extra trailing slashes, see app.rs `delete_prefix` test for comments. Resolves: https://github.com/neondatabase/cloud/issues/26342 Unrelated changes: gate a `rename_noreplace` feature and disable it in `remote_storage` so that `object_storage` can be built with musl	2025-04-08 14:54:53 +00:00
Christian Schwarz	aad410c8f1	improve ondemand-download latency observability (#11421 ) ## Problem We don't have metrics to exactly quantify the end user impact of on-demand downloads. Perf tracing is underway (#11140) to supply us with high-resolution samples. But it will also be useful to have some aggregate per-timeline and per-instance metrics that definitively contain all observations. ## Summary of changes This PR consists of independent commits that should be reviewed independently. However, for convenience, we're going to merge them together. - refactor(metrics): measure_remote_op can use async traits - impr(pageserver metrics): task_kind dimension for remote_timeline_client latency histo - implements https://github.com/neondatabase/cloud/issues/26800 - refs https://github.com/neondatabase/cloud/issues/26193#issuecomment-2769705793 - use the opportunity to rename the metric and add a _global suffix; checked grafana export, it's only used in two personal dashboards, one of them mine, the other by Heikki - log on-demand download latency for expensive-to-query but precise ground truth - metric for wall clock time spent waiting for on-demand downloads ## Refs - refs https://github.com/neondatabase/cloud/issues/26800 - a bunch of minor investigations / incidents into latency outliers	2025-04-04 18:04:39 +00:00
Christian Schwarz	4f94751b75	pageserver config: ignore+warn about unknown fields (instead of `deny_unknown_fields`) (#11275 ) # Refs - refs https://github.com/neondatabase/neon/issues/8915 - discussion thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1742406381132599 - stacked atop https://github.com/neondatabase/neon/pull/11298 - corresponding internal docs update that illustrates how this PR removes friction: https://github.com/neondatabase/docs/pull/404 # Problem Rejecting `pageserver.toml`s with unknown fields adds friction, especially when using `pageserver.toml` fields as feature flags that need to be decommissioned. See the added paragraphs on `pageserver_api::models::ConfigToml` for details on what kind of friction it causes. Also read the corresponding internal docs update linked above to see a more imperative guide for using `pageserver.toml` flags as feature flags. # Solution ## Ignoring unknown fields Ignoring is the serde default behavior. So, just remove `serde(deny_unknown_fields)` from all structs in `pageserver_api::config::ConfigToml` and `pageserver_api::config::TenantConfigToml`. I went through all the child fields and verified they don't use `deny_unknown_fields` either, including those shared with `pageserver_api::models`. ## Warning about unknown fields We still want to warn about unknown fields to - be informed about typos in the config template - be reminded about feature-flag style configs that have been cleaned up in code but not yet in config templates We tried `serde_ignore` (cf draft #11319) but it doesn't work with `serde(flatten)`. The solution we arrived at is to compare the on-disk TOML with the TOML that we produce if we serialize the `ConfigToml` again. Any key specified in the on-disk TOML but not present in the serialized TOML is flagged as an ignored key. The mechanism to do it is a tiny recursive descent visitor on the `toml_edit::DocumentMut`. 
# Future Work Invalid config _values_ in known fields will continue to fail pageserver startup. See - https://github.com/neondatabase/cloud/issues/24349 for current worst case impact to deployments & ideas to improve.	2025-04-04 17:30:58 +00:00
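The ignored-key detection described in commit 4f94751b75 (compare the on-disk TOML against the re-serialized config and flag keys that only exist on disk) can be sketched as follows. This is a minimal illustration in Python, not the pageserver's Rust implementation: parsed TOML is represented as plain nested dicts, and the function name `ignored_keys` and all config keys shown are hypothetical.

```python
def ignored_keys(on_disk, reserialized, prefix=""):
    """Return dotted paths of keys present on disk but absent after re-serialization."""
    ignored = []
    for key, value in on_disk.items():
        path = f"{prefix}{key}"
        if key not in reserialized:
            # The typed config dropped this key, so it was silently ignored.
            ignored.append(path)
        elif isinstance(value, dict) and isinstance(reserialized[key], dict):
            # Recurse into nested tables.
            ignored.extend(ignored_keys(value, reserialized[key], prefix=f"{path}."))
    return ignored


# Hypothetical parsed on-disk config: one stale flag and one typo.
on_disk = {
    "listen_addr": "127.0.0.1:64000",
    "old_feature_flag": True,
    "tenant_config": {"gc_period": "1h", "gc_perid": "2h"},
}

# Stand-in for what the typed config would produce when serialized again.
reserialized = {
    "listen_addr": "127.0.0.1:64000",
    "tenant_config": {"gc_period": "1h"},
}

found = ignored_keys(on_disk, reserialized)
print(found)  # ['old_feature_flag', 'tenant_config.gc_perid']
```

Comparing against the re-serialized form avoids depending on the deserializer to report what it skipped, which is the limitation the commit message attributes to `serde_ignore` with `serde(flatten)`.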
Dmitrii Kovalkov	181af302b5	storcon + safekeeper + scrubber: propagate root CA certs everywhere (#11418 ) ## Problem There are some places in the code where we create `reqwest::Client` without providing SSL CA certs from `ssl_ca_file`. These will break after we enable TLS everywhere. - Part of https://github.com/neondatabase/cloud/issues/22686 ## Summary of changes - Support `ssl_ca_file` in storage scrubber. - Add `use_https_safekeeper_api` option to safekeeper to use https for peer requests. - Propagate SSL CA certs to storage_controller/client, storcon's ComputeHook, PeerClient and maybe_forward.	2025-04-04 06:30:48 +00:00
Vlad Lazar	9db63fea7a	pageserver: optionally export perf traces in OTEL format (#11140 ) Based on https://github.com/neondatabase/neon/pull/11139 ## Problem We want to export performance traces from the pageserver in OTEL format. End goal is to see them in Grafana. ## Summary of changes https://github.com/neondatabase/neon/pull/11139 introduces the infrastructure required to run the otel collector alongside the pageserver. ### Design Requirements: 1. We'd like to avoid implementing our own performance tracing stack and use the `tracing` crate if possible. 2. Ideally, we'd like zero overhead at a sampling rate of zero and be able to change the tracing config for a tenant on the fly. 3. We should leave the current span hierarchy intact. This includes adding perf traces without modifying existing tracing. To satisfy (3) (and (2) in part) a separate span hierarchy is used. `RequestContext` gains an optional `perf_span` member that's only set when the request was chosen by sampling. All perf span related methods added to `RequestContext` are no-ops for requests that are not sampled. This on its own is not enough for (3), so performance spans use a separate tracing subscriber. The `tracing` crate doesn't have great support for this, so there's a fair amount of boilerplate to override the subscriber at all points of the perf span lifecycle. ### Perf Impact [Periodic pagebench](https://neonprod.grafana.net/d/ddqtbfykfqfi8d/e904990?orgId=1&from=2025-02-08T14:15:59.362Z&to=2025-03-10T14:15:59.362Z&timezone=utc) shows no statistically significant regression with a sample ratio of 0. There's an annotation on the dashboard on 2025-03-06. ### Overview of changes: 1. Clean up the `RequestContext` API a bit. Namely, get rid of the `RequestContext::extend` API and use the builder instead. 2. Add pageserver level configs for tracing: sampling ratio, otel endpoint, etc. 3. Introduce some perf span tracking utilities and expose them via `RequestContext`. 
We add a `tracing::Span` wrapper to be used for perf spans and a `tracing::Instrumented` equivalent for it. See doc comments for reason. 4. Set up OTEL tracing infra according to configuration. A separate runtime is used for the collector. 5. Add perf traces to the read path. ## Refs - epic https://github.com/neondatabase/neon/issues/9873 --------- Co-authored-by: Christian Schwarz <christian@neon.tech>	2025-04-03 17:56:51 +00:00
Alex Chi Z.	131b32ef48	fix(pageserver): clean up aux files before detaching (#11299 ) ## Problem Related to https://github.com/neondatabase/cloud/issues/26091 and https://github.com/neondatabase/cloud/issues/25840 Close https://github.com/neondatabase/neon/issues/11297 Discussion on Slack: https://neondb.slack.com/archives/C033RQ5SPDH/p1742320666313969 ## Summary of changes * When detaching, scan all aux files within `sparse_non_inherited_keyspace` in the ancestor timeline and create an image layer exactly at the ancestor LSN. All scanned keys will map to an empty value, which is a delete tombstone. - Note that end_lsn for rewritten delta layers = ancestor_lsn + 1, so the image layer will have image_end_lsn=end_lsn. With the current `select_layer` logic, the read path will always first read the image layer. * Add a test case. --------- Signed-off-by: Alex Chi Z <chi@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2025-04-03 15:55:22 +00:00
Dmitrii Kovalkov	2e11d129d0	tests: suppress mgm api timeout error in storcon (#11428 ) ## Problem Since `0f367cb665` the timeout in `with_client_retries` is implemented via `tokio::timeout` instead of `reqwest::ClientBuilder::timeout` (because we reuse the client). It changed the error representation if the timeout is exceeded. Such errors were suppressed in `allowed_errors.py`, but old regexps do not match the new error. Discussion: https://neondb.slack.com/archives/C033RQ5SPDH/p1743533184736319 ## Summary of changes - Add new `Timeout` error to `allowed_errors.py`	2025-04-03 14:18:50 +00:00
Erik Grinaker	17193d6a33	test_runner: fix pagebench tenant configs (#11420 ) ## Problem Pagebench creates a bunch of tenants by first creating a template tenant and copying its remote storage, then attaching the copies to the Pageserver. These tenants had custom configurations to disable GC and compaction. However, these configs were only picked up by the Pageserver on attach, and not registered with the storage controller. This caused the storage controller to replace the tenant configs with the default tenant config, re-enabling GC and compaction which interferes with benchmark performance. Resolves #11381. ## Summary of changes Register the copied tenants with the storage controller, instead of directly attaching them to the Pageserver.	2025-04-02 20:11:39 +00:00
Arpad Müller	e3d27b2f68	Start safekeeper node IDs with 1 and forbid 0 from registering (#11419 ) Right now we start safekeeper node ids at 0. However, other code treats 0 as invalid (see #11407). We decided on the latter. Therefore, make the register python tests register safekeepers starting at node id 1 instead of 0, and forbid safekeepers with id 0 from registering. Context: https://github.com/neondatabase/neon/pull/11407#discussion_r2024852328	2025-04-02 18:36:50 +00:00
Alex Chi Z.	dd1299f337	feat(storcon): passthrough mark invisible and add tests (#11401 ) ## Problem close https://github.com/neondatabase/neon/issues/11279 ## Summary of changes * Allow passthrough of other methods in tenant timeline shard0 passthrough of storcon. * Passthrough mark invisible API in storcon. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-04-02 17:11:49 +00:00
Peter Bendel	4bc6dbdd5f	use a prod-like shared_buffers size for some perf unit tests (#11373 ) ## Problem In Neon DBaaS we adjust the shared_buffers to the size of the compute, or better described we adjust the max number of connections to the compute size and we adjust the shared_buffers size to the number of max connections according to about the following sizes `2 CU: 225mb; 4 CU: 450mb; 8 CU: 900mb` [see](`877e33b428/goapp/controlplane/internal/pkg/compute/computespec/pg_settings.go (L405)`) ## Summary of changes We should run perf unit tests with settings that are realistic for a paying customer and select 8 CU as the reference for those tests.	2025-04-02 10:43:05 +00:00
Alexey Kondratov	557127550c	feat(compute): Add compute_ctl_up metric (#11376 ) ## Problem For computes running inside NeonVM, the actual compute image tag is buried inside the NeonVM spec, and we cannot get it as part of standard k8s container metrics (it's always an image and a tag of the NeonVM runner container). The workaround we currently use is to extract the running computes info from the control plane database with SQL. It has several drawbacks: i) it's complicated, separate DB per region; ii) it's slow; iii) it's still an indirect source of info, i.e. k8s state could be different from what the control plane expects. ## Summary of changes Add a new `compute_ctl_up` gauge metric with `build_tag` and `status` labels. It will help us to both overview what are the tags/versions of all running computes; and to break them down by current status (`empty`, `running`, `failed`, etc.) Later, we could introduce low cardinality (no endpoint or compute ids) streaming aggregates for such metrics, so they will be blazingly fast and usable for monitoring the fleet-wide state.	2025-04-01 08:51:17 +00:00
Alexander Bayandin	30a7dd630c	ruff: enable TC — flake8-type-checking (#11368 ) ## Problem `TYPE_CHECKING` is used inconsistently across Python tests. ## Summary of changes - Update `ruff`: 0.7.0 -> 0.11.2 - Enable TC (flake8-type-checking): https://docs.astral.sh/ruff/rules/#flake8-type-checking-tc - (auto)fix all new issues	2025-03-30 18:58:33 +00:00
Erik Grinaker	db5384e1b0	pageserver: remove L0 flush upload wait (#11196 ) ## Problem Previously, L0 flushes would wait for uploads, as a simple form of backpressure. However, this prevented flush pipelining and upload parallelism. It has since been disabled by default and replaced by L0 compaction backpressure. Touches https://github.com/neondatabase/cloud/issues/24664. ## Summary of changes This patch removes L0 flush upload waits, along with the `l0_flush_wait_upload` setting. This can't be merged until the setting has been removed across the fleet.	2025-03-30 13:14:04 +00:00
Vlad Lazar	9fc7c22cc9	storcon: add use_local_compute_notifications flag (#11333 ) ## Problem While working on bulk import, I want to use the `control-plane-url` flag for a different request. Currently, the local compute hook is used whenever no control plane is specified in the config. My test requires local compute notifications and a configured `control-plane-url` which isn't supported. ## Summary of changes Add a `use-local-compute-notifications` flag. When this is set, we use the local flow regardless of other config values. It's enabled by default in neon_local and disabled by default in all other envs. I had to turn the flag off in tests that wish to bypass the local flow, but that's expected. --------- Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>	2025-03-21 15:31:06 +00:00
John Spray	76088c16d2	storcon: reproduce shard split issue (#11290 ) ## Problem Issue https://github.com/neondatabase/neon/issues/11254 describes a case where restart during a shard split can result in a bad end state in the database. ## Summary of changes - Add a reproducer for the issue - Tighten an existing safety check around updated row counts in complete_shard_split	2025-03-21 08:48:56 +00:00
Erik Grinaker	65d690b21d	storcon: add repeated auto-splits and initial splits (#11122 ) ## Problem Currently, we only split tenants into 8 shards once, at the 64 GB split threshold. For very large tenants, we need to keep splitting to avoid huge shards. And we also want to eagerly split at a lower threshold to improve throughput during initial ingestion. See https://github.com/neondatabase/cloud/issues/22532#issuecomment-2706215907 for details. Touches https://github.com/neondatabase/cloud/issues/22532. Requires #11157. ## Summary of changes This adds parameters and logic to enable repeated splits when a tenant's largest timeline divided by shard count exceeds `split_threshold`, as well as eager initial splits at a lower threshold to speed up initial ingestion. The default parameters are all set such that they retain the current behavior in production (only split into 8 shards once, at 64 GB). * `split_threshold` now specifies a maximum shard size. When a shard exceeds it, all tenant shards are split by powers of 2 such that all tenant shards fall below `split_threshold`. Disabled by default, like today. * Add `max_split_shards` to specify a max shard count for autosplits. Defaults to 8 to retain current behavior. * Add `initial_split_threshold` and `initial_split_shards` to specify a threshold and target count for eager splits of unsharded tenants. Defaults to 64 GB and 8 shards to retain current production behavior. Because this PR sets `initial_split_threshold` to 64 GB by default, it has the effect of enabling autosplits by default. This was not the case previously, since `split_threshold` defaults to None, but it is already enabled across production and staging. This is temporary until we complete the production rollout. For more details, see code comments. This must wait until #11157 has been deployed to Pageservers. 
Once this has been deployed to production, we plan to change the parameters to: * `split-threshold`: 256 GB * `initial-split-threshold`: 16 GB * `initial-split-shards`: 4 * `max-split-shards`: 16 The final split points will thus be: * Start: 1 shard * 16 GB: 4 shards * 1 TB: 8 shards * 2 TB: 16 shards We will then change the default settings to be disabled by default. --------- Co-authored-by: John Spray <john@neon.tech>	2025-03-20 15:43:57 +00:00
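The planned split points in commit 65d690b21d follow from the parameters by simple arithmetic: 4 shards x 256 GB = 1 TB and 8 shards x 256 GB = 2 TB. The sketch below reproduces that in Python under stated assumptions. The function name `target_shard_count` is hypothetical, sizes are in GB, the comparisons are strict ("exceeds"), and the ordering of the initial split before repeated splits is an assumption rather than the storage controller's actual logic.

```python
def target_shard_count(
    size_gb,
    shards,
    *,
    split_threshold_gb=256,
    initial_split_threshold_gb=16,
    initial_split_shards=4,
    max_split_shards=16,
):
    """Return the shard count a tenant would be split to (sketch only)."""
    # Eager initial split: only applies to unsharded tenants.
    if shards == 1 and size_gb > initial_split_threshold_gb:
        shards = initial_split_shards
    # Repeated splits: double until the per-shard size falls to the
    # threshold or the shard-count cap is reached.
    while shards < max_split_shards and size_gb / shards > split_threshold_gb:
        shards *= 2
    return min(shards, max_split_shards)


print(target_shard_count(10, 1))     # 1: below the initial threshold
print(target_shard_count(100, 1))    # 4: eager initial split
print(target_shard_count(1100, 4))   # 8: 1100 / 4 = 275 GB per shard
print(target_shard_count(2100, 8))   # 16: 2100 / 8 = 262.5 GB per shard
print(target_shard_count(5000, 16))  # 16: capped by max_split_shards
```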
Gleb Novikov	2065074559	fast_import: put job status to s3 (#11284 ) ## Problem `fast_import` binary is being run inside neonvms, and they do not support proper `kubectl describe logs` now. There are a bunch of other caveats as well: https://github.com/neondatabase/autoscaling/issues/1320 Anyway, we needed a signal for whether the job finished successfully or not, and if not, at least some error message for the cplane operation. After [a short discussion](https://neondb.slack.com/archives/C07PG8J1L0P/p1741954251813609), we decided that an s3 object is the most convenient option at the moment. ## Summary of changes If `s3_prefix` was provided to the `fast_import` call, any job run puts a status object file into `{s3_prefix}/status/fast_import` with contents `{"done": true}` or `{"done": false, "error": "..."}`. Added a test as well	2025-03-20 15:23:35 +00:00
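The status object from commit 2065074559 has a fixed key layout and two possible bodies, which the following Python sketch constructs. Only the key suffix `status/fast_import` and the JSON shapes come from the commit message. The helper name `status_object` and the example prefix are hypothetical, and the actual upload call is omitted.

```python
import json


def status_object(s3_prefix, error=None):
    """Build the (key, body) pair for a fast_import job status object."""
    key = f"{s3_prefix}/status/fast_import"
    if error is None:
        body = {"done": True}
    else:
        body = {"done": False, "error": error}
    return key, json.dumps(body)


# Hypothetical prefix, used for illustration only.
ok_key, ok_body = status_object("s3://example-bucket/job-1")
fail_key, fail_body = status_object("s3://example-bucket/job-1", error="pg_dump failed")
print(ok_key, ok_body)
print(fail_key, fail_body)
```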
Dmitrii Kovalkov	9bf59989db	storcon: add https API (#11239 ) ## Problem Pageservers use unencrypted HTTP requests for storage controller API. - Closes: https://github.com/neondatabase/cloud/issues/25524 ## Summary of changes - Replace hyper0::server::Server with http_utils::server::Server in storage controller. - Add HTTPS handler for storage controller API. - Support `ssl_ca_file` in pageserver.	2025-03-20 08:22:02 +00:00
Arpad Müller	56149a046a	Add test_explicit_timeline_creation_storcon and make it work (#11261 ) Adds a basic test that makes the storcon issue explicit creation of a timeline on safekeepers (main storcon PR in #11058). It was adapted from `test_explicit_timeline_creation` from #11002. Also, do a bunch of fixes needed to get the test to work (the API definitions weren't correct), and log more stuff when we can't create a new timeline due to no safekeepers being active. Part of #9011 --------- Co-authored-by: Arseny Sher <sher-ars@yandex.ru>	2025-03-17 16:28:21 +00:00
Dmitrii Kovalkov	f68be2b5e2	safekeeper: https for management API (#11171 ) ## Problem Storage controller uses unencrypted HTTP requests for safekeeper management API. - Closes: https://github.com/neondatabase/cloud/issues/24836 ## Summary of changes - Replace `hyper0::server::Server` with `http_utils::server::Server` in safekeeper. - Add HTTPS handler for safekeeper management API.	2025-03-14 11:41:22 +00:00
Alex Chi Z.	23b713900e	feat(storcon): passthrough ancestor detach behavior (#11199 ) ## Problem https://github.com/neondatabase/neon/issues/10310 https://github.com/neondatabase/neon/pull/11158 ## Summary of changes We need to passthrough the new detach behavior through the storcon API. Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-13 20:21:23 +00:00
Arpad Müller	b1a1be6a4c	switch pytests and neon_local to control_plane_hooks_api (#11195 ) We want to switch away from and deprecate the `--compute-hook-url` param for the storcon in favour of `--control-plane-url` because it allows us to construct urls with `notify-safekeepers`. This PR switches the pytests and neon_local from a `control_plane_compute_hook_api` to a new param named `control_plane_hooks_api` which is supposed to point to the parent of the `notify-attach` URL. We still support reading the old url from disk to not be too disruptive with existing deployments, but we just ignore it. Also add docs for the `notify-safekeepers` upcall API. Follow-up of #11173 Part of https://github.com/neondatabase/neon/issues/11163	2025-03-13 19:50:52 +00:00
Christian Schwarz	ed31dd2a3c	pageserver: better observability for slow wait_lsn (#11176 ) # Problem We leave too few observability breadcrumbs in the case where wait_lsn is exceptionally slow. # Changes - refactor: extract the monitoring logic out of `log_slow` into `monitor_slow_future` - add global + per-timeline counter for time spent waiting for wait_lsn - It is updated while we're still waiting, similar to what we do for page_service response flush. - add per-timeline counterpair for started & finished wait_lsn count - add slow-logging to leave breadcrumbs in logs, not just metrics For the slow-logging, we need to consider not flooding the logs during a broker or network outage/blip. The solution is a "log-streak-level" concurrency limit per timeline. At any given time, there is at most one slow wait_lsn that is logging the "still running" and "completed" sequence of logs. Other concurrent slow wait_lsn's don't log at all. This leaves at least one breadcrumb in each timeline's logs if some wait_lsn was exceptionally slow during a given period. The full degree of slowness can then be determined by looking at the per-timeline metric. # Performance Reran the `bench_log_slow` benchmark, no difference, so, existing call sites are fine. We do use a Semaphore, but only try_acquire it _after_ things have already been determined to be slow. So, no baseline overhead anticipated. # Refs - https://github.com/neondatabase/cloud/issues/23486#issuecomment-2711587222	2025-03-13 15:03:53 +00:00
Alex Chi Z.	c3b3b507f7	feat(pageserver): support detaching behavior v2 (#11158 ) ## Problem close https://github.com/neondatabase/neon/issues/10310 ## Summary of changes This patch adds a new behavior for the detach_ancestor API: detach with multi-level ancestor and no reparenting. Though we can potentially support multi-level + do reparenting / single-level + no-reparenting in the future, as it's not required for the recovery/snapshot epic, I'd prefer keeping things simple now that we only handle the old one and the new one instead of supporting the full feature matrix. I only added a test case of successful detaching instead of testing failures. I'd like to make this into staging and add more tests in the future. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-12 22:27:23 +00:00
Dmitrii Kovalkov	63b22d3fb1	pageserver: https for management API (#11025 ) ## Problem Storage controller uses unencrypted HTTP requests for pageserver management API. Closes: https://github.com/neondatabase/cloud/issues/24283 ## Summary of changes - Implement `http_utils::server::Server` with TLS support. - Replace `hyper0::server::Server` with `http_utils::server::Server` in pageserver. - Add HTTPS handler for pageserver management API. - Generate local SSL certificates in neon local.	2025-03-10 15:07:59 +00:00
Alex Chi Z.	cd438406fb	feat(pageserver): add force patch index_part API (#11119 ) ## Problem As part of the disaster recovery tool. Partly for https://github.com/neondatabase/neon/issues/9114. ## Summary of changes * Add a new pageserver API to force patch the fields in index_part and modify the timeline internal structures. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-07 17:42:52 +00:00
John Spray	87e6117dfd	storage controller: API-driven graceful migrations (#10913 ) ## Problem The current migration API does a live migration, but if the destination doesn't already have a secondary, that live migration is unlikely to be able to warm up a tenant properly within its timeout (full warmup of a big tenant can take tens of minutes). Background optimisation code knows how to do this gracefully by creating a secondary first, but we don't currently give a human a way to trigger that. Closes: https://github.com/neondatabase/neon/issues/10540 ## Summary of changes - Add `preferred_node` parameter to TenantShard, which is respected by optimize_attachment - Modify migration API to have optional prewarm=true mode, in which we set preferred_node and call optimize_attachment, rather than directly modifying intentstate - Require override_scheduler=true flag if migrating somewhere that is a less-than-optimal scheduling location (e.g. wrong AZ) - Add `origin_node_id` to migration API so that callers can ensure they're moving from where they think they're moving from - Add tests for the above The storcon_cli wrapper for this has a 'watch' mode that waits for eventual cutover. This doesn't show how the warmth of the secondary evolves because we don't currently have an API for that in the controller, as the passthrough API only targets attached locations, not secondaries. It would be straightforward to add later as a dedicated endpoint for getting secondary status, then extend the storcon_cli to consume that and print a nice progress indicator.	2025-03-07 17:02:38 +00:00
Vlad Lazar	084fc4a757	pageserver: enable previous heatmaps by default (#11132 ) We added the off-by-default configs in https://github.com/neondatabase/neon/pull/11088 because the unarchival heatmap was causing oversized secondary locations. That was fixed in https://github.com/neondatabase/neon/pull/11098, so let's turn them on by default.	2025-03-07 16:05:31 +00:00
Alexey Kondratov	f5aa8c3eac	feat(compute_ctl): Add a basic HTTP API benchmark (#11123 ) ## Problem We just had a regression reported at https://neondb.slack.com/archives/C08EXUJF554/p1741102467515599, which clearly came with one of the releases. It's not a huge problem yet, but it's annoying that we cannot quickly attribute it to a specific commit. ## Summary of changes Add a very simple `compute_ctl` HTTP API benchmark that does 10k requests to `/status` and `metrics.json` and reports p50 and p99. --------- Co-authored-by: Peter Bendel <peterbendel@neon.tech>	2025-03-07 12:35:42 +00:00
Alex Chi Z.	2de3629b88	test(pageserver): use reldirv2 by default in regress tests (#11081 ) ## Problem For pg_regress test, we do both v1 and v2; for all the rest, we default to v2. part of https://github.com/neondatabase/neon/issues/9516 ## Summary of changes Use reldir v2 across test cases by default. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-03-05 21:02:44 +00:00
Peter Bendel	604eb5e8d4	fix grafana dashboard link for pooler endpoints (#11099 ) ## Problem Our benchmarking workflows contain links to grafana dashboards to troubleshoot problems. This works fine for non-pooled endpoints. For pooled endpoints we need to remove the `-pooler` suffix from the endpoint's hostname to get a valid endpoint ID. Example link that doesn't work in this run https://github.com/neondatabase/neon/actions/runs/13678933253/job/38246028316#step:8:311 ## Summary of changes Check if connection string is a -pooler connection string and if so remove this suffix from the endpoint ID. --------- Co-authored-by: Alexander Bayandin <alexander@neon.tech>	2025-03-05 20:01:17 +00:00
Arpad Müller	2d45522fa6	storcon db: load safekeepers from DB again (#11087 ) Earlier PR #11041 soft-disabled the loading code for safekeepers from the storcon db. This PR makes us load the safekeepers from the database again, now that we have [JWTs available on staging](https://github.com/neondatabase/neon/pull/11087) and soon on prod. This reverts commit `23fb8053c5`. Part of https://github.com/neondatabase/cloud/issues/24727	2025-03-05 15:45:43 +00:00
Vlad Lazar	8c12ccf729	pageserver: gate previous heatmap behind config flag (#11088 ) ## Problem On unarchival, we update the previous heatmap with all visible layers. When the primary generates a new heatmap it includes all those layers, so the secondary will download them. Since they're not actually resident on the primary (we didn't call the warm up API), they'll never be evicted, so they remain in the heatmap. This leads to oversized secondary locations like we saw in pre-prod. ## Summary of changes Gate the loading of the previous heatmaps and the heatmap generation on unarchival behind configuration flags. They are disabled by default, but enabled in tests.	2025-03-05 12:20:18 +00:00
Erik Grinaker	65addfc524	storcon: add per-tenant rate limiting for API requests (#10924 ) ## Problem Incoming requests often take the service lock, and sometimes even do database transactions. That creates a risk that a rogue client can starve the controller of the ability to do its primary job of reconciling tenants to an available state. ## Summary of changes * Use the `governor` crate to rate limit tenant requests at 10 requests per second. This is ~10-100x lower than the worst "attack" we've seen from a client bug. Admin APIs are not rate limited. * Add a `storage_controller_http_request_rate_limited` histogram for rate limited requests. * Log a warning every 10 seconds for rate limited tenants. The rate limiter is parametrized on TenantId, because the kinds of client bug we're protecting against generally happen within tenant scope, and the rates should be somewhat stable: we expect the global rate of requests to increase as we do more work, but we do not expect the rate of requests to one tenant to increase. --------- Co-authored-by: John Spray <john@neon.tech>	2025-03-03 22:04:59 +00:00
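The per-tenant rate limiting in commit 65addfc524 can be illustrated with a small sketch. The commit uses the Rust `governor` crate. The Python below swaps in a plain token bucket keyed by tenant ID to show the idea of independent per-tenant budgets, and is not how `governor` is implemented. The class name, the burst size of 10, and the explicit `now` parameter (used to keep the example deterministic) are all assumptions.

```python
class TenantRateLimiter:
    """Token bucket per tenant: `rate_per_sec` refill, `burst` capacity."""

    def __init__(self, rate_per_sec=10.0, burst=10.0):
        self.rate = rate_per_sec
        self.burst = burst
        self._buckets = {}  # tenant_id -> (tokens, last_timestamp)

    def allow(self, tenant_id, now):
        """Return True if the request at time `now` (seconds) is admitted."""
        tokens, last = self._buckets.get(tenant_id, (self.burst, now))
        # Refill for the elapsed time, never beyond the burst capacity.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[tenant_id] = (tokens - 1.0, now)
            return True
        self._buckets[tenant_id] = (tokens, now)
        return False


limiter = TenantRateLimiter()
# One tenant sends 11 requests at the same instant: the 11th is rejected.
burst_results = [limiter.allow("tenant-a", now=0.0) for _ in range(11)]
# A different tenant has its own budget and is unaffected.
other_tenant = limiter.allow("tenant-b", now=0.0)
# Half a second later, tenant-a has refilled 5 tokens.
after_refill = limiter.allow("tenant-a", now=0.5)
print(burst_results, other_tenant, after_refill)
```

Keying the limiter on the tenant matches the rationale in the commit message: the global request rate is expected to grow with load, while the rate for any single tenant should stay roughly stable.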
John Spray	b953daa21f	safekeeper: allow remote deletion to proceed after dropped requests (#11042 ) ## Problem If a caller times out on safekeeper timeline deletion on a large timeline, and waits a while before retrying, the deletion will not progress while the retry is waiting. The net effect is very very slow deletion as it only proceeds in 30 second bursts across 5 minute idle periods. Related: https://github.com/neondatabase/neon/issues/10265 ## Summary of changes - Run remote deletion in a background task - Carry a watch::Receiver on the Timeline for other callers to join the wait - Restart deletion if the API is called again and the previous attempt failed	2025-03-03 16:03:51 +00:00
Peter Bendel	a07599949f	First version of a new benchmark to test larger OLTP workload (#11053 ) ## Problem We want to support larger tenants (regarding logical database size, number of transactions per second etc.) and should increase our test coverage of OLTP transactions at larger scale. ## Summary of changes Start a new benchmark that over time will add more OLTP tests at larger scale. This PR covers the first version and will be extended in further PRs. Also fix some infrastructure: - default for new connections and large tenants is to use connection pooler pgbouncer, however our fixture always added `statement_timeout=120` which is not compatible with pooler [see](https://neon.tech/docs/connect/connection-errors#unsupported-startup-parameter) - action to create branch timed out after 10 seconds and 10 retries but for large tenants it can take longer so use increasing back-off for retries ## Test run https://github.com/neondatabase/neon/actions/runs/13593446706	2025-03-03 15:25:48 +00:00
Arseny Sher	ef2b50994c	walproposer: basic infra to enable generations (#11002 ) ## Problem Preparation for https://github.com/neondatabase/neon/issues/10851 ## Summary of changes Add walproposer `safekeepers_generations` field which can be set by prefixing `neon.safekeepers` GUC with `g#n:`. Non zero value (n) forces walproposer to use generations. In particular, this also disables implicit timeline creation as timeline will be created by storcon. Add test checking this. Also add missing infra: `--safekeepers-generation` flag to neon_local endpoint start + fix `--start-timeout` flag: it existed but value wasn't used.	2025-03-03 13:20:20 +00:00
Vlad Lazar	23fb8053c5	storcon: soft disable SK heartbeats (#11041 ) ## Problem JWT tokens aren't in place, so all SK heartbeats fail. This is equivalent to a wait before applying the PS heartbeats and makes things more flaky. ## Summary of Changes Add a flag that skips loading SKs from the db on start-up and at runtime.	2025-02-28 15:49:09 +00:00
Conrad Ludgate	d9ced89ec0	feat(proxy): require TLS to compute if prompted by cplane (#10717 ) https://github.com/neondatabase/cloud/issues/23008 For TLS between proxy and compute, we are using an internally provisioned CA to sign the compute certificates. This change ensures that proxy will load them from a supplied env var pointing to the correct file - this file and env var will be configured later, using a kubernetes secret. Control plane responds with a `server_name` field if and only if the compute uses TLS. This server name is the name we use to validate the certificate. Control plane still sends us the IP to connect to as well (to support overlay IP). To support this change, I had to split `host` and `host_addr` into separate fields. `host_addr` is used, bypassing `lookup_addr`, where possible (which is what happens in production); `host` is then only used for the TLS connection. There's no blocker to merging this. The code paths will not be triggered until the new control plane is deployed and the `enableTLS` compute flag is enabled on a project.	2025-02-28 14:20:25 +00:00
Vlad Lazar	0d6d58bd3e	pageserver: make heatmap layer download API more cplane friendly (#10957 ) ## Problem We intend for cplane to use the heatmap layer download API to warm up timelines after unarchival. It's tricky for them to recurse in the ancestors, and the current implementation doesn't work well when unarchiving a chain of branches and warming them up. ## Summary of changes * Add a `recurse` flag to the API. When the flag is set, the operation recurses into the parent timeline after the current one is done. * Be resilient to warming up a chain of unarchived branches. Let's say we unarchived `B` and `C` from the `A -> B -> C` branch hierarchy. `B` got unarchived first. We generated the unarchival heatmaps and stashed them in `A` and `B`. When `C` was unarchived, it dropped its unarchival heatmap since `A` and `B` already had one. If `C` needed layers from `A` and `B`, it was out of luck. Now, when choosing whether to keep an unarchival heatmap we look at its end LSN. If it's more inclusive than what we currently have, keep it.	2025-02-28 10:36:53 +00:00
Christian Schwarz	e35f7758d8	impr(controller_upcall_client): clean up copy-pasta code & add context to retries (#10991 ) Before this PR, re-attach and validate would log the same warning ``` calling control plane generation validation API failed ``` on retry errors. This can be confusing. This PR makes the message generically valid for any upcall and adds additional tracing spans to capture context. Along the way, clean up some copy-pasta variable naming. refs - https://github.com/neondatabase/neon/issues/10381#issuecomment-2684755827 --------- Co-authored-by: Alexander Lakhin <alexander.lakhin@neon.tech>	2025-02-27 10:59:43 +00:00
Arseny Sher	643a48210f	safekeeper: exclude API (#10757 ) ## Problem https://github.com/neondatabase/neon/pull/10241 added configuration switch endpoint, but it didn't delete timeline if node was excluded. ## Summary of changes Add separate /exclude API endpoint which similarly accepts membership configuration where sk is supposed to be excluded. Implementation deletes the timeline locally. Some more small related tweaks: - make mconf switch API PUT instead of POST as it is idempotent; - return 409 if switch was refused instead of 200 with requested & current; - remove unused was_active flag from delete response; - remove meaningless _force suffix from delete function names; - reuse timeline.rs delete_dir function in timelines_global_map instead of its own copy. part of https://github.com/neondatabase/neon/issues/9965	2025-02-26 19:26:33 +00:00
Arseny Sher	01581f3af5	safekeeper: drop json_ctrl (#10722 ) ## Problem json_ctrl.rs is an obsolete attempt to have tests with fine control of feeding messages into safekeeper superseded by desim framework. ## Summary of changes Drop it.	2025-02-26 13:32:37 +00:00
Vlad Lazar	34996416d6	pageserver: guard against WAL gaps in the interpreted protocol (#10858 ) ## Problem The interpreted SK <-> PS protocol does not guard against gaps (neither does the Vanilla one, but that's beside the point). ## Summary of changes Extend the protocol to include the start LSN of the PG WAL section from which the records were interpreted. Validation is enabled via a config flag on the pageserver and works as follows: Case 1: `raw_wal_start_lsn` is smaller than the requested LSN There can't be gaps here, but we check that the shard received records which it hasn't seen before. Case 2: `raw_wal_start_lsn` is equal to the requested LSN This is the happy case. No gap and nothing to check Case 3: `raw_wal_start_lsn` is greater than the requested LSN This is a gap. To make Case 3 work I had to bend the protocol a bit. We read record chunks of WAL which aren't record aligned and feed them to the decoder. The picture below shows a shard which subscribes at a position somewhere within Record 2. We already have a wal reader which is below that position so we wait to catch up. We read some wal in Read 1 (all of Record 1 and some of Record 2). The new shard doesn't need Record 1 (it has already processed it according to the starting position), but we read past its starting position. When we do Read 2, we decode Record 2 and ship it off to the shard, but the starting position of Read 2 is greater than the starting position the shard requested. This looks like a gap. ![image](https://github.com/user-attachments/assets/8aed292e-5d62-46a3-9b01-fbf9dc25efe0) To make it work, we extend the protocol to send an empty `InterpretedWalRecords` to shards if the WAL the records originated from ends the requested start position. On the pageserver, that just updates the tracking LSNs in memory (no-op really). This gives us a workaround for the fake gap. As a drive by, make `InterpretedWalRecords::next_record_lsn` mandatory in the application level definition. 
It's always included. Related: https://github.com/neondatabase/cloud/issues/23935	2025-02-20 17:49:05 +00:00
Dmitrii Kovalkov	e808e9432a	storcon: use https for pageservers (#10759 ) ## Problem Storage controller uses unsecure http for pageserver API. Closes: https://github.com/neondatabase/cloud/issues/23734 Closes: https://github.com/neondatabase/cloud/issues/24091 ## Summary of changes - Add an optional `listen_https_port` field to storage controller's Node state and its API (RegisterNode/ListNodes/etc). - Allow updating `listen_https_port` on node registration to gradually add https port for all nodes. - Add `use_https_pageserver_api` CLI option to storage controller to enable https. - Pageserver doesn't support https for now and always reports `https_port=None`. This will be addressed in follow-up PR.	2025-02-20 17:16:04 +00:00
Peter Bendel	9d074db18d	Use link to cross-service-endpoint dashboard in allure reports and benchmarking workflow logs (#10874 ) ## Problem We have links to deprecated dashboards in our logs Example https://github.com/neondatabase/neon/actions/runs/13382454571/job/37401983608#step:8:348 ## Summary of changes Use link to cross service endpoint instead. Example: https://github.com/neondatabase/neon/actions/runs/13395407925/job/37413056148#step:7:345	2025-02-18 19:54:21 +00:00
Vlad Lazar	1a69a8cba7	storage: add APIs for warming up location after cold migrations (#10788 ) ## Problem We lack an API for warming up attached locations based on the heatmap contents. This is problematic in two places: 1. If we manually migrate and cut over while the secondary is still cold 2. When we re-attach a previously offloaded tenant ## Summary of changes https://github.com/neondatabase/neon/pull/10597 made heatmap generation additive across migrations, so we won't clobber it after a cold migration. This allows us to implement: 1. An endpoint for downloading all missing heatmap layers on the pageserver: `/v1/tenant/:tenant_shard_id/timeline/:timeline_id/download_heatmap_layers`. Only one such operation per timeline is allowed at any given time. The granularity is tenant shard. 2. An endpoint to the storage controller to trigger the downloads on the pageserver: `/v1/tenant/:tenant_shard_id/timeline/:timeline_id/download_heatmap_layers`. This works both at tenant and tenant shard level. If an unsharded tenant id is provided, the operation is started on all shards, otherwise only the specified shard. 3. A storcon cli command. Again, tenant and tenant-shard level granularities are supported. Cplane will call into storcon and trigger the downloads for all shards. When we want to rescue a migration, we will use storcon cli targeting the specific tenant shard. Related: https://github.com/neondatabase/neon/issues/10541	2025-02-18 16:09:06 +00:00
Alexander Bayandin	274cb13293	test_runner: fix mismatch versions tests on linux (#10869 ) ## Problem Tests with mixed-version binaries always use the latest binaries on CI ([an example](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10848/13378137061/index.html#suites/8fc5d1648d2225380766afde7c428d81/1ccefc4cfd4ef176/)): The versions of new `storage_broker` and old `pageserver` are the same: `b45254a5605f6fdafdf475cdd3e920fe00898543`. This affects only Linux; on macOS the versions are mixed correctly. ## Summary of changes - Use hardlinks instead of symlinks to create a directory with mixed-version binaries	2025-02-18 15:52:00 +00:00
Alexander Lakhin	f81259967d	Add test to make sure sanitizers really work when expected (#10838 )	2025-02-18 13:23:18 +00:00
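
The three validation cases listed in commit 34996416d6 (WAL gap guard in the interpreted protocol) can be sketched as a single comparison. This is only an illustration, not the pageserver's code: `GapCheck` and `classify_wal_start` are hypothetical names, LSNs are modelled as plain `u64` values, and only `raw_wal_start_lsn` is taken from the commit message.

```rust
use std::cmp::Ordering;

/// Hypothetical result type: outcome of comparing the LSN at which the raw
/// WAL for a batch started with the LSN the shard requested.
#[derive(Debug, PartialEq, Eq)]
enum GapCheck {
    /// Case 1: the batch starts before the requested LSN. No gap is possible,
    /// but the caller should still verify the records are new to the shard.
    StartsBefore,
    /// Case 2: the batch starts exactly at the requested LSN. Nothing to check.
    Exact,
    /// Case 3: the batch starts after the requested LSN, which is a gap.
    Gap { missing_bytes: u64 },
}

/// Hypothetical helper: classify a batch according to the three cases.
/// LSNs are assumed to be byte offsets that fit in a u64.
fn classify_wal_start(raw_wal_start_lsn: u64, requested_lsn: u64) -> GapCheck {
    match raw_wal_start_lsn.cmp(&requested_lsn) {
        Ordering::Less => GapCheck::StartsBefore,
        Ordering::Equal => GapCheck::Exact,
        Ordering::Greater => GapCheck::Gap {
            // Cannot underflow: this arm is only reached when
            // raw_wal_start_lsn > requested_lsn.
            missing_bytes: raw_wal_start_lsn - requested_lsn,
        },
    }
}
```

The sketch deliberately leaves out the empty `InterpretedWalRecords` workaround described in the commit; under that scheme the receiver would advance its tracked LSN before this comparison runs, so the "fake gap" would not be reported as Case 3.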

930 Commits