rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-03 12:10:36 +00:00

Author	SHA1	Message	Date
John Spray	73c6626b38	pageserver: stabilize & refine controller scale test (#8971 ) ## Problem We were seeing timeouts on migrations in this test. The test unfortunately tends to saturate local storage, which is shared between the pageservers and the control plane database, which makes the test kind of unrealistic. We will also want to increase the scale of this test, so it's worth fixing that. ## Summary of changes - Instead of randomly creating timelines at the same time as the other background operations, explicitly identify a subset of tenant which will have timelines, and create them at the start. This avoids pageservers putting a lot of load on the test node during the main body of the test. - Adjust the tenants created to create some number of 8 shard tenants and the rest 1 shard tenants, instead of just creating a lot of 2 shard tenants. - Use archival_config to exercise tenant-mutating operations, instead of using timeline creation for this. - Adjust reconcile_until_idle calls to avoid waiting 5 seconds between calls, which causes timelines with large shard count tenants. - Fix a pageserver bug where calls to archival_config during activation get 404	2024-10-15 09:31:18 +01:00
John Spray	426b1c5f08	storage controller: use 'infra' JWT scope for node registration (#9343 ) ## Problem Storage controller `/control` API mostly requires admin tokens, for interactive use by engineers. But for endpoints used by scripts, we should not require admin tokens. Discussion at https://neondb.slack.com/archives/C033RQ5SPDH/p1728550081788989?thread_ts=1728548232.265019&cid=C033RQ5SPDH ## Summary of changes - Introduce the 'infra' JWT scope, which was not previously used in the neon repo - For pageserver & safekeeper node registrations, require infra scope instead of admin Note that admin will still work, as the controller auth checks permit admin tokens for all endpoints irrespective of what scope they require.	2024-10-10 12:26:43 +01:00
Yuchen Liang	bee04b8a69	pageserver: add direct io config to virtual file (#9214 ) ## Problem We need a way to incrementally switch to direct IO. During the rollout we might want to switch to O_DIRECT on image and delta layer read path first before others. ## Summary of changes - Revisited and simplified direct io config in `PageserverConf`. - We could add a fallback mode for open, but for read there isn't a reasonable alternative (without creating another buffered virtual file). - Added a wrapper around `VirtualFile`, current implementation become `VirtualFileInner` - Use `open_v2`, `create_v2`, `open_with_options_v2` when we want to use the IO mode specified in PS config. - Once we onboard all IO through VirtualFile using this new API, we will delete the old code path. - Make io mode live configurable for benchmarking. - Only guaranteed for files opened after the config change, so do it before the experiment. As an example, we are using `open_v2` with `virtual_file::IoMode::Direct` in https://github.com/neondatabase/neon/pull/9169 We also remove `io_buffer_alignment` config in `a04cfd754b` and use it as a compile time constant. This way we don't have to carry the alignment around or make frequent call to retrieve this information from the static variable. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-10-09 08:33:07 -04:00
Anastasia Lubennikova	63e7fab990	Add /installed_extensions endpoint to collect statistics about extension usage. (#8917 ) Add /installed_extensions endpoint to collect statistics about extension usage. It returns a list of installed extensions in the format: ```json { "extensions": [ { "extname": "extension_name", "versions": ["1.0", "1.1"], "n_databases": 5, } ] } ``` --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-10-09 13:32:13 +01:00
Erik Grinaker	211970f0e0	remote_storage: add `DownloadOpts::byte_(start\|end)` (#9293 ) `download_byte_range()` is basically a copy of `download()` with an additional option passed to the backend SDKs. This can cause these code paths to diverge, and prevents combining various options. This patch adds `DownloadOpts::byte_(start\|end)` and move byte range handling into `download()`.	2024-10-09 10:29:06 +01:00
Em Sharnoff	cc29def544	vm-monitor: Ignore LFC in postgres cgroup memory threshold (#8668 ) In short: Currently we reserve 75% of memory to the LFC, meaning that if we scale up to keep postgres using less than 25% of the compute's memory. This means that for certain memory-heavy workloads, we end up scaling much higher than is actually needed — in the worst case, up to 4x, although in practice it tends not to be quite so bad. Part of neondatabase/autoscaling#1030.	2024-10-07 21:25:34 +01:00
Tristan Partin	6eba29c732	Improve logging on changes in a compute's status I'm trying to debug a situation with the LR benchmark publisher not being in the correct state. This should aid in debugging, while just being generally useful. PR: https://github.com/neondatabase/neon/pull/9265 Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-10-07 13:19:48 -04:00
Conrad Ludgate	6c05f89f7d	proxy: add local-proxy to compute image (#8823 ) 1. Adds local-proxy to compute image and vm spec 2. Updates local-proxy config processing, writing PID to a file eagerly 3. Updates compute-ctl to understand local proxy compute spec and to send SIGHUP to local-proxy over that pid. closes https://github.com/neondatabase/cloud/issues/16867	2024-10-04 14:52:01 +00:00
Erik Grinaker	04a6222418	remote_storage: add `head_object` integration test (#9274 )	2024-10-04 12:40:41 +01:00
Erik Grinaker	37158d0424	pageserver: use conditional GET for secondary tenant heatmaps (#9236 ) ## Problem Secondary tenant heatmaps were always downloaded, even when they hadn't changed. This can be avoided by using a conditional GET request passing the `ETag` of the previous heatmap. ## Summary of changes The `ETag` was already plumbed down into the heatmap downloader, and just needed further plumbing into the remote storage backends. * Add a `DownloadOpts` struct and pass it to `RemoteStorage::download()`. * Add an optional `DownloadOpts::etag` field, which uses a conditional GET and returns `DownloadError::Unmodified` on match.	2024-10-04 12:29:48 +02:00
Vlad Lazar	552fa2b972	pageserver: tweak oversized key read path warning (#9221 ) ## Problem `Oversized vectored read [...]` logs are spewing in prod because we have a few keys that are unexpectedly large: * reldir/relblock - these are unbounded, so it's known technical debt * slru block - they can be a bit bigger than 128KiB due to storage format overhead ## Summary of changes * Bump threshold to 130KiB * Don't warn on oversized reldir and dbdir keys Closes https://github.com/neondatabase/neon/issues/8967	2024-10-03 16:40:35 +01:00
Arpad Müller	9d93dd4807	Rename hyper 1.0 to hyper and hyper 0.14 to hyper0 (#9254 ) Follow-up of #9234 to give hyper 1.0 the version-free name, and the legacy version of hyper the one with the version number inside. As we move away from hyper 0.14, we can remove the `hyper0` name piece by piece. Part of #9255	2024-10-03 16:33:43 +02:00
Heikki Linnakangas	53b6e1a01c	vm-monitor: Upgrade axum from 0.6 to 0.7 (#9257 ) Because: - it's nice to be up-to-date, - we already had axum 0.7 in our dependency tree, so this avoids having to compile two versions, and - removes one of the remaining dpendencies to hyper version 0 Also bumps the 'tokio-tungstenite' dependency, to avoid having two versions in the dependency tree.	2024-10-03 16:49:39 +03:00
Folke Behrens	2e508b1ff9	Upgrade OpenTelemetry and other tracing crates (#9200 ) * tracing-utils now returns a `Layer` impl. Removes the need for crates to import OTel crates. * Drop the /v1/traces URI check. Verified that the code does the right thing. * Leave a TODO to hook in an error handler for OTel to log errors to when it assumes the regular pipeline cannot be used/is broken.	2024-10-01 11:02:54 +02:00
Arthur Petukhovsky	ba498a630a	Set disk quotas on bind in compute_ctl (#8936 ) Part of https://github.com/neondatabase/cloud/issues/13127. Resolves #9153 What changed in this PR: 1. Adds `ComputeSpec.disk_quota_bytes: Option<u64>` 2. Adds new arg to compute_ctl: `--set-disk-quota-for-fs <mountpoint>` 3. Implements running `/neonvm/bin/set-disk-quota` with the right value if both cmdline arg AND field in the spec are specified 4. Patches `/etc/sudoers.d` to allow `compute_ctl` to set quota with sudo This PR is very similar to the swap support added earlier, you can take a look at it as prior art: #7434 In theory, it can be implemented outside of compute_ctl when we will have a separate neonvm daemon, but we are not there yet. Current implementation is the simplest possible to unblock computes with larger disks. All code related to usage of `/neonvm/bin/set-disk-quota` is located in `disk_quota.rs`. We need to call this script with the following arguments: `/neonvm/bin/set-disk-quota {size_kb} {mountpoint}`. Quotas are set on the filesystem level, so we need to provide path to the directory that filesystem was mounted to. I tested this change locally with https://github.com/neondatabase/cloud/pull/17270. It should be safe to merge, because this feature is gated by both cmdline arg and field in the spec. If control-plane doesn't set values in both places, compute_ctl won't be affected by this change.	2024-09-27 20:52:22 +01:00
Tristan Partin	fc962c9605	Use long options when calling initdb Verbosity in this case is good when reading the code. Short options are better when operating in an interactive shell. Signed-off-by: Tristan Partin <tristan@neon.tech>	2024-09-27 08:22:16 -05:00
Conrad Ludgate	ec07a1ecc9	proxy: make local-proxy config by signal with PID, refine JWKS apis with role caching (#9164 )	2024-09-26 19:01:48 +01:00
Alex Chi Z.	42e19e952f	fix(pageserver): categorize client error in basebackup metrics (#9110 ) We separated client error from basebackup error log lines in https://github.com/neondatabase/neon/pull/7523, but we didn't do anything for the metrics. In this patch, we fixed it. ref https://github.com/neondatabase/neon/issues/8970 ## Summary of changes We use the same criteria as in `log_query_error` producing an info line (instead of error) for the metrics. We added a `client_error` category for the basebackup query time metrics. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-26 11:38:19 -04:00
Vlad Lazar	2cf47b1477	storcon: do az aware scheduling (#9083 ) ## Problem Storage controller didn't previously consider AZ locality between compute and pageservers when scheduling nodes. Control plane has this feature, and, since we are migrating tenants away from it, we need feature parity to avoid perf degradations. ## Summary of changes The change itself is fairly simple: 1. Thread az info into the scheduler 2. Add an extra member to the scheduling scores Step (2) deserves some more discussion. Let's break it down by the shard type being scheduled: Attached Shards We wish for attached shards of a tenant to end up in the preferred AZ of the tenant since that is where the compute is like to be. The AZ member for `NodeAttachmentSchedulingScore` has been placed below the affinity score (so it's got the second biggest weight for picking the node). The rationale for going below the affinity score is to avoid having all shards of a single tenant placed on the same node in 2 node regions, since that would mean that one tenant can drive the general workload of an entire pageserver. I'm not 100% sure this is the right decision, so open to discussing hoisting the AZ up to first place. Secondary Shards We wish for secondary shards of a tenant to be scheduled in a different AZ from the preferred one for HA purposes. The AZ member for `NodeSecondarySchedulingScore` has been placed first, so nodes in different AZs from the preferred one will always be considered first. On small clusters, this can mean that all the secondaries of a tenant are scheduled to the same pageserver, but secondaries don't use up as many resources as the attached location, so IMO the argument made for attached shards doesn't hold. Related: https://github.com/neondatabase/neon/issues/8848	2024-09-25 14:31:04 +01:00
Folke Behrens	7dcfcccf7c	Re-export git-version from utils and remove as direct dep (#9138 )	2024-09-25 14:38:35 +02:00
Heikki Linnakangas	5cbf5b45ae	Remove TenantState::Loading (#9118 ) The last real use was removed in commit `de90bf4663`. It was still used in a few unit tests, but they can use Attaching too.	2024-09-24 20:58:54 +00:00
Christian Schwarz	4d5add9ca0	compact_level0_phase1: remove final traces of value access mode config (#8935 ) refs https://github.com/neondatabase/neon/issues/8184 stacked atop https://github.com/neondatabase/neon/pull/8934 This PR changes from ignoring the config field to rejecting configs that contain it. PR https://github.com/neondatabase/infra/pull/1903 removes the field usage from `pageserver.toml`. It rolls into prod sooner or in the same release as this PR.	2024-09-23 15:05:22 +02:00
Conrad Ludgate	e675a21346	utils: leaky bucket should only report throttled if the notify queue is blocked on sleep (#9072 ) ## Problem Seems that PS might be too eager in reporting throttled tasks ## Summary of changes Introduce a sleep counter. If the sleep counter increases, then the acquire tasks was throttled.	2024-09-20 16:09:39 +01:00
Alex Chi Z.	6b93230270	fix(pageserver): receive body error now 500 (#9052 ) close https://github.com/neondatabase/neon/issues/8903 In https://github.com/neondatabase/neon/issues/8903 we observed JSON decoding error to have the following error message in the log: ``` Error processing HTTP request: Resource temporarily unavailable: 3956 (pageserver-6.ap-southeast-1.aws.neon.tech) error receiving body: error decoding response body ``` This is hard to understand. In this patch, we make the error message more reasonable. ## Summary of changes * receive body error is now an internal server error, passthrough the `reqwest::Error` (only decoding error) as `anyhow::Error`. * instead of formatting the error using `to_string`, we use the alternative `anyhow::Error` formatting, so that it prints out the cause of the error (i.e., what exactly cannot serde decode). I would expect seeing something like `error receiving body: error decoding response body: XXX field not found` after this patch, though I didn't set up a testing environment to observe the exact behavior. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-20 10:37:28 -04:00
Arseny Sher	32a0e759bd	safekeeper: add wal_last_modified to debug_dump. Adds to debug_dump option to include highest modified time among all WAL segments. In passing replace some str with OsStr to have less unwraps.	2024-09-19 16:17:25 +03:00
Heikki Linnakangas	7c489092b7	Remove unused duplicate DEFAULT_INGEST_BATCH_SIZE constant This constant in 'tenant_conf_defaults' was unused, but there's another constant with the same name in the global 'defaults'. I wish the setting was configurable per-tenant, but it isn't, so let's remove the confusing duplicate.	2024-09-19 15:41:35 +03:00
Heikki Linnakangas	06d55a3b12	Clean up concurrent logical size calc semaphore initialization The DEFAULT_CONCURRENT_TENANT_SIZE_LOGICAL_SIZE_QUERIES constant was unused, because we had just hardcoded it to 1 where the constant should've been used. Remove the ConfigurableSemaphore::Default implementation, since it was unused.	2024-09-19 15:41:35 +03:00
Heikki Linnakangas	5c68e6a172	Remove unused constant The code that used it was removed in commit `b9d2c7bdd5`	2024-09-19 15:41:35 +03:00
Heikki Linnakangas	2753abc0d8	Remove leftover enums for configuring vectored get implementation The settings were removed in commit corb9d2c7b.	2024-09-19 15:41:35 +03:00
Heikki Linnakangas	7b34c2d7af	Remove misc dead code in libs/	2024-09-19 11:57:10 +03:00
Heikki Linnakangas	15ae1fc3df	Remove a few postgres constants that were not used Dead code is generally useless, but with Postgres constants in particular, I'm also worried that if they're not used anywhere, we might fail to update them at a Postgres version update, and get very confused later when they have wrong values.	2024-09-19 11:57:10 +03:00
Heikki Linnakangas	728b79b9dd	Remove some unnecessary derives	2024-09-19 11:57:10 +03:00
Heikki Linnakangas	d211f00f05	Remove unnecessary dependencies (#9000 ) Found by "cargo machete"	2024-09-17 17:55:45 +03:00
Heikki Linnakangas	2bbb4d3e1c	Remove misc unused code (#9014 )	2024-09-16 18:45:19 +00:00
Matthias van de Meent	78938d1b59	[compute/postgres] feature: PostgreSQL 17 (#8573 ) This adds preliminary PG17 support to Neon, based on RC1 / 2024-09-04 `07b828e9d4` NOTICE: The data produced by the included version of the PostgreSQL fork may not be compatible with the future full release of PostgreSQL 17 due to expected or unexpected future changes in magic numbers and internals. DO NOT EXPECT DATA IN V17-TENANTS TO BE COMPATIBLE WITH THE 17.0 RELEASE! Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-09-12 23:18:41 +01:00
John Spray	cb060548fb	libs: tweak PageserverUtilization::is_overloaded (#8946 ) ## Problem Having run in production for a while, we see that nodes are generally safely oversubscribed by about a factor of 2. ## Summary of changes Tweak the is_overloaded method to check for utililzation over 200% rather than over 100%	2024-09-11 18:45:34 +01:00
Arpad Müller	97582178cb	Remove async_trait from the Handler trait (#8958 ) Newest attempt to remove `async_trait` from the Handler trait. Earlier attempts were in #7301 and #8296 .	2024-09-10 02:40:00 +02:00
Matthias van de Meent	842be0ba74	Specialize WalIngest on PostgreSQL version (#8904 ) The current code assumes that most of this functionality is version-independent, which is only true up to v16 - PostgreSQL 17 has a new field in CheckPoint that we need to keep track of. This basically removes the file-level dependency on v14, and replaces it with switches that load the correct version dependencies where required.	2024-09-09 23:01:52 +01:00
Alex Chi Z.	e158df4e86	feat(pageserver): split delta writer automatically determines key range (#8850 ) close https://github.com/neondatabase/neon/issues/8838 ## Summary of changes This patch modifies the split delta layer writer to avoid taking start_key and end_key when creating/finishing the layer writer. The start_key for the delta layers will be the first key provided to the layer writer, and the end_key would be the `last_key.next()`. This simplifies the delta layer writer API. On that, the layer key hack is removed. Image layers now use the full key range, and delta layers use the first/last key provided by the user. --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-09 22:03:27 +01:00
Heikki Linnakangas	2d885ac07a	Update strum (#8962 ) I wanted to use some features from the newer version. The PR that needed the new version is not ready yet (and might never be), but seems nice to stay up in any case.	2024-09-08 21:47:57 +03:00
Heikki Linnakangas	89c5e80b3f	Update toml and toml_edit crates (#8963 ) Eliminates a few duplicate versions from the dependency tree.	2024-09-08 21:47:23 +03:00
Alexander Bayandin	7d7d1f354b	Fix rust warnings on macOS (#8955 ) ## Problem ``` error: unused import: `anyhow::Context` --> libs/utils/src/crashsafe.rs:8:5 \| 8 \| use anyhow::Context; \| ^^^^^^^^^^^^^^^ \| = note: `-D unused-imports` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(unused_imports)]` error: unused variable: `fd` --> libs/utils/src/crashsafe.rs:209:15 \| 209 \| pub fn syncfs(fd: impl AsRawFd) -> anyhow::Result<()> { \| ^^ help: if this is intentional, prefix it with an underscore: `_fd` \| = note: `-D unused-variables` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(unused_variables)]` ``` ## Summary of changes - Fix rust warnings on macOS	2024-09-07 08:17:25 +01:00
Arpad Müller	fa3fc73c1b	Address 1.82 clippy lints (#8944 ) Addresses the clippy lints of the beta 1.82 toolchain. The `too_long_first_doc_paragraph` lint complained a lot and was addressed separately: #8941	2024-09-06 21:05:18 +02:00
Alex Chi Z.	ac5815b594	feat(storage-controller): add node shards api (#8896 ) For control-plane managed tenants, we have the page in the admin console that lists all tenants on a specific pageserver. But for storage-controller managed ones, we don't have that functionality for now. ## Summary of changes Adds an API that lists all shards on a given node (intention + observed) --------- Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-09-06 14:14:21 -04:00
Arseny Sher	c1a51416db	safekeeper: fsync filesystem on start. We can't really rely on files contents after boot without fsync'ing them.	2024-09-06 19:14:25 +03:00
Arseny Sher	11cf16e3f3	safekeeper: add term_bump endpoint. When walproposer observes now higher term it restarts instead of crashing whole compute with PANIC; this avoids compute crash after term_bump call. After successfull election we're still checking last_log_term of the highest given vote to ensure basebackup is good, and PANIC otherwise. It will be used for migration per 035-safekeeper-dynamic-membership-change.md and https://github.com/neondatabase/docs/pull/21 ref https://github.com/neondatabase/neon/issues/8700	2024-09-06 19:13:50 +03:00
Arpad Müller	cbcd4058ed	Fix 1.82 clippy lint too_long_first_doc_paragraph (#8941 ) Addresses the 1.82 beta clippy lint `too_long_first_doc_paragraph` by adding newlines to the first sentence if it is short enough, and making a short first sentence if there is the need.	2024-09-06 14:33:52 +02:00
Vlad Lazar	e86fef05dd	storcon: track preferred AZ for each tenant shard (#8937 ) ## Problem We want to do AZ aware scheduling, but don't have enough metadata. ## Summary of changes Introduce a `preferred_az_id` concept for each managed tenant shard. In a future PR, the scheduler will use this as a soft preference. The idea is to try and keep the shard attachments within the same AZ. Under the assumption that the compute was placed in the correct AZ, this reduces the chances of cross AZ trafic from between compute and PS. In terms of code changes we: 1. Add a new nullable `preferred_az_id` column to the `tenant_shards` table. Also include an in-memory counterpart. 2. Populate the preferred az on tenant creation and shard splits. 3. Add an endpoint which allows to bulk-set preferred AZs. (3) gives us the migration path. I'll write a script which queries the cplane db in the region and sets the preferred az of all shards with an active compute to the AZ of said compute. For shards without an active compute, I'll use the AZ of the currently attached pageserver since this is what cplane uses now to schedule computes.	2024-09-06 13:11:17 +01:00
Arpad Müller	a1323231bc	Update Rust to 1.81.0 (#8939 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. [Release notes](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1810-2024-09-05). Prior update was in #8667 and #8518	2024-09-06 12:40:19 +02:00
Christian Schwarz	06e840b884	compact_level0_phase1: ignore access mode config, always do streaming-kmerge without validation (#8934 ) refs https://github.com/neondatabase/neon/issues/8184 PR https://github.com/neondatabase/infra/pull/1905 enabled streaming-kmerge without validation everywhere. It rolls into prod sooner or in the same release as this PR.	2024-09-06 10:58:48 +02:00

1 2 3 4 5 ...

782 Commits