## Problem
Safekeeper and pageserver metrics collection might time out. We've seen
this in both Hadron and Neon.
## Summary of changes
This PR moves metrics collection in PS/SK to the background so that we
always get some metrics, even if they are slightly stale. Reducing
metrics collection time is left as future work.
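A minimal sketch of the background-collection idea, assuming hypothetical
names (`CachedMetrics`, `collect_metrics`) rather than the PR's actual
wiring:

```rust
use std::sync::{Arc, Mutex};
use std::time::Duration;

/// Output of the last successful metrics collection.
#[derive(Default)]
struct CachedMetrics {
    text: String,
}

/// Refresh the cache on a fixed interval, so the metrics endpoint can
/// serve the last snapshot immediately instead of blocking on a slow scrape.
fn spawn_metrics_collector(cache: Arc<Mutex<CachedMetrics>>) {
    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(Duration::from_secs(60));
        loop {
            ticker.tick().await;
            let text = collect_metrics().await; // the potentially slow part
            cache.lock().unwrap().text = text;
        }
    });
}

async fn collect_metrics() -> String {
    // Placeholder: gather and encode the Prometheus registry here.
    String::new()
}
```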
---------
Co-authored-by: Chen Luo <chen.luo@databricks.com>
## TLDR
This PR is a no-op. The changes are disabled by default.
## Problem
I. Currently we don't have a way to detect disk I/O failures from WAL
operations.
II.
We observed that the offloader can fail to upload a segment due to a
race condition between XLOG SWITCH and Postgres starting to stream WAL.
The wal_backup task then continuously fails to upload a full segment
while the segment remains partial on disk.
The consequence is that commit_lsn moves forward on all SKs while
backup_lsn stays the same, until all SKs run out of disk space.
III.
We have discovered SK bugs where the WAL offload owner cannot keep up
with WAL backup/upload to S3, which results in an unbounded accumulation
of WAL segment files on the Safekeeper's disk until the disk becomes
full. This is a dangerous situation that is hard to recover from because
the Safekeeper cannot write its control files when it is out of disk
space. There are actually two problems here:
1. A single problematic timeline can take over the entire disk for the
SK
2. Once out of disk space, the SK is difficult to recover
IV.
Neon reports certain storage errors as "critical" errors using a macro,
which increments a counter metric that can be used to raise alerts.
However, this metric isn't sliced by tenant and/or timeline today. We
need the tenant/timeline dimension to better respond to incidents and
for blast radius analysis.
## Summary of changes
I.
The PR adds a `safekeeper_wal_disk_io_errors` counter, which is
incremented when the SK fails to create or flush WAL files.
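A minimal sketch of such a counter with the `prometheus` crate; the
metric name matches the PR, but the registration and call site below are
assumptions:

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_counter, IntCounter};

static WAL_DISK_IO_ERRORS: Lazy<IntCounter> = Lazy::new(|| {
    register_int_counter!(
        "safekeeper_wal_disk_io_errors",
        "Disk I/O errors while creating or flushing WAL files"
    )
    .unwrap()
});

/// Hypothetical helper for a WAL write site: bump the counter, then
/// propagate the error as before.
fn observe_wal_io(res: std::io::Result<()>) -> std::io::Result<()> {
    if res.is_err() {
        WAL_DISK_IO_ERRORS.inc();
    }
    res
}
```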
II.
To mitigate this issue, we will re-elect a new offloader if the current
offloader is lagging too far behind.
Each SK makes the decision locally, but they are all aware of each
other's commit and backup LSNs.
The new algorithm is:
- `determine_offloader` picks an SK, say SK-1.
- Each SK checks: if `commit_lsn - backup_lsn > threshold`, it removes
  SK-1 from the candidates and calls `determine_offloader` again.

SK-1 will step down, and all SKs will elect the same new leader. After
the backup has caught up, the leader will become SK-1 again. This also
helps when SK-1 is slow to back up.
I'll set the re-elect backup lag to 4 GB later. It is set to 128 MB in
dev to trigger the code more frequently.
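A minimal sketch of the local re-election decision, with stand-in types
and a simplified `determine_offloader` (the real one considers more
state):

```rust
use std::collections::HashSet;

// Stand-ins for the safekeeper's own `NodeId`/`Lsn`/peer-state types.
type NodeId = u64;
#[derive(Clone, Copy)]
struct PeerInfo {
    commit_lsn: u64,
    backup_lsn: u64,
}

/// Simplified stand-in: pick the lowest node id among candidates.
fn determine_offloader(candidates: &HashSet<NodeId>) -> Option<NodeId> {
    candidates.iter().copied().min()
}

/// Every SK runs this locally on the same peer info, so they all exclude
/// a lagging leader and deterministically re-elect the same replacement.
fn elect_offloader(peers: &[(NodeId, PeerInfo)], reelect_backup_lag: u64) -> Option<NodeId> {
    let mut candidates: HashSet<NodeId> = peers.iter().map(|(id, _)| *id).collect();
    let leader = determine_offloader(&candidates)?;
    let info = peers.iter().find(|(id, _)| *id == leader)?.1;
    if info.commit_lsn.saturating_sub(info.backup_lsn) > reelect_backup_lag {
        candidates.remove(&leader);
        return determine_offloader(&candidates);
    }
    Some(leader)
}
```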
III.
This change addresses problem no. 1 by having the Safekeeper perform a
timeline disk utilization check when processing WAL proposal messages
from Postgres/compute. The Safekeeper now rejects the WAL proposal
message, effectively stopping further WAL writes for the timeline, if
the existing WAL files for the timeline on the SK disk exceed a certain
size (the default threshold is 100GB). The disk utilization is
calculated from the `last_removed_segno` variable tracked by the
background task that removes WAL files, which yields a conservative
estimate (>= actual disk usage) of the actual disk usage.
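A minimal sketch of the estimate, assuming hypothetical helper names
(the PR's actual check lives in the WAL proposal handling path):

```rust
/// Everything between the last segment removed by the background cleaner
/// and the segment currently being written counts as resident, so the
/// estimate is always >= actual disk usage.
fn timeline_wal_disk_usage(last_removed_segno: u64, current_segno: u64, segsize: u64) -> u64 {
    current_segno.saturating_sub(last_removed_segno) * segsize
}

/// Reject new WAL proposals once the estimate crosses the limit
/// (100 GB by default, per the description above).
fn should_reject_wal_proposal(usage_bytes: u64, limit_bytes: u64) -> bool {
    usage_bytes > limit_bytes
}
```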
IV.
* Add a new metric `hadron_critical_storage_event_count` that has the
`tenant_shard_id` and `timeline_id` as dimensions.
* Modified the `critical!` macro to include tenant_id and timeline_id as
additional arguments and adapted existing call sites to populate the
tenant shard and timeline ID fields. The `critical!` macro invocation
now increments `hadron_critical_storage_event_count` with the extra
dimensions. (In SK there is no notion of a tenant shard, so the tenant
ID is recorded in lieu of the tenant shard ID.)
I considered adding a separate macro to avoid merge conflicts, but I
think in this case (detecting critical errors) conflicts are actually
desirable, so that we become aware whenever Neon adds another
`critical!` invocation in their code.
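A minimal sketch of the macro shape and the labeled counter, assuming
the `prometheus` and `tracing` crates; the real macro also preserves
Neon's existing `critical!` behavior:

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_counter_vec, IntCounterVec};

static CRITICAL_STORAGE_EVENT_COUNT: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "hadron_critical_storage_event_count",
        "Critical storage events, sliced by tenant shard and timeline",
        &["tenant_shard_id", "timeline_id"]
    )
    .unwrap()
});

/// Sketch: log the error, then bump the counter with the new dimensions.
macro_rules! critical {
    ($tenant_shard_id:expr, $timeline_id:expr, $($arg:tt)*) => {{
        tracing::error!($($arg)*);
        CRITICAL_STORAGE_EVENT_COUNT
            .with_label_values(&[
                $tenant_shard_id.to_string().as_str(),
                $timeline_id.to_string().as_str(),
            ])
            .inc();
    }};
}
```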
---------
Co-authored-by: Chen Luo <chen.luo@databricks.com>
Co-authored-by: Haoyu Huang <haoyu.huang@databricks.com>
Co-authored-by: William Huang <william.huang@databricks.com>
Greetings! Please add `w=1` to the GitHub URL when viewing the diff
(specifically `wal_backup.rs`).
## Problem
This PR is aimed at addressing the remaining work of #8200, namely
removing static usage of remote storage in favour of `Arc`. I did not
opt to pass `Arc<RemoteStorage>` directly since it is actually
`Option<RemoteStorage>`, as it is not necessarily always configured. I
wanted to avoid having to pass `Arc<Option<RemoteStorage>>` everywhere,
with individual consuming functions likely needing to handle unwrapping.
Instead I've added a `WalBackup` struct that holds
`Option<RemoteStorage>` and handles initialization/unwrapping of
`RemoteStorage` internally. wal_backup functions now take `self`, and
`Arc<WalBackup>` is passed as a dependency through the various consumers
that need it.
## Summary of changes
- Add `WalBackup`, which holds `Option<RemoteStorage>` and handles
initialization and unwrapping
- Modify wal_backup functions to take `WalBackup` as `self` (add `w=1`
to the GitHub URL when viewing the diff here)
- Initialize `WalBackup` in the safekeeper root
- Store `Arc<WalBackup>` in `GlobalTimelineMap`, and pass it to and
store it in each Timeline as it is loaded
- Use `WalBackup` through Timeline as needed
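A minimal sketch of the struct's shape, assuming `GenericRemoteStorage`
from the `remote_storage` crate; the method names here are illustrative:

```rust
use remote_storage::GenericRemoteStorage;

/// Owns the optional remote storage and centralizes the
/// "is it configured?" check that callers used to do themselves.
pub struct WalBackup {
    storage: Option<GenericRemoteStorage>,
}

impl WalBackup {
    pub fn new(storage: Option<GenericRemoteStorage>) -> Self {
        Self { storage }
    }

    /// Returns an error instead of panicking when remote storage is
    /// absent (cf. the drive-by fix for #11501 in the refs below).
    pub fn get_storage(&self) -> anyhow::Result<&GenericRemoteStorage> {
        self.storage
            .as_ref()
            .ok_or_else(|| anyhow::anyhow!("remote storage not configured"))
    }
}
```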
## Refs
- task to remove global variables
https://github.com/neondatabase/neon/issues/8200
- drive-by fixes https://github.com/neondatabase/neon/issues/11501
by turning the panic reported there into an error `remote storage not
configured`
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
# Remove SAFEKEEPER_AUTH_TOKEN env var parsing from safekeeper
This PR is a follow-up to #11443 that removes the parsing of the
`SAFEKEEPER_AUTH_TOKEN` environment variable from the safekeeper
codebase while keeping the `auth_token_path` CLI flag functionality.
## Changes
- Removed code that checks for the `SAFEKEEPER_AUTH_TOKEN` environment
variable
- Updated comments to reflect that only the `auth_token_path` CLI flag
is now used
As mentioned in PR #11443, the environment variable approach was planned
to be deprecated and removed in favor of the file-based approach, which
is more secure since environment variables can be quite public in both
procfs and unit files.
Link to Devin run:
https://app.devin.ai/sessions/d6f56cf1b4164ea9880a9a06358a58ac
Requested by: arpad@neon.tech
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: arpad@neon.tech <arpad@neon.tech>
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
# Add --dev CLI flag to pageserver and safekeeper binaries
This PR adds the `--dev` CLI flag to both the pageserver and safekeeper
binaries without implementing any functionality yet. This is a precursor
to PR #11517, which will implement the full functionality to require
authentication by default unless the `--dev` flag is specified.
## Changes
- Add `dev_mode` config field to pageserver binary
- Add `--dev` CLI flag to safekeeper binary
This PR is needed for forward compatibility tests to work properly when
we try to merge #11517.
Link to Devin run:
https://app.devin.ai/sessions/ad8231b4e2be430398072b6fc4e85d46
Requested by: John Spray (john@neon.tech)
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: John Spray <john@neon.tech>
## Problem
Safekeeper doesn't use TLS in the WAL service.
- Closes: https://github.com/neondatabase/cloud/issues/27302
## Summary of changes
- Add `enable_tls_wal_service_api` option to safekeeper's cmd arguments
- Propagate `tls_server_config` to `wal_service` if the option is
enabled
- Create `BACKGROUND_RUNTIME` for small background tasks and offload the
SSL certificate reloader to it.
No integration tests for now because support from compute side is
required: https://github.com/neondatabase/cloud/issues/25823
## Problem
Pageservers and safekeepers do not pass CA certificates to the broker
client, so the client does not trust locally issued certificates.
- Part of https://github.com/neondatabase/cloud/issues/27492
## Summary of changes
- Change the `ssl_ca_certs` type in PS/SK's config to `Pem`, which can
be converted to both `reqwest` and `tonic` certificates.
- Pass CA certificates to storage broker client in PS and SK
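A minimal sketch of such a `Pem` type; `reqwest::Certificate::from_pem`
and `tonic::transport::Certificate::from_pem` are the real constructors,
while the wrapper itself is an assumption:

```rust
/// CA certificates kept as raw PEM bytes, converted on demand.
#[derive(Clone)]
pub struct Pem(Vec<u8>);

impl Pem {
    pub fn to_reqwest(&self) -> reqwest::Result<reqwest::Certificate> {
        reqwest::Certificate::from_pem(&self.0)
    }

    pub fn to_tonic(&self) -> tonic::transport::Certificate {
        tonic::transport::Certificate::from_pem(self.0.clone())
    }
}
```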
Previously we specified the JWT via `SAFEKEEPER_AUTH_TOKEN`, but env
vars are quite public, in both procfs and unit files. So add a way to
put the auth token into a file directly.
context: https://neondb.slack.com/archives/C033RQ5SPDH/p1743692566311099
## Problem
There are some places in the code where we create `reqwest::Client`
without providing SSL CA certs from `ssl_ca_file`. These will break
after we enable TLS everywhere.
- Part of https://github.com/neondatabase/cloud/issues/22686
## Summary of changes
- Support `ssl_ca_file` in storage scrubber.
- Add `use_https_safekeeper_api` option to safekeeper to use https for
peer requests.
- Propagate SSL CA certs to storage_controller/client, storcon's
ComputeHook, PeerClient and maybe_forward.
## Problem
- We need to support multiple SSL CA certificates for graceful root CA
certificate rotation.
- Closes: https://github.com/neondatabase/cloud/issues/25971
## Summary of changes
- Parse `ssl_ca_file` as a PEM bundle, which may contain multiple
certificates. A single PEM cert is a valid PEM bundle, so the change is
backward compatible.
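A minimal sketch of bundle parsing, assuming the `pem` crate (v3), where
`parse_many` returns a `Result`; the PR may parse differently:

```rust
/// Parse `ssl_ca_file` contents as a PEM bundle. A single certificate is
/// a valid one-element bundle, so existing configs keep working.
fn parse_ca_bundle(bytes: &[u8]) -> anyhow::Result<Vec<pem::Pem>> {
    Ok(pem::parse_many(bytes)?)
}
```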
## Problem
SSL certs are loaded only during startup. This doesn't allow rotating
short-lived certificates without a server restart.
- Closes: https://github.com/neondatabase/cloud/issues/25525
## Summary of changes
- Implement `ReloadingCertificateResolver` which reloads certificates
from disk periodically.
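A simplified sketch of the reload loop, assuming `arc-swap` and tokio;
the real resolver implements rustls's server-certificate resolution on
top of something like this:

```rust
use std::path::PathBuf;
use std::sync::Arc;
use std::time::Duration;

use arc_swap::ArcSwap;

/// Current certificate chain in PEM form, swapped atomically on reload.
struct ReloadingState {
    cert_pem: ArcSwap<Vec<u8>>,
}

fn spawn_cert_reloader(state: Arc<ReloadingState>, path: PathBuf) {
    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(Duration::from_secs(60));
        loop {
            ticker.tick().await;
            match tokio::fs::read(&path).await {
                // Readers see either the old or the new chain, never a mix.
                Ok(pem) => state.cert_pem.store(Arc::new(pem)),
                Err(e) => tracing::warn!("failed to reload certificate: {}", e),
            }
        }
    });
}
```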
Updates storage components to edition 2024. We like to stay on the
latest edition if possible. There are no functional changes, but some
code changes were needed to accommodate the edition's breaking changes.
The PR has two commits:
* the first commit updates storage crates to edition 2024 and appeases
`cargo clippy` by changing code. I accidentally ran the formatter on
some files that had other edits.
* the second commit performs a `cargo fmt`
I would recommend a closer review of the first commit and a less close
review of the second one (as it just runs `cargo fmt`).
part of https://github.com/neondatabase/neon/issues/10918
## Problem
Safekeepers currently decode and interpret WAL for each shard
separately.
This is wasteful in terms of CPU and memory usage; we've seen this in
profiles.
## Summary of changes
Fan-out interpreted WAL to multiple shards.
The basic idea is that WAL decoding and interpretation happen in a
separate tokio task, and senders attach to it. Senders only receive
batches concerning their shard, and only past the LSN they've last seen.
Fan-out is gated behind the `wal_reader_fanout` safekeeper flag
(disabled by default for now).
When fan-out is enabled, it might be desirable to control the absolute
delta between the current position and a new shard's desired position
(i.e. how far behind or ahead a shard may be). `max_delta_for_fanout` is
a new optional safekeeper flag which dictates whether to create a new
WAL reader or attach to the existing one. By default, this behaviour is
disabled. Let's consider enabling it if we spot the need for it in the
field.
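A simplified sketch of the fan-out shape using a tokio broadcast
channel; the real implementation manages per-shard positions more
carefully than this:

```rust
use tokio::sync::broadcast;

/// Stand-in for an interpreted-WAL batch tagged with its target shard.
#[derive(Clone)]
struct InterpretedBatch {
    shard: u32,
    end_lsn: u64,
}

/// One decode task publishes batches; each shard subscribes and keeps
/// only batches for itself that advance past its last-seen LSN.
fn attach_shard(
    tx: &broadcast::Sender<InterpretedBatch>,
    my_shard: u32,
    mut last_seen_lsn: u64,
) -> tokio::task::JoinHandle<()> {
    let mut rx = tx.subscribe();
    tokio::spawn(async move {
        // A `Lagged` error would end this loop; the real code resyncs instead.
        while let Ok(batch) = rx.recv().await {
            if batch.shard == my_shard && batch.end_lsn > last_seen_lsn {
                last_seen_lsn = batch.end_lsn;
                // forward `batch` to this shard's consumer here
            }
        }
    })
}
```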
## Testing
Tests passed [here](https://github.com/neondatabase/neon/pull/10301)
with WAL reader fanout enabled, as of 34f6a71718.
Related: https://github.com/neondatabase/neon/issues/9337
Epic: https://github.com/neondatabase/neon/issues/9329
## Problem
Currently, the heap profiler takes a sample every 1 MB allocated. Taking
a profile stack trace takes about 1 µs, and allocating 1 MB takes about
15 µs, so the overhead is about 6.7%, which is a bit high. This is a
fixed cost regardless of whether heap profiles are actually accessed.
## Summary of changes
Increase the heap profiling sample interval from 1 MB to 2 MB, which
reduces the overhead to about 3.3%. This seems acceptable, considering
performance-sensitive code will avoid allocations as far as possible
anyway.
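For reference, jemalloc's sample interval is set via `lg_prof_sample`
(log2 of bytes), so 2 MB corresponds to 21. A sketch of how this is
typically baked into a binary that uses the tikv jemalloc crates (the
exact config string is an assumption):

```rust
// lg_prof_sample:21 => sample every 2^21 = 2 MiB of allocations.
#[allow(non_upper_case_globals)]
#[export_name = "malloc_conf"]
pub static malloc_conf: &[u8] = b"prof:true,prof_active:true,lg_prof_sample:21\0";
```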
## Problem
Since enabling continuous profiling in staging, we've seen frequent seg
faults. This is suspected to be because jemalloc and pprof-rs take a
stack trace at the same time, and the handlers aren't signal safe.
jemalloc does this probabilistically on every allocation, regardless of
whether someone is taking a heap profile, which means that any CPU
profile has a chance to cause a seg fault.
Touches #10225.
## Summary of changes
For now, just disable heap profiles -- CPU profiles are more important,
and we need to be able to take them without risking a crash.
Hello! I was interested in potentially making some contributions to Neon
and looking through the issue backlog I found
[8200](https://github.com/neondatabase/neon/issues/8200) which seemed
like a good first issue to attempt to tackle. I see it was assigned a
while ago so apologies if I'm stepping on any toes with this PR. I also
apologize for the size of this PR. I'm not sure if there is a simple way
to reduce it given the footprint of the components being changed.
## Problem
This PR is attempting to address part of the problem outlined in issue
[8200](https://github.com/neondatabase/neon/issues/8200). Namely to
remove global static usage of timeline state in favour of
`Arc<GlobalTimelines>` and to replace wasteful clones of
`SafeKeeperConf` with `Arc<SafeKeeperConf>`. I did not opt to tackle
`RemoteStorage` in this PR to minimize the amount of changes as this PR
is already quite large. I also did not opt to introduce an
`SafekeeperApp` wrapper struct to similarly minimize changes but I can
tackle either or both of these omissions in this PR if folks would like.
## Summary of changes
- Remove static usage of `GlobalTimelines` in favour of
`Arc<GlobalTimelines>`
- Wrap `SafeKeeperConf` in `Arc` to avoid wasteful clones of the
underlying struct
## Some additional thoughts
- We seem to currently store `SafeKeeperConf` in `GlobalTimelines` and
then expose it through a public `get_global_config` function which
requires locking. This seems needlessly wasteful; based on observed
usage, we could remove this public accessor and force consumers to
acquire `SafeKeeperConf` through the new `Arc` reference.
## Problem
We don't have good observability for memory usage. This would be useful
e.g. to debug OOM incidents or optimize performance or resource usage.
We would also like to use continuous profiling with e.g. [Grafana Cloud
Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/)
(see https://github.com/neondatabase/cloud/issues/14888).
This PR is intended as a proof of concept, to try it out in staging and
drive further discussions about profiling more broadly.
Touches https://github.com/neondatabase/neon/issues/9534.
Touches https://github.com/neondatabase/cloud/issues/14888.
Depends on #9779.
Depends on #9780.
## Summary of changes
Adds an HTTP route `/profile/heap` that takes a heap profile and returns
it. Query parameters:
* `format`: output format (`jemalloc` or `pprof`; default `pprof`).
Unlike CPU profiles (see #9764), heap profiles are not symbolized and
require the original binary to translate addresses to function names. To
make this work with Grafana, we'll probably have to symbolize the
profile server-side -- this is left as future work, as are other output
formats like SVG.
Heap profiles don't work on macOS due to limitations in jemalloc.
## Problem
To add Safekeeper heap profiling in #9778, we need to switch to an
allocator that supports it. Pageserver and proxy already use jemalloc.
Touches #9534.
## Summary of changes
Use jemalloc in Safekeeper.
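Switching a Rust binary to jemalloc is essentially a one-liner with the
`tikv-jemallocator` crate (crate choice assumed here; it is what the
other components use):

```rust
use tikv_jemallocator::Jemalloc;

// Route all Rust heap allocations through jemalloc.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;
```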
## Problem
There is an unused safekeeper option `partial_backup_enabled`.
`partial_backup_enabled` was implemented in #6530, but this option has
been unconditionally enabled since #8022.
If you intended to keep this option for a specific reason, I will close
this PR.
## Summary of changes
I removed the unused safekeeper option `partial_backup_enabled`.
PR #8299 has switched the storage scrubber to use
`DefaultCredentialsChain`. Now we do this for `remote_storage`, as it
allows us to use `remote_storage` from inside kubernetes. Most of the
diff is due to `GenericRemoteStorage::from_config` becoming `async fn`.
## Problem
- The condition for eviction is not time-based: it is possible for a
timeline to be restored in response to a client request, for that client
to time out, and then, as soon as the timeline is restored, for it to be
immediately evicted again.
- There is no delay on eviction at startup of the safekeeper, so when it
starts up and sees many idle timelines, it does many evictions which
will likely be immediately restored when someone uses the timeline.
## Summary of changes
- Add `eviction_min_resident` parameter, and use it in
`ready_for_eviction` to avoid evictions if the timeline has been
resident for less than this period.
- This also implicitly delays evictions at startup for
`eviction_min_resident`
- Set this to a very low number for the existing eviction test, which
expects immediate eviction.
The default period is 15 minutes. The general reasoning is that in the
worst case, where we thrash ~10k timelines on one safekeeper and
download 16 MB for each one (~160 GB per period, i.e. roughly 180 MB/s
if it all happened within 15 minutes), we should set a period that does
not overwhelm the node's bandwidth.
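A minimal sketch of the added guard, with hypothetical field names:

```rust
use std::time::{Duration, Instant};

/// Part of the `ready_for_eviction` decision: a timeline must have been
/// resident for at least `eviction_min_resident` before it can be evicted.
/// Since residency starts at startup/restore, this also delays the
/// post-startup eviction wave.
fn resident_long_enough(resident_since: Instant, eviction_min_resident: Duration) -> bool {
    resident_since.elapsed() >= eviction_min_resident
}
```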
## Problem
Safekeepers left running for a long time use a lot of memory (up to the
point of OOMing, on small nodes) for deleted timelines, because the
`Timeline` struct is kept alive as a guard against recreating deleted
timelines.
Closes: https://github.com/neondatabase/neon/issues/6810
## Summary of changes
- Create separate tombstones that just record a ttid and when the
timeline was deleted.
- Add a periodic housekeeping task that cleans up tombstones older than
a hardcoded TTL (24h)
I think this also makes https://github.com/neondatabase/neon/pull/6766
unneeded, as the tombstone is also checked during deletion.
I considered making the overall timeline map use an enum type containing
active or deleted, but having a separate map of tombstones avoids
bloating that map, so that calls like `get()` can still go straight to a
timeline without having to walk a hashmap that also contains tombstones.
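A minimal sketch of the tombstone bookkeeping, with stand-in types:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Stand-in for the safekeeper's tenant+timeline id type.
type TenantTimelineId = (u128, u128);

/// Deletion records kept beside the main timeline map, so `get()` on
/// live timelines never walks over tombstones.
struct Tombstones {
    deleted_at: HashMap<TenantTimelineId, Instant>,
}

impl Tombstones {
    /// Periodic housekeeping: drop tombstones older than the TTL (24h in this PR).
    fn housekeep(&mut self, ttl: Duration) {
        self.deleted_at.retain(|_, deleted| deleted.elapsed() < ttl);
    }
}
```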
Before this PR, `RemoteStorageConfig::from_toml` deserialized an empty
`{}` TOML inline table to `None`, and otherwise tried `Some(...)`.
We can instead:
* in proxy: let the clap derive handle the `Option`
* in PS & SK: assume that if the field is specified, it must be a valid
`RemoteStorageConfig`
(This PR started with a much simpler goal of factoring out the
`deserialize_item` function, which I need in another PR.)
Fixes https://github.com/neondatabase/neon/issues/6337
Add safekeeper support to switch between `Present` and
`Offloaded(flush_lsn)` states. The offloading is disabled by default,
but can be controlled using new cmdline arguments:
```
--enable-offload
Enable automatic switching to offloaded state
--delete-offloaded-wal
Delete local WAL files after offloading. When disabled, they will be left on disk
--control-file-save-interval <CONTROL_FILE_SAVE_INTERVAL>
Pending updates to control file will be automatically saved after this interval [default: 300s]
```
The manager watches state updates and detects when there is no activity
on the timeline and the partial backup has been uploaded to remote
storage. When all conditions are met, the state can be switched to
offloaded.
In `timeline.rs` there is a `StateSK` enum to support switching between
states. When offloaded, the code can access only the control file
structure and cannot use `SafeKeeper` to accept new WAL.
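A minimal sketch of the enum's shape, with stand-in types:

```rust
// Stand-ins for the real safekeeper types.
struct SafeKeeper;
struct ControlFileState;

/// When offloaded, only the control-file state is available, so
/// accepting new WAL is impossible by construction.
enum StateSK {
    Loaded(SafeKeeper),
    Offloaded(ControlFileState),
}
```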
`FullAccessTimeline` is now renamed to `WalResidentTimeline`. This
struct contains a guard to notify the manager about active tasks
requiring on-disk WAL access. All guards are issued by the manager, and
all requests are sent via a channel using `ManagerCtl`. When the manager
receives a request to issue a guard, it unevicts the timeline if it's
currently evicted.
Fixed a bug in partial WAL backup: it previously used `term` instead of
`last_log_term`.
After this commit is merged, the next step is to roll this change out,
as in issue #6338.
- Make the safekeeper read the SAFEKEEPER_AUTH_TOKEN env variable with
the JWT token to connect to other safekeepers.
- Set it in neon_local when auth is enabled.
- Create simple rust http client supporting it, and use it in pull_timeline
implementation.
- Enable auth in all pull_timeline tests.
- Make sk `http_client()` generate a safekeeper-wide token by default;
this makes it easier to enable auth in all tests by default.
This is a preparation for
https://github.com/neondatabase/neon/issues/6337.
The idea is to add FullAccessTimeline, which will act as a guard for
tasks requiring access to WAL files. Eviction will be blocked on these
tasks and WAL won't be deleted from disk until there is at least one
active FullAccessTimeline.
To get FullAccessTimeline, tasks call `tli.full_access_guard().await?`.
After eviction is implemented, this function will be responsible for
downloading missing WAL file and waiting until the download finishes.
This commit also contains other small refactorings:
- Separate `get_tenant_dir` and `get_timeline_dir` functions for
building a local path. This is useful for looking at usages and finding
tasks requiring access to local filesystem.
- `timeline_manager` is now responsible for spawning all background
tasks
- WAL removal task is now spawned instantly after horizon is updated
In safekeepers we have several background tasks. Previously the `WAL
backup` task was spawned by another task called `wal_backup_launcher`.
That task received notifications via `wal_backup_launcher_rx` and
decided to spawn or kill the backup task associated with the timeline.
This was inconvenient because each code segment that touched shared
state was responsible for pushing a notification into the
`wal_backup_launcher_tx` channel. This was error-prone, because it was
easy to miss and could lead to deadlock in some cases if notifications
were pushed in the wrong order.
We also had a similar issue with the `is_active` timeline flag. That
flag was calculated based on the state, and code modifying the state had
to call a function to update the flag. We had a few bugs related to
that, where we forgot to update the `is_active` flag in places where it
could change.
To fix these issues, this PR adds a new `timeline_manager` background
task associated with each timeline. This task is responsible for
managing all background tasks, including the `is_active` flag which is
used for pushing broker messages. It subscribes to updates in the
timeline state in a loop and decides to spawn/kill background tasks when
needed.
There is a new structure called `TimelinesSet`. It stores a set of
`Arc<Timeline>` and allows copying the set, to iterate without holding
the mutex. This is what replaced the `is_active` flag for the broker:
the broker push task now holds a reference to the `TimelinesSet` with
active timelines and uses it instead of iterating over all timelines and
filtering by the `is_active` flag.
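A minimal sketch of that pattern, with stand-in types:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Stand-ins for the real types.
type TenantTimelineId = (u128, u128);
struct Timeline;

/// Active timelines shared with the broker push task.
pub struct TimelinesSet {
    timelines: Mutex<HashMap<TenantTimelineId, Arc<Timeline>>>,
}

impl TimelinesSet {
    /// Clone the current set so callers iterate without holding the mutex.
    pub fn snapshot(&self) -> Vec<Arc<Timeline>> {
        self.timelines.lock().unwrap().values().cloned().collect()
    }
}
```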
Also added some metrics for manager iterations and active backup tasks.
Ideally the manager should not be doing too many iterations, and we
should not have a lot of backup tasks spawned at the same time.
Fixes #7751
---------
Co-authored-by: Arseny Sher <sher-ars@yandex.ru>
We had an incident where pageserver requests timed out because
pageserver couldn't fetch WAL from safekeepers. This incident was caused
by a bug in safekeeper logic for timeline activation, which prevented
pageserver from finding safekeepers.
This bug has since been fixed, but there is still a chance of a similar
bug in the future due to the overall complexity.
We add a new broker message to "signal interest" in a timeline. This
signal will be sent by the pageserver's `wait_lsn`, and safekeepers will
receive it and start broadcasting broker messages. Then every broker
subscriber will be able to find the safekeepers and connect to them (to
start fetching WAL).
This feature is not limited to pageservers; any service that wants to
download WAL from safekeepers will be able to use this discovery
request.
This commit changes the pageserver's connection_manager (walreceiver) to
send a SafekeeperDiscoveryRequest when there is no information about
safekeepers present in memory. The current implementation sends these
requests only if there is an active wait_lsn() call, and no more often
than once per 10 seconds.
Add `test_broker_discovery` to test this: safekeepers started with
`--disable-periodic-broker-push` will not push info to the broker, so
the pageserver must use discovery to start fetching WAL.
Add task_stats in the safekeeper's broker module to log a warning if no
message has been received from the broker in the last 10 seconds.
Closes #5471
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
Add support for backing up partial segments to remote storage. Disabled
by default, can be enabled with `--partial-backup-enabled`.
The safekeeper timeline has a background task which is subscribed to
`commit_lsn` and `flush_lsn` updates. After the partial segment is
updated (`flush_lsn` has changed), the segment will be uploaded to S3
within about 15 minutes.
The filename format for partial segments is
`Segment_Term_Flush_Commit_skNN.partial`, where:
- `Segment` – the segment name, like `000000010000000000000001`
- `Term` – current term
- `Flush` – flush_lsn in hex format `{:016X}`, e.g. `00000000346BC568`
- `Commit` – commit_lsn in the same hex format
- `NN` – safekeeper_id, like `1`
The full object name example:
`000000010000000000000002_2_0000000002534868_0000000002534410_sk1.partial`
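A sketch of the name construction matching the format above
(hypothetical helper, not the exact code):

```rust
/// e.g. partial_segment_name("000000010000000000000002", 2, 0x2534868, 0x2534410, 1)
/// == "000000010000000000000002_2_0000000002534868_0000000002534410_sk1.partial"
fn partial_segment_name(
    segment: &str,
    term: u64,
    flush_lsn: u64,
    commit_lsn: u64,
    safekeeper_id: u64,
) -> String {
    format!("{segment}_{term}_{flush_lsn:016X}_{commit_lsn:016X}_sk{safekeeper_id}.partial")
}
```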
Each safekeeper will keep info about remote partial segments in its
control file. The code updates state in the control file before doing
any S3 operations. This way the control file stores information about
all potentially existing remote partial segments, and the safekeeper can
clean them up after uploading a newer version.
Closes #6336
Add a `--walsenders-keep-horizon` argument to the safekeeper cmdline. It
prevents deleting WAL segments from disk if they are needed by an active
START_REPLICATION connection.
This is useful for sharding. Without this option, if one of the shards
falls behind, it starts to read WAL from S3, which is much slower than
disk. This can result in the shard lagging enormously.
(part of the getpage benchmarking epic #5771)
The plan is to make the benchmarking tool log on stderr and emit results
as JSON on stdout. That way, the test suite can simply capture stdout
and json.loads() it, while interactive users of the benchmarking tool
have a reasonable experience as well.
Existing logging users continue to print to stdout, so, this change
should be a no-op functionally and performance-wise.
## Problem
For quickly rotating JWT secrets, we want to be able to reload the JWT
public key file in the pageserver, and also support multiple JWT keys.
See #4897.
## Summary of changes
* Allow directories for the `auth_validation_public_key_path` config
param instead of just files. For the safekeepers, all of their config
options also support multiple JWT keys.
* For the pageservers, make the JWT public keys easily globally swappable
by using the `arc-swap` crate.
* Add an endpoint to the pageserver, triggered by a POST to
`/v1/reload_auth_validation_keys`, that reloads the JWT public keys from
the pre-configured path (for security reasons, you cannot upload any
keys yourself).
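A minimal sketch of the `arc-swap` pattern used for the pageserver keys
(the names here are assumptions):

```rust
use std::sync::Arc;

use arc_swap::ArcSwap;
use once_cell::sync::Lazy;

/// Stand-in for the decoded JWT public key set.
struct JwtKeys(Vec<Vec<u8>>);

static JWT_KEYS: Lazy<ArcSwap<JwtKeys>> =
    Lazy::new(|| ArcSwap::from_pointee(JwtKeys(Vec::new())));

/// The reload endpoint re-reads the key directory and swaps the set in;
/// request handlers call `JWT_KEYS.load()` lock-free on every request.
fn swap_keys(new_keys: JwtKeys) {
    JWT_KEYS.store(Arc::new(new_keys));
}
```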
Fixes #4897
---------
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
- Add a new util `project_build_tag` macro, similar to
`project_git_version`
- Update the `set_build_info_metric` to accept and make use of
`build_tag` info
- Update all code that uses `set_build_info_metric`
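A sketch of what such a macro can look like, in the spirit of
`project_git_version!` (the real macro's mechanics may differ):

```rust
/// Bake a build tag from the environment into the binary at compile
/// time, falling back to "unknown" when BUILD_TAG is not set.
macro_rules! project_build_tag {
    ($name:ident) => {
        const $name: &str = match option_env!("BUILD_TAG") {
            Some(tag) => tag,
            None => "unknown",
        };
    };
}

project_build_tag!(BUILD_TAG);
```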
Implements fetching of WAL by a safekeeper from another safekeeper by
imitating the behaviour of the last elected leader. This avoids WAL
accumulation on the compute and facilitates faster compute startup, as
the compute doesn't need to download any WAL. Actually removing the WAL
download from the walproposer is a matter for another patch, though.
There is a per-timeline task which always runs, regularly checking
whether it should start recovery from someone, meaning there is
something to fetch and there is no streaming compute. It then proceeds
with fetching, finishing when there is nothing more to receive.
Implements https://github.com/neondatabase/neon/pull/4875
Fixes #4689 by replacing all of `std::Path`, `std::PathBuf` with
`camino::Utf8Path`, `camino::Utf8PathBuf` in
- pageserver
- safekeeper
- control_plane
- libs/remote_storage
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Slightly refactors init: `load_tenant_timelines` is now also async to
properly init the timeline, but to keep the global map lock sync, we
just acquire it anew for each timeline.
The recovery task itself is just a stub here.
part of
https://github.com/neondatabase/neon/pull/4875
To this end:
1) Add an `-e` option to the `neon_local safekeeper start` command,
appending extra options to the safekeeper invocation;
2) Allow multiple occurrences of the same option in safekeepers; the
last value is taken.
3) Allow specifying an empty string for the *-auth-public-key-path opts,
which disables auth for the service.
The same option enables auth and specifies the public key, so this also
allows using different public keys. The motivation is to
1) allow changing e.g. the pageserver key/token without replacing all
compute tokens, and
2) enable auth gradually.
## Problem
The safekeeper advertises the same address specified in `--listen-pg`,
which is problematic when the listening address is different from the
address that the pageserver can use to connect to the safekeeper.
## Summary of changes
Add a new optional flag called `--advertise-pg` for the address to be
advertised. If this flag is not specified, the behavior is the same as
before.
## Problem
Wrong use of `conf.listen_pg_addr` in `error!()`.
## Summary of changes
Use `listen_pg_addr_tenant_only` instead of `conf.listen_pg_addr`.
Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>