rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-07 13:32:57 +00:00

Author	SHA1	Message	Date
Erik Grinaker	cd982a82ec	pageserver,safekeeper: increase heap profiling frequency to 2 MB (#10362 ) ## Problem Currently, the heap profiling frequency is every 1 MB allocated. Taking a profile stack trace takes about 1 µs, and allocating 1 MB takes about 15 µs, so the overhead is about 6.7% which is a bit high. This is a fixed cost regardless of whether heap profiles are actually accessed. ## Summary of changes Increase the heap profiling sample frequency from 1 MB to 2 MB, which reduces the overhead to about 3.3%. This seems acceptable, considering performance-sensitive code will avoid allocations as far as possible anyway.	2025-01-13 09:44:59 +00:00
Arseny Sher	030ab1c0e8	TLA+ spec for safekeeper membership change (#9966 ) ## Problem We want to define the algorithm for safekeeper membership change. ## Summary of changes Add spec for it, several models and logs of checking them. ref https://github.com/neondatabase/neon/issues/8699	2025-01-09 12:26:17 +00:00
Erik Grinaker	237dae71a1	Revert "pageserver,safekeeper: disable heap profiling (#10268 )" (#10303 ) This reverts commit `b33299dc37`. Heap profiles weren't the culprit after all. Touches #10225.	2025-01-07 22:49:00 +00:00
Erik Grinaker	b33299dc37	pageserver,safekeeper: disable heap profiling (#10268 ) ## Problem Since enabling continuous profiling in staging, we've seen frequent seg faults. This is suspected to be because jemalloc and pprof-rs take a stack trace at the same time, and the handlers aren't signal safe. jemalloc does this probabilistically on every allocation, regardless of whether someone is taking a heap profile, which means that any CPU profile has a chance to cause a seg fault. Touches #10225. ## Summary of changes For now, just disable heap profiles -- CPU profiles are more important, and we need to be able to take them without risking a crash.	2025-01-03 15:21:31 +00:00
Vlad Lazar	628451d68e	safekeeper: short-circuit interpreted wal sender (#10202 ) ## Problem Safekeeper may currently send a batch to the pageserver even if it hasn't decoded a new record. I think this is quite unlikely in the field, but worth adressing. ## Summary of changes Don't send anything if we haven't decoded a full record. Once this merges and releases, the `InterpretedWalRecords` struct can be updated to remove the Option wrapper for `next_record_lsn`.	2024-12-19 14:04:46 +00:00
Vlad Lazar	502d512fe2	safekeeper: lift benchmarking utils into safekeeper crate (#10200 ) ## Problem The benchmarking utilities are also useful for testing. We want to write tests in the safekeeper crate. ## Summary of changes This commit lifts the utils to the safekeeper crate. They are compiled if the benchmarking features is enabled or if in test mode.	2024-12-19 14:04:42 +00:00
Erik Grinaker	1668d39b7c	safekeeper: fix typo in allowlist for `/profile/heap` (#10186 )	2024-12-18 15:51:53 +00:00
Conrad Ludgate	6565fd4056	chore: fix clippy lints 2024-12-06 (#10138 )	2024-12-16 15:33:21 +00:00
Arseny Sher	1ed0e52bc8	Extract safekeeper http client to separate crate. (#10140 ) ## Problem We want to use safekeeper http client in storage controller and neon_local. ## Summary of changes Extract it to separate crate. No functional changes.	2024-12-16 12:07:24 +00:00
Arseny Sher	ce8eb089f3	Extract public sk types to safekeeper_api (#10137 ) ## Problem We want to extract safekeeper http client to separate crate for use in storage controller and neon_local. However, many types used in the API are internal to safekeeper. ## Summary of changes Move them to safekeeper_api crate. No functional changes. ref https://github.com/neondatabase/neon/issues/9011	2024-12-13 14:06:27 +00:00
Evan Fleming	b593e51eae	safekeeper: use arc for global timelines and config (#10051 ) Hello! I was interested in potentially making some contributions to Neon and looking through the issue backlog I found [8200](https://github.com/neondatabase/neon/issues/8200) which seemed like a good first issue to attempt to tackle. I see it was assigned a while ago so apologies if I'm stepping on any toes with this PR. I also apologize for the size of this PR. I'm not sure if there is a simple way to reduce it given the footprint of the components being changed. ## Problem This PR is attempting to address part of the problem outlined in issue [8200](https://github.com/neondatabase/neon/issues/8200). Namely to remove global static usage of timeline state in favour of `Arc<GlobalTimelines>` and to replace wasteful clones of `SafeKeeperConf` with `Arc<SafeKeeperConf>`. I did not opt to tackle `RemoteStorage` in this PR to minimize the amount of changes as this PR is already quite large. I also did not opt to introduce an `SafekeeperApp` wrapper struct to similarly minimize changes but I can tackle either or both of these omissions in this PR if folks would like. ## Summary of changes - Remove static usage of `GlobalTimelines` in favour of `Arc<GlobalTimelines>` - Wrap `SafeKeeperConf` in `Arc` to avoid wasteful clones of the underlying struct ## Some additional thoughts - We seem to currently store `SafeKeeperConf` in `GlobalTimelines` and then expose it through a public`get_global_config` function which requires locking. This seems needlessly wasteful and based on observed usage we could remove this public accessor and force consumers to acquire `SafeKeeperConf` through the new Arc reference.	2024-12-09 21:09:20 +00:00
Erik Grinaker	699a213c5d	Display reqwest error source (#10004 ) ## Problem Reqwest errors don't include details about the inner source error. This means that we get opaque errors like: ``` receive body: error sending request for url (http://localhost:9898/v1/location_config) ``` Instead of the more helpful: ``` receive body: error sending request for url (http://localhost:9898/v1/location_config): operation timed out ``` Touches #9801. ## Summary of changes Include the source error for `reqwest::Error` wherever it's displayed.	2024-12-04 13:05:53 +00:00
Erik Grinaker	dcb24ce170	safekeeper,pageserver: add heap profiling (#9778 ) ## Problem We don't have good observability for memory usage. This would be useful e.g. to debug OOM incidents or optimize performance or resource usage. We would also like to use continuous profiling with e.g. [Grafana Cloud Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/) (see https://github.com/neondatabase/cloud/issues/14888). This PR is intended as a proof of concept, to try it out in staging and drive further discussions about profiling more broadly. Touches https://github.com/neondatabase/neon/issues/9534. Touches https://github.com/neondatabase/cloud/issues/14888. Depends on #9779. Depends on #9780. ## Summary of changes Adds a HTTP route `/profile/heap` that takes a heap profile and returns it. Query parameters: * `format`: output format (`jemalloc` or `pprof`; default `pprof`). Unlike CPU profiles (see #9764), heap profiles are not symbolized and require the original binary to translate addresses to function names. To make this work with Grafana, we'll probably have to symbolize the process server-side -- this is left as future work, as is other output formats like SVG. Heap profiles don't work on macOS due to limitations in jemalloc.	2024-12-03 11:35:59 +00:00
Arseny Sher	fa909c27fc	Update consensus protocol spec (#9607 ) The spec was written for the buggy protocol which we had before the one more similar to Raft was implemented. Update the spec with what we currently have. ref https://github.com/neondatabase/neon/issues/8699	2024-12-02 16:10:44 +00:00
Erik Grinaker	a6073b5013	safekeeper: use jemalloc (#9780 ) ## Problem To add Safekeeper heap profiling in #9778, we need to switch to an allocator that supports it. Pageserver and proxy already use jemalloc. Touches #9534. ## Summary of changes Use jemalloc in Safekeeper.	2024-11-29 13:38:04 +00:00
Vlad Lazar	eb520a14ce	pageserver: return correct LSN for interpreted proto keep alive responses (#9928 ) ## Problem For the interpreted proto the pageserver is not returning the correct LSN in replies to keep alive requests. This is because the interpreted protocol arm was not updating `last_rec_lsn`. ## Summary of changes * Return correct LSN in keep-alive responses * Fix shard field in wal sender traces	2024-11-28 17:38:47 +00:00
Vlad Lazar	9e0148de11	safekeeper: use protobuf for sending compressed records to pageserver (#9821 ) ## Problem https://github.com/neondatabase/neon/pull/9746 lifted decoding and interpretation of WAL to the safekeeper. This reduced the ingested amount on the pageservers by around 10x for a tenant with 8 shards, but doubled the ingested amount for single sharded tenants. Also, https://github.com/neondatabase/neon/pull/9746 uses bincode which doesn't support schema evolution. Technically the schema can be evolved, but it's very cumbersome. ## Summary of changes This patch set addresses both problems by adding protobuf support for the interpreted wal records and adding compression support. Compressed protobuf reduced the ingested amount by 100x on the 32 shards `test_sharded_ingest` case (compared to non-interpreted proto). For the 1 shard case the reduction is 5x. Sister change to `rust-postgres` is [here](https://github.com/neondatabase/rust-postgres/pull/33). ## Links Related: https://github.com/neondatabase/neon/issues/9336 Epic: https://github.com/neondatabase/neon/issues/9329	2024-11-27 12:12:21 +00:00
Vlad Lazar	7a2f0ed8d4	safekeeper: lift decoding and interpretation of WAL to the safekeeper (#9746 ) ## Problem For any given tenant shard, pageservers receive all of the tenant's WAL from the safekeeper. This soft-blocks us from using larger shard counts due to bandwidth concerns and CPU overhead of filtering out the records. ## Summary of changes This PR lifts the decoding and interpretation of WAL from the pageserver into the safekeeper. A customised PG replication protocol is used where instead of sending raw WAL, the safekeeper sends filtered, interpreted records. The receiver drives the protocol selection, so, on the pageserver side, usage of the new protocol is gated by a new pageserver config: `wal_receiver_protocol`. More granularly the changes are: 1. Optionally inject the protocol and shard identity into the arguments used for starting replication 2. On the safekeeper side, implement a new wal sending primitive which decodes and interprets records before sending them over 3. On the pageserver side, implement the ingestion of this new replication message type. It's very similar to what we already have for raw wal (minus decoding and interpreting). ## Notes * This PR currently uses my [branch of rust-postgres](https://github.com/neondatabase/rust-postgres/tree/vlad/interpreted-wal-record-replication-support) which includes the deserialization logic for the new replication message type. PR for that is open [here](https://github.com/neondatabase/rust-postgres/pull/32). * This PR contains changes for both pageservers and safekeepers. It's safe to merge because the new protocol is disabled by default on the pageserver side. We can gradually start enabling it in subsequent releases. * CI tests are running on https://github.com/neondatabase/neon/pull/9747 ## Links Related: https://github.com/neondatabase/neon/issues/9336 Epic: https://github.com/neondatabase/neon/issues/9329	2024-11-25 17:29:28 +00:00
Erik Grinaker	e939d36dd4	safekeeper,pageserver: fix CPU profiling allowlists (#9856 ) ## Problem The HTTP router allowlists matched both on the path and the query string. This meant that only `/profile/cpu` would be allowed without auth, while `/profile/cpu?format=svg` would require auth. Follows #9764. ## Summary of changes * Match allowlists on URI path, rather than the entire URI. * Fix the allowlist for Safekeeper to use `/profile/cpu` rather than the old `/pprof/profile`. * Just use a constant slice for the allowlist; it's only a handful of items, and these handlers are not on hot paths.	2024-11-22 17:50:33 +00:00
Erik Grinaker	190e8cebac	safekeeper,pageserver: add CPU profiling (#9764 ) ## Problem We don't have a convenient way to gather CPU profiles from a running binary, e.g. during production incidents or end-to-end benchmarks, nor during microbenchmarks (particularly on macOS). We would also like to have continuous profiling in production, likely using [Grafana Cloud Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/). We may choose to use either eBPF profiles or pprof profiles for this (pending testing and discussion with SREs), but pprof profiles appear useful regardless for the reasons listed above. See https://github.com/neondatabase/cloud/issues/14888. This PR is intended as a proof of concept, to try it out in staging and drive further discussions about profiling more broadly. Touches #9534. Touches https://github.com/neondatabase/cloud/issues/14888. ## Summary of changes Adds a HTTP route `/profile/cpu` that takes a CPU profile and returns it. Defaults to a 5-second pprof Protobuf profile for use with e.g. `pprof` or Grafana Alloy, but can also emit an SVG flamegraph. Query parameters: * `format`: output format (`pprof` or `svg`) * `frequency`: sampling frequency in microseconds (default 100) * `seconds`: number of seconds to profile (default 5) Also integrates pprof profiles into Criterion benchmarks, such that flamegraph reports can be taken with `cargo bench ... --profile-duration <seconds>`. Output under `target/criterion//profile/flamegraph.svg`. Example profiles: pprof profile (use [`pprof`](https://github.com/google/pprof)): [profile.pb.gz](https://github.com/user-attachments/files/17756788/profile.pb.gz) * Web interface: `pprof -http :6060 profile.pb.gz` * Interactive flamegraph: [profile.svg.gz](https://github.com/user-attachments/files/17756782/profile.svg.gz)	2024-11-21 18:59:46 +00:00
John Spray	33dce25af8	safekeeper: block deletion on protocol handler shutdown (#9364 ) ## Problem Two recently observed log errors indicate safekeeper tasks for a timeline running after that timeline's deletion has started. - https://github.com/neondatabase/neon/issues/8972 - https://github.com/neondatabase/neon/issues/8974 These code paths do not have a mechanism that coordinates task shutdown with the overall shutdown of the timeline. ## Summary of changes - Add a `Gate` to `Timeline` - Take the gate as part of resident timeline guard: any code that holds a guard over a timeline staying resident should also hold a guard over the timeline's total lifetime. - Take the gate from the wal removal task - Respect Timeline::cancel in WAL send/recv code, so that we do not block shutdown indefinitely. - Add a test that deletes timelines with open pageserver+compute connections, to check these get torn down as expected. There is some risk to introducing gates: if there is code holding a gate which does not properly respect a cancellation token, it can cause shutdown hangs. The risk of this for safekeepers is lower in practice than it is for other services, because in a healthy timeline deletion, the compute is shutdown first, then the timeline is deleted on the pageserver, and finally it is deleted on the safekeepers -- that makes it much less likely that some protocol handler will still be running. Closes: #8972 Closes: #8974	2024-11-20 11:07:45 +00:00
Erik Grinaker	de7e4a34ca	safekeeper: send `AppendResponse` on segment flush (#9692 ) ## Problem When processing pipelined `AppendRequest`s, we explicitly flush the WAL every second and return an `AppendResponse`. However, the WAL is also implicitly flushed on segment bounds, but this does not result in an `AppendResponse`. Because of this, concurrent transactions may take up to 1 second to commit and writes may take up to 1 second before sending to the pageserver. ## Summary of changes Advance `flush_lsn` when a WAL segment is closed and flushed, and emit an `AppendResponse`. To accommodate this, track the `flush_lsn` in addition to the `flush_record_lsn`.	2024-11-17 18:19:14 +01:00
Arseny Sher	d06bf4b0fe	safekeeper: fix atomicity of WAL truncation (#9685 ) If WAL truncation fails in the middle it might leave some data on disk above the write/flush LSN. In theory, concatenated with previous records it might form bogus WAL (though very unlikely in practice because CRC would protect from that). To protect from that, set pending_wal_truncation flag: means before any WAL writes truncation must be retried until it succeeds. We already did that in case of safekeeper restart, now extend this mechanism for failures without restart. Also, importantly, reset LSNs in the beginning of the operation, not in the end, because once on disk deletion starts previous pointers are wrong. All this most likely haven't created any problems in practice because CRC protects from the consequences. Tests for this are hard; simulation infrastructure might be useful here in the future, but not yet.	2024-11-14 13:06:42 +03:00
Erik Grinaker	2256a5727a	safekeeper: use `WAL_SEGMENT_SIZE` for empty timeline state (#9734 ) ## Problem `TimelinePersistentState::empty()`, used for tests and benchmarks, had a hardcoded 16 MB WAL segment size. This caused confusion when attempting to change the global segment size. ## Summary of changes Inherit from `WAL_SEGMENT_SIZE` in `TimelinePersistentState::empty()`.	2024-11-12 20:35:44 +00:00
Erik Grinaker	6b19867410	safekeeper: don't flush control file on WAL ingest path (#9698 ) ## Problem The control file is flushed on the WAL ingest path when the commit LSN advances by one segment, to bound the amount of recovery work in case of a crash. This involves 3 additional fsyncs, which can have a significant impact on WAL ingest throughput. This is to some extent mitigated by `AppendResponse` not being emitted on segment bound flushes, since this will prevent commit LSN advancement, which will be addressed separately. ## Summary of changes Don't flush the control file on the WAL ingest path at all. Instead, leave that responsibility to the timeline manager, but ask it to flush eagerly if the control file lags the in-memory commit LSN by more than one segment. This should not cause more than `REFRESH_INTERVAL` (300 ms) additional latency before flushing the control file, which is negligible.	2024-11-12 15:17:03 +00:00
Erik Grinaker	f63de5f527	safekeeper: add `initialize_segment` variant of `safekeeper_wal_storage_operation_seconds` (#9691 ) ## Problem We don't have a metric capturing the latency of segment initialization. This can be significant due to fsyncs. ## Summary of changes Add an `initialize_segment` variant of `safekeeper_wal_storage_operation_seconds`.	2024-11-11 17:55:50 +01:00
Vlad Lazar	ceaa80ffeb	storcon: add peer token for peer to peer communication (#9695 ) ## Problem We wish to stop using admin tokens in the infra repo, but step down requests use the admin token. ## Summary of Changes Introduce a new "ControllerPeer" scope and use it for step-down requests.	2024-11-11 09:58:41 +00:00
Erik Grinaker	ab47804d00	safekeeper: remove unused `WriteGuardSharedState::skip_update` (#9699 )	2024-11-08 19:25:31 +00:00
Erik Grinaker	f18aa04b90	safekeeper: use `set_len()` to zero out segments (#9665 ) ## Problem When we create a new segment, we zero it out in order to avoid changing the length and fsyncing metadata on every write. However, we zeroed it out by writing 8 KB zero-pages, and Tokio file writes have non-trivial overhead. ## Summary of changes Zero out the segment using [`File::set_len()`](https://docs.rs/tokio/latest/i686-unknown-linux-gnu/tokio/fs/struct.File.html#method.set_len) instead. This will typically (depending on the filesystem) just write a sparse file and omit the 16 MB of data entirely. This improves WAL append throughput for large messages by over 400% with fsync disabled, and 100% with fsync enabled.	2024-11-07 15:09:57 +00:00
Erik Grinaker	01265b7bc6	safekeeper: add basic WAL ingestion benchmarks (#9531 ) ## Problem We don't have any benchmarks for Safekeeper WAL ingestion. ## Summary of changes Add some basic benchmarks for WAL ingestion, specifically for `SafeKeeper::process_msg()` (single append) and `WalAcceptor` (pipelined batch ingestion). Also add some baseline file write benchmarks.	2024-11-07 13:24:03 +00:00
Erik Grinaker	d6aa26a533	postgres_ffi: make `WalGenerator` generic over record generator (#9614 ) ## Problem Benchmarks need more control over the WAL generated by `WalGenerator`. In particular, they need to vary the size of logical messages. ## Summary of changes * Make `WalGenerator` generic over `RecordGenerator`, which constructs WAL records. * Add `LogicalMessageGenerator` which emits logical messages, with a configurable payload. * Minor tweaks and code reorganization. There are no changes to the core logic or emitted WAL.	2024-11-07 10:38:39 +00:00
Erik Grinaker	babfeb70ba	safekeeper: don't allocate send buffers on stack (#9644 ) ## Problem While experimenting with `MAX_SEND_SIZE` for benchmarking, I saw stack overflows when increasing it to 1 MB. Turns out a few buffers of this size are stack-allocated rather than heap-allocated. Even at the default 128 KB size, that's a bit large to allocate on the stack. ## Summary of changes Heap-allocate buffers of size `MAX_SEND_SIZE`.	2024-11-05 17:05:30 +00:00
Erik Grinaker	0d5a512825	safekeeper: add walreceiver metrics (#9450 ) ## Problem We don't have any observability for Safekeeper WAL receiver queues. ## Summary of changes Adds a few WAL receiver metrics: * `safekeeper_wal_receivers`: gauge of currently connected WAL receivers. * `safekeeper_wal_receiver_queue_depth`: histogram of queue depths per receiver, sampled every 5 seconds. * `safekeeper_wal_receiver_queue_depth_total`: gauge of total queued messages across all receivers. * `safekeeper_wal_receiver_queue_size_total`: gauge of total queued message sizes across all receivers. There are already metrics for ingested WAL volume: `written_wal_bytes` counter per timeline, and `safekeeper_write_wal_bytes` per-request histogram.	2024-11-04 15:22:46 +00:00
Erik Grinaker	123816e99a	safekeeper: log slow WalAcceptor sends (#9564 ) ## Problem We don't have any observability into full WalAcceptor queues per timeline. ## Summary of changes Logs a message when a WalAcceptor send has blocked for 5 seconds, and another message when the send completes. This implies that the log frequency is at most once every 5 seconds per timeline, so we don't need further throttling.	2024-11-01 13:47:03 +01:00
Erik Grinaker	96e35e11a6	postgres_ffi: add WAL generator for tests/benchmarks (#9503 ) ## Problem We don't have a convenient way to generate WAL records for benchmarks and tests. ## Summary of changes Adds a WAL generator, exposed as an iterator. It currently only generates logical messages (noops), but will be extended to write actual table rows later. Some existing code for WAL generation has been replaced with this generator, to reduce duplication.	2024-10-30 14:46:39 +03:00
Erik Grinaker	248558dee8	safekeeper: refactor `WalAcceptor` to be event-driven (#9462 ) ## Problem The `WalAcceptor` main loop currently uses two nested loops to consume inbound messages. This makes it hard to slot in periodic events like metrics collection. It also duplicates the event processing code, and assumes all messages in steady state are AppendRequests (other messages types may be dropped if following an AppendRequest). ## Summary of changes Refactor the `WalAcceptor` loop to be event driven.	2024-10-28 17:18:37 +00:00
John Spray	923974d4da	safekeeper: don't un-evict timelines during snapshot API handler (#9428 ) ## Problem When we use pull_timeline API on an evicted timeline, it gets downloaded to serve the snapshot API request. That means that to evacuate all the timelines from a node, the node needs enough disk space to download partial segments from all timelines, which may not be physically the case. Closes: #8833 ## Summary of changes - Add a "try" variant of acquiring a residence guard, that returns None if the timeline is offloaded - During snapshot API handler, take a different code path if the timeline isn't resident, where we just read the checkpoint and don't try to read any segments.	2024-10-28 08:47:12 +00:00
Erik Grinaker	b54b632c6a	safekeeper: don't pass conf into storage constructors (#9523 ) ## Problem The storage components take an entire `SafekeeperConf` during construction, but only actually use the `no_sync` field. This makes it hard to understand the storage inputs (which fields do they actually care about?), and is also inconvenient for tests and benchmarks that need to set up a lot of unnecessary boilerplate. ## Summary of changes * Don't take the entire config, but pass in the `no_sync` field explicitly. * Take the timeline dir instead of `ttid` as an input, since it's the only thing it cares about. * Fix a couple of tests to not leak tempdirs. * Various minor tweaks.	2024-10-25 18:19:52 +01:00
Erik Grinaker	9909551f47	safekeeper: fix version in `TimelinePersistentState::empty()` (#9521 ) ## Problem The Postgres version in `TimelinePersistentState::empty()` is incorrect: the major version should be multiplied by 10000. ## Summary of changes Multiply the version by 10000.	2024-10-25 16:22:35 +01:00
Arseny Sher	700b102b0f	safekeeper: retry eviction. (#9485 ) Without this manager may sleep forever after eviction failure without retries.	2024-10-25 17:48:29 +03:00
Arseny Sher	1e8e04bb2c	safekeeper: refactor timeline initialization (#9362 ) Always do timeline init through atomic rename of temp directory. Add GlobalTimelines::load_temp_timeline which does this, and use it from both pull_timeline and basic timeline creation. Fixes a collection of issues: - previously timeline creation didn't really flushed cfile to disk due to 'nothing to do if state didn't change' check; - even if it did, without tmp dir it is possible to lose the cfile but leave timeline dir in place, making it look corrupted; - tenant directory creation fsync was missing in timeline creation; - pull_timeline is now protected from concurrent both itself and timeline creation; - now global timelines map entry got special CreationInProgress entry type which prevents from anyone getting access to timeline while it is being created (previously one could get access to it, but it was locked during creation, which is valid but confusing if creation failed). fixes #8927	2024-10-22 07:11:36 +01:00
Conrad Ludgate	b8304f90d6	2024 oct new clippy lints (#9448 ) Fixes new lints from `cargo +nightly clippy` (`clippy 0.1.83 (798fb83f 2024-10-16)`)	2024-10-18 10:27:50 +01:00
Erik Grinaker	299cde899b	safekeeper: flush WAL on compute disconnect (#9436 ) ## Problem In #9259, we found that the `check_safekeepers_synced` fast path could result in a lower basebackup LSN than the `flush_lsn` reported by Safekeepers in `VoteResponse`, causing the compute to panic once on startup. This would happen if the Safekeeper had unflushed WAL records due to a compute disconnect. The `TIMELINE_STATUS` query would report a `flush_lsn` below these unflushed records, while `VoteResponse` would flush the WAL and report the advanced `flush_lsn`. See https://github.com/neondatabase/neon/issues/9259#issuecomment-2410849032. ## Summary of changes Flush the WAL if the compute disconnects during WAL processing.	2024-10-17 17:19:18 +02:00
John Spray	426b1c5f08	storage controller: use 'infra' JWT scope for node registration (#9343 ) ## Problem Storage controller `/control` API mostly requires admin tokens, for interactive use by engineers. But for endpoints used by scripts, we should not require admin tokens. Discussion at https://neondb.slack.com/archives/C033RQ5SPDH/p1728550081788989?thread_ts=1728548232.265019&cid=C033RQ5SPDH ## Summary of changes - Introduce the 'infra' JWT scope, which was not previously used in the neon repo - For pageserver & safekeeper node registrations, require infra scope instead of admin Note that admin will still work, as the controller auth checks permit admin tokens for all endpoints irrespective of what scope they require.	2024-10-10 12:26:43 +01:00
Arseny Sher	a181392738	safekeeper: add evicted_timelines gauge. (#9318 ) showing total number of evicted timelines.	2024-10-09 14:40:30 +03:00
Erik Grinaker	211970f0e0	remote_storage: add `DownloadOpts::byte_(start\|end)` (#9293 ) `download_byte_range()` is basically a copy of `download()` with an additional option passed to the backend SDKs. This can cause these code paths to diverge, and prevents combining various options. This patch adds `DownloadOpts::byte_(start\|end)` and move byte range handling into `download()`.	2024-10-09 10:29:06 +01:00
Arpad Müller	912d47ec02	storage_broker: update hyper and tonic again (#9299 ) Update hyper and tonic again in the storage broker, this time with a fix for the issue that made us revert the update last time. The first commit is a revert of #9268, the second a fix for the issue. fixes #9231.	2024-10-07 21:12:13 +02:00
Arseny Sher	eae4470bb6	safekeeper: remove local WAL files ignoring peer_horizon_lsn. (#8900 ) If peer safekeeper needs garbage collected segment it will be fetched now from s3 using on-demand WAL download. Reduces danger of running out of disk space when safekeeper fails.	2024-10-04 19:07:39 +03:00
Arpad Müller	e3d6ecaeee	Revert hyper and tonic updates (#9268 )	2024-10-03 19:21:22 +01:00
Arseny Sher	d785fcb5ff	safekeeper: fix panic in debug_dump. (#9097 ) Panic was triggered only when dump selected no timelines. sentry report: https://neondatabase.sentry.io/issues/5832368589/	2024-10-03 19:22:22 +03:00

1 2 3 4 5 ...

350 Commits