rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-04 03:52:56 +00:00

Author	SHA1	Message	Date
HaoyuHuang	b7fc5a2fe0	A few SC changes (#12615 ) ## Summary of changes A bunch of no-op changes. --------- Co-authored-by: Vlad Lazar <vlad@neon.tech>	2025-07-17 13:14:36 +00:00
Tristan Partin	eadb05f78e	Teach neon_local to pass the Authorization header to compute_ctl (#11490 ) This allows us to remove hacks in the compute_ctl authorization middleware which allowed for bypasses of auth checks. Fixes: https://github.com/neondatabase/neon/issues/11316 Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-04-15 17:27:49 +00:00
Mikhail Kot	6138d61592	Object storage proxy (#11357 ) Service targeted for storing and retrieving LFC prewarm data. Can be used for proxying S3 access for Postgres extensions like pg_mooncake as well. Requests must include a Bearer JWT token. Token is validated using a pemfile (should be passed in infra/). Note: app is not tolerant to extra trailing slashes, see app.rs `delete_prefix` test for comments. Resolves: https://github.com/neondatabase/cloud/issues/26342 Unrelated changes: gate a `rename_noreplace` feature and disable it in `remote_storage` so as `object_storage` can be built with musl	2025-04-08 14:54:53 +00:00
Vlad Lazar	1c0ff3c04d	utils: explicit OTEL export config and OTEL enablement via common entry point (#11139 ) We want to export performance traces from the pageserver in OTEL format. End goal is to see them in Grafana. To this end, there are two changes here: 1. Update the `tracing-utils` crate to allow for explicitly specifying the export configuration. Pageserver configuration is loaded from a file on start-up. This allows us to use the same flow for export configs there. 2. Update the `utils::logging::init` common entry point to set up OTEL tracing infrastructure if requested. Note that an entirely different tracing subscriber is used. This is to avoid interference with the existing tracing set-up. For now, no service uses this functionality. PR to plug this into the pageserver is [here](https://github.com/neondatabase/neon/pull/11140). Related https://github.com/neondatabase/neon/issues/9873	2025-03-12 11:07:49 +00:00
Erik Grinaker	a2902e774a	http-utils: generate heap profiles with jemalloc_pprof (#11075 ) ## Problem The code to generate symbolized pprof heap profiles and flamegraph SVGs has been upstreamed to the `jemalloc_pprof` crate: * https://github.com/polarsignals/rust-jemalloc-pprof/pull/22 * https://github.com/polarsignals/rust-jemalloc-pprof/pull/23 ## Summary of changes Use `jemalloc_pprof` to generate symbolized pprof heap profiles and flamegraph SVGs. This reintroduces a bunch of internal jemalloc stack frames that we'd previously strip, e.g. each stack now always ends with `prof_backtrace_impl` (where jemalloc takes a stack trace for heap profiling), but that seems ok.	2025-03-04 12:13:41 +00:00
Erik Grinaker	9b42d1ce1a	pageserver: periodically log slow ongoing getpage requests (#10906 ) ## Problem We don't have good observability for "stuck" getpage requests. Resolves https://github.com/neondatabase/cloud/issues/23808. ## Summary of changes Log a periodic warning (every 30 seconds) if GetPage request execution is slow to complete, to aid in debugging stuck GetPage requests. This does not cover response flushing (we have separate logging for that), nor reading the request from the socket and batching it (expected to be insignificant and not straightforward to handle with the current protocol). This costs 95 nanoseconds on the happy path when awaiting a `tokio::task::yield_now()`: ``` warn_slow/enabled=false time: [45.716 ns 46.116 ns 46.687 ns] warn_slow/enabled=true time: [141.53 ns 141.83 ns 142.18 ns] ```	2025-02-20 21:38:42 +00:00
Heikki Linnakangas	2d96134a4e	Remove unused dependencies (#10887 ) Per cargo machete.	2025-02-19 14:09:01 +00:00
Christian Schwarz	b992a1a62a	page_service: include socket send & recv queue length in slow flush log mesage (#10823 ) # Summary In - https://github.com/neondatabase/neon/pull/10813 we added slow flush logging but it didn't log the TCP send & recv queue length. This PR adds that data to the log message. I believe the implementation to be safe & correct right now, but it's brittle and thus this PR should be reverted or improved upon once the investigation is over. Refs: - stacked atop https://github.com/neondatabase/neon/pull/10813 - context: https://neondb.slack.com/archives/C08DE6Q9C3B/p1739464533762049?thread_ts=1739462628.361019&cid=C08DE6Q9C3B - improves https://github.com/neondatabase/neon/issues/10668 - part of https://github.com/neondatabase/cloud/issues/23515 # How It Works The trouble is two-fold: 1. getting to the raw socket file descriptor through the many Rust types that wrap it and 2. integrating with the `measure()` function Rust wraps it in types to model file descriptor lifetimes and ownership, and usually one can get access using `as_raw_fd()`. However, we `split()` the stream and the resulting [`tokio::io::WriteHalf`](https://docs.rs/tokio/latest/tokio/io/struct.WriteHalf.html) . Check the PR commit history for my attempts to do it. My solution is to get the socket fd before we wrap it in our protocol types, and to store that fd in the new `PostgresBackend::socket_fd` field. I believe it's safe because the lifetime of `PostgresBackend::socket_fd` value == the lifetime of the `TcpStream` that wrap and store in `PostgresBackend::framed`. Specifically, the only place that close()s the socket is the `impl Drop for TcpStream`. I think the protocol stack calls `TcpStream::shutdown()`, but, that doesn't `close()` the file descriptor underneath. Regarding integration with the `measure()` function, the trouble is that `flush_fut` is currently a generic `Future` type. So, we just pass in the `socket_fd` as a separate argument. A clean implementation would convert the `pgb_writer.flush()` to a named future that provides an accessor for the socket fd while not being polled. I tried (see PR history), but failed to break through the `WriteHalf`. # Testing Tested locally by running ``` ./target/debug/pagebench get-page-latest-lsn --num-clients=1000 --queue-depth=1000 ``` in one terminal, waiting a bit, then ``` pkill -STOP pagebench ``` then wait for slow logs to show up in `pageserver.log`. Pick one of the slow log message's port pairs, e.g., `127.0.0.1:39500`, and then checking sockstat output ``` ss -ntp \| grep '127.0.0.1:39500' ``` to ensure that send & recv queue size match those in the log message.	2025-02-14 16:20:07 +00:00
Heikki Linnakangas	635b67508b	Split utils::http to separate crate (#10753 ) Avoids compiling the crate and its dependencies into binaries that don't need them. Shrinks the compute_ctl binary from about 31MB to 28MB in the release-line-debug-size-lto profile.	2025-02-11 22:06:53 +00:00
Erik Grinaker	2b8ea1e768	utils: add flamegraph for heap profiles (#10223 ) ## Problem Unlike CPU profiles, the `/profile/heap` endpoint can't automatically generate SVG flamegraphs. This requires the user to install and use `pprof` tooling, which is unnecessary and annoying. Resolves #10203. ## Summary of changes Add `format=svg` for the `/profile/heap` route, and generate an SVG flamegraph using the `inferno` crate, similarly to what `pprof-rs` already does for CPU profiles.	2025-01-10 12:14:29 +00:00
Erik Grinaker	a55853f67f	utils: symbolize heap profiles (#10153 ) ## Problem Jemalloc heap profiles aren't symbolized. This is inconvenient, and doesn't work with Grafana Cloud Profiles. Resolves #9964. ## Summary of changes Symbolize the heap profiles in-process, and strip unnecessary cruft. This uses about 100 MB additional memory to cache the DWARF information, but I believe this is already the case with CPU profiles, which use the same library for symbolization. With cached DWARF information, the symbolization CPU overhead is negligible. Example profiles: * [pageserver.pb.gz](https://github.com/user-attachments/files/18141395/pageserver.pb.gz) * [safekeeper.pb.gz](https://github.com/user-attachments/files/18141396/safekeeper.pb.gz)	2024-12-17 16:51:58 +00:00
Erik Grinaker	dcb24ce170	safekeeper,pageserver: add heap profiling (#9778 ) ## Problem We don't have good observability for memory usage. This would be useful e.g. to debug OOM incidents or optimize performance or resource usage. We would also like to use continuous profiling with e.g. [Grafana Cloud Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/) (see https://github.com/neondatabase/cloud/issues/14888). This PR is intended as a proof of concept, to try it out in staging and drive further discussions about profiling more broadly. Touches https://github.com/neondatabase/neon/issues/9534. Touches https://github.com/neondatabase/cloud/issues/14888. Depends on #9779. Depends on #9780. ## Summary of changes Adds a HTTP route `/profile/heap` that takes a heap profile and returns it. Query parameters: * `format`: output format (`jemalloc` or `pprof`; default `pprof`). Unlike CPU profiles (see #9764), heap profiles are not symbolized and require the original binary to translate addresses to function names. To make this work with Grafana, we'll probably have to symbolize the process server-side -- this is left as future work, as is other output formats like SVG. Heap profiles don't work on macOS due to limitations in jemalloc.	2024-12-03 11:35:59 +00:00
Christian Schwarz	aa4ec11af9	page_service: rewrite batching to work without a timeout (#9851 ) # Problem The timeout-based batching adds latency to unbatchable workloads. We can choose a short batching timeout (e.g. 10us) but that requires high-resolution timers, which tokio doesn't have. I thoroughly explored options to use OS timers (see [this](https://github.com/neondatabase/neon/pull/9822) abandoned PR). In short, it's not an attractive option because any timer implementation adds non-trivial overheads. # Solution The insight is that, in the steady state of a batchable workload, the time we spend in `get_vectored` will be hundreds of microseconds anyway. If we prepare the next batch concurrently to `get_vectored`, we will have a sizeable batch ready once `get_vectored` of the current batch is done and do not need an explicit timeout. This can be reasonably described as pipelining of the protocol handler. # Implementation We model the sub-protocol handler for pagestream requests (`handle_pagrequests`) as two futures that form a pipeline: 2. Batching: read requests from the connection and fill the current batch 3. Execution: `take` the current batch, execute it using `get_vectored`, and send the response. The Reading and Batching stage are connected through a new type of channel called `spsc_fold`. See the long comment in the `handle_pagerequests_pipelined` for details. # Changes - Refactor `handle_pagerequests` - separate functions for - reading one protocol message; produces a `BatchedFeMessage` with just one page request in it - batching; tried to merge an incoming `BatchedFeMessage` into an existing `BatchedFeMessage`; returns `None` on success and returns back the incoming message in case merging isn't possible - execution of a batched message - unify the timeline handle acquisition & request span construction; it now happen in the function that reads the protocol message - Implement serial and pipelined model - serial: what we had before any of the batching changes - read one protocol message - execute protocol messages - pipelined: the design described above - optionality for execution of the pipeline: either via concurrent futures vs tokio tasks - Pageserver config - remove batching timeout field - add ability to configure pipelining mode - add ability to limit max batch size for pipelined configurations (required for the rollout, cf https://github.com/neondatabase/cloud/issues/20620 ) - ability to configure execution mode - Tests - remove `batch_timeout` parametrization - rename `test_getpage_merge_smoke` to `test_throughput` - add parametrization to test different max batch sizes and execution moes - rename `test_timer_precision` to `test_latency` - rename the test case file to `test_page_service_batching.py` - better descriptions of what the tests actually do ## On the holding The `TimelineHandle` in the pending batch While batching, we hold the `TimelineHandle` in the pending batch. Therefore, the timeline will not finish shutting down while we're batching. This is not a problem in practice because the concurrently ongoing `get_vectored` call will fail quickly with an error indicating that the timeline is shutting down. This results in the Execution stage returning a `QueryError::Shutdown`, which causes the pipeline / entire page service connection to shut down. This drops all references to the `Arc<Mutex<Option<Box<BatchedFeMessage>>>>` object, thereby dropping the contained `TimelineHandle`s. - => fixes https://github.com/neondatabase/neon/issues/9850 # Performance Local run of the benchmarks, results in [this empty commit](`1cf5b1463f`) in the PR branch. Key take-aways: * `concurrent-futures` and `tasks` deliver identical `batching_factor` * tail latency impact unknown, cf https://github.com/neondatabase/neon/issues/9837 * `concurrent-futures` has higher throughput than `tasks` in all workloads (=lower `time` metric) * In unbatchable workloads, `concurrent-futures` has 5% higher `CPU-per-throughput` than that of `tasks`, and 15% higher than that of `serial`. * In batchable-32 workload, `concurrent-futures` has 8% lower `CPU-per-throughput` than that of `tasks` (comparison to tput of `serial` is irrelevant) * in unbatchable workloads, mean and tail latencies of `concurrent-futures` is practically identical to `serial`, whereas `tasks` adds 20-30us of overhead Overall, `concurrent-futures` seems like a slightly more attractive choice. # Rollout This change is disabled-by-default. Rollout plan: - https://github.com/neondatabase/cloud/issues/20620 # Refs - epic: https://github.com/neondatabase/neon/issues/9376 - this sub-task: https://github.com/neondatabase/neon/issues/9377 - the abandoned attempt to improve batching timeout resolution: https://github.com/neondatabase/neon/pull/9820 - closes https://github.com/neondatabase/neon/issues/9850 - fixes https://github.com/neondatabase/neon/issues/9835	2024-11-30 00:16:24 +00:00
Vlad Lazar	7a2f0ed8d4	safekeeper: lift decoding and interpretation of WAL to the safekeeper (#9746 ) ## Problem For any given tenant shard, pageservers receive all of the tenant's WAL from the safekeeper. This soft-blocks us from using larger shard counts due to bandwidth concerns and CPU overhead of filtering out the records. ## Summary of changes This PR lifts the decoding and interpretation of WAL from the pageserver into the safekeeper. A customised PG replication protocol is used where instead of sending raw WAL, the safekeeper sends filtered, interpreted records. The receiver drives the protocol selection, so, on the pageserver side, usage of the new protocol is gated by a new pageserver config: `wal_receiver_protocol`. More granularly the changes are: 1. Optionally inject the protocol and shard identity into the arguments used for starting replication 2. On the safekeeper side, implement a new wal sending primitive which decodes and interprets records before sending them over 3. On the pageserver side, implement the ingestion of this new replication message type. It's very similar to what we already have for raw wal (minus decoding and interpreting). ## Notes * This PR currently uses my [branch of rust-postgres](https://github.com/neondatabase/rust-postgres/tree/vlad/interpreted-wal-record-replication-support) which includes the deserialization logic for the new replication message type. PR for that is open [here](https://github.com/neondatabase/rust-postgres/pull/32). * This PR contains changes for both pageservers and safekeepers. It's safe to merge because the new protocol is disabled by default on the pageserver side. We can gradually start enabling it in subsequent releases. * CI tests are running on https://github.com/neondatabase/neon/pull/9747 ## Links Related: https://github.com/neondatabase/neon/issues/9336 Epic: https://github.com/neondatabase/neon/issues/9329	2024-11-25 17:29:28 +00:00
Erik Grinaker	190e8cebac	safekeeper,pageserver: add CPU profiling (#9764 ) ## Problem We don't have a convenient way to gather CPU profiles from a running binary, e.g. during production incidents or end-to-end benchmarks, nor during microbenchmarks (particularly on macOS). We would also like to have continuous profiling in production, likely using [Grafana Cloud Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/). We may choose to use either eBPF profiles or pprof profiles for this (pending testing and discussion with SREs), but pprof profiles appear useful regardless for the reasons listed above. See https://github.com/neondatabase/cloud/issues/14888. This PR is intended as a proof of concept, to try it out in staging and drive further discussions about profiling more broadly. Touches #9534. Touches https://github.com/neondatabase/cloud/issues/14888. ## Summary of changes Adds a HTTP route `/profile/cpu` that takes a CPU profile and returns it. Defaults to a 5-second pprof Protobuf profile for use with e.g. `pprof` or Grafana Alloy, but can also emit an SVG flamegraph. Query parameters: * `format`: output format (`pprof` or `svg`) * `frequency`: sampling frequency in microseconds (default 100) * `seconds`: number of seconds to profile (default 5) Also integrates pprof profiles into Criterion benchmarks, such that flamegraph reports can be taken with `cargo bench ... --profile-duration <seconds>`. Output under `target/criterion//profile/flamegraph.svg`. Example profiles: pprof profile (use [`pprof`](https://github.com/google/pprof)): [profile.pb.gz](https://github.com/user-attachments/files/17756788/profile.pb.gz) * Web interface: `pprof -http :6060 profile.pb.gz` * Interactive flamegraph: [profile.svg.gz](https://github.com/user-attachments/files/17756782/profile.svg.gz)	2024-11-21 18:59:46 +00:00
Arpad Müller	9d93dd4807	Rename hyper 1.0 to hyper and hyper 0.14 to hyper0 (#9254 ) Follow-up of #9234 to give hyper 1.0 the version-free name, and the legacy version of hyper the one with the version number inside. As we move away from hyper 0.14, we can remove the `hyper0` name piece by piece. Part of #9255	2024-10-03 16:33:43 +02:00
Folke Behrens	7dcfcccf7c	Re-export git-version from utils and remove as direct dep (#9138 )	2024-09-25 14:38:35 +02:00
Heikki Linnakangas	d211f00f05	Remove unnecessary dependencies (#9000 ) Found by "cargo machete"	2024-09-17 17:55:45 +03:00
Arpad Müller	8eaa8ad358	Remove async_trait usages from safekeeper and neon_local (#8864 ) Removes additional async_trait usages from safekeeper and neon_local. Also removes now redundant dependencies of the `async_trait` crate. cc earlier work: #6305, #6464, #7303, #7342, #7212, #8296	2024-08-29 18:24:25 +02:00
Conrad Ludgate	a644f01b6a	proxy+pageserver: shared leaky bucket impl (#8539 ) In proxy I switched to a leaky-bucket impl using the GCRA algorithm. I figured I could share the code with pageserver and remove the leaky_bucket crate dependency with some very basic tokio timers and queues for fairness. The underlying algorithm should be fairly clear how it works from the comments I have left in the code. --- In benchmarking pageserver, @problame found that the new implementation fixes a getpage throughput discontinuity in pageserver under the `pagebench get-page-latest-lsn` benchmark with the clickbench dataset (`test_perf_olap.py`). The discontinuity is that for any of `--num-clients={2,3,4}`, getpage throughput remains 10k. With `--num-clients=5` and greater, getpage throughput then jumps to the configured 20k rate limit. With the changes in this PR, the discontinuity is gone, and we scale throughput linearly to `--num-clients` until the configured rate limit. More context in https://github.com/neondatabase/cloud/issues/16886#issuecomment-2315257641. closes https://github.com/neondatabase/cloud/issues/16886 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-08-29 11:26:52 +00:00
Conrad Ludgate	428b105dde	remove workspace hack from libs (#8780 ) This removes workspace hack from all libs, not from any binaries. This does not change the behaviour of the hack. Running ``` cargo clean cargo build --release --bin proxy ``` Before this change took 5m16s. After this change took 3m3s. This is because this allows the build to be parallelisable much more.	2024-08-21 14:45:32 +01:00
John Spray	f5db655447	pageserver: simplify LayerAccessStats (#8431 ) ## Problem LayerAccessStats contains a lot of detail that we don't use: short histories of most recent accesses, specifics on what kind of task accessed a layer, etc. This is all stored inside a Mutex, which is locked every time something accesses a layer. ## Summary of changes - Store timestamps at a very low resolution (to the nearest second), sufficient for use on the timescales of eviction. - Pack access time and last residence change time into a single u64 - Use the high bits of the u64 for other flags, including the new layer visibility concept. - Simplify the external-facing model for access stats to just include what we now track. Note that the `HistoryBufferWithDropCounter` is removed here because it is no longer used. I do not dislike this type, we just happen not to use it for anything else at present. Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-07-24 08:17:28 +01:00
Christian Schwarz	7dcdbaa25e	remote_storage config: move handling of empty inline table `{}` to callers (#8193 ) Before this PR, `RemoteStorageConfig::from_toml` would support deserializing an empty `{}` TOML inline table to a `None`, otherwise try `Some()`. We can instead let * in proxy: let clap derive handle the Option * in PS & SK: assume that if the field is specified, it must be a valid RemtoeStorageConfig (This PR started with a much simpler goal of factoring out the `deserialize_item` function because I need that in another PR).	2024-07-02 12:53:08 +02:00
Kevin Mingtarja	a306d0a54b	implement Serialize/Deserialize for SystemTime with RFC3339 format (#7203 ) ## Problem We have two places that use a helper (`ser_rfc3339_millis`) to get serde to stringify SystemTimes into the desired format. ## Summary of changes Created a new module `utils::serde_system_time` and inside it a wrapper type `SystemTime` for `std::time::SystemTime` that serializes/deserializes to the RFC3339 format. This new type is then used in the two places that were previously using the helper for serialization, thereby eliminating the need to decorate structs. Closes #7151.	2024-04-08 15:53:07 +01:00
Tristan Partin	64c6dfd3e4	Move functions for creating/extracting tarballs into utils Useful for other code paths which will handle zstd compression and decompression.	2024-03-19 10:50:41 -05:00
Christian Schwarz	ca07fa5f8b	per-TenantShard read throttling (#6706 )	2024-02-16 21:26:59 +01:00
Arseny Sher	dbd36e40dc	Move failpoint support code to utils. To enable them in safekeeper as well.	2024-01-02 10:50:20 +04:00
Joonas Koivunen	07508fb110	fix: better Json parsing errors (#6135 ) Before any json parsing from the http api only returned errors were per field errors. Now they are done using `serde_path_to_error`, which at least helped greatly with the `disk_usage_eviction_run` used for testing. I don't think this can conflict with anything added in #5310.	2023-12-15 12:18:22 +02:00
Arpad Müller	e310533ed3	Support JWT key reload in pageserver (#5594 ) ## Problem For quickly rotating JWT secrets, we want to be able to reload the JWT public key file in the pageserver, and also support multiple JWT keys. See #4897. ## Summary of changes * Allow directories for the `auth_validation_public_key_path` config param instead of just files. for the safekeepers, all of their config options also support multiple JWT keys. * For the pageservers, make the JWT public keys easily globally swappable by using the `arc-swap` crate. * Add an endpoint to the pageserver, triggered by a POST to `/v1/reload_auth_validation_keys`, that reloads the JWT public keys from the pre-configured path (for security reasons, you cannot upload any keys yourself). Fixes #4897 --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-07 15:43:29 +01:00
duguorong009	b3d3a2587d	feat: improve the serde impl for several types(`Lsn`, `TenantId`, `TimelineId` ...) (#5335 ) Improve the serde impl for several types (`Lsn`, `TenantId`, `TimelineId`) by making them sensitive to `Serializer::is_human_readadable` (true for json, false for bincode). Fixes #3511 by: - Implement the custom serde for `Lsn` - Implement the custom serde for `Id` - Add the helper module `serde_as_u64` in `libs/utils/src/lsn.rs` - Remove the unnecessary attr `#[serde_as(as = "DisplayFromStr")]` in all possible structs Additionally some safekeeper types gained serde tests. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-11-06 11:40:03 +02:00
duguorong009	25a37215f3	fix: replace all `std::PathBuf`s with `camino::Utf8PathBuf` (#5352 ) Fixes #4689 by replacing all of `std::Path` , `std::PathBuf` with `camino::Utf8Path`, `camino::Utf8PathBuf` in - pageserver - safekeeper - control_plane - libs/remote_storage Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-10-04 17:52:23 +03:00
Arseny Sher	bc49c73fee	Move wal_stream_connection_config to utils. It will be used by safekeeper as well.	2023-08-29 23:19:40 +03:00
John Spray	3e2f0ffb11	libs: make backoff::retry() take a cancellation token (#5065 ) ## Problem Currently, anything that uses backoff::retry will delay the join of its task by however long its backoff sleep is, multiplied by its max retries. Whenever we call a function that sleeps, we should be passing in a CancellationToken. ## Summary of changes - Add a `Cancel` type to backoff::retry that wraps a CancellationToken and an error `Fn` to generate an error if the cancellation token fires. - In call sites that already run in a `task_mgr` task, use `shutdown_token()` to provide the token. In other locations, use a dead `CancellationToken` to satisfy the interface, and leave a TODO to fix it up when we broaden the use of explicit cancellation tokens.	2023-08-24 14:54:46 +03:00
Joonas Koivunen	25d2f4b669	metrics: chunked responses (#4768 ) Metrics can get really large in the order of hundreds of megabytes, which we used to buffer completly (after a few rounds of growing the buffer).	2023-07-21 15:10:55 +00:00
Alexander Bayandin	33c2d94ba6	Fix git-env version for PRs (#4641 ) ## Problem Binaries created from PRs (both in docker images and for tests) have wrong git-env versions, they point to phantom merge commits. ## Summary of changes - Prefer GIT_VERSION env variable even if git information was accessible - Use `${{ github.event.pull_request.head.sha \|\| github.sha }}` instead of `${{ github.sha }}` for `GIT_VERSION` in workflows So the builds will still happen from this phantom commit, but we will report the PR commit. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-07-10 20:01:01 +01:00
Joonas Koivunen	cff7ae0b0d	fix: no more ansi colored logs (#4613 ) Allure does not support ansi colored logs, yet `compute_ctl` has them. Upgrade criterion to get rid of atty dependency, disable ansi colors, remove atty dependency and disable ansi feature of tracing-subscriber. This is a heavy-handed approach. I am not aware of a workflow where you'd want to connect a terminal directly to for example `compute_ctl`, usually you find the logs in a file. If someone had been using colors, they will now need to: - turn the `tracing-subscriber.default-features` to `true` - edit their wanted project to have colors I decided to explicitly disable ansi colors in case we would have in future a dependency accidentally enabling the feature on `tracing-subscriber`, which would be quite surprising but not unimagineable. By getting rid of `atty` from dependencies we get rid of <https://github.com/advisories/GHSA-g98v-hv3f-hcfr>.	2023-07-03 16:37:02 +03:00
Arseny Sher	fdacfaabfd	Move PageserverFeedback to utils. It allows to replace u64 with proper Lsn and pretty print PageserverFeedback with serde(_json). Now walsenders on safekeepers queried with debug_dump look like "walsenders": [ { "ttid": "fafe0cf39a99c608c872706149de9d2a/b4fb3be6f576935e7f0fcb84bdb909a1", "addr": "127.0.0.1:48774", "conn_id": 3, "appname": "pageserver", "feedback": { "Pageserver": { "current_timeline_size": 32096256, "last_received_lsn": "0/2415298", "disk_consistent_lsn": "0/1696628", "remote_consistent_lsn": "0/0", "replytime": "2023-04-12T13:54:53.958856+00:00" } } } ],	2023-04-28 06:22:13 +04:00
Christian Schwarz	9ea7b5dd38	clean up logging around on-demand downloads (#4030 ) - Remove repeated tenant & timeline from span - Demote logging of the path to debug level - Log completion at info level, in the same function where we log errors - distinguish between layer file download success & on-demand download succeeding as a whole in the log message wording - Assert that the span contains a tenant id and a timeline id fixes https://github.com/neondatabase/neon/issues/3945 Before: ``` INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{tenant_id=$TENANT_ID timeline_id=$TIMELINE_ID layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: download complete: /storage/pageserver/data/tenants/$TENANT_ID/timelines/$TIMELINE_ID/000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91 INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{tenant_id=$TENANT_ID timeline_id=$TIMELINE_ID layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: Rebuilt layer map. Did 9 insertions to process a batch of 1 updates. ``` After: ``` INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: layer file download finished INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: Rebuilt layer map. Did 9 insertions to process a batch of 1 updates. INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: on-demand download successful ```	2023-04-27 11:54:48 +02:00
Kirill Bulatov	ebea298415	Update most of the dependencies to their latest versions (#4026 ) See https://github.com/neondatabase/neon/pull/3991 Brings the changes back with the right way to use new `toml_edit` to deserialize values and a unit test for this. All non-trivial updates extracted into separate commits, also `carho hakari` data and its manifest format were updated. 3 sets of crates remain unupdated: * `base64` — touches proxy in a lot of places and changed its api (by 0.21 version) quite strongly since our version (0.13). * `opentelemetry` and `opentelemetry-` crates ``` error[E0308]: mismatched types --> libs/tracing-utils/src/http.rs:65:21 \| 65 \| span.set_parent(parent_ctx); \| ---------- ^^^^^^^^^^ expected struct `opentelemetry_api::context::Context`, found struct `opentelemetry::Context` \| \| \| arguments to this method are incorrect \| = note: struct `opentelemetry::Context` and struct `opentelemetry_api::context::Context` have similar names, but are actually distinct types note: struct `opentelemetry::Context` is defined in crate `opentelemetry_api` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.19.0/src/context.rs:77:1 \| 77 \| pub struct Context { \| ^^^^^^^^^^^^^^^^^^ note: struct `opentelemetry_api::context::Context` is defined in crate `opentelemetry_api` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.18.0/src/context.rs:77:1 \| 77 \| pub struct Context { \| ^^^^^^^^^^^^^^^^^^ = note: perhaps two different versions of crate `opentelemetry_api` are being used? note: associated function defined here --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-opentelemetry-0.18.0/src/span_ext.rs:43:8 \| 43 \| fn set_parent(&self, cx: Context); \| ^^^^^^^^^^ For more information about this error, try `rustc --explain E0308`. error: could not compile `tracing-utils` due to previous error warning: build failed, waiting for other jobs to finish... error: could not compile `tracing-utils` due to previous error ``` `tracing-opentelemetry` of version `0.19` is not yet released, that is supposed to have the update we need. similarly, `rustls`, `tokio-rustls`, `rustls-` and `tls-listener` crates have similar issue: ``` error[E0308]: mismatched types --> libs/postgres_backend/tests/simple_select.rs:112:78 \| 112 \| let mut make_tls_connect = tokio_postgres_rustls::MakeRustlsConnect::new(client_cfg); \| --------------------------------------------- ^^^^^^^^^^ expected struct `rustls::client::client_conn::ClientConfig`, found struct `ClientConfig` \| \| \| arguments to this function are incorrect \| = note: struct `ClientConfig` and struct `rustls::client::client_conn::ClientConfig` have similar names, but are actually distinct types note: struct `ClientConfig` is defined in crate `rustls` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.21.0/src/client/client_conn.rs:125:1 \| 125 \| pub struct ClientConfig { \| ^^^^^^^^^^^^^^^^^^^^^^^ note: struct `rustls::client::client_conn::ClientConfig` is defined in crate `rustls` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.20.8/src/client/client_conn.rs:91:1 \| 91 \| pub struct ClientConfig { \| ^^^^^^^^^^^^^^^^^^^^^^^ = note: perhaps two different versions of crate `rustls` are being used? note: associated function defined here --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-postgres-rustls-0.9.0/src/lib.rs:23:12 \| 23 \| pub fn new(config: ClientConfig) -> Self { \| ^^^ For more information about this error, try `rustc --explain E0308`. error: could not compile `postgres_backend` due to previous error warning: build failed, waiting for other jobs to finish... ``` aws crates: I could not make new API to work with bucket endpoint overload, and console e2e tests failed. Other our tests passed, further investigation is worth to be done in https://github.com/neondatabase/neon/issues/4008	2023-04-14 18:28:54 +03:00
Kirill Bulatov	f7995b3c70	Revert "Update most of the dependencies to their latest versions (#3991 )" (#4013 ) This reverts commit `a64044a7a9`. See https://neondb.slack.com/archives/C03H1K0PGKH/p1681306682795559	2023-04-12 14:51:59 +00:00
Kirill Bulatov	a64044a7a9	Update most of the dependencies to their latest versions (#3991 ) All non-trivial updates extracted into separate commits, also `carho hakari` data and its manifest format were updated. 3 sets of crates remain unupdated: * `base64` — touches proxy in a lot of places and changed its api (by 0.21 version) quite strongly since our version (0.13). * `opentelemetry` and `opentelemetry-` crates ``` error[E0308]: mismatched types --> libs/tracing-utils/src/http.rs:65:21 \| 65 \| span.set_parent(parent_ctx); \| ---------- ^^^^^^^^^^ expected struct `opentelemetry_api::context::Context`, found struct `opentelemetry::Context` \| \| \| arguments to this method are incorrect \| = note: struct `opentelemetry::Context` and struct `opentelemetry_api::context::Context` have similar names, but are actually distinct types note: struct `opentelemetry::Context` is defined in crate `opentelemetry_api` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.19.0/src/context.rs:77:1 \| 77 \| pub struct Context { \| ^^^^^^^^^^^^^^^^^^ note: struct `opentelemetry_api::context::Context` is defined in crate `opentelemetry_api` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/opentelemetry_api-0.18.0/src/context.rs:77:1 \| 77 \| pub struct Context { \| ^^^^^^^^^^^^^^^^^^ = note: perhaps two different versions of crate `opentelemetry_api` are being used? note: associated function defined here --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-opentelemetry-0.18.0/src/span_ext.rs:43:8 \| 43 \| fn set_parent(&self, cx: Context); \| ^^^^^^^^^^ For more information about this error, try `rustc --explain E0308`. error: could not compile `tracing-utils` due to previous error warning: build failed, waiting for other jobs to finish... error: could not compile `tracing-utils` due to previous error ``` `tracing-opentelemetry` of version `0.19` is not yet released, that is supposed to have the update we need. similarly, `rustls`, `tokio-rustls`, `rustls-` and `tls-listener` crates have similar issue: ``` error[E0308]: mismatched types --> libs/postgres_backend/tests/simple_select.rs:112:78 \| 112 \| let mut make_tls_connect = tokio_postgres_rustls::MakeRustlsConnect::new(client_cfg); \| --------------------------------------------- ^^^^^^^^^^ expected struct `rustls::client::client_conn::ClientConfig`, found struct `ClientConfig` \| \| \| arguments to this function are incorrect \| = note: struct `ClientConfig` and struct `rustls::client::client_conn::ClientConfig` have similar names, but are actually distinct types note: struct `ClientConfig` is defined in crate `rustls` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.21.0/src/client/client_conn.rs:125:1 \| 125 \| pub struct ClientConfig { \| ^^^^^^^^^^^^^^^^^^^^^^^ note: struct `rustls::client::client_conn::ClientConfig` is defined in crate `rustls` --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/rustls-0.20.8/src/client/client_conn.rs:91:1 \| 91 \| pub struct ClientConfig { \| ^^^^^^^^^^^^^^^^^^^^^^^ = note: perhaps two different versions of crate `rustls` are being used? note: associated function defined here --> /Users/someonetoignore/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-postgres-rustls-0.9.0/src/lib.rs:23:12 \| 23 \| pub fn new(config: ClientConfig) -> Self { \| ^^^ For more information about this error, try `rustc --explain E0308`. error: could not compile `postgres_backend` due to previous error warning: build failed, waiting for other jobs to finish... ``` aws crates: I could not make new API to work with bucket endpoint overload, and console e2e tests failed. Other our tests passed, further investigation is worth to be done in https://github.com/neondatabase/neon/issues/4008	2023-04-12 15:32:38 +03:00
Christian Schwarz	a64dd3ecb5	disk-usage-based layer eviction (#3809 ) This patch adds a pageserver-global background loop that evicts layers in response to a shortage of available bytes in the $repo/tenants directory's filesystem. The loop runs periodically at a configurable `period`. Each loop iteration uses `statvfs` to determine filesystem-level space usage. It compares the returned usage data against two different types of thresholds. The iteration tries to evict layers until app-internal accounting says we should be below the thresholds. We cross-check this internal accounting with the real world by making another `statvfs` at the end of the iteration. We're good if that second statvfs shows that we're _actually_ below the configured thresholds. If we're still above one or more thresholds, we emit a warning log message, leaving it to the operator to investigate further. There are two thresholds: - `max_usage_pct` is the relative available space, expressed in percent of the total filesystem space. If the actual usage is higher, the threshold is exceeded. - `min_avail_bytes` is the absolute available space in bytes. If the actual usage is lower, the threshold is exceeded. The iteration evicts layers in LRU fashion with a reservation of up to `tenant_min_resident_size` bytes of the most recent layers per tenant. The layers not part of the per-tenant reservation are evicted least-recently-used first until we're below all thresholds. The `tenant_min_resident_size` can be overridden per tenant as `min_resident_size_override` (bytes). In addition to the loop, there is also an HTTP endpoint to perform one loop iteration synchronous to the request. The endpoint takes an absolute number of bytes that the iteration needs to evict before pressure is relieved. The tests use this endpoint, which is a great simplification over setting up loopback-mounts in the tests, which would be required to test the statvfs part of the implementation. We will rely on manual testing in staging to test the statvfs parts. The HTTP endpoint is also handy in emergencies where an operator wants the pageserver to evict a given amount of space _now. Hence, it's arguments documented in openapi_spec.yml. The response type isn't documented though because we don't consider it stable. The endpoint should _not_ be used by Console but it could be used by on-call. Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Dmitry Rodionov <dmitry@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2023-03-31 14:47:57 +03:00
Arthur Petukhovsky	d9a1329834	Make postgres_backend use generic IO type (#3789 ) - Support measuring inbound and outbound traffic in MeasuredStream - Start using MeasuredStream in safekeepers code	2023-03-13 12:18:10 +03:00
Arseny Sher	0d8ced8534	Remove sync postgres_backend, tidy up its split usage. - Add support for splitting async postgres_backend into read and write halfes. Safekeeper needs this for bidirectional streams. To this end, encapsulate reading-writing postgres messages to framed.rs with split support without any additional changes (relying on BufRead for reading and BytesMut out buffer for writing). - Use async postgres_backend throughout safekeeper (and in proxy auth link part). - In both safekeeper COPY streams, do read-write from the same thread/task with select! for easier error handling. - Tidy up finishing CopyBoth streams in safekeeper sending and receiving WAL -- join split parts back catching errors from them before returning. Initially I hoped to do that read-write without split at all, through polling IO: https://github.com/neondatabase/neon/pull/3522 However that turned out to be more complicated than I initially expected due to 1) borrow checking and 2) anon Future types. 1) required Rc<Refcell<...>> which is Send construct just to satisfy the checker; 2) can be workaround with transmute. But this is so messy that I decided to leave split.	2023-03-09 20:45:56 +03:00
Arseny Sher	7627d85345	Move async postgres_backend to its own crate. To untie cyclic dependency between sync and async versions of postgres_backend, copy QueryError and some logging/error routines to postgres_backend.rs. This is temporal glue to make commits smaller, sync version will be dropped by the upcoming commit completely.	2023-03-09 20:45:56 +03:00
Shany Pozin	d19c5248c9	Add UUID header to mgmt API (#3708 ) ## Describe your changes ## Issue ticket number and link #3479 ## Checklist before requesting a review - [x] I have performed a self-review of my code. - [ ] If it is a core feature, I have added thorough tests. - [ ] Do we need to implement analytics? if so did you add the relevant metrics to the dashboard? - [ ] If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.	2023-03-01 18:09:08 +02:00
Christian Schwarz	58fa4f0eb7	maintain access stats for historic layers This patch adds basic access statistics for historic layers and exposes them in the management API's `LayerMapInfo`. We record the accesses in the `{Delta,Image}Layer::load()` function because it's the common path of * page_service (`Timline::get_reconstruct_data()`) * Compaction (`PersistentLayer::iter()` and `PersistentLayer::key_iter()`) The stats survive residence status changes, and record these as well. When scraping the layer map endpoint to record its evolution over time, one must account for stat resets because they are in-memory only and will reset on pageserver restart. Use the launch timestamp header added by (#3527) to identify pageserver restarts. This is PR https://github.com/neondatabase/neon/pull/3496	2023-02-06 17:01:38 +01:00
Christian Schwarz	590695e845	improve query param parsing - add parse_query_param() - use Cow<> where possible - move param parsing code to utils::http::request This was originally PR https://github.com/neondatabase/neon/pull/3502 which targeted a different branch. closes #3510	2023-02-01 14:11:12 +01:00
Kirill Bulatov	fd18692dfb	Output coloured pageserver logs for tty stdout	2023-01-24 09:58:08 +02:00
Kirill Bulatov	bce4233d3a	Rework Cargo.toml dependencies (#3322 ) * Use workspace variables from cargo, coming with rustc [1.64](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1640-2022-09-22) See https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-package-table and https://doc.rust-lang.org/nightly/cargo/reference/workspaces.html#the-dependencies-table sections. Now, all dependencies in all non-root `Cargo.toml` files are defined as ``` clap.workspace = true ``` sometimes, when extra features are needed, as ``` bytes = {workspace = true, features = ['serde'] } ``` With the actual declarations (with shared features and version numbers/file paths/etc.) in the root Cargo.toml. Features are additive: https://doc.rust-lang.org/nightly/cargo/reference/specifying-dependencies.html#inheriting-a-dependency-from-a-workspace * Uses the mechanism above to set common, 2021, edition and license across the workspace * Mechanically bumps a few dependencies * Updates hakari format, as it suggested: ``` work/neon/neon kb/cargo-templated ❯ cargo hakari generate info: no changes detected info: new hakari format version available: 3 (current: 2) (add or update `dep-format-version = "3"` in hakari.toml, then run `cargo hakari generate && cargo hakari manage-deps`) ```	2023-01-13 18:13:34 +02:00

1 2

63 Commits