rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-05 20:42:54 +00:00

Author	SHA1	Message	Date
Arpad Müller	0330b61729	Azure SDK: use neon branch again (#10844 ) Originally I wanted to switch back to the `neon` branch before merging #10825, but I forgot to do it. Do it in a separate PR now. No actual change of the source code, only changes the branch name (so that maybe in a few weeks we can delete the temporary branch `arpad/neon-rebase`).	2025-02-17 14:59:01 +00:00
Arpad Müller	81f08d304a	Rebase Azure SDK and apply newest patch (#10825 ) The [upstream PR](https://github.com/Azure/azure-sdk-for-rust/pull/1997) has been merged with some changes to use threads with async, so apply them to the neon specific fork to be nice to the executor (before, we had the state as of filing of that PR). Also, rebase onto the latest version of upstream's `legacy` branch. current SDK commits: [link](https://github.com/neondatabase/azure-sdk-for-rust/commits/neon-2025-02-14) now: [link](https://github.com/neondatabase/azure-sdk-for-rust/commits/arpad/neon-refresh) Prior update was in #10790	2025-02-17 10:44:44 +00:00
Heikki Linnakangas	635b67508b	Split utils::http to separate crate (#10753 ) Avoids compiling the crate and its dependencies into binaries that don't need them. Shrinks the compute_ctl binary from about 31MB to 28MB in the release-line-debug-size-lto profile.	2025-02-11 22:06:53 +00:00
Folke Behrens	dcf335a251	proxy: Switch proxy to JSON logging (#9857 ) ## Problem We want to switch proxy and ideally all Rust services to structured JSON logging to support better filtering and cross-referencing with tracing. ## Summary of changes * Introduce a custom tracing-subscriber to write the JSON. In a first attempt a customized tracing::fmt::FmtSubscriber was used, but it's very inefficient and can still generate invalid JSON. It's also doesn't allow us to add important fields to the root object. * Make this opt in: the `LOGFMT` env var can be set to `"json"` to enable to new logger at startup.	2025-02-04 14:50:53 +00:00
Folke Behrens	6041a93591	Update tokio base crates (#10556 ) Update `tokio` base crates and their deps. Pin `tokio` to at least 1.41 which stabilized task ID APIs. To dedup `mio` dep the `notify` crate is updated. It's used in `compute_tools`. `9f81828429/compute_tools/src/pg_helpers.rs (L258-L367)`	2025-01-31 09:54:31 +00:00
Conrad Ludgate	738bf83583	chore: replace dashmap with clashmap (#10582 ) ## Problem Because dashmap 6 switched to hashbrown RawTable API, it required us to use unsafe code in the upgrade: https://github.com/neondatabase/neon/pull/8107 ## Summary of changes Switch to clashmap, a fork maintained by me which removes much of the unsafe and ultimately switches to HashTable instead of RawTable to remove much of the unsafe requirement on us.	2025-01-31 09:53:43 +00:00
Tristan Partin	15fecb8474	Update axum to 0.8.1 (#10332 ) Only a few things that needed updating: - async_trait was removed - Message::Text takes a Utf8Bytes object instead of a String Signed-off-by: Tristan Partin <tristan@neon.tech> Co-authored-by: Conrad Ludgate <connor@neon.tech>	2025-01-28 15:32:59 +00:00
Conrad Ludgate	2b49d6ee05	feat: adjust the tonic features to remove axum dependency (#10348 ) To help facilitate an upgrade to axum 0.8 (https://github.com/neondatabase/neon/pull/10332#pullrequestreview-2541989619) this massages the tonic dependency features so that tonic does not depend on axum.	2025-01-22 09:15:52 +00:00
Erik Grinaker	2b8ea1e768	utils: add flamegraph for heap profiles (#10223 ) ## Problem Unlike CPU profiles, the `/profile/heap` endpoint can't automatically generate SVG flamegraphs. This requires the user to install and use `pprof` tooling, which is unnecessary and annoying. Resolves #10203. ## Summary of changes Add `format=svg` for the `/profile/heap` route, and generate an SVG flamegraph using the `inferno` crate, similarly to what `pprof-rs` already does for CPU profiles.	2025-01-10 12:14:29 +00:00
Folke Behrens	b6205af4a5	Update tracing/otel crates (#10311 ) Update the tracing(-x) and opentelemetry(-x) crates. Some breaking changes require updating our code: * Initialization is done via builders now https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-otlp/CHANGELOG.md#0270 * Errors from OTel SDK are logged via tracing crate as well. https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry/CHANGELOG.md#0270	2025-01-10 08:48:03 +00:00
Tristan Partin	49756a0d01	Implement compute_ctl management API in Axum (#10099 ) This is a refactor to create better abstractions related to our management server. It cleans up the code, and prepares everything for authorized communication to and from the control plane. Signed-off-by: Tristan Partin <tristan@neon.tech>	2025-01-09 20:08:26 +00:00
Erik Grinaker	95f1920231	cargo: build with frame pointers (#10226 ) ## Problem Frame pointers are typically disabled by default (depending on CPU architecture), to improve performance. This frees up a CPU register, and avoids a couple of instructions per function call. However, it makes stack unwinding much more inefficient, since it has to use DWARF debug information instead, and gives worse results with e.g. `perf` and eBPF profiles. The `backtrace` implementation of `libunwind` is also suspected to cause seg faults. The performance benefit of frame pointer omission doesn't appear to matter that much on modern 64-bit CPU architectures (which have plenty of registers and optimized instruction execution), and benchmarks did not show measurable overhead. The Rust standard library and jemalloc already enable frame pointers by default. For more information, see https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html. Resolves #10224. Touches #10225. ## Summary of changes Enable frame pointers in all builds, and use frame pointers for pprof-rs stack sampling.	2025-01-06 17:27:08 +00:00
Erik Grinaker	a55853f67f	utils: symbolize heap profiles (#10153 ) ## Problem Jemalloc heap profiles aren't symbolized. This is inconvenient, and doesn't work with Grafana Cloud Profiles. Resolves #9964. ## Summary of changes Symbolize the heap profiles in-process, and strip unnecessary cruft. This uses about 100 MB additional memory to cache the DWARF information, but I believe this is already the case with CPU profiles, which use the same library for symbolization. With cached DWARF information, the symbolization CPU overhead is negligible. Example profiles: * [pageserver.pb.gz](https://github.com/user-attachments/files/18141395/pageserver.pb.gz) * [safekeeper.pb.gz](https://github.com/user-attachments/files/18141396/safekeeper.pb.gz)	2024-12-17 16:51:58 +00:00
Arseny Sher	1ed0e52bc8	Extract safekeeper http client to separate crate. (#10140 ) ## Problem We want to use safekeeper http client in storage controller and neon_local. ## Summary of changes Extract it to separate crate. No functional changes.	2024-12-16 12:07:24 +00:00
Arpad Müller	e74e7aac93	Use updated patched azure SDK crates (#10036 ) For a while already, we've been unable to update the Azure SDK crates due to Azure adopting use of a non-tokio async runtime, see #7545. The effort to upstream the fix got stalled, and I think it's better to switch to a patched version of the SDK that is up to date. Now we have a fork of the SDK under the neondatabase github org, to which I have applied Conrad's rebased patches to: https://github.com/neondatabase/azure-sdk-for-rust/tree/neon . The existence of a fork will also help with shipping bulk delete support before it's upstreamed (#7931). Also, in related news, the Azure SDK has gotten a rift in development, where the main branch pertains to a future, to-be-officially-blessed release of the SDK, and the older versions, which we are currently using, are on the `legacy` branch. Upstream doesn't really want patches for the `legacy` branch any more, they want to focus on the `main` efforts. However, even then, the `legacy` branch is still newer than what we are having right now, so let's switch to `legacy` for now. Depending on how long it takes, we can switch to the official version of the SDK once it's released or switch to the upstream `main` branch if there is changes we want before that. As a nice side effect of this PR, we now use reqwest 0.12 everywhere, dropping the dependency on version 0.11. Fixes #7545	2024-12-09 15:50:06 +00:00
Conrad Ludgate	bd52822e14	feat(proxy): add option to forward startup params (#9979 ) (stacked on #9990 and #9995) Partially fixes #1287 with a custom option field to enable the fixed behaviour. This allows us to gradually roll out the fix without silently changing the observed behaviour for our customers. related to https://github.com/neondatabase/cloud/issues/15284	2024-12-04 12:58:35 +00:00
Erik Grinaker	dcb24ce170	safekeeper,pageserver: add heap profiling (#9778 ) ## Problem We don't have good observability for memory usage. This would be useful e.g. to debug OOM incidents or optimize performance or resource usage. We would also like to use continuous profiling with e.g. [Grafana Cloud Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/) (see https://github.com/neondatabase/cloud/issues/14888). This PR is intended as a proof of concept, to try it out in staging and drive further discussions about profiling more broadly. Touches https://github.com/neondatabase/neon/issues/9534. Touches https://github.com/neondatabase/cloud/issues/14888. Depends on #9779. Depends on #9780. ## Summary of changes Adds a HTTP route `/profile/heap` that takes a heap profile and returns it. Query parameters: * `format`: output format (`jemalloc` or `pprof`; default `pprof`). Unlike CPU profiles (see #9764), heap profiles are not symbolized and require the original binary to translate addresses to function names. To make this work with Grafana, we'll probably have to symbolize the process server-side -- this is left as future work, as is other output formats like SVG. Heap profiles don't work on macOS due to limitations in jemalloc.	2024-12-03 11:35:59 +00:00
Folke Behrens	243bca1c49	Bump OTel, tracing, reqwest crates (#9970 )	2024-12-02 17:24:48 +00:00
Christian Schwarz	aa4ec11af9	page_service: rewrite batching to work without a timeout (#9851 ) # Problem The timeout-based batching adds latency to unbatchable workloads. We can choose a short batching timeout (e.g. 10us) but that requires high-resolution timers, which tokio doesn't have. I thoroughly explored options to use OS timers (see [this](https://github.com/neondatabase/neon/pull/9822) abandoned PR). In short, it's not an attractive option because any timer implementation adds non-trivial overheads. # Solution The insight is that, in the steady state of a batchable workload, the time we spend in `get_vectored` will be hundreds of microseconds anyway. If we prepare the next batch concurrently to `get_vectored`, we will have a sizeable batch ready once `get_vectored` of the current batch is done and do not need an explicit timeout. This can be reasonably described as pipelining of the protocol handler. # Implementation We model the sub-protocol handler for pagestream requests (`handle_pagrequests`) as two futures that form a pipeline: 2. Batching: read requests from the connection and fill the current batch 3. Execution: `take` the current batch, execute it using `get_vectored`, and send the response. The Reading and Batching stage are connected through a new type of channel called `spsc_fold`. See the long comment in the `handle_pagerequests_pipelined` for details. # Changes - Refactor `handle_pagerequests` - separate functions for - reading one protocol message; produces a `BatchedFeMessage` with just one page request in it - batching; tried to merge an incoming `BatchedFeMessage` into an existing `BatchedFeMessage`; returns `None` on success and returns back the incoming message in case merging isn't possible - execution of a batched message - unify the timeline handle acquisition & request span construction; it now happen in the function that reads the protocol message - Implement serial and pipelined model - serial: what we had before any of the batching changes - read one protocol message - execute protocol messages - pipelined: the design described above - optionality for execution of the pipeline: either via concurrent futures vs tokio tasks - Pageserver config - remove batching timeout field - add ability to configure pipelining mode - add ability to limit max batch size for pipelined configurations (required for the rollout, cf https://github.com/neondatabase/cloud/issues/20620 ) - ability to configure execution mode - Tests - remove `batch_timeout` parametrization - rename `test_getpage_merge_smoke` to `test_throughput` - add parametrization to test different max batch sizes and execution moes - rename `test_timer_precision` to `test_latency` - rename the test case file to `test_page_service_batching.py` - better descriptions of what the tests actually do ## On the holding The `TimelineHandle` in the pending batch While batching, we hold the `TimelineHandle` in the pending batch. Therefore, the timeline will not finish shutting down while we're batching. This is not a problem in practice because the concurrently ongoing `get_vectored` call will fail quickly with an error indicating that the timeline is shutting down. This results in the Execution stage returning a `QueryError::Shutdown`, which causes the pipeline / entire page service connection to shut down. This drops all references to the `Arc<Mutex<Option<Box<BatchedFeMessage>>>>` object, thereby dropping the contained `TimelineHandle`s. - => fixes https://github.com/neondatabase/neon/issues/9850 # Performance Local run of the benchmarks, results in [this empty commit](`1cf5b1463f`) in the PR branch. Key take-aways: * `concurrent-futures` and `tasks` deliver identical `batching_factor` * tail latency impact unknown, cf https://github.com/neondatabase/neon/issues/9837 * `concurrent-futures` has higher throughput than `tasks` in all workloads (=lower `time` metric) * In unbatchable workloads, `concurrent-futures` has 5% higher `CPU-per-throughput` than that of `tasks`, and 15% higher than that of `serial`. * In batchable-32 workload, `concurrent-futures` has 8% lower `CPU-per-throughput` than that of `tasks` (comparison to tput of `serial` is irrelevant) * in unbatchable workloads, mean and tail latencies of `concurrent-futures` is practically identical to `serial`, whereas `tasks` adds 20-30us of overhead Overall, `concurrent-futures` seems like a slightly more attractive choice. # Rollout This change is disabled-by-default. Rollout plan: - https://github.com/neondatabase/cloud/issues/20620 # Refs - epic: https://github.com/neondatabase/neon/issues/9376 - this sub-task: https://github.com/neondatabase/neon/issues/9377 - the abandoned attempt to improve batching timeout resolution: https://github.com/neondatabase/neon/pull/9820 - closes https://github.com/neondatabase/neon/issues/9850 - fixes https://github.com/neondatabase/neon/issues/9835	2024-11-30 00:16:24 +00:00
Conrad Ludgate	1d642d6a57	chore(proxy): vendor a subset of rust-postgres (#9930 ) Our rust-postgres fork is getting messy. Mostly because proxy wants more control over the raw protocol than tokio-postgres provides. As such, it's diverging more and more. Storage and compute also make use of rust-postgres, but in more normal usage, thus they don't need our crazy changes. Idea: * proxy maintains their subset * other teams use a minimal patch set against upstream rust-postgres Reviewing this code will be difficult. To implement it, I 1. Copied tokio-postgres, postgres-protocol and postgres-types from `00940fcdb5` 2. Updated their package names with the `2` suffix to make them compile in the workspace. 3. Updated proxy to use those packages 4. Copied in the code from tokio-postgres-rustls 0.13 (with some patches applied https://github.com/jbg/tokio-postgres-rustls/pull/32 https://github.com/jbg/tokio-postgres-rustls/pull/33) 5. Removed as much dead code as I could find in the vendored libraries 6. Updated the tokio-postgres-rustls code to use our existing channel binding implementation	2024-11-29 11:08:01 +00:00
Christian Schwarz	450be26bbb	fast imports: initial Importer and Storage changes (#9218 ) Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: Stas Kelvic <stas@neon.tech> # Context This PR contains PoC-level changes for a product feature that allows onboarding large databases into Neon without going through the regular data path. # Changes This internal RFC provides all the context * https://github.com/neondatabase/cloud/pull/19799 In the language of the RFC, this PR covers * the Importer code (`fast_import`) * all the Pageserver changes (mgmt API changes, flow implementation, etc) * a basic test for the Pageserver changes # Reviewing As acknowledged in the RFC, the code added in this PR is not ready for general availability. Also, the architecture is not to be discussed in this PR, but in the RFC and associated Slack channel instead. Reviewers of this PR should take that into consideration. The quality bar to apply during review depends on what area of the code is being reviewed: * Importer code (`fast_import`): practically anything goes * Core flow (`flow.rs`): * Malicious input data must be expected and the existing threat models apply. * The code must not be safe to execute on dedicated Pageserver instances: * This means in particular that tenants on other Pageserver instances must not be affected negatively wrt data confidentiality, integrity or availability. * Other code: the usual quality bar * Pay special attention to correct use of gate guards, timeline cancellation in all places during shutdown & migration, etc. * Consider the broader system impact; if you find potentially problematic interactions with Storage features that were not covered in the RFC, bring that up during the review. I recommend submitting three separate reviews, for the three high-level areas with different quality bars. # References (Internal-only) * refs https://github.com/neondatabase/cloud/issues/17507 * refs https://github.com/neondatabase/company_projects/issues/293 * refs https://github.com/neondatabase/company_projects/issues/309 * refs https://github.com/neondatabase/cloud/issues/20646 --------- Co-authored-by: Stas Kelvich <stas.kelvich@gmail.com> Co-authored-by: Heikki Linnakangas <heikki@neon.tech> Co-authored-by: John Spray <john@neon.tech>	2024-11-22 22:47:06 +00:00
Ivan Efremov	3b1ac8b14a	proxy: Implement cancellation rate limiting (#9739 ) Implement cancellation rate limiting and ip allowlist checks. Add ip_allowlist to the cancel closure Fixes [#16456](https://github.com/neondatabase/cloud/issues/16456)	2024-11-22 16:46:38 +00:00
Erik Grinaker	190e8cebac	safekeeper,pageserver: add CPU profiling (#9764 ) ## Problem We don't have a convenient way to gather CPU profiles from a running binary, e.g. during production incidents or end-to-end benchmarks, nor during microbenchmarks (particularly on macOS). We would also like to have continuous profiling in production, likely using [Grafana Cloud Profiles](https://grafana.com/products/cloud/profiles-for-continuous-profiling/). We may choose to use either eBPF profiles or pprof profiles for this (pending testing and discussion with SREs), but pprof profiles appear useful regardless for the reasons listed above. See https://github.com/neondatabase/cloud/issues/14888. This PR is intended as a proof of concept, to try it out in staging and drive further discussions about profiling more broadly. Touches #9534. Touches https://github.com/neondatabase/cloud/issues/14888. ## Summary of changes Adds a HTTP route `/profile/cpu` that takes a CPU profile and returns it. Defaults to a 5-second pprof Protobuf profile for use with e.g. `pprof` or Grafana Alloy, but can also emit an SVG flamegraph. Query parameters: * `format`: output format (`pprof` or `svg`) * `frequency`: sampling frequency in microseconds (default 100) * `seconds`: number of seconds to profile (default 5) Also integrates pprof profiles into Criterion benchmarks, such that flamegraph reports can be taken with `cargo bench ... --profile-duration <seconds>`. Output under `target/criterion//profile/flamegraph.svg`. Example profiles: pprof profile (use [`pprof`](https://github.com/google/pprof)): [profile.pb.gz](https://github.com/user-attachments/files/17756788/profile.pb.gz) * Web interface: `pprof -http :6060 profile.pb.gz` * Interactive flamegraph: [profile.svg.gz](https://github.com/user-attachments/files/17756782/profile.svg.gz)	2024-11-21 18:59:46 +00:00
Erik Grinaker	8880134171	Cargo.toml: upgrade tikv-jemallocator to 0.6.0 (#9779 )	2024-11-17 19:52:05 +01:00
Vlad Lazar	21282aa113	cargo: use neon branch of rust-postgres (#9757 ) ## Problem We are pining our fork of rust-postgres to a commit hash and that prevents us from making further changes to it. The latest commit in rust-postgres requires https://github.com/neondatabase/neon/pull/8747, but that seems to have gone stale. I reverted rust-postgres `neon` branch to the pinned commit in https://github.com/neondatabase/rust-postgres/pull/31. ## Summary of changes Switch back to using the `neon` branch of the rust-postgres fork.	2024-11-14 15:16:43 +00:00
Conrad Ludgate	fcde40d600	[proxy] use the proxy protocol v2 command to silence some logs (#9620 ) The PROXY Protocol V2 offers a "command" concept. It can be of two different values. "Local" and "Proxy". The spec suggests that "Local" be used for health-checks. We can thus use this to silence logging for such health checks such as those from NLB. This additionally refactors the flow to be a bit more type-safe, self documenting and using zerocopy deser.	2024-11-05 17:23:00 +00:00
Conrad Ludgate	3dcdbcc34d	remove aws-lc-rs dep and fix storage_broker tls (#9613 ) It seems the ecosystem is not so keen on moving to aws-lc-rs as it's build setup is more complicated than ring (requiring cmake). Eventually I expect the ecosystem should pivot to https://github.com/ctz/graviola/tree/main/rustls-graviola as it stabilises (it has a very simply build step and license), but for now let's try not have a headache of juggling two crypto libs. I also noticed that tonic will just fail with tls without a default provider, so I added some defensive code for that.	2024-11-04 13:29:13 +00:00
Vlad Lazar	07b974480c	pageserver: move things around to prepare for decoding logic (#9504 ) ## Problem We wish to have high level WAL decoding logic in `wal_decoder::decoder` module. ## Summary of Changes For this we need the `Value` and `NeonWalRecord` types accessible there, so: 1. Move `Value` and `NeonWalRecord` to `pageserver::value` and `pageserver::record` respectively. 2. Get rid of `pageserver::repository` (follow up from (1)) 3. Move PG specific WAL record types to `postgres_ffi::walrecord`. In theory they could live in `wal_decoder`, but it would create a circular dependency between `wal_decoder` and `postgres_ffi`. Long term it makes sense for those types to be PG version specific, so that will work out nicely. 4. Move higher level WAL record types (to be ingested by pageserver) into `wal_decoder::models` Related: https://github.com/neondatabase/neon/issues/9335 Epic: https://github.com/neondatabase/neon/issues/9329	2024-10-29 10:00:34 +00:00
Conrad Ludgate	d762ad0883	update rustls (#9396 ) The forever ongoing effort of juggling multiple versions of rustls :3 now with new crypto library aws-lc. Because of dependencies, it is currently impossible to not have both ring and aws-lc in the dep tree, therefore our only options are not updating rustls or having both crypto backends enabled... According to benchmarks run by the rustls maintainer, aws-lc is faster than ring in some cases too <https://jbp.io/graviola/>, so it's not without its upsides,	2024-10-17 20:45:37 +01:00
Conrad Ludgate	d92d36a315	[local_proxy] update api for pg_session_jwt (#9359 ) pg_session_jwt now: 1. Sets the JWK in a PGU_BACKEND session guc, no longer in the init() function. 2. JWK no longer needs the kid.	2024-10-15 12:13:57 +00:00
Arpad Müller	912d47ec02	storage_broker: update hyper and tonic again (#9299 ) Update hyper and tonic again in the storage broker, this time with a fix for the issue that made us revert the update last time. The first commit is a revert of #9268, the second a fix for the issue. fixes #9231.	2024-10-07 21:12:13 +02:00
Erik Grinaker	60fb840e1f	Cargo.toml: enable `sso` for `aws-config` (#9261 ) ## Problem The S3 tests couldn't use SSO authentication for local tests against S3. ## Summary of changes Enable the `sso` feature of `aws-config`. Also run `cargo hakari generate` which made some updates to `workspace_hack`.	2024-10-04 11:27:06 +01:00
Arpad Müller	e3d6ecaeee	Revert hyper and tonic updates (#9268 )	2024-10-03 19:21:22 +01:00
Arpad Müller	9d93dd4807	Rename hyper 1.0 to hyper and hyper 0.14 to hyper0 (#9254 ) Follow-up of #9234 to give hyper 1.0 the version-free name, and the legacy version of hyper the one with the version number inside. As we move away from hyper 0.14, we can remove the `hyper0` name piece by piece. Part of #9255	2024-10-03 16:33:43 +02:00
Heikki Linnakangas	53b6e1a01c	vm-monitor: Upgrade axum from 0.6 to 0.7 (#9257 ) Because: - it's nice to be up-to-date, - we already had axum 0.7 in our dependency tree, so this avoids having to compile two versions, and - removes one of the remaining dpendencies to hyper version 0 Also bumps the 'tokio-tungstenite' dependency, to avoid having two versions in the dependency tree.	2024-10-03 16:49:39 +03:00
Arpad Müller	1b176fe74a	Use hyper 1.0 and tonic 0.12 in storage broker (#9234 ) Fixes #9231 . Upgrade hyper to 1.4.0 and use hyper 1.4 instead of 0.14 in the storage broker, together with tonic 0.12. The two upgrades go hand in hand. Thanks to the broker being independent from other components, we can upgrade its hyper version without touching the other components, which makes things easier.	2024-10-03 00:48:12 +02:00
Arpad Müller	387e569259	Update aws SDK crates (#9233 ) This updates the aws SDK crates to their newest released versions.	2024-10-02 08:00:08 +02:00
Folke Behrens	2e508b1ff9	Upgrade OpenTelemetry and other tracing crates (#9200 ) * tracing-utils now returns a `Layer` impl. Removes the need for crates to import OTel crates. * Drop the /v1/traces URI check. Verified that the code does the right thing. * Leave a TODO to hook in an error handler for OTel to log errors to when it assumes the regular pipeline cannot be used/is broken.	2024-10-01 11:02:54 +02:00
Heikki Linnakangas	e16e82749f	Remove unused crates from workspace Cargo.toml These were not referenced in any of the other Cargo.toml files in the workspace. They were not being built because of that, so there was little harm in having them listed, but let's be tidy.	2024-09-23 00:37:41 +03:00
Heikki Linnakangas	913af44219	Update "memoffset" crate To eliminate one version of it from our dependency tree.	2024-09-23 00:37:41 +03:00
Heikki Linnakangas	ecd615ab6d	Update "hostname" crate We were already building v0.4.0 as an indirect dependency, so this avoids having to build two different versions of it.	2024-09-23 00:37:41 +03:00
Heikki Linnakangas	982b376ea2	Update parquet crate to a released version (#8961 ) PR #7782 set the dependency in Cargo.toml to 'master', and locked the version to commit that contained a specific fix, because we needed the fix before it was included in a versioned release. The fix was later included in parquet crate version 52.0.0, so we can now switch back to using a released version. The latest release is 53.0.0, switch straight to that. --------- Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>	2024-09-10 00:04:00 +03:00
Heikki Linnakangas	2d885ac07a	Update strum (#8962 ) I wanted to use some features from the newer version. The PR that needed the new version is not ready yet (and might never be), but seems nice to stay up in any case.	2024-09-08 21:47:57 +03:00
Heikki Linnakangas	89c5e80b3f	Update toml and toml_edit crates (#8963 ) Eliminates a few duplicate versions from the dependency tree.	2024-09-08 21:47:23 +03:00
Heikki Linnakangas	93ec7503e0	Lock the correct revision of rust-postgres crates (#8960 ) We modified the crate in an incompatible way and upgraded to the new version in PR #8076. However, it was reverted in #8654. The revert reverted the Cargo.lock reference to it, but since Cargo.toml still points to the (tip of the) 'neon' branch, every time you make any other unrelated changes to Cargo.toml, it also tries to update the rust-postgres crates to the tip of the 'neon' branch again, which doesn't work. To fix, lock the crates to the exact commit SHA that works.	2024-09-07 14:11:36 +01:00
Arpad Müller	a1323231bc	Update Rust to 1.81.0 (#8939 ) We keep the practice of keeping the compiler up to date, pointing to the latest release. This is done by many other projects in the Rust ecosystem as well. [Release notes](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1810-2024-09-05). Prior update was in #8667 and #8518	2024-09-06 12:40:19 +02:00
Christian Schwarz	cf11c8ab6a	update svg_fmt to 0.4.3 (#8930 ) Audited ``` diff -r -u ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/svg_fmt-0.4.{2,3} ``` fixes https://github.com/neondatabase/neon/issues/7763	2024-09-06 10:52:29 +02:00
Christian Schwarz	850421ec06	refactor(pageserver): rely on serde derive for toml deserialization (#7656 ) This PR simplifies the pageserver configuration parsing as follows: * introduce the `pageserver_api::config::ConfigToml` type * implement `Default` for `ConfigToml` * use serde derive to do the brain-dead leg-work of processing the toml document * use `serde(default)` to fill in default values * in `pageserver` crate: * use `toml_edit` to deserialize the pageserver.toml string into a `ConfigToml` * `PageServerConfig::parse_and_validate` then * consumes the `ConfigToml` * destructures it exhaustively into its constituent fields * constructs the `PageServerConfig` The rules are: * in `ConfigToml`, use `deny_unknown_fields` everywhere * static default values go in `pageserver_api` * if there cannot be a static default value (e.g. which default IO engine to use, because it depends on the runtime), make the field in `ConfigToml` an `Option` * if runtime-augmentation of a value is needed, do that in `parse_and_validate` * a good example is `virtual_file_io_engine` or `l0_flush`, both of which need to execute code to determine the effective value in `PageServerConf` The benefits: * massive amount of brain-dead repetitive code can be deleted * "unused variable" compile-time errors when removing a config value, due to the exhaustive destructuring in `parse_and_validate` * compile-time errors guide you when adding a new config field Drawbacks: * serde derive is sometimes a bit too magical * `deny_unknown_fields` is easy to miss Future Work / Benefits: * make `neon_local` use `pageserver_api` to construct `ConfigToml` and write it to `pageserver.toml` * This provides more type safety / coompile-time errors than the current approach. ### Refs Fixes #3682 ### Future Work * `remote_storage` deser doesn't reject unknown fields https://github.com/neondatabase/neon/issues/8915 * clean up `libs/pageserver_api/src/config.rs` further * break up into multiple files, at least for tenant config * move `models` as appropriate / refine distinction between config and API models / be explicit about when it's the same * use `pub(crate)` visibility on `mod defaults` to detect stale values	2024-09-05 14:59:49 +02:00
Conrad Ludgate	a644f01b6a	proxy+pageserver: shared leaky bucket impl (#8539 ) In proxy I switched to a leaky-bucket impl using the GCRA algorithm. I figured I could share the code with pageserver and remove the leaky_bucket crate dependency with some very basic tokio timers and queues for fairness. The underlying algorithm should be fairly clear how it works from the comments I have left in the code. --- In benchmarking pageserver, @problame found that the new implementation fixes a getpage throughput discontinuity in pageserver under the `pagebench get-page-latest-lsn` benchmark with the clickbench dataset (`test_perf_olap.py`). The discontinuity is that for any of `--num-clients={2,3,4}`, getpage throughput remains 10k. With `--num-clients=5` and greater, getpage throughput then jumps to the configured 20k rate limit. With the changes in this PR, the discontinuity is gone, and we scale throughput linearly to `--num-clients` until the configured rate limit. More context in https://github.com/neondatabase/cloud/issues/16886#issuecomment-2315257641. closes https://github.com/neondatabase/cloud/issues/16886 --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech>	2024-08-29 11:26:52 +00:00
Christian Schwarz	9627747d35	bypass `PageCache` for `InMemoryLayer` + avoid `Value::deser` on L0 flush (#8537 ) Part of [Epic: Bypass PageCache for user data blocks](https://github.com/neondatabase/neon/issues/7386). # Problem `InMemoryLayer` still uses the `PageCache` for all data stored in the `VirtualFile` that underlies the `EphemeralFile`. # Background Before this PR, `EphemeralFile` is a fancy and (code-bloated) buffered writer around a `VirtualFile` that supports `blob_io`. The `InMemoryLayerInner::index` stores offsets into the `EphemeralFile`. At those offset, we find a varint length followed by the serialized `Value`. Vectored reads (`get_values_reconstruct_data`) are not in fact vectored - each `Value` that needs to be read is read sequentially. The `will_init` bit of information which we use to early-exit the `get_values_reconstruct_data` for a given key is stored in the serialized `Value`, meaning we have to read & deserialize the `Value` from the `EphemeralFile`. The L0 flushing also needs to re-determine the `will_init` bit of information, by deserializing each value during L0 flush. # Changes 1. Store the value length and `will_init` information in the `InMemoryLayer::index`. The `EphemeralFile` thus only needs to store the values. 2. For `get_values_reconstruct_data`: - Use the in-memory `index` figures out which values need to be read. Having the `will_init` stored in the index enables us to do that. - View the EphemeralFile as a byte array of "DIO chunks", each 512 bytes in size (adjustable constant). A "DIO chunk" is the minimal unit that we can read under direct IO. - Figure out which chunks need to be read to retrieve the serialized bytes for thes values we need to read. - Coalesce chunk reads such that each DIO chunk is only read once to serve all value reads that need data from that chunk. - Merge adjacent chunk reads into larger `EphemeralFile::read_exact_at_eof_ok` of up to 128k (adjustable constant). 3. The new `EphemeralFile::read_exact_at_eof_ok` fills the IO buffer from the underlying VirtualFile and/or its in-memory buffer. 4. The L0 flush code is changed to use the `index` directly, `blob_io` 5. We can remove the `ephemeral_file::page_caching` construct now. The `get_values_reconstruct_data` changes seem like a bit overkill but they are necessary so we issue the equivalent amount of read system calls compared to before this PR where it was highly likely that even if the first PageCache access was a miss, remaining reads within the same `get_values_reconstruct_data` call from the same `EphemeralFile` page were a hit. The "DIO chunk" stuff is truly unnecessary for page cache bypass, but, since we're working on [direct IO](https://github.com/neondatabase/neon/issues/8130) and https://github.com/neondatabase/neon/issues/8719 specifically, we need to do _something_ like this anyways in the near future. # Alternative Design The original plan was to use the `vectored_blob_io` code it relies on the invariant of Delta&Image layers that `index order == values order`. Further, `vectored_blob_io` code's strategy for merging IOs is limited to adjacent reads. However, with direct IO, there is another level of merging that should be done, specifically, if multiple reads map to the same "DIO chunk" (=alignment-requirement-sized and -aligned region of the file), then it's "free" to read the chunk into an IO buffer and serve the two reads from that buffer. => https://github.com/neondatabase/neon/issues/8719 # Testing / Performance Correctness of the IO merging code is ensured by unit tests. Additionally, minimal tests are added for the `EphemeralFile` implementation and the bit-packed `InMemoryLayerIndexValue`. Performance testing results are presented below. All pref testing done on my M2 MacBook Pro, running a Linux VM. It's a release build without `--features testing`. We see definitive improvement in ingest performance microbenchmark and an ad-hoc microbenchmark for getpage against InMemoryLayer. ``` baseline: commit `7c74112b2a` origin/main HEAD: `ef1c55c52e` ``` <details> ``` cargo bench --bench bench_ingest -- 'ingest 128MB/100b seq, no delta' baseline ingest-small-values/ingest 128MB/100b seq, no delta time: [483.50 ms 498.73 ms 522.53 ms] thrpt: [244.96 MiB/s 256.65 MiB/s 264.73 MiB/s] HEAD ingest-small-values/ingest 128MB/100b seq, no delta time: [479.22 ms 482.92 ms 487.35 ms] thrpt: [262.64 MiB/s 265.06 MiB/s 267.10 MiB/s] ``` </details> We don't have a micro-benchmark for InMemoryLayer and it's quite cumbersome to add one. So, I did manual testing in `neon_local`. <details> ``` ./target/release/neon_local stop rm -rf .neon ./target/release/neon_local init ./target/release/neon_local start ./target/release/neon_local tenant create --set-default ./target/release/neon_local endpoint create foo ./target/release/neon_local endpoint start foo psql 'postgresql://cloud_admin@127.0.0.1:55432/postgres' psql (13.16 (Debian 13.16-0+deb11u1), server 15.7) CREATE TABLE wal_test ( id SERIAL PRIMARY KEY, data TEXT ); DO $$ DECLARE i INTEGER := 1; BEGIN WHILE i <= 500000 LOOP INSERT INTO wal_test (data) VALUES ('data'); i := i + 1; END LOOP; END $$; -- => result is one L0 from initdb and one 137M-sized ephemeral-2 DO $$ DECLARE i INTEGER := 1; random_id INTEGER; random_record wal_test%ROWTYPE; start_time TIMESTAMP := clock_timestamp(); selects_completed INTEGER := 0; min_id INTEGER := 1; -- Minimum ID value max_id INTEGER := 100000; -- Maximum ID value, based on your insert range iters INTEGER := 100000000; -- Number of iterations to run BEGIN WHILE i <= iters LOOP -- Generate a random ID within the known range random_id := min_id + floor(random() * (max_id - min_id + 1))::int; -- Select the row with the generated random ID SELECT * INTO random_record FROM wal_test WHERE id = random_id; -- Increment the select counter selects_completed := selects_completed + 1; -- Check if a second has passed IF EXTRACT(EPOCH FROM clock_timestamp() - start_time) >= 1 THEN -- Print the number of selects completed in the last second RAISE NOTICE 'Selects completed in last second: %', selects_completed; -- Reset counters for the next second selects_completed := 0; start_time := clock_timestamp(); END IF; -- Increment the loop counter i := i + 1; END LOOP; END $$; ./target/release/neon_local stop baseline: commit `7c74112b2a` origin/main NOTICE: Selects completed in last second: 1864 NOTICE: Selects completed in last second: 1850 NOTICE: Selects completed in last second: 1851 NOTICE: Selects completed in last second: 1918 NOTICE: Selects completed in last second: 1911 NOTICE: Selects completed in last second: 1879 NOTICE: Selects completed in last second: 1858 NOTICE: Selects completed in last second: 1827 NOTICE: Selects completed in last second: 1933 ours NOTICE: Selects completed in last second: 1915 NOTICE: Selects completed in last second: 1928 NOTICE: Selects completed in last second: 1913 NOTICE: Selects completed in last second: 1932 NOTICE: Selects completed in last second: 1846 NOTICE: Selects completed in last second: 1955 NOTICE: Selects completed in last second: 1991 NOTICE: Selects completed in last second: 1973 ``` NB: the ephemeral file sizes differ by ca 1MiB, ours being 1MiB smaller. </details> # Rollout This PR changes the code in-place and is not gated by a feature flag.	2024-08-28 18:31:41 +00:00

1 2 3 4 5

218 Commits