rust/neon - neon - Gitea: Git with a cup of tea

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-15 09:22:55 +00:00

Author	SHA1	Message	Date
Alex Chi Z.	81f9aba005	fix(pagectl): layer parsing and image layer dump (#9571 ) This patch contains various improvements for the pagectl tool. ## Summary of changes * Rewrite layer name parsing: LayerName now supports all variants we use now. * Drop pagectl's own layer parsing function, use LayerName in the pageserver crate. * Support image layer dumping in the layer dump command using ImageLayer::dump, drop the original implementation. Signed-off-by: Alex Chi Z <chi@neon.tech>	2024-10-29 15:16:23 -04:00
Vlad Lazar	07b974480c	pageserver: move things around to prepare for decoding logic (#9504 ) ## Problem We wish to have high level WAL decoding logic in `wal_decoder::decoder` module. ## Summary of Changes For this we need the `Value` and `NeonWalRecord` types accessible there, so: 1. Move `Value` and `NeonWalRecord` to `pageserver::value` and `pageserver::record` respectively. 2. Get rid of `pageserver::repository` (follow up from (1)) 3. Move PG specific WAL record types to `postgres_ffi::walrecord`. In theory they could live in `wal_decoder`, but it would create a circular dependency between `wal_decoder` and `postgres_ffi`. Long term it makes sense for those types to be PG version specific, so that will work out nicely. 4. Move higher level WAL record types (to be ingested by pageserver) into `wal_decoder::models` Related: https://github.com/neondatabase/neon/issues/9335 Epic: https://github.com/neondatabase/neon/issues/9329	2024-10-29 10:00:34 +00:00
Yuchen Liang	49d5e56c08	pageserver: use direct IO for delta and image layer reads (#9326 ) Part of #8130 ## Problem Pageserver previously goes through the kernel page cache for all the IOs. The kernel page cache makes light-loaded pageserver have deceptive fast performance. Using direct IO would offer predictable latencies of our virtual file IO operations. In particular for reads, the data pages also have an extremely low temporal locality because the most frequently accessed pages are cached on the compute side. ## Summary of changes This PR enables pageserver to use direct IO for delta layer and image layer reads. We can ship them separately because these layers are write-once, read-many, so we will not be mixing buffered IO with direct IO. - implement `IoBufferMut`, an buffer type with aligned allocation (currently set to 512). - use `IoBufferMut` at all places we are doing reads on image + delta layers. - leverage Rust type system and use `IoBufAlignedMut` marker trait to guarantee that the input buffers for the IO operations are aligned. - page cache allocation is also made aligned. _* in-memory layer reads and the write path will be shipped separately._ ## Testing Integration test suite run with O_DIRECT enabled: https://github.com/neondatabase/neon/pull/9350 ## Performance We evaluated performance based on the `get-page-at-latest-lsn` benchmark. The results demonstrate a decrease in the number of IOps, no sigificant change in the latency mean, and an slight improvement on the p99.9 and p99.99 latencies. [Benchmark](https://www.notion.so/neondatabase/Benchmark-O_DIRECT-for-image-and-delta-layers-2024-10-01-112f189e00478092a195ea5a0137e706?pvs=4) ## Rollout We will add `virtual_file_io_mode=direct` region by region to enable direct IO on image + delta layers. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-10-21 11:01:25 -04:00
Yuchen Liang	bee04b8a69	pageserver: add direct io config to virtual file (#9214 ) ## Problem We need a way to incrementally switch to direct IO. During the rollout we might want to switch to O_DIRECT on image and delta layer read path first before others. ## Summary of changes - Revisited and simplified direct io config in `PageserverConf`. - We could add a fallback mode for open, but for read there isn't a reasonable alternative (without creating another buffered virtual file). - Added a wrapper around `VirtualFile`, current implementation become `VirtualFileInner` - Use `open_v2`, `create_v2`, `open_with_options_v2` when we want to use the IO mode specified in PS config. - Once we onboard all IO through VirtualFile using this new API, we will delete the old code path. - Make io mode live configurable for benchmarking. - Only guaranteed for files opened after the config change, so do it before the experiment. As an example, we are using `open_v2` with `virtual_file::IoMode::Direct` in https://github.com/neondatabase/neon/pull/9169 We also remove `io_buffer_alignment` config in `a04cfd754b` and use it as a compile time constant. This way we don't have to carry the alignment around or make frequent call to retrieve this information from the static variable. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-10-09 08:33:07 -04:00
Matthias van de Meent	78938d1b59	[compute/postgres] feature: PostgreSQL 17 (#8573 ) This adds preliminary PG17 support to Neon, based on RC1 / 2024-09-04 `07b828e9d4` NOTICE: The data produced by the included version of the PostgreSQL fork may not be compatible with the future full release of PostgreSQL 17 due to expected or unexpected future changes in magic numbers and internals. DO NOT EXPECT DATA IN V17-TENANTS TO BE COMPATIBLE WITH THE 17.0 RELEASE! Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech> Co-authored-by: Heikki Linnakangas <heikki@neon.tech>	2024-09-12 23:18:41 +01:00
Christian Schwarz	850421ec06	refactor(pageserver): rely on serde derive for toml deserialization (#7656 ) This PR simplifies the pageserver configuration parsing as follows: * introduce the `pageserver_api::config::ConfigToml` type * implement `Default` for `ConfigToml` * use serde derive to do the brain-dead leg-work of processing the toml document * use `serde(default)` to fill in default values * in `pageserver` crate: * use `toml_edit` to deserialize the pageserver.toml string into a `ConfigToml` * `PageServerConfig::parse_and_validate` then * consumes the `ConfigToml` * destructures it exhaustively into its constituent fields * constructs the `PageServerConfig` The rules are: * in `ConfigToml`, use `deny_unknown_fields` everywhere * static default values go in `pageserver_api` * if there cannot be a static default value (e.g. which default IO engine to use, because it depends on the runtime), make the field in `ConfigToml` an `Option` * if runtime-augmentation of a value is needed, do that in `parse_and_validate` * a good example is `virtual_file_io_engine` or `l0_flush`, both of which need to execute code to determine the effective value in `PageServerConf` The benefits: * massive amount of brain-dead repetitive code can be deleted * "unused variable" compile-time errors when removing a config value, due to the exhaustive destructuring in `parse_and_validate` * compile-time errors guide you when adding a new config field Drawbacks: * serde derive is sometimes a bit too magical * `deny_unknown_fields` is easy to miss Future Work / Benefits: * make `neon_local` use `pageserver_api` to construct `ConfigToml` and write it to `pageserver.toml` * This provides more type safety / coompile-time errors than the current approach. ### Refs Fixes #3682 ### Future Work * `remote_storage` deser doesn't reject unknown fields https://github.com/neondatabase/neon/issues/8915 * clean up `libs/pageserver_api/src/config.rs` further * break up into multiple files, at least for tenant config * move `models` as appropriate / refine distinction between config and API models / be explicit about when it's the same * use `pub(crate)` visibility on `mod defaults` to detect stale values	2024-09-05 14:59:49 +02:00
Yuchen Liang	a889a49e06	pageserver: do vectored read on each dio-aligned section once (#8763 ) Part of #8130, closes #8719. ## Problem Currently, vectored blob io only coalesce blocks if they are immediately adjacent to each other. When we switch to Direct IO, we need a way to coalesce blobs that are within the dio-aligned boundary but has gap between them. ## Summary of changes - Introduces a `VectoredReadCoalesceMode` for `VectoredReadPlanner` and `StreamingVectoredReadPlanner` which has two modes: - `AdjacentOnly` (current implementation) - `Chunked(<alignment requirement>)` - New `ChunkedVectorBuilder` that considers batching `dio-align`-sized read, the start and end of the vectored read will respect `stx_dio_offset_align` / `stx_dio_mem_align` (`vectored_read.start` and `vectored_read.blobs_at.first().start_offset` will be two different value). - Since we break the assumption that blobs within single `VectoredRead` are next to each other (implicit end offset), we start to store blob end offsets in the `VectoredRead`. - Adapted existing tests to run in both `VectoredReadCoalesceMode`. - The io alignment can also be live configured at runtime. Signed-off-by: Yuchen Liang <yuchen@neon.tech>	2024-08-28 15:54:42 +01:00
Christian Schwarz	6ff74295b5	chore(pageserver): plumb through RequestContext to VirtualFile open methods (#7725 ) This PR introduces no functional changes. The `open()` path will be done separately. refs https://github.com/neondatabase/neon/issues/6107 refs https://github.com/neondatabase/neon/issues/7386 Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2024-05-13 14:52:06 +02:00
Vlad Lazar	2b11466b59	pageserver: optimise disk io for vectored get (#6780 ) ## Problem The vectored read path proposed in https://github.com/neondatabase/neon/pull/6576 seems to be functionally correct, but in my testing (see below) it is about 10-20% slower than the naive sequential vectored implementation. ## Summary of changes There's three parts to this PR: 1. Supporting vectored blob reads. This is actually trickier than it sounds because on disk blobs are prefixed with a variable length size header. Since the blobs are not necessarily fixed size, we need to juggle the offsets such that the callers can retrieve the blobs from the resulting buffer. 2. Merge disk read requests issued by the vectored read path up to a maximum size. Again, the merging is complicated by the fact that blobs are not fixed size. We keep track of the begin and end offset of each blob and pass them into the vectored blob reader. In turn, the reader will return a buffer and the offsets at which the blobs begin and end. 3. A benchmark for basebackup requests against tenant with large SLRU block counts is added. This required a small change to pagebench and a new config variable for the pageserver which toggles the vectored get validation. We can probably optimise things further by adding a little bit of concurrency for our IO. In principle, it's as simple as spawning a task which deals with issuing IO and doing the serialisation and handling on the parent task which receives input via a channel.	2024-02-28 12:06:00 +00:00
Christian Schwarz	51f9385b1b	live-reconfigurable virtual_file::IoEngine (#6552 ) This PR adds an API to live-reconfigure the VirtualFile io engine. It also adds a flag to `pagebench get-page-latest-lsn`, which is where I found this functionality to be useful: it helps compare the io engines in a benchmark without re-compiling a release build, which took ~50s on the i3en.3xlarge where I was doing the benchmark. Switching the IO engine is completely safe at runtime.	2024-02-07 17:47:55 +00:00
Christian Schwarz	918b03b3b0	integrate tokio-epoll-uring as alternative VirtualFile IO engine (#5824 )	2024-01-26 09:25:07 +01:00
duguorong009	25a37215f3	fix: replace all `std::PathBuf`s with `camino::Utf8PathBuf` (#5352 ) Fixes #4689 by replacing all of `std::Path` , `std::PathBuf` with `camino::Utf8Path`, `camino::Utf8PathBuf` in - pageserver - safekeeper - control_plane - libs/remote_storage Co-authored-by: Joonas Koivunen <joonas@neon.tech>	2023-10-04 17:52:23 +03:00
Christian Schwarz	a0c82969a2	page cache: per-task-kind access stats (#5339 ) This PR adds a `task_kind` label to page cache access metrics. These are to validate our hypothesis that the high hit page cache rate we observe in prod is due to internal tasks, not getpage requests from compute. We believe the latter should near-always be a pageserver-page-cache _miss_ because compute has it's own page cache, and hence there is no locality of reference for its accesses to pageserver page cache. Before this PR, we didn't have `RequestContext` propagation to any code below the on-demand downloader. The vast majority of changes in this PR is concerned with adding that propagation.	2023-09-25 18:30:10 +02:00
Arpad Müller	d206655a63	Make VirtualFile::{open, open_with_options, create,sync_all,with_file} async fn (#5224 ) ## Problem Once we use async file system APIs for `VirtualFile`, these functions will also need to be async fn. ## Summary of changes Makes the functions `open, open_with_options, create,sync_all,with_file` of `VirtualFile` async fn, including all functions that call it. Like in the prior PRs, the actual I/O operations are not using async APIs yet, as per request in the #4743 epic. We switch towards not using `VirtualFile` in the par_fsync module, hopefully this is only temporary until we can actually do fully async I/O in `VirtualFile`. This might cause us to exhaust fd limits in the tests, but it should only be an issue for the local developer as we have high ulimits in prod. This PR is a follow-up of #5189, #5190, #5195, and #5203. Part of #4743.	2023-09-08 00:50:50 +02:00
John Spray	743933176e	scrubber: add `scan-metadata` and hook into integration tests (#5176 ) ## Problem - Scrubber's `tidy` command requires presence of a control plane - Scrubber has no tests at all ## Summary of changes - Add re-usable async streams for reading metadata from a bucket - Add a `scan-metadata` command that reads from those streams and calls existing `checks.rs` code to validate metadata, then returns a summary struct for the bucket. Command returns nonzero status if errors are found. - Add an `enable_scrub_on_exit()` function to NeonEnvBuilder so that tests using remote storage can request to have the scrubber run after they finish - Enable remote storarge and scrub_on_exit in test_pageserver_restart and test_pageserver_chaos This is a "toe in the water" of the overall space of validating the scrubber. Later, we should: - Enable scrubbing at end of tests using remote storage by default - Make the success condition stricter than "no errors": tests should declare what tenants+timelines they expect to see in the bucket (or sniff these from the functions tests use to create them) and we should require that the scrubber reports on these particular tenants/timelines. The `tidy` command is untouched in this PR, but it should be refactored later to use similar async streaming interface instead of the current batch-reading approach (the streams are faster with large buckets), and to also be covered by some tests. --------- Co-authored-by: Joonas Koivunen <joonas@neon.tech> Co-authored-by: Alexander Bayandin <alexander@neon.tech> Co-authored-by: Christian Schwarz <christian@neon.tech> Co-authored-by: Conrad Ludgate <conrad@neon.tech>	2023-09-06 11:55:24 +01:00
Arpad Müller	eb0a698adc	Make page cache and read_blk async (#5023 ) ## Problem `read_blk` does I/O and thus we would like to make it async. We can't make the function async as long as the `PageReadGuard` returned by `read_blk` isn't `Send`. The page cache is called by `read_blk`, and thus it can't be async without `read_blk` being async. Thus, we have a circular dependency. ## Summary of changes Due to the circular dependency, we convert both the page cache and `read_blk` to async at the same time: We make the page cache use `tokio::sync` synchronization primitives as those are `Send`. This makes all the places that acquire a lock require async though, which we then also do. This includes also asyncification of the `read_blk` function. Builds upon #4994, #5015, #5056, and #5129. Part of #4743.	2023-08-30 09:04:31 +02:00
Arpad Müller	8c13296add	Remove `BlockReader::read_blk` in favour of `BlockCursor` (#5015 ) ## Problem We want to make `read_blk` an async function, but outside of `async_trait`, which allocates, and nightly features, we can't use async fn's in traits. ## Summary of changes * Remove all uses of `BlockReader::read_blk` in favour of using block cursors, at least where the type of the `BlockReader` is behind a generic * Introduce a `BlockReaderRef` enum that lists all implementors of `BlockReader::read_blk`. * Remove `BlockReader::read_blk` and move its implementations into inherent functions on the types instead. We don't turn `read_blk` into an async fn yet, for that we also need to modify the page cache. So this is a preparatory PR, albeit an important one. Part of #4743.	2023-08-25 12:28:01 +02:00
Arpad Müller	a241c8b2a4	Make DiskBtreeReader::{visit, get} async (#4863 ) ## Problem `DiskBtreeReader::get` and `DiskBtreeReader::visit` both call `read_blk` internally, which we would like to make async in the future. This PR focuses on making the interface of these two functions `async`. There is further work to be done in forms of making `visit` to not be recursive any more, similar to #4838. For that, see https://github.com/neondatabase/neon/pull/4884. Builds on top of https://github.com/neondatabase/neon/pull/4839, part of https://github.com/neondatabase/neon/issues/4743 ## Summary of changes Make `DiskBtreeReader::get` and `DiskBtreeReader::visit` async functions and `await` in the places that call these functions.	2023-08-03 17:36:46 +02:00
arpad-m	e5b7ddfeee	Preparatory pageserver async conversions (#4773 ) In #4743, we'd like to convert the read path to use `async` rust. In preparation of that, this PR switches some functions that are calling lower level functions like `BlockReader::read_blk`, `BlockCursor::read_blob`, etc into `async`. The PR does not switch all functions however, and only focuses on the ones which are easy to switch. This leaves around some async functions that are (currently) unnecessarily `async`, but on the other hand it makes future changes smaller in diff. Part of #4743 (but does not completely address it).	2023-07-24 14:01:54 +02:00
Alex Chi Z	7126197000	pagectl: refactor ctl and support dump kv in delta (#4268 ) This PR refactors the original page_binutils with a single tool pagectl, use clap derive for better command line parsing, and adds the dump kv tool to extract information from delta file. This helps me better understand what's inside the page server. We can add support for other types of file and more functionalities in the future. --------- Signed-off-by: Alex Chi <iskyzh@gmail.com>	2023-05-24 19:36:07 +03:00

20 Commits