perf!: use larger buffers for blob_io and ephemeral_file (#7485)


mirror of https://github.com/neondatabase/neon.git synced 2026-05-28 02:20:42 +00:00

part of https://github.com/neondatabase/neon/issues/7124

# Problem

(Re-stating the problem from #7124 for posterity)

The `test_bulk_ingest` benchmark shows about 2x lower throughput with
`tokio-epoll-uring` compared to `std-fs`.
That's why we temporarily disabled it in #7238.

The reason for this regression is that the benchmark runs on a system
without memory pressure and thus std-fs writes don't block on disk IO
but only copy the data into the kernel page cache.
`tokio-epoll-uring` cannot beat that at this time, and possibly never will.
(However, under memory pressure, std-fs would stall the executor thread
on kernel page cache writeback disk IO. That's why we want to use
`tokio-epoll-uring`. And we likely want to use O_DIRECT in the future,
at which point std-fs becomes an absolute show-stopper.)

More elaborate analysis:
https://neondatabase.notion.site/Why-test_bulk_ingest-is-slower-with-tokio-epoll-uring-918c5e619df045a7bd7b5f806cfbd53f?pvs=4

# Changes

This PR increases the buffer size of `blob_io` and `EphemeralFile` from
PAGE_SZ=8k to 64k.
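
To illustrate why a larger buffer helps, here is a minimal sketch that counts how many write calls reach the underlying sink for a given buffer capacity. It uses `std::io::BufWriter` as a stand-in and is not the pageserver's actual buffered writer; `CountingSink` and `count_writes` are hypothetical names introduced only for this example.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink that counts the write calls that reach it.
/// It stands in for a file; the assumption is that each call
/// roughly corresponds to one IO submitted to the kernel.
#[derive(Default)]
struct CountingSink {
    writes: usize,
    bytes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Pushes `total` bytes, in pieces of at most `chunk` bytes, through a
/// buffer of `capacity` bytes. Returns (write calls, bytes) seen by the sink.
fn count_writes(capacity: usize, total: usize, chunk: usize) -> io::Result<(usize, usize)> {
    assert!(chunk > 0, "chunk size must be non-zero");
    let mut sink = CountingSink::default();
    {
        let mut writer = BufWriter::with_capacity(capacity, &mut sink);
        let data = vec![0u8; chunk];
        let mut remaining = total;
        while remaining > 0 {
            let n = chunk.min(remaining);
            writer.write_all(&data[..n])?;
            remaining -= n;
        }
        // Push out whatever is still sitting in the buffer.
        writer.flush()?;
    }
    Ok((sink.writes, sink.bytes))
}
```

Calling `count_writes(8 * 1024, 1 << 20, 100)` and `count_writes(64 * 1024, 1 << 20, 100)` moves the same number of bytes in both cases, but the 64k buffer reaches the sink with far fewer write calls. The 100-byte chunk size is an arbitrary small-value workload chosen for the sketch.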

Longer-term, we probably want to do double-buffering / pipelined IO.

# Resource Usage

We currently do not flush the buffer when freezing the InMemoryLayer.
That means a single Timeline can have multiple 64k buffers alive,
especially if flushing is slow.
This poses an OOM risk.
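
A back-of-the-envelope sketch of the buffer memory involved, assuming one unflushed buffer per frozen layer (an assumption made for this sketch, not something measured). `buffer_bytes` and the constant names are hypothetical.

```rust
/// Buffer size before this PR (PAGE_SZ = 8k) and with this PR (64k).
const OLD_BUF_SZ: usize = 8 * 1024;
const NEW_BUF_SZ: usize = 64 * 1024;

/// Bytes held by write buffers, ASSUMING one unflushed buffer per layer.
/// The layer count is whatever backlog the caller wants to reason about.
fn buffer_bytes(layers: usize, buf_sz: usize) -> usize {
    layers * buf_sz
}
```

For a hypothetical backlog of 100 frozen layers, `buffer_bytes(100, NEW_BUF_SZ)` is 6,553,600 bytes (6.25 MiB), versus 819,200 bytes (800 KiB) for `buffer_bytes(100, OLD_BUF_SZ)`.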

We should either bound the number of frozen layers
(https://github.com/neondatabase/neon/issues/7317)
or change the freezing code to flush the buffer and drop the
allocation.

However, that's future work.

# Performance

(Measurements done on i3en.3xlarge.)

`test_bulk_insert.py` is too noisy, even with instance storage: it
varies by 30-40%. I suspect that's due to compaction. Raising the amount
of data by 10x doesn't help with the noisiness.

So, I used the `bench_ingest` benchmark from @jcsp's #7409.
Specifically, the `ingest-small-values/ingest 128MB/100b seq` and
`ingest-small-values/ingest 128MB/100b seq, no delta` benchmarks.

| buffer size | IO engine         | seq | seq, no delta |
|-------------|-------------------|-----|---------------|
| 8k          | std-fs            | 55  | 165           |
| 8k          | tokio-epoll-uring | 37  | 107           |
| 64k         | std-fs            | 55  | 180           |
| 64k         | tokio-epoll-uring | 48  | 164           |

The `8k` is from before this PR, the `64k` is with this PR.
The values are the throughput reported by the benchmark (MiB/s).

We see that this PR gets `tokio-epoll-uring` from 67% to 87% of `std-fs`
performance in the `seq` benchmark. Notably, `seq` appears to hit some
other bottleneck at `55 MiB/s`. CC'ing #7418 due to the apparent
bottlenecks in writing delta layers.

For `seq, no delta`, this PR gets `tokio-epoll-uring` from 64% to 91% of
`std-fs` performance.
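
The percentages above can be re-derived from the table. The sketch below hard-codes the table values and truncates each ratio to a whole percent; `RESULTS`, `pct`, and `relative_throughput` are names made up for this example.

```rust
/// Throughput in MiB/s, copied from the table:
/// (buffer size, benchmark, std-fs, tokio-epoll-uring).
const RESULTS: [(&str, &str, u32, u32); 4] = [
    ("8k", "seq", 55, 37),
    ("8k", "seq, no delta", 165, 107),
    ("64k", "seq", 55, 48),
    ("64k", "seq, no delta", 180, 164),
];

/// `part` as a percentage of `whole`, truncated to a whole percent.
fn pct(part: u32, whole: u32) -> u32 {
    part * 100 / whole
}

/// tokio-epoll-uring throughput relative to std-fs, per table row.
fn relative_throughput() -> Vec<(&'static str, &'static str, u32)> {
    RESULTS
        .iter()
        .map(|&(buf, bench, std_fs, uring)| (buf, bench, pct(uring, std_fs)))
        .collect()
}
```

With truncation, `relative_throughput()` yields 67 and 87 for `seq` and 64 and 91 for `seq, no delta`, matching the figures quoted above.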

Author: Christian Schwarz
Date: 2024-04-26 13:34:28 +02:00
Committed by: GitHub
Parent: f1de18f1c9
Commit: ed57772793

2 changed files with 2 additions and 2 deletions

									
    --- a/pageserver/src/tenant/blob_io.rs
    +++ b/pageserver/src/tenant/blob_io.rs
    @@ -121,7 +121,7 @@ impl<const BUFFERED: bool> BlobWriter<BUFFERED> {
             self.offset
         }
    -    const CAPACITY: usize = if BUFFERED { PAGE_SZ } else { 0 };
    +    const CAPACITY: usize = if BUFFERED { 64 * 1024 } else { 0 };
         /// Writes the given buffer directly to the underlying `VirtualFile`.
         /// You need to make sure that the internal buffer is empty, otherwise

									
    --- a/pageserver/src/tenant/ephemeral_file/zero_padded_read_write.rs
    +++ b/pageserver/src/tenant/ephemeral_file/zero_padded_read_write.rs
    @@ -27,7 +27,7 @@ use crate::{
         },
     };
    -const TAIL_SZ: usize = PAGE_SZ;
    +const TAIL_SZ: usize = 64 * 1024;
     /// See module-level comment.
     pub struct RW<W: OwnedAsyncWriter> {
