Files
neon/scripts/ps_ec2_setup_instance_store
Christian Schwarz ad6f538aef tokio-epoll-uring: use it for on-demand downloads (#6992)
# Problem

On-demand downloads are still using `tokio::fs`, which we know is
inefficient.

# Changes

- Add `pagebench ondemand-download-churn` to quantify on-demand download
throughput
- Requires dumping layer map, which required making `history_buffer`
impl `Deserialize`
- Implement an equivalent of `tokio::io::copy_buf` for owned buffers =>
`owned_buffers_io` module and children.
- Make layer file download sensitive to `io_engine::get()`, using
VirtualFile + above copy loop
- For this, I had to move some code into the `retry_download`, e.g.,
`sync_all()` call.

Drive-by:
- fix missing escaping in `scripts/ps_ec2_setup_instance_store` 
- if we failed in retry_download to create a file, we'd try to remove
it, encounter `NotFound`, and `abort()` the process using
`on_fatal_io_error`. This PR adds treats `NotFound` as a success.

# Testing

Functional

- The copy loop is generic & unit tested.

Performance

- Used the `ondemand-download-churn` benchmark to manually test against
real S3.
- Results (public Notion page):
https://neondatabase.notion.site/Benchmarking-tokio-epoll-uring-on-demand-downloads-2024-04-15-newer-code-03c0fdc475c54492b44d9627b6e4e710?pvs=4
- Performance is equivalent at low concurrency. Jumpier situation at
high concurrency, but, still less CPU / throughput with
tokio-epoll-uring.
  - It’s a win.

# Future Work

Turn the manual performance testing described in the above results
document into a performance regression test:
https://github.com/neondatabase/neon/issues/7146
2024-03-15 18:57:05 +00:00

56 lines
2.0 KiB
Bash
Executable File

#!/usr/bin/env bash
# This script sets up an ext4 partition on an EC2 storage-optimized instance's instance store volume.
# Unix permission/ownership is set to the calling user (the script does sudo internally.)
#
# It's intentionally not idempotent; don't take on that complexity in a bash script.
set -euo pipefail
set -x
# This seems crude, but, apparently instance store NVMe volumes aren't exposed in the in instance metadata block-device-mapping.
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html#bdm-instance-metadata
if [ "$(cat /sys/class/block/nvme1n1/device/model)" != "Amazon EC2 NVMe Instance Storage " ]; then
echo "nvme1n1 is not Amazon EC2 NVMe Instance Storage: '$(cat /sys/class/block/nvme1n1/device/model)'"
exit 1
fi
# NB: we DO NOT warm up all the blocks on the drive as recommended by https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/disk-performance.html
# The reason is that we don't do that in production either.
# do all the on-disk initialization work now instead of a background kernel thread
# so that we're ready for benchmarking right after this line
sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/nvme1n1
MOUNTPOINT=/instance_store
sudo mkdir "$MOUNTPOINT"
sudo mount /dev/nvme1n1 "$MOUNTPOINT"
sudo chown -R "$(id -u)":"$(id -g)" "$MOUNTPOINT"
TEST_OUTPUT="$MOUNTPOINT/test_output"
mkdir "$TEST_OUTPUT"
NEON_REPO_DIR="$MOUNTPOINT/repo_dir"
mkdir "$NEON_REPO_DIR"
cat <<EOF
SETUP COMPLETE
To run your local neon.git build on the instance store volume,
run the following commands from the top of the neon.git checkout
# raise file descriptor limit of your shell and its child processes
sudo prlimit -p \$\$ --nofile=800000:800000
# test suite run
export TEST_OUTPUT="$TEST_OUTPUT"
DEFAULT_PG_VERSION=15 BUILD_TYPE=release ./scripts/pytest test_runner/performance/test_latency.py
# for interactive use
export NEON_REPO_DIR="$NEON_REPO_DIR"
cargo build_testing --release
./target/release/neon_local init --force empty-dir-ok
EOF