feat(pageserver): use vectored_get in collect_keyspace

Signed-off-by: Alex Chi Z <chi@neon.tech>
storcon: signal LSN wait to pageserver during live migration (#10452)
2026-05-21 15:10:44 +00:00 · 2025-01-28 20:31:15 +01:00 · 2025-01-28 17:33:07 +00:00 · 2025-01-28 17:21:05 +00:00 · 2025-01-28 17:08:17 +00:00 · 2025-01-28 15:32:59 +00:00
45 changed files with 764 additions and 278 deletions
--- a/.github/actions/neon-project-create/action.yml
+++ b/.github/actions/neon-project-create/action.yml
@@ -17,6 +17,31 @@ inputs:
  compute_units:
    description: '[Min, Max] compute units'
    default: '[1, 1]'
+  # settings below only needed if you want the project to be sharded from the beginning
+  shard_split_project:
+    description: 'by default new projects are not shard-split, specify true to shard-split'
+    required: false
+    default: 'false'
+  admin_api_key:
+    description: 'Admin API Key needed for shard-splitting. Must be specified if shard_split_project is true'
+    required: false
+  shard_count:
+    description: 'Number of shards to split the project into, only applies if shard_split_project is true'
+    required: false
+    default: '8'
+  stripe_size:
+    description: 'Stripe size, optional, in 8 KiB pages, e.g. set 2048 for 16 MiB stripes. Default is 32768 (256 MiB); only applies if shard_split_project is true'
+    required: false
+    default: '32768'
+  psql_path:
+    description: 'Path to psql binary - the caller is responsible for provisioning the psql binary'
+    required: false
+    default: '/tmp/neon/pg_install/v16/bin/psql'
+  libpq_lib_path:
+    description: 'Path to directory containing libpq library - the caller is responsible for provisioning the libpq library'
+    required: false
+    default: '/tmp/neon/pg_install/v16/lib'
+  

 outputs:
  dsn:
@@ -63,6 +88,23 @@ runs:
        echo "project_id=${project_id}" >> $GITHUB_OUTPUT

        echo "Project ${project_id} has been created"
+
+        if [ "${SHARD_SPLIT_PROJECT}" = "true" ]; then
+          # determine tenant ID
+          TENANT_ID=`${PSQL} ${dsn} -t -A -c "SHOW neon.tenant_id"`
+          
+          echo "Splitting project ${project_id} with tenant_id ${TENANT_ID} into $((SHARD_COUNT)) shards with stripe size $((STRIPE_SIZE))"
+
+          echo "Sending PUT request to https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/storage/proxy/control/v1/tenant/${TENANT_ID}/shard_split"
+          echo "with body {\"new_shard_count\": $((SHARD_COUNT)), \"new_stripe_size\": $((STRIPE_SIZE))}"
+          
+          # we need an ADMIN API KEY to invoke storage controller API for shard splitting (bash -u above checks that the variable is set)
+          curl -X PUT \
+            "https://${API_HOST}/regions/${REGION_ID}/api/v1/admin/storage/proxy/control/v1/tenant/${TENANT_ID}/shard_split" \
+            -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer ${ADMIN_API_KEY}" \
+            -d "{\"new_shard_count\": $SHARD_COUNT, \"new_stripe_size\": $STRIPE_SIZE}"
+        fi
+
      env:
        API_HOST: ${{ inputs.api_host }}
        API_KEY: ${{ inputs.api_key }}
@@ -70,3 +112,9 @@ runs:
        POSTGRES_VERSION: ${{ inputs.postgres_version }}
        MIN_CU: ${{ fromJSON(inputs.compute_units)[0] }}
        MAX_CU: ${{ fromJSON(inputs.compute_units)[1] }}
+        SHARD_SPLIT_PROJECT: ${{ inputs.shard_split_project }}
+        ADMIN_API_KEY: ${{ inputs.admin_api_key }}
+        SHARD_COUNT: ${{ inputs.shard_count }}
+        STRIPE_SIZE: ${{ inputs.stripe_size }}
+        PSQL: ${{ inputs.psql_path }}
+        LD_LIBRARY_PATH: ${{ inputs.libpq_lib_path }}
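The `stripe_size` input above is expressed in 8 KiB pages, which is easy to misread as bytes. As a rough sketch of the arithmetic, the helpers below convert a page count into bytes and build a request body with the same two fields the `curl` call sends. `PAGE_SIZE_BYTES`, `stripe_size_bytes`, and `shard_split_body` are hypothetical names introduced only for this illustration; they are not part of the action or of any Neon API.

```rust
/// Page size assumed by the `stripe_size` description (8 KiB).
/// Hypothetical constant, used only in this sketch.
const PAGE_SIZE_BYTES: u64 = 8 * 1024;

/// Convert a stripe size expressed in pages into bytes.
fn stripe_size_bytes(stripe_size_pages: u64) -> u64 {
    stripe_size_pages * PAGE_SIZE_BYTES
}

/// Build a JSON body with the two fields used by the shard-split request.
/// The stripe size is sent in pages, not bytes.
fn shard_split_body(shard_count: u32, stripe_size_pages: u64) -> String {
    format!(
        "{{\"new_shard_count\": {}, \"new_stripe_size\": {}}}",
        shard_count, stripe_size_pages
    )
}
```

Under these assumptions, `stripe_size_bytes(2048)` gives 16 MiB and `stripe_size_bytes(32768)` gives 256 MiB, matching the values annotated in the benchmark matrix.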
--- a/.github/workflows/_build-and-test-locally.yml
+++ b/.github/workflows/_build-and-test-locally.yml
@@ -158,8 +158,6 @@ jobs:

      - name: Run cargo build
        run: |
-          PQ_LIB_DIR=$(pwd)/pg_install/v16/lib
-          export PQ_LIB_DIR
          ${cov_prefix} mold -run cargo build $CARGO_FLAGS $CARGO_FEATURES --bins --tests

      # Do install *before* running rust tests because they might recompile the
@@ -217,8 +215,6 @@ jobs:
        env:
          NEXTEST_RETRIES: 3
        run: |
-          PQ_LIB_DIR=$(pwd)/pg_install/v16/lib
-          export PQ_LIB_DIR
          LD_LIBRARY_PATH=$(pwd)/pg_install/v17/lib
          export LD_LIBRARY_PATH

--- a/.github/workflows/build-macos.yml
+++ b/.github/workflows/build-macos.yml
@@ -235,7 +235,7 @@ jobs:
          echo 'CPPFLAGS=-I/usr/local/opt/openssl@3/include' >> $GITHUB_ENV

      - name: Run cargo build (only for v17)
-        run: PQ_LIB_DIR=$(pwd)/pg_install/v17/lib cargo build --all --release -j$(sysctl -n hw.ncpu)
+        run: cargo build --all --release -j$(sysctl -n hw.ncpu)

      - name: Check that no warnings are produced (only for v17)
        run: ./run_clippy.sh
--- a/.github/workflows/ingest_benchmark.yml
+++ b/.github/workflows/ingest_benchmark.yml
@@ -28,7 +28,24 @@ jobs:
    strategy:
      fail-fast: false # allow other variants to continue even if one fails
      matrix:
-        target_project: [new_empty_project, large_existing_project]
+        include:
+          - target_project: new_empty_project_stripe_size_2048 
+            stripe_size: 2048 # 16 MiB
+            postgres_version: 16
+          - target_project: new_empty_project_stripe_size_32768
+            stripe_size: 32768 # 256 MiB # note that this is different from null because using null will shard_split the project only if it reaches the threshold
+                               # while here it is sharded from the beginning with a stripe size of 256 MiB
+            postgres_version: 16
+          - target_project: new_empty_project
+            stripe_size: null # run with neon defaults which will shard split only when reaching the threshold
+            postgres_version: 16
+          - target_project: new_empty_project
+            stripe_size: null # run with neon defaults which will shard split only when reaching the threshold
+            postgres_version: 17
+          - target_project: large_existing_project
+            stripe_size: null # cannot re-shard or choose a different stripe size for an existing, already sharded project
+            postgres_version: 16
+      max-parallel: 1 # we want to run each stripe size sequentially to be able to compare the results
    permissions:
      contents: write
      statuses: write
@@ -67,17 +84,21 @@ jobs:
        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}

    - name: Create Neon Project
-      if: ${{ matrix.target_project == 'new_empty_project' }}
+      if: ${{ startsWith(matrix.target_project, 'new_empty_project') }}
      id: create-neon-project-ingest-target
      uses: ./.github/actions/neon-project-create
      with:
        region_id: aws-us-east-2
-        postgres_version: 16
+        postgres_version: ${{ matrix.postgres_version }}
        compute_units: '[7, 7]' # we want to test large compute here to avoid compute-side bottleneck
        api_key: ${{ secrets.NEON_STAGING_API_KEY }}
+        shard_split_project: ${{ matrix.stripe_size != null && 'true' || 'false' }}
+        admin_api_key: ${{ secrets.NEON_STAGING_ADMIN_API_KEY }} 
+        shard_count: 8
+        stripe_size: ${{ matrix.stripe_size }}

    - name: Initialize Neon project
-      if: ${{ matrix.target_project == 'new_empty_project' }}
+      if: ${{ startsWith(matrix.target_project, 'new_empty_project') }}
      env:
          BENCHMARK_INGEST_TARGET_CONNSTR: ${{ steps.create-neon-project-ingest-target.outputs.dsn }}
          NEW_PROJECT_ID: ${{ steps.create-neon-project-ingest-target.outputs.project_id }}
@@ -130,7 +151,7 @@ jobs:
        test_selection: performance/test_perf_ingest_using_pgcopydb.py
        run_in_parallel: false
        extra_params: -s -m remote_cluster --timeout 86400 -k test_ingest_performance_using_pgcopydb
-        pg_version: v16
+        pg_version: v${{ matrix.postgres_version }}
        save_perf_report: true
        aws-oicd-role-arn: ${{ vars.DEV_AWS_OIDC_ROLE_ARN }}
      env:
@@ -146,7 +167,7 @@ jobs:
        ${PSQL} "${BENCHMARK_INGEST_TARGET_CONNSTR}" -c "\dt+"

    - name: Delete Neon Project
-      if: ${{ always() && matrix.target_project == 'new_empty_project' }}
+      if: ${{ always() && startsWith(matrix.target_project, 'new_empty_project') }}
      uses: ./.github/actions/neon-project-delete
      with:
        project_id: ${{ steps.create-neon-project-ingest-target.outputs.project_id }}
--- a/.github/workflows/neon_extra_builds.yml
+++ b/.github/workflows/neon_extra_builds.yml
@@ -114,7 +114,7 @@ jobs:
        run: make walproposer-lib -j$(nproc)

      - name: Produce the build stats
-        run: PQ_LIB_DIR=$(pwd)/pg_install/v17/lib cargo build --all --release --timings -j$(nproc)
+        run: cargo build --all --release --timings -j$(nproc)

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -179,7 +179,7 @@ dependencies = [
 "nom",
 "num-traits",
 "rusticata-macros",
- "thiserror",
+ "thiserror 1.0.69",
 "time",
 ]

@@ -718,14 +718,14 @@ dependencies = [

 [[package]]
 name = "axum"
-version = "0.7.9"
+version = "0.8.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "edca88bc138befd0323b20752846e6587272d3b03b0343c8ea28a6f819e6e71f"
+checksum = "6d6fd624c75e18b3b4c6b9caf42b1afe24437daaee904069137d8bab077be8b8"
 dependencies = [
- "async-trait",
 "axum-core",
 "base64 0.22.1",
 "bytes",
+ "form_urlencoded",
 "futures-util",
 "http 1.1.0",
 "http-body 1.0.0",
@@ -733,7 +733,7 @@ dependencies = [
 "hyper 1.4.1",
 "hyper-util",
 "itoa",
- "matchit 0.7.0",
+ "matchit",
 "memchr",
 "mime",
 "percent-encoding",
@@ -746,7 +746,7 @@ dependencies = [
 "sha1",
 "sync_wrapper 1.0.1",
 "tokio",
- "tokio-tungstenite 0.24.0",
+ "tokio-tungstenite 0.26.1",
 "tower 0.5.2",
 "tower-layer",
 "tower-service",
@@ -755,11 +755,10 @@ dependencies = [

 [[package]]
 name = "axum-core"
-version = "0.4.5"
+version = "0.5.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "09f2bd6146b97ae3359fa0cc6d6b376d9539582c7b4220f041a33ec24c226199"
+checksum = "df1362f362fd16024ae199c1970ce98f9661bf5ef94b9808fee734bc3698b733"
 dependencies = [
- "async-trait",
 "bytes",
 "futures-util",
 "http 1.1.0",
@@ -1130,7 +1129,7 @@ dependencies = [
 "log",
 "nix 0.25.1",
 "regex",
- "thiserror",
+ "thiserror 1.0.69",
 ]

 [[package]]
@@ -1311,7 +1310,7 @@ dependencies = [
 "serde_with",
 "signal-hook",
 "tar",
- "thiserror",
+ "thiserror 1.0.69",
 "tokio",
 "tokio-postgres 0.7.7",
 "tokio-stream",
@@ -1420,7 +1419,7 @@ dependencies = [
 "serde",
 "serde_json",
 "storage_broker",
- "thiserror",
+ "thiserror 1.0.69",
 "tokio",
 "tokio-postgres 0.7.7",
 "tokio-util",
@@ -2264,7 +2263,7 @@ dependencies = [
 "pin-project",
 "rand 0.8.5",
 "sha1",
- "thiserror",
+ "thiserror 1.0.69",
 "tokio",
 "tokio-util",
 ]
@@ -3390,12 +3389,6 @@ dependencies = [
 "regex-automata 0.1.10",
 ]

-[[package]]
-name = "matchit"
-version = "0.7.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "b87248edafb776e59e6ee64a79086f65890d3510f2c656c000bf2a7e8a0aea40"
-
 [[package]]
 name = "matchit"
 version = "0.8.4"
@@ -3786,7 +3779,7 @@ dependencies = [
 "serde_json",
 "serde_path_to_error",
 "sha2",
- "thiserror",
+ "thiserror 1.0.69",
 "url",
 ]

@@ -3836,7 +3829,7 @@ dependencies = [
 "futures-sink",
 "js-sys",
 "pin-project-lite",
- "thiserror",
+ "thiserror 1.0.69",
 "tracing",
 ]

@@ -3868,7 +3861,7 @@ dependencies = [
 "opentelemetry_sdk",
 "prost",
 "reqwest",
- "thiserror",
+ "thiserror 1.0.69",
 ]

 [[package]]
@@ -3904,7 +3897,7 @@ dependencies = [
 "percent-encoding",
 "rand 0.8.5",
 "serde_json",
- "thiserror",
+ "thiserror 1.0.69",
 "tokio",
 "tokio-stream",
 "tracing",
@@ -4018,7 +4011,7 @@ dependencies = [
 "remote_storage",
 "serde_json",
 "svg_fmt",
- "thiserror",
+ "thiserror 1.0.69",
 "tokio",
 "tokio-util",
 "utils",
@@ -4094,7 +4087,7 @@ dependencies = [
 "strum_macros",
 "sysinfo",
 "tenant_size_model",
- "thiserror",
+ "thiserror 1.0.69",
 "tikv-jemallocator",
 "tokio",
 "tokio-epoll-uring",
@@ -4140,7 +4133,7 @@ dependencies = [
 "storage_broker",
 "strum",
 "strum_macros",
- "thiserror",
+ "thiserror 1.0.69",
 "utils",
 ]

@@ -4155,7 +4148,7 @@ dependencies = [
 "postgres",
 "reqwest",
 "serde",
- "thiserror",
+ "thiserror 1.0.69",
 "tokio",
 "tokio-postgres 0.7.7",
 "tokio-stream",
@@ -4559,7 +4552,7 @@ dependencies = [
 "rustls 0.23.18",
 "rustls-pemfile 2.1.1",
 "serde",
- "thiserror",
+ "thiserror 1.0.69",
 "tokio",
 "tokio-postgres 0.7.7",
 "tokio-postgres-rustls",
@@ -4597,7 +4590,7 @@ dependencies = [
 "pprof",
 "regex",
 "serde",
- "thiserror",
+ "thiserror 1.0.69",
 "tracing",
 "utils",
 ]
@@ -4608,7 +4601,7 @@ version = "0.1.0"
 dependencies = [
 "anyhow",
 "camino",
- "thiserror",
+ "thiserror 1.0.69",
 "tokio",
 "workspace_hack",
 ]
@@ -4641,7 +4634,7 @@ dependencies = [
 "smallvec",
 "symbolic-demangle",
 "tempfile",
- "thiserror",
+ "thiserror 1.0.69",
 ]

 [[package]]
@@ -4673,7 +4666,7 @@ dependencies = [
 "postgres-protocol 0.6.4",
 "rand 0.8.5",
 "serde",
- "thiserror",
+ "thiserror 1.0.69",
 "tokio",
 ]

@@ -4744,7 +4737,7 @@ dependencies = [
 "memchr",
 "parking_lot 0.12.1",
 "procfs",
- "thiserror",
+ "thiserror 1.0.69",
 ]

 [[package]]
@@ -4914,7 +4907,7 @@ dependencies = [
 "strum",
 "strum_macros",
 "subtle",
- "thiserror",
+ "thiserror 1.0.69",
 "tikv-jemalloc-ctl",
 "tikv-jemallocator",
 "tokio",
@@ -5311,7 +5304,7 @@ dependencies = [
 "http 1.1.0",
 "reqwest",
 "serde",
- "thiserror",
+ "thiserror 1.0.69",
 "tower-service",
 ]

@@ -5331,7 +5324,7 @@ dependencies = [
 "reqwest",
 "reqwest-middleware",
 "retry-policies",
- "thiserror",
+ "thiserror 1.0.69",
 "tokio",
 "tracing",
 "wasm-timer",
@@ -5347,7 +5340,7 @@ dependencies = [
 "async-trait",
 "getrandom 0.2.11",
 "http 1.1.0",
- "matchit 0.8.4",
+ "matchit",
 "opentelemetry",
 "reqwest",
 "reqwest-middleware",
@@ -5726,7 +5719,7 @@ dependencies = [
 "storage_broker",
 "strum",
 "strum_macros",
- "thiserror",
+ "thiserror 1.0.69",
 "tikv-jemallocator",
 "tokio",
 "tokio-io-timeout",
@@ -5765,7 +5758,7 @@ dependencies = [
 "reqwest",
 "safekeeper_api",
 "serde",
- "thiserror",
+ "thiserror 1.0.69",
 "utils",
 "workspace_hack",
 ]
@@ -5974,7 +5967,7 @@ dependencies = [
 "rand 0.8.5",
 "serde",
 "serde_json",
- "thiserror",
+ "thiserror 1.0.69",
 "time",
 "url",
 "uuid",
@@ -6046,7 +6039,7 @@ checksum = "c7715380eec75f029a4ef7de39a9200e0a63823176b759d055b613f5a87df6a6"
 dependencies = [
 "percent-encoding",
 "serde",
- "thiserror",
+ "thiserror 1.0.69",
 ]

 [[package]]
@@ -6208,7 +6201,7 @@ checksum = "adc4e5204eb1910f40f9cfa375f6f05b68c3abac4b6fd879c8ff5e7ae8a0a085"
 dependencies = [
 "num-bigint",
 "num-traits",
- "thiserror",
+ "thiserror 1.0.69",
 "time",
 ]

@@ -6353,7 +6346,7 @@ dependencies = [
 "serde_json",
 "strum",
 "strum_macros",
- "thiserror",
+ "thiserror 1.0.69",
 "tokio",
 "tokio-util",
 "tracing",
@@ -6645,7 +6638,16 @@ version = "1.0.69"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "b6aaf5339b578ea85b50e080feb250a3e8ae8cfcdff9a461c9ec2904bc923f52"
 dependencies = [
- "thiserror-impl",
+ "thiserror-impl 1.0.69",
+]
+
+[[package]]
+name = "thiserror"
+version = "2.0.11"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d452f284b73e6d76dd36758a0c8684b1d5be31f92b89d07fd5822175732206fc"
+dependencies = [
+ "thiserror-impl 2.0.11",
 ]

 [[package]]
@@ -6659,6 +6661,17 @@ dependencies = [
 "syn 2.0.90",
 ]

+[[package]]
+name = "thiserror-impl"
+version = "2.0.11"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "26afc1baea8a989337eeb52b6e72a039780ce45c3edfcc9c5b9d112feeb173c2"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn 2.0.90",
+]
+
 [[package]]
 name = "thread_local"
 version = "1.1.7"
@@ -6815,7 +6828,7 @@ dependencies = [
 "nix 0.26.4",
 "once_cell",
 "scopeguard",
- "thiserror",
+ "thiserror 1.0.69",
 "tokio",
 "tokio-util",
 "tracing",
@@ -6998,14 +7011,14 @@ dependencies = [

 [[package]]
 name = "tokio-tungstenite"
-version = "0.24.0"
+version = "0.26.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "edc5f74e248dc973e0dbb7b74c7e0d6fcc301c694ff50049504004ef4d0cdcd9"
+checksum = "be4bf6fecd69fcdede0ec680aaf474cdab988f9de6bc73d3758f0160e3b7025a"
 dependencies = [
 "futures-util",
 "log",
 "tokio",
- "tungstenite 0.24.0",
+ "tungstenite 0.26.1",
 ]

 [[package]]
@@ -7315,16 +7328,16 @@ dependencies = [
 "log",
 "rand 0.8.5",
 "sha1",
- "thiserror",
+ "thiserror 1.0.69",
 "url",
 "utf-8",
 ]

 [[package]]
 name = "tungstenite"
-version = "0.24.0"
+version = "0.26.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "18e5b8366ee7a95b16d32197d0b2604b43a0be89dc5fac9f8e96ccafbaedda8a"
+checksum = "413083a99c579593656008130e29255e54dcaae495be556cc26888f211648c24"
 dependencies = [
 "byteorder",
 "bytes",
@@ -7334,7 +7347,7 @@ dependencies = [
 "log",
 "rand 0.8.5",
 "sha1",
- "thiserror",
+ "thiserror 2.0.11",
 "utf-8",
 ]

@@ -7529,7 +7542,7 @@ dependencies = [
 "signal-hook",
 "strum",
 "strum_macros",
- "thiserror",
+ "thiserror 1.0.69",
 "tokio",
 "tokio-stream",
 "tokio-tar",
@@ -7629,7 +7642,7 @@ dependencies = [
 "remote_storage",
 "serde",
 "serde_json",
- "thiserror",
+ "thiserror 1.0.69",
 "tikv-jemallocator",
 "tokio",
 "tokio-util",
@@ -8158,7 +8171,7 @@ dependencies = [
 "ring",
 "signature 2.2.0",
 "spki 0.7.3",
- "thiserror",
+ "thiserror 1.0.69",
 "zeroize",
 ]

@@ -8175,7 +8188,7 @@ dependencies = [
 "nom",
 "oid-registry",
 "rusticata-macros",
- "thiserror",
+ "thiserror 1.0.69",
 "time",
 ]

--- a/Cargo.toml
+++ b/Cargo.toml
@@ -65,7 +65,7 @@ aws-smithy-types = "1.2"
 aws-credential-types = "1.2.0"
 aws-sigv4 = { version = "1.2", features = ["sign-http"] }
 aws-types = "1.3"
-axum = { version = "0.7.9", features = ["ws"] }
+axum = { version = "0.8.1", features = ["ws"] }
 base64 = "0.13.0"
 bincode = "1.3"
 bindgen = "0.70"
--- a/2
+++ b/2
@@ -45,7 +45,7 @@ COPY --chown=nonroot . .

 ARG ADDITIONAL_RUSTFLAGS
 RUN set -e \
-    && PQ_LIB_DIR=$(pwd)/pg_install/v${STABLE_PG_VERSION}/lib RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=mold -Clink-arg=-Wl,--no-rosegment -Cforce-frame-pointers=yes ${ADDITIONAL_RUSTFLAGS}" cargo build \
+    && RUSTFLAGS="-Clinker=clang -Clink-arg=-fuse-ld=mold -Clink-arg=-Wl,--no-rosegment -Cforce-frame-pointers=yes ${ADDITIONAL_RUSTFLAGS}" cargo build \
      --bin pg_sni_router  \
      --bin pageserver  \
      --bin pagectl  \
--- a/compute_tools/src/compute.rs
+++ b/compute_tools/src/compute.rs
@@ -1486,7 +1486,6 @@ impl ComputeNode {
            // First, create control files for all availale extensions
            extension_server::create_control_files(remote_extensions, &self.pgbin);

-            // Second, preload all remote extensions specified in the shared_preload_libraries
            let library_load_start_time = Utc::now();
            let remote_ext_metrics = self.prepare_preload_libraries(&pspec.spec)?;

@@ -1909,9 +1908,8 @@ LIMIT 100",
            .as_ref()
            .ok_or(anyhow::anyhow!("Remote extensions are not configured"))?;

-        let mut libs_vec = Vec::new();
-
        info!("parse shared_preload_libraries from spec.cluster.settings");
+        let mut libs_vec = Vec::new();
        if let Some(libs) = spec.cluster.settings.find("shared_preload_libraries") {
            libs_vec = libs
                .split(&[',', '\'', ' '])
@@ -1919,9 +1917,9 @@ LIMIT 100",
                .map(str::to_string)
                .collect();
        }
-
-        // This is used in neon_local and python tests
        info!("parse shared_preload_libraries from provided postgresql.conf");
+
+        // that is used in neon_local and python tests
        if let Some(conf) = &spec.cluster.postgresql_conf {
            let conf_lines = conf.split('\n').collect::<Vec<&str>>();
            let mut shared_preload_libraries_line = "";
@@ -1945,10 +1943,7 @@ LIMIT 100",
        // Assume that they are already present locally.
        libs_vec.retain(|lib| remote_extensions.library_index.contains_key(lib));

-        info!(
-            "Downloading extensions specified in shared_preload_libraries: {:?}",
-            &libs_vec
-        );
+        info!("Downloading to shared preload libraries: {:?}", &libs_vec);

        let mut download_tasks = Vec::new();
        for library in &libs_vec {
--- a/compute_tools/src/extension_server.rs
+++ b/compute_tools/src/extension_server.rs
@@ -148,18 +148,18 @@ fn parse_pg_version(human_version: &str) -> PostgresMajorVersion {
        },
        _ => {}
    }
-    panic!("Unsupported Postgres version {human_version}");
+    panic!("Unsupported postgres version {human_version}");
 }

-/// Download the archive for a given extension,
-/// unzip it, and place files in the appropriate locations (share/lib)
+// download the archive for a given extension,
+// unzip it, and place files in the appropriate locations (share/lib)
 pub async fn download_extension(
    ext_name: &str,
    ext_path: &RemotePath,
    ext_remote_storage: &str,
    pgbin: &str,
 ) -> Result<u64> {
-    info!("Downloading extension {:?} from {:?}", ext_name, ext_path);
+    info!("Download extension {:?} from {:?}", ext_name, ext_path);

    // TODO add retry logic
    let download_buffer =
@@ -200,23 +200,23 @@ pub async fn download_extension(
    // move contents of the libdir / sharedir in unzipped archive to the correct local paths
    for paths in [sharedir_paths, libdir_paths] {
        let (zip_dir, real_dir) = paths;
-        info!("Moving {zip_dir:?}/* to {real_dir:?}");
+        info!("mv {zip_dir:?}/*  {real_dir:?}");
        for file in std::fs::read_dir(zip_dir)? {
            let old_file = file?.path();
            let new_file =
                Path::new(&real_dir).join(old_file.file_name().context("error parsing file")?);
-            info!("Moving {old_file:?} to {new_file:?}");
+            info!("moving {old_file:?} to {new_file:?}");

            // extension download failed: Directory not empty (os error 39)
            match std::fs::rename(old_file, new_file) {
-                Ok(()) => info!("Move succeeded"),
+                Ok(()) => info!("move succeeded"),
                Err(e) => {
-                    warn!("Move failed, probably because the extension already exists: {e}")
+                    warn!("move failed, probably because the extension already exists: {e}")
                }
            }
        }
    }
-    info!("Done moving extension {ext_name}");
+    info!("done moving extension {ext_name}");
    Ok(download_size)
 }

@@ -239,16 +239,10 @@ pub fn create_control_files(remote_extensions: &RemoteExtSpec, pgbin: &str) {
        for (control_name, control_content) in &ext_data.control_data {
            let control_path = local_sharedir.join(control_name);
            if !control_path.exists() {
-                info!(
-                    "Writing control file content {:?}: {:?}",
-                    control_path, control_content
-                );
+                info!("writing file {:?}{:?}", control_path, control_content);
                std::fs::write(control_path, control_content).unwrap();
            } else {
-                warn!(
-                    "Control file {:?} exists locally. Ignoring the version from the spec.",
-                    control_path
-                );
+                warn!("control file {:?} exists both locally and remotely. ignoring the remote version.", control_path);
            }
        }
    }
@@ -256,7 +250,9 @@ pub fn create_control_files(remote_extensions: &RemoteExtSpec, pgbin: &str) {

 // Do request to extension storage proxy, i.e.
 // curl http://pg-ext-s3-gateway/latest/v15/extensions/anon.tar.zst
-// using HTTP GET and return the response body as bytes.
+// using HTTP GET
+// and return the response body as bytes
+//
 async fn download_extension_tar(ext_remote_storage: &str, ext_path: &str) -> Result<Bytes> {
    let uri = format!("{}/{}", ext_remote_storage, ext_path);

--- a/compute_tools/src/http/extract/json.rs
+++ b/compute_tools/src/http/extract/json.rs
@@ -1,9 +1,6 @@
 use std::ops::{Deref, DerefMut};

-use axum::{
-    async_trait,
-    extract::{rejection::JsonRejection, FromRequest, Request},
-};
+use axum::extract::{rejection::JsonRejection, FromRequest, Request};
 use compute_api::responses::GenericAPIError;
 use http::StatusCode;

@@ -12,7 +9,6 @@ use http::StatusCode;
 #[derive(Debug, Clone, Copy, Default)]
 pub(crate) struct Json<T>(pub T);

-#[async_trait]
 impl<S, T> FromRequest<S> for Json<T>
 where
    axum::Json<T>: FromRequest<S, Rejection = JsonRejection>,
--- a/compute_tools/src/http/extract/path.rs
+++ b/compute_tools/src/http/extract/path.rs
@@ -1,9 +1,6 @@
 use std::ops::{Deref, DerefMut};

-use axum::{
-    async_trait,
-    extract::{rejection::PathRejection, FromRequestParts},
-};
+use axum::extract::{rejection::PathRejection, FromRequestParts};
 use compute_api::responses::GenericAPIError;
 use http::{request::Parts, StatusCode};

@@ -12,7 +9,6 @@ use http::{request::Parts, StatusCode};
 #[derive(Debug, Clone, Copy, Default)]
 pub(crate) struct Path<T>(pub T);

-#[async_trait]
 impl<S, T> FromRequestParts<S> for Path<T>
 where
    axum::extract::Path<T>: FromRequestParts<S, Rejection = PathRejection>,
--- a/compute_tools/src/http/extract/query.rs
+++ b/compute_tools/src/http/extract/query.rs
@@ -1,9 +1,6 @@
 use std::ops::{Deref, DerefMut};

-use axum::{
-    async_trait,
-    extract::{rejection::QueryRejection, FromRequestParts},
-};
+use axum::extract::{rejection::QueryRejection, FromRequestParts};
 use compute_api::responses::GenericAPIError;
 use http::{request::Parts, StatusCode};

@@ -12,7 +9,6 @@ use http::{request::Parts, StatusCode};
 #[derive(Debug, Clone, Copy, Default)]
 pub(crate) struct Query<T>(pub T);

-#[async_trait]
 impl<S, T> FromRequestParts<S> for Query<T>
 where
    axum::extract::Query<T>: FromRequestParts<S, Rejection = QueryRejection>,
--- a/compute_tools/src/http/server.rs
+++ b/compute_tools/src/http/server.rs
@@ -55,7 +55,7 @@ async fn serve(port: u16, compute: Arc<ComputeNode>) {
        .route("/database_schema", get(database_schema::get_schema_dump))
        .route("/dbs_and_roles", get(dbs_and_roles::get_catalog_objects))
        .route(
-            "/extension_server/*filename",
+            "/extension_server/{*filename}",
            post(extension_server::download_extension),
        )
        .route("/extensions", post(extensions::install_extension))
--- a/control_plane/src/pageserver.rs
+++ b/control_plane/src/pageserver.rs
@@ -357,6 +357,11 @@ impl PageServerNode {
                .map(|x| x.parse::<usize>())
                .transpose()
                .context("Failed to parse 'l0_flush_delay_threshold' as an integer")?,
+            l0_flush_wait_upload: settings
+                .remove("l0_flush_wait_upload")
+                .map(|x| x.parse::<bool>())
+                .transpose()
+                .context("Failed to parse 'l0_flush_wait_upload' as a boolean")?,
            l0_flush_stall_threshold: settings
                .remove("l0_flush_stall_threshold")
                .map(|x| x.parse::<usize>())
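The `remove` / `map` / `transpose` chain used for `l0_flush_wait_upload` turns an optional string setting into a `Result<Option<bool>>`: a missing key is not an error, but a present value that fails to parse is. A minimal standalone sketch of the same pattern, using only the standard library, follows. The helper name `take_bool` and the `String` error type are assumptions for this example; the real code uses `anyhow` context instead.

```rust
use std::collections::HashMap;

/// Remove `key` from `settings` and parse it as a bool if it is present.
/// Returns Ok(None) when the key is absent and Err when parsing fails.
/// Hypothetical helper, not part of control_plane.
fn take_bool(settings: &mut HashMap<&str, &str>, key: &str) -> Result<Option<bool>, String> {
    settings
        .remove(key)
        // Option<&str> -> Option<Result<bool, ParseBoolError>>
        .map(|x| x.parse::<bool>())
        // Option<Result<..>> -> Result<Option<..>>
        .transpose()
        .map_err(|e| format!("Failed to parse '{key}' as a boolean: {e}"))
}
```

Note that `str::parse::<bool>` accepts only the literals `true` and `false`, so a value such as `yes` or `1` is reported as an error rather than silently coerced.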
--- a/deny.toml
+++ b/deny.toml
@@ -41,8 +41,8 @@ allow = [
    "MIT",
    "MPL-2.0",
    "OpenSSL",
-    "Unicode-DFS-2016",
    "Unicode-3.0",
+    "Zlib",
 ]
 confidence-threshold = 0.8
 exceptions = [
--- a/libs/pageserver_api/src/config.rs
+++ b/libs/pageserver_api/src/config.rs
@@ -260,12 +260,15 @@ pub struct TenantConfigToml {
    /// Level0 delta layer threshold at which to delay layer flushes for compaction backpressure,
    /// such that they take 2x as long, and start waiting for layer flushes during ephemeral layer
    /// rolls. This helps compaction keep up with WAL ingestion, and avoids read amplification
-    /// blowing up. Should be >compaction_threshold. If None, defaults to 2 * compaction_threshold.
-    /// 0 to disable.
+    /// blowing up. Should be >compaction_threshold. 0 to disable. Disabled by default.
    pub l0_flush_delay_threshold: Option<usize>,
-    /// Level0 delta layer threshold at which to stall layer flushes. 0 to disable. If None,
-    /// defaults to 4 * compaction_threshold. Must be >compaction_threshold to avoid deadlock.
+    /// Level0 delta layer threshold at which to stall layer flushes. Must be >compaction_threshold
+    /// to avoid deadlock. 0 to disable. Disabled by default.
    pub l0_flush_stall_threshold: Option<usize>,
+    /// If true, Level0 delta layer flushes will wait for S3 upload before flushing the next
+    /// layer. This is a temporary backpressure mechanism which should be removed once
+    /// l0_flush_{delay,stall}_threshold is fully enabled.
+    pub l0_flush_wait_upload: bool,
    // Determines how much history is retained, to allow
    // branching and read replicas at an older point in time.
    // The unit is #of bytes of WAL.
@@ -523,6 +526,8 @@ pub mod tenant_conf_defaults {
    pub const DEFAULT_COMPACTION_ALGORITHM: crate::models::CompactionAlgorithm =
        crate::models::CompactionAlgorithm::Legacy;

+    pub const DEFAULT_L0_FLUSH_WAIT_UPLOAD: bool = true;
+
    pub const DEFAULT_GC_HORIZON: u64 = 64 * 1024 * 1024;

    // Large DEFAULT_GC_PERIOD is fine as long as PITR_INTERVAL is larger.
@@ -563,6 +568,7 @@ impl Default for TenantConfigToml {
            },
            l0_flush_delay_threshold: None,
            l0_flush_stall_threshold: None,
+            l0_flush_wait_upload: DEFAULT_L0_FLUSH_WAIT_UPLOAD,
            gc_horizon: DEFAULT_GC_HORIZON,
            gc_period: humantime::parse_duration(DEFAULT_GC_PERIOD)
                .expect("cannot parse default gc period"),
--- a/libs/pageserver_api/src/key.rs
+++ b/libs/pageserver_api/src/key.rs
@@ -464,6 +464,18 @@ pub fn rel_size_to_key(rel: RelTag) -> Key {
    }
 }

+#[inline(always)]
+pub fn rel_size_key_to_rel(key: Key) -> RelTag {
+    assert_eq!(key.field1, 0x00);
+    assert_eq!(key.field6, 0xffff_ffff);
+    RelTag {
+        forknum: key.field5,
+        spcnode: key.field2,
+        dbnode: key.field3,
+        relnode: key.field4,
+    }
+}
+
 impl Key {
    #[inline(always)]
    pub fn is_rel_size_key(&self) -> bool {
@@ -559,6 +571,15 @@ pub fn slru_segment_size_to_key(kind: SlruKind, segno: u32) -> Key {
    }
 }

+#[inline(always)]
+pub fn slru_segment_size_key_to_segno(key: Key) -> u32 {
+    assert_eq!(key.field1, 0x01);
+    assert_eq!(key.field3, 1);
+    assert_eq!(key.field5, 0);
+    assert_eq!(key.field6, 0xffff_ffff);
+    key.field4
+}
+
 impl Key {
    pub fn is_slru_segment_size_key(&self) -> bool {
        self.field1 == 0x01
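The new `rel_size_key_to_rel` function is the inverse of `rel_size_to_key`. The sketch below shows the round trip with simplified stand-in types. The `Key` and `RelTag` structs here are hypothetical reductions with assumed field widths, and the body of `rel_size_to_key` is an assumption inferred from the inverse shown in the diff, since its real body does not appear in this change.

```rust
/// Simplified stand-in for the pageserver key; field widths are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Key {
    field1: u8,
    field2: u32,
    field3: u32,
    field4: u32,
    field5: u8,
    field6: u32,
}

/// Simplified stand-in for the relation tag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RelTag {
    forknum: u8,
    spcnode: u32,
    dbnode: u32,
    relnode: u32,
}

/// Assumed forward mapping: mirror image of the inverse below.
fn rel_size_to_key(rel: RelTag) -> Key {
    Key {
        field1: 0x00,
        field2: rel.spcnode,
        field3: rel.dbnode,
        field4: rel.relnode,
        field5: rel.forknum,
        field6: 0xffff_ffff,
    }
}

/// Inverse mapping, following the function added in the diff.
fn rel_size_key_to_rel(key: Key) -> RelTag {
    assert_eq!(key.field1, 0x00);
    assert_eq!(key.field6, 0xffff_ffff);
    RelTag {
        forknum: key.field5,
        spcnode: key.field2,
        dbnode: key.field3,
        relnode: key.field4,
    }
}
```

The two assertions in the inverse act as a guard: passing a key that is not a relation-size key panics instead of producing a meaningless `RelTag`.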
--- a/libs/pageserver_api/src/models.rs
+++ b/libs/pageserver_api/src/models.rs
@@ -466,6 +466,8 @@ pub struct TenantConfigPatch {
    #[serde(skip_serializing_if = "FieldPatch::is_noop")]
    pub l0_flush_stall_threshold: FieldPatch<usize>,
    #[serde(skip_serializing_if = "FieldPatch::is_noop")]
+    pub l0_flush_wait_upload: FieldPatch<bool>,
+    #[serde(skip_serializing_if = "FieldPatch::is_noop")]
    pub gc_horizon: FieldPatch<u64>,
    #[serde(skip_serializing_if = "FieldPatch::is_noop")]
    pub gc_period: FieldPatch<String>,
@@ -524,6 +526,7 @@ pub struct TenantConfig {
    pub compaction_algorithm: Option<CompactionAlgorithmSettings>,
    pub l0_flush_delay_threshold: Option<usize>,
    pub l0_flush_stall_threshold: Option<usize>,
+    pub l0_flush_wait_upload: Option<bool>,
    pub gc_horizon: Option<u64>,
    pub gc_period: Option<String>,
    pub image_creation_threshold: Option<usize>,
@@ -559,6 +562,7 @@ impl TenantConfig {
            mut compaction_algorithm,
            mut l0_flush_delay_threshold,
            mut l0_flush_stall_threshold,
+            mut l0_flush_wait_upload,
            mut gc_horizon,
            mut gc_period,
            mut image_creation_threshold,
@@ -597,6 +601,7 @@ impl TenantConfig {
        patch
            .l0_flush_stall_threshold
            .apply(&mut l0_flush_stall_threshold);
+        patch.l0_flush_wait_upload.apply(&mut l0_flush_wait_upload);
        patch.gc_horizon.apply(&mut gc_horizon);
        patch.gc_period.apply(&mut gc_period);
        patch
@@ -651,6 +656,7 @@ impl TenantConfig {
            compaction_algorithm,
            l0_flush_delay_threshold,
            l0_flush_stall_threshold,
+            l0_flush_wait_upload,
            gc_horizon,
            gc_period,
            image_creation_threshold,
@@ -1015,6 +1021,13 @@ pub struct TenantConfigPatchRequest {
    pub config: TenantConfigPatch, // as we have a flattened field, we should reject all unknown fields in it
 }

+#[derive(Serialize, Deserialize, Debug)]
+pub struct TenantWaitLsnRequest {
+    #[serde(flatten)]
+    pub timelines: HashMap<TimelineId, Lsn>,
+    pub timeout: Duration,
+}
+
 /// See [`TenantState::attachment_status`] and the OpenAPI docs for context.
 #[derive(Serialize, Deserialize, Clone)]
 #[serde(tag = "slug", content = "data", rename_all = "snake_case")]
--- a/libs/vm_monitor/src/dispatcher.rs
+++ b/libs/vm_monitor/src/dispatcher.rs
@@ -7,7 +7,7 @@
 //! (notifying it of upscale).

 use anyhow::{bail, Context};
-use axum::extract::ws::{Message, WebSocket};
+use axum::extract::ws::{Message, Utf8Bytes, WebSocket};
 use futures::{
    stream::{SplitSink, SplitStream},
    SinkExt, StreamExt,
@@ -82,21 +82,21 @@ impl Dispatcher {

        let highest_shared_version = match monitor_range.highest_shared_version(&agent_range) {
            Ok(version) => {
-                sink.send(Message::Text(
+                sink.send(Message::Text(Utf8Bytes::from(
                    serde_json::to_string(&ProtocolResponse::Version(version)).unwrap(),
-                ))
+                )))
                .await
                .context("failed to notify agent of negotiated protocol version")?;
                version
            }
            Err(e) => {
-                sink.send(Message::Text(
+                sink.send(Message::Text(Utf8Bytes::from(
                    serde_json::to_string(&ProtocolResponse::Error(format!(
                        "Received protocol version range {} which does not overlap with {}",
                        agent_range, monitor_range
                    )))
                    .unwrap(),
-                ))
+                )))
                .await
                .context("failed to notify agent of no overlap between protocol version ranges")?;
                Err(e).context("error determining suitable protocol version range")?
@@ -126,7 +126,7 @@ impl Dispatcher {

        let json = serde_json::to_string(&message).context("failed to serialize message")?;
        self.sink
-            .send(Message::Text(json))
+            .send(Message::Text(Utf8Bytes::from(json)))
            .await
            .context("stream error sending message")
    }
--- a/pageserver/client/src/mgmt_api.rs
+++ b/pageserver/client/src/mgmt_api.rs
@@ -763,4 +763,19 @@ impl Client {
            .await
            .map_err(Error::ReceiveBody)
    }
+
+    pub async fn wait_lsn(
+        &self,
+        tenant_shard_id: TenantShardId,
+        request: TenantWaitLsnRequest,
+    ) -> Result<StatusCode> {
+        let uri = format!(
+            "{}/v1/tenant/{tenant_shard_id}/wait_lsn",
+            self.mgmt_api_endpoint,
+        );
+
+        self.request_noerror(Method::POST, uri, request)
+            .await
+            .map(|resp| resp.status())
+    }
 }
--- a/pageserver/compaction/src/simulator/draw.rs
+++ b/pageserver/compaction/src/simulator/draw.rs
@@ -160,9 +160,12 @@ pub fn draw_history<W: std::io::Write>(history: &[LayerTraceEvent], mut output:

        // Fill in and thicken rectangle if it's an
        // image layer so that we can see it.
-        let mut style = Style::default();
-        style.fill = Fill::Color(rgb(0x80, 0x80, 0x80));
-        style.stroke = Stroke::Color(rgb(0, 0, 0), 0.5);
+        let mut style = Style {
+            fill: Fill::Color(rgb(0x80, 0x80, 0x80)),
+            stroke: Stroke::Color(rgb(0, 0, 0), 0.5),
+            opacity: 1.0,
+            stroke_opacity: 1.0,
+        };

        let y_start = lsn_max - lsn_start;
        let y_end = lsn_max - lsn_end;
@@ -214,10 +217,6 @@ pub fn draw_history<W: std::io::Write>(history: &[LayerTraceEvent], mut output:
        files_seen.insert(f);
    }

-    let mut record_style = Style::default();
-    record_style.fill = Fill::Color(rgb(0x80, 0x80, 0x80));
-    record_style.stroke = Stroke::None;
-
    writeln!(svg, "{}", EndSvg)?;

    let mut layer_events_str = String::new();
--- a/pageserver/src/http/routes.rs
+++ b/pageserver/src/http/routes.rs
@@ -10,6 +10,7 @@ use std::time::Duration;

 use anyhow::{anyhow, Context, Result};
 use enumset::EnumSet;
+use futures::future::join_all;
 use futures::StreamExt;
 use futures::TryFutureExt;
 use humantime::format_rfc3339;
@@ -40,6 +41,7 @@ use pageserver_api::models::TenantShardSplitRequest;
 use pageserver_api::models::TenantShardSplitResponse;
 use pageserver_api::models::TenantSorting;
 use pageserver_api::models::TenantState;
+use pageserver_api::models::TenantWaitLsnRequest;
 use pageserver_api::models::TimelineArchivalConfigRequest;
 use pageserver_api::models::TimelineCreateRequestMode;
 use pageserver_api::models::TimelineCreateRequestModeImportPgdata;
@@ -95,6 +97,8 @@ use crate::tenant::timeline::CompactOptions;
 use crate::tenant::timeline::CompactRequest;
 use crate::tenant::timeline::CompactionError;
 use crate::tenant::timeline::Timeline;
+use crate::tenant::timeline::WaitLsnTimeout;
+use crate::tenant::timeline::WaitLsnWaiter;
 use crate::tenant::GetTimelineError;
 use crate::tenant::OffloadedTimeline;
 use crate::tenant::{LogicalSizeCalculationCause, PageReconstructError};
@@ -2790,6 +2794,63 @@ async fn secondary_download_handler(
    json_response(status, progress)
 }

+async fn wait_lsn_handler(
+    mut request: Request<Body>,
+    cancel: CancellationToken,
+) -> Result<Response<Body>, ApiError> {
+    let tenant_shard_id: TenantShardId = parse_request_param(&request, "tenant_shard_id")?;
+    let wait_lsn_request: TenantWaitLsnRequest = json_request(&mut request).await?;
+
+    let state = get_state(&request);
+    let tenant = state
+        .tenant_manager
+        .get_attached_tenant_shard(tenant_shard_id)?;
+
+    let mut wait_futures = Vec::default();
+    for timeline in tenant.list_timelines() {
+        let Some(lsn) = wait_lsn_request.timelines.get(&timeline.timeline_id) else {
+            continue;
+        };
+
+        let fut = {
+            let timeline = timeline.clone();
+            let ctx = RequestContext::new(TaskKind::MgmtRequest, DownloadBehavior::Error);
+            async move {
+                timeline
+                    .wait_lsn(
+                        *lsn,
+                        WaitLsnWaiter::HttpEndpoint,
+                        WaitLsnTimeout::Custom(wait_lsn_request.timeout),
+                        &ctx,
+                    )
+                    .await
+            }
+        };
+        wait_futures.push(fut);
+    }
+
+    if wait_futures.is_empty() {
+        return json_response(StatusCode::NOT_FOUND, ());
+    }
+
+    let all_done = tokio::select! {
+        results = join_all(wait_futures) => {
+            results.iter().all(|res| res.is_ok())
+        },
+        _ = cancel.cancelled() => {
+            return Err(ApiError::Cancelled);
+        }
+    };
+
+    let status = if all_done {
+        StatusCode::OK
+    } else {
+        StatusCode::ACCEPTED
+    };
+
+    json_response(status, ())
+}
+
 async fn secondary_status_handler(
    request: Request<Body>,
    _cancel: CancellationToken,
@@ -3577,6 +3638,9 @@ pub fn make_router(
        .post("/v1/tenant/:tenant_shard_id/secondary/download", |r| {
            api_handler(r, secondary_download_handler)
        })
+        .post("/v1/tenant/:tenant_shard_id/wait_lsn", |r| {
+            api_handler(r, wait_lsn_handler)
+        })
        .put("/v1/tenant/:tenant_shard_id/break", |r| {
            testing_api_handler("set tenant state to broken", r, handle_tenant_break)
        })
--- a/pageserver/src/metrics.rs
+++ b/pageserver/src/metrics.rs
@@ -3,7 +3,7 @@ use metrics::{
    register_counter_vec, register_gauge_vec, register_histogram, register_histogram_vec,
    register_int_counter, register_int_counter_pair_vec, register_int_counter_vec,
    register_int_gauge, register_int_gauge_vec, register_uint_gauge, register_uint_gauge_vec,
-    Counter, CounterVec, GaugeVec, Histogram, HistogramVec, IntCounter, IntCounterPair,
+    Counter, CounterVec, Gauge, GaugeVec, Histogram, HistogramVec, IntCounter, IntCounterPair,
    IntCounterPairVec, IntCounterVec, IntGauge, IntGaugeVec, UIntGauge, UIntGaugeVec,
 };
 use once_cell::sync::Lazy;
@@ -398,6 +398,15 @@ pub(crate) static WAIT_LSN_TIME: Lazy<Histogram> = Lazy::new(|| {
    .expect("failed to define a metric")
 });

+static FLUSH_WAIT_UPLOAD_TIME: Lazy<GaugeVec> = Lazy::new(|| {
+    register_gauge_vec!(
+        "pageserver_flush_wait_upload_seconds",
+        "Time spent waiting for preceding uploads during layer flush",
+        &["tenant_id", "shard_id", "timeline_id"]
+    )
+    .expect("failed to define a metric")
+});
+
 static LAST_RECORD_LSN: Lazy<IntGaugeVec> = Lazy::new(|| {
    register_int_gauge_vec!(
        "pageserver_last_record_lsn",
@@ -2569,6 +2578,7 @@ pub(crate) struct TimelineMetrics {
    timeline_id: String,
    pub flush_time_histo: StorageTimeMetrics,
    pub flush_delay_histo: StorageTimeMetrics,
+    pub flush_wait_upload_time_gauge: Gauge,
    pub compact_time_histo: StorageTimeMetrics,
    pub create_images_time_histo: StorageTimeMetrics,
    pub logical_size_histo: StorageTimeMetrics,
@@ -2620,6 +2630,9 @@ impl TimelineMetrics {
            &shard_id,
            &timeline_id,
        );
+        let flush_wait_upload_time_gauge = FLUSH_WAIT_UPLOAD_TIME
+            .get_metric_with_label_values(&[&tenant_id, &shard_id, &timeline_id])
+            .unwrap();
        let compact_time_histo = StorageTimeMetrics::new(
            StorageTimeOperation::Compact,
            &tenant_id,
@@ -2766,6 +2779,7 @@ impl TimelineMetrics {
            timeline_id,
            flush_time_histo,
            flush_delay_histo,
+            flush_wait_upload_time_gauge,
            compact_time_histo,
            create_images_time_histo,
            logical_size_histo,
@@ -2815,6 +2829,14 @@ impl TimelineMetrics {
        self.resident_physical_size_gauge.get()
    }

+    pub(crate) fn flush_wait_upload_time_gauge_add(&self, duration: f64) {
+        // `flush_wait_upload_time_gauge` is already the labelled child of
+        // `FLUSH_WAIT_UPLOAD_TIME` for this timeline (see `TimelineMetrics::new`),
+        // so a single `add` suffices; looking the metric up again and adding a
+        // second time would double-count the duration.
+        self.flush_wait_upload_time_gauge.add(duration);
+    }
+
    pub(crate) fn shutdown(&self) {
        let was_shutdown = self
            .shutdown
@@ -2832,6 +2854,7 @@ impl TimelineMetrics {
        let shard_id = &self.shard_id;
        let _ = LAST_RECORD_LSN.remove_label_values(&[tenant_id, shard_id, timeline_id]);
        let _ = DISK_CONSISTENT_LSN.remove_label_values(&[tenant_id, shard_id, timeline_id]);
+        let _ = FLUSH_WAIT_UPLOAD_TIME.remove_label_values(&[tenant_id, shard_id, timeline_id]);
        let _ = STANDBY_HORIZON.remove_label_values(&[tenant_id, shard_id, timeline_id]);
        {
            RESIDENT_PHYSICAL_SIZE_GLOBAL.sub(self.resident_physical_size_get());
--- a/pageserver/src/page_service.rs
+++ b/pageserver/src/page_service.rs
@@ -1708,6 +1708,7 @@ impl PageServerHandler {
                .wait_lsn(
                    not_modified_since,
                    crate::tenant::timeline::WaitLsnWaiter::PageService,
+                    timeline::WaitLsnTimeout::Default,
                    ctx,
                )
                .await?;
@@ -2044,6 +2045,7 @@ impl PageServerHandler {
                .wait_lsn(
                    lsn,
                    crate::tenant::timeline::WaitLsnWaiter::PageService,
+                    crate::tenant::timeline::WaitLsnTimeout::Default,
                    ctx,
                )
                .await?;
--- a/pageserver/src/pgdatadir_mapping.rs
+++ b/pageserver/src/pgdatadir_mapping.rs
@@ -17,20 +17,21 @@ use crate::span::{
    debug_assert_current_span_has_tenant_and_timeline_id,
    debug_assert_current_span_has_tenant_and_timeline_id_no_shard_id,
 };
-use crate::tenant::storage_layer::IoConcurrency;
+use crate::tenant::storage_layer::{IoConcurrency, ValuesReconstructState};
 use crate::tenant::timeline::GetVectoredError;
 use anyhow::{ensure, Context};
 use bytes::{Buf, Bytes, BytesMut};
 use enum_map::Enum;
 use itertools::Itertools;
-use pageserver_api::key::Key;
 use pageserver_api::key::{
    dbdir_key_range, rel_block_to_key, rel_dir_to_key, rel_key_range, rel_size_to_key,
    relmap_file_key, repl_origin_key, repl_origin_key_range, slru_block_to_key, slru_dir_to_key,
-    slru_segment_key_range, slru_segment_size_to_key, twophase_file_key, twophase_key_range,
-    CompactKey, AUX_FILES_KEY, CHECKPOINT_KEY, CONTROLFILE_KEY, DBDIR_KEY, TWOPHASEDIR_KEY,
+    slru_segment_key_range, slru_segment_size_key_to_segno, slru_segment_size_to_key,
+    twophase_file_key, twophase_key_range, CompactKey, AUX_FILES_KEY, CHECKPOINT_KEY,
+    CONTROLFILE_KEY, DBDIR_KEY, TWOPHASEDIR_KEY,
 };
-use pageserver_api::keyspace::SparseKeySpace;
+use pageserver_api::key::{rel_size_key_to_rel, Key};
+use pageserver_api::keyspace::{KeySpaceRandomAccum, SparseKeySpace};
 use pageserver_api::record::NeonWalRecord;
 use pageserver_api::reltag::{BlockNumber, RelTag, SlruKind};
 use pageserver_api::shard::ShardIdentity;
@@ -110,10 +111,21 @@ pub(crate) enum CollectKeySpaceError {
    Decode(#[from] DeserializeError),
    #[error(transparent)]
    PageRead(PageReconstructError),
+    #[error(transparent)]
+    GetVectored(GetVectoredError),
    #[error("cancelled")]
    Cancelled,
 }

+impl From<GetVectoredError> for CollectKeySpaceError {
+    fn from(err: GetVectoredError) -> Self {
+        match err {
+            GetVectoredError::Cancelled => Self::Cancelled,
+            err => Self::GetVectored(err),
+        }
+    }
+}
+
 impl From<PageReconstructError> for CollectKeySpaceError {
    fn from(err: PageReconstructError) -> Self {
        match err {
@@ -1071,11 +1083,30 @@ impl Timeline {
                .into_iter()
                .collect();
            rels.sort_unstable();
+            let mut relsize_keys_to_collect = KeySpaceRandomAccum::new();
            for rel in rels {
                let relsize_key = rel_size_to_key(rel);
-                let mut buf = self.get(relsize_key, lsn, ctx).await?;
+                relsize_keys_to_collect.add_key(relsize_key);
+            }
+            // Skip the vectored-read max key check by using `get_vectored_impl`.
+            let io_concurrency = IoConcurrency::spawn_from_conf(
+                self.conf,
+                self.gate
+                    .enter()
+                    .map_err(|_| CollectKeySpaceError::Cancelled)?,
+            );
+            let res = self
+                .get_vectored_impl(
+                    relsize_keys_to_collect.consume_keyspace(),
+                    lsn,
+                    &mut ValuesReconstructState::new(io_concurrency),
+                    ctx,
+                )
+                .await?;
+            for (relsize_key, buf) in res {
+                let mut buf = buf?;
                let relsize = buf.get_u32_le();
-
+                let rel = rel_size_key_to_rel(relsize_key);
                result.add_range(rel_block_to_key(rel, 0)..rel_block_to_key(rel, relsize));
                result.add_key(relsize_key);
            }
@@ -1094,11 +1125,30 @@ impl Timeline {
                let dir = SlruSegmentDirectory::des(&buf)?;
                let mut segments: Vec<u32> = dir.segments.iter().cloned().collect();
                segments.sort_unstable();
+                let mut segsize_keys_to_collect = KeySpaceRandomAccum::new();
                for segno in segments {
                    let segsize_key = slru_segment_size_to_key(kind, segno);
-                    let mut buf = self.get(segsize_key, lsn, ctx).await?;
+                    segsize_keys_to_collect.add_key(segsize_key);
+                }
+                // Skip the vectored-read max key check by using `get_vectored_impl`.
+                let io_concurrency = IoConcurrency::spawn_from_conf(
+                    self.conf,
+                    self.gate
+                        .enter()
+                        .map_err(|_| CollectKeySpaceError::Cancelled)?,
+                );
+                let res = self
+                    .get_vectored_impl(
+                        segsize_keys_to_collect.consume_keyspace(),
+                        lsn,
+                        &mut ValuesReconstructState::new(io_concurrency),
+                        ctx,
+                    )
+                    .await?;
+                for (segsize_key, buf) in res {
+                    let mut buf = buf?;
                    let segsize = buf.get_u32_le();
-
+                    let segno = slru_segment_size_key_to_segno(segsize_key);
                    result.add_range(
                        slru_block_to_key(kind, segno, 0)..slru_block_to_key(kind, segno, segsize),
                    );
--- a/pageserver/src/tenant.rs
+++ b/pageserver/src/tenant.rs
@@ -37,6 +37,8 @@ use remote_timeline_client::manifest::{
    OffloadedTimelineManifest, TenantManifest, LATEST_TENANT_MANIFEST_VERSION,
 };
 use remote_timeline_client::UploadQueueNotReadyError;
+use remote_timeline_client::FAILED_REMOTE_OP_RETRIES;
+use remote_timeline_client::FAILED_UPLOAD_WARN_THRESHOLD;
 use std::collections::BTreeMap;
 use std::fmt;
 use std::future::Future;
@@ -2558,7 +2560,12 @@ impl Tenant {
                    // sizes etc. and that would get confused if the previous page versions
                    // are not in the repository yet.
                    ancestor_timeline
-                        .wait_lsn(*lsn, timeline::WaitLsnWaiter::Tenant, ctx)
+                        .wait_lsn(
+                            *lsn,
+                            timeline::WaitLsnWaiter::Tenant,
+                            timeline::WaitLsnTimeout::Default,
+                            ctx,
+                        )
                        .await
                        .map_err(|e| match e {
                            e @ (WaitLsnError::Timeout(_) | WaitLsnError::BadState { .. }) => {
@@ -5308,27 +5315,37 @@ impl Tenant {
            return Ok(());
        }

-        upload_tenant_manifest(
-            &self.remote_storage,
-            &self.tenant_shard_id,
-            self.generation,
-            &manifest,
+        // Remote storage does no retries internally, so wrap it
+        match backoff::retry(
+            || async {
+                upload_tenant_manifest(
+                    &self.remote_storage,
+                    &self.tenant_shard_id,
+                    self.generation,
+                    &manifest,
+                    &self.cancel,
+                )
+                .await
+            },
+            |_e| self.cancel.is_cancelled(),
+            FAILED_UPLOAD_WARN_THRESHOLD,
+            FAILED_REMOTE_OP_RETRIES,
+            "uploading tenant manifest",
            &self.cancel,
        )
        .await
-        .map_err(|e| {
-            if self.cancel.is_cancelled() {
-                TenantManifestError::Cancelled
-            } else {
-                TenantManifestError::RemoteStorage(e)
+        {
+            None => Err(TenantManifestError::Cancelled),
+            Some(Err(_)) if self.cancel.is_cancelled() => Err(TenantManifestError::Cancelled),
+            Some(Err(e)) => Err(TenantManifestError::RemoteStorage(e)),
+            Some(Ok(_)) => {
+                // Store the successfully uploaded manifest, so that future callers can avoid
+                // re-uploading the same thing.
+                *guard = Some(manifest);
+
+                Ok(())
            }
-        })?;
-
-        // Store the successfully uploaded manifest, so that future callers can avoid
-        // re-uploading the same thing.
-        *guard = Some(manifest);
-
-        Ok(())
+        }
    }
 }

@@ -5455,6 +5472,7 @@ pub(crate) mod harness {
                compaction_algorithm: Some(tenant_conf.compaction_algorithm),
                l0_flush_delay_threshold: tenant_conf.l0_flush_delay_threshold,
                l0_flush_stall_threshold: tenant_conf.l0_flush_stall_threshold,
+                l0_flush_wait_upload: Some(tenant_conf.l0_flush_wait_upload),
                gc_horizon: Some(tenant_conf.gc_horizon),
                gc_period: Some(tenant_conf.gc_period),
                image_creation_threshold: Some(tenant_conf.image_creation_threshold),
--- a/pageserver/src/tenant/config.rs
+++ b/pageserver/src/tenant/config.rs
@@ -289,6 +289,10 @@ pub struct TenantConfOpt {
    #[serde(default)]
    pub l0_flush_stall_threshold: Option<usize>,

+    #[serde(skip_serializing_if = "Option::is_none")]
+    #[serde(default)]
+    pub l0_flush_wait_upload: Option<bool>,
+
    #[serde(skip_serializing_if = "Option::is_none")]
    #[serde(default)]
    pub gc_horizon: Option<u64>,
@@ -408,6 +412,9 @@ impl TenantConfOpt {
            l0_flush_stall_threshold: self
                .l0_flush_stall_threshold
                .or(global_conf.l0_flush_stall_threshold),
+            l0_flush_wait_upload: self
+                .l0_flush_wait_upload
+                .unwrap_or(global_conf.l0_flush_wait_upload),
            gc_horizon: self.gc_horizon.unwrap_or(global_conf.gc_horizon),
            gc_period: self.gc_period.unwrap_or(global_conf.gc_period),
            image_creation_threshold: self
@@ -474,6 +481,7 @@ impl TenantConfOpt {
            mut compaction_algorithm,
            mut l0_flush_delay_threshold,
            mut l0_flush_stall_threshold,
+            mut l0_flush_wait_upload,
            mut gc_horizon,
            mut gc_period,
            mut image_creation_threshold,
@@ -518,6 +526,7 @@ impl TenantConfOpt {
        patch
            .l0_flush_stall_threshold
            .apply(&mut l0_flush_stall_threshold);
+        patch.l0_flush_wait_upload.apply(&mut l0_flush_wait_upload);
        patch.gc_horizon.apply(&mut gc_horizon);
        patch
            .gc_period
@@ -590,6 +599,7 @@ impl TenantConfOpt {
            compaction_algorithm,
            l0_flush_delay_threshold,
            l0_flush_stall_threshold,
+            l0_flush_wait_upload,
            gc_horizon,
            gc_period,
            image_creation_threshold,
@@ -649,6 +659,7 @@ impl From<TenantConfOpt> for models::TenantConfig {
            compaction_threshold: value.compaction_threshold,
            l0_flush_delay_threshold: value.l0_flush_delay_threshold,
            l0_flush_stall_threshold: value.l0_flush_stall_threshold,
+            l0_flush_wait_upload: value.l0_flush_wait_upload,
            gc_horizon: value.gc_horizon,
            gc_period: value.gc_period.map(humantime),
            image_creation_threshold: value.image_creation_threshold,
--- a/pageserver/src/tenant/mgr.rs
+++ b/pageserver/src/tenant/mgr.rs
@@ -1643,6 +1643,7 @@ impl TenantManager {
                        .wait_lsn(
                            *target_lsn,
                            crate::tenant::timeline::WaitLsnWaiter::Tenant,
+                            crate::tenant::timeline::WaitLsnTimeout::Default,
                            ctx,
                        )
                        .await
--- a/pageserver/src/tenant/remote_timeline_client/index.rs
+++ b/pageserver/src/tenant/remote_timeline_client/index.rs
@@ -222,6 +222,10 @@ impl LayerFileMetadata {
            shard,
        }
    }
+    /// Helper to get both generation and file size in a tuple
+    pub fn generation_file_size(&self) -> (Generation, u64) {
+        (self.generation, self.file_size)
+    }
 }

 /// Limited history of earlier ancestors.
--- a/pageserver/src/tenant/secondary/downloader.rs
+++ b/pageserver/src/tenant/secondary/downloader.rs
@@ -559,6 +559,13 @@ impl JobGenerator<PendingDownload, RunningDownload, CompleteDownload, DownloadCo
    }
 }

+enum LayerAction {
+    Download,
+    NoAction,
+    Skip,
+    Touch,
+}
+
 /// This type is a convenience to group together the various functions involved in
 /// freshening a secondary tenant.
 struct TenantDownloader<'a> {
@@ -1008,69 +1015,17 @@ impl<'a> TenantDownloader<'a> {
                return (Err(UpdateError::Restart), touched);
            }

-            // Existing on-disk layers: just update their access time.
-            if let Some(on_disk) = timeline_state.on_disk_layers.get(&layer.name) {
-                tracing::debug!("Layer {} is already on disk", layer.name);
-
-                if cfg!(debug_assertions) {
-                    // Debug for https://github.com/neondatabase/neon/issues/6966: check that the files we think
-                    // are already present on disk are really there.
-                    match tokio::fs::metadata(&on_disk.local_path).await {
-                        Ok(meta) => {
-                            tracing::debug!(
-                                "Layer {} present at {}, size {}",
-                                layer.name,
-                                on_disk.local_path,
-                                meta.len(),
-                            );
-                        }
-                        Err(e) => {
-                            tracing::warn!(
-                                "Layer {} not found at {} ({})",
-                                layer.name,
-                                on_disk.local_path,
-                                e
-                            );
-                            debug_assert!(false);
-                        }
-                    }
-                }
-
-                if on_disk.metadata != layer.metadata || on_disk.access_time != layer.access_time {
-                    // We already have this layer on disk.  Update its access time.
-                    tracing::debug!(
-                        "Access time updated for layer {}: {} -> {}",
-                        layer.name,
-                        strftime(&on_disk.access_time),
-                        strftime(&layer.access_time)
-                    );
-                    touched.push(layer);
-                }
-                continue;
-            } else {
-                tracing::debug!("Layer {} not present on disk yet", layer.name);
-            }
-
-            // Eviction: if we evicted a layer, then do not re-download it unless it was accessed more
-            // recently than it was evicted.
-            if let Some(evicted_at) = timeline_state.evicted_at.get(&layer.name) {
-                if &layer.access_time > evicted_at {
-                    tracing::info!(
-                        "Re-downloading evicted layer {}, accessed at {}, evicted at {}",
-                        layer.name,
-                        strftime(&layer.access_time),
-                        strftime(evicted_at)
-                    );
-                } else {
-                    tracing::trace!(
-                        "Not re-downloading evicted layer {}, accessed at {}, evicted at {}",
-                        layer.name,
-                        strftime(&layer.access_time),
-                        strftime(evicted_at)
-                    );
+            match self.layer_action(&timeline_state, &layer).await {
+                LayerAction::Download => (),
+                LayerAction::NoAction => continue,
+                LayerAction::Skip => {
                    self.skip_layer(layer);
                    continue;
                }
+                LayerAction::Touch => {
+                    touched.push(layer);
+                    continue;
+                }
            }

            match self
@@ -1091,6 +1046,86 @@ impl<'a> TenantDownloader<'a> {
        (Ok(()), touched)
    }

+    async fn layer_action(
+        &self,
+        timeline_state: &SecondaryDetailTimeline,
+        layer: &HeatMapLayer,
+    ) -> LayerAction {
+        // Existing on-disk layers: re-download if size or generation changed, else update access time.
+        if let Some(on_disk) = timeline_state.on_disk_layers.get(&layer.name) {
+            tracing::debug!("Layer {} is already on disk", layer.name);
+
+            if cfg!(debug_assertions) {
+                // Debug for https://github.com/neondatabase/neon/issues/6966: check that the files we think
+                // are already present on disk are really there.
+                match tokio::fs::metadata(&on_disk.local_path).await {
+                    Ok(meta) => {
+                        tracing::debug!(
+                            "Layer {} present at {}, size {}",
+                            layer.name,
+                            on_disk.local_path,
+                            meta.len(),
+                        );
+                    }
+                    Err(e) => {
+                        tracing::warn!(
+                            "Layer {} not found at {} ({})",
+                            layer.name,
+                            on_disk.local_path,
+                            e
+                        );
+                        debug_assert!(false);
+                    }
+                }
+            }
+
+            if on_disk.metadata.generation_file_size() != layer.metadata.generation_file_size() {
+                tracing::info!(
+                    "Re-downloading layer {} with changed size or generation: {:?}->{:?}",
+                    layer.name,
+                    on_disk.metadata.generation_file_size(),
+                    layer.metadata.generation_file_size()
+                );
+                return LayerAction::Download;
+            }
+            if on_disk.metadata != layer.metadata || on_disk.access_time != layer.access_time {
+                // We already have this layer on disk.  Update its access time.
+                tracing::debug!(
+                    "Access time updated for layer {}: {} -> {}",
+                    layer.name,
+                    strftime(&on_disk.access_time),
+                    strftime(&layer.access_time)
+                );
+                return LayerAction::Touch;
+            }
+            return LayerAction::NoAction;
+        } else {
+            tracing::debug!("Layer {} not present on disk yet", layer.name);
+        }
+
+        // Eviction: if we evicted a layer, then do not re-download it unless it was accessed more
+        // recently than it was evicted.
+        if let Some(evicted_at) = timeline_state.evicted_at.get(&layer.name) {
+            if &layer.access_time > evicted_at {
+                tracing::info!(
+                    "Re-downloading evicted layer {}, accessed at {}, evicted at {}",
+                    layer.name,
+                    strftime(&layer.access_time),
+                    strftime(evicted_at)
+                );
+            } else {
+                tracing::trace!(
+                    "Not re-downloading evicted layer {}, accessed at {}, evicted at {}",
+                    layer.name,
+                    strftime(&layer.access_time),
+                    strftime(evicted_at)
+                );
+                return LayerAction::Skip;
+            }
+        }
+        LayerAction::Download
+    }
+
    async fn download_timeline(
        &self,
        timeline: HeatMapTimeline,
--- a/pageserver/src/tenant/timeline.rs
+++ b/pageserver/src/tenant/timeline.rs
@@ -144,15 +144,19 @@ use self::layer_manager::LayerManager;
 use self::logical_size::LogicalSize;
 use self::walreceiver::{WalReceiver, WalReceiverConf};

-use super::config::TenantConf;
-use super::remote_timeline_client::index::IndexPart;
-use super::remote_timeline_client::RemoteTimelineClient;
-use super::secondary::heatmap::{HeatMapLayer, HeatMapTimeline};
-use super::storage_layer::{LayerFringe, LayerVisibilityHint, ReadableLayer};
-use super::upload_queue::NotInitialized;
-use super::GcError;
 use super::{
-    debug_assert_current_span_has_tenant_and_timeline_id, AttachedTenantConf, MaybeOffloaded,
+    config::TenantConf, storage_layer::LayerVisibilityHint, upload_queue::NotInitialized,
+    MaybeOffloaded,
+};
+use super::{debug_assert_current_span_has_tenant_and_timeline_id, AttachedTenantConf};
+use super::{remote_timeline_client::index::IndexPart, storage_layer::LayerFringe};
+use super::{
+    remote_timeline_client::RemoteTimelineClient, remote_timeline_client::WaitCompletionError,
+    storage_layer::ReadableLayer,
+};
+use super::{
+    secondary::heatmap::{HeatMapLayer, HeatMapTimeline},
+    GcError,
 };

 #[cfg(test)]
@@ -897,10 +901,17 @@ impl From<GetReadyAncestorError> for PageReconstructError {
    }
 }

+pub(crate) enum WaitLsnTimeout {
+    Custom(Duration),
+    // Use the [`PageServerConf::wait_lsn_timeout`] default
+    Default,
+}
+
 pub(crate) enum WaitLsnWaiter<'a> {
    Timeline(&'a Timeline),
    Tenant,
    PageService,
+    HttpEndpoint,
 }

 /// Argument to [`Timeline::shutdown`].
@@ -1146,7 +1157,7 @@ impl Timeline {
        vectored_res
    }

-    pub(super) async fn get_vectored_impl(
+    pub(crate) async fn get_vectored_impl(
        &self,
        keyspace: KeySpace,
        lsn: Lsn,
@@ -1297,6 +1308,7 @@ impl Timeline {
        &self,
        lsn: Lsn,
        who_is_waiting: WaitLsnWaiter<'_>,
+        timeout: WaitLsnTimeout,
        ctx: &RequestContext, /* Prepare for use by cancellation */
    ) -> Result<(), WaitLsnError> {
        let state = self.current_state();
@@ -1313,7 +1325,7 @@ impl Timeline {
                | TaskKind::WalReceiverConnectionPoller => {
                    let is_myself = match who_is_waiting {
                        WaitLsnWaiter::Timeline(waiter) => Weak::ptr_eq(&waiter.myself, &self.myself),
-                        WaitLsnWaiter::Tenant | WaitLsnWaiter::PageService => unreachable!("tenant or page_service context are not expected to have task kind {:?}", ctx.task_kind()),
+                        WaitLsnWaiter::Tenant | WaitLsnWaiter::PageService | WaitLsnWaiter::HttpEndpoint => unreachable!("tenant, page_service or http endpoint contexts are not expected to have task kind {:?}", ctx.task_kind()),
                    };
                    if is_myself {
                        if let Err(current) = self.last_record_lsn.would_wait_for(lsn) {
@@ -1329,13 +1341,14 @@ impl Timeline {
            }
        }

+        let timeout = match timeout {
+            WaitLsnTimeout::Custom(t) => t,
+            WaitLsnTimeout::Default => self.conf.wait_lsn_timeout,
+        };
+
        let _timer = crate::metrics::WAIT_LSN_TIME.start_timer();

-        match self
-            .last_record_lsn
-            .wait_for_timeout(lsn, self.conf.wait_lsn_timeout)
-            .await
-        {
+        match self.last_record_lsn.wait_for_timeout(lsn, timeout).await {
            Ok(()) => Ok(()),
            Err(e) => {
                use utils::seqwait::SeqWaitError::*;
@@ -2168,8 +2181,8 @@ impl Timeline {
    }

    fn get_l0_flush_delay_threshold(&self) -> Option<usize> {
-        // Default to delay L0 flushes at 3x compaction threshold.
-        const DEFAULT_L0_FLUSH_DELAY_FACTOR: usize = 3;
+        // Disable L0 flush delays by default. This and compaction need further tuning.
+        const DEFAULT_L0_FLUSH_DELAY_FACTOR: usize = 0; // TODO: default to e.g. 3

        // If compaction is disabled, don't delay.
        if self.get_compaction_period() == Duration::ZERO {
@@ -2197,10 +2210,9 @@ impl Timeline {
    }

    fn get_l0_flush_stall_threshold(&self) -> Option<usize> {
-        // Default to stall L0 flushes at 5x compaction threshold.
-        // TODO: stalls are temporarily disabled by default, see below.
-        #[allow(unused)]
-        const DEFAULT_L0_FLUSH_STALL_FACTOR: usize = 5;
+        // Disable L0 stalls by default. In ingest benchmarks, we see image compaction take >10
+        // minutes, blocking L0 compaction, and we can't stall L0 flushes for that long.
+        const DEFAULT_L0_FLUSH_STALL_FACTOR: usize = 0; // TODO: default to e.g. 5

        // If compaction is disabled, don't stall.
        if self.get_compaction_period() == Duration::ZERO {
@@ -2232,13 +2244,8 @@ impl Timeline {
            return None;
        }

-        // Disable stalls by default. In ingest benchmarks, we see image compaction take >10
-        // minutes, blocking L0 compaction, and we can't stall L0 flushes for that long.
-        //
-        // TODO: fix this.
-        // let l0_flush_stall_threshold = l0_flush_stall_threshold
-        //    .unwrap_or(DEFAULT_L0_FLUSH_STALL_FACTOR * compaction_threshold);
-        let l0_flush_stall_threshold = l0_flush_stall_threshold?;
+        let l0_flush_stall_threshold = l0_flush_stall_threshold
+            .unwrap_or(DEFAULT_L0_FLUSH_STALL_FACTOR * compaction_threshold);

        // 0 disables backpressure.
        if l0_flush_stall_threshold == 0 {
@@ -2252,6 +2259,14 @@ impl Timeline {
        Some(max(l0_flush_stall_threshold, compaction_threshold))
    }

+    fn get_l0_flush_wait_upload(&self) -> bool {
+        let tenant_conf = self.tenant_conf.load();
+        tenant_conf
+            .tenant_conf
+            .l0_flush_wait_upload
+            .unwrap_or(self.conf.default_tenant_conf.l0_flush_wait_upload)
+    }
+
    fn get_image_creation_threshold(&self) -> usize {
        let tenant_conf = self.tenant_conf.load();
        tenant_conf
@@ -3584,7 +3599,12 @@ impl Timeline {
            }
        }
        ancestor
-            .wait_lsn(self.ancestor_lsn, WaitLsnWaiter::Timeline(self), ctx)
+            .wait_lsn(
+                self.ancestor_lsn,
+                WaitLsnWaiter::Timeline(self),
+                WaitLsnTimeout::Default,
+                ctx,
+            )
            .await
            .map_err(|e| match e {
                e @ WaitLsnError::Timeout(_) => GetReadyAncestorError::AncestorLsnTimeout(e),
@@ -4034,6 +4054,27 @@ impl Timeline {
            // release lock on 'layers'
        };

+        // Backpressure mechanism: pause the flush loop until all layer files have been uploaded.
+        // This makes us refuse ingest until the new layers have been persisted to remote storage.
+        // TODO: remove this, and rely on l0_flush_{delay,stall}_threshold instead.
+        if self.get_l0_flush_wait_upload() {
+            let start = Instant::now();
+            self.remote_client
+                .wait_completion()
+                .await
+                .map_err(|e| match e {
+                    WaitCompletionError::UploadQueueShutDownOrStopped
+                    | WaitCompletionError::NotInitialized(
+                        NotInitialized::ShuttingDown | NotInitialized::Stopped,
+                    ) => FlushLayerError::Cancelled,
+                    WaitCompletionError::NotInitialized(NotInitialized::Uninitialized) => {
+                        FlushLayerError::Other(anyhow!(e).into())
+                    }
+                })?;
+            let duration = start.elapsed().as_secs_f64();
+            self.metrics.flush_wait_upload_time_gauge_add(duration);
+        }
+
        // FIXME: between create_delta_layer and the scheduling of the upload in `update_metadata_file`,
        // a compaction can delete the file and then it won't be available for uploads any more.
        // We still schedule the upload, resulting in an error, but ideally we'd somehow avoid this
--- a/pageserver/src/tenant/timeline/import_pgdata.rs
+++ b/pageserver/src/tenant/timeline/import_pgdata.rs
@@ -113,7 +113,7 @@ pub async fn doit(
            match res {
                Ok(_) => break,
                Err(err) => {
-                    info!(?err, "indefintely waiting for pgdata to finish");
+                    info!(?err, "indefinitely waiting for pgdata to finish");
                    if tokio::time::timeout(std::time::Duration::from_secs(10), cancel.cancelled())
                        .await
                        .is_ok()
--- a/pageserver/src/tenant/timeline/import_pgdata/importbucket_client.rs
+++ b/pageserver/src/tenant/timeline/import_pgdata/importbucket_client.rs
@@ -308,7 +308,7 @@ impl ControlFile {
            202107181 => 14,
            202209061 => 15,
            202307071 => 16,
-            /* XXX pg17 */
+            202406281 => 17,
            catversion => {
                anyhow::bail!("unrecognized catalog version {catversion}")
            }
--- a/pageserver/src/tenant/timeline/walreceiver/connection_manager.rs
+++ b/pageserver/src/tenant/timeline/walreceiver/connection_manager.rs
@@ -164,9 +164,10 @@ pub(super) async fn connection_manager_loop_step(
                    Ok(Some(broker_update)) => connection_manager_state.register_timeline_update(broker_update),
                    Err(status) => {
                        match status.code() {
-                            Code::Unknown if status.message().contains("stream closed because of a broken pipe") || status.message().contains("connection reset") => {
+                            Code::Unknown if status.message().contains("stream closed because of a broken pipe") || status.message().contains("connection reset") || status.message().contains("error reading a body from connection") => {
                                // tonic's error handling doesn't provide a clear code for disconnections: we get
                                // "h2 protocol error: error reading a body from connection: stream closed because of a broken pipe"
+                                // => https://github.com/neondatabase/neon/issues/9562
                                info!("broker disconnected: {status}");
                            },
                            _ => {
@@ -273,7 +274,7 @@ pub(super) async fn connection_manager_loop_step(
                    };

                last_discovery_ts = Some(std::time::Instant::now());
-                debug!("No active connection and no candidates, sending discovery request to the broker");
+                info!("No active connection and no candidates, sending discovery request to the broker");

                // Cancellation safety: we want to send a message to the broker, but publish_one()
                // function can get cancelled by the other select! arm. This is absolutely fine, because
--- a/pageserver/src/tenant/timeline/walreceiver/walreceiver_connection.rs
+++ b/pageserver/src/tenant/timeline/walreceiver/walreceiver_connection.rs
@@ -118,7 +118,7 @@ pub(super) async fn handle_walreceiver_connection(
    cancellation: CancellationToken,
    connect_timeout: Duration,
    ctx: RequestContext,
-    node: NodeId,
+    safekeeper_node: NodeId,
    ingest_batch_size: u64,
 ) -> Result<(), WalReceiverError> {
    debug_assert_current_span_has_tenant_and_timeline_id();
@@ -140,7 +140,7 @@ pub(super) async fn handle_walreceiver_connection(

    let (replication_client, connection) = {
        let mut config = wal_source_connconf.to_tokio_postgres_config();
-        config.application_name(format!("pageserver-{}", node.0).as_str());
+        config.application_name(format!("pageserver-{}", timeline.conf.id.0).as_str());
        config.replication_mode(tokio_postgres::config::ReplicationMode::Physical);
        match time::timeout(connect_timeout, config.connect(postgres::NoTls)).await {
            Ok(client_and_conn) => client_and_conn?,
@@ -162,7 +162,7 @@ pub(super) async fn handle_walreceiver_connection(
        latest_wal_update: Utc::now().naive_utc(),
        streaming_lsn: None,
        commit_lsn: None,
-        node,
+        node: safekeeper_node,
    };
    if let Err(e) = events_sender.send(TaskStateUpdate::Progress(connection_status)) {
        warn!("Wal connection event listener dropped right after connection init, aborting the connection: {e}");
--- a/proxy/src/context/parquet.rs
+++ b/proxy/src/context/parquet.rs
@@ -423,11 +423,11 @@ async fn upload_parquet(
    .await
    .ok_or_else(|| anyhow::Error::new(TimeoutOrCancel::Cancel))
    .and_then(|x| x)
-    .context("request_data_upload")
+    .with_context(|| format!("request_data_upload: path={path}"))
    .err();

    if let Some(err) = maybe_err {
-        tracing::error!(%id, error = ?err, "failed to upload request data");
+        tracing::error!(%id, %path, error = ?err, "failed to upload request data");
    }

    Ok(buffer.writer())
--- a/proxy/src/usage_metrics.rs
+++ b/proxy/src/usage_metrics.rs
@@ -396,13 +396,13 @@ async fn upload_backup_events(
        TimeoutOrCancel::caused_by_cancel,
        FAILED_UPLOAD_WARN_THRESHOLD,
        FAILED_UPLOAD_MAX_RETRIES,
-        "request_data_upload",
+        "usage_metrics_upload",
        cancel,
    )
    .await
    .ok_or_else(|| anyhow::Error::new(TimeoutOrCancel::Cancel))
    .and_then(|x| x)
-    .context("request_data_upload")?;
+    .with_context(|| format!("usage_metrics_upload: path={remote_path}"))?;
    Ok(())
 }

--- a/storage_controller/src/pageserver_client.rs
+++ b/storage_controller/src/pageserver_client.rs
@@ -2,8 +2,9 @@ use pageserver_api::{
    models::{
        detach_ancestor::AncestorDetached, LocationConfig, LocationConfigListResponse,
        PageserverUtilization, SecondaryProgress, TenantScanRemoteStorageResponse,
-        TenantShardSplitRequest, TenantShardSplitResponse, TimelineArchivalConfigRequest,
-        TimelineCreateRequest, TimelineInfo, TopTenantShardsRequest, TopTenantShardsResponse,
+        TenantShardSplitRequest, TenantShardSplitResponse, TenantWaitLsnRequest,
+        TimelineArchivalConfigRequest, TimelineCreateRequest, TimelineInfo, TopTenantShardsRequest,
+        TopTenantShardsResponse,
    },
    shard::TenantShardId,
 };
@@ -299,4 +300,17 @@ impl PageserverClient {
            self.inner.top_tenant_shards(request).await
        )
    }
+
+    pub(crate) async fn wait_lsn(
+        &self,
+        tenant_shard_id: TenantShardId,
+        request: TenantWaitLsnRequest,
+    ) -> Result<StatusCode> {
+        measured_request!(
+            "wait_lsn",
+            crate::metrics::Method::Post,
+            &self.node_id_label,
+            self.inner.wait_lsn(tenant_shard_id, request).await
+        )
+    }
 }
--- a/storage_controller/src/reconciler.rs
+++ b/storage_controller/src/reconciler.rs
@@ -3,7 +3,7 @@ use crate::persistence::Persistence;
 use crate::{compute_hook, service};
 use pageserver_api::controller_api::{AvailabilityZone, PlacementPolicy};
 use pageserver_api::models::{
-    LocationConfig, LocationConfigMode, LocationConfigSecondary, TenantConfig,
+    LocationConfig, LocationConfigMode, LocationConfigSecondary, TenantConfig, TenantWaitLsnRequest,
 };
 use pageserver_api::shard::{ShardIdentity, TenantShardId};
 use pageserver_client::mgmt_api;
@@ -348,6 +348,32 @@ impl Reconciler {
        Ok(())
    }

+    async fn wait_lsn(
+        &self,
+        node: &Node,
+        tenant_shard_id: TenantShardId,
+        timelines: HashMap<TimelineId, Lsn>,
+    ) -> Result<StatusCode, ReconcileError> {
+        const TIMEOUT: Duration = Duration::from_secs(10);
+
+        let client = PageserverClient::new(
+            node.get_id(),
+            node.base_url(),
+            self.service_config.jwt_token.as_deref(),
+        );
+
+        client
+            .wait_lsn(
+                tenant_shard_id,
+                TenantWaitLsnRequest {
+                    timelines,
+                    timeout: TIMEOUT,
+                },
+            )
+            .await
+            .map_err(|e| e.into())
+    }
+
    async fn get_lsns(
        &self,
        tenant_shard_id: TenantShardId,
@@ -461,6 +487,39 @@ impl Reconciler {
        node: &Node,
        baseline: HashMap<TimelineId, Lsn>,
    ) -> anyhow::Result<()> {
+        // Signal to the pageserver that it should ingest up to the baseline LSNs.
+        loop {
+            match self.wait_lsn(node, tenant_shard_id, baseline.clone()).await {
+                Ok(StatusCode::OK) => {
+                    // Everything is caught up
+                    return Ok(());
+                }
+                Ok(StatusCode::ACCEPTED) => {
+                    // Some timelines are not caught up yet.
+                    // They'll be polled below.
+                    break;
+                }
+                Ok(StatusCode::NOT_FOUND) => {
+                    // None of the timelines are present on the pageserver.
+                    // This is correct if they've all been deleted, but
+                    // let the polling loop below cross check.
+                    break;
+                }
+                Ok(status_code) => {
+                    tracing::warn!(
+                        "Unexpected status code ({status_code}) returned by wait_lsn endpoint"
+                    );
+                    break;
+                }
+                Err(e) => {
+                    tracing::info!("🕑 Can't trigger LSN wait on {node} yet, waiting ({e})",);
+                    tokio::time::sleep(Duration::from_millis(500)).await;
+                    continue;
+                }
+            }
+        }
+
+        // Poll the LSNs until they catch up
        loop {
            let latest = match self.get_lsns(tenant_shard_id, node).await {
                Ok(l) => l,
--- a/test_runner/fixtures/metrics.py
+++ b/test_runner/fixtures/metrics.py
@@ -165,6 +165,7 @@ PAGESERVER_PER_TENANT_METRICS: tuple[str, ...] = (
    "pageserver_evictions_with_low_residence_duration_total",
    "pageserver_aux_file_estimated_size",
    "pageserver_valid_lsn_lease_count",
+    "pageserver_flush_wait_upload_seconds",
    counter("pageserver_tenant_throttling_count_accounted_start"),
    counter("pageserver_tenant_throttling_count_accounted_finish"),
    counter("pageserver_tenant_throttling_wait_usecs_sum"),
--- a/test_runner/regress/test_attach_tenant_config.py
+++ b/test_runner/regress/test_attach_tenant_config.py
@@ -141,6 +141,7 @@ def test_fully_custom_config(positive_env: NeonEnv):
        "compaction_threshold": 13,
        "l0_flush_delay_threshold": 25,
        "l0_flush_stall_threshold": 42,
+        "l0_flush_wait_upload": True,
        "compaction_target_size": 1048576,
        "checkpoint_distance": 10000,
        "checkpoint_timeout": "13m",
--- a/test_runner/regress/test_branching.py
+++ b/test_runner/regress/test_branching.py
@@ -19,7 +19,6 @@ from fixtures.pageserver.utils import wait_until_tenant_active
 from fixtures.utils import query_scalar
 from performance.test_perf_pgbench import get_scales_matrix
 from requests import RequestException
-from requests.exceptions import RetryError


 # Test branch creation
@@ -177,8 +176,11 @@ def test_cannot_create_endpoint_on_non_uploaded_timeline(neon_env_builder: NeonE

        env.neon_cli.mappings_map_branch(initial_branch, env.initial_tenant, env.initial_timeline)

-        with pytest.raises(RuntimeError, match="is not active, state: Loading"):
-            env.endpoints.create_start(initial_branch, tenant_id=env.initial_tenant)
+        with pytest.raises(RuntimeError, match="ERROR: Not found: Timeline"):
+            env.endpoints.create_start(
+                initial_branch, tenant_id=env.initial_tenant, basebackup_request_tries=2
+            )
+        ps_http.configure_failpoints(("before-upload-index-pausable", "off"))
    finally:
        env.pageserver.stop(immediate=True)

@@ -219,7 +221,10 @@ def test_cannot_branch_from_non_uploaded_branch(neon_env_builder: NeonEnvBuilder

        branch_id = TimelineId.generate()

-        with pytest.raises(RetryError, match="too many 503 error responses"):
+        with pytest.raises(
+            PageserverApiException,
+            match="Cannot branch off the timeline that's not present in pageserver",
+        ):
            ps_http.timeline_create(
                env.pg_version,
                env.initial_tenant,
--- a/test_runner/regress/test_import_pgdata.py
+++ b/test_runner/regress/test_import_pgdata.py
@@ -14,10 +14,8 @@ from fixtures.pageserver.http import (
    ImportPgdataIdemptencyKey,
    PageserverApiException,
 )
-from fixtures.pg_version import PgVersion
 from fixtures.port_distributor import PortDistributor
 from fixtures.remote_storage import RemoteStorageKind
-from fixtures.utils import run_only_on_postgres
 from pytest_httpserver import HTTPServer
 from werkzeug.wrappers.request import Request
 from werkzeug.wrappers.response import Response
@@ -39,10 +37,6 @@ smoke_params = [
 ]


-@run_only_on_postgres(
-    [PgVersion.V14, PgVersion.V15, PgVersion.V16],
-    "newer control file catalog version and struct format isn't supported",
-)
 @pytest.mark.parametrize("shard_count,stripe_size,rel_block_size", smoke_params)
 def test_pgdata_import_smoke(
    vanilla_pg: VanillaPostgres,
@@ -117,13 +111,15 @@ def test_pgdata_import_smoke(
        # TODO: would be nicer to just compare pgdump

        # Enable IO concurrency for batching on large sequential scan, to avoid making
-        # this test unnecessarily onerous on CPU
+        # this test unnecessarily onerous on CPU. Especially in debug mode, it's still
+        # pretty onerous though, so increase statement_timeout to avoid timeouts.
        assert ep.safe_psql_many(
            [
                "set effective_io_concurrency=32;",
+                "SET statement_timeout='300s';",
                "select count(*), sum(data::bigint)::bigint from t",
            ]
-        ) == [[], [(expect_nrows, expect_sum)]]
+        ) == [[], [], [(expect_nrows, expect_sum)]]

    validate_vanilla_equivalence(vanilla_pg)

@@ -317,10 +313,6 @@ def test_pgdata_import_smoke(
        br_initdb_endpoint.safe_psql("select * from othertable")


-@run_only_on_postgres(
-    [PgVersion.V14, PgVersion.V15, PgVersion.V16],
-    "newer control file catalog version and struct format isn't supported",
-)
 def test_fast_import_binary(
    test_output_dir,
    vanilla_pg: VanillaPostgres,
--- a/test_runner/regress/test_remote_storage.py
+++ b/test_runner/regress/test_remote_storage.py
@@ -786,6 +786,54 @@ def test_empty_branch_remote_storage_upload_on_restart(neon_env_builder: NeonEnv
        create_thread.join()


+def test_paused_upload_stalls_checkpoint(
+    neon_env_builder: NeonEnvBuilder,
+):
+    """
+    This test checks that checkpoints block on uploads to remote storage.
+    """
+    neon_env_builder.enable_pageserver_remote_storage(RemoteStorageKind.LOCAL_FS)
+
+    env = neon_env_builder.init_start(
+        initial_tenant_conf={
+            # Set a small compaction threshold
+            "compaction_threshold": "3",
+            # Disable GC
+            "gc_period": "0s",
+            # disable PITR
+            "pitr_interval": "0s",
+        }
+    )
+
+    env.pageserver.allowed_errors.append(
+        f".*PUT.* path=/v1/tenant/{env.initial_tenant}/timeline.* request was dropped before completing"
+    )
+
+    tenant_id = env.initial_tenant
+    timeline_id = env.initial_timeline
+
+    client = env.pageserver.http_client()
+    layers_at_creation = client.layer_map_info(tenant_id, timeline_id)
+    deltas_at_creation = len(layers_at_creation.delta_layers())
+    assert (
+        deltas_at_creation == 1
+    ), "are you fixing #5863? make sure we end up with 2 deltas at the end of endpoint lifecycle"
+
+    # Make new layer uploads get stuck.
+    # Note that timeline creation waits for the initial layers to reach remote storage.
+    # So at this point, the `layers_at_creation` are in remote storage.
+    client.configure_failpoints(("before-upload-layer-pausable", "pause"))
+
+    with env.endpoints.create_start("main", tenant_id=tenant_id) as endpoint:
+        # Build two tables with some data inside
+        endpoint.safe_psql("CREATE TABLE foo AS SELECT x FROM generate_series(1, 10000) g(x)")
+        wait_for_last_flush_lsn(env, endpoint, tenant_id, timeline_id)
+
+        with pytest.raises(ReadTimeout):
+            client.timeline_checkpoint(tenant_id, timeline_id, timeout=5)
+        client.configure_failpoints(("before-upload-layer-pausable", "off"))
+
+
 def wait_upload_queue_empty(
    client: PageserverHttpClient, tenant_id: TenantId, timeline_id: TimelineId
 ):
Author	SHA1	Message	Date
Alex Chi Z	3fb6e258dc	feat(pageserver): use vectored_get in collect_keyspace Signed-off-by: Alex Chi Z <chi@neon.tech>	2025-01-28 20:31:15 +01:00
Vlad Lazar	c54cd9e76a	storcon: signal LSN wait to pageserver during live migration (#10452) ## Problem We've seen the ingest connection manager get stuck shortly after a migration. ## Summary of changes A speculative mitigation is to use the same mechanism as get page requests for kicking LSN ingest. The connection manager monitors LSN waits and queries the broker if no updates are received for the timeline. Closes https://github.com/neondatabase/neon/issues/10351	2025-01-28 17:33:07 +00:00
Erik Grinaker	1010b8add4	pageserver: add `l0_flush_wait_upload` setting (#10534) ## Problem We need a setting to disable the flush upload wait, to test L0 flush backpressure in staging. ## Summary of changes Add `l0_flush_wait_upload` setting.	2025-01-28 17:21:05 +00:00
Folke Behrens	ae4b2af299	fix(proxy): Use correct identifier for usage metrics upload (#10538) ## Problem The request data and usage metrics S3 requests use the same identifier shown in logs, causing confusion about what type of upload failed. ## Summary of changes Use the correct identifier for usage metrics uploads. neondatabase/cloud#23084	2025-01-28 17:08:17 +00:00
Tristan Partin	15fecb8474	Update axum to 0.8.1 (#10332) Only a few things that needed updating: - async_trait was removed - Message::Text takes a Utf8Bytes object instead of a String Signed-off-by: Tristan Partin <tristan@neon.tech> Co-authored-by: Conrad Ludgate <connor@neon.tech>	2025-01-28 15:32:59 +00:00
Erik Grinaker	47677ba578	pageserver: disable L0 backpressure by default (#10535) ## Problem We'll need further improvements to compaction before enabling L0 flush backpressure by default. See: https://neondb.slack.com/archives/C033RQ5SPDH/p1738066068960519?thread_ts=1737818888.474179&cid=C033RQ5SPDH. Touches #5415. ## Summary of changes Disable `l0_flush_delay_threshold` by default.	2025-01-28 14:51:30 +00:00
Arpad Müller	83b6bfa229	Re-download layer if its local and on-disk metadata diverge (#10529) In #10308, we noticed many warnings about the local layer having different sizes on-disk compared to the metadata. However, the layer downloader would never redownload layer files if the sizes or generation numbers change. This is obviously a bug, which we aim to fix with this PR. This change also moves the code deciding what to do about a layer to a dedicated function: before we handled the "routing" via control flow, but now it's become too complicated and it is nicer to have the different verdicts for a layer spelled out in a list/match.	2025-01-28 13:39:53 +00:00
Erik Grinaker	ed942b05f7	Revert "pageserver: revert flush backpressure" (#10402)" (#10533) This reverts commit `9e55d79803`. We'll still need this until we can tune L0 flush backpressure and compaction. I'll add a setting to disable this separately.	2025-01-28 13:33:58 +00:00
Vlad Lazar	62a717a2ca	pageserver: use PS node id for SK appname (#10522) ## Problem This one is fairly embarrassing. Safekeeper node id was used in the pageserver application name when connecting to safekeepers. ## Summary of changes Use the right node id. Closes https://github.com/neondatabase/neon/issues/10461	2025-01-28 13:11:51 +00:00
Peter Bendel	c8fbbb9b65	Test ingest_benchmark with different stripe size and also PostgreSQL version 17 (#10510) We want to verify if pageserver stripe size has an impact on ingest performance. We want to verify if ingest performance has improved or regressed with postgres version 17. ## Summary of changes - Allow to create new project with different postgres versions - allow to pre-shard new project with different stripe sizes instead of relying on storage manager to shard_split the project once a threshold is exceeded Replaces https://github.com/neondatabase/neon/pull/10509 Test run https://github.com/neondatabase/neon/actions/runs/12986410381	2025-01-27 21:06:05 +00:00
John Spray	d73f4a6470	pageserver: retry wrapper on manifest upload (#10524) ## Problem On remote storage errors (e.g. I/O timeout) uploading tenant manifest, all of compaction could fail. This is a problem IRL because we shouldn't abort compaction on a single IO error, and in tests because it generates spurious failures. Related: https://github.com/orgs/neondatabase/projects/51/views/2?sliceBy%5Bvalue%5D=jcsp&pane=issue&itemId=93692919&issue=neondatabase%7Cneon%7C10389 ## Summary of changes - Use `backoff::retry` when uploading tenant manifest	2025-01-27 21:02:25 +00:00
Heikki Linnakangas	5477d7db93	fast_import: fixes for Postgres v17 (#10414) Now that the tests are run on v17, they're also run in debug mode, which is slow. Increase statement_timeout in the test to work around that.	2025-01-27 19:47:49 +00:00
Arpad Müller	eb9832d846	Remove PQ_LIB_DIR env var (#10526) We now don't need libpq any more for the build of the storage controller, as we use `diesel-async` since #10280. Therefore, we remove the env var that gave cargo/rustc the location for libpq. Follow-up of #10280	2025-01-27 19:38:18 +00:00
Christian Schwarz	3d36dfe533	fix: noisy `broker subscription failed` error during storage broker deploys (#10521) During broker deploys, pageservers log this noisy WARN en masse. I can trivially reproduce the WARN message in neon_local by SIGKILLing broker during e.g. `pgbench -i`. I don't understand why tonic is not detecting the error as `Code::Unavailable`. Until we find time to understand that / fix upstream, this PR adds the error message to the existing list of known error messages that get demoted to INFO level. Refs: - refs https://github.com/neondatabase/neon/issues/9562	2025-01-27 19:19:55 +00:00
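
Note on the `get_l0_flush_stall_threshold` change in `pageserver/src/tenant/timeline.rs` above: the rough sketch below shows how the default factor of 0 appears to disable stalls unless a tenant sets an explicit threshold. It is a simplified standalone approximation, not the pageserver code. The free function `l0_flush_stall_threshold` and its parameters are hypothetical stand-ins for the `Timeline` methods and config lookups, and early returns not visible in the hunk are omitted.

```rust
use std::cmp::max;

/// Mirrors the default in the diff: a factor of 0 disables stalls by default.
const DEFAULT_L0_FLUSH_STALL_FACTOR: usize = 0;

/// Hypothetical helper: returns the effective stall threshold, or `None`
/// when L0 flush stalls are disabled.
fn l0_flush_stall_threshold(
    compaction_enabled: bool,    // stand-in for `get_compaction_period() != Duration::ZERO`
    compaction_threshold: usize, // stand-in for the tenant's compaction threshold
    configured: Option<usize>,   // stand-in for the tenant's `l0_flush_stall_threshold`
) -> Option<usize> {
    // If compaction is disabled, don't stall.
    if !compaction_enabled {
        return None;
    }

    // Fall back to the default factor times the compaction threshold.
    let threshold =
        configured.unwrap_or(DEFAULT_L0_FLUSH_STALL_FACTOR * compaction_threshold);

    // 0 disables backpressure.
    if threshold == 0 {
        return None;
    }

    // Never stall below the compaction threshold.
    Some(max(threshold, compaction_threshold))
}
```

Under these assumptions, an unset threshold resolves to `0 * compaction_threshold = 0` and so yields `None`, while an explicit value such as 42 with a compaction threshold of 13 (the values in `test_fully_custom_config`) yields `Some(42)`.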